Speech Technology (eBook)

Theory and Applications
eBook Download: PDF
2010
XXVII, 331 pages
Springer US (publisher)
978-0-387-73819-2 (ISBN)

106.99 incl. VAT
  • Download available immediately

This book gives an overview of the research and application of speech technologies in different areas. One of the book's special characteristics is that the authors take a broad view of the multiple research areas involved and approach the topics in a multidisciplinary way. One of the goals of the book is to emphasize applications: user experience, human factors, and usability issues are its focus.



Dr. Chen is an associate professor at the Computing Science Department, Chalmers
University of Technology, Sweden. She has been working in Human Factors and Ergonomics
research for over 20 years and has published over 20 papers in cognitive science,
especially related to speech technology applications. Chen has over 20 years of teaching and
research experience in ergonomics, human factors, and human-computer interaction.
She has taught human cognition, human-computer interaction, usability and
user-centered design, and research methodology at the undergraduate and graduate levels. In
the past 8 years, her research interests have focused on speech and multimodal interaction
design in different applications.


Dr. Jokinen is a Professor of Language Technology at the University of Helsinki. She
has played a leading role in several academic and industrial research projects
concerning spoken dialogue systems, cooperative communication, adaptation, and
multimodality. She has published a large number of articles and papers, organized many
workshops at major international conferences, and given several invited talks and
seminars. She is the secretary of SIGDIAL, the ISCA/ACL Special Interest Group on
Discourse and Dialogue.


New Trends in Speech Based Interactive Systems gives an overview of the research and application of speech technologies in different areas. The basic technology development areas include automatic speech recognition, speech synthesis, spoken interaction, natural language understanding, speaker recognition, emotion in spoken dialogue systems, expressive speech synthesis, affective computing, multimodal communication, interaction technologies, and animated agents. The application areas include in-vehicle information systems and interaction, military applications, other industrial applications (such as space, air traffic control, complicated control-room interaction, and simulator control), as well as applications for special user groups and in entertainment systems. General design and usability evaluation methodologies from the user's perspective are also included in the book.

The overview of each technology covers:
1. The history and development of the technology.
2. The current hot research topics and interests in the technology.
3. The mistakes and assumptions that have been tried, so that these mistakes are not repeated.
4. The special interests and problems that arise in different application areas.
5. The trend of technical development in the coming few years.

The overview of each application area covers:
1. The history of the application studies.
2. The speech technologies that are of interest for the application.
3. An analysis of the factors that affect real-time application.
4. The possible usability requirements and interaction design problems.
5. The trend of the research focus in the coming few years.

One of the special characteristics of the book is that the authors do not just present their own research work, but take a broad view of the multiple research areas and approach the topics in a multidisciplinary way. One of the goals of the book is to emphasize applications: user experience, human factors, and usability issues are its focus.

Preface 5
Acknowledgments 10
Contents 11
Contributors 13
List of Acronyms 15
About the Authors 18
About the Editors 24
1 History and Development of Speech Recognition 25
1.1 Introduction 25
1.2 Five Decades of Progress in Speech Recognition 25
1.2.1 The First-Generation Technology (1950s and 1960s) 26
1.2.2 The Second-Generation Technology (Late 1960s and 1970s) 26
1.2.3 The Third-Generation Technology (1980s) 28
1.2.4 The Third Generation, Further Advances (1990s) 29
1.2.5 The Third Generation, Further Advances (2000s) 30
1.2.6 Summary of Technological Progress 31
1.2.7 Changes in the Past Three Decades 33
1.3 Research Issues toward the Fourth-Generation ASR Technology 34
1.3.1 How to Narrow the Gap Between Machine and Human Speech Recognition 34
1.3.2 Robust Acoustic Modeling 35
1.3.3 Robust Language Modeling 37
1.3.4 Speech Corpora 38
1.4 Conclusion 39
References 39
2 Challenges in Speech Synthesis 43
2.1 Introduction 43
2.2 Thousand Years of Speech Synthesis Research 45
2.2.1 From Middle Ages Over Enlightenment to Industrial Revolution: Mechanical Synthesizers 45
2.2.2 The 20th Century: Electronic Synthesizers 47
2.3 The Many Hats of Speech Synthesis Challenges 48
2.3.1 Evaluation, Standardization, and Scientific Exchange 48
2.3.2 Techniques of Speech Synthesis 50
2.3.2.1 Concatenative Synthesis 51
2.3.2.2 HMM-Based Synthesis 52
2.3.2.3 Voice Conversion 53
2.4 Conclusion 54
References 54
3 Spoken Language Dialogue Models 57
3.1 Introduction 57
3.2 Historical Overview 58
3.2.1 Early Ideas of a Thinking Machine 59
3.2.2 Experimental Prototypes and Dialogue Models 60
3.2.3 Large-Scale Projects: From Written to Spoken Dialogues 61
3.2.4 Dialogue Technology: Industrial Perspectives 63
3.2.5 Current Trends: Towards Multimodal Intelligent Systems 65
3.3 Dialogue Modelling 66
3.3.1 Dialogue Management Models 66
3.3.2 Discourse Modelling 67
3.3.2.1 Top-Down Approach 67
3.3.2.2 Dialogue Act and Plan-Based Approaches 68
3.3.2.3 Bottom-Up Approach 70
3.3.3 Conversational Principles 72
3.3.4 HCI and Dialogue Models 74
3.4 Conclusion 75
References 77
4 The Industry of Spoken-Dialog Systems and the Third Generation of Interactive Applications 85
4.1 Introduction 85
4.2 A Change of Perspective 86
4.3 Beyond Directed Dialog 88
4.4 Architectural Evolution and Standards 89
4.5 The Structure of the Spoken-Dialog Industry 93
4.6 The Speech Application Lifecycle 94
4.7 Speech 3.0: The Third Generation of Spoken-Dialog Systems 97
4.8 Conclusions 99
References 100
5 Deceptive Speech: Clues from Spoken Language 102
5.1 Introduction 102
5.2 Perceptual and Descriptive Studies of Deception 103
5.3 Practitioners' Lore 106
5.4 Computational Approaches to Deceptive Speech 107
5.4.1 Lexical and Semantic Analysis 107
5.4.2 Voice Stress Analysis 107
5.5 Machine-Learning Approaches 108
5.6 Conclusion 109
References 109
6 Cognitive Approaches to Spoken Language Technology 112
6.1 Introduction 112
6.1.1 Limitations of Current Technology 112
6.1.2 What Is Missing? 113
6.2 Models of Natural Cognition 114
6.2.1 Cognitive Science 115
6.2.2 Hierarchical Control 116
6.2.3 Emulation Mechanisms 117
6.2.4 Mirror Neurons 117
6.3 Artificial Cognitive Systems 119
6.3.1 Embodied Cognition 119
6.3.2 Grounding Language 120
6.4 Roadmap for the Future 121
6.4.1 The Way Forward? 121
6.4.2 A New Scientific Discipline: Cognitive Informatics 123
References 124
7 Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech 127
7.1 Introduction 127
7.2 Prosodic Information Exchange 128
7.2.1 Natural Interactive Speech 128
7.2.2 Two-Way Interactive Speech 130
7.2.3 Speech Fragments 133
7.3 Acoustic Correlates of Discourse-Related Non-verbal Speech Sounds 133
7.3.1 Voice Quality, Prosody, and Affect 134
7.3.2 Multi-speaker Variation in Prosody and Tone-of-Voice 135
7.4 Technological Applications 138
7.4.1 Discourse Flow and Prosody Engineering 139
7.4.2 Sensing Affect: Detecting Changes in People from Variation in Their Speaking Style and Tone-of-Voice
7.4.3 Toward the Synthesis of Expressive Speech 140
7.5 Discussion 141
7.6 Conclusion 142
References 142
8 Interacting with Embodied Conversational Agents 144
8.1 Introduction 144
8.2 Types of Conversational Settings 145
8.2.1 TV-Style Presenters 146
8.2.2 Virtual Dialogue Partners 147
8.2.3 Role-Plays and Simulated Conversations 147
8.2.4 Multi-threaded Multi-party Conversation 148
8.3 Dialogue Management 149
8.4 Communicative Signals 151
8.5 Emotional Signals 154
8.6 Expressive Behaviours 155
8.7 Perceptive Behaviours 157
8.8 Social Talk 158
8.9 Design Methodology for Modelling ECA Behaviours 160
8.10 Evaluation of Verbal and Non-verbal Dialogue Behaviours 162
8.10.1 Studies Focusing on the Relationship Between Verbal and Non-verbal Means 162
8.10.2 Studies Investigating the Benefit of Empirically Grounded ECA Dialogue Behaviours 163
8.10.3 Studies Investigating the Dialogue Behaviours of Humans Interacting with an ECA 163
8.10.4 Studies Investigating Social Aspects of ECA Dialogue Behaviours 164
8.11 Conclusion 165
References 165
9 Multimodal Information Processing for Affective Computing 171
9.1 Introduction 171
9.2 Multimodal-Based Affective Human-Computer Interaction 172
9.2.1 Emotional Speech Processing 172
9.2.2 Affect in Facial Expression 174
9.2.3 Affective Multimodal System 176
9.2.4 Affective Understanding 177
9.3 Projects and Applications 178
9.3.1 Affective-Cognitive for Learning and Decision Making 178
9.3.2 Affective Robot 178
9.3.3 Oz 178
9.3.4 Affective Facial and Vocal Expression 179
9.3.5 Affective Face-to-Face Communication 179
9.3.6 Humaine 179
9.4 Research Challenges 180
9.4.1 Cognitive Structure of Affects 180
9.4.2 Multimodal-Based Affective Information Processing 181
9.4.3 Affective Features Capturing in Real Environments 181
9.4.4 Affective Interaction in Multi-agent Systems 182
9.5 Conclusion 182
References 183
10 Spoken Language Translation 187
10.1 The Dream of the Universal Translator 187
10.2 Component Technologies 188
10.2.1 Speech Recognition Engines 188
10.2.2 Translation Engines 189
10.2.3 Synthesis 191
10.3 Specific Systems 192
10.3.1 SLT and MedSLT 194
10.3.2 Phraselator 200
10.3.3 Diplomat/Tongues 201
10.3.4 S-MINDS 205
10.3.4.1 ASR Component 205
10.3.4.2 Translation Component 206
10.3.4.3 N-Best Merging of Results 207
10.3.4.4 User Interface Component 208
10.3.4.5 Speech Synthesis Component 208
10.3.4.6 Evaluations 208
10.4 Further Directions 209
References 210
11 Application of Speech Technology in Vehicles 214
11.1 Introduction 214
11.2 Complicated Vehicle Information Systems 215
11.3 Driver Distraction Due to Speech Interaction 217
11.4 Speech as Input/Output Device 219
11.4.1 Noise Inside Vehicles 219
11.4.2 Identify Suitable Functions 222
11.5 Dialogue System Design 222
11.6 In-Vehicle Research Projects 223
11.7 Commercial In-Vehicle Dialogue Systems 225
11.8 Multimodal Interaction 226
11.9 Driver States and Traits, In-Vehicle Voice and Driving Behavior 227
11.9.1 Driver States and Traits 227
11.9.2 Emotions and Driving 228
11.9.3 Age of Voice, Personality, and Driving 230
11.10 Usability and Acceptance 233
References 234
12 Spoken Dialogue Application in Space: The Clarissa Procedure Browser 239
12.1 Introduction 239
12.2 System Overview 241
12.2.1 Supported Functionality 241
12.2.2 Modules 242
12.3 Writing Voice-Navigable Documents 243
12.3.1 Representing Procedure-Related Discourse Context 244
12.4 Grammar-Based Recognition 246
12.4.1 Regulus and Alterf 246
12.4.2 Using Regulus and Alterf in Clarissa 247
12.4.3 Evaluating Speech Understanding Performance 249
12.5 Rejecting User Speech 251
12.5.1 The Accept/Reject Decision Task 252
12.5.2 An SVM-Based Approach 253
12.5.2.1 Choosing a Kernel Function 254
12.5.2.2 Making the Cost Function Asymmetric 254
12.5.3 Experiments 255
12.6 Side-Effect Free Dialogue Management 256
12.6.1 Side-Effect Free Dialogue Management 257
12.6.2 Specific Issues 258
12.6.2.1 "Undo" and "Correction" Moves 258
12.6.2.2 Confirmations 259
12.6.2.3 Querying the Environment 260
12.6.2.4 Regression Testing and Evaluation 260
12.7 Results of the On-Orbit Test 260
12.8 Conclusion 262
12.8.1 Procedures 262
12.8.2 Recognition 262
12.8.3 Response Filtering 262
12.8.4 Dialogue Management 263
12.8.5 General 263
12.8.5.1 A Note on Versions 264
Appendix: Detailed Results for System Performance 264
The Recognition Task 264
The Accept/Reject Task 266
Kernel Types 266
Asymmetric Error Costs 267
Recognition Methods 267
References 267
13 Military Applications: Human Factors Aspects of Speech-Based Systems 269
13.1 Introduction 269
13.2 The Military Domain 269
13.2.1 Users 270
13.2.2 Technology 271
13.2.3 Environment 273
13.3 Applications 275
13.3.1 Air 275
13.3.2 Land 277
13.3.3 Sea 279
13.4 General Discussion 280
13.4.1 Users 280
13.4.2 Technology 281
13.4.3 Environment 282
13.5 Future Research 282
13.5.1 Challenges 282
13.5.2 Recommendations for Future Research 284
References 285
14 Accessibility and Design for All Solutions Through Speech Technology 289
14.1 Introduction 289
14.1.1 Text and Speech Media 290
14.1.2 Multimedia 290
14.2 Applications for Blind or Partially Sighted Persons 291
14.2.1 Screen-reader 291
14.2.2 Screen-readers' Technical Requirements 293
14.2.3 Relationship with the TTS Module 294
14.2.4 Audio-Browsing Tools 295
14.2.5 General Purpose Speech-Enabled Applications 296
14.2.6 Ambient and Security Problems for the Blind User 297
14.3 Applications for the Mobility Impaired 298
14.4 Applications for the Speech Impaired 302
14.5 Applications for the Hearing Impaired 304
14.6 Applications for the Elderly 305
14.7 Accessibility and Application 307
14.7.1 Navigation in Built Environments and Transportation 307
14.7.2 Access to Complex Documents 309
14.7.3 Applications for Instructional Games 312
14.7.4 Accessibility to Ebooks 313
14.8 Conclusion 315
References 316
15 Assessment and Evaluation of Speech-Based Interactive Systems: From Manual Annotation to Automatic Usability Evaluation 318
15.1 Introduction 318
15.2 A Brief History of Assessment and Evaluation 319
15.2.1 Performance and Quality 320
15.3 Assessment of Speech-System Components 321
15.3.1 Assessment of Speech Recognition 323
15.3.2 Assessment of Speech and Natural Language Understanding 323
15.3.3 Assessment of Dialog Management 324
15.3.4 Assessment of Speech Output 325
15.4 Evaluation of Entire Systems 326
15.4.1 Detection and Classification of Interaction Problems 327
15.4.2 Parametric Description of Interactions 328
15.4.3 Subjective Quality Evaluation 328
15.4.4 Usability Inspection 329
15.5 Prediction of Quality Judgments 330
15.6 Conclusions and Future Trends 331
15.6.1 Multimodal, Adaptive, and Non-task-Oriented Systems 332
15.6.2 Semi-automatic Evaluation 333
15.6.3 Quality Prediction 334
References 335
Index 340

Publication date (per publisher) 1 July 2010
Additional information XXVII, 331 p.
Place of publication New York
Language English
Subject areas Humanities
Mathematics / Computer Science: Computer Science
Technology: Electrical Engineering / Power Engineering
Technology: Communications Engineering
Keywords Affective computing • Chen • Communication • Design • Information • in-vehicle information system • Jokinen • language • Speech processing • Speech Recognition • Speech Synthesis • Speech Technology • Spoken Dialogue System • Usability • User Experience
ISBN-10 0-387-73819-3 / 0387738193
ISBN-13 978-0-387-73819-2 / 9780387738192
PDF (watermarked)
Size: 5.5 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for reference books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only suitable to a limited extent for small displays (smartphone, e-reader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading this eBook, you can also read it online in your web browser.

Buying eBooks from abroad
For tax law reasons we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
