Contents
Part I: Tutorial Overview
1 The Fundamentals of HTK
1.1 General Principles of HMMs
1.2 Isolated Word Recognition
1.3 Output Probability Specification
1.4 Baum-Welch Re-Estimation
1.5 Recognition and Viterbi Decoding
1.6 Continuous Speech Recognition
2 An Overview of the HTK Toolkit
2.1 HTK Software Architecture
2.2 Generic Properties of a HTK Tool
2.3 The Toolkit
2.3.1 Data Preparation Tools
2.3.2 Training Tools
2.3.3 Recognition Tools
2.3.4 Analysis Tool
2.4 What's New in Version 2.0?
2.4.1 What's New in Version 2.1?
3 A Tutorial Example of Using HTK
3.1 Data Preparation
3.1.1 Step 1 - the Task Grammar
3.1.2 Step 2 - the Dictionary
3.1.3 Step 3 - Recording the Data
3.1.4 Step 4 - Creating the Transcription Files
3.1.5 Step 5 - Coding the Data
3.2 Creating Monophone HMMs
3.2.1 Step 6 - Creating Flat Start Monophones
3.2.2 Step 7 - Fixing the Silence Models
3.2.3 Step 8 - Realigning the Training Data
3.3 Creating Tied-State Triphones
3.3.1 Step 9 - Making Triphones from Monophones
3.3.2 Step 10 - Making Tied-State Triphones
3.4 Recogniser Evaluation
3.4.1 Step 11 - Recognising the Test Data
3.5 Running the Recogniser Live
3.6 Summary
Part II: HTK in Depth
4 The Operating Environment
4.1 The Command Line
4.2 Script Files
4.3 Configuration Files
4.4 Standard Options
4.5 Error Reporting
4.6 Strings and Names
4.7 Memory Management
4.8 Input/Output via Pipes and Networks
4.9 Byte-swapping of HTK data files
4.10 Summary
5 Speech Input/Output
5.1 General Mechanism
5.2 Speech Signal Processing
5.3 Linear Prediction Analysis
5.4 Filterbank Analysis
5.5 Energy Measures
5.6 Delta and Acceleration Coefficients
5.7 Storage of Parameter Files
5.7.1 HTK Format Parameter Files
5.7.2 Esignal Format Parameter Files
5.8 Waveform File Formats
5.8.1 HTK File Format
5.8.2 Esignal File Format
5.8.3 TIMIT File Format
5.8.4 NIST File Format
5.8.5 SCRIBE File Format
5.8.6 SDES1 File Format
5.8.7 AIFF File Format
5.8.8 SUNAU8 File Format
5.8.9 OGI File Format
5.8.10 WAVE File Format
5.8.11 ALIEN and NOHEAD File Formats
5.9 Direct Audio Input/Output
5.10 Multiple Input Streams
5.11 Vector Quantisation
5.12 Viewing Speech with HLIST
5.13 Copying and Coding using HCOPY
5.14 Version 1.5 Compatibility
5.15 Summary
6 Transcriptions and Label Files
6.1 Label File Structure
6.2 Label File Formats
6.2.1 HTK Label Files
6.2.2 ESPS Label Files
6.2.3 TIMIT Label Files
6.2.4 SCRIBE Label Files
6.3 Master Label Files
6.3.1 General Principles of MLFs
6.3.2 Syntax and Semantics
6.3.3 MLF Search
6.3.4 MLF Examples
6.4 Editing Label Files
6.5 Summary
7 HMM Definition Files
7.1 The HMM Parameters
7.2 Basic HMM Definitions
7.3 Macro Definitions
7.4 HMM Sets
7.5 Tied-Mixture Systems
7.6 Discrete Probability HMMs
7.7 Tee Models
7.8 Binary Storage Format
7.9 The HMM Definition Language
8 HMM Parameter Estimation
8.1 Training Strategies
8.2 Initialisation using HINIT
8.3 Flat Starting with HCOMPV
8.4 Isolated Unit Re-Estimation using HREST
8.5 Embedded Training using HEREST
8.6 Single-Pass Retraining
8.7 Parameter Re-Estimation Formulae
8.7.1 Viterbi Training (HINIT)
8.7.2 Forward/Backward Probabilities
8.7.3 Single Model Reestimation (HREST)
8.7.4 Embedded Model Reestimation (HEREST)
9 HMM System Refinement
9.1 Using HHED
9.2 Constructing Context-Dependent Models
9.3 Parameter Tying and Item Lists
9.4 Data-Driven Clustering
9.5 Tree-Based Clustering
9.6 Mixture Incrementing
9.7 Miscellaneous Operations
10 Discrete and Tied-Mixture Models
10.1 Modelling Discrete Sequences
10.2 Using Discrete Models with Speech
10.3 Tied Mixture Systems
10.4 Parameter Smoothing
11 Networks, Dictionaries and Language Models
11.1 How Networks are Used
11.2 Word Networks and Standard Lattice Format
11.3 Building a Word Network with HPARSE
11.4 Bigram Language Models
11.5 Building a Word Network with HBUILD
11.6 Testing a Word Network using HSGEN
11.7 Constructing a Dictionary
11.8 Word Network Expansion
11.9 Other Kinds of Recognition System
12 Decoding
12.1 Decoder Operation
12.2 Decoder Organisation
12.3 Recognition using Test Databases
12.4 Evaluating Recognition Results
12.5 Generating Forced Alignments
12.6 Recognition using Direct Audio Input
12.7 N-Best Lists and Lattices
Part III: Reference Section
13 The HTK Tools
13.1 HBuild
13.1.1 Function
13.1.2 Use
13.1.3 Tracing
13.2 HCompV
13.2.1 Function
13.2.2 Use
13.2.3 Tracing
13.3 HCopy
13.3.1 Function
13.3.2 Use
13.3.3 Trace Output
13.4 HDMan
13.4.1 Function
13.4.2 Use
13.4.3 Tracing
13.5 HERest
13.5.1 Function
13.5.2 Use
13.5.3 Tracing
13.6 HHEd
13.6.1 Function
13.6.2 Use
13.6.3 Tracing
13.7 HInit
13.7.1 Function
13.7.2 Use
13.7.3 Tracing
13.8 HLEd
13.8.1 Function
13.8.2 Use
13.8.3 Tracing
13.9 HList
13.9.1 Function
13.9.2 Use
13.9.3 Tracing
13.10 HLStats
13.10.1 Function
13.10.2 Bigram Generation
13.10.3 Use
13.10.4 Tracing
13.11 HParse
13.11.1 Function
13.11.2 Network Definition
13.11.3 Compatibility Mode
13.11.4 Use
13.11.5 Tracing
13.12 HQuant
13.12.1 Function
13.12.2 VQ Codebook Format
13.12.3 Use
13.12.4 Tracing
13.13 HRest
13.13.1 Function
13.13.2 Use
13.13.3 Tracing
13.14 HResults
13.14.1 Function
13.14.2 Use
13.14.3 Tracing
13.15 HSGen
13.15.1 Function
13.15.2 Use
13.15.3 Tracing
13.16 HSLab
13.16.1 Function
13.16.2 Use
13.16.3 Tracing
13.17 HSmooth
13.17.1 Function
13.17.2 Use
13.17.3 Tracing
13.18 HVite
13.18.1 Function
13.18.2 Use
13.18.3 Tracing
14 Configuration Variables
14.1 Configuration Variables used in Library Modules
14.2 Configuration Variables used in Tools
15 Error and Warning Codes
15.1 Generic Errors
15.2 Summary of Errors by Tool and Module
16 HTK Standard Lattice Format (SLF)
16.1 SLF Files
16.2 Format
16.3 Syntax
16.4 Field Types
16.5 Example SLF file
Index
About this document ...
ECRL HTK_V2.1