Throughout this chapter, a text-based representation has been used for the external storage of HMM definitions. For experimental work, text-based storage allows simple and direct access to HMM parameters, and this can be invaluable. However, for very large HMM sets, storage in text form is less practical: it uses memory inefficiently, and load times can be excessive because of the large number of character-to-float conversions needed.
To solve these problems, HTK also provides a binary storage format. In binary mode, keywords are written as a single colon followed by an 8-bit code representing the actual keyword. Any numerical information following the keyword is then in binary. Integers are written as 16-bit shorts and all floating-point numbers are written as 32-bit single-precision floats. The repeat factor used in the run-length encoding scheme for tied-mixture and discrete HMMs is written as a single byte. Its presence immediately after a 16-bit discrete log probability is indicated by setting the top bit of that 16-bit value to 1 (this is why the range of discrete log probabilities is limited to 0 to 32767, i.e. only 15 bits are used for the actual value). For tied-mixtures, the repeat count is signalled by subtracting 2.0 from the weight.
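The two repeat-count conventions can be illustrated with a minimal Python sketch. It is not HTK's own code and rests on several assumptions that the text above does not state: values are written big-endian, the repeat byte holds the total run length (so runs are capped at 255), and tied-mixture weights are non-negative, so a negative stored float can only mean that 2.0 was subtracted. All function names are hypothetical.

```python
import struct

def encode_dprobs(values):
    """Run-length encode 15-bit discrete log probabilities (sketch)."""
    out = bytearray()
    i = 0
    while i < len(values):
        v = values[i]
        if not 0 <= v <= 32767:
            raise ValueError("discrete log probability must fit in 15 bits")
        run = 1
        while i + run < len(values) and values[i + run] == v and run < 255:
            run += 1
        if run > 1:
            out += struct.pack(">H", v | 0x8000)  # top bit set: repeat byte follows
            out += struct.pack("B", run)          # assumed: total run length
        else:
            out += struct.pack(">H", v)           # top bit clear: single value
        i += run
    return bytes(out)

def decode_dprobs(data, count):
    """Decode `count` discrete log probabilities written by encode_dprobs."""
    values, pos = [], 0
    while len(values) < count:
        (word,) = struct.unpack_from(">H", data, pos)
        pos += 2
        run = 1
        if word & 0x8000:                         # flag bit signals a repeat byte
            run = data[pos]
            pos += 1
        values.extend([word & 0x7FFF] * run)      # low 15 bits hold the value
    return values

def encode_weight(weight, run):
    """Encode one tied-mixture weight; subtracting 2.0 flags a repeat byte."""
    if run > 1:
        return struct.pack(">f", weight - 2.0) + struct.pack("B", run)
    return struct.pack(">f", weight)

def decode_weight(data, pos=0):
    """Return (weight, run, next_pos) for a weight written by encode_weight."""
    (w,) = struct.unpack_from(">f", data, pos)
    if w < 0.0:                                   # assumed: real weights are >= 0
        return w + 2.0, data[pos + 4], pos + 5
    return w, 1, pos + 4
```

Under these assumptions, a run of identical discrete values costs three bytes regardless of its length, while an isolated value costs two.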
Binary storage format and text storage format can be mixed within and between input files. Each time a keyword is encountered, its coding is used to determine whether the subsequent numerical information should be input in text or binary form. This means, for example, that binary files can be manually patched by replacing a binary-format definition with a text-format definition.
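The per-keyword switching described here can be sketched in Python as follows. This is an illustration under stated assumptions rather than HTK's actual reader: the keyword code table is hypothetical (the real 8-bit codes are not given in this section), binary values are assumed to be big-endian, text tokens are assumed to be whitespace-delimited, and each keyword is assumed to be followed by a length and then that many floats.

```python
import io
import struct

# Hypothetical code table: the real 8-bit keyword codes are not listed here.
KEYWORD_CODES = {1: "MEAN", 2: "VARIANCE"}

def read_token(stream):
    """Read one whitespace-delimited text token from a byte stream."""
    ch = stream.read(1)
    while ch.isspace():
        ch = stream.read(1)
    token = b""
    while ch and not ch.isspace():
        token += ch
        ch = stream.read(1)
    return token.decode("ascii")

def read_keyword(stream):
    """Return (name, is_binary) for the next keyword, or None at end of input."""
    ch = stream.read(1)
    while ch.isspace():
        ch = stream.read(1)
    if not ch:
        return None
    if ch == b":":                       # binary form: colon + 8-bit code
        return KEYWORD_CODES[stream.read(1)[0]], True
    if ch == b"<":                       # text form: <NAME>
        name = b""
        while (ch := stream.read(1)) not in (b">", b""):
            name += ch
        return name.decode("ascii").upper(), False
    raise ValueError(f"unexpected byte {ch!r} where a keyword was expected")

def read_vector(stream, is_binary):
    """Read a length-prefixed float vector in the form the keyword selected."""
    if is_binary:
        (n,) = struct.unpack(">h", stream.read(2))           # 16-bit short
        return list(struct.unpack(f">{n}f", stream.read(4 * n)))  # 32-bit floats
    n = int(read_token(stream))
    return [float(read_token(stream)) for _ in range(n)]

def read_all(data):
    """Parse every (keyword, is_binary, vector) triple in a mixed byte string."""
    stream = io.BytesIO(data)
    items = []
    while (kw := read_keyword(stream)) is not None:
        name, is_binary = kw
        items.append((name, is_binary, read_vector(stream, is_binary)))
    return items
```

Because the text-or-binary decision is made afresh at each keyword, a text-form vector can sit next to a binary-form one in the same input, which is what makes manual patching possible.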
HTK tools provide a standard command line option (-B) to indicate that HMM definitions should be output in binary format. Alternatively, the Boolean configuration variable SAVEBINARY can be set to true to force binary format output.