Sunday, July 29, 2018

Building the language model for dialogs

I'm looking for a way to build a combined language model suitable for decoding dialogs. I have quite a lot of dialog transcriptions, but in terms of coverage they can't compare with a generic model built from large corpora. It would be nice to combine the two somehow, to get the structure of the first model and the diversity of the second. One article I read says it's possible simply to interpolate them linearly, so I probably just need to get better acquainted with the SRILM toolkit.
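
Linear interpolation mixes the two models word by word: P(w|h) = λ·P_dialog(w|h) + (1 − λ)·P_generic(w|h), with λ usually tuned on held-out dialog text. Below is a minimal sketch of that idea. It assumes toy add-one smoothed bigram models stand in for the real dialog and generic models; the sentences and the helpers `train_bigram`, `interpolate` and `perplexity` are hypothetical. In SRILM itself, the `ngram` tool does this mixing through its `-lm`, `-mix-lm` and `-lambda` options, and the `compute-best-mix` script estimates the weight.

```python
import math
from collections import Counter


def train_bigram(sentences, vocab):
    """Toy add-one smoothed bigram model; returns a function p(w, prev)."""
    history = Counter()  # how often each token appears as a history
    bigrams = Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, w)] += 1
    v = len(vocab)

    def prob(w, prev):
        return (bigrams[(prev, w)] + 1) / (history[prev] + v)

    return prob


def interpolate(p_dialog, p_generic, lam):
    """Linear interpolation: lam * p_dialog + (1 - lam) * p_generic."""
    def prob(w, prev):
        return lam * p_dialog(w, prev) + (1 - lam) * p_generic(w, prev)

    return prob


def perplexity(prob, sentences):
    """Per-token perplexity of a model on a list of sentences."""
    log_sum, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(w, prev))
            n += 1
    return math.exp(-log_sum / n)


# Hypothetical toy data standing in for real transcripts and a large corpus.
dialog = ["hello how can i help you", "i want to book a ticket", "how can i help"]
generic = ["the weather is nice today", "i want to read a book", "you can help the team"]
held_out = ["hello how can i help", "i want a ticket"]

vocab = {w for s in dialog + generic + held_out for w in s.split()} | {"</s>"}
p_d = train_bigram(dialog, vocab)
p_g = train_bigram(generic, vocab)

# Pick the weight of the dialog model that minimizes held-out perplexity.
candidates = [k / 10 for k in range(11)]
best_lam = min(candidates, key=lambda lam: perplexity(interpolate(p_d, p_g, lam), held_out))
print("best lambda:", best_lam)
```

Tuning λ on held-out dialog data, rather than fixing it by hand, lets the small in-domain model dominate where it is reliable while the generic model still covers words the transcripts never saw.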

It's discouraging that sphinx4 doesn't support high-order n-grams. Another article suggests a workaround: join frequent word combinations into compound words, so that a lower-order model effectively sees a longer context.
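
A minimal sketch of the compound-word idea, assuming a simple frequency threshold (`min_count`) and an underscore separator; both, and the helper names `frequent_pairs` and `merge_pairs`, are hypothetical choices rather than the method from that article. Frequent adjacent pairs are counted in the transcripts and then merged greedily, left to right, into single tokens before the text is handed to the language model training tool.

```python
from collections import Counter


def frequent_pairs(sentences, min_count=2):
    """Return adjacent word pairs seen at least min_count times (hypothetical threshold)."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        counts.update(zip(words, words[1:]))
    return {pair for pair, c in counts.items() if c >= min_count}


def merge_pairs(sentence, pairs, sep="_"):
    """Greedily join known pairs, left to right, into compound tokens."""
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in pairs:
            out.append(words[i] + sep + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


# Hypothetical transcripts.
transcripts = [
    "how can i help you",
    "how can i help you today",
    "i would like to book a ticket",
    "i would like to cancel",
]
pairs = frequent_pairs(transcripts)
merged = [merge_pairs(s, pairs) for s in transcripts]
for line in merged:
    print(line)
```

After merging, a trigram over `how_can i_help you` spans five original words. The compound tokens also need entries in the pronunciation dictionary, for example by concatenating the pronunciations of their parts.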

By the way, the generic model gives 40% accuracy while the home-grown dialog model gives 60%, so it's a promising direction in any case.
