Unfortunately, it appeared that it's very hard to build a model which is relatively "generic". The language models are very domain-dependent, it's almost impossible to build a good language model for every possible text. Books are almost useless for conversational transcription, no matter what amount of book texts do you have. And, you need terabytes of data in order to reduce accuracy just 1%.
Still, the released language model is an attempt to do so. More importantly, the source texts used to build the langauge model are more-or-less widely available, so it will be possible to extend and improve the model in the future.
I found it quite interesting to solve this problem of domain-dependence. Despite the common fact that trigram models work "relatively well", in fact they do not. This survey I find very relevant
TWO DECADES OF STATISTICAL LANGUAGE MODELING WHERE DO WE GO FROM HERE? by Ronald Rosenfeld
Brittleness across domains: Current language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained. For example, to model casual phone conversations, one is much better off using 2 million words of transcripts from such conversations than using 140 million words of transcripts from TV and radio news broadcasts. This effect is quite strong even for changes that seem trivial to a human: a language model
trained on Dow-Jones newswire text will see its perplexity doubled when applied to the very similar Associated Press newswire text from the same time period...