I'm on the plane to London, reading a review copy of a book with the - I have to say - unappealing title A Damp Squid. (Thereby hangs a tale, of course, which you can read when the book is published, in December.) It contains a lot of thought-provoking stuff about how dictionaries - in particular, the OED - are now made and what else we can learn from the tools that lexicographers use to make them.
Case in point: "collocates," words that go together naturally and relatively commonly, and "collocations," combinations of such words - for instance, "eccentric behavior" but "quirky perspective." Today's lexicographers can generate statistics about how often a given word appears before, after, and in the vicinity of other particular words. This helps them zero in on precise definitions, but the idea is interesting to me for other reasons.
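For the curious, here is a minimal sketch of the kind of counting described above: how often other words appear just before, just after, and in the vicinity of a given word. Everything in it is an assumption made for illustration. The sample text is invented, the names `tokenize` and `collocates` are hypothetical, and a three-word window is an arbitrary choice. It is not how the OED's lexicographers actually do it.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, target, window=3):
    """Count words immediately before, immediately after, and within
    `window` positions of each occurrence of `target`."""
    before, after, nearby = Counter(), Counter(), Counter()
    for i, word in enumerate(tokens):
        if word != target:
            continue
        if i > 0:
            before[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            after[tokens[i + 1]] += 1
        # Neighbors on either side, excluding the target itself.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                nearby[tokens[j]] += 1
    return before, after, nearby

# Invented toy text, standing in for a real corpus (an assumption).
sample = (
    "His eccentric behavior surprised nobody. "
    "She brought a quirky perspective to the panel. "
    "Such eccentric behavior was tolerated. "
    "An eccentric perspective is rarer."
)
tokens = tokenize(sample)
before, after, nearby = collocates(tokens, "eccentric")
print(after.most_common())  # [('behavior', 2), ('perspective', 1)]
```

This sketch ignores sentence boundaries and reports raw counts only. With a real corpus one would presumably also want to weigh each count against how common the words are overall, so that frequent words like "the" don't swamp the results.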
It brings to mind clichés and the puzzle of how these differ from good collocations. Writers are constantly being told and telling themselves to use "fresh" language. If instead of "case in point," above, I'd written "case at issue," would that be fresh language, or would it just be weird? (I'd say the latter.) Is "fresh language" itself a cliché, or is it a desirable collocation? (In between.)
I'd love to be turned loose on the "corpora" - vast collections of text and speech - from which lexicographers generate those statistics. It would be fun to find out whether "fresh language" is stale and "eccentric perspective" quirky. My suspicion is that I'd just be quantifying what's known as an ear for language, and that the project would be about as useful, and in much the same way, as figuring out the differences in molecular composition between good and mediocre food. But let's see if I get a chance to ask the Oxonians about collocations. Information often contains surprises.
PS: My computer's power ran out before I got to the end of the book. The material toward the end is on subjects I know well, like "style wars" and "usages people hate." Reading it furthered my suspicion that quantification has its limits.