Since today, Thursday October 1, is the feast day of the great Little Flower, Saint Thérèse of Lisieux, I naturally was going... eh?
Oh, excuse me. That's Doctor Little Flower!
Of course it is... what a great thought for us, this dear little sister who, though cloistered, helps out those who live far away... amazing.
Well, perhaps I ought to do something doctoral to celebrate.
All right. Let's try using my very own doctoral skills on Chesterton. Note: don't try this at home, or even on the INTERNET, I mean out here in the e-cosmos. I am a professional, and know what I'm getting into. You might get all kinds of nondeterministic effects, after all! Whew. All right, now that I took care of my warning, let's go.
I wondered whether there was some quick way to get a glimpse of Chesterton's vocabulary use over his writing career, and realized that my work on the uniqueness of rRNA strings could readily be applied to his writing. I thought it would be interesting to learn what words might appear in exactly ONE year of his Illustrated London News (ILN) essays. If we could acquire the signatures, grouped by years, for his ILN essays, we might get a hint of how his vocabulary altered.
What's a signature? That's what we called a portion of an rRNA sequence which we found to be unique to a given species. That is, the signature is some sequence of RNA bases which appears only in one species, and in no other, so it can therefore act as a signature of that species. In the same way, if there is a word which appears only in GKC's ILN essays for 1911 (say), but never in any other of his ILN essays, then that word is a signature for 1911. (As you will learn, "signatory" is a signature for 1911 - talk about paradoxes!)
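For the curious, here is a minimal sketch of the idea in Python. It is not the actual program used for this experiment: the function name `year_signatures`, the input format (a dictionary of year to essay texts), the lowercasing and word-splitting rule, and the tiny sample data are all assumptions made up for illustration.

```python
import re

def year_signatures(essays_by_year):
    """Map each year to the sorted list of words found in that year only.

    essays_by_year: dict of year -> list of essay texts (hypothetical format).
    """
    first_year = {}   # word -> the first year in which it was seen
    shared = set()    # words seen in more than one year
    for year, essays in essays_by_year.items():
        for text in essays:
            # Assumed tokenizer: lowercase letters, with inner hyphens/apostrophes.
            for word in re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()):
                if first_year.setdefault(word, year) != year:
                    shared.add(word)
    signatures = {year: [] for year in essays_by_year}
    for word, year in first_year.items():
        if word not in shared:
            signatures[year].append(word)
    return {year: sorted(words) for year, words in signatures.items()}

# Made-up sample text, not real ILN essays.
sample = {
    1905: ["The aggregate of things."],
    1906: ["Things and circumlocutions."],
}
result = year_signatures(sample)
```

In this toy sample, "things" turns up in both years, so it is a signature for neither. Each word is looked at once and each distinct word once more, so apart from the final sorting the work grows in step with the amount of text.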
So I dusted off the machinery, rubbed my hands a few times, and said the usual starting prayers for software development - hey, I use 13th century metaphysics, since I want to get things done (see Heretics CW1:46 for more!). Then I proceeded with the experiment. Heh, heh, heh. (No, that's NOT my usual "hee hee" - that's the doctoral mad scientist laugh. We doctors take special classes to learn to do it effectively, along with how to wear those funny little beanies, and Latin, and all kinds of fun things. It's great.)
Since I am also an engineer, I used some tricks, and devised a tidy little linear-time algorithm (which took lots less time than it does to tell you). And then I wrote the program, and ran it. (Actually I run the program as I write it, which was something I learned to do long before I became a doctor.) And I got some interesting results - and then I also checked the results, since I know what happens when one does not check one's work... it makes one's boss very unhappy, and one's customer FURIOUS... But things looked good, so I decided I could risk telling you here.
Of course 1905 and 1936 have the fewest signatures, since he only wrote for parts of those years, and a few others (such as 1915 and 1920) are on the low side. As I examined the list I noted that there are some indications that a handful of words are still spelled incorrectly in AMBER, and there are a few hyphenation issues also. But I did some checks, and the signatures appear to be authentic:
For example, "aggregate" appears only in 1905, "circumlocutions" only in 1906, and "Ecuador" only in 1909... but there are a goodly number of others.
Here is the list of the years and signatures:
And here is a graph showing the same information:
Very curious, you say, but what does it mean?
Well... one might make any sort of argument about what all this means, but I am not trying to argue anything at all. I merely wanted to give a suitable tribute as a Chestertonian Computer Scientist (and a doctor) to our Doctor Little Flower for her feast day. I am sure she will have a good laugh with GKC and FBC about it.