GEMINILOGGBOOKOBERDADAISTICUS
Zipf's law in concrete poetry
Update on my concrete / generative poetry work, and a call for participation. For background, see my post from just over a year ago:
Lorologoi and Zipf's law
The project is making progress. The simplest part is generating huge quantities of text. But there is no point in generating a long manuscript or thousands of short poems just because you can. I'm trying to make this readable literature. Reading it aloud is the most exacting test. Although I don't quite know what the words and sentences mean, I often have a vague sense of the poem's atmosphere. And I'm working on an appendix with a dictionary where some of the words are defined in a hybrid of the language of the poems themselves and a pan-European ad hoc pidgin language. The current stage of the work is mostly about manual editing of these computer-generated poems.
In the first phase I tried various strategies to create new words with some similarity to existing languages. The words should be possible to pronounce, though not necessarily easy for a non-native speaker of this quasi-language, which of course has no native speakers. With some practice one can learn to read even rather awkward phonetic sequences. You can't know what flows smoothly and what doesn't without reading aloud. Now that I have some 200 pages of poems, I have to worry about structure, narrative construction, metaphors, rhymes and alliterations. Well, that's an exaggeration. Since the words mostly haven't settled on any fixed meaning yet, there is no way to work on the semantic level. But I'd like to imagine that I do, and in some sense I try to care about the meaning of these poems. I am making slow progress on finding out what word classes some of the words might plausibly belong to, and sometimes I come up with definite meanings. Rhymes have their place, although it is often difficult to find any suitable rhyming words in the word lists I have accumulated. Palindromes are usually easier, because I have a function that reverses words and "spell-checks" them against some rudimentary rules. Reversed words are only accepted if they pass the spell check.
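A reverse-and-check step of this kind might look roughly like the sketch below. The actual rules are not given here, so everything in it is an assumption made for illustration: the vowel set, the limit on consonant runs, the list of forbidden word beginnings, and the function names.

```python
import re

# Hypothetical rules, invented for this sketch.
VOWELS = set("aeiouy")
FORBIDDEN_STARTS = ("ng", "rk", "lm")  # clusters assumed awkward at the start of a word


def passes_spell_check(word, max_consonant_run=3):
    """Return True if the word obeys the illustrative pronounceability rules."""
    if not re.fullmatch(r"[a-z]+", word):
        return False
    if not any(ch in VOWELS for ch in word):
        return False
    if word.startswith(FORBIDDEN_STARTS):
        return False
    run = 0
    for ch in word:
        # Count consecutive consonants; reset at every vowel.
        run = 0 if ch in VOWELS else run + 1
        if run > max_consonant_run:
            return False
    return True


def accepted_reversals(words):
    """Map each word to its reversal, keeping only reversals that pass the check."""
    result = {}
    for word in words:
        reversed_word = word[::-1]
        if passes_spell_check(reversed_word):
            result[word] = reversed_word
    return result
```

A rule about word beginnings is what makes the check worth running on the reversed form: a word can be fine in one direction and rejected in the other.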
At this stage a helpful tool is statistical analysis. I've found a snippet of Python code that counts word occurrences and lists a histogram in order from the most often used down to all the hapax legomena. When you invent lots of new words and distribute them in poems, the text will end up containing hekomals of one-off words. Currently there are just short of 6000 unique words in a text that totals about 11000 words. Thus over half of the words occur only once. I suspect Finnegans Wake is about as extreme in neologisms, and that's what makes it hard to read.
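For readers who want to try this on their own texts, here is a minimal sketch of such a word count using only the Python standard library. It is not the snippet mentioned above; the tokenizing rule (runs of letters, lowercased), the function names, and the sample line are assumptions.

```python
import re
from collections import Counter


def word_histogram(text):
    """Count occurrences of each word (runs of letters, case-insensitive)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


def summarize(counts):
    """Return (total words, unique words, sorted list of words used exactly once)."""
    total = sum(counts.values())
    unique = len(counts)
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    return total, unique, hapaxes


# Made-up sample line, only for demonstration.
sample = "a lor a mi lor a vento"
counts = word_histogram(sample)

# List words from the most frequent down to the one-off words.
for rank, (word, n) in enumerate(counts.most_common(), start=1):
    print(rank, word, n)
```

Dividing the number of one-off words by the number of unique words gives the share of the vocabulary that appears only once.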
The most common words are single vowels, then some two-letter combinations. This would happen for purely combinatorial reasons, but I've boosted some of these words by putting them in a special file which is used to draw these short words (which might be articles, nouns, prepositions, conjunctions, pronouns, or numerals), or by inserting them between two long words to ease the flow.
Anyway, the histogram of word frequencies against rank has more or less the expected declining shape: roughly 1/r, where r is the rank. Single-occurrence words are far more common than in natural languages. As a literary critic might say, the reader is bombarded with new impressions all the time. Therefore I'm trying to merge some of the rare words, especially short ones with similar spellings that function as fillers between the bones of nouns, verbs, and adjectives. Natural languages don't have many synonyms for "the," "a," "it," "this," "on," "is," and other short common words, but my language appears to have a few of each.
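One rough way to put a number on "roughly 1/r" is to fit a straight line to log(frequency) against log(rank); a slope near -1 corresponds to the 1/r shape. The sketch below does this with an ordinary least-squares fit in plain Python. The function name is an assumption, and a least-squares fit on log-log data is only a crude estimate, not a rigorous test.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    `counts` maps each word to its number of occurrences.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# An idealized vocabulary whose frequencies are exactly 120 / rank.
ideal = {"w1": 120, "w2": 60, "w3": 40, "w4": 30, "w5": 24}
slope = zipf_slope(ideal)  # very close to -1 for this idealized input
```

A long tail of one-off words pulls the fitted line flatter at high ranks, so the slope of a text with many neologisms would be expected to differ from that of an ordinary text.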
Although there is no explicit syntax, word order must also be taken into consideration. For a few words, I have established where in the sentence they should go, although there is enough poetic licence to jumble the order. Since most words still haven't been assigned any word class, they remain flexible in use. Maybe nouns could have tense like verbs? I also try to make phonetic adjustments in order to make the text easier to read, such as beginning a word with a vowel if the previous word ends with a consonant, and vice versa.
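The vowel and consonant alternation at word boundaries is easy to check mechanically. The sketch below only flags the places where two vowels or two consonants meet; it does not decide how to fix them. Which letters count as vowels is an assumption, as are the function names and the sample words.

```python
# Assumed vowel set; the quasi-language may well use others.
VOWELS = set("aeiouy")


def is_vowel(ch):
    return ch.lower() in VOWELS


def clashing_boundaries(words):
    """Return adjacent word pairs where vowel meets vowel or consonant meets consonant.

    Assumes every word is a non-empty string.
    """
    clashes = []
    for prev, nxt in zip(words, words[1:]):
        if is_vowel(prev[-1]) == is_vowel(nxt[0]):
            clashes.append((prev, nxt))
    return clashes


# Made-up line: "ami ento" meets vowel to vowel, "rak tur" consonant to consonant.
line = ["lor", "ami", "ento", "rak", "tur"]
clashes = clashing_boundaries(line)
```

The flagged pairs are candidates for manual editing, for instance by swapping in a variant spelling or inserting a short filler word.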
Zipf's law says that the frequency (number of occurrences) of a word is inversely proportional to the rank of the word, sorted by frequency. Zipf's law has been found in many contexts, and has been shown to arise from many different processes. It would be more remarkable if all words in some large corpus (except for a plain word list) occurred equally often. Small discrepancies from Zipf's law can be found, and these are not random:
This article points out that the meaning of words must influence their frequency. Some concepts are spoken about more often than others. Some word classes, such as prepositions and articles, are fewer in number than nouns, verbs, and adjectives, but are needed to produce grammatically correct sentences. This is a fascinating analogue to the fact that each letter in a given language has its expected frequency, which is how secret codes are broken; each concept also has its expected frequency although it varies with context and topic.
How does rank / word frequency relate to syntax? How many single-use words can there be in a row? Clearly a log-log plot of the corpus of the Lorologoi follows the expected shape, but there are too many distinct words. The long tail is way too long to resemble that of most ordinary texts. Prolix writers may choose to display their exceptionally large vocabularies in their novels. At the other end of the spectrum, some parsimoniously worded poetry might get away with a rose and a dozen other words repeated, reused, and permuted.
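The question of how many single-use words occur in a row can at least be measured for a given text. Here is a small sketch, assuming the text has already been split into a list of words; the function name and the sample line are made up.

```python
from collections import Counter


def longest_hapax_run(words):
    """Length of the longest stretch of consecutive words that each occur only once."""
    counts = Counter(words)
    best = current = 0
    for word in words:
        # Extend the run on a one-off word, reset it on a repeated word.
        current = current + 1 if counts[word] == 1 else 0
        best = max(best, current)
    return best


# Made-up sample: "mi vento" is the longest run of one-off words.
sample_words = "a lor mi vento a ska lor".split()
run_length = longest_hapax_run(sample_words)
```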
Attention! This way please
My latest video from the mail art project uses one of the poems sent to Yadamaniart. Make Art Fun Again!
Next I'll try to work on a longer radiophonic piece using readings of the poems. I should bring in some guest stars to read a few phrases and poems too. If you speak at least two different languages without too much of an accent, or if you like to imitate foreign languages and dialects, this could be something for you. Get in touch if you want to contribute to the project (my email address is on the main page). Sopranos, altos, counter-tenors, tenors, and bass voices are all welcome. Singing is optional. No compensation will be given, save for eternal fame.