What Can Computers Teach Us About Poetry?

The idea that analysing poetry with computers could teach us anything about the art is controversial. A recent survey I conducted of more than 300 tech-savvy poets confirmed that–while they generally agree that technology has been good for poetry in terms of fostering community, creating networking opportunities, and providing remote learning–they would rather computer scientists keep the ones and zeroes away from their iambs and spondees.

Intuitively, this makes sense–after all, we write poems for people, not machines. Poetry is one of the most intimately human of activities. Yet analytical methods, properly interpreted, can reveal new aspects of poetry that we readers and writers might miss. Blind spots can be corrected, what we sense intuitively can be confirmed scientifically, and computers may indeed help us to see old words with new eyes.

Analysis of aesthetic matters must be conducted on aesthetic terms. For this reason, many of the recent computer analyses of poems have relied on data from psycholinguistic research, wherein subjects were asked to classify different words across a range of dimensions such as the concreteness of imagery evoked, the emotionality (positive or negative) elicited by the word, and how easy or hard a word might be to define.
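
To make that concrete, here is a minimal sketch of how such ratings might be applied to a poem. The tiny lexicon and its numbers are invented purely for illustration; the actual studies draw on published psycholinguistic norms covering thousands of rated words.

```python
# Sketch: scoring a poem against psycholinguistic word ratings.
# The lexicon below is invented for illustration; real analyses
# use published norms covering thousands of words.

TOY_NORMS = {
    # word: (concreteness 1-5, emotional valence -1 to 1)
    "stone":    (4.9,  0.0),
    "rain":     (4.7, -0.1),
    "window":   (4.8,  0.0),
    "sorrow":   (1.8, -0.8),
    "love":     (2.1,  0.9),
    "eternity": (1.2,  0.3),
}

def score_poem(text):
    """Average concreteness and emotional intensity over rated words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    rated = [TOY_NORMS[w] for w in words if w in TOY_NORMS]
    if not rated:
        return None
    concreteness = sum(c for c, _ in rated) / len(rated)
    emotionality = sum(abs(e) for _, e in rated) / len(rated)
    return concreteness, emotionality

print(score_poem("Rain on stone, sorrow at the window."))
```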

In 2000, Richard Forsyth used statistical methods to analyse differences between some of the more successful (i.e. published) and obscure (i.e. unpublished) poems from well-known poets writing in English. Subsequently, in 2012, Justine Kao and Dan Jurafsky conducted a statistical analysis of “professional” (published in a reputable anthology) versus “amateur” (available on an amateur poetry website) poems.

Here are tables showing some of the statistically significant findings from those two studies (a sketch of how a few of these features might be computed follows the tables):

Forsyth (2000): Successful vs. Obscure

Successful                          Obscure
Fewer Syllables per Word            More Syllables per Word
Fewer Letters per Word              More Letters per Word
More Common Words                   More Rare Words
Less Diversity of Word Choice       Greater Diversity of Word Choice

Kao and Jurafsky (2012): Professional vs. Amateur

Professional                               Amateur
More Concrete Words                        More Abstract Words
More Likely to Use Approximate Rhymes      More Likely to Use Perfect Rhymes
Less Alliteration                          More Alliteration
Fewer Highly Emotional Words               More Highly Emotional Words
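
For the curious, several of these surface features are simple to compute. Below is a minimal sketch of three of them, assuming naive whitespace tokenisation; it is illustrative only, not a reproduction of either study's method.

```python
# Sketch: three of the surface features from the tables above.
# Tokenisation here is deliberately naive; both studies used
# more careful preprocessing.

def surface_features(text):
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    # Average word length in letters.
    letters_per_word = sum(len(w) for w in words) / len(words)
    # Type-token ratio: higher means greater diversity of word choice.
    diversity = len(set(words)) / len(words)
    # Crude alliteration score: adjacent words sharing a first letter.
    alliteration = sum(
        a[:1] == b[:1] for a, b in zip(words, words[1:])
    ) / (len(words) - 1)
    return letters_per_word, diversity, alliteration

print(surface_features("Full fathom five thy father lies"))
```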

These findings make sense in light of contemporary trends in poetry since the advent of free verse. Now we have a statistical quantification of trends that literary critics have understood qualitatively for some time.

In 2013, Michael Dalvean extended the work of Kao and Jurafsky using machine learning instead of statistical methods, analysing the same base of “amateur” and “professional” poems. The approach of classifying types of documents based on what computers can be “taught” about their various properties is what has given us useful and effective spam filters. So, why not classify poems?
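
As a sketch of what that looks like in practice–not Dalvean's actual pipeline, which used different features and a different model–a bag-of-words classifier in the style of a spam filter can be built in a few lines with scikit-learn. The two-poem "corpora" here are invented stand-ins:

```python
# Sketch: classifying poems the way spam filters classify email.
# Not Dalvean's pipeline; a generic bag-of-words classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-ins; in practice, thousands of poems per class.
professional = ["grey stone under rain", "the window held the light"]
amateur = ["my heart soars with eternal love", "tears of endless sorrow fall"]

texts = professional + amateur
labels = [1] * len(professional) + [0] * len(amateur)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The class probability is the "spectrum" between the two labels
# discussed below: a confidence score, not a measure of quality.
print(model.predict_proba(["o rose thou art sick"])[0])
```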

The findings from this more sophisticated analysis match the previous study in some ways–the “professional” poems can be identified by a computer looking for concrete words with low emotionality, the same traits Kao and Jurafsky had found to be statistically significant markers of professional work.

Here the findings diverge, however. The computer’s attempt at “learning” what an amateur poem is (or is not) also found ways to use function words–articles such as “the” and “a”–as part of the basis for its decision making. That is certainly not an obvious approach for a human critic. Combining several of these rules, the computer achieved an 80% success rate in classifying poems correctly.

However, I believe it is a mistake to argue, as Dalvean does, that a successful computerised method of classifying poems as either “amateur” or “professional” could be extended to create a spectrum between the two, and thereby extrapolated to “rank” poetry automatically in terms of its objective quality.

As a practicing poet, I know that beyond some of the “mistakes” that amateurs make in approaching poetry–such as using flowery, emotive, and abstract language–rating the quality of serious poets’ work is a highly subjective matter. Even our greatest critics cannot agree (though I suspect that any one of them could spot an amateur poem with at least 80% accuracy). So, while sifting out truly “amateur” poems may be something we can teach a computer to do, we must take with a grain of salt the idea that machines will be “grading” our poems in a way that matches any human consensus about quality anytime soon.

That said, by approaching the computerised analysis of poetry with an eye to informing literary criticism, we may indeed make new discoveries. I recently conducted a simple analysis of more than 3,000 poems published in Poetry Magazine–arguably the most reputable American poetry magazine of our time. I used a computer to count and analyse the frequency of words. Comparing this to a similar analysis of poems accepted during the tenure of the most recent editor, Don Share, I was surprised to find that the same “kinds” of words showed up most frequently in both the historical and the present-day analysis.
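
The counting itself is straightforward. Here is a minimal sketch; the two short “poems” stand in for the roughly 3,000 texts, which in practice would be loaded from files:

```python
# Sketch: counting word frequencies across a corpus of poems.
from collections import Counter
import re

# Stand-ins; in practice, ~3,000 poem texts loaded from files.
poems = [
    "the light of the moon on dark water",
    "light falls and the heart remembers the rain",
]

counts = Counter()
for poem in poems:
    counts.update(re.findall(r"[a-z']+", poem.lower()))

# The most frequent content words are the "poetry words" in question.
print(counts.most_common(10))
```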

Some on social media incorrectly took this as a criticism, since these “poetry words” do seem, in isolation, like they are trying rather too hard to be poetic. In context, though, it was interesting to discover that such words make up, on average, only 11% of any one poem. There is also a massive “long tail”: 42% of the words only get used once. This is interesting in light of Forsyth’s findings about the low overall diversity of word choice in successful poems. As a consequence, we might consider that these “poetry words” are like salt–the right amount enhances the meal, whereas too much can ruin it.
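
Both figures fall out of the same counts with a few more lines. A sketch, continuing from the previous snippet and reading the 42% as the share of distinct words that occur exactly once; the top-word cutoff here is my own illustrative choice:

```python
# Sketch: the two statistics above, reusing `poems` and `counts`
# from the previous snippet.

# Share of each poem taken up by the corpus-wide top words.
top_words = {w for w, _ in counts.most_common(100)}  # cutoff is illustrative
def top_word_share(poem):
    words = re.findall(r"[a-z']+", poem.lower())
    return sum(w in top_words for w in words) / len(words)

average_share = sum(top_word_share(p) for p in poems) / len(poems)

# The "long tail": fraction of distinct words appearing exactly once.
hapax_fraction = sum(c == 1 for c in counts.values()) / len(counts)

print(average_share, hapax_fraction)
```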

It may be that we poets have always chased some element of the sublime (represented by these “poetry words”), but that in contemporary poetry we succeed only to the extent that we can reinvent these timeless themes (aided, in part, by the “long tail” of non-repeating words). However, this is not something we would necessarily conclude on our own as readers of poetry.

This is because computers counting words are not reading for meaning. They highlight the similarities where a human reader sees the differences. Could it be that such non-human methods can therefore give insights into decidedly human phenomena, such as individual and collective subconscious preoccupations?

For now, one thing is clear–there is much still to be gained by applying analytical methods, aided by computers, to poetry. The key to supporting the poetry community with such findings is to interpret the results to the benefit of literary criticism, using machines to help us peek under the rocks that we might not otherwise inspect.

As computers become increasingly a part of the fabric of our lives, it can be tempting to try to keep them separate from the timeless tradition of language arts. Yet by embracing what is best about computing in service to human experience, we have much to gain. A new breed of cyborg literary critics–human at the core, but enhanced by technology–may be able to tell us more about poetry now than ever.

Call me “RoboPoet”.