What Can Computers Teach Us About Poetry?

The idea that analysing poetry with computers could teach us anything about the art is controversial. A recent survey I conducted of more than 300 tech-savvy poets confirmed that, while they generally agree that technology has been good for poetry in terms of fostering community, creating networking opportunities, and providing remote learning, they would rather computer scientists keep the ones and zeroes away from their iambs and spondees.

Intuitively, this makes sense — after all, we write poems for people, not machines. Poetry is one of the most intimately human of activities. Yet analytical methods, properly interpreted, can reveal new aspects of poetry that we readers and writers might miss. Blind spots can be corrected, what we sense intuitively can be confirmed scientifically, and computers may indeed help us to see old words with new eyes.

Analysis of aesthetic matters must be conducted on aesthetic terms. For this reason, many of the recent computer analyses of poems have relied on data from psycholinguistic research, wherein subjects were asked to classify different words across a range of dimensions such as the concreteness of imagery evoked, the emotionality (positive or negative) elicited by the word, and how easy or hard a word might be to define.

In 2000, Richard Forsyth used statistical methods to analyse differences between some of the more successful (i.e. published) and obscure (i.e. unpublished) poems from well-known poets writing in English. Subsequently, in 2012, Justine Kao and Dan Jurafsky conducted a statistical analysis of “professional” (published in a reputable anthology) versus “amateur” (available on an amateur poetry website) poems.

Here are tables showing some of the statistically-significant findings from those two studies:

Forsyth (2000):

Successful | Obscure
Fewer Syllables per Word | More Syllables per Word
Fewer Letters per Word | More Letters per Word
More Common Words | More Rare Words
Less Diversity of Word Choice | Greater Diversity of Word Choice

Kao and Jurafsky (2012):

Professional | Amateur
More Concrete Words | More Abstract Words
More Likely to Use Approximate Rhymes | More Likely to Use Perfect Rhymes
Less Alliteration | More Alliteration
Fewer Highly-Emotional Words | More Highly-Emotional Words

These findings make sense in light of contemporary trends in poetry since the advent of free verse. Now we have a statistical quantification of trends that literary critics have understood qualitatively for some time.
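Two of the surface measures in the tables above, letters per word and diversity of word choice, are simple enough to compute directly. The following is a minimal sketch, not the method used in either study: the function names are hypothetical, diversity is approximated here by the type-token ratio (distinct words divided by total words), and the sample line is invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words (straight apostrophes kept inside words)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def surface_features(text):
    """Return average letters per word and type-token ratio for one poem."""
    words = tokenize(text)
    if not words:
        return {"letters_per_word": 0.0, "type_token_ratio": 0.0}
    # Count letters only, ignoring any apostrophes inside a word.
    letters = sum(len(w.replace("'", "")) for w in words)
    return {
        "letters_per_word": letters / len(words),
        # Distinct words divided by total words: lower means more repetition.
        "type_token_ratio": len(set(words)) / len(words),
    }

# Invented sample line, for illustration only.
poem = "The sea is the sea, and the light is light"
features = surface_features(poem)
print(features)
```

The type-token ratio is sensitive to text length, so comparing poems of very different sizes would need some form of normalisation.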

In 2013, Michael Dalvean extended the work of Kao and Jurafsky using machine learning instead of statistical methods, analysing the same base of “amateur” and “professional” poems. The approach of classifying types of documents based on what computers can be “taught” about their various properties is what has given us useful and effective spam filters. So, why not classify poems?

The findings from this more sophisticated analysis match the previous study in some ways: the “professional” poems can be identified by a computer looking for concrete words with low emotionality, just as the earlier statistical analysis found significantly more of these traits in the “professional” poems.

Here the findings diverge, however. The computer’s attempt at “learning” what an amateur poem is (or is not) also found ways to use articles (such as “the” and “a”) as part of the basis for its decision making. That is certainly not an obvious approach to a human critic. Combining several of these rules together, the computer achieved an 80% success rate in classifying poems correctly.

However, I believe it is a mistake to argue, as Dalvean does, that a successful computerised method of classifying “amateur” and “professional” poems into either one or the other could be extended to create a spectrum between the two, and thereby extrapolated out to automatically “rank” poetry in terms of its objective quality.

As a practising poet, I know that beyond some of the “mistakes” that amateurs make in approaching poetry (such as using flowery, emotive, and abstract language), rating the quality of serious poets’ work is a highly subjective matter. Even some of our greatest critics cannot agree (though I suspect that any one of them could spot an amateur poem with at least 80% accuracy). So, while sifting out truly “amateur” poems may be something we can teach a computer to do, we must take with a grain of salt the idea that machines will be “grading” our poems in a way that would match some kind of human consensus about quality anytime soon.

That said, by approaching computerised analysis of poetry for the sake of informing literary criticism, we may indeed make new discoveries. I recently conducted a simple analysis of more than 3,000 poems published in Poetry Magazine — arguably the most reputable American poetry magazine of our time. I used a computer to count and analyse the frequency of words. Comparing this to a similar analysis of poems accepted in the tenure of the most recent editor, Don Share, I was surprised to find that the same “kinds” of words showed up most frequently both in the historical and present-day analysis.

Some on social media incorrectly took this as a criticism, since these “poetry words” do seem, in isolation, like they are trying rather too hard to be poetic. In context, though, it was interesting to discover that such words only appear 11% of the time on average in any one poem. There is also a massive “long tail” (42%) of words that only get used once. This is interesting in light of Forsyth’s findings about the low overall diversity of word choice in successful poems. As a consequence, we might consider that these “poetry words” are like salt — the right amount enhances the meal, whereas too much almost certainly can ruin it.

It may be that we poets have always been chasing after some element of the sublime (represented by these “poetry words”), but succeeding in doing so in contemporary poetry only to the extent that we can reinvent these timeless themes (aided, in part, by the “long tail” of non-repeating words). However, this is not something we would necessarily conclude on our own as readers of poetry.

This is because computers counting words are not reading for meaning. They highlight the similarities where a human reader sees the differences. Could it be that such non-human methods can therefore give insights into decidedly human phenomena, such as individual and collective subconscious preoccupations?

For now, one thing is clear: there is much still to be gained by applying analytical methods, aided by computers, to poetry. The key to supporting the poetry community with such findings is to interpret the results to the benefit of literary criticism, using machines to help us peek under the rocks that we might not otherwise inspect.

As computers become increasingly a part of the fabric of our lives, it can be tempting to try to keep them separate from the timeless tradition of language arts. Yet by embracing what is best about computing in service to human experience, we have much to gain. A new breed of cyborg literary critics (human at the core, but enhanced by technology) may be able to tell us more about poetry now than ever.

Call me “RoboPoet”.


“School Trip” Read by Phil Abrams (Video)

The Public Poetry Series, sponsored by Fjords Review, aims to foster a person-to-person experience of poetry through video. The actor Phil Abrams has done a remarkable job reading my poem “School Trip” to camera.

<a href="https://www.youtube.com/watch?v=binga5XpTmU"><img src="http://cdn5.peakepro.com/files/2014/11/Screen-Shot-2014-11-19-at-21.08.45.png" alt="Phil Abrams reads &quot;School Trip&quot;" class="alignnone" style="width: 100%; max-width: 560px;"/><br/>Click here to watch the video</a>

He seems to feel and then say, unfolding his nuanced emotional range line-by-line in extreme close-up, embodying a kind of haggard, Giamatti-like anti-hero that is the perfect speaker for this poem.

Be sure to check out all the videos in the Public Poetry Series here.


Unconscious Preoccupations, Machine Revelations

Turnabout is fair play. Having analysed several thousand poems from Poetry magazine, I have decided to turn the same methodology on myself.

I analysed 5,751 words from the 79 poems from my current pamphlet The Silence Teacher and my forthcoming collection The Knowledge.

Here are my top twenty-five most commonly-used words:

  1. air (27)
  2. light (26)
  3. eyes (23)
  4. day (21)
  5. water (20)
  6. night (20)
  7. face (19)
  8. man (17)
  9. hands (17)
  10. hand (16)
  11. life (15)
  12. place (15)
  13. head (14)
  14. small (14)
  15. world (14)
  16. sound (13)
  17. hold (12)
  18. fingers (12)
  19. love (12)
  20. long (12)
  21. late (12)
  22. white (12)
  23. blue (12)
  24. dark (11)
  25. call (11)

Ouch. Far from the nuanced poet I aspire to be, this reads like I missed my calling writing second-rate Raymond Chandler pastiche, romance novels, or a truly bizarre hybrid of the two. But again, the frequently-used words are actually used relatively infrequently in each individual poem.

In my case, 17% of the words in any one poem come from the top 100, whereas a once-again-considerable 43% of words in each poem are never repeated in another poem. To put it another way, my average poem is 72 words long, with 12 words from the top 100 and 31 words that are never repeated in any other poem in either collection.
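The arithmetic behind those percentages is 12/72 ≈ 17% and 31/72 ≈ 43%. As a rough sketch of how such per-poem shares might be computed, the hypothetical function below averages two ratios across a set of poems. It assumes each poem has already been reduced to a list of words, and it treats a word as “never repeated” when it appears in exactly one poem; both the name and that definition are assumptions, and the toy poems are invented.

```python
from collections import Counter

def per_poem_shares(poems, top_n=100):
    """Average share of each poem's words that are (a) among the corpus
    top-N words and (b) found in no other poem.
    `poems` is a list of word lists."""
    corpus_counts = Counter(w for poem in poems for w in poem)
    top_words = {w for w, _ in corpus_counts.most_common(top_n)}
    # Document frequency: the number of poems each word appears in.
    doc_freq = Counter(w for poem in poems for w in set(poem))

    top_shares, unique_shares = [], []
    for poem in poems:
        if not poem:
            continue  # skip empty poems to avoid dividing by zero
        top_shares.append(sum(w in top_words for w in poem) / len(poem))
        unique_shares.append(sum(doc_freq[w] == 1 for w in poem) / len(poem))
    if not top_shares:
        return 0.0, 0.0
    return (sum(top_shares) / len(top_shares),
            sum(unique_shares) / len(unique_shares))

# Invented toy corpus; top_n is set to 2 only because the corpus is tiny.
poems = [
    ["light", "air", "stone", "light"],
    ["light", "water", "air", "moth"],
]
top_share, unique_share = per_poem_shares(poems, top_n=2)
print(f"{top_share:.1%} top words, {unique_share:.1%} unique words")
```

Note that this averages the per-poem ratios; dividing the corpus-wide totals instead would weight longer poems more heavily and can give a slightly different figure.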

Interestingly, my individual top twenty-five list isn’t an exact match of the top twenty-five for the other poems I analysed. For example, the number-one word across more than 3,000 high-quality Poetry magazine poems, “time”, is nowhere in my own top twenty-five. I suspect, however, that if I were to analyse poems on a poet-by-poet basis from these 3,000, the individual preoccupations and concerns would start to tease out, and many poets in Poetry would stray far from the norm as well.

Furthermore, the specific concerns of my two books are very different from one another, as illustrated in the following word clouds:

[Word cloud: The Knowledge]

[Word cloud: The Silence Teacher]

So, individually, we’re all very different, from poet to poet and book to book, each with our own unique preoccupations. Yet collectively, when pooled, these preoccupations seem to converge on specific words.

What fascinates me about this is that purely analytical methods can reveal aspects of poetry that we readers and writers consciously miss. Computers counting words are not reading for meaning. They are able to highlight the similarities where a human reader sees only the differences.

Could it be that such non-human methods can therefore give insights into decidedly human phenomena, such as the individual and collective subconscious?

It would be interesting to analyse a significantly larger base of poetry texts, to see if they continue to converge on certain words.

For now, one thing is clear: these “Apollonian” words (as Dave Bonta so rightly identified them) occur frequently in all kinds of poetry, including fresh and lively contemporary poems.

Perhaps we poets have always been chasing after some element of the sublime (the top one hundred), but succeed in doing so in the postmodern age only to the extent that we can reinvent these timeless themes (aided, in part, by the non-repeating words). Emerson tells us that poetry “must be as new as foam, and as old as the rock.” Perhaps the right mix of foam-words and rock-words is part of that endeavour.

Humans write poems for other humans, not for machines. Yet the mechanical analysis of text, correctly interpreted and contextualised, may indeed help us to see old words with new eyes.


No Such Thing as Bad Words

“The Difference Between Medicine and Poison is in the Dose”

-Circa Survive (song title)

In response to my recent analysis of the frequency of words used in past issues of Poetry magazine, current editor Don Share issued me a good-humoured challenge:

So, I analysed 395 poems from 13 issues of Poetry edited by Don Share from October 2013 to November 2014.

I was at first surprised to discover that the nature of the results is not substantially different from that of the nearly 3,000 poems analysed previously.

The average poem is 92 words in length (again, once stop words have been excluded), containing 14% of words in the top-100 and 24% of words that were only used once across the 395 poems analysed.
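For readers curious how a ranking like this might be produced, here is a minimal sketch of counting words after stop words have been removed. It is an illustration under stated assumptions, not the exact procedure used for these figures: the stop-word list is a tiny invented sample, the function names are hypothetical, and the three one-line “poems” are made up.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a much longer one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def content_words(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def top_words(poems, n=25):
    """Return the n most frequent non-stop words across all poems, with counts."""
    counts = Counter()
    for poem in poems:
        counts.update(content_words(poem))
    return counts.most_common(n)

# Invented one-line "poems", for illustration only.
poems = [
    "The light of the night is long",
    "A long time in the light",
    "Time and the sea",
]
ranking = top_words(poems, n=3)
for word, count in ranking:
    print(f"{word} ({count})")
```

The choice of stop-word list matters: a longer list shortens every poem's counted length and shifts which words reach the top of the ranking.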

Here are the top 25 words:

  1. time (137)
  2. light (104)
  3. night (94)
  4. long (93)
  5. love (93)
  6. man (92)
  7. eyes (92)
  8. white (89)
  9. world (87)
  10. face (83)
  11. air (82)
  12. left (81)
  13. black (79)
  14. water (78)
  15. head (76)
  16. life (75)
  17. day (71)
  18. hand (69)
  19. people (69)
  20. wind (68)
  21. inside (65)
  22. sea (64)
  23. red (62)
  24. things (61)
  25. lost (60)

I found this surprising because Don is, by reputation and in my experience, one of the most interesting and innovative editors around. He’s undeniably on the pulse of contemporary poetry. So why do these words seem like they come from the poems of a century ago?

I think the answer is pretty simple: there are no bad words in poetry, only the overuse of “poetry words” in any single poem. No single poem analysed used more than a fraction of the top twenty-five, and I know that on average the majority of words (80-90%) in most poems were not from these top words. Furthermore, a substantial percentage of words showed up only once across all poems, which demonstrates a high degree of linguistic innovation.

Cumulatively, though, these words do keep showing up in poetry (and in Poetry). What is equally interesting to me is the idea that a certain number — in fact, just the right number — of these words may be sometimes necessary to make a poem what it is. These words are like salt — a little bit seasons things, but too much can ruin the dish.

Frequently-used words are used frequently for a reason. These words are terse, expressive, and acquired early in life. They hold a power that, if overused, derails our trust in the author, and defuses our interest in the poem.

Yet they also seem to be some of the great workhorses of our language. So, to me the moral here is: don’t be afraid to use them; but don’t wear these poor creatures out.

Good poems make use of the range of our language the way good painters make use of the range of their palette. To scoff at a composer choosing C-major or a painter choosing pure red is to miss the essentials of technique, context, and intention.

For this reason, to me, there are no bad words, only words used badly.

(Click here to read an analysis of my own poetry using the same methodology.)


Top “Poetry Words”

Having counted the occurrence of words in nearly 3,000 poems published in Poetry Magazine to create a parameterised random word generator, I am making some other interesting discoveries about these words.
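The generator itself is not described here, so the following is only a guess at the general idea: a sketch that picks random words whose corpus count falls within a chosen range, standing in for a “frequency of occurrence” setting. The function name and parameters are hypothetical; the sample counts are taken from the top-25 list in this post.

```python
import random

def random_words(counts, min_count, max_count, k=5, seed=None):
    """Pick up to k distinct words whose corpus count lies in
    [min_count, max_count]. `counts` maps each word to its count."""
    # Sort the pool so that a given seed always yields the same picks.
    pool = sorted(w for w, c in counts.items() if min_count <= c <= max_count)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

# A few counts from the top-25 list in this post.
counts = {"time": 944, "love": 831, "day": 763, "sky": 436, "sun": 432, "wind": 432}
picked = random_words(counts, 400, 500, k=2, seed=1)
print(picked)
```

Narrowing or shifting the count range changes the pool of candidate words, which is one plausible way a single setting could produce clusters with a different feel.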

First, as one Twitter user pointed out, the words that come up at each “frequency of occurrence” setting on the generator have their own distinct feel, as if very different types of poets might gravitate toward different clusters of words:

I also created a word cloud using Wordle of the top 100 most-used words, which reveals the nature of these words:

[Word cloud: top 100 “poetry words”]

They are all words of one or two syllables, the likes of which you might find in high concentration in my early angst-ridden adolescent poetry journals.

What is interesting, though, is that these words do not appear in high concentration. Of the more than 300,000 instances of words in these poems (the average being just over 100 words in total per poem), these words occur just 11% of the time.

So, the “average” Poetry magazine poem (though, in truth, there may be no such thing) is 106 words long, and incorporates 11 of these top “poetry words” per poem.

Here is a list of the top 25 “poetry words”, with their word counts:

  1. time (944)
  2. love (831)
  3. day (763)
  4. light (732)
  5. night (725)
  6. man (710)
  7. world (696)
  8. long (677)
  9. eyes (631)
  10. life (624)
  11. water (527)
  12. hand (509)
  13. white (506)
  14. air (495)
  15. body (495)
  16. dark (486)
  17. face (477)
  18. dead (463)
  19. heart (451)
  20. years (443)
  21. left (443)
  22. god (439) [both capitalisations combined]
  23. sky (436)
  24. sun (432)
  25. wind (432)

Note that while “man” is sixth, “woman” is 59th on the list. “White” comes in at #13, “black” at #26. And the poets’ all-time top obsession is, of course, “time” (and then “love”, in that order).

I also classified these words by type using the WordNet database. Nearly all of the words are nouns or verbs, with only a single modifier showing up in the top-100 list. That word is “hard” (at #93 on the list). I guess your writing teacher was right to suggest that you avoid piling on modifiers for emphasis.

That said, the infrequently-used words are also considerable in number. In fact, words that occur only once in the nearly 3,000 poems analysed make up 42% of all the words used. I wonder if high-quality prose could boast an equally sizeable “long tail” of unique words. Clearly, part of innovation in language involves vocabulary.

As Emerson once said, “Every word was once a poem.” These days, we have a lot to choose from.

(Click here to read the follow-up, with more analysis of “poetry words” and their implications.)


In Praise of Randomness

Sometimes I need a little help turning over the creative engine when starting a new poem. I have developed a tool that helps me to do just that, and am sharing it with the community in case it helps other poets to ignite their muse as well.

Poetic constraints — such as patterns of alliteration, metre, and rhyme — originally served as mnemonic devices in pre-literate societies. Patterned speech is inherently easier to remember, which is why recalling a nursery rhyme is still easier than memorising prose. Stylised forms of language remained in favour long after writing developed, but in the twenty-first century, the only requirement of a contemporary poet is that they somehow end up writing a poem.


Revolutionising Poetry with Technology (Survey Results)

First and foremost, thanks to the more than 300 people who took a minute or two out of their busy lives to respond to my brief survey. Clearly people want to record their opinions, and hear what others think, about poetry and technology.

You can see the general report of survey results here. I have also charted and analysed this information below, with some interesting conclusions.

Intention and Methods

First, I should say that the intention of this survey was not to get a broad picture of general attitudes toward poetry, but to focus on specific aspects in a specific group. For a good general analysis, I recommend the Poetry Foundation’s Poetry in America study.

Now, a brief word about my methods. I posted the survey to my website and my social media networks, where it was generously shared by a wide range of established and up-and-coming poets. I also posted this survey to two prominent amateur writer websites, where the focus is on community critique.