
Saturday, November 19, 2016

How Well Can Computers Read Fiction? | The Atlantic

Photo: Veronique Greenwood
"Computational tools have the ability to analyze books’ emotional arcs, but it’s unclear what they can really find out about literature," according to Veronique Greenwood, a writer based in New York.

A scanner passes over a book at the University of Michigan in Ann Arbor as part of Google's Book Search project.
Photo: Carlos Osorio / AP

In recent years, literature has been getting attention from an unusual quarter: mathematics. Alongside statistical physicists analyzing the connections between characters in the Icelandic sagas, and computer scientists exploring the life and death of words in English fiction, a team of mathematicians at the University of Vermont have now looked at more than 1,000 texts to see if they could automatically extract their emotional arcs. And their results show something interesting, not just about narratives, but also about using this approach to study literature.

The Vermont researchers worked with test subjects to create a program capable of assigning emotional value—positive, negative or neutral—to words. ‘Terrorist’ is rated negative in the program’s word bank, while ‘win’ is positive. Then they selected texts from the massive volunteer effort to digitize books known as Project Gutenberg, which currently exists as a repository of public-domain writings. Finally, the researchers ran a series of analyses to chart the shape of the emotional arcs in the texts.
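To make the approach concrete, here is a minimal Python sketch of the general idea: score each word against a word bank, then average the scores over a sliding window to trace an arc. It is an illustration under stated assumptions, not the researchers' program. The tiny `WORD_SCORES` table, the window and step sizes, the function names, and the choice to treat unlisted words as neutral are all hypothetical; only the ratings for 'terrorist' and 'win' come from the article.

```python
from typing import Dict, List

# Hypothetical toy word bank. Only 'terrorist' (negative) and 'win' (positive)
# are mentioned in the article; 'love' and 'death' are assumed for illustration.
WORD_SCORES: Dict[str, int] = {
    "terrorist": -1,
    "win": 1,
    "love": 1,
    "death": -1,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def emotional_arc(text: str, window: int, step: int) -> List[float]:
    """Return the average word score for each window of `window` words.

    Words missing from the word bank count as neutral (0), which is an
    assumption of this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    words = tokenize(text)
    arc: List[float] = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        scores = [WORD_SCORES.get(word, 0) for word in chunk]
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc


if __name__ == "__main__":
    sample = "Win, win! Love, death. Terrorist death."
    print(emotional_arc(sample, window=2, step=2))
```

On a real book the window would span thousands of words, so that the arc reflects the drift of the story rather than individual sentences.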

And indeed, according to the paper posted online in June 2016, some patterns showed up again and again. About 85 percent of the works that the researchers looked at could be separated into six groups. Some of the groups lent themselves to colorful names, such as ‘Icarus,’ for an emotional type that rises, then falls, and ‘Rags to Riches,’ for one that starts negative and then rises. Some of Gutenberg’s most-downloaded works fit the ‘Cinderella’ model, with a rise, a fall, and a rise. You can see how you might start to draw conclusions about what stories play best, or how small the true number of arcs in human storytelling is.

But looking closer at the books originally included in the study, you might start to question the reliability of those results. To begin with, the analysis used not only Robinson Crusoe and A Christmas Carol, but books such as Notes on Nursing and A History of Art for Beginners. A compilation of Hans Christian Andersen tales was handled as if it were a single story, rather than a series of stand-alone narratives. The book that fit the Icarus arc best was a collection of 196 yoga sutras. Another odd marriage was the ‘Cinderella’ arc and its top fit: Boethius’ The Consolation of Philosophy.

Something is not quite right here, and indeed, this is one of the difficulties of doing automated analysis. It is a touchy business to take a large chunk of information, like all the books available on Project Gutenberg, and filter it so that the answers you get match the question you think you’re asking. Andrew Reagan, the graduate student who is the paper’s lead author, readily agrees: even getting to this hodgepodge of texts took a great deal of weeding on his part. Project Gutenberg, after all, is thick with dictionaries and poems and even the text of the Human Genome Project, all of which had to be removed.

Source: The Atlantic  
