Updated April 18, 2012, 2:53 p.m. ET

Grading a batch of freshman composition essays can leave a teacher feeling like an automaton, but does that mean a robot could take over the job? Researchers at the University of Akron in Ohio recently found that eight commercially available automated essay grading programs, plus one open-source system from Carnegie Mellon University researchers, assigned scores comparable to those given by human graders. The researchers presented their results April 16 at the annual conference of the National Council on Measurement in Education. 

"The demonstration showed conclusively that automated essay scoring systems are fast, accurate and cost effective," Tom Vander Ark, one of the study's directors and CEO of digital learning consulting company Open Education Solutions, said in a statement. The promising results may mean more auto-grading in state standardized tests or in cash-strapped community colleges, which would not have to pay readers or use up instructor time in grading essays.

The researchers collected 22,029 essays written by 7th-, 8th- and 10th-graders for several different state standardized tests. The sample covered all major essay types, including narrative, persuasive and descriptive writing, as well as responses to provided reading passages. The essays had already been graded by people, using whatever rubrics applied to the tests for which the essays were originally written. 

Robo-graders employ natural language processing, a computer science technique that Twitter-mining programs and other automated readers use. Like all natural language processing programs, the graders first analyze sample essays graded by expert humans to learn what good and bad essays look like. Then they apply those lessons to new essays.
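The train-then-score loop described above can be illustrated with a toy sketch. The code below is not any vendor's actual method; it is a minimal stand-in in which "training" simply stores human-graded sample essays as bag-of-words vectors, and a new essay receives the score of its most similar sample. The sample essays and scores are invented for illustration; real systems extract far richer linguistic features.

```python
from collections import Counter

def features(text):
    """Bag-of-words feature vector: lowercase word counts."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Normalized overlap between two word-count vectors (0 to 1)."""
    shared = sum((a & b).values())  # min count of each shared word
    total = (sum(a.values()) * sum(b.values())) ** 0.5
    return shared / total if total else 0.0

def train(graded_essays):
    """'Training' here just featurizes the human-graded samples."""
    return [(features(text), score) for text, score in graded_essays]

def grade(model, essay):
    """Score a new essay like its most similar graded sample."""
    vec = features(essay)
    best = max(model, key=lambda pair: similarity(vec, pair[0]))
    return best[1]

# Invented sample data: (essay text, human score out of 6)
samples = [
    ("The argument is persuasive because the author cites clear evidence.", 6),
    ("essay is good i liked it a lot it was good", 2),
]
model = train(samples)
print(grade(model, "The author persuades the reader with strong evidence."))
```

A production grader would replace the nearest-neighbor lookup with a statistical model fit over many hundreds of graded essays, but the structure — learn from expert-scored examples, then apply those lessons to new essays — is the same.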

To test the automated readers, researchers gave essay-grading companies a sample set of essays, plus their human-given scores, to use to train their programs. Together, the eight commercial programs in the study provide 97 percent of the world's robo-essay-grading today. After the training period, the companies ran their programs on a new set of test essays the researchers gave them. Researchers then compared the automatically generated scores with the human scores; a series of statistical tests found the two were very close. 
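The article doesn't reproduce the study's exact statistics, but a standard agreement measure in the essay-scoring literature is quadratic weighted kappa, which penalizes large human/machine disagreements more heavily than small ones (1.0 means perfect agreement, 0 means chance-level agreement). A minimal implementation, with invented scores on a 1-to-4 scale, looks like this:

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two raters' integer scores."""
    n = max_score - min_score + 1
    # Observed counts of each (human score, machine score) pair
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
    total = len(human)
    # Marginal histograms for each rater
    h_hist = [sum(row) for row in observed]
    m_hist = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic penalty
            num += w * observed[i][j]
            den += w * h_hist[i] * m_hist[j] / total  # chance expectation
    return 1.0 - num / den

# Invented example scores on a 1-4 scale
human = [1, 2, 3, 4, 3, 2, 4, 1]
machine = [1, 2, 3, 4, 2, 2, 4, 2]
print(round(quadratic_weighted_kappa(human, machine, 1, 4), 3))
```

Two off-by-one disagreements out of eight essays still yield a kappa near 0.89, reflecting how the quadratic weighting forgives small score differences.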

"This demonstration of rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments," said Barbara Chow of the William and Flora Hewlett Foundation, which funded the study. The foundation is also sponsoring a contest that promises $100,000 to whoever can program the best essay-grading machine. Programmers need to submit by April 30.

Preparing for essay tests will help students improve their critical thinking and communication skills, Chow said. Whether standardized assessments help anybody at all is still under debate, however, so whether adding robo-graded essays improves matters is sure to be a point of contention.

One major weakness in automated grading programs is that though they learn by example, a machine's version of learning by example is not quite the same as a person's. Machines see different things than people do. For example, because the essays entered in the study lacked program-readable paragraph tags, the grading programs could not detect paragraph breaks, according to the University of Akron paper. Paragraph breaks are important to the effectiveness of an essay's structure, and a human reader can see them at a glance. 

The same researchers have two more robo-grader assessments planned. One will test short-answer-grading programs and another will test programs for evaluating graphs, proofs and formulas on math tests.

Extra credit: Les Perelman, director of a writing program at MIT, posted one automated grader's humorously flawed results to the comments section of an article at Inside Higher Ed. Be sure to read the program's glowing praise at the end of the essay.
