Of Papers and Processors

By Ryan Smits

The first draft of this post was written by hand to preserve the intimate process of writing as a form of thinking and meditation. There were no notifications, spelling/grammar suggestions, auto-corrections, or other commands from machines to disconnect mind, hand, pen, and paper in the penning of these thoughts.

I have been reflecting on a session about automated writing evaluation that I attended at the 2022 AERA (American Educational Research Association) annual conference. In an effort to maintain an attitude of technoskepticism rather than technodefeatism, I went to a session titled “Automated Writing Trait Analysis to Support Instruction.” The session promised to “review the use of natural language processing to build automated trait models and examine the potential for such models to provide richer formative information for teachers.” To accomplish this, their models were “trained on 1.37 million submissions to ETS’s [Educational Testing Service] digital writing service, Criterion®.” The session was led by employees from ETS and, predictably, it was an hour of listening to them sing the praises of automated writing evaluation. In all fairness, it was remarkable to hear about the process and sophistication that went into training their model, but it was all the more frightening to imagine a future where writing instruction is guided by machines rather than people.

For the uninitiated, ETS is the world’s largest private nonprofit educational testing and assessment organization. They administer the GRE, the TOEFL, and the Praxis teacher exams, to name just a few of their major assessments. Criterion is an online writing evaluation service that they sell licenses for, and it is built on what they call the e-rater engine. I could not find a disclaimer on their website, but I suspect that every essay submitted through Criterion is collected to further train their evaluation models. I imagine the actual language is buried in the terms of service that one “agrees” to when using the product (because students have a choice about where they turn in assignments, right?). Turnitin, which I assume more of us are familiar with, uses the same e-rater engine. I couldn’t help but wonder whether one of my old class essays was sucked up into their dataset.

They explained that the large corpus of essays allows for a more complete picture of the variation across different genres of writing. They used natural language processing to extract features of writing, factor analysis to model variation within those features, and profiles to display how developed a piece of writing is. After sufficiently praising the sophistication of their model, they explained how they combined the e-rater, WAVES, and Text Evaluator engines into one comprehensive engine to address more features of writing. In short, e-rater outputs scores and analyzes conventions, WAVES outputs feedback through suggestions to the writer, and Text Evaluator outputs a readability analysis.
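To make that pipeline concrete, here is a minimal sketch, in Python, of what trait modeling through NLP features and factor analysis might look like. The handful of crude surface features, the use of scikit-learn’s FactorAnalysis, and the toy essays are all my own illustrative assumptions; ETS’s actual engines are proprietary and far more sophisticated.

# A toy sketch, not ETS's pipeline: extract a few crude surface features
# from each essay, then use factor analysis to compress them into latent
# "traits". Every feature and parameter here is an illustrative assumption.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def extract_features(essay):
    """Crude surface features standing in for real NLP-derived ones."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(words),                                  # essay length
        len({w.lower() for w in words}) / n_words,   # vocabulary diversity
        sum(len(w) for w in words) / n_words,        # average word length
        len(words) / max(len(sentences), 1),         # average sentence length
    ]

essays = [
    "Short essay. Few words here.",
    "This essay is a little longer. It has several sentences and a bit more vocabulary.",
    "A considerably more developed essay follows. The vocabulary varies noticeably. "
    "Sentences connect ideas across multiple clauses, demonstrating elaboration.",
    "Another brief one. Simple words only.",
]
X = np.array([extract_features(e) for e in essays])

# Factor analysis models correlated features as a small number of latent
# factors; each essay's factor scores would form its writing "profile".
fa = FactorAnalysis(n_components=2, random_state=0)
profiles = fa.fit_transform(X)
print(profiles)  # one row of latent trait scores per essay

The point of the sketch is the shape of the approach: measurable features go in, a small number of latent “traits” come out, and those trait scores become the “profile” a teacher is handed.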

With their powers combined, they can automate writing evaluation and feedback! At what point are we all just going to start using GPT-3 to generate text and feed it into automated writing evaluation software, producing some bizarre feedback loop that leaves behind the intended authors and readers of essays: humans?

Endnotes

1. This description was obtained from the AERA conference agenda.

2. See http://www.fundinguniverse.com/company-histories/educational-testing-service-history/

3. For example, Algorithms of Oppression by Safiya Noble & Race After Technology by Ruha Benjamin.
