The multiple-choice exam format is a common way to assess student learning. Multiple-choice questions (or MCQs) are lauded as an efficient and objective method for measuring the breadth of student knowledge, and they are often used in standardized and accreditation exams. Their prevalence has made them a widely studied subject of pedagogical research.
Before exploring ways to ensure multiple-choice questions uphold assessment with integrity, it’s important to consider the historical context of multiple-choice exams and how they came to be so prevalent.
Quantifying intelligence became an educational focus at the turn of the 20th century. In 1905, French psychologist Alfred Binet and collaborator Theodore Simon created the Binet-Simon Intelligence Scale, a precursor to the modern IQ test (NEA).
The origin of multiple-choice testing, which aims to quantify knowledge, is widely attributed to Frederick J. Kelly, author of the Kansas Silent Reading Test. Kelly’s intentions were to eliminate subjectivity and increase efficiency in evaluating student reading ability. In 1914, he believed the multiple-choice format was the best way to achieve these goals (Veritas Journal, 2019).
During this time period, American secondary education underwent a dramatic shift. New laws made two years of high school compulsory, and as a result, secondary school enrollment increased fivefold. Just as educators grappling with large, over-enrolled classrooms today embrace MCQs, an efficient, quick testing methodology was a necessary and welcome solution in 1914 (Davidson, 2011).
According to Watters, “in conjunction with twentieth century [sic] futurism, it was nod [sic] towards an education system that could become more automated.” The benefits of multiple-choice exams then are the same benefits cited today: 1) They have the perception of being more objective; 2) Tests can be graded quickly and at scale; and 3) Multiple choice enables standardization (Watters, 2015).
Let’s now focus on the word “standardization.”
With increased student enrollment came a need for communication between institutions. Grading became more standardized, objective, and scale-based; so did testing, led by multiple-choice exams and the belief that the MCQ format was “scientific.” Testing, too, could no longer be specific to individual students or institutions; it had to carry meaning for third parties. To that end, the United States’ College Entrance Examination Board implemented the Scholastic Aptitude Test (SAT) in 1926 using multiple-choice questions, shifting from a prior oral and essay format (Wolfhurst & Ruzicka, 2021). “Because masses of students could all take the same test, all be graded in the same way, and all turned into numbers crunched to yield comparative results, they were ripe to become yet another product of the machine age: statistics, with different statisticians coming up with different psychometric theories about the best number of items, the best number of questions, and so forth” (Davidson, 2011).
By the 1930s, multiple-choice tests, along with other right/wrong binary formats like true-false questions, were widely used in United States schools (Ramirez, 2013).
Multiple-choice questions also proliferated in global education, particularly through standardized testing and the desire to quantify student learning across institutions. Australia’s adoption of multiple-choice assessment formats resembles the United States’ and reflects population changes and increased student enrollment. According to Klenowski and Wyatt-Smith, “The emergence of Australia’s testing industry can be understood against a background of three main historical phases in assessment: industrialisation and universal schooling at the turn of the 20th century; the rise of the middle class and capitalism in the middle of the century; and the emergence of calls from the field to centre on views of education and purposes for schooling and assessment” (Earl, 2005).
East Asia adopted multiple-choice testing to meet the needs of mass education, particularly for secondary and higher education entrance examinations. In Japan in particular, “Many Japanese educators felt that it was impossible to guarantee fair scoring of essay questions. Many also believed that essay questions and interviews conferred an unfair advantage on children from economically and culturally privileged families.” This cultural context led to the proliferation of multiple-choice questions as a widely accepted format, which Shiro states leaves out expository writing, debate, and other communication skills (Shiro, 2016).
Kelly, the aforementioned educator credited with the multiple-choice question format, came to rue his own innovation, which enabled standardized testing. In fact, he tried to reverse the course of standardization while president of the University of Idaho, touting a liberal arts curriculum that emphasized general and critical thinking. In 1930, he was dismissed for resisting modernization (Veritas, 2019).
Today, many educators would sympathize with Kelly’s final sentiment. Many pedagogists go so far as to state, “Multiple-choice tests are not catalysts for learning. They incite the bad habit of teaching to tests and not covering material that will not be evaluated” (Ramirez, 2013). Yet the ubiquity of multiple-choice exams indicates that they fill a necessary gap, even if a pragmatic and logistical one, in teaching and learning.
For teachers, multiple-choice exams are, simply put, quicker to grade. Students prefer them, too, finding them easier to study for, which adds to their ongoing popularity as an assessment format.
These factors can make multiple-choice exams a shortcut solution: one that saves time for teachers but emphasizes test-taking strategy and memorization over student mastery of concepts. So how can one ensure that multiple-choice exams uphold assessment integrity?
Here are some critiques of the multiple-choice exam format and ways to enable fair and inclusive exam design and accurate measurement of student learning:
- If multiple-choice questions cannot test deep conceptual knowledge, one way to uphold assessment with integrity is to include different formats within an exam’s design. Multiple-choice questions can test breadth of knowledge in a short period of time, and when paired with short answer questions that measure depth of knowledge, they can evaluate a wide range of student learning.
- Of course, when it comes to academic integrity, it’s important to mitigate cheating; multiple-choice exams are vulnerable to misconduct precisely because answers can be easily memorized or passed on between students. To that end, instructors can provide different versions of a multiple-choice test.
- Multiple-choice exams fall short when it comes to formative feedback loops. The most common feedback with multiple-choice questions is the binary indication of whether a student response is correct or incorrect. This right/wrong feedback limits student learning potential. According to researchers, one way to provide more detailed feedback on multiple-choice formats is “highlighting why various options are correct or incorrect (e.g. C is correct because … versus B can’t be correct because … etc.). This comparative approach across response options can help learners to bridge knowledge gaps but may also be limited by an overt focus on specific details of the individual item. This form of feedback should help learners to transfer any gained knowledge to new but similar items (i.e. near transfer) but, due to its specificity, may not potentiate far transfer (i.e. transfer to novel problems in unfamiliar contexts, or novel and structurally related problems)” (Ryan et al., 2020).
- Eliminate the obviously incorrect options within a multiple-choice question. Research suggests, “If the directions spell out that students should select ‘the best answer,’ ‘the main reason’ or the ‘most likely’ solution, that means some of the answer options can be correct but not as correct as the right answer, which means that those questions require more and deeper thinking” (Weimer, 2018). Replacing implausible distractors with plausible ones makes guessing more difficult and measures student knowledge more accurately.
- Item analysis can shore up multiple-choice exam formats and pinpoint questions that are too easy or too difficult, as well as highlight other irregularities in student response patterns. For instance, item analysis can help identify options students never choose, so that obviously incorrect options can be revised or eliminated; a minimal sketch of this kind of analysis appears after this list.
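Instructors with access to exam exports and a little scripting can run a basic item analysis themselves (most learning management systems also report similar statistics automatically). The sketch below is a minimal, illustrative Python example, not a definitive implementation: the `answer_key` and `responses` data are hypothetical placeholders standing in for a real exam export. It computes each item’s difficulty index (the proportion of correct responses) and tallies how often each option was chosen, flagging distractors that no student selected.

```python
# Minimal item-analysis sketch. The data below are hypothetical:
# `answer_key` maps item IDs to the correct option, and `responses`
# holds each student's chosen option per item.

answer_key = {"Q1": "B", "Q2": "D", "Q3": "A"}

responses = [
    {"Q1": "B", "Q2": "D", "Q3": "A"},
    {"Q1": "B", "Q2": "C", "Q3": "A"},
    {"Q1": "A", "Q2": "D", "Q3": "A"},
    {"Q1": "B", "Q2": "D", "Q3": "B"},
]

OPTIONS = ["A", "B", "C", "D"]

for item, correct in answer_key.items():
    chosen = [r[item] for r in responses]

    # Difficulty index: proportion of students answering correctly.
    # Values near 1.0 suggest the item is too easy; near 0.0, too hard.
    difficulty = sum(1 for c in chosen if c == correct) / len(chosen)

    # Distractor analysis: how often each option was selected.
    # Distractors nobody picks are doing no work and can be revised.
    counts = {opt: chosen.count(opt) for opt in OPTIONS}
    unused = [opt for opt, n in counts.items() if n == 0 and opt != correct]

    print(f"{item}: difficulty={difficulty:.2f}, counts={counts}, "
          f"never-chosen distractors={unused or 'none'}")
```

A fuller analysis would also include a discrimination index (how well an item separates high- and low-scoring students), but even these two measures are enough to spot items that are too easy, too hard, or carrying distractors that do no work.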
The credibility of assessment is critical for both academic reputation and integrity. Multiple-choice questions were introduced with the best intentions of improving efficiency and increasing the objectivity of student learning measurements. They remain widely used because they are efficient and do measure the breadth of student knowledge; instructors who understand the history of multiple-choice exams and the limits of the format, and who make appropriate adjustments, can uphold assessment with integrity.