Although everyone is worried about ChatGPT’s impact on English class essays, artificial intelligence has advanced far enough that it may permanently alter how we test. The most recent language model from OpenAI, GPT-4, which was released earlier this week, can now ace the LSAT, bar exams, and other assessments used in higher education.
According to a publicly available OpenAI whitepaper, highlighted in a series of tweets by Wharton professor Ethan Mollick, GPT-4 scored in the 90th percentile on the Uniform Bar Exam, the 88th percentile on the LSAT, and the 93rd percentile on SAT Evidence-Based Reading and Writing, and posted strong scores on more than two dozen well-known exams (though it seemed to have some issues with AP English).
GPT-4 passed a mock bar exam
GPT-4 performing strongly on standardized tests like BAR, LSAT, etc.
Most people take several years of study and months of prep to pass one of these exams. https://t.co/rTz5YvLzzQ
— Chris Staudinger (@ChrisStaud) March 15, 2023
“We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning,” OpenAI explained in a blog post. “GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, despite being less competent than humans in many situations, performs at a human level on a variety of professional and academic criteria.”
GPT-4 is also a marked improvement over its predecessor: while GPT-3.5 scored “in the bottom 10%” on a mock bar exam, GPT-4 scored in the top 10% of test takers. But what’s truly unsettling about GPT-4 is how readily it handles visual input. In seven examples offered by OpenAI, GPT-4 was able to interpret jokes, memes, charts, and a document that was shown as an image.
Is it a matter of concern?
So, do you need to worry? Yes, in certain respects. “GPT-4 has quite similar limitations as earlier GPT models,” even OpenAI admits, adding that “most importantly, it still is not fully reliable (it ‘hallucinates’ facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case.”
The language model, on the other hand, has advanced significantly in just a few months, indicating that AI is getting closer to being a trustworthy tool that can be used responsibly in a variety of fields. This may encourage people to abandon standardized testing, which has its own drawbacks. “Remember that scores make up a significant portion of the data we use to evaluate applicants for higher education programs. This information may need to be revisited,” said a different Wharton professor.