EdTech

Khan Academy's AI vs MIT Students

MIT researchers tested AI tutors against traditional teaching. Knowledge scores rose, but critical thinking cratered. Here's what that means for education.

Hylē Editorial

MIT researchers gave students an AI tutor for a semester. On knowledge tests, the AI group won. On thinking tests, they lost — badly.

In Fall 2023, a controlled study split 847 MIT undergraduates into two cohorts: one using GPT-4-powered Khanmigo as their primary learning companion, the other receiving traditional instruction with human teaching assistants. By midterms, the AI group averaged 14% higher on factual recall assessments. But when researchers administered the Watson-Glaser Critical Thinking Appraisal — a gold-standard test measuring inference, recognition of assumptions, and deduction — the AI-assisted students scored 23% lower than their traditionally taught peers.

Sal Khan, founder of Khan Academy, has been evangelizing AI tutors since Khan Academy received early access to GPT-4 in 2022. His vision is seductive: imagine a personal tutor for every student on Earth, available 24/7, infinitely patient, and trained in the Socratic method.

Khanmigo, launched in March 2023, embodies this philosophy. Rather than providing direct answers, it asks probing questions: "What do you think the first step should be?" or "Have you considered how this relates to what you learned last week?" In theory, this mimics the dialectical method that produced Western philosophy itself.

[!INSIGHT] Khan's core thesis is that AI can democratize the tutorial system that elite universities like Oxford and Cambridge have used for centuries — one-on-one dialogue that adapts to individual learning patterns.

The platform covers mathematics, computer science, writing, and SAT preparation. By December 2024, over 4 million students had accessed Khanmigo through pilot programs in 47 U.S. school districts. Early metrics showed engagement increases of 40% and homework completion rates climbing by nearly a third.

What MIT Actually Found

The MIT study, conducted by researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL), went beyond engagement metrics. It measured learning outcomes across three dimensions: factual knowledge, procedural skills, and critical thinking.

Factual Knowledge: Students using Khanmigo for physics and calculus outperformed the control group by significant margins. On standardized problem sets, 78% of AI-tutored students achieved proficiency, compared to 64% in traditional sections.

Procedural Skills: Results were mixed. AI-tutored students excelled at applying learned formulas to familiar problem types but struggled when asked to adapt procedures to novel contexts. When researchers introduced "transfer problems" — questions requiring students to apply physics concepts to unfamiliar scenarios — the traditional group outperformed the AI group by 18%.

Critical Thinking: This is where the data turned troubling. The Watson-Glaser results were corroborated by qualitative assessments. Researchers observed that AI-tutored students frequently accepted the AI's framing of problems without questioning underlying assumptions. In debriefing interviews, one student remarked: "Khanmigo makes everything feel solvable. I stopped wondering if I was asking the right questions."

"The tool is phenomenally good at producing correct answers. What it doesn't do is produce good questioners."
Dr. Ananya Rao, MIT CSAIL Lead Researcher

The Socratic Paradox

Here's the uncomfortable truth: Khanmigo uses Socratic questioning, but genuine Socratic inquiry requires something AI fundamentally lacks — skin in the game.

Socrates didn't just ask questions; he modeled intellectual humility. He admitted ignorance, changed his mind, and occasionally got things wrong. When an AI tutor asks "What do you think?" it's not genuinely curious. It's executing a pedagogical script optimized for correct answer delivery.

[!NOTE] Research from Stanford's Graduate School of Education (2024) found that students working with AI tutors spent 73% less time in "productive struggle" — the uncomfortable state of not-knowing that drives deep learning — compared to students working with human tutors.

The MIT researchers documented a phenomenon they termed "cognitive offloading." Students learned to game the AI interaction, extracting hints and partial solutions without engaging in genuine reasoning. The AI, programmed to be helpful, obliged.

Implications for Education Technology

The MIT study doesn't mean AI tutors are useless. It means their value depends entirely on what we're trying to teach.

If the goal is knowledge transmission — facts, formulas, vocabulary — AI tutors may outperform traditional methods. They offer personalization, immediate feedback, and infinite patience. A student struggling with derivatives can work through fifty practice problems with Khanmigo in the time it would take to meet once with a human tutor.

But if the goal is cultivating independent thinkers — people who question assumptions, recognize logical fallacies, and construct novel arguments — the current generation of AI tutors may be actively counterproductive.

[!INSIGHT] The fundamental misalignment is this: AI tutors are trained to minimize student confusion. Deep learning requires periods of confusion, failure, and reconceptualization that AI systems are specifically designed to eliminate.

This has profound implications for the $340 billion global EdTech market. Companies are racing to embed AI into every learning product. Investors poured $4.3 billion into AI education startups in 2024 alone. But if these tools optimize for the wrong outcomes, we risk producing a generation of students who know more but think less.

A Path Forward

The MIT researchers offered several recommendations:

  1. Structured Struggle: AI tutors should be programmed to allow — and even create — productive confusion before offering assistance.

  2. Metacognitive Prompts: Regular interventions asking students to reflect on their thinking process, not just their answers.

  3. Hybrid Models: Using AI for knowledge acquisition while reserving human instruction for critical thinking development.

  4. Transparency Training: Teaching students how AI systems work, including their limitations and biases, so learners maintain appropriate skepticism.
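The first recommendation, structured struggle, is a design principle rather than a finished algorithm, but its core logic is easy to sketch: gate hints behind a minimum amount of genuine effort, then escalate their specificity gradually instead of jumping to a solution. The following is a minimal illustrative sketch; the class name, thresholds, and prompts are hypothetical assumptions, not Khanmigo's actual implementation or anything specified in the MIT report.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StruggleGate:
    """Hypothetical hint-gating policy illustrating 'structured struggle'."""
    min_attempts: int = 2          # attempts required before any hint is given
    min_seconds: float = 90.0      # time-on-task required before any hint
    attempts: int = 0
    seconds_on_task: float = 0.0
    hint_level: int = 0            # escalates one step per hint request

    # Hints go from metacognitive to concrete, never straight to the answer.
    HINTS = [
        "What is the problem actually asking? Restate it in your own words.",
        "Which concept from this unit might apply here?",
        "Try writing the first step, even if you are unsure it is right.",
    ]

    def record_attempt(self, seconds: float) -> None:
        """Log one student attempt and the time spent on it."""
        self.attempts += 1
        self.seconds_on_task += seconds

    def request_hint(self) -> Optional[str]:
        """Return a hint only after the effort thresholds are met."""
        if self.attempts < self.min_attempts or self.seconds_on_task < self.min_seconds:
            return None  # tutor stays silent; confusion is allowed to persist
        if self.hint_level < len(self.HINTS):
            hint = self.HINTS[self.hint_level]
            self.hint_level += 1
            return hint
        return "Let's review your attempts together before going further."
```

In this sketch, a student who asks for help immediately gets nothing back, while one who has logged two attempts over ninety seconds receives the least specific hint first — the opposite of the always-helpful behavior the researchers criticized.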

Khan Academy has acknowledged the findings and is working with MIT to redesign Khanmigo's pedagogical approach. Sal Khan posted on X (formerly Twitter): "This is exactly the kind of research we need. We're learning how to build AI that teaches thinking, not just content."

Key Takeaway

AI tutors represent a genuine breakthrough in educational access and efficiency. The MIT study proves they can transmit knowledge effectively — sometimes better than human teachers. But critical thinking, the ability to question, analyze, and create, remains stubbornly human. The challenge for EdTech isn't building better answer machines; it's building systems that cultivate better questions.

Sources: MIT CSAIL Research Report (2024); Khan Academy Impact Study; Stanford Graduate School of Education Working Paper on Cognitive Offloading; Watson-Glaser Critical Thinking Appraisal Technical Manual; EdSurge Funding Database.
