The prospect of massive-scale online schooling seems to be all the rage
at the moment. Recent competing initiatives include Khan Academy,
OpenCourseWare, Udacity, Coursera, and edX (the latter ones sponsored
by top-name schools such as Stanford, Harvard, or MIT, or else founded by
ex-faculty members). The idea of universal and free access to college
programs from top researchers has fired the imagination of many in
the blogosphere, and some have predicted the imminent collapse of
traditional universities in the face of this “tsunami”.
As a
college educator myself, I felt compelled to survey one of these
courses, so as to assess their general quality, advantages, and
disadvantages. (Perhaps there would be some techniques that I could
fold into my own courses.) This summer, Sebastian Thrun's Udacity
unveiled a new course, Introduction to Statistics, taught by Thrun
himself, which I felt would be ideal for my purposes – my current
job largely specializing in teaching statistics at one of the
community colleges in the City University of New York (and my
master's degree being in Mathematics & Statistics). Having
enrolled, I proceeded through the entirety of the course, watching
all of the lecture videos and taking all of the web-based quizzes and
the final exam.
In
brief, here is my overall assessment: the course is amazingly,
shockingly awful. It is poorly structured; it evidences an almost
complete lack of planning for the lectures; it routinely fails to
properly define or use standard terms or notation; it necessitates
occasional massive gaps where “magic” happens; and it results in
nonstandard computations that would not be accepted in normal
statistical work. In surveying the course, some nights I personally
got seriously depressed at the notion that this might be standard
fare for the college lectures encountered by most students during
their academic careers.
Below I
will try to pick out a “Top 10” list of problems with the course.
These are not comprehensive, but I feel that they do give a basic
sense for the issues involved.
1. Lack of Planning
Generally,
the lectures and the overall sequence feel like they haven't been
planned out in advance (and as a result, they don't connect together very
well). One lecture is interrupted by a visitor walking into
Thrun's office as he records it, and this is left in the video itself
(Unit 17.8). Other lectures use a data set of students' guesses about
Thrun's weight for a hypothesis test on
his actual weight – which, not being a population parameter, is
totally incorrect and “an abuse” (as he admits himself in Unit
32.1); yet this semi-accidental data set was convenient to
access, and so was apparently considered acceptable.
But probably the best example of the lack of planning is how radically off-syllabus the
course went from its initial advertising. Now, I've taught courses where things didn't go
entirely according to plan (maybe a lecture ran half a day long),
but never in all my years of teaching has a course so massively
diverged from the initial plan or course description. Below you can compare the starting advertised syllabus (before any lectures were posted) to the revised final syllabus (after the lectures were actually produced). You'll see that they are remarkably different.
Initial syllabus:
Visualizing relationships in data – Seeing relationships in data
and predicting based on them; dealing with noise
Processes that generates data – Random processes; counting,
computing with sample spaces; conditional probability; Bayes Rule
Processes with a large number of events – Normal distributions;
the central limit theorem; adding random variables
Real data and distributions – Sampling distributions; confidence
intervals; hypothesis tests; outliers
Systematically understanding relationships – Least squares;
residuals; inference
Understanding more complex relationships – Transformation;
smoothing; regression for two or more variables, categorical
variables
Where to go next – Statistics vs machine learning; what to study
next; where statistics is used; Final exam
Corrected syllabus:
Visualizing relationships in data – Seeing relationships in data
and predicting based on them; Simpson's paradox
Probability – Probability; Bayes Rule; Correlation vs. Causation
Estimation – Maximum Likelihood Estimation; Mean, Median, Mode;
Standard Deviation, Variance
Outliers and Normal Distribution – Outliers, Quartiles; Binomial
Distribution; Central Limit Theorem; Manipulating Normal
Distribution
Inference – Confidence intervals; Hypothesis Testing
Regression – Linear regression; correlation
Final Exam
2. Sloppy Writing
Now,
I've become fairly “religious” about the text of mathematics –
reading the details correctly, and writing with precision, being absolutely paramount. (And I've found that for my remedial students,
this fairly simple-sounding skill is a nearly insurmountable
stumbling block.) When I saw the Udacity interface, I was initially
excited; instead of a lecturer standing in front of a chalkboard, the
frame is focused on the writing surface, which gives us the
opportunity to highlight and be careful about the writing (this
being similar to Khan Academy, etc.). But soon I became keenly
disappointed at how poor and unclear the written presentation was.
There are
at least two related issues. The first is that new terms and symbols
are almost never given written definitions. Personally, I find that
discussions and questions usually return to the definitions of
terms, so setting those out carefully is the first and most important
task. Here, new terms are casually described in the audio track, but
the descriptions are neither technically careful nor visible to the viewer. I
think this is exacerbated by the course's commitment to not
following any textbook or other written source – after the first
encounter, there is no capacity to search, index, or reference back
to terms or definitions that you might need later on (and this holds
as well for specialized symbols for sums, products, conditionals,
logical operators, etc., that tend to materialize for the first time in
the middle of a problem).
But the
second issue is that the algebraic manipulations themselves are
uniformly sloppy and disjointed; some bits of the work will be
written down, the next bit discussed verbally, then another unrelated
scrap written down, etc. There are unfixed typos in words and
equations. Statements and tables go unlabeled, so when a problem is
done you can't tell from looking at it what the point was. Notation
varies unpredictably: at different points in the course, the symbols
μ, x-bar, and E(x) are all used for the sample mean without introduction
or warning. Usually formulas are absent until given in summary at the
end of a section, and then disparaged as being “confusing and
complicated” (Unit 9.10) or “really clumsy” (Unit 9.15), which
I think is a great pedagogical loss for learning to read and write
math properly. At one point you get to see the assistant instructor write that “0.1 = 0.06561” (Problem Set 2.6), which to me is an unforgivable, cardinal sin. In many cases one would have to rely on
the discussion forums for a fellow student to present a clear and
complete piece of written math for any of the example problems.
3. Quiz Regime
The
pattern of lectures goes like this: A video nugget of a few minutes
will be shown (perhaps 2-5 minutes), which leads to a web-based quiz
question (prompting for retries until success), and then a brief
video explanation of the answer. In general, I like this idea of frequent questioning and I do
the same thing in my own classes: regular check-ins for myself and my
students that we've successfully communicated the ideas at hand.
But a
couple of things make this wonky here. One is that, obviously, the
communication is not really two-way; neither Thrun nor the system is
really “listening” to take note of when a presentation has misfired and
needs clarification. Another is that the quiz regime timing seems
forced and frequently not at a point when there is really a
legitimate new idea to check in on. I would guess that as much as
half the time a question is actually asked before students
have been given the tools to answer it, being used as a means of
introducing a new section. Things like, “Don't get disturbed if you
don't know the answer” (Unit 1.4), or “I'd be amazed if you got
this correct!” (Unit 9.13), are heard frequently. These kinds of
questions seem inherently unfair and, I can only imagine,
discouraging to many students.
4. Population and Sample
Astoundingly,
the Udacity Introduction to Statistics course manages to go almost
its entire length without ever mentioning or making
any distinction between the population and sample in a study. I say
I'm “astounded” because in my classes (and any other I've surveyed
or looked at), this is the key idea in introductory
inferential statistics. It's the very first thing that is mentioned
in my class (or the book), and it's the very last thing on the last
day, too. It's the entire reason why inferential statistics is
necessary in the first place. In fact, the very word “statistics”
means measures for one (sample) and not the other (population)
– but you'll never learn that from this class.
As a
result, Thrun goes the entire course using the symbols μ and σ
to indicate the mean and standard deviation of both a random variable
(population) and a limited data set (sample), whereas normally they
indicate only the former. He'll switch between the two essentially
without notice, saying something like “the observed standard
deviation” (Unit 25.3), or “our empirical mean” (Unit 25.4).
The x-bar notation appears late in the course, mid-way through a
problem statement – and is then used to indicate the mean of a
population in a hypothesis test, which is exactly reversed from normal usage
(Problem Set 5.5). And the customary (unbiased) formula for sample
standard deviation is entirely missing from the course,
necessitating annotated instructor comments to point out that the
results you get from this class would not be acceptable in any other
venue (Unit 27.3).
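To make the difference concrete, here is a minimal sketch of the two standard-deviation formulas side by side. The data set and the function names are my own illustrations, not anything taken from the Udacity course; the only point is that dividing by n (treating the data as the whole population) and dividing by n − 1 (the customary sample formula) give different answers on the same numbers.

```python
import math
import statistics


def population_sd(data):
    """Divide by n: treats the data as the entire population."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


def sample_sd(data):
    """Divide by n - 1: the customary formula when the data is only a sample."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))


# Hypothetical data set, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))  # 2.0
print(sample_sd(data))      # about 2.138

# The standard library provides both versions under separate names.
print(statistics.pstdev(data), statistics.stdev(data))
```

With only eight data points the two results differ by about 7%, which is exactly the sort of discrepancy that would get marked wrong elsewhere.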
5. Normal Curve Calculations
A
similar astounding absence: The entire sequence of Udacity's
Introduction to Statistics passes without ever calculating any values
for normal curves. Again, since the course is committed to being
independent of any outside resource (no textbook, no tables, no
statistical software suite), the result is that calculating
probabilities or values for normal distributions is simply impossible
and never occurs. Students don't have any opportunity to develop an
intuition for normal-curve probabilities. The Empirical Rule (the
68/95/99.7% rule-of-thumb for standard deviations) is never mentioned.
When the time comes to compute confidence intervals, Thrun is forced
to give the direction, “just multiply this value over here with
1.96 – the magic number!” (Unit 24.19), not having any way to
explain where this comes from, nor even mentioning at the time that
this is specific to a 95% confidence level.
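For what it's worth, the “magic number” is not hard to recover with software. The sketch below uses the `NormalDist` class from Python's standard `statistics` module (Python 3.8 or later); the helper names `critical_z` and `within_k_sd` are my own, and this is just one way to do the lookup that a printed z-table would otherwise provide.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def critical_z(confidence):
    """z-value that leaves (1 - confidence) / 2 in each tail of the curve."""
    tail = (1.0 - confidence) / 2.0
    return std_normal.inv_cdf(1.0 - tail)


def within_k_sd(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {critical_z(level):.3f}")

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

The first loop gives roughly 1.645, 1.960, and 2.576, so 1.96 is simply the cutoff for 95% confidence; the second loop reproduces the Empirical Rule figures of about 68%, 95%, and 99.7%.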
Thrun
spends a surprising amount of time developing the actual formula
for a normal curve, but no calculations are made with it and its
utility in an introductory course is highly questionable. The absence
is doubly weird because at one point he asserts, “That's the
purpose of the normal distribution for the sake of this class... we
just do it for the normal distribution where things are relatively
easy to compute”. (Unit 20.15)
6. CLT Not Explained
Another
bizarre gap: what one would think to be the keystone to inferences
for a mean, the Central Limit Theorem (the fact that the distribution
of possible sample-mean values automatically takes on a normal shape
with large sample size) is never clearly stated, nor its importance
explained. There is an optional programming unit with the name in the
title (Unit 19), which does generate a bell-shaped histogram of a few
thousand randomized sample means, and ends by stating that how this
relates to the Central Limit Theorem will be discussed in the next
unit. The next unit is on the Normal Distribution, but it still
neglects to actually state the CLT, and instead winds up engaging in a
rather baroque discussion to wit, “it's a transition from a discrete
space of finitely many outcomes to a space of infinitely many
outcomes” (Unit 20.14). There's a later point where Thrun says,
“Remember the Central Limit Theorem? Remember what it said?”
(Unit 25.2), and weirdly, this is the first time he actually outright
(if very briefly) states it. This is cursorily tied into how
confidence intervals work (blink and you'll miss it), and also said
to relate to “1.96 the magic number” in an unverifiable way (Unit
25.2-3). It's enormously unclear, and I think a distressing misstep.
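A short simulation can show what the theorem claims. The following is a sketch under my own assumptions, not a reconstruction of the course's Unit 19 program: the population is deliberately skewed (exponential, with mean 1 and standard deviation 1), and the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_means(draw, sample_size, num_samples, rng):
    """Take num_samples samples of sample_size values each; return each sample's mean."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def draw_exponential(rng):
    """One value from a skewed population (exponential, mean 1, SD 1)."""
    return rng.expovariate(1.0)


rng = random.Random(0)   # fixed seed so reruns give the same numbers
n = 50                   # size of each sample (arbitrary choice)
means = sample_means(draw_exponential, n, 5000, rng)

center = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_spread = 1.0 / math.sqrt(n)   # population SD divided by sqrt(n)

# Share of sample means within 1.96 predicted standard errors of the true mean.
share = sum(abs(m - 1.0) <= 1.96 * predicted_spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f} (population mean is 1)")
print(f"SD of sample means:   {spread:.3f} (CLT predicts {predicted_spread:.3f})")
print(f"share within 1.96 standard errors: {share:.1%}")
```

Even though the individual values are strongly skewed, the sample means pile up symmetrically around the population mean, with a spread close to σ/√n, and roughly 95% of them land within 1.96 standard errors. That is the link between the CLT, confidence intervals, and the “magic number”.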
7. Bipolar Difficulty
Throughout
the course, lectures and exercises veer rapidly between utterly
trivial and nigh-impossible. I think this is a reflection of the
one-way communication channel, such that Thrun can't have any
awareness of what counts as easy and what counts as hard to the
students. Frequently the “problem sets” at the end of a section
will have work that is dramatically different than anything shown in
the lectures. The first half-dozen units of the class are fairly long and obvious presentations of reading different tables and charts
and linear relationships. Then at some point he switches into a
remarkably difficult “complete the proof” exercise demonstrating
that the sample mean is in fact the correct Maximum Likelihood
Estimator for the population mean (Problem Set 3.1; not that he uses
the terms sample/population) – granted that this is “optional”,
but the course hasn't had any proofs at all to that point, the
overall strategy of the proof isn't declared, and it involves
numerous calculus concepts. Even my graduate text in statistical
inference (Casella/Berger) felt compelled to present and explain that
proof in its entirety. (Later, when he revisits this same exercise
again in Unit 23, Thrun actually does finally explain the technique,
which I presume to be a response to earlier complaints in this
regard.)
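For reference, the overall strategy of that proof fits in a few lines. This sketch assumes the data are independent draws from a normal distribution with unknown mean μ and known standard deviation σ; the source only describes the exercise in general terms, so the exact setup used in Problem Set 3.1 may differ.

```latex
% Likelihood of the data x_1, ..., x_n under a normal model (assumed setup)
L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
         \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

% Take logs: maximizing L is equivalent to maximizing log L
\log L(\mu) = -n \log\!\left(\sigma\sqrt{2\pi}\right)
              - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

% Differentiate with respect to mu and set the derivative to zero
\frac{d}{d\mu} \log L(\mu) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}

% The second derivative is negative, so this is a maximum
\frac{d^2}{d\mu^2} \log L(\mu) = -\frac{n}{\sigma^2} < 0
```

So the plan is: write the likelihood, take the log, differentiate, and solve. Each step leans on calculus that the course had not used up to that point.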
Similar
whiplash will be experienced at other points in the course. For
example, one student wrote in the discussion forums for the course
(regarding a different problem), “Questions such as this one and
the one before it 'Many Flips' are counter productive. The previously
explained course material was mostly very smooth and gradual.
Reaching 'Many Flips' felt like crashing into a reinforced concrete
wall.” (Link).
That's a perfect description of what I think the experience will be for many
first-time students.
8. Final Exam Certification
The
course ends with a web-based final exam with 16 questions in the same
vein as the section quizzes that have appeared all along. Upon completion, the student is able to print out a PDF “certificate of
accomplishment” saying that they've taken this course from
Udacity, with one of several success levels (Highest Distinction for
all 16 questions correct, High Distinction 13/16, Accomplishment
10/16, or Completion 8/16).
Now
obviously, a somewhat delicate issue is that this is a completely
worthless, faux-certification for a number of reasons. Obvious ones
would be: (1) Udacity has no accreditation, oversight, or recognition from any
outside body, and (2) the questions are all fixed and the
answers are probably posted somewhere online in full. But even more
importantly, and what really surprised me, was: (3) the fact that you
can re-submit all of your answers as many times as you like until
they are confirmed correct (just like the quizzes; and some are even
multiple-choice). Another would be: (4) the final exam is just remarkably
easy; could this be a response to recent criticisms that only a tiny
percentage of students who register for courses like these ever
complete them? If this is a PR problem for Udacity, then obviously
they can reduce the difficulty of a course to whatever level
generates a desired completion rate.
Recently,
the blog “Gödel's Lost Letter and P=NP” by Georgia Tech's Richard
Lipton had a lengthy post considering a perceived security problem
with programs like Thrun's at Udacity: namely, that a student could
freely register multiple accounts and keep taking the final exam
until they achieved an acceptable score. But this overlooks the
rather blatant fact that no one need go to such lengths, since the
system already allows you to re-submit each individual exam item as
many times as you like until success. Apparently Thrun's own response
to Lipton's concern was to propose tracking of IP addresses to
identify duplicate students, which bizarrely suggests a complete lack
of awareness of how his own final exams work. (“Well Thrun told me about it in person when I visited his company this
winter. They also can track IP addresses and they can see what is
going on with their students.”; “Cheating or Mastering?”, August 21, 2012)
9. Hucksterism
As if
the content-based problems noted above weren't enough, running
throughout Thrun's presentations is a routine, suspiciously hard-sell
call for how stellar the class was and how much you, the viewer, have
learned. Personally, I found this to be both grating and a
thou-dost-protest-too-much lampshading of the flaws of the course. (You might think that I'm being too harsh,
but puncturing this kind of stuff is, after all, the raison d'être
of the AngryMath blog.) He says: “You now know a lot about scatter
plots!” (Unit 3.12) (yeah, lots). “Isn't this a lot of fun?
Isn't statistics really great?” (Unit 6.16) (surely someone thinks
otherwise). “You are a very capable statistician at this point!”
(Unit 32.12) (hyperbole at best). “When people say this is a
contradiction... just smile [in disagreement] and say you took Sebastian's Stats 101
and you understand.” (Unit 22.5) (yeah, I'll get right on that).
10. Lack of Updates?
Finally,
here's a core problem that multiplies and exacerbates all the
others. In normal college teaching, a truly dedicated instructor will
go through a never-ending process of constant refinement and
improvement for their courses, based on two-way interaction and
feedback from live students. (I know I do; I've taught my
introductory statistics course several dozen times and I still sit
down and note possible improvements after almost every single class
session.)
So in
theory, any of the problems that I've noted above could be revisited
and fixed on future pass-throughs of the course. But will that happen
at Udacity, or any other massive online academic program? I strongly
suspect not – likely, the entire attraction for someone like Thrun
(and the business case for institutions like his) is to be able to
record basic lectures once and then never have to revisit them again.
Or in other words: All the millions of students using these ventures
will be permanently experiencing the shaky, version-1.0 trial run of
a new course, when the instructor is him- or herself just barely figuring out
how to teach it for the first time, and without the benefit of
two-way feedback or any refinements.
Summary
Based on
my review of the Udacity Introduction to Statistics course, I see
some compelling strategic advantages for live in-class teachers that
will not soon be washed away by massive online video learning. Chief
among them are the presence of actual two-way communication between
teacher and students, such that the instructor can modify, expand,
and respond to questions when appropriate (with regard to clarity of
presentation, quiz questions, missing pieces, and rationalizing
difficulty levels); and the ability to engage in a cycle of constant
improvements and refinements every time the course is taught by a
dedicated teacher. Also, I feel that written text is ultimately more
useful than videos, being more elegant and precise, easier to search
and index key terms and examples, suffering fewer technical problems,
easier to update, and generally being truer to the form of
mathematical written presentation in the first place. In addition to
these, Thrun's lectures at Udacity have a stunning number of critical
flaws (with regard to planning, sequencing, clarity, writing, and
missing major topics) that leave me amazed if any actual intro-level
student manages to make their way through the whole class.
Perhaps
the upshot here is a restatement of the old saw: “You get what you
pay for.” (Udacity being currently free, with a mission-statement
to remain that way). Or else another: “Don't take a class from a
world-famous researcher, because they don't really have time or
interest for teaching.” Obviously, Sebastian Thrun is not just a
teacher-by-online-video; he's also a Google Vice-President and
Fellow, a Research Professor of Computer Science at Stanford, former
director of the Stanford AI Laboratory, head of teams competing in
DARPA challenges, and leader of the development of Google's self-driving
car program. How much time or focus would we expect him to have for a
freshman-level introductory math course? (Not much; in one lecture he
mentions that he's recording at 3AM and compares it to his “day
job” at Google.) Some of these shortcomings may be overcome by a
more dedicated teacher. But others seem endemic to the massive-online
project as a whole, and I suspect that the industry will
turn out to be an over-inflating bubble that bursts at some point, much
like other internet sensations of the recent past.