Student information by the numbers

I alluded to the increase in information available about student learning from monitoring online interactions. I want to make this explicit and discuss the numbers a little bit. In a typical college course gradebook, an instructor will record a dozen, maybe fifteen, homework grades, half a dozen quiz scores and, say, three exams. If each grade is on a scale of 0-100, we should allocate 7 bits for each grade, since 2^7 = 128 is the smallest power of two that covers all 101 possible values (homework grades might not even be this detailed in practice). With 24 grades in the final gradebook (taking fifteen homeworks, six quizzes and three exams), that makes 168 bits of information on the basis of which the student is finally evaluated.
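The gradebook estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; the variable names are mine, and the split of 15 homeworks, 6 quizzes and 3 exams is the upper end of the counts quoted there.

```python
import math

def bits_needed(num_levels: int) -> int:
    """Smallest whole number of bits that can label num_levels distinct values."""
    return math.ceil(math.log2(num_levels))

# A 0-100 scale has 101 possible values (0 and 100 both included).
bits_per_grade = bits_needed(101)

# Assumed split of the gradebook: 15 homeworks, 6 quizzes, 3 exams.
num_grades = 15 + 6 + 3

gradebook_bits = bits_per_grade * num_grades
print(bits_per_grade, num_grades, gradebook_bits)
```

Note that this counts storage capacity, not information in the strict entropy sense; real grades cluster, so the true information content is lower still.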

In online interactions, students are almost always allowed multiple attempts, and sometimes they are given optional hints or can choose to do a practice problem first. The system can theoretically track every click, but practically speaking, we will at least keep track of the following information: number of attempts (3 bits), hints or intermediate resources used (3-6 bits), time-to-response (3-6 bits), and problem-specific parameters such as difficulty and discrimination (10 bits). For each problem solved online, we thus record roughly 20 bits of information. At 250 problems in a typical semester-long course, that makes 5000 bits of relevant information. Note that this ratio (5000/168, or about 30) is more modest than the factor of 100 I have previously suggested, but I am being conservative in both the high and low estimates.
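Here is the same back-of-the-envelope calculation for the online case, as a small Python sketch. The field names are my own labels for the quantities listed above, and the round figure of 20 bits per problem is the one used in the text (the itemized ranges add up to somewhere between 19 and 25).

```python
# (low, high) bit estimates per problem, taken from the list in the text.
# The dictionary keys are illustrative labels, not names from any real system.
fields = {
    "attempts": (3, 3),
    "hints_or_resources": (3, 6),
    "time_to_response": (3, 6),
    "item_parameters": (10, 10),
}

low_bits = sum(low for low, _ in fields.values())
high_bits = sum(high for _, high in fields.values())

bits_per_problem = 20   # round figure used in the text
num_problems = 250      # problems in a semester-long course
online_bits = bits_per_problem * num_problems

gradebook_bits = 168    # from the gradebook estimate
ratio = online_bits / gradebook_bits

print(low_bits, high_bits, online_bits, round(ratio, 1))
```

Taking the high end of every range instead of 20 bits would push the total to 6250 bits, which still leaves the ratio well under 100.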

What is the difference between 168 bits and 5000 bits? I hope Daniel Kahneman won’t mind my repurposing an image he uses in his new book Thinking, Fast and Slow. Kahneman introduces the image to make a point about processes that occur very quickly in the brain, such as reading someone’s facial expression. Here is what happens if you have to compromise the eye’s extraordinary resolution for the limited information available in the two cases we are considering:


The image on the left contains about 168 pixels and the image on the right about 5000. I’ve cheated in that the pixel depth is actually 8 bits, i.e. grayscale, not just black & white, so it would be more accurate to call them 168-byte and 5000-byte images. It is interesting that at 168 pixels, you can tell that you’re looking at a face, whereas at 5000 you can tell that the face is about to yell at you.

Some of what is out there

Tom Mitchell of CMU has a fascinating slide talk available online about Human and Machine Learning. Mitchell’s expertise is definitely weighted to the computer science side, but he is also concerned with the cognitive neuroscience component of human learning. I want to point out early on that this component is somewhat orthogonal to anything I am trying to do. I will have very little to say about dopamine response, though there is a natural connection between dopamine response and reinforcement learning as it pertains to machines (see Scholarpedia on reinforcement learning from both algorithmic and neuronal perspectives).

This work helps me bring into relief the parts of machine learning and human learning that I am personally focused on, at least for now: formal education (or school learning, K-16) and educational measurement on the human learning side, and data mining and pattern recognition on the machine learning side. The term formal education helps to distinguish between the kind of learning humans do when they acquire the skill to recognize a chair for what it is (preschool, presumably) and the kind they do when they learn to answer word problems that might involve an algebraic solution.

Traditional education research involves a lot of theory of cognition and a lot of case studies, trials, interviews and assessments. But today a whole lot of education is happening in online or computer-based environments, and with that the sheer quantity of hard data increases by at least a factor of 100 (easily more). Educational measurement has gotten more sophisticated as well, employing a growing statistical arsenal. It is irresistible to bring some of these techniques to the data, which is what I actually do. But I’m trying to resist it a little bit at the moment, because I think there is a learning moment for me in using machine learning techniques to better mine these data, to recognize new patterns and to develop new educational metrics. I want to write a machine learning algorithm which will come up with Item Response Theory (IRT) by itself…and then come up with something even better. The long-term objective of all of this learning about learning is to improve (formal) education.

First training example

Three months ago I started a post-doc in physics education research at MIT in a group called RELATE, which stands for Research in Learning and Tutoring Effectively. [Before that I was a teacher, a furniture-maker/sculptor and a particle physics grad student, in reverse chronological order.] The work at RELATE, in the smallest possible nutshell, is focused on quantitative analysis in physics education research, especially for online learning environments. There is a lot of data mining here, informed by concept maps and cognitive models. Publications on the website tell the story better.

When I arrived at MIT, I inherited a project that had been worked on by at least two former postdocs (R. Warnakulasooriya and later C. Cardamone): applying Item Response Theory (IRT) as an assessment tool in our research. IRT is a psychometric method that allows you to measure parameters of assessment items (e.g. the difficulty and discrimination of questions) simultaneously with the ability/skill of the person answering the questions. IRT extracts more information than classical test theory (CTT) and allows you to measure students on the same scale using different questions. It also helps you evaluate the quality of your items. There will be a lot more written about IRT in the future of this blog. One facet of IRT that is particularly relevant here is that it is not really designed to be used in a dynamic, noisy, online learning environment. IRT was developed for use in high-stakes testing, which is highly controlled. So we have a whole host of other interesting problems.
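To make the idea of item and person parameters concrete, here is a minimal Python sketch of the two-parameter logistic (2PL) model, one common form of IRT. I am assuming the 2PL form only because it has exactly the difficulty and discrimination parameters mentioned above; it is not necessarily the model used in the RELATE analysis, and the item values below are made up for illustration.

```python
import math

def p_correct(ability: float, discrimination: float, difficulty: float) -> float:
    """2PL model: probability that a student of the given ability
    answers an item with the given parameters correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical items (parameter values invented for this example).
easy_item = {"discrimination": 1.0, "difficulty": -1.0}
hard_item = {"discrimination": 2.0, "difficulty": 1.0}

for ability in (-1.0, 0.0, 1.0):
    p_easy = p_correct(ability, **easy_item)
    p_hard = p_correct(ability, **hard_item)
    print(f"ability={ability:+.1f}  P(easy)={p_easy:.2f}  P(hard)={p_hard:.2f}")
```

In this form, a student whose ability equals an item's difficulty has a 50% chance of answering correctly, and a larger discrimination makes the probability change more sharply around that point. Ability and difficulty live on the same scale, which is what allows students who answered different questions to be compared.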

Because we are interested in the different ways that online learning takes place, I was paying attention when Stanford opened up three courses this Fall (Artificial Intelligence, Machine Learning, and Databases) to a wide online audience. Over 100,000 people enrolled in some of these. I started out curious about course format, but I quickly became even more interested in the content, especially Andrew Ng’s Machine Learning course. Much of this work is closely related to the IRT analysis I have been doing, and now my mind is spinning. Can we use machine learning algorithms to develop even better assessment tools than IRT, to…analyze student learning?

In order to keep track of some of these thoughts, I decided to start a blog (my first). The topic is the interplay of human learning and machine learning. On the simplest level, RELATE is interested in studying (and improving) human learning using machines, machines which help us analyze data that are obtained from the interaction of human students with other machines. On a deeper level, machine learning can be used to learn about human learning from the same data. This is less circular than it sounds. All three pairwise links between the nodes human, learning, and machine contain synergistic ideas. In fact, here is a fascinating workshop description from the NIPS 2008 conference titled Machine learning meets human learning.

That is the starting point.