Machine Learning Works Great—Mathematicians Just Don’t Know Why
Author: Ingrid Daubechies
WIRED | 2015-12-12
At a dinner I attended some years ago, the distinguished differential
geometer Eugenio Calabi volunteered to me his tongue-in-cheek
distinction between pure and applied mathematicians. A pure
mathematician, when stuck on the problem under study, often decides to
narrow the problem further and so avoid the obstruction. An applied
mathematician interprets being stuck as an indication that it is time to
learn more mathematics and find better tools.
I have always loved this point of view; it explains how applied
mathematicians will always need to make use of the new concepts and
structures that are constantly being developed in more foundational
mathematics. This is particularly evident today in the ongoing effort to
understand “big data”—data sets that are too large or complex to be
understood using traditional data-processing techniques.
Our current mathematical understanding of many techniques that are
central to the ongoing big-data revolution is inadequate, at best.
Consider the simplest case, that of supervised learning, which has been
used by companies such as Google, Facebook and Apple to create voice- or
image-recognition technologies with a near-human level of accuracy.
These systems start with a massive corpus of training samples—millions
or billions of images or voice recordings—which are used to train a deep
neural network to spot statistical regularities. As in other areas of
machine learning, the hope is that computers can churn through enough
data to “learn” the task: Instead of being programmed with the detailed
steps necessary for the decision process, the computers follow
algorithms that gradually lead them to focus on the relevant patterns.
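The article does not name the algorithms that do this gradual “learning”. In practice, deep networks are usually trained with variants of gradient descent, which repeatedly nudges the unknown parameters in whatever direction reduces the error on the training samples. The following is a minimal sketch under simplifying assumptions: the model is a single sigmoid, and the six-point data set is made up. The names `train` and `predict`, the learning rate, and the step count are illustrative choices.

```python
import math

def sigmoid(z: float) -> float:
    # Logistic function 1 / (1 + e^-z), written via tanh to avoid overflow.
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def train(samples, steps=5000, lr=0.5):
    """Fit p(x) = sigmoid(w*x + b) to (x, label) pairs by plain gradient
    descent on the average cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in samples:
            # For cross-entropy loss, the gradient with respect to (w*x + b)
            # is simply prediction minus label.
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Step against the averaged gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: inputs below about 2 are labelled 0, above it 1.
data = [(0.0, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (4.0, 1)]
w, b = train(data)

def predict(x: float) -> float:
    """Learned probability that x belongs to class 1."""
    return sigmoid(w * x + b)

print(round(predict(0.5)), round(predict(3.5)))  # prints: 0 1
```

Nothing in the code says where the boundary between the two classes lies. The parameters `w` and `b` drift there only because each update lowers the error on the examples. Deep networks apply the same idea at a vastly larger scale.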
In mathematical terms, these supervised-learning systems are given a
large set of inputs and the corresponding outputs; the goal is for a
computer to learn the function that will reliably transform a new input
into the correct output. To do this, the computer breaks down the
mystery function into a number of layers of unknown functions called
sigmoid functions. These S-shaped functions look like a street-to-curb
transition: a smoothed step from one level to another, where the
starting level, the height of the step and the width of the transition
region are not determined ahead of time.
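The free parameters the article lists (starting level, step height, width of the transition region), together with the position of the step, can be written down directly. This is a minimal sketch. The name `smooth_step` and its parameter names are illustrative, and the formula is the standard logistic sigmoid, rescaled and shifted.

```python
import math

def smooth_step(x, level=0.0, height=1.0, center=0.0, width=1.0):
    """A 'street-to-curb' transition.

    level:  value far to the left (the street)
    height: how much the curve rises in total (the curb)
    center: where the midpoint of the rise sits
    width:  how gradual the rise is; smaller values give a steeper step
    """
    z = (x - center) / width
    # Logistic sigmoid 1 / (1 + e^-z), rewritten as (1 + tanh(z/2)) / 2 so that
    # very large |z| cannot overflow math.exp.
    return level + height * 0.5 * (1.0 + math.tanh(z / 2.0))

# Usage: a step from level 2 up to level 5, centred at 0, fairly sharp.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, round(smooth_step(x, level=2.0, height=3.0, width=0.5), 3))
```

At the center the curve sits exactly halfway up (level + height/2). Far to either side it flattens out at `level` and at `level + height`.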
Inputs enter the first layer of sigmoid functions, which spits out
results that can be combined before being fed into a second layer of
sigmoid functions, and so on. This web of resulting functions
constitutes the “network” in a neural network. A “deep” one has many
layers.
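Stacking these steps produces the layered “network”. In this hedged sketch, “combined” is taken to mean a weighted sum plus a bias, which is the usual choice. The weights are invented for illustration; a real network would learn them from training data.

```python
import math

def sigmoid(z):
    # Logistic function 1 / (1 + e^-z), via tanh for numerical safety.
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def layer(inputs, weights, biases):
    """One layer: each node forms a weighted sum of all inputs plus a bias,
    then passes it through a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network(x, layers):
    """Feed the outputs of each layer into the next; more layers means 'deeper'."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# Hypothetical, hand-picked parameters: 2 inputs -> 3 sigmoid nodes -> 1 output.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-2.0, 1.0]], [0.0, -0.5, 0.3]),
    ([[1.5, -0.7, 2.0]], [-1.0]),
]
print(network([0.2, 0.8], toy_layers))
```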
Decades ago, researchers proved that these networks are universal,
meaning that they can approximate any continuous function as closely as
desired. Other researchers
later proved a number of theoretical results about the unique
correspondence between a network and the function it generates. But
these results assume networks that can have extremely large numbers of
layers and of function nodes within each layer. In practice, neural
networks use anywhere between two and two dozen layers. Because of this
limitation, none of the classical results come close to explaining why
neural networks and deep learning work as spectacularly well as they do.
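A staircase makes “universal” tangible. Place steep sigmoids side by side, each rising by the amount the target function changes between two grid points, and they trace the curve more and more closely as more of them are used. In this sketch, the target (sin on [0, π]), the `steepness` factor, and the node counts are arbitrary choices, and no training is involved. It also shows the catch the article points to: accuracy is bought with ever more nodes.

```python
import math

def sigmoid(z):
    # Logistic function 1 / (1 + e^-z), via tanh so steep steps cannot overflow.
    return 0.5 * (1.0 + math.tanh(z / 2.0))

def staircase_approx(f, a, b, n, steepness=200.0):
    """Return g(x) = f(a) + a sum of n sigmoid steps approximating f on [a, b].

    Step k sits halfway between grid points x_k and x_{k+1} and has height
    f(x_{k+1}) - f(x_k), so g climbs from grid value to grid value.
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    steps = [((xs[k] + xs[k + 1]) / 2, f(xs[k + 1]) - f(xs[k])) for k in range(n)]
    scale = steepness * n / (b - a)  # keep each transition narrow relative to the grid spacing

    def g(x):
        return f(a) + sum(h * sigmoid(scale * (x - c)) for c, h in steps)
    return g

def max_error(f, g, a, b, samples=1000):
    """Largest |f - g| over evenly spaced sample points in [a, b]."""
    pts = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in pts)

for n in (5, 20, 80):
    g = staircase_approx(math.sin, 0.0, math.pi, n)
    print(n, "nodes: max error", round(max_error(math.sin, g, 0.0, math.pi), 4))
```

For this construction the maximum error shrinks roughly in proportion to 1/n. Each extra decimal digit of accuracy therefore costs about ten times as many sigmoid nodes.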
It is the guiding principle of many applied mathematicians that if
something mathematical works really well, there must be a good
underlying mathematical reason for it, and we ought to be able to
understand it. In this particular case, it may be that we don’t even
have the appropriate mathematical framework to figure it out yet. (Or,
if we do, it may have been developed within an area of “pure”
mathematics from which it hasn’t yet spread to other mathematical
disciplines.)
Another technique used in machine learning is unsupervised learning,
which is used to discover hidden connections in large data sets. Let’s
say, for example, that you’re a researcher who wants to learn more about
human personality types. You’re awarded an extremely generous grant that
allows you to give 200,000 people a 500-question personality test, with
answers that vary on a scale from one to 10. Eventually you find
yourself with 200,000 data points in 500 virtual “dimensions”—one
dimension for each of the original questions on the personality quiz.
These points, taken together, form a lower-dimensional “surface” in the
500-dimensional space in the same way that a simple plot of elevation
across a mountain range creates a two-dimensional surface in
three-dimensional space.
What you would like to do, as a researcher, is identify this
lower-dimensional surface, thereby reducing the personality portraits of
the 200,000 subjects to their essential properties—a task that is
similar to finding that two variables suffice to identify any point in
the mountain-range surface. Perhaps the personality-test surface can
also be described with a simple function, a connection between a number
of variables that is significantly smaller than 500. This function is
likely to reflect a hidden structure in the data.
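In the simplest, linear case, the claim that two variables suffice can be checked directly. If every answer were a fixed weighted mix of two hidden traits, the centered data matrix would have rank 2, however many questions there are. The sketch below assumes exactly that: linear, noise-free, integer-valued toy data with hypothetical sizes (100 subjects, 50 questions) and made-up deterministic weights. Real survey data would be noisy, and its surface curved, which is why the nonlinear tools the article turns to next are needed.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank, col, n_cols = 0, 0, len(m[0])
    while rank < len(m) and col < n_cols:
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
        col += 1
    return rank

n_subjects, n_questions = 100, 50   # hypothetical; the article's survey is 200,000 x 500
# Hypothetical question weights (a, b) and subject traits (t1, t2).
weights = [(k % 7 - 3, (3 * k) % 5 - 2) for k in range(n_questions)]
traits = [(i % 5 - 2, (i // 5) % 5 - 2) for i in range(n_subjects)]

# Each answer = 5 + a*t1 + b*t2: a flat two-dimensional "surface" in 50 dimensions.
answers = [[5 + a * t1 + b * t2 for a, b in weights] for t1, t2 in traits]

# Center each column so the constant offset does not count as a dimension.
means = [Fraction(sum(col), n_subjects) for col in zip(*answers)]
centered = [[v - mu for v, mu in zip(row, means)] for row in answers]
print(matrix_rank(centered))  # prints: 2
```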
In the last 15 years or so, researchers have created a number of tools
to probe the geometry of these hidden structures. For example, you might
build a model of the surface by first zooming in at many different
points. At each point, you would place a drop of virtual ink on the
surface and watch how it spreads out. Depending on how the surface is
curved at each point, the ink would diffuse in some directions but not
in others. If you were to connect all the drops of ink, you would get a
pretty good picture of what the surface looks like as a whole. And with
this information in hand, you would no longer have just a collection of
data points. Now you would start to see the connections on the surface,
the interesting loops, folds and kinks. This would give you a map for
how to explore it.
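The ink-drop picture can be read as diffusion on a graph built from the data points. Methods in this spirit include diffusion maps, though that is our guess at the kind of tool meant, since the article names none. The minimal sketch below uses a Gaussian kernel with a made-up bandwidth `eps` and toy points on a C-shaped curve. It shows the key property: ink spreads along the data's surface, not straight across empty space.

```python
import math

def diffuse_ink(points, start, eps=0.1, steps=30):
    """Spread a unit drop of ink placed on points[start] across the data.

    At each step every point passes its ink to the others in proportion to the
    Gaussian weight exp(-(d / eps)^2), where d is the straight-line distance.
    With a small eps, ink effectively hops only between near neighbours, so it
    can travel far only along the sampled surface, not across empty gaps.
    """
    n = len(points)
    transition = []
    for p in points:
        row = [math.exp(-(math.dist(p, q) / eps) ** 2) for q in points]
        total = sum(row)
        transition.append([w / total for w in row])  # each row sums to 1: ink is conserved
    ink = [0.0] * n
    ink[start] = 1.0
    for _ in range(steps):
        ink = [sum(ink[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return ink

# Hypothetical data: 120 points on a "C"-shaped arc of the unit circle whose
# two ends (angles 0.3 and 2*pi - 0.3) face each other across a small gap.
n_points = 120
angles = [0.3 + (2 * math.pi - 0.6) * k / (n_points - 1) for k in range(n_points)]
curve = [(math.cos(a), math.sin(a)) for a in angles]

ink = diffuse_ink(curve, start=0)
along = ink[12]    # about 0.57 from the drop in a straight line, reachable along the arc
across = ink[-1]   # about 0.59 from the drop in a straight line, but across the gap
print(f"ink along the arc: {along:.3g}, ink across the gap: {across:.3g}")
```

Both target points lie at roughly the same straight-line distance from the drop, yet the ink reaches only the one that is connected to it along the arc. Roughly speaking, repeating this from many starting points and comparing where the ink goes is how such methods recover the shape of the surface rather than just a cloud of points.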
These methods are already leading to interesting and useful results, but
many more techniques will be needed. Applied mathematicians have plenty
of work to do. And in the face of such challenges, they trust that many
of their “purer” colleagues will keep an open mind, follow what is going
on, and help discover connections with other existing mathematical
frameworks. Or perhaps even build new ones.
Original story reprinted with permission from Quanta Magazine, an
editorially independent publication of the Simons Foundation whose
mission is to enhance public understanding of science by covering
research developments and trends in mathematics and the physical and
life sciences.
Read: http://www.wired.com/2015/12/machine-learning-works-greatmathematicians-just-dont-know-why/