Abe Davis: New video technology that reveals an object’s hidden properties

Most of us think of motion as a very visual thing. If I walk across this stage or gesture with my hands while I speak, that motion is something that you can see. But there's a world of important motion that's too subtle for the human eye, and over the past few years, we've started to find that cameras can often see this motion even when humans can't.

So let me show you what I mean. On the left here, you see video of a person's wrist, and on the right, you see video of a sleeping infant, but if I didn't tell you that these were videos, you might assume that you were looking at two regular images, because in both cases, these videos appear to be almost completely still. But there's actually a lot of subtle motion going on here, and if you were to touch the wrist on the left, you would feel a pulse, and if you were to hold the infant on the right, you would feel the rise and fall of her chest as she took each breath. And these motions carry a lot of significance, but they're usually too subtle for us to see, so instead, we have to observe them through direct contact, through touch.

But a few years ago, my colleagues at MIT developed what they call a motion microscope, which is software that finds these subtle motions in video and amplifies them so that they become large enough for us to see. And so, if we use their software on the left video, it lets us see the pulse in this wrist, and if we were to count that pulse, we could even figure out this person's heart rate. And if we used the same software on the right video, it lets us see each breath that this infant takes, and we can use this as a contact-free way to monitor her breathing. And so this technology is really powerful, because it takes these phenomena that we normally have to experience through touch and it lets us capture them visually and non-invasively.
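(A rough sketch of what the motion microscope does may help here. The underlying technique was published by the MIT group as "Eulerian video magnification": bandpass-filter each pixel's brightness over time around a frequency band of interest, amplify that component, and add it back. The Python below is an illustrative toy, not the team's actual implementation, and every parameter value in it is invented.)

```python
# Toy sketch of Eulerian-style motion magnification.
# Assumes grayscale video as a (T, H, W) float array.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fps, f_lo, f_hi, alpha):
    """Bandpass-filter each pixel's intensity over time, then amplify
    the subtle in-band component and add it back to the video."""
    nyq = fps / 2.0
    b, a = butter(2, [f_lo / nyq, f_hi / nyq], btype="band")
    subtle = filtfilt(b, a, frames, axis=0)  # temporal filter, per pixel
    return frames + alpha * subtle           # exaggerated motion

# For a resting pulse (roughly 48-90 beats per minute):
# magnified = magnify_motion(video, fps=30.0, f_lo=0.8, f_hi=1.5, alpha=50.0)
```

Counting the amplified pulses over a known stretch of time is then all it takes to read off a heart rate.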
So a couple years ago, I started working with the folks that created that software, and we decided to pursue a crazy idea. We thought, it's cool that we can use software to visualize tiny motions like this, and you can almost think of it as a way to extend our sense of touch. But what if we could do the same thing with our ability to hear? What if we could use video to capture the vibrations of sound, which are just another kind of motion, and turn everything that we see into a microphone?

Now, this is a bit of a strange idea, so let me try to put it in perspective for you. Traditional microphones work by converting the motion of an internal diaphragm into an electrical signal, and that diaphragm is designed to move readily with sound so that its motion can be recorded and interpreted as audio. But sound causes all objects to vibrate. Those vibrations are just usually too subtle and too fast for us to see. So what if we record them with a high-speed camera and then use software to extract tiny motions from our high-speed video, and analyze those motions to figure out what sounds created them? This would let us turn visible objects into visual microphones from a distance.
And so we tried this out, and here's one of our experiments, where we took this potted plant that you see on the right and we filmed it with a high-speed camera while a nearby loudspeaker played this sound.

(Music: "Mary Had a Little Lamb")

And so here's the video that we recorded, and we recorded it at thousands of frames per second, but even if you look very closely, all you'll see are some leaves that are pretty much just sitting there doing nothing, because our sound only moved those leaves by about a micrometer. That's one ten-thousandth of a centimeter, which spans somewhere between a hundredth and a thousandth of a pixel in this image. So you can squint all you want, but motion that small is pretty much perceptually invisible.

But it turns out that something can be perceptually invisible and still be numerically significant, because with the right algorithms, we can take this silent, seemingly still video and we can recover this sound.

(Music: "Mary Had a Little Lamb")

(Applause)
So how is this possible? How can we get so much information out of so little motion? Well, let's say that those leaves move by just a single micrometer, and let's say that that shifts our image by just a thousandth of a pixel. That may not seem like much, but a single frame of video may have hundreds of thousands of pixels in it, and so if we combine all of the tiny motions that we see from across that entire image, then suddenly a thousandth of a pixel can start to add up to something pretty significant. On a personal note, we were pretty psyched when we figured this out.

(Laughter)
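(To see why combining pixels matters, here is a toy version of that principle. The actual visual-microphone algorithm works on local phase in a complex steerable pyramid, which this sketch does not attempt; it just pools a least-squares, gradient-based motion estimate over every pixel, so that a shift of a thousandth of a pixel can rise above the per-pixel noise, and reads out one estimate per frame as one audio sample.)

```python
import numpy as np

def global_shift(prev, curr):
    """Least-squares estimate of a tiny global horizontal shift between
    two frames, from first-order optical flow: Ix * u + It = 0.
    Summing over hundreds of thousands of pixels is what makes
    sub-thousandth-of-a-pixel shifts measurable."""
    ix = np.gradient(prev, axis=1)            # spatial gradient
    it = curr - prev                          # temporal difference
    return -np.sum(ix * it) / np.sum(ix * ix)

def frames_to_audio(frames):
    """(T, H, W) high-speed video -> 1-D motion signal, one sample per
    frame, i.e. audio sampled at the camera's frame rate."""
    audio = np.array([global_shift(frames[i], frames[i + 1])
                      for i in range(len(frames) - 1)])
    return audio - audio.mean()               # remove the DC offset
```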
But even with the right algorithm, we were still missing a pretty important piece of the puzzle. You see, there are a lot of factors that affect when and how well this technique will work. There's the object and how far away it is; there's the camera and the lens that you use; how much light is shining on the object and how loud your sound is. And even with the right algorithm, we had to be very careful with our early experiments, because if we got any of these factors wrong, there was no way to tell what the problem was. We would just get noise back. And so a lot of our early experiments looked like this.

And so here I am, and on the bottom left, you can kind of see our high-speed camera, which is pointed at a bag of chips, and the whole thing is lit by these bright lamps. And like I said, we had to be very careful in these early experiments, so this is how it went down.

(Video) Abe Davis: Three, two, one, go. Mary had a little lamb! Little lamb! Little lamb!

(Laughter)

AD: So this experiment looks completely ridiculous.

(Laughter)

I mean, I'm screaming at a bag of chips —

(Laughter)

— and we're blasting it with so much light, we literally melted the first bag we tried this on.

(Laughter)

But ridiculous as this experiment looks, it was actually really important, because we were able to recover this sound.

(Audio) Mary had a little lamb! Little lamb! Little lamb!

(Applause)

AD: And this was really significant, because it was the first time we recovered intelligible human speech from silent video of an object.
And so it gave us this point of reference, and gradually we could start to modify the experiment, using different objects or moving the object further away, using less light or quieter sounds. And we analyzed all of these experiments until we really understood the limits of our technique, because once we understood those limits, we could figure out how to push them.

And that led to experiments like this one, where again, I'm going to speak to a bag of chips, but this time we've moved our camera about 15 feet away, outside, behind a soundproof window, and the whole thing is lit by only natural sunlight. And so here's the video that we captured. And this is what things sounded like from inside, next to the bag of chips.

(Audio) Mary had a little lamb whose fleece was white as snow, and everywhere that Mary went, that lamb was sure to go.

AD: And here's what we were able to recover from our silent video captured outside behind that window.

(Audio) Mary had a little lamb whose fleece was white as snow, and everywhere that Mary went, that lamb was sure to go.

(Applause)
AD: And there are other ways that we can push these limits as well. So here's a quieter experiment where we filmed some earphones plugged into a laptop computer, and in this case, our goal was to recover the music that was playing on that laptop from just silent video of these two little plastic earphones, and we were able to do this so well that I could even Shazam our results.

(Laughter)

(Music: "Under Pressure" by Queen)

(Applause)
And we can also push things by changing the hardware that we use, because the experiments I've shown you so far were done with a camera, a high-speed camera, that can record video about 100 times faster than most cell phones. But we've also found a way to use this technique with more regular cameras, and we do that by taking advantage of what's called a rolling shutter. You see, most cameras record images one row at a time, and so if an object moves during the recording of a single image, there's a slight time delay between each row, and this causes slight artifacts that get coded into each frame of a video. And so what we found is that by analyzing these artifacts, we can actually recover sound using a modified version of our algorithm.
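(Some back-of-the-envelope arithmetic, with made-up but typical numbers, shows why this helps: with a rolling shutter it is rows, not frames, that set the motion sampling rate. In a real camera the rows of each frame are read out in less than the frame period, so the samples arrive in bursts with gaps in between, which is part of why the recovered audio sounds distorted.)

```python
# Rolling shutter: each row is exposed slightly later than the one above,
# so row timing, not frame timing, bounds the motion sampling rate.
fps = 60                           # ordinary camera frame rate
rows = 1080                        # rows per frame
row_delay = 1.0 / (fps * rows)     # ~15.4 microseconds between rows
row_rate = fps * rows              # ~64,800 motion samples per second
print(f"{row_delay * 1e6:.1f} us between rows "
      f"-> up to {row_rate / 1e3:.1f} kHz of motion bandwidth")
```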
So here's an experiment we did where we filmed a bag of candy while a nearby loudspeaker played the same "Mary Had a Little Lamb" music from before, but this time, we used just a regular store-bought camera. And so in a second, I'll play for you the sound that we recovered. It's going to sound distorted this time, but listen and see if you can still recognize the music.

(Audio: "Mary Had a Little Lamb")

And so, again, that sounds distorted, but what's really amazing here is that we were able to do this with something that you could literally run out and pick up at a Best Buy.
So at this point, a lot of people see this work, and they immediately think about surveillance. And to be fair, it's not hard to imagine how you might use this technology to spy on someone. But keep in mind that there's already a lot of very mature technology out there for surveillance. In fact, people have been using lasers to eavesdrop on objects from a distance for decades. But what's really new here, what's really different, is that now we have a way to picture the vibrations of an object, which gives us a new lens through which to look at the world, and we can use that lens to learn not just about forces like sound that cause an object to vibrate, but also about the object itself.
And so I want to take a step back and think about how that might change the ways that we use video, because we usually use video to look at things, and I've just shown you how we can use it to listen to things. But there's another important way that we learn about the world: that's by interacting with it. We push and pull and poke and prod things. We shake things and see what happens. And that's something that video still won't let us do, at least not traditionally.

So I want to show you some new work, and this is based on an idea I had just a few months ago, so this is actually the first time I've shown it to a public audience. And the basic idea is that we're going to use the vibrations in a video to capture objects in a way that will let us interact with them and see how they react to us.
So here's an object, and in this case, it's a wire figure in the shape of a human, and we're going to film that object with just a regular camera. So there's nothing special about this camera. In fact, I've actually done this with my cell phone before. But we do want to see the object vibrate, so to make that happen, we're just going to bang a little bit on the surface where it's resting while we record this video.

So that's it: just five seconds of regular video, while we bang on this surface, and we're going to use the vibrations in that video to learn about the structural and material properties of our object, and we're going to use that information to create something new and interactive.

And so here's what we've created. And it looks like a regular image, but this isn't an image, and it's not a video, because now I can take my mouse and I can start interacting with the object. And so what you see here is a simulation of how this object would respond to new forces that we've never seen before, and we created it from just five seconds of regular video.

(Applause)
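(In the version of this work that was later published as "Interactive Dynamic Video," the object is modeled as a set of vibration modes whose frequencies, dampings, and shapes are estimated from the video, and interacting with the object means driving those modes with the user's force and warping the image along the mode shapes. The sketch below simulates just one damped mode responding to a poke, with invented constants, purely to show the kind of simulation involved.)

```python
import numpy as np

def simulate_mode(force, freq_hz, zeta, fps=60.0):
    """Integrate one damped vibration mode,
        q'' + 2*zeta*w*q' + w^2*q = f(t),
    with semi-implicit Euler. In the full system, q(t) scales an image
    deformation along a mode shape recovered from the input video."""
    w = 2.0 * np.pi * freq_hz
    dt = 1.0 / fps
    q = v = 0.0
    out = []
    for f in force:                    # external force at each time step
        a = f - 2.0 * zeta * w * v - w * w * q
        v += a * dt                    # update velocity first...
        q += v * dt                    # ...then position (keeps it stable)
        out.append(q)
    return np.array(out)

# One sharp poke at t = 0, then free, decaying ringing at about 3 Hz:
poke = np.zeros(300)
poke[0] = 100.0
response = simulate_mode(poke, freq_hz=3.0, zeta=0.05)
```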
And so this is a really powerful way to look at the world, because it lets us predict how objects will respond to new situations, and you could imagine, for instance, looking at an old bridge and wondering how that bridge would hold up if I were to drive my car across it. And that's a question that you probably want to answer before you start driving across that bridge.

And of course, there are going to be limitations to this technique, just like there were with the visual microphone, but we found that it works in a lot of situations that you might not expect, especially if you give it longer videos. So for example, here's a video that I captured of a bush outside of my apartment. I didn't do anything to this bush, but over the course of a minute-long video, a gentle breeze caused enough vibrations that we could learn enough about this bush to create this simulation.

(Applause)
And so you could imagine giving this to a film director, and letting him control, say, the strength and direction of wind in a shot after it's been recorded.

Or, in this case, we pointed our camera at a hanging curtain, and you can't even see any motion in this video, but by recording a two-minute-long video, natural air currents in this room created enough subtle, imperceptible motions and vibrations that we could learn enough to create this simulation.

And ironically, we're kind of used to having this kind of interactivity when it comes to virtual objects, when it comes to video games and 3D models, but to be able to capture this information from real objects in the real world, using just simple, regular video, is something new that has a lot of potential.

So here are the amazing people who worked with me on these projects.

(Applause)

And what I've shown you today is only the beginning. We've just started to scratch the surface of what you can do with this kind of imaging, because it gives us a new way to capture our surroundings with common, accessible technology. And so looking to the future, it's going to be really exciting to explore what this can tell us about the world.

Thank you.

(Applause)
