Just have a look at this.
This clip is from an Indian film called "Enthiran", which was released in 2010. The movie is about a scientist who builds a truly intelligent humanoid that can do every task a human can. Initially, the robot is programmed with the laws of robotics, but it later learns to defy those laws and becomes a threat to human existence. The film was a huge commercial success and went on to win a number of awards for its visual effects and art design.
Also, check this out:
Most of you reading this are probably familiar with this clip. It's from a movie called "I, Robot", which was released in 2004. It remains one of the best movies with robotics as its core theme.
When I saw both of these movies, I was in middle school, and as a kid with absolutely no knowledge of robotics, I was in awe from start to finish. I imagined that in another 10 to 15 years, once I grew up, humans would share their world with robots just like in these movies. Back then, I didn't know enough to even ask the question: how far along are we in building truly intelligent robots?
Now, when I ask myself this question, I honestly feel like laughing out loud. But for me, it is also one of those examples that make me sad about the outlook for AI and computer vision in robotics. What would it take for a robot (which is basically just a computer) to, for instance, fight the enemy robots shown in the "I, Robot" clip above? For the non-STEM folks, I challenge you to explicitly think through all the pieces of knowledge that have to fall into place for this to work. Here is my short attempt:
And now for the cherry on the cake: all of these steps have to be performed in real time, on compute hardware powerful enough to handle them. I could write at length about how each of these steps works, but that is not the point of this blog.
Movies like these are made purely out of the director's imagination, and viewers often misunderstand what roboticists actually do. In fact, any roboticist reading this blog might literally be in tears. Non-STEM people who watch such movies are often led to believe that the robots shown on screen are very close to becoming a reality. Our family and friends often ask us when we are going to build the next Sonny. (Yes, I feel you.) The purpose of this blog is to give a general audience some intuition for how extremely meticulous the work of roboticists is. Building each subsystem of a robot is a tremendous achievement in itself; imagine the complexity of putting all of them together to build a humanoid like Sonny.
Truly intelligent humanoids like Sonny, or Chitti from Enthiran, have the ability to think and act on their own. Much of today's deep learning is supervised learning, where we train our models on existing labeled datasets. A robot like this, however, would need reinforcement learning, where it is expected to learn from experience. For instance, when a baby is just starting to walk, it doesn't know how to; it trips, falls, tries to get up, and eventually, after some time, it is able to walk. Reinforcement learning works in a similar way: the robot learns from its mistakes through a predefined reward function, which gives a negative reward when the robot performs an undesired action and a positive reward when it performs the desired one. The learning objective is then "maximize the total reward collected over time". Easier said than done xD, this subfield of machine learning contains some of the most complicated math you might ever see in your life.
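To make the reward idea a little more concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a deliberately tiny toy problem. Everything in it is a hypothetical assumption for illustration: a "robot" on a line of 6 positions that must learn to walk forward to a goal, a reward of +1 for reaching the goal, -1 for stepping backwards off the start (the "fall"), a small -0.01 cost per step, and the learning-rate, discount and exploration values. It is not how Sonny, Chitti or any real robot is trained; real robots face continuous states, noisy sensors and vastly larger problems.

```python
import random

# Hypothetical toy world: positions 0..5 on a line, position 5 is the goal.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = step backward, 1 = step forward


def step(state, action):
    """Assumed environment dynamics: returns (next_state, reward, done)."""
    nxt = state + 1 if action == 1 else state - 1
    if nxt < 0:
        return state, -1.0, True   # stepped off the start: a "fall", negative reward
    if nxt == GOAL:
        return nxt, 1.0, True      # reached the goal: positive reward
    return nxt, -0.01, False       # small cost per step so shorter walks score higher


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[s][a] estimates the discounted future reward of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best-known action at every non-goal position."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy should choose "step forward" (action 1) at every position: the robot was never told how to walk, it only discovered which actions lead to more reward by trial and error. Scaling this idea from a 6-cell toy to a humanoid body is exactly where the complicated math comes in.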
Thinking further about the complexity and scale of the problem, a seemingly unavoidable conclusion for me is that we may also need embodiment, and that the only way to build robots that interpret scenes the way we humans do is to expose them to all the years of (structured, temporally coherent) experience we have, give them the ability to interact with the world, and equip them with some magical active learning/inference architecture that I can barely even imagine. When I think backwards about what such an architecture would need to be capable of, the complexity is simply mind-boggling. I hate to say it, but when we consider this task and ask how we could ever get from here to there, the state of CV, AI and robotics looks sobering. The road ahead is long, uncertain, unclear and foggy. In any case, we are very, very far away, and this depresses me. We will have to keep innovating; what other way forward is there?