I’ve been spending a lot of time learning about robotics and machine learning. I’ve documented some of that with my robot Ty but after reaching a stopping point with him, I wanted to go deeper. I took AI for Robotics from Udacity (free!) and started reading Sebastian Thrun’s Probabilistic Robotics. I feel like I’m learning a lot; certainly my brain often feels quite stuffed.
Finishing up the Udacity course, I did well on the AI for Robotics final mini-exam, easily knowing the answers to all of the intuitive questions and knowing where to look for the equation- and algorithm-based ones.
After the exam came a project where I was to control a hunter robot capturing a lost robot driving in circles. The first few problems were straightforward control theory. It got harder, requiring dealing with noise and predicting future positions despite that noise, but nothing impossible. I finished the problem but opted not to try the extra credit (very, very noisy readings from the lost robot) because my methods were not good enough for that (though I had some ideas).
In my solution, I filtered the data to reduce the noise, determined the robot's behavior, and estimated its real position from that. I was still left with noise so my solution was obviously non-optimal, but the grader program gave me 1000 steps to solve it and I usually got it in 200 (I estimated that ~35 would likely be optimal so I knew I was slacking).
The next day, I wondered if I should give the extra credit more of a go. I looked online to see how other people had solved the problem that I’d finished. I found only one person who put their code on GitHub and they used a Kalman Filter for the estimation.
Kalman Filters were covered in the class. The project was supposed to cover the whole class. It didn’t even occur to me to break out that heavyweight algorithm. It is definitely a possible solution, but it wasn’t even in my head when I thought about how to catch the robot.
I keep having this problem with all the new things I’ve learned. I know about the techniques, I can even describe them, but I can’t figure out how to apply them to problems (or real-world situations). And if I’m already failing to reach for these skills, I’m pretty sure I’ll forget them entirely.
I wonder if I need to make a list of the skills I’ve learned and start figuring out where they could apply on my robot Ty and how I would go about using them. Even if I don’t implement the features, at least I’ll have a map of things I could do. A visible set of tools might help me to stop reaching for the hammer I’m accustomed to.
I’m presenting this here in hopes you can tell me what I forgot (or in case you have a problem and want to investigate a solution). There is a good chance I’ll get parts wrong so please don’t rely on my musings as your sole source for your complicated machine-learning-for-robots solutions.
First, the stuff I’m likely to use in the future, maybe on Ty:
Kalman Filter: Kalman Filters are good for combining different noisy sensors, assuming the noise is statistically well behaved, to synthesize a better estimate. For example, they are often used to estimate location or orientation given several noisy inputs from accelerometers, gyros, magnetometers, and GPS. Combining the stepper motor input with current draw might help me figure out when Ty is close to pressing a button. Also: there are several varieties of Kalman Filter (extended, unscented, and so on).
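To make that less abstract, here is a minimal 1D Kalman filter sketch. The variable names and the numbers are mine, invented for illustration; a real sensor-fusion filter would have a state vector and matrices, but the blend-by-uncertainty idea is the same.

```python
# A minimal 1D Kalman filter: track a scalar state (say, a position)
# from noisy measurements. Names and values are invented for illustration.

def kalman_step(x, p, z, r, q=0.1):
    """One predict/update cycle for a 1D state.
    x: state estimate, p: estimate variance,
    z: new measurement, r: measurement variance, q: process noise."""
    # Predict: state assumed unchanged, uncertainty grows by process noise
    p = p + q
    # Update: blend prediction and measurement by their uncertainties
    k = p / (p + r)            # Kalman gain
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

x, p = 0.0, 1000.0             # start with huge uncertainty
for z in [4.9, 5.2, 5.0, 4.8, 5.1]:   # noisy readings around 5
    x, p = kalman_step(x, p, z, r=0.5)
# x converges near 5 and p (our uncertainty) shrinks with each reading
```

The appealing part is that the filter tells you not just the estimate but how much to trust it.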
Particle Filter: A particle filter simulates the system many times over (the simulations are the particles), then uses sensor input to narrow down which simulations best match reality. These are often used to estimate location in a maze given an uncertain starting point. This might be useful for figuring out where Ty’s finger is given the camera input and stepper motor input.
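A toy version shows the weigh-and-resample loop. The scenario (a robot somewhere along a 1D line, measuring its position noisily) and all the constants are my own invention, not from the course.

```python
import math
import random

# Toy particle filter: estimate a robot's 1D position from noisy
# position readings. Scenario and constants invented for illustration.

random.seed(0)
TRUE_POS = 7.0
N = 500

particles = [random.uniform(0, 20) for _ in range(N)]   # uncertain start

def weight(p, z, sigma=1.0):
    # likelihood of seeing measurement z if the robot were at p (Gaussian)
    return math.exp(-((p - z) ** 2) / (2 * sigma ** 2))

for _ in range(5):                              # several measurement cycles
    z = TRUE_POS + random.gauss(0, 1.0)         # noisy sensor reading
    weights = [weight(p, z) for p in particles]
    # resample: particles that explain the data survive and multiply
    particles = random.choices(particles, weights=weights, k=N)
    # jitter so the cloud doesn't collapse to a single point
    particles = [p + random.gauss(0, 0.2) for p in particles]

estimate = sum(particles) / N                   # cloud centers near 7
```

The surviving cloud of particles is the location estimate; its spread is the uncertainty.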
Motion Control: PID with twiddle for determining constants: PID control is important and common, but tuning the parameters is often considered as much art as science. The twiddle procedure runs a local optimization to find the best overall PID parameters. It does require an adequate simulation or an extensive testbed. I’m not currently using PID control in Ty though I use it in my professional life. I haven’t tried the twiddle algorithm, mostly because it is similar to my by-hand process and I haven’t needed it.
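The twiddle loop itself is short. In this sketch, the cost function is a stand-in I made up; in a real tuning run it would execute the PID loop in simulation and return the accumulated error.

```python
# A sketch of the twiddle coordinate-descent tuner. The cost function
# here is a placeholder; a real one would run the PID controller in
# simulation and return its tracking error.

def cost(params):
    # placeholder: pretend the ideal gains are [2.0, 0.5, 1.0]
    return sum((p - t) ** 2 for p, t in zip(params, [2.0, 0.5, 1.0]))

def twiddle(tol=1e-4):
    p = [0.0, 0.0, 0.0]          # PID gains: Kp, Ki, Kd
    dp = [1.0, 1.0, 1.0]         # per-parameter step sizes
    best = cost(p)
    while sum(dp) > tol:
        for i in range(len(p)):
            p[i] += dp[i]                 # try increasing this gain
            err = cost(p)
            if err < best:
                best = err
                dp[i] *= 1.1              # it helped: take bigger steps
            else:
                p[i] -= 2 * dp[i]         # try decreasing instead
                err = cost(p)
                if err < best:
                    best = err
                    dp[i] *= 1.1
                else:
                    p[i] += dp[i]         # neither helped: shrink the step
                    dp[i] *= 0.9
    return p

gains = twiddle()                # converges toward [2.0, 0.5, 1.0]
```

It really is the by-hand process (nudge a gain, see if things improve), just automated and patient.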
Computer Vision: Colors: I felt like a good 30% of computer vision is playing with colors: using different color space formulations (RGB, LUV, and so on) to pull out the most relevant information from an image and then masking the color or light level off so you see only that. I’m amazed at how important this is. But I’m also amazed at how terrible it is. For Ty, I used the camera and colors to isolate the bright red of a laser beam and then had the arm follow it, like a cat. That was fun and consistently worked. However, looking for a blue sticker to track the arm location depends on lighting and phases of the moon to work. I can tweak it to work for every situation but who wants to do that? There is more to do here but I don’t know how to learn better techniques other than through trial and error.
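The masking step is simple enough to show in plain NumPy (OpenCV's inRange does the same job). The thresholds below are ones I made up for a fake image; as noted above, real ones need retuning whenever the lighting changes.

```python
import numpy as np

# Color thresholding sketch: find "bright red" pixels, meaning strong R
# and weak G and B. Image and thresholds are invented for illustration.

img = np.zeros((4, 4, 3), dtype=np.uint8)   # tiny fake RGB image
img[1, 2] = [250, 30, 20]                   # one laser-bright red pixel
img[3, 0] = [120, 110, 100]                 # one dull grayish pixel

mask = (img[:, :, 0] > 200) & (img[:, :, 1] < 80) & (img[:, :, 2] < 80)
ys, xs = np.nonzero(mask)                   # coordinates of red pixels only
```

The laser-pointer trick works because a laser dot saturates the sensor so thoroughly that the threshold barely matters; the blue sticker isn't so obliging.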
Computer Vision: Canny edge detection: Separating objects is important and software does this by finding edges. The Canny edge detection algorithm is very popular because it works well given its modest processing requirements. It is used in self-driving cars to find lanes, in stop-light cameras to find license plates, and in Ty to find the edges of the blue sticker that marks the arm location.
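This isn't the full Canny algorithm, but its heart is gradients: where the image intensity changes sharply, there's an edge. A NumPy sketch of that first step (full Canny adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top):

```python
import numpy as np

# Gradient-magnitude edge finding, the core step inside Canny.
# Image and threshold are invented for illustration.

img = np.zeros((6, 6), dtype=float)
img[:, 3:] = 1.0                 # vertical edge: dark left, bright right

gx = img[:, 2:] - img[:, :-2]    # horizontal intensity change
gy = img[2:, :] - img[:-2, :]    # vertical intensity change
mag = np.abs(gx[1:-1, :]) + np.abs(gy[:, 1:-1])   # crop to common size

edges = mag > 0.5                # crude threshold in place of hysteresis
```

In a real image, noise makes raw gradients fire everywhere, which is why Canny's smoothing and double-threshold stages earn their keep.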
Computer Vision: Homography: Given a 2d picture of an apple to look for, how do you find that apple in a new camera image that may be taken from an entirely different angle? Apparently you can extract distinctive feature points from the apple image, look for matching points in the new image, and recover the transform between the two views. It is kind of magical and fairly robust, returning a transformation matrix so you can convert between the initial apple and the camera-apple. We talked with Kathleen Tuite on the Embedded show about using homography (and SIFT and FLANN) and photogrammetry to connect photos of buildings together to make a model of the building. Ty uses homography to find the keyboard keys given a perfect keyboard image (and the key location mappings in that image). It works surprisingly well.
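For the curious, the algebra under the magic is the direct linear transform (DLT): given four matched point pairs, solve for the 3x3 matrix that maps one set onto the other. OpenCV's findHomography does this (plus RANSAC to reject bad matches); below is just the bare math, checked against a made-up translation.

```python
import numpy as np

# Direct linear transform: estimate the 3x3 homography H mapping four
# source points onto four destination points. Example points invented.

def homography(src, dst):
    """src, dst: four (x, y) pairs each. Returns H with src -> dst."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)       # null space holds the 9 entries of H
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                # normalize so H[2, 2] == 1

# sanity check: a pure translation by (2, 3)
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (3, 3), (3, 4), (2, 4)]
H = homography(src, dst)
```

With real camera data you have hundreds of noisy SIFT matches instead of four perfect points, which is where RANSAC comes in.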
Deep Learning: Decision Trees: Sometimes I’m not great at making decisions so an algorithm called “decision trees” sounds appealing. However, this ends up being a series of if-then-else code with the conditionals chosen by an algorithm to best separate the data. Essentially, this algorithm is like playing Twenty Questions. I’m not really sure how these are important in deep learning but let me tell you, I can if-then-else with the best of them. Some of my best algorithms came about through (fairly manual) parametric comparisons.
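Here is the Twenty Questions flavor as code: pick the single yes/no question (a threshold on a feature) that best splits the data. A real decision tree just does this recursively on each branch. The data is invented for illustration.

```python
# One node of a decision tree: choose the threshold that best separates
# two classes along one feature. Data invented for illustration.

def best_split(points, labels):
    """points: 1D feature values; labels: 0/1. Returns the threshold
    minimizing misclassifications of the rule 'x > threshold => 1'."""
    best_t, best_err = None, len(points) + 1
    for t in sorted(points):
        errors = sum((x > t) != bool(y) for x, y in zip(points, labels))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t

heights = [1.0, 1.2, 1.4, 3.0, 3.3, 3.5]
is_tall = [0, 0, 0, 1, 1, 1]
t = best_split(heights, is_tall)   # learns the split between 1.4 and 3.0

def classify(x):                   # the resulting if-then-else
    return 1 if x > t else 0
```

Stack enough of these learned if-then-elses and you have a tree; average many trees and you have a random forest.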
Deep Learning: Neural Nets: Can you ever really know neural nets? I suppose I know enough to talk about layers and visualize some of what is going on inside. I understand the math behind it and how important the training data is to the output. I know that NNs are often the hammer in other people’s toolboxes, but for me, I never see those neural nails. For Ty, once I get the ROS Gazebo simulator working, I’m hoping to use neural nets in reinforcement learning to make the whole system magically work. I think I’ll just put in the camera as pixels, have the neural net control the three servo positions and then punish the algorithm anytime it doesn’t press the button I want. It should work fine. Neural nets always do.
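For anyone who wants to see the math with no framework in the way, here is a two-layer net in raw NumPy learning XOR, the classic "you need a hidden layer" toy problem. Layer sizes, learning rate, and iteration count are arbitrary choices of mine.

```python
import numpy as np

# A bare two-layer neural net trained by gradient descent on XOR.
# All hyperparameters are arbitrary choices for this toy example.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)    # output layer

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)              # forward pass
    out = sigmoid(h @ W2 + b2)
    grad_out = out - y                    # backprop (sigmoid + cross-entropy)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.1 * (h.T @ grad_out);  b2 -= 0.1 * grad_out.sum(0)
    W1 -= 0.1 * (X.T @ grad_h);    b1 -= 0.1 * grad_h.sum(0)

preds = (out > 0.5).astype(int)           # should reproduce the XOR table
```

Four samples and ten lines of math; the pixels-to-servos dream above is the same loop with vastly bigger matrices.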
Deep Learning: TensorFlow: I want to say TensorFlow is a language built to make deep learning simpler but that isn’t right. The language you use is Python or C++. TensorFlow is a library optimized to make machine learning code run faster on your computer. This is necessary because training a neural net to do what you want is deadly slow unless you take advantage of all your processor’s features. Developed by Google, it is the most popular machine learning framework.
Deep Learning: Keras: What if you took TensorFlow and made it easier to use? Keras is amazingly powerful if you know what you are doing. Even better, some of the hand holding it provides tends to shed light into TensorFlow’s processes. However, it hides all the gooey and important details; its simplicity makes it easy to create unintentionally large systems. I like Keras a lot but I’m pretty sure I’ll shoot myself in the foot with it.
Deep Learning: Nvidia TensorRT and Caffe: Caffe is a deep learning framework (originally from Berkeley, though Nvidia optimizes it heavily for their GPUs), usually used with C++ and CUDA to make training and inference faster. TensorRT takes an already-trained network and optimizes it for fast inference on Nvidia hardware; beyond that, it is still some voodoo to me. I learned TensorFlow so it makes more sense to me. However, Ty runs on an Nvidia board so I should learn Caffe. Or at least learn how to train my models. Or figure out what it does. Something.
Deep Learning: Convolutional Neural Net (CNN): CNNs are a specific form of neural network that is often used on image inputs. The NN inputs are small patches of the image, for example a 5x5 pixel view into a camera image. The weights tend to look for areas with shapes (a curve to the left, a circular shape) that feed into later layers, building up the desired response by breaking the image down into component parts. This is a lot like how they tell you to draw in art class.
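The sliding-patch operation itself is simple. This sketch convolves a hand-picked vertical-edge kernel over a made-up image; in a trained CNN the kernel weights are learned, but early layers often end up looking much like this.

```python
import numpy as np

# The core CNN operation: slide a small kernel over an image, recording
# how strongly each patch matches. Image and kernel invented here; in a
# real CNN the kernel weights are learned from data.

def conv2d(img, kernel):
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.zeros((5, 5))
img[:, 3:] = 1.0                       # dark left half, bright right half
kernel = np.array([[-1.0, 1.0],        # responds to a left-to-right step
                   [-1.0, 1.0]])
fmap = conv2d(img, kernel)             # lights up only along the edge
```

Stacking these feature maps through many layers is how "curve here, circle there" builds up to "that's a stop sign."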
Deep Learning: LeNet and GoogleNet: These algorithms are specific forms of CNN configuration often used in object identification problems. They specify how many layers there are, how big the view is in each layer, and how the layers connect. Designing your own is prone to overfitting the small set of available data; using a well-established architecture is likely to let you focus on the difficult part of training instead of the even more difficult part of designing the algorithm. I’ve used both of these in identifying signs in the Self-Driving Car course as well as the Jetson’s object identification demo. At one time, I thought Ty could identify keys or keyboards using object identification but these are fairly big algorithms for that small problem.
Deep Learning: Transfer Learning: Kind of like taking the tires off a truck and using them on a go-cart, transfer learning lets you snip off part of a network trained to identify A (apples) and then retrain it in a very short amount of time to identify B (kittens). Given a well-trained apple-finding network that took days to train, you could be finding kittens in a few minutes. This is black magic, and I’d say run far and fast, but it is really handy. I dream of having Ty type what it sees but I’d want to retrain an object identification system to be more hackerspace oriented as that is likely where Ty would be when showing off.
Deep Learning: Behavioral Cloning: Ok, we all get that training neural networks is computationally intensive. It is important to remember that it is also laborious to the humans who must collect and clean the data. What if, instead, you could teach a car to drive like you do by giving it a bunch of video of you driving? That’s what behavioral cloning does, or at least what we did in the self-driving car class with a track simulator. If I built a joystick control system for Ty, then pressed each key a few (hundred) times, feeding the camera images and stepper control outputs to the neural net, Ty would be able to clone my behavior and find the keys on its own. Theoretically. Might be neat to try...
Localization: A* Searching: If I’m a rat in a maze, how can I find the cheese? I want the cheese now but I also want an optimal path so I can get to the cheese more quickly next time. As a rat, I’d want an algorithm that is efficient, and A* (pronounced “A star”) is efficient. Realistically, it is a lot like breadth-first search, except a heuristic steers the expansion toward the goal. It is a special case of dynamic programming where you find and optimize a path as you go.
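A grid version fits in a few lines: expand cells in order of (cost so far + a Manhattan-distance guess to the goal). The maze below is made up for illustration.

```python
import heapq

# A* on a tiny grid: expand cells in order of cost-so-far plus a
# Manhattan-distance heuristic to the goal. Maze invented for illustration.

def a_star(grid, start, goal):
    """grid: list of strings, '#' blocked. Returns shortest path length
    in steps, or None if the goal is unreachable."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]       # (estimated total, steps, cell)
    best = {start: 0}
    while frontier:
        _, g, pos = heapq.heappop(frontier)
        if pos == goal:
            return g
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            r, c = pos[0] + dr, pos[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != '#':
                if (r, c) not in best or g + 1 < best[(r, c)]:
                    best[(r, c)] = g + 1
                    heapq.heappush(frontier, (g + 1 + h((r, c)), g + 1, (r, c)))
    return None

maze = ["....",
        ".##.",
        "....",
        "...."]
steps = a_star(maze, (0, 0), (3, 3))   # shortest route around the wall
```

The heuristic is the whole trick: breadth-first search would flood the maze evenly, while A* heads cheese-ward first.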
Localization: SLAM (GraphSLAM and online GraphSLAM): SLAM stands for simultaneous localization and mapping, essentially figuring out where you are and keeping track so you can get back there. This is often used by robots to build maps of caverns or rooms, documenting where they’ve already been and estimating unexplored regions. I don’t think I can use this on Ty but I suspect my Roomba uses it to vacuum.
Wow, that list is a lot bigger than I expected. I guess I did learn a few things. Now for some things that I may have already forgotten, only able to take comfort in the fact that learning them a second time will be easier:
Histogram Filter: what is this again? It was in the chapter with particle filters so mumble mumble noisy sensors mumble location, maybe?
Computer Vision: Histogram of Oriented Gradients (HOG): This technique was in Udacity’s Self-Driving Car term 1. It was used to identify other cars so it is likely useful for object recognition. Fascinating at the time, the HOG algorithm had something to do with how the edges were oriented: whether an edge was horizontal, vertical, or diagonal. I don’t really remember how and why it worked, other than it was computationally intensive and extremely sensitive to the color and size of the object to identify. The HOG output was fed to a classifier (an SVM, if I remember right).
Machine Learning: Support Vector Machine: This wasn’t even my first pass through support vector machines; I’ve used them for classification in the past. But they don’t stick in my brain.
Stuff I’ve heard of on the periphery and I want to know more but have found it heavy going:
Deep Learning: Recurrent Neural Networks (RNNs): RNNs are a form of neural network often used in speech recognition and handwriting recognition that keeps state around, like a feedback loop. These seem popular and interesting but I haven’t gotten a chance to try them out.
Reinforcement Learning: Reinforcement learning is a whole different branch of machine learning that focuses on letting the system learn the rules itself. There is no training data, only an indication of whether the system’s output is correct or not. RL systems are used to play games (such as AlphaGo) but are often still research projects. Because they need a lot of training, they must have a reasonable simulation environment, one of the reasons I want ROS’s Gazebo to work. On the other hand, reinforcement learning builds on top of deep learning so this is an even more difficult thing to get my head around.
Note: this isn’t even close to a complete list of machine learning topics. It is a huge field with so many layers, kind of like mathematics where you have to see algebra four or five times to really get a handle on its flexibility and power.
Udacity’s AI for Robotics course is approachable and intuition building, but the same topics presented in Thrun’s Probabilistic Robotics get a far more rigorous (and confusing) treatment. I wish I had notes from the course instead of trying to look up the information in the book.
By the way, back to AI for Robotics and the final project. I went back and redid my solution using a particle filter. It was much better with the solution happening in ~27 steps (vs. 200 with my previous solution). The super noisy extra credit data was still beyond its abilities (and my willingness to spend time) to solve.