This semester at uni I took a Computer Vision and Robotics course that had a semester-long team project component. I learnt an enormous amount about kinematics, actuators, dynamics, control, motion planning, perception and computer vision in general, and had a blast working with a great team to build a complex robot that used computer vision extensively.
Our challenge was broken into 3 smaller assessable milestones, but the end goal was to develop a Lego Mindstorms-based robot that used computer vision and an Xbox Kinect sensor to locate Coke cans and bottles within various unknown scenes. Once detection was complete, the robot had to place a straw in the mouth of each can or bottle without human interaction.
In addition to the above, the robot had to be robust to different types and orientations of cans and bottles, as well as to lighting changes and background noise (the scene was intentionally cluttered). Oh yeah, the code had to be written entirely in Matlab :(
A Stack Overflow post I saw today asked a closely related question and prompted me to write down my experiences for future reference.
Our robot design went through several iterations over the course of the semester (an early design is shown below). The final design featured an RRP robot arm (two rotary joints followed by a prismatic one) with a tripod-mounted Kinect facing vertically downwards onto the test environment.
The final robot design (below) had a rotary base plate, a rotary arm actuator driving a 4-bar mechanism to extend the robot's reach, and a prismatic vertical actuator that lowered the straw into the target opening.
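To make the joint layout a bit more concrete, here is a minimal sketch of how a target point might be split up for an arm of this general shape: base rotation, horizontal reach, and vertical travel. It is written in Python rather than the Matlab we actually used, and it is not our project code. The function name and the choice of a robot base frame in metres are assumptions for illustration, and the mapping from reach to the arm actuator angle is left out because it depends on the 4-bar geometry, which isn't described here.

```python
import math


def target_to_cylindrical(x, y, z):
    """Split a Cartesian target into cylindrical components.

    Assumes (hypothetically) that x, y, z are in metres in a frame whose
    origin is on the base plate's rotation axis, with z pointing up.
    Returns (base_angle_rad, reach_m, height_m).
    """
    base_angle = math.atan2(y, x)  # rotation needed at the base plate
    reach = math.hypot(x, y)       # horizontal distance the arm must extend
    return base_angle, reach, z    # z is covered by the prismatic axis


# Example: a can opening 0.3 m forward, 0.4 m to the side, 0.12 m high.
angle, reach, height = target_to_cylindrical(0.3, 0.4, 0.12)
print(math.degrees(angle), reach, height)
```

Turning the reach value into an actual motor command would still need the linkage dimensions and the motor gearing.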
The design also featured 3 yellow Lego fiducials that were used to calibrate the extrinsic camera parameters prior to each test run. Initially we used yellow Lego man heads for these targets, which led to us nicknaming the robot ‘Chuck Norris’.
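As a rough illustration of what fiducial-based calibration can look like, the sketch below fits a 2D similarity transform (scale, rotation, translation) from image coordinates to robot coordinates using matched fiducial positions. This is a simplified planar stand-in, not our actual calibration routine: a full extrinsic calibration recovers a 3D rotation and translation. It assumes the camera looks straight down, that the fiducials sit at known robot-frame positions, and that both frames have the same handedness (flip the image y axis first if not). The names `fit_similarity` and `pixel_to_robot` are made up for this example.

```python
def fit_similarity(pixel_pts, robot_pts):
    """Least-squares fit of w = a*z + b, with points treated as complex numbers.

    pixel_pts and robot_pts are equal-length lists of (x, y) pairs.
    The complex number a encodes scale and rotation; b encodes translation.
    """
    if len(pixel_pts) != len(robot_pts) or len(pixel_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    zs = [complex(x, y) for x, y in pixel_pts]
    ws = [complex(x, y) for x, y in robot_pts]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    spread = sum(abs(z - z_mean) ** 2 for z in zs)
    if spread == 0:
        raise ValueError("pixel points must not all coincide")
    a = sum((w - w_mean) * (z - z_mean).conjugate()
            for z, w in zip(zs, ws)) / spread
    b = w_mean - a * z_mean
    return a, b


def pixel_to_robot(a, b, pixel):
    """Map one (x, y) pixel position into the robot frame."""
    w = a * complex(pixel[0], pixel[1]) + b
    return (w.real, w.imag)


# Made-up fiducial data: 1 mm per pixel, 90 degree rotation, small offset.
pixels = [(100, 100), (300, 100), (100, 300)]
robot = [(0.4, -0.1), (0.4, 0.1), (0.2, -0.1)]
a, b = fit_similarity(pixels, robot)
print(pixel_to_robot(a, b, (200, 200)))
```

Two fiducials are enough to pin down this transform exactly; a third lets the least-squares fit average out some detection error.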
The algorithm for detecting the Coke cans worked as follows:
1) A Macbeth Color Calibration target was used to compute the color response of the Kinect sensor.
2) The Kinect depth channel was used as a first pass to find objects within the scene.
3) The Kinect color channels were converted to HSV space, and the hue channel was used in combination with a Rapidly-exploring Random Tree (RRT)-type algorithm to search for red regions near these objects. Only the hue channel was used, which provided robustness against lighting differences.
4) After this, the circular Hough transform was used to find circles of the expected sizes (the opening of a bottle vs the opening of a can).
5) The centers of these circles were then run through the parallax correction formula to find the coordinates of the base of the can / bottle.
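Two of the steps above are easy to sketch in a few lines: the hue test from step 3 and the parallax correction from step 5. The Python below is illustrative only and is not our Matlab code. The hue tolerance, the function names, and the specific similar-triangles form of the correction are assumptions. The correction assumes a pinhole camera pointing straight down from a known height, with the detected opening already expressed in table-plane coordinates.

```python
import colorsys


def is_red_hue(r, g, b, tol=0.05):
    """True if an RGB pixel (0-255 per channel) has a hue within tol of red.

    colorsys reports hue in [0, 1) and red sits at the wrap-around point,
    so both ends of the range are checked. tol is an assumed value.
    """
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h <= tol or h >= 1.0 - tol


def parallax_correct(top_xy, nadir_xy, camera_height, object_height):
    """Estimate where an object's base sits from where its top appears.

    top_xy: apparent table-plane position of the opening.
    nadir_xy: point on the table directly below the camera.
    Heights are measured from the table, in the same units as the positions.
    By similar triangles, the top appears pushed away from the nadir by a
    factor camera_height / (camera_height - object_height).
    """
    if not 0 <= object_height < camera_height:
        raise ValueError("object must be below the camera")
    scale = (camera_height - object_height) / camera_height
    return (nadir_xy[0] + scale * (top_xy[0] - nadir_xy[0]),
            nadir_xy[1] + scale * (top_xy[1] - nadir_xy[1]))


print(is_red_hue(200, 30, 40))
print(parallax_correct((0.30, 0.40), (0.0, 0.0), 1.0, 0.2))
```

One caveat with a hue-only test: `colorsys` reports a hue of 0 for pure greys, so they pass this check. Searching only near objects already found in the depth pass limits how much that matters, but a real pipeline might need an extra guard.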
After many late nights and several early mornings, our team successfully completed all but one of the project challenges (one of our late nights is shown below). It was a great semester and I learned a lot! Thank you Spns and other lecturers!