Final Project in C106A : Introduction to Robotics
Our goal was to create a robot that could perform a handshake with a human.
Robots can appear inherently inhuman: their mechanical components are designed to move through space with precision and speed, so they lack the organic motion that feels natural to humans. Our project aimed to bridge this gap by creating a robot that could perform a handshake with a human, turning robotic motion into something organic and human-like. The task is deceptively complex, as it requires the robot to move through its joint space in a way that is both precise and fluid, while also sensing the presence of a human hand and adjusting its motion accordingly.
We settled on a four-module system architecture:
We opted for a single RealSense depth camera mounted on a tripod with a clear view of the scene. This setup let us extract 3D coordinates from a single camera while keeping the hardware simple. However, we had to place the tripod far enough away that the robot arm had room to move without occluding the human's hand. We also attached an AR tag to the camera lens to determine the camera's pose in the robot's frame. This calibration had some issues, such as inconsistent positioning and uncertainty about the orientation of the camera's axes, which affected the accuracy of the computed hand positions.
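The calibration step works roughly as follows: with an AR-tracking node running (as in Lab 7), the tag's pose in the robot base frame can be read from tf and saved for later use. The sketch below illustrates this; the frame names ("base", "ar_marker_0") and output path are assumptions, and it treats the tag pose as the camera pose, up to the tag-to-lens offset, which is one source of the inaccuracy noted above.

```python
# Sketch of the calibration step: read the AR tag's pose in the robot base
# frame from tf and save it. Frame names and the output path are assumptions;
# an AR-tracking node (e.g., ar_track_alvar, as in Lab 7) must be running.
import json
import rospy
import tf2_ros

rospy.init_node("cam_transform")
tf_buffer = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(tf_buffer)

# Pose of the tag (mounted on the camera) in the robot base frame. We treat
# this as the camera pose, up to the tag-to-lens offset.
tf_msg = tf_buffer.lookup_transform("base", "ar_marker_0",
                                    rospy.Time(0), rospy.Duration(10.0))

t, q = tf_msg.transform.translation, tf_msg.transform.rotation
with open("cam_transform.json", "w") as f:
    json.dump({"translation": [t.x, t.y, t.z],
               "rotation": [q.x, q.y, q.z, q.w]}, f)
```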
We chose to track the wrist landmark, as it provided a consistent point for following the human's motion. With more time, we could have used additional landmarks to estimate the hand's orientation as well. However, because some points on the hand are occluded, we could not simply read their depth from the depth map; we would have had to combine MediaPipe's perceived relative depth with the true depth at one reference point to compute 3D coordinates for three points on the hand. We streamed the tracked point at the camera's frame rate, reasoning that more data would generally help trajectory generation and that we could decide how to use it later.
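As an illustration of the wrist-tracking step, the sketch below finds the wrist landmark in a color image and deprojects it into a 3D camera-frame point using the aligned depth value. It uses the legacy mp.solutions.hands API for brevity (our node uses MediaPipe's HandLandmarker), and the stream settings are assumptions.

```python
# Sketch: find the wrist with MediaPipe and deproject it to a 3D point in the
# camera frame. Stream settings are assumptions; our node uses HandLandmarker,
# but the legacy mp.solutions.hands API is shown here for brevity.
import cv2
import mediapipe as mp
import numpy as np
import pyrealsense2 as rs

pipeline, config = rs.pipeline(), rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)        # align depth pixels to the color image

hands = mp.solutions.hands.Hands(max_num_hands=1)

frames = align.process(pipeline.wait_for_frames())
color, depth = frames.get_color_frame(), frames.get_depth_frame()
image = np.asanyarray(color.get_data())

results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
if results.multi_hand_landmarks:
    wrist = results.multi_hand_landmarks[0].landmark[0]        # landmark 0 = WRIST
    u = int(wrist.x * image.shape[1])
    v = int(wrist.y * image.shape[0])
    z = depth.get_distance(u, v)                               # depth in meters
    intrin = color.profile.as_video_stream_profile().get_intrinsics()
    point = rs.rs2_deproject_pixel_to_point(intrin, [u, v], z)  # [x, y, z] in meters
    print("wrist (camera frame):", point)

pipeline.stop()
```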
For simplicity, we mirrored the hand's position across the x, y, and z planes, which gave us almost all the functionality we needed for the handshake procedure. We did not publish orientation information due to time constraints and focused instead on achieving smooth motion.
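Concretely, mirroring here means reflecting the tracked hand point about a fixed center point on each enabled axis; the helper below is an illustrative version of that rule, with hypothetical names and values.

```python
# Illustrative mirroring rule (names and values are hypothetical): reflect the
# tracked hand point about a fixed center on each enabled axis; on disabled
# axes the hand point is passed through unchanged.
def mirror_point(hand, center, mirror_axes):
    """hand, center: (x, y, z) in the base frame; mirror_axes: (bool, bool, bool)."""
    return tuple(2 * c - h if m else h for h, c, m in zip(hand, center, mirror_axes))

# Mirroring only along x about a calibration point:
# mirror_point((0.7, 0.1, 0.3), (0.6, 0.0, 0.3), (True, False, False)) -> (0.5, 0.1, 0.3)
```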
Initially, we tested simple linear path following, but it was too restrictive for the points we were feeding in, causing the robot not to move. We then tried having the MoveIt controller plan directly to each published point, but this produced strange paths and jittery movement because the robot stopped after each plan finished executing. To address this, we set simple thresholds on the distance between points and reduced the effective sampling rate by taking the median over a window of 5 points, which removed outliers and unnecessary movements. We also added joint constraints on the Sawyer arm to prevent unnecessary joint rotations and created bounding-box collision objects to keep the arm away from obstacles. Finally, we switched to Cartesian path planning over a queue of waypoints, which added some latency to planning but produced smoother, more consistent movement.
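A minimal sketch of the point filtering described above, assuming an illustrative distance threshold: a median over a sliding window of 5 points rejects outliers, and a point is only forwarded if it has moved far enough from the last accepted waypoint.

```python
from collections import deque
import numpy as np

class PointFilter:
    """Median-of-5 smoothing plus a minimum-travel threshold between waypoints."""
    def __init__(self, window=5, min_dist=0.03):   # min_dist (meters) is illustrative
        self.window = deque(maxlen=window)
        self.last_sent = None
        self.min_dist = min_dist

    def __call__(self, point):
        """point: np.array([x, y, z]). Returns a smoothed waypoint, or None to drop."""
        self.window.append(point)
        if len(self.window) < self.window.maxlen:
            return None                                   # wait until the window fills
        med = np.median(np.stack(self.window), axis=0)    # per-axis median rejects outliers
        if self.last_sent is not None and np.linalg.norm(med - self.last_sent) < self.min_dist:
            return None                                   # too close to the last waypoint
        self.last_sent = med
        return med
```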
These design choices affected the project's robustness, durability, and efficiency. The single-camera setup was cost-effective but introduced challenges in positioning and orientation accuracy. Wrist tracking gave consistent motion tracking but could not capture hand orientation. The simplified mirroring approach was enough for functional handshakes but lacked orientation data, which could be critical in more complex applications. The path-execution improvements, such as joint constraints and Cartesian path planning, made the robot's movements smoother and more consistent, and therefore more reliable and efficient in practice. However, without online path planning the robot's movements were slower and less responsive, so it could not keep up with the human's movements.
We used the following hardware components for our project:
Sawyer Robot by Rethink Robotics: A versatile and collaborative robot arm.
Intel RealSense Depth Camera D435: A depth camera used for tracking the human hand in 3D space.
We began our implementation from the Lab 5 and Lab 7 codebases, which provided functionality for inverse kinematics and QR code detection.
The software components we developed include:
cam_transform.py:
- Computes the transform from the camera frame to the robot base frame using the AR tag and saves it as a JSON file.

realsense_tracking.py:
- Creates the hand_pose_tracking node and a /hand_pose publisher that publishes a PointStamped object (queue size of 1).
- Creates a HandLandmarker MediaPipe object and processes camera images to find the coordinates of hand landmarks, specifically the wrist point.
- Publishes the wrist point to the /hand_pose topic.

realtime_move.py:
- Creates the arm_controller node, which subscribes to /hand_pose.
- Loads the camera-to-base transform saved by cam_transform.py.
- Transforms each PointStamped read from /hand_pose and publishes it to /hand_pose_base (see the sketch below).
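The transform step in realtime_move.py could look roughly like the sketch below: load the saved camera-to-base transform, apply it to each PointStamped from /hand_pose, and republish on /hand_pose_base. The topic names come from the report; the JSON layout and frame names follow the calibration sketch earlier and are assumptions.

```python
# Sketch of the transform step in realtime_move.py: take a PointStamped from
# /hand_pose (camera frame), apply the saved camera-to-base transform, and
# republish it on /hand_pose_base. JSON layout and frame names are assumptions.
import json
import rospy
import tf2_geometry_msgs
from geometry_msgs.msg import PointStamped, TransformStamped

with open("cam_transform.json") as f:
    saved = json.load(f)

cam_to_base = TransformStamped()
cam_to_base.header.frame_id = "base"
cam_to_base.child_frame_id = "camera"
(cam_to_base.transform.translation.x,
 cam_to_base.transform.translation.y,
 cam_to_base.transform.translation.z) = saved["translation"]
(cam_to_base.transform.rotation.x,
 cam_to_base.transform.rotation.y,
 cam_to_base.transform.rotation.z,
 cam_to_base.transform.rotation.w) = saved["rotation"]

rospy.init_node("arm_controller")
pub = rospy.Publisher("/hand_pose_base", PointStamped, queue_size=1)

def callback(msg):
    base_point = tf2_geometry_msgs.do_transform_point(msg, cam_to_base)
    base_point.header.frame_id = "base"
    pub.publish(base_point)

rospy.Subscriber("/hand_pose", PointStamped, callback)
rospy.spin()
```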
handshake_procedure.py:
- Creates the handshake_procedure node, which subscribes to /hand_pose_base.
- Prompts the user to define the handshake procedure (builds a ProcedureStep object for each step and saves the procedure to a JSON file; an example file is sketched below).
- ProcedureStep: boolean axis_x, boolean axis_y, boolean axis_z, float duration.
- HandshakeProcedure: loops for each step's duration and continuously calculates and publishes the desired robot point as a PointStamped to the /robot_point publisher.
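For illustration, a handshake procedure with the ProcedureStep fields above might be generated and saved like this (the concrete values and file name are assumptions):

```python
# Illustrative handshake procedure file: the fields mirror ProcedureStep above;
# the concrete values and file name are assumptions.
import json

procedure = [
    {"axis_x": True, "axis_y": False, "axis_z": False, "duration": 5.0},  # mirror x only: approach
    {"axis_x": True, "axis_y": False, "axis_z": True,  "duration": 3.0},  # mirror x and z: shake
]

with open("handshake_procedure.json", "w") as f:
    json.dump(procedure, f, indent=2)
```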
move_hand.py:
- Creates the arm_controller node.
- Initializes a MoveGroupCommander using the RRT planner.
- CollisionObject: creates the collision box objects representing the walls and ceiling of the arm environment.
- Creates JointConstraint objects and adds them as constraints to the planner.
- Subscribes to /robot_point and runs pose_callback().
- pose_callback: reads in the desired point and adds it to the waypoints queue if it is at least the specified L2 distance away from the previous waypoint.
- Calls compute_cartesian_path with the queue of waypoints (avoid_collisions=True) and executes the resulting plan (see the sketch below).
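A sketch of the planning side of move_hand.py: a MoveGroupCommander with joint constraints and workspace collision boxes plans a Cartesian path through the queued waypoints and executes it. The planning group name, planner ID, joint names, and box dimensions below are assumptions, not the project's exact values.

```python
# Sketch of move_hand.py's planning setup. Group name, planner ID, joint names,
# and box dimensions are assumptions; the structure (collision boxes, joint
# constraints, Cartesian planning over queued waypoints) follows the report.
import sys
import moveit_commander
import rospy
from geometry_msgs.msg import PoseStamped
from moveit_msgs.msg import Constraints, JointConstraint

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("arm_controller")

group = moveit_commander.MoveGroupCommander("right_arm")   # Sawyer arm group (assumed name)
group.set_planner_id("RRTConnectkConfigDefault")            # RRT planner (assumed ID)
scene = moveit_commander.PlanningSceneInterface()
rospy.sleep(1.0)                                            # let the scene interface connect

# Bounding box acting as a "ceiling" so plans stay inside the workspace.
ceiling = PoseStamped()
ceiling.header.frame_id = "base"
ceiling.pose.position.z = 1.0
ceiling.pose.orientation.w = 1.0
scene.add_box("ceiling", ceiling, size=(2.0, 2.0, 0.01))

# Joint constraint to keep the wrist from rotating unnecessarily.
constraints = Constraints()
constraints.joint_constraints.append(JointConstraint(
    joint_name="right_j6", position=0.0,
    tolerance_above=1.5, tolerance_below=1.5, weight=1.0))
group.set_path_constraints(constraints)

def execute_waypoints(waypoints):
    """waypoints: list of geometry_msgs/Pose accumulated by pose_callback."""
    plan, fraction = group.compute_cartesian_path(
        waypoints, eef_step=0.01, jump_threshold=0.0, avoid_collisions=True)
    if fraction > 0.9:                       # only execute mostly-complete paths
        group.execute(plan, wait=True)
```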
To run the system:

1. Run cam_transform.py to determine the transform from the camera frame to the robot frame and save it as a JSON file.
2. Run realsense_tracking.py to open an interactive window displaying the hand features being tracked and publish them to the /hand_pose topic. Additionally, run realtime_move.py to transform the hand pose from the camera frame to the robot frame and publish it to the /hand_pose_base topic.
3. Run handshake_procedure.py and specify the desired handshake configuration (i.e., which axes to mirror about). The procedure is saved to a JSON file, which can be passed as a command-line argument to skip this step in the future. Center the hand in the window to establish the point to mirror about.
4. Run move_hand.py to start the path planner and movement node. The Sawyer arm will begin following/mirroring hand movements.

Our project successfully tracked a human hand in 3D space and mirrored its movements with the Sawyer robot. The robot was able to perform a handshake with a human, moving its end effector in a smooth and natural manner, and it executed the handshake procedure within a designated area without harming the human, surrounding objects, or itself.
Our results largely matched our design criteria, except for some minor inaccuracies and latency. As shown in the video, we were able to make the Sawyer robot mirror and track in such a way that it could perform certain kinds of handshakes with a human partner. The movement was somewhat slow due to the time taken to perform real-time path planning. The tracking could also be slightly inaccurate due to human error during camera calibration or if the camera was moved.
We encountered difficulties calibrating the transform from the camera frame to the robot base frame. This was caused by the poor quality of the robot wrist camera, bad lighting, glare from the camera, and interference from other AR markers. We overcame these problems by moving the wrist camera closer, turning off the RealSense camera, and recalibrating until the transform stabilized. We also had issues with the path planning settling on paths that were unnecessarily long or dangerous. We solved this by adding collision boxes to bound the arm’s workspace and by adding joint constraints. Additionally, we used a Cartesian path planner, which planned a path based on a set of waypoints, producing more sensible paths. We also encountered a delay in the robot’s movement due to the time spent on path planning.
Our method for transforming hand points in the camera frame to the robot base frame is somewhat improvised. Ideally, we would streamline a process where the Sawyer robot automatically computes this transform without the need for an AR marker and manual calibration. To reduce latency caused by path planning, we filtered out points that were close together to lessen the load on the path planner. While effective to some extent, a more robust solution would involve optimizing the path planning process itself to enhance speed. Even with filtering, delays persist, requiring the human partner to move slowly for successful tracking and mirroring. With additional time, we would focus on speeding up path planning and improving the responsiveness of the robot.