Project
# | Title | Team Members | TA | Documents | Sponsor |
---|---|---|---|---|---|
25 | Long-horizon Task Completion with Robotic Arms by Human Instructions | Bingjun Guo, Qi Long, Qingran Wu, Yuxi Chen | design_document1.pdf, proposal1.pdf | Gaoang Wang | |
# Problem

Robotic arms are increasingly used for long-horizon tasks such as assembling, cooking, and packing, all of which involve multi-step operations. However, the interdependencies among subtasks, changing environmental conditions, and the need for continuous feedback integration make it extremely difficult to execute such tasks reliably. Current approaches frequently struggle with skill chaining, task decomposition, and maintaining robustness during execution. For robotic arms to manipulate objects autonomously based on real-time feedback and complete long-horizon tasks, a comprehensive framework that integrates perception and planning is therefore required.

# Solution Overview

Our solution for enabling a robot to carry out a series of tasks is to combine perception, planning, and acting intelligence into a single system. The robotic arm is the primary entity.

First, for perception, a camera mounted above the workspace (e.g. an RGB camera) captures the scene, and computer vision is used to recognize the objects in it. Then, for planning, the captured images are uploaded together with the user's instructions, and the robot analyzes them and plans the task. Finally, for acting, the plan guides the motion of the robotic arm. While the arm is acting, the sensors, including the RGB camera on the robotic arm, provide continuous feedback, which is used to correct the arm's actions in a closed control loop.

The whole process loops over these three steps until the series of tasks is completed.

# Solution Components

## Output Subsystem

- A robotic arm (UR3)
- A specially designed gripper
- Dedicated tools for the long-horizon task set (e.g. a screwdriver for assembly tasks)

## Feedback Subsystem

- Visual sensors (e.g. an RGB or RGB-D camera, depending on availability)
- Tactile sensors
- Circuits that preprocess the sensor signals

## Planning Subsystem

- A language model that extracts semantic information from instructions
- A vision model that preprocesses input images
- An agent model that processes inputs, plans movements, and carries out tasks according to feedback

# Criteria of Success

- Overall: The robotic arm can successfully complete a chosen set of long-horizon tasks (to be determined according to feasibility) based on human instructions in a zero-shot manner.
- Perception: The system can accurately recognize objects in the scene.
- Planning: The system can generate reasonable multi-step operations.
- Acting: The robotic arm can follow the generated plan and adjust its movement based on real-time feedback to improve accuracy and robustness.
- Safety: The robotic arm can avoid collisions. |
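
The perception, planning, and acting loop described in the Solution Overview could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the names `run_task`, `perceive`, `plan`, `act`, and `max_cycles` are hypothetical placeholders rather than the project's actual interfaces, and the "world" is a pair of Python lists standing in for the camera, the planning models, and the UR3 arm.

```python
from typing import Callable, List


def run_task(
    instruction: str,
    perceive: Callable[[], List[str]],
    plan: Callable[[str, List[str]], List[str]],
    act: Callable[[str], None],
    max_cycles: int = 20,
) -> bool:
    """Repeat perceive -> plan -> act until the planner has nothing left to do.

    All four callables are hypothetical stand-ins for the real subsystems.
    Returns True if the task finished within max_cycles, False otherwise.
    """
    for _ in range(max_cycles):
        scene = perceive()                # perception: what is in the scene now?
        steps = plan(instruction, scene)  # planning: remaining multi-step plan
        if not steps:
            return True                   # empty plan means the task is done
        act(steps[0])                     # acting: run one step, then re-perceive
    return False


# Toy world (assumed for illustration): objects start on the table.
table = ["cup", "spoon"]
box: List[str] = []


def perceive() -> List[str]:
    # Stand-in for the camera and vision model: report objects still on the table.
    return list(table)


def plan(instruction: str, scene: List[str]) -> List[str]:
    # Stand-in for the language/agent models: one move step per visible object.
    return [f"move {obj} to box" for obj in scene]


def act(step: str) -> None:
    # Stand-in for the robotic arm: apply the effect of a single step.
    obj = step.split()[1]
    table.remove(obj)
    box.append(obj)


done = run_task("pack everything into the box", perceive, plan, act)
print(done, box)
```

Executing only the first step of each plan and then perceiving again is one simple way to fold feedback into the loop: if the scene changes between steps, the next plan is built from the updated observation instead of a stale one.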