Object Tracking Robot
A ROS 2-based autonomous robot that uses YOLOv8 and NVIDIA Isaac ROS to detect and track any of the 80 COCO object classes in real time, with stereoscopic depth perception.
Overview
This project was developed as the final assignment for Autonomous Robotics I (Spring 2025) at Carnegie Mellon University, taught by Prof. Ziad Youssfi. The goal was to build an autonomous mobile robot capable of finding and tracking any user-specified object from the 80 classes in the COCO dataset using state-of-the-art computer vision and robotics frameworks.
The robot successfully integrates deep learning-based object detection, ROS 2 (Robot Operating System 2) for distributed control, and NVIDIA’s Isaac ROS accelerated libraries for real-time inference on edge computing hardware.
System Architecture
Hardware Platform
- Mobile Base: Waveshare JetBot chassis with differential drive
- Computing: NVIDIA Jetson Orin Nano (GPU-accelerated edge AI computer)
- Vision: Intel RealSense D435i camera with RGB and stereoscopic depth sensing
- Power: Onboard battery system for autonomous operation
Software Stack
- Operating System: Ubuntu 20.04 with ROS 2 Foxy
- Object Detection: YOLOv8 (You Only Look Once, version 8) trained on the COCO dataset
- Acceleration: NVIDIA Isaac ROS for GPU-optimized computer vision pipelines
- Control Framework: ROS 2 Action Server/Client architecture for asynchronous task management
Implementation
The system was implemented using a distributed ROS 2 architecture:
1. Perception Pipeline
- Real-time video streaming from Intel RealSense camera
- Hardware-accelerated YOLOv8 inference using Isaac ROS DNN nodes
- Detection of 80 object classes (person, car, bottle, teddy bear, etc.) with confidence scores
- Depth estimation for 3D localization of detected objects
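Below is a minimal, illustrative sketch (not the project's exact code) of how such a pipeline can be wired together in rclpy: it subscribes to the Detection2DArray output of an Isaac ROS YOLOv8 decoder and to the aligned RealSense depth image, then back-projects the detection center to a 3D point in the camera frame. Topic names, intrinsics, the target class, and the confidence threshold are placeholder assumptions; the vision_msgs field names follow the newer (Humble-era) message definitions and may differ by distribution. In the real system the intrinsics would come from the camera_info topic and the detection/depth streams would be time-synchronized.

```python
# Hedged sketch: fuse 2D detections with depth to estimate a 3D target position.
# Topic names, intrinsics, and the target class are assumptions, not project values.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray
from cv_bridge import CvBridge


class TargetLocalizer(Node):
    def __init__(self):
        super().__init__('target_localizer')
        self.bridge = CvBridge()
        self.depth = None
        # Isaac ROS YOLOv8 publishes Detection2DArray; the topic name may differ.
        self.create_subscription(Detection2DArray, '/detections_output',
                                 self.on_detections, 10)
        self.create_subscription(Image, '/camera/aligned_depth_to_color/image_raw',
                                 self.on_depth, 10)
        # Hypothetical pinhole intrinsics; read these from CameraInfo in practice.
        self.fx, self.fy, self.cx, self.cy = 615.0, 615.0, 320.0, 240.0
        self.target_class = 'teddy bear'  # the decoder may report a numeric index instead

    def on_depth(self, msg: Image):
        # RealSense aligned depth is a 16UC1 image in millimeters.
        self.depth = self.bridge.imgmsg_to_cv2(msg, desired_encoding='passthrough')

    def on_detections(self, msg: Detection2DArray):
        if self.depth is None:
            return
        for det in msg.detections:
            hyp = det.results[0].hypothesis
            if hyp.class_id != self.target_class or hyp.score < 0.5:
                continue
            u = int(det.bbox.center.position.x)
            v = int(det.bbox.center.position.y)
            z = float(self.depth[v, u]) / 1000.0  # meters
            if z <= 0.0:
                continue  # no valid depth at this pixel
            # Back-project the pixel through the pinhole model into the camera frame.
            x = (u - self.cx) * z / self.fx
            y = (v - self.cy) * z / self.fy
            self.get_logger().info(f'{self.target_class} at ({x:.2f}, {y:.2f}, {z:.2f}) m')


def main():
    rclpy.init()
    rclpy.spin(TargetLocalizer())


if __name__ == '__main__':
    main()
```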
2. ROS 2 Action Server
- Implemented a custom action server for object search and tracking tasks
- Accepts client requests specifying target object class
- Provides feedback on search progress (scanning, object detected, tracking)
- Returns success/failure status with final object position
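This pattern can be sketched with rclpy's ActionServer. The FindObject action type used below (goal: target_class; feedback: status; result: found) is a hypothetical stand-in for the project's custom .action definition, as is the interface package name.

```python
# Hedged sketch of the action-server pattern described above; FindObject and its
# package are placeholders for the project's real custom action interface.
import rclpy
from rclpy.action import ActionServer
from rclpy.node import Node
from object_tracking_interfaces.action import FindObject  # hypothetical package


class FindObjectServer(Node):
    def __init__(self):
        super().__init__('find_object_server')
        self._server = ActionServer(self, FindObject, 'find_object',
                                    execute_callback=self.execute)

    def execute(self, goal_handle):
        target = goal_handle.request.target_class
        self.get_logger().info(f'searching for: {target}')
        feedback = FindObject.Feedback()
        # Placeholder search loop: the real server polls the perception pipeline
        # while scanning, then switches to tracking once the target is seen.
        for phase in ('scanning', 'object detected', 'tracking'):
            feedback.status = phase
            goal_handle.publish_feedback(feedback)
        goal_handle.succeed()
        result = FindObject.Result()
        result.found = True  # filled in from the tracker's final estimate in practice
        return result


def main():
    rclpy.init()
    rclpy.spin(FindObjectServer())


if __name__ == '__main__':
    main()
```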
3. ROS 2 Action Client
- Command interface allowing users to specify target objects
- Handles action goal submission and result processing
- Manages state transitions between searching, tracking, and idle modes
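A matching client sketch, using the same hypothetical FindObject action: it submits the user-specified class as a goal, prints feedback strings as they arrive, and waits for the result.

```python
# Hedged sketch of the command interface; FindObject and its package are the same
# hypothetical stand-ins used in the server sketch above.
import sys
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from object_tracking_interfaces.action import FindObject  # hypothetical package


def main():
    rclpy.init()
    node = Node('find_object_client')
    client = ActionClient(node, FindObject, 'find_object')
    client.wait_for_server()

    goal = FindObject.Goal()
    goal.target_class = sys.argv[1] if len(sys.argv) > 1 else 'bottle'

    # Send the goal and print feedback (scanning / object detected / tracking).
    send_future = client.send_goal_async(
        goal, feedback_callback=lambda fb: node.get_logger().info(fb.feedback.status))
    rclpy.spin_until_future_complete(node, send_future)

    # Wait for the final result from the server.
    result_future = send_future.result().get_result_async()
    rclpy.spin_until_future_complete(node, result_future)
    node.get_logger().info(f'found: {result_future.result().result.found}')


if __name__ == '__main__':
    main()
```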
4. Motion Control
- Differential drive controller for base navigation
- PID-based visual servoing to center target in camera frame
- Adaptive speed control based on object distance and confidence
- Obstacle avoidance using depth information
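The visual-servoing idea reduces to driving two errors toward zero: the horizontal offset of the bounding-box center from the image center (steering, angular velocity) and the difference between the depth-measured range and a standoff distance (approach, linear velocity). The sketch below shows only the proportional term of the project's PID loop; the image width, gains, speed limit, and standoff distance are placeholder assumptions.

```python
# Illustrative proportional controller for the visual-servoing behavior described
# above (the project used a full PID loop); all constants are placeholder values.
from geometry_msgs.msg import Twist

IMAGE_WIDTH = 640        # pixels, assumed camera resolution
STANDOFF_DISTANCE = 0.5  # meters to stop in front of the target
KP_ANGULAR = 0.004       # rad/s per pixel of horizontal error
KP_LINEAR = 0.4          # m/s per meter of range error
MAX_LINEAR = 0.3         # m/s speed cap for the differential-drive base


def servo_command(bbox_center_x: float, target_range_m: float) -> Twist:
    """Steer to center the target in the frame and approach to a standoff range."""
    cmd = Twist()
    # Angular: drive the horizontal pixel error toward zero.
    pixel_error = (IMAGE_WIDTH / 2.0) - bbox_center_x
    cmd.angular.z = KP_ANGULAR * pixel_error
    # Linear: slow down as the depth-measured range approaches the standoff distance.
    range_error = target_range_m - STANDOFF_DISTANCE
    cmd.linear.x = max(0.0, min(MAX_LINEAR, KP_LINEAR * range_error))
    return cmd
```

The resulting Twist message (linear.x plus angular.z) is the standard command interface for a differential-drive base, so the controller can publish directly to the drive node's velocity topic.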
Performance and Results
The robot successfully demonstrated autonomous object finding and tracking capabilities across multiple object classes:
Key Achievements:
- Detection Accuracy: Successfully detected target objects with >90% confidence in varied lighting conditions
- Tracking Stability: Maintained continuous visual tracking while navigating toward targets
- Real-time Performance: Achieved 15-20 FPS inference rate on Jetson Orin Nano
- Multi-object Handling: Correctly distinguished between different object classes in cluttered scenes
- Depth Integration: Used stereoscopic depth for distance estimation and approach control
Technical Highlights:
- Robust Object Recognition: YOLOv8’s single-shot detection enabled fast, accurate identification without manual feature engineering
- GPU Acceleration: Isaac ROS reduced inference latency by roughly 3x compared to a CPU-only implementation
- Asynchronous Control: ROS 2 action framework allowed non-blocking task execution with real-time feedback
- Modular Architecture: Clean separation between perception, planning, and control enabled rapid debugging and iteration
Challenges and Solutions
Hardware Logistics
The project faced significant logistical challenges: the robot base arrived late, and the Jetson Orin Nano was subject to supply constraints (NVIDIA's price reduction drove up demand and left the board widely unavailable). Despite having only a few weeks for hardware testing, the team integrated all components and met the demonstration objectives.
Software Integration
Integrating Isaac ROS with custom ROS 2 nodes required careful attention to message types, timing, and GPU memory management. The solution involved using ROS 2’s component composition for efficient zero-copy message passing and configuring Isaac ROS parameters for optimal Jetson performance.
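As an illustration of the composition pattern (not the project's actual launch file), the sketch below loads the detection decoder and a tracking node into a single multithreaded component container so messages can be exchanged intra-process rather than serialized over the network. The package and plugin names are placeholders rather than exact Isaac ROS identifiers.

```python
# Hedged launch-file sketch of ROS 2 component composition; package and plugin
# names are placeholders, not the project's or Isaac ROS's exact identifiers.
from launch import LaunchDescription
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode


def generate_launch_description():
    container = ComposableNodeContainer(
        name='perception_container',
        namespace='',
        package='rclcpp_components',
        executable='component_container_mt',  # multithreaded container process
        composable_node_descriptions=[
            ComposableNode(
                package='isaac_ros_yolov8',  # placeholder package name
                plugin='nvidia::isaac_ros::yolov8::YoloV8DecoderNode',  # placeholder plugin
            ),
            ComposableNode(
                package='object_tracker',  # hypothetical project package
                plugin='object_tracker::TargetLocalizer',
            ),
        ],
        output='screen',
    )
    return LaunchDescription([container])
```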
Future Improvements
Potential enhancements identified for future iterations:
- Integration with Isaac Sim for software-in-the-loop testing before hardware deployment
- Implementation of Visual SLAM for autonomous mapping and localization
- Path planning algorithms (e.g., A*, RRT) for navigation in complex environments
- SMACC2 state machines for more structured behavior management
- Multi-object tracking for simultaneous detection of multiple targets
Course Context
This project synthesized knowledge from three main course modules:
- Deep Learning Fundamentals: Understanding CNNs, loss functions, and training pipelines
- ROS 2 Ecosystem: Publishers, subscribers, services, actions, and distributed systems
- Isaac ROS Integration: GPU-accelerated perception for real-time robotics applications
The hands-on nature of the project made abstract ML concepts tangible and demonstrated the practical challenges of deploying AI on resource-constrained robotics platforms. Special thanks to Prof. Ziad Youssfi and the teaching assistants for their support throughout the semester!