Goal-Driven Navigation via Deep Reinforcement Learning

This project trains a mobile robot to autonomously navigate toward randomly placed goals while avoiding obstacles, using the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm. The system runs in ROS Noetic + Gazebo 11, with a Velodyne 3D LiDAR and RGB camera as sensor inputs, outputting continuous linear and angular velocity commands.

Training Demo

Simulation Environment & Sensors

Gazebo Environment

Velodyne LiDAR (RViz)

Algorithm: TD3

Twin Delayed Deep Deterministic Policy Gradient (TD3) addresses the overestimation bias of standard actor-critic methods through three key modifications:

1. Clipped Double Q-Learning — Two independent critic networks estimate the Q-value; the minimum is used for the TD target, reducing overestimation.

2. Delayed Policy Updates — The actor and target networks update less frequently than the critics (every 2 critic updates), allowing the value estimates to stabilize before the policy changes.

3. Target Policy Smoothing — Noise is added to the target action during critic updates, smoothing the value function and reducing exploitation of sharp Q-function peaks.

System Architecture

Component	Details
Simulation	Gazebo 11, ROS Noetic
Sensors	Velodyne 3D LiDAR, RGB camera
Action space	Continuous linear + angular velocity
State space	LiDAR scan + goal direction + distance
RL Algorithm	TD3 (actor-critic)
Deep learning	PyTorch 1.10+
Training monitor	TensorBoard

The actor network outputs deterministic velocity commands given the current sensor state. The dual critic networks estimate the expected cumulative reward, providing a stable learning signal via the Bellman equation.

View Code on GitHub