Goal-Driven Navigation via Deep Reinforcement Learning

This project trains a mobile robot to autonomously navigate toward randomly placed goals while avoiding obstacles, using the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm. The system runs in ROS Noetic + Gazebo 11, with a Velodyne 3D LiDAR and RGB camera as sensor inputs, outputting continuous linear and angular velocity commands.


Training Demo


Simulation Environment & Sensors

Gazebo Environment


Velodyne LiDAR (RViz)


Algorithm: TD3

Twin Delayed Deep Deterministic Policy Gradient (TD3) addresses the Q-value overestimation bias of DDPG-style actor-critic methods through three key modifications:

1. Clipped Double Q-Learning — Two independent critic networks estimate the Q-value; the minimum is used for the TD target, reducing overestimation.

2. Delayed Policy Updates — The actor and target networks update less frequently than the critics (every 2 critic updates), allowing the value estimates to stabilize before the policy changes.

3. Target Policy Smoothing — Noise is added to the target action during critic updates, smoothing the value function and reducing exploitation of sharp Q-function peaks.
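The first and third modifications can be sketched in a few lines. This is a minimal illustration, not the project's training code: the function names (`td3_target`, `smoothed_target_action`) and the default hyperparameters (noise std 0.2, clip 0.5, discount 0.99) are illustrative assumptions, though the defaults match those commonly used with TD3.

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
    """Clipped double Q-learning: the TD target uses the minimum of the
    two critics' estimates for the next state-action pair."""
    q_min = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_min

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip back into the valid action range."""
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(mu)),
                    -noise_clip, noise_clip)
    return np.clip(mu + noise, -max_action, max_action)
```

Because the critic target is trained against a *noisy* target action, sharp peaks in the learned Q-function are averaged out, which is what makes the value estimates harder for the actor to exploit.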


System Architecture

| Component        | Details                                 |
|------------------|-----------------------------------------|
| Simulation       | Gazebo 11, ROS Noetic                   |
| Sensors          | Velodyne 3D LiDAR, RGB camera           |
| Action space     | Continuous linear + angular velocity    |
| State space      | LiDAR scan + goal direction + distance  |
| RL algorithm     | TD3 (actor-critic)                      |
| Deep learning    | PyTorch 1.10+                           |
| Training monitor | TensorBoard                             |
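One common way to build the state vector described above is to condense the raw LiDAR scan into per-sector minimum ranges and append the goal distance and relative heading. The sketch below assumes this design; the function name, sector count, and max range are illustrative and not taken from the repository.

```python
import numpy as np

def build_state(lidar_ranges, robot_xy, robot_yaw, goal_xy,
                n_sectors=20, max_range=10.0):
    """Condense a LiDAR scan into n_sectors minimum ranges (normalized),
    then append the Euclidean distance and relative heading to the goal."""
    ranges = np.clip(np.asarray(lidar_ranges, dtype=float), 0.0, max_range)
    # Minimum range per angular sector, scaled to [0, 1]
    sectors = np.array([s.min() for s in np.array_split(ranges, n_sectors)])
    sectors /= max_range
    dx = goal_xy[0] - robot_xy[0]
    dy = goal_xy[1] - robot_xy[1]
    distance = np.hypot(dx, dy)
    heading = np.arctan2(dy, dx) - robot_yaw
    heading = np.arctan2(np.sin(heading), np.cos(heading))  # wrap to [-pi, pi]
    return np.concatenate([sectors, [distance, heading]])
```

Collapsing the scan to sector minima keeps the state low-dimensional while preserving the nearest obstacle in each direction, which is what the policy needs for avoidance.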

The actor network maps the current sensor state to deterministic velocity commands. The two critic networks estimate the expected cumulative reward of a state-action pair, providing a stable learning signal through Bellman-backup targets.
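A minimal PyTorch sketch of such an actor-critic pair is shown below, assuming the state vector from the table above. The hidden sizes (256), velocity limits, and output scaling (linear velocity in [0, max_lin], angular in [-max_ang, max_ang]) are assumptions for illustration, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: sensor state -> (linear, angular) velocity."""
    def __init__(self, state_dim, max_lin=0.5, max_ang=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2), nn.Tanh(),
        )
        # Rescale tanh output: linear in [0, max_lin] (no reversing),
        # angular in [-max_ang, max_ang]
        self.register_buffer("scale", torch.tensor([max_lin / 2, max_ang]))
        self.register_buffer("offset", torch.tensor([max_lin / 2, 0.0]))

    def forward(self, state):
        return self.net(state) * self.scale + self.offset

class Critic(nn.Module):
    """Q(s, a): state-action pair -> scalar expected return.
    TD3 instantiates two of these and takes the minimum for targets."""
    def __init__(self, state_dim, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

Constraining the linear velocity to be non-negative prevents the policy from reversing blindly into unseen obstacles, a common choice for forward-facing sensor setups.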


View Code on GitHub