Adapting PAD to Trifinger Robotic Manipulation in Causal World
Deep Reinforcement Learning (RL) policies frequently fail when deployed in environments that differ from their training conditions. In this work, we propose a self-supervised Test-Time Adaptation (TTA) framework that leverages an auxiliary Inverse Dynamics (ID) objective to update the agent's encoder during deployment. Unlike prior methods that update all encoder weights, we introduce an adaptation strategy that freezes the policy backbone and updates only the normalization statistics of the shared layers. We evaluate this method on the CausalWorld robotic benchmark under two distinct shift types: physical dynamics (robot mass) and spatial distribution (goal position). Our results show that joint training with the auxiliary task significantly improves baseline robustness. Furthermore, we find that Lifelong Adaptation outperforms episodic adaptation and zero-shot baselines, recovering policy performance on severe out-of-distribution (OOD) goal shifts where frozen agents fail. These findings suggest that constrained, self-supervised adaptation is a viable path for robustifying robot policies against unforeseen variations without access to reward signals.
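The core mechanism described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the module names, layer sizes, and the choice to interpret "normalization statistics" as the normalization layers' learnable parameters are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical encoder and heads (shapes and names are illustrative only)
encoder = nn.Sequential(
    nn.Linear(12, 64), nn.LayerNorm(64), nn.ReLU(),
    nn.Linear(64, 64), nn.LayerNorm(64), nn.ReLU(),
)
id_head = nn.Linear(128, 9)  # inverse dynamics: predict action from (z_t, z_t1)

# Freeze the whole encoder, then unfreeze only normalization-layer parameters
for p in encoder.parameters():
    p.requires_grad_(False)
norm_params = []
for m in encoder.modules():
    if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d)):
        for p in m.parameters():
            p.requires_grad_(True)
            norm_params.append(p)
opt = torch.optim.Adam(norm_params, lr=1e-4)

def adapt_step(obs_t, obs_t1, action_t):
    """One self-supervised TTA step: ID loss, gradients applied to norm layers only."""
    z_t, z_t1 = encoder(obs_t), encoder(obs_t1)
    pred = id_head(torch.cat([z_t, z_t1], dim=-1))
    loss = nn.functional.mse_loss(pred, action_t)
    opt.zero_grad()
    loss.backward()  # gradients flow through frozen weights to the norm layers
    opt.step()       # only norm_params are updated; backbone weights stay fixed
    return loss.item()
```

Because the optimizer holds only the normalization parameters, each deployment-time step leaves the policy backbone untouched while the encoder's normalization adapts to the shifted observation distribution, with no reward signal required.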
Code is available at this url.
