Adapting PAD to Trifinger Robotic Manipulation in Causal World
Deep Reinforcement Learning (RL) policies frequently fail when deployed in environments that differ from their training conditions. In this work, we propose a self-supervised Test-Time Adaptation (TTA) framework that leverages an auxiliary Inverse Dynamics (ID) objective to update the agent's encoder during deployment. Unlike prior methods that update all encoder weights, we introduce an adaptation strategy that freezes the policy backbone and updates only the normalization statistics of the shared layers. We evaluate this method on the CausalWorld robotic benchmark under two distinct shift types: physical dynamics (robot mass) and spatial distribution (goal position). Our results show that joint training with the auxiliary task significantly improves baseline robustness. Furthermore, we find that Lifelong Adaptation outperforms episodic adaptation and zero-shot baselines, recovering policy performance on severe out-of-distribution (OOD) goal shifts where frozen agents fail. These findings suggest that constrained, self-supervised adaptation is a viable path for robustifying robot policies against unforeseen variations without access to reward signals.
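The core mechanism described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the module names, layer sizes, and the choice to interpret "normalization statistics" as the normalization layers' learnable parameters are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical encoder and heads (shapes and names are illustrative only)
encoder = nn.Sequential(
    nn.Linear(12, 64), nn.LayerNorm(64), nn.ReLU(),
    nn.Linear(64, 64), nn.LayerNorm(64), nn.ReLU(),
)
id_head = nn.Linear(128, 9)  # inverse dynamics: predict action from (z_t, z_t1)

# Freeze the whole encoder, then unfreeze only normalization-layer parameters
for p in encoder.parameters():
    p.requires_grad_(False)
norm_params = []
for m in encoder.modules():
    if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d)):
        for p in m.parameters():
            p.requires_grad_(True)
            norm_params.append(p)
opt = torch.optim.Adam(norm_params, lr=1e-4)

def adapt_step(obs_t, obs_t1, action_t):
    """One self-supervised TTA step: ID loss, gradients applied to norm layers only."""
    z_t, z_t1 = encoder(obs_t), encoder(obs_t1)
    pred = id_head(torch.cat([z_t, z_t1], dim=-1))
    loss = nn.functional.mse_loss(pred, action_t)
    opt.zero_grad()
    loss.backward()  # gradients flow through frozen weights to the norm layers
    opt.step()       # only norm_params are updated; backbone weights stay fixed
    return loss.item()
```

Because the optimizer holds only the normalization parameters, each deployment-time step leaves the policy backbone untouched while the encoder's normalization adapts to the shifted observation distribution, with no reward signal required.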
Code is available at this url.
