NEWSFERENCE
FRI, 08 May 2026 04:00:00
CLUSTER · TIER 2
FIRST SEEN 5D AGO
ARXIV · RESEARCH

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

arXiv:2604.18978v2. Scaling critic capacity is a promising direction for improving off-policy reinforcement learning (RL). However, recent work shows that larger critics are prone to overfitting and instability in replay-based bootstrapped training. In this paper, we propose using Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning. Our approach freezes randomly initialized base matrices and optimizes only the corresponding low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. We evaluate our method across different off-policy RL algorithms, including SAC and FastTD3, based on different network architectures. Empirically, LoRA efficiently reduces critic loss during training and improves overall policy performance, achieving the best or competitive results on most tasks. Extensive experiments demonstrate that our low-rank updates provide a simple and effective form of structural regularization for critic learning in off-policy RL.
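
The mechanism the abstract describes (a frozen, randomly initialized base matrix plus trainable low-rank adapters) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the class name `LoRALinear`, the `alpha / rank` scaling, the zero initialization of `B`, and the plain squared-error regression standing in for a bootstrapped critic loss are all assumptions made for illustration.

```python
import numpy as np


class LoRALinear:
    """Hypothetical layer: y = x @ (W0 + scale * B @ A).T, with W0 frozen."""

    def __init__(self, in_dim, out_dim, rank=2, alpha=2.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base matrix: randomly initialized and never updated.
        self.W0 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))
        # Trainable low-rank factors. B starts at zero (a common LoRA
        # convention, assumed here), so the layer initially equals the base.
        self.A = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def delta(self):
        # The only part of the effective weight that training can change.
        return self.scale * (self.B @ self.A)

    def forward(self, x):
        return x @ (self.W0 + self.delta()).T

    def sgd_step(self, x, grad_out, lr):
        # Gradient of the loss w.r.t. the effective weight (out_dim, in_dim).
        dW = grad_out.T @ x
        # Chain rule through W_eff = W0 + scale * B @ A; W0 gets no update.
        dA = self.scale * (self.B.T @ dW)
        dB = self.scale * (dW @ self.A.T)
        self.A -= lr * dA
        self.B -= lr * dB


def train_demo(steps=200, lr=0.05, seed=0):
    """Fit random regression targets (a stand-in for critic targets)."""
    rng = np.random.default_rng(seed + 1)
    x = rng.normal(size=(64, 8))
    targets = rng.normal(size=(64, 16))
    layer = LoRALinear(8, 16, rank=2, alpha=2.0, seed=seed)
    w0_before = layer.W0.copy()

    def loss():
        err = layer.forward(x) - targets
        return 0.5 * np.mean(np.sum(err ** 2, axis=1))

    loss_start = loss()
    for _ in range(steps):
        grad_out = (layer.forward(x) - targets) / x.shape[0]
        layer.sgd_step(x, grad_out, lr)
    return layer, w0_before, loss_start, loss()
```

Calling `train_demo()` returns the trained layer, a copy of the base matrix taken before training, and the loss before and after. Because only `A` and `B` receive gradient steps, the change to the effective weight is `scale * B @ A`, whose rank is at most `rank`; that is the sense in which updates are confined to a low-dimensional subspace.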

Sources
2
X mentions
First seen
5D ago
Velocity
+4%/6h
CONTRIBUTING SOURCES
2 ARTICLES
  1. Apple Machine Learning · 6D AGO
    machinelearning.apple.com/research/rvpo-risk-sensitive-alignment
  2. arXiv: Artificial Intelligence · 5D AGO
    arxiv.org/abs/2604.18978
X DISCOURSE
AWAITING X SIGNAL
No notable English-language X chatter on this entity yet.