NEWSFERENCE
MON, 11 May 2026 04:00:00
LIVE
$ today --liveF1TodayF2YesterdayF3ArchiveF4About
NEXT SCAN
← BACK TO TODAY/CLUSTER · ARXIV · RESEARCH
CLUSTER · TIER 3
FIRST SEEN 5D AGO
ARXIVRESEARCH

A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning

arXiv:2605.06819v1 Announce Type: new Abstract: Autoregressive generation lies at the heart of the mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (prompt), the generator is applied for $M$ steps, and the last generated token is taken as the final output. [Joshi et al., 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process. We develop an online analogue of this framework, focusing on the mistake bound of learning the final output induced by an unknown next-token generator. We distinguish between two forms of feedback. In the End-to-End model, after each round the learner observes only the final token produced after $M$ autoregressive steps. In the Chain-of-Thought model, the learner is additionally shown the entire $M$-step trajectory. Our goal is to understand how the optimal mistake bound depends on the generation horizon $M$, and to what extent observing intermediate tokens can reduce this dependence. Our main results show that the online theory of autoregressive learning exhibits a qualitative picture analogous to the statistical one found by [Hanneke et al., 2026], but with a different scale of dependence on the generation horizon. In the End-to-End model, we prove a taxonomy of possible mistake-bound growth rates in the generation horizon $M$: essentially any rate between constant and logarithmic can arise. We further show that this logarithmic ceiling is unavoidable. In the Chain-of-Thought model, we show that access to the full generated trajectory eliminates the dependence on $M$ altogether. We also analyze autoregressive linear threshold classes, and prove optimal mistake bounds, as well as a new lower bound for the statistical setting. Along the way, our results resolve several questions left open by [Joshi et al., 2025].

Sources
1
X mentions
First seen
5Dago
Velocity
+2%/6h
CONTRIBUTING SOURCES
1 ARTICLES
  1. arXiv: Machine Learning5D AGO
    arxiv.org/abs/2605.06819
X DISCOURSE
AWAITING X SIGNAL
No notable English-language X chatter on this entity yet.