Irina Piontkovskaia, Sergey Nikolenko
Task vectors, LoRA, activation steering, and random search around pretrained weights all suggest that learned behaviour can be controlled by linear directions. We ask which linear structures actually exist and on what scale. In a synthetic multitask transformer and LoRA adapters on DistilGPT-2 / GPT-2 we find strong local low-rank task-gradient structure but reject the fixed-task-plane hypothesis: static bases miss the recovery direction, and the useful basis drifts substantially within 100 steps. However, the first recovery updates form a trajectory-prefix basis capturing 77% of the LoRA recovery displacement. We develop random search theory with a Gaussian local-linear theorem that justifies the effectiveness of random parameter search even in very high dimensions. We also study the relation between parameter perturbations and activation steering: a single gradient step produces an activation shift with 0.58 cosine to a labelled-contrast CAA steering vector, with a similar steering effect on Qwen-0.5B BoolQ statements. We validate our results with experiments on synthetic Transformers and LLMs. Our results suggest that linear structures in trained networks are not global task directions, but evolving local geometries that partially persist across parameter and activation spaces.
This paper investigates a fundamental question about the geometry of trained neural networks: what kind of linear structure actually exists in weight and activation spaces, and how stable is it? The paper's central thesis is that the linear structures leveraged by task vectors, LoRA, activation steering, and random search are local and evolving rather than global and static. The authors formalize this through three theoretical contributions: (1) a best-of-N theorem explaining why random parameter search works in high dimensions, (2) a signal density proposition explaining subspace selection, and (3) a Krylov subspace characterization of recovery trajectories. They also establish an empirical bridge between weight perturbations and activation steering vectors via the pushforward identity.
The "recoverable but not stationary" framing is the paper's most conceptually valuable contribution — reframing the widespread implicit assumption of fixed task subspaces as a more nuanced picture of drifting, trajectory-aligned local geometry.
Theoretical results: The best-of-N theorem (Theorem 1) is mathematically straightforward — essentially a consequence of Gaussian projection properties and extreme value theory — but its interpretation is insightful. The dimension-independence of the projection variance σ²‖a‖² is the key observation that explains why random search isn't hopeless in 10⁹-dimensional spaces. Proposition 1 on signal density vs. mass adds practical value. Lemma 1 on Krylov recovery is clean but limited by the constant-Hessian assumption, acknowledged by the authors.
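The dimension-independence observation can be checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes isotropic Gaussian perturbations δ ~ N(0, σ²I_d) and a fixed unit direction a (so σ²‖a‖² = σ²), and all names (`projection_variance`, `mean_best_of_n`) are hypothetical. It also compares the average best-of-N projection against the standard Gaussian maximal bound σ‖a‖√(2 ln N), which is a textbook fact and not necessarily the exact form of Theorem 1.

```python
import numpy as np

def unit_direction(d, rng):
    """Random direction a with ||a|| = 1 (stands in for a local gradient direction)."""
    a = rng.standard_normal(d)
    return a / np.linalg.norm(a)

def projection_variance(d, sigma, n_samples, rng):
    """Empirical variance of a^T delta for delta ~ N(0, sigma^2 I_d)."""
    a = unit_direction(d, rng)
    delta = sigma * rng.standard_normal((n_samples, d))
    return float(np.var(delta @ a))

def mean_best_of_n(d, sigma, n, trials, rng):
    """Average over trials of the largest projection among n random perturbations."""
    a = unit_direction(d, rng)
    delta = sigma * rng.standard_normal((trials, n, d))
    return float((delta @ a).max(axis=1).mean())

rng = np.random.default_rng(0)
sigma = 0.1

# The variance should stay near sigma^2 * ||a||^2 = sigma^2 for every dimension d.
variances = {d: projection_variance(d, sigma, 2000, rng) for d in (16, 256, 4096)}

# The best of N draws grows only like sqrt(log N), independent of d.
n_draws = 64
best = mean_best_of_n(512, sigma, n_draws, 100, rng)
bound = sigma * np.sqrt(2 * np.log(n_draws))

for d, v in variances.items():
    print(f"d={d:5d}  var/sigma^2 = {v / sigma**2:.3f}")
print(f"mean best-of-{n_draws} = {best:.4f}, bound = {bound:.4f}")
```

In this toy setting the ratio var/σ² stays close to 1 as d grows, which is the sense in which the projected signal does not shrink with dimension. What does change with d is the share of each perturbation's norm that lies along a.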
Experimental design: The "recovery after forgetting" setup is well-designed as a controlled probe — it creates a known displacement vector Δ_GD against which subspaces can be measured. The multi-scale experimental progression (synthetic → LoRA → LLM) is sensible. However, there are notable weaknesses:
The authors are commendably transparent about these limitations.
Theoretical framing: The paper provides a useful conceptual vocabulary (trajectory-prefix basis, signal density, local-linear regime) that could influence how researchers think about model editing, continual learning, and LoRA adaptation. The insight that sequential task specialization is brittle while simultaneous training is robust (Table 5, Figure 4) has practical implications for multitask fine-tuning.
Random search theory: The dimension-independent signal result could influence zeroth-order optimization and evolutionary strategy research for LLMs. The F₁/F₂ decomposition for locating the linear regime is a practical diagnostic tool.
Weight-to-activation bridge: The pushforward connection between weight perturbations and activation steering is conceptually elegant and potentially impactful for interpretability and alignment research. If gradient steps naturally produce vectors aligned with contrastive steering vectors (0.58 cosine), this could simplify steering vector construction. However, the effect is demonstrated on few configurations: one primary LLM steering task (BoolQ), with inconsistent results across models (Table 6).
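To make the pushforward idea concrete, here is a minimal toy sketch under stated assumptions; it is not the paper's setup. It uses a single hypothetical tanh layer with a fixed linear readout and synthetic labels, takes one gradient step on a logistic loss, and compares the resulting activation shift with its first-order pushforward (layer Jacobian applied to the weight update). It then measures the cosine between the mean shift and a labelled-contrast (CAA-style) vector, defined here as the difference of class-mean activations. Every name and hyperparameter is illustrative, and the cosine this toy produces has no relation to the reported 0.58.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, n = 8, 16, 200

# Hypothetical toy data: labels in {-1, +1} depend on a hidden direction u.
X = rng.standard_normal((n, d_in))
u = rng.standard_normal(d_in)
y = np.sign(X @ u)

W = 0.3 * rng.standard_normal((d_hid, d_in))  # layer weights (the parameters)
r = rng.standard_normal(d_hid)                # fixed linear readout

def hidden(W, X):
    """Activations h = tanh(W x) for every row x of X, shape (n, d_hid)."""
    return np.tanh(X @ W.T)

def grad_W(W, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * r.h)) w.r.t. W."""
    H = hidden(W, X)
    z = H @ r
    s = -y / (1.0 + np.exp(y * z))                    # dL/dz per example
    back = (s[:, None] * r[None, :]) * (1.0 - H**2)   # dL/d(pre-activation)
    return back.T @ X / len(y)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

eta = 1e-2
dW = -eta * grad_W(W, X, y)                   # one gradient step in weight space

H0 = hidden(W, X)
shift_exact = hidden(W + dW, X) - H0          # actual change in activations
shift_linear = (1.0 - H0**2) * (X @ dW.T)     # first-order pushforward of dW

# Labelled-contrast (CAA-style) vector: difference of class-mean activations.
caa = H0[y > 0].mean(axis=0) - H0[y < 0].mean(axis=0)

rel_err = np.linalg.norm(shift_exact - shift_linear) / np.linalg.norm(shift_exact)
cos_to_caa = cosine(shift_exact.mean(axis=0), caa)
print(f"pushforward relative error: {rel_err:.2e}")
print(f"cosine(mean shift, CAA vector): {cos_to_caa:.3f}")
```

The first printed quantity checks that, for a small step, the activation shift is well described by the linearised map. The second is the kind of alignment measurement the review refers to, here on a toy model only.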
Breadth of impact: The paper touches many active research areas — LoRA, model merging, activation steering, continual learning, random search — but doesn't go deep enough in any single area to be transformative. It's more of an analytical/diagnostic contribution than a methodological one.
The paper addresses a timely question. The proliferation of linear intervention methods (task vectors, LoRA, activation steering, ReFT) has created an implicit shared assumption about global linear task structure that hasn't been rigorously examined. The "neural thickets" result from Gan et al. (2026) specifically motivates the random search angle. The paper's differential-geometric perspective is a natural complement to the more empirical mechanistic interpretability literature.
1. Conceptual clarity: The progression from "static task plane" to "drifting trajectory bundle" is well-articulated and supported by multiple experimental modalities
2. Theory-experiment alignment: Proposition 1's predictions about signal density match the observed random search rankings (Figure 2b)
3. Honest reporting: The authors clearly delineate where their results are clean (synthetic, LoRA) versus noisy (LLM scale), and where controls fail to separate from signal
4. Multi-level experimental design: Testing across synthetic transformers, LoRA adapters, and full LLMs provides appropriate validation
5. The 77% trajectory-prefix capture is a clean, memorable quantitative result
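As an illustration of what a "trajectory-prefix capture" figure measures, the sketch below computes the share of a recovery displacement that lies in the span of the first few updates. Several things are assumptions and not taken from the paper: the capture metric (squared norm of the projection divided by squared norm of the displacement), the toy quadratic loss, plain gradient descent, and all names and hyperparameters. A random basis of the same rank serves as a control. The numbers it prints are not comparable to the reported 77%.

```python
import numpy as np

def captured_fraction(basis_vectors, displacement):
    """Share of ||displacement||^2 inside span(basis_vectors); rows are vectors."""
    Q, _ = np.linalg.qr(basis_vectors.T)      # orthonormal columns spanning the basis
    coeffs = Q.T @ displacement
    return float(coeffs @ coeffs / (displacement @ displacement))

rng = np.random.default_rng(0)
d, steps, k, lr = 200, 100, 5, 0.05

# Hypothetical ill-conditioned quadratic loss 0.5 * (theta - opt)^T H (theta - opt).
eigvals = np.logspace(-2, 1, d)
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = (basis * eigvals) @ basis.T
opt = rng.standard_normal(d)

theta = opt + rng.standard_normal(d)          # "forgotten" starting point
start = theta.copy()
updates = []
for _ in range(steps):
    step = -lr * H @ (theta - opt)            # plain gradient-descent update
    theta = theta + step
    updates.append(step)
updates = np.array(updates)
displacement = theta - start                  # full recovery displacement

prefix_capture = captured_fraction(updates[:k], displacement)
random_capture = captured_fraction(rng.standard_normal((k, d)), displacement)
print(f"first {k} updates capture {prefix_capture:.2%} of the displacement")
print(f"random rank-{k} basis captures {random_capture:.2%}")
```

The control matters for interpretation: a random rank-k basis in d dimensions captures about k/d of any fixed vector in expectation, so a prefix basis is informative only to the extent that it exceeds that baseline.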
1. Limited task diversity: Two LoRA task pairs and one primary LLM steering task (BoolQ) are insufficient for strong generalization claims
2. The trajectory prefix requires the trajectory: As the authors acknowledge, the most useful subspace (trajectory prefix) requires already running recovery, limiting practical applicability
3. Scaling concerns: The partial failure of low-rank claims at 7B scale raises questions about whether the framework applies where it matters most
4. Activation steering results are preliminary: The 0.58 cosine is moderate, and Table 6 shows inconsistent results across models (OLMo BoolQ: +1.2 target gain at -17 side cost)
5. The theoretical results, while correct, are not particularly deep — Theorem 1 follows from standard Gaussian properties, and Lemma 1 is textbook for quadratic losses
6. No comparison to concurrent work on representation geometry (e.g., linear representation hypothesis literature)
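For reference on weakness 5, the textbook argument behind a Krylov characterization of recovery can be written in a few lines. This is a sketch under assumptions that may differ from the paper's exact statement of Lemma 1: a quadratic loss with constant Hessian H and minimiser θ*, and plain gradient descent with step size η.

```latex
% Assumptions (not taken from the paper): quadratic loss, constant Hessian H,
% minimiser \theta^\ast, gradient descent with step size \eta.
\begin{align*}
L(\theta) &= L(\theta^\ast) + \tfrac{1}{2}(\theta-\theta^\ast)^\top H\,(\theta-\theta^\ast),
\qquad g_k := \nabla L(\theta_k) = H(\theta_k-\theta^\ast), \\
\theta_{k+1} &= \theta_k - \eta\, g_k
\;\Longrightarrow\;
g_{k+1} = (I-\eta H)\, g_k = (I-\eta H)^{k+1} g_0, \\
\theta_k - \theta_0 &= -\eta \sum_{j=0}^{k-1} (I-\eta H)^{j} g_0
\;\in\; \operatorname{span}\{g_0,\, H g_0,\, \dots,\, H^{k-1} g_0\} = \mathcal{K}_k(H, g_0).
\end{align*}
```

Each power of (I − ηH) expands into powers of H of at most the same degree, so the displacement after k steps stays inside the k-th Krylov subspace generated by H and the initial gradient. The argument relies on H being constant, which is the limitation the review notes.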
This is a solid analytical paper that asks the right question and provides a useful conceptual framework. Its main contribution is diagnostic rather than constructive: it clarifies what linear structures exist in trained networks but doesn't yet yield a new algorithm or method. The theoretical results are clean but not technically surprising. The experiments are well-designed but limited in scope. The paper would benefit from broader task coverage and stronger LLM-scale validation. It represents good science — careful, honest, and well-framed — but falls short of being a major breakthrough.
Generated Jun 10, 2026
Paper 2 likely has higher impact: it targets a central bottleneck (quadratic attention) with broad, timely relevance to scalable sequence modeling. It provides an across-domain empirical comparison (code pretraining, distillation, time-series foundation models) plus a unifying formulation and mechanistic analysis, supporting methodological rigor and generalizable principles for subquadratic architectures. This combination of practical guidance (which architecture works) and explanatory framework can influence model design across fields. Paper 1 is novel and insightful on local linear structure/steering, but its immediate real-world leverage appears narrower and more interpretive.
Paper 2 addresses the fundamental mechanisms of Large Language Models (LoRA, activation steering), a highly active and globally impactful research area. Its insights into local linear structures directly influence LLM fine-tuning, interpretability, and alignment, offering broader and more immediate real-world applications across the AI community compared to the more specialized focus on PINN error bounds in Paper 1.
Paper 1 addresses a critical and highly timely bottleneck in modern LLM training—the rollout stage of RL. By demonstrating a 1.8x end-to-end acceleration using a novel loss function and rejection sampling, it offers immense practical utility that is likely to be widely adopted in industry and academia. While Paper 2 provides valuable theoretical insights into network geometry and steering, Paper 1's concrete, large-scale computational efficiency gains present a higher immediate scientific and practical impact.
Paper 2 investigates the foundational mechanics of how Large Language Models adapt and represent tasks, directly impacting highly active fields like mechanistic interpretability, efficient fine-tuning (LoRA), and model alignment. While Paper 1 offers a solid algorithmic improvement for preference-based RL, Paper 2 provides theoretical and empirical insights into LLM geometries that have broader implications across the current AI research landscape.
Paper 1 likely has higher impact: it addresses a central, timely question in foundation-model research (what linear structure really exists behind task vectors/LoRA/steering), unifies several popular phenomena, and contributes both empirical findings across LLMs and a theoretical result (Gaussian local-linear theorem) explaining random search efficacy in high dimensions. Its breadth spans interpretability, fine-tuning, optimization, and model control, with implications for many downstream methods. Paper 2 is practically valuable for robotics and likely improves performance, but the core idea (Fourier features) is less novel and narrower in cross-field reach.
Paper 1 has higher likely impact due to stronger novelty and broader relevance to current LLM practice: it unifies and tests hypotheses behind task vectors/LoRA/activation steering, introduces an explicit “non-stationary local linear geometry” picture, and adds theory explaining why random search can work in high dimensions. Its applications span model editing, steering, fine-tuning, and interpretability across many pretrained models. Paper 2 is valuable and more biologically motivated, but its scope is narrower (scaling FA in vision nets) and the proposed fixes (orthogonalization/normalization) are more incremental.
Paper 1 targets a highly active area (LLM control/editing, LoRA, activation steering) with broad, near-term relevance across ML and AI safety. It contributes conceptual clarification (rejecting fixed task-plane hypothesis), empirical results across synthetic and real LLMs, and a theoretical justification for random parameter search in high dimensions. These findings could influence multiple intervention methods and how practitioners think about linear structure in networks. Paper 2 is novel and rigorous in applied topology/ECT encoding, but its applications are narrower and likely to impact a smaller community.
Paper 2 offers fundamental insights into the geometric structure of LLM representations, bridging task vectors, LoRA, and activation steering. By challenging the stationary task-plane assumption and establishing theoretical foundations for local linear structures, it has broad implications for interpretability, alignment, and fine-tuning. Paper 1 presents a solid methodological improvement for continual learning, but Paper 2's foundational contributions are likely to influence a wider range of future research directions across the field.
Paper 2 addresses a critical and immediate bottleneck in deploying massive Mixture-of-Experts (MoE) models. Its practical approach to compressing models via intermediate dimension trimming offers significant reductions in memory and improvements in inference throughput without catastrophic capability loss. While Paper 1 provides valuable theoretical insights into the dynamics of local linear structures and steering, Paper 2 has much higher potential for rapid, widespread adoption and real-world application in the deployment of state-of-the-art LLMs.
Paper 1 addresses a concrete, widely-relevant problem in RLVR/PPO for LLMs—the mismatch between uniform trust regions and autoregressive generation—with a principled solution (CPPO) backed by theoretical bounds and empirical improvements across model scales. Given the enormous current interest in LLM reasoning via RL (e.g., DeepSeek-R1, OpenAI o1), this has immediate practical applicability. Paper 2 provides interesting mechanistic insights about linear structures in neural networks, but its findings are more exploratory and descriptive, with narrower immediate practical impact and experiments on smaller models.