The Hidden Power of Scaling Factor in LoRA Optimization
Abstract
Low-Rank Adaptation (LoRA) scaling factor α functions as a primary optimization driver rather than a secondary learning rate complement, with theoretical and empirical analysis revealing its superior impact on convergence and optimal scaling behavior.
In Low-Rank Adaptation (LoRA), the scaling factor α is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we reveal that the scaling factor α and the learning rate function differently, with α emerging as the dominant driver of effective optimization, delivering gains that cannot be replicated by learning rate scaling alone. Combining extensive empirical analysis with a theoretical Signal-Drift framework, we report three findings about LoRA's scaling mechanism. First, LoRA's spectral suppression smooths the optimization landscape, rendering standard hyperparameters overly conservative and creating an optimization gap. Second, when leveraging this smoothness to accelerate convergence, α outperforms the learning rate by amplifying the task signal without increasing the drift ratio. Third, the optimal scaling factor follows a sublinear relationship with the rank, well characterized by a square-root law with an unexpectedly large coefficient, revealing the insufficient scaling of existing rank-tied heuristics. Based on these insights, we propose LoRA-α, a minimalist framework that restores α to its principled regime, making LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks demonstrate that LoRA-α consistently improves performance while streamlining hyperparameter search, unleashing the learning potential of LoRA.
Community
Maybe the first systematic study (empirical + theoretical) of LoRA's scaling factor $\alpha$ from an optimization perspective!
Recent studies highlight that a large learning rate ($\eta$) is crucial for LoRA optimization. However, this paper points out that such a conclusion was drawn while leaving the scaling factor $\alpha$ systematically underexplored. Through a joint empirical and theoretical lens, the authors reveal a shifting paradigm: a significantly large scaling factor $\alpha$ is what actually matters most, delivering optimization gains that learning rate scaling alone cannot replicate.
Key takeaways:
- LoRA's low-rank nature smooths the optimization landscape (spectral suppression), making standard hyperparameters overly conservative and causing an optimization gap.
- $\alpha$ vs. $\eta$: Increasing $\alpha$ acts as a "purity-preserving accelerator" that amplifies the task signal without increasing the bilinear drift, outperforming learning rate scaling.
- Under standard, small learning rates, the optimal $\alpha$ must be sufficiently large and follows a sublinear relationship with the rank ($r$). This reveals that popular rank-tied heuristics (like $\alpha = r$ or $\alpha = 2r$) leave LoRA severely under-scaled because their magnitudes are too small.
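The $\alpha$ vs. $\eta$ takeaway can be made more concrete with a rough sketch. This is an illustrative reading, not the paper's Signal-Drift derivation. It assumes the standard LoRA parameterization $W = W_0 + sBA$ with $s = \alpha / r$, and an Adam-style normalized optimizer whose per-step changes $\delta A$, $\delta B$ have size proportional to $\eta$ and roughly independent of $s$. The labels "signal" and "drift" below are borrowed loosely from the summary above. One step then changes the adapted weight by

$$
s\,(B + \delta B)(A + \delta A) - s\,BA = \underbrace{s\,(\delta B\,A + B\,\delta A)}_{\text{first-order term},\ \propto\, s\eta} + \underbrace{s\,\delta B\,\delta A}_{\text{bilinear term},\ \propto\, s\eta^2}
$$

so that

$$
\frac{\|\text{bilinear term}\|}{\|\text{first-order term}\|} \propto \eta .
$$

Under these assumptions, increasing $\alpha$ scales both terms by the same factor and leaves their ratio unchanged, whereas increasing $\eta$ raises the ratio linearly. With plain SGD the step sizes themselves depend on $s$, so this simple argument would not carry over as stated.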
Based on these insights, the authors propose LoRA-$\alpha$, which scales $\alpha$ based on a principled square-root law (e.g., using a large base coefficient like $256\sqrt{r}$). This minimalist shift allows LoRA to directly inherit standard, small Full Fine-Tuning (FFT) learning rates while matching or even exceeding FFT performance across NLP, multimodal, and RL tasks.
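To get a feel for the magnitudes involved, here is a minimal Python sketch comparing the rank-tied heuristics with the square-root rule quoted above. The function names are hypothetical, the coefficient 256 is the example value from the text, and the `alpha / rank` multiplier is the standard LoRA convention, assumed here rather than confirmed against the paper.

```python
import math


def sqrt_law_alpha(rank: int, coeff: float = 256.0) -> float:
    """Return alpha = coeff * sqrt(rank); 256 is the example coefficient quoted above."""
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return coeff * math.sqrt(rank)


def lora_multiplier(alpha: float, rank: int) -> float:
    """Assumed standard LoRA convention: the update B @ A is multiplied by alpha / rank."""
    return alpha / rank


def compare(ranks=(4, 16, 64, 256)):
    """Tabulate alpha and the resulting multiplier for each heuristic."""
    rows = []
    for r in ranks:
        alpha_sqrt = sqrt_law_alpha(r)
        rows.append({
            "rank": r,
            "alpha_r": r,                # heuristic alpha = r
            "alpha_2r": 2 * r,           # heuristic alpha = 2r
            "alpha_sqrt": alpha_sqrt,    # square-root rule
            "mult_r": lora_multiplier(r, r),
            "mult_2r": lora_multiplier(2 * r, r),
            "mult_sqrt": lora_multiplier(alpha_sqrt, r),
        })
    return rows


if __name__ == "__main__":
    for row in compare():
        print(row)
```

At $r = 16$, for instance, $\alpha = r$ and $\alpha = 2r$ give multipliers of 1 and 2, while $256\sqrt{r}$ gives $\alpha = 1024$ and a multiplier of 64 under this convention.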
Bye-bye expensive hyperparameter tuning! 👋
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TLoRA: Task-aware Low Rank Adaptation of Large Language Models (2026)
- NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs (2026)
- Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning (2026)
- Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence (2026)
- Can Muon Fine-tune Adam-Pretrained Models? (2026)
- Strategic Over-Parameterization for Generalizable Low-Rank Adaptation (2026)
- Post-Optimization Adaptive Rank Allocation for LoRA (2026)