cool-papers - a dhuynh95 Collection
cool-papers updated 30 days ago
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper
• 2403.09029
• Published Mar 14, 2024 • 57
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper
• 2403.12968
• Published Mar 19, 2024 • 25
RAFT: Adapting Language Model to Domain Specific RAG Paper
• 2403.10131
• Published Mar 15, 2024 • 72
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper
• 2403.09629
• Published Mar 14, 2024 • 79
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper
• 2403.08763
• Published Mar 13, 2024 • 51
Language models scale reliably with over-training and on downstream tasks Paper
• 2403.08540
• Published Mar 13, 2024 • 15
Algorithmic progress in language models Paper
• 2403.05812
• Published Mar 9, 2024 • 19
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper
• 2403.05530
• Published Mar 8, 2024 • 64
TnT-LLM: Text Mining at Scale with Large Language Models Paper
• 2403.12173
• Published Mar 18, 2024 • 20
Larimar: Large Language Models with Episodic Memory Control Paper
• 2403.11901
• Published Mar 18, 2024 • 33
Reverse Training to Nurse the Reversal Curse Paper
• 2403.13799
• Published Mar 20, 2024 • 13
When Do We Not Need Larger Vision Models? Paper
• 2403.13043
• Published Mar 19, 2024 • 26
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper
• 2403.14624
• Published Mar 21, 2024 • 53
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions Paper
• 2403.15246
• Published Mar 22, 2024 • 11
Can large language models explore in-context? Paper
• 2403.15371
• Published Mar 22, 2024 • 33
The Unreasonable Ineffectiveness of the Deeper Layers Paper
• 2403.17887
• Published Mar 26, 2024 • 82
Long-form factuality in large language models Paper
• 2403.18802
• Published Mar 27, 2024 • 26
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text Paper
• 2403.18421
• Published Mar 27, 2024 • 23
Octopus v2: On-device language model for super agent Paper
• 2404.01744
• Published Apr 2, 2024 • 58
Poro 34B and the Blessing of Multilinguality Paper
• 2404.01856
• Published Apr 2, 2024 • 15
Long-context LLMs Struggle with Long In-context Learning Paper
• 2404.02060
• Published Apr 2, 2024 • 37
Training LLMs over Neurally Compressed Text Paper
• 2404.03626
• Published Apr 4, 2024 • 23
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models Paper
• 2404.03543
• Published Apr 4, 2024 • 18
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper
• 2404.02575
• Published Apr 3, 2024 • 50
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper
• 2404.12253
• Published Apr 18, 2024 • 55
Compression Represents Intelligence Linearly Paper
• 2404.09937
• Published Apr 15, 2024 • 28
Flamingo: a Visual Language Model for Few-Shot Learning Paper
• 2204.14198
• Published Apr 29, 2022 • 16
Executable Code Actions Elicit Better LLM Agents Paper
• 2402.01030
• Published Feb 1, 2024 • 192
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper
• 2406.04271
• Published Jun 6, 2024 • 29
To Believe or Not to Believe Your LLM Paper
• 2406.02543
• Published Jun 4, 2024 • 35
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper
• 2406.06469
• Published Jun 10, 2024 • 29
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper
• 2406.19280
• Published Jun 27, 2024 • 63
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper
• 2407.01370
• Published Jul 1, 2024 • 89
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages Paper
• 2407.03321
• Published Jul 3, 2024 • 20
Training Task Experts through Retrieval Based Distillation Paper
• 2407.05463
• Published Jul 7, 2024 • 10
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct Paper
• 2407.05700
• Published Jul 8, 2024 • 14
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? Paper
• 2407.15711
• Published Jul 22, 2024 • 9
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper
• 2407.20183
• Published Jul 29, 2024 • 43
OmniParser for Pure Vision Based GUI Agent Paper
• 2408.00203
• Published Aug 1, 2024 • 24
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper
• 2408.06663
• Published Aug 13, 2024 • 16
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper
• 2408.10914
• Published Aug 20, 2024 • 45
Building and better understanding vision-language models: insights and future directions Paper
• 2408.12637
• Published Aug 22, 2024 • 133
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper
• 2408.13257
• Published Aug 23, 2024 • 26
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper
• 2409.01704
• Published Sep 3, 2024 • 83
ContextCite: Attributing Model Generation to Context Paper
• 2409.00729
• Published Sep 1, 2024 • 14
Attention Heads of Large Language Models: A Survey Paper
• 2409.03752
• Published Sep 5, 2024 • 92
A Controlled Study on Long Context Extension and Generalization in LLMs Paper
• 2409.12181
• Published Sep 18, 2024 • 45
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper
• 2409.12183
• Published Sep 18, 2024 • 39
Attention Prompting on Image for Large Vision-Language Models Paper
• 2409.17143
• Published Sep 25, 2024 • 7
Can Models Learn Skill Composition from Examples? Paper
• 2409.19808
• Published Sep 29, 2024 • 9
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper
• 2409.19951
• Published Sep 30, 2024 • 54
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper
• 2410.02712
• Published Oct 3, 2024 • 37
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Paper
• 2411.03590
• Published Nov 6, 2024 • 10
Autoregressive Models in Vision: A Survey Paper
• 2411.05902
• Published Nov 8, 2024 • 19
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper
• 2411.07133
• Published Nov 11, 2024 • 38
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper
• 2501.05452
• Published Jan 9, 2025 • 15
Demystifying Domain-adaptive Post-training for Financial LLMs Paper
• 2501.04961
• Published Jan 9, 2025 • 11
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper
• 2412.19723
• Published Dec 27, 2024 • 87
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper
• 2501.04003
• Published Jan 7, 2025 • 27
MLLM-as-a-Judge for Image Safety without Human Labeling Paper
• 2501.00192
• Published Dec 31, 2024 • 32
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Paper
• 2501.09775
• Published Jan 16, 2025 • 32
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper
• 2501.09781
• Published Jan 16, 2025 • 27
CodeMonkeys: Scaling Test-Time Compute for Software Engineering Paper
• 2501.14723
• Published Jan 24, 2025 • 10
s1: Simple test-time scaling Paper
• 2501.19393
• Published Jan 31, 2025 • 125
LIMO: Less is More for Reasoning Paper
• 2502.03387
• Published Feb 5, 2025 • 62
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper
• 2502.01941
• Published Feb 4, 2025 • 15
Demystifying Long Chain-of-Thought Reasoning in LLMs Paper
• 2502.03373
• Published Feb 5, 2025 • 58
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper
• 2502.01081
• Published Feb 3, 2025 • 13
Great Models Think Alike and this Undermines AI Oversight Paper
• 2502.04313
• Published Feb 6, 2025 • 33
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models Paper
• 2502.06755
• Published Feb 10, 2025 • 8
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon Paper
• 2502.07445
• Published Feb 11, 2025 • 11
NoLiMa: Long-Context Evaluation Beyond Literal Matching Paper
• 2502.05167
• Published Feb 7, 2025 • 16
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper
• 2502.14802
• Published Feb 20, 2025 • 13
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper
• 2502.13962
• Published Feb 19, 2025 • 28
Small Models Struggle to Learn from Strong Reasoners Paper
• 2502.12143
• Published Feb 17, 2025 • 39
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper
• 2502.12215
• Published Feb 17, 2025 • 16
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Paper
• 2502.11196
• Published Feb 16, 2025 • 23
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Paper
• 2502.08826
• Published Feb 12, 2025 • 17
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper
• 2502.11831
• Published Feb 17, 2025 • 20
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking Paper
• 2502.09083
• Published Feb 13, 2025 • 4
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper
• 2502.14499
• Published Feb 20, 2025 • 195
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper
• 2502.08235
• Published Feb 12, 2025 • 59
VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper
• 2502.12084
• Published Feb 17, 2025 • 35
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Paper
• 2502.14302
• Published Feb 20, 2025 • 9
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper
• 2502.19361
• Published Feb 26, 2025 • 28
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model Paper
• 2502.18906
• Published Feb 26, 2025 • 12
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Paper
• 2502.17535
• Published Feb 24, 2025 • 8
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Paper
• 2502.17422
• Published Feb 24, 2025 • 7
Chain of Draft: Thinking Faster by Writing Less Paper
• 2502.18600
• Published Feb 25, 2025 • 50
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users Paper
• 2503.02268
• Published Mar 4, 2025 • 11
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper
• 2503.02972
• Published Mar 4, 2025 • 25
How to Steer LLM Latents for Hallucination Detection? Paper
• 2503.01917
• Published Mar 1, 2025 • 11
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper
• 2503.03962
• Published Mar 5, 2025 • 4
Where do Large Vision-Language Models Look at when Answering Questions? Paper
• 2503.13891
• Published Mar 18, 2025 • 8
Can Large Vision Language Models Read Maps Like a Human? Paper
• 2503.14607
• Published Mar 18, 2025 • 10
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper
• 2503.21620
• Published Mar 27, 2025 • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper
• 2503.18878
• Published Mar 24, 2025 • 121
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead Paper
• 2504.00294
• Published Mar 31, 2025 • 10
What Makes a Good Natural Language Prompt? Paper
• 2506.06950
• Published Jun 7, 2025 • 11
Hidden in plain sight: VLMs overlook their visual representations Paper
• 2506.08008
• Published Jun 9, 2025 • 7
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper
• 2506.11928
• Published Jun 13, 2025 • 25
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper
• 2506.14245
• Published Jun 17, 2025 • 45
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper
• 2507.00432
• Published Jul 1, 2025 • 79
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper
• 2507.19478
• Published Jul 25, 2025 • 33
Attention Basin: Why Contextual Position Matters in Large Language Models Paper
• 2508.05128
• Published Aug 7, 2025 • 4
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models Paper
• 2508.02120
• Published Aug 4, 2025 • 20
Paper
• 2508.11737
• Published Aug 15, 2025 • 114
UItron: Foundational GUI Agent with Advanced Perception and Planning Paper
• 2508.21767
• Published Aug 29, 2025 • 12
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper
• 2509.19284
• Published Sep 23, 2025 • 23
Video models are zero-shot learners and reasoners Paper
• 2509.20328
• Published Sep 24, 2025 • 100
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Paper
• 2510.24563
• Published Oct 28, 2025 • 23
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents Paper
• 2602.16855
• Published Feb 15, 2026 • 51
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking Paper
• 2602.16849
• Published Feb 18, 2026 • 7
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents Paper
• 2603.18429
• Published Mar 19, 2026 • 26