arXiv.org e-Print archive: new entries and ratings - Hatena Bookmark

A survey of cross-validation procedures for model selection
3 users
arxiv.org

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishi
- Learning
- 2025/10/19 19:23

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
3 users
arxiv.org

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation -- modifying inputs with instructions, strategies, or evidence, rather than weight updates. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights for concise summaries, and from context collapse, where iterative rewriting erodes d
- Technology
- 2025/10/11 14:31
Paper2Video: Automatic Video Generation from Scientific Papers
3 users
arxiv.org

Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2- to 10-minute video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tab
- Entertainment
- 2025/10/08 15:22
- slide
- video
PLaMo 2 Technical Report
5 users
arxiv.org

In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficie
- Learning
- 2025/09/08 11:18
On the Theoretical Limitations of Embedding-Based Retrieval
31 users
arxiv.org

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assum
- Learning
- 2025/09/01 00:17
- work
- Read later
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
3 users
arxiv.org

In this paper, we introduce a novel learning paradigm for Adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptat
- Technology
- 2025/08/28 19:50
- AI
Organ-Agents: Virtual Human Physiology Simulator via LLMs
3 users
arxiv.org

Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems. We introduce Organ-Agents, a multi-agent framework that simulates human physiology via LLM-driven agents. Each Simulator models a specific system (e.g., cardiovascular, renal, immune). Training consists of supervised fine-tuning on system-specific time-series data, followed b
- Technology
- 2025/08/21 19:11
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
3 users
arxiv.org

Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and push the ability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional transformer architecture requires substantial computations and poses significant obstacles for
- Learning
- 2025/08/18 19:33
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
5 users
arxiv.org

Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more supe
- Learning
- 2025/08/12 06:42
- Uncategorized
- Read later
Mapping the Parasocial AI Market: User Trends, Engagement and Risks
10 users
arxiv.org

A scan of 110 AI companion platforms reveals a rapidly growing global market for emotionally engaging, personalized AI interactions. While parasocial use of general-purpose AI (GPAI) tools currently dominates, a growing number of platforms are designed specifically for care, transactional, or mating companionship. In the UK alone, these platforms receive between 46 million and 91 million monthly v
- Technology
- 2025/08/10 21:39
- AI
- Read later
Working with AI: Measuring the Occupational Implications of Generative AI
3 users
arxiv.org

Kiran Tomlinson, Sonia Jaffe, Will Wang, Scott Counts, and Siddharth Suri (Microsoft Research and Microsoft). Abstract: Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society’s most important questions. In this work
- Technology
- 2025/08/03 22:32
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
5 users
arxiv.org

Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vector
- Technology
- 2025/08/03 13:25
- Read later
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
3 users
arxiv.org

When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-conf
- Technology
- 2025/08/01 09:09
- AI
Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data
4 users
arxiv.org

The Model Context Protocol (MCP) represents a significant advancement in AI-tool integration, enabling seamless communication between AI agents and external services. However, this connectivity introduces novel attack vectors that remain largely unexplored. This paper demonstrates how unsophisticated threat actors, requiring only basic programming skills and free web tools, can exploit MCP's trust
- Learning
- 2025/07/31 18:33
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
3 users
arxiv.org

Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards.
- Learning
- 2025/07/31 18:10
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
5 users
arxiv.org

Retrieval-Augmented Generation (RAG) mitigates hallucination in LLMs by incorporating external knowledge, but relies on chunk-based retrieval that lacks structural semantics. GraphRAG methods improve RAG by modeling knowledge as entity-relation graphs, but still face challenges in high construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design. To address
- Technology
- 2025/07/31 01:01
- RAG
- AI
- Read later
Deep Researcher with Test-Time Diffusion
3 users
arxiv.org

Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TT
- Technology
- 2025/07/29 09:48
- AI
- Read later
Hierarchical Reasoning Model
7 users
arxiv.org

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose th
- Technology
- 2025/07/28 10:30
A Survey of Context Engineering for Large Language Models
11 users
arxiv.org

The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational c
- Technology
- 2025/07/19 12:57
Working with AI: Measuring the Occupational Implications of Generative AI
4 users
arxiv.org

Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society's most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI, how successfully and broadly those activities are done, and combine that with data on what occupations do those acti
- Technology
- 2025/07/15 18:28
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
3 users
arxiv.org

This review presents a comprehensive analysis of two emerging paradigms in AI-assisted software development: vibe coding and agentic coding. While both leverage large language models (LLMs), they differ fundamentally in autonomy, architectural design, and the role of the developer. Vibe coding emphasizes intuitive, human-in-the-loop interaction through prompt-based, conversational workflows that s
- Technology
- 2025/07/04 18:26
Potemkin Understanding in Large Language Models
9 users
arxiv.org

Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this r
- Learning
- 2025/07/03 20:44
Small Language Models are the Future of Agentic AI
7 users
arxiv.org

Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a mass of applications in which language models perform a small number of specialized tasks repetitively and with little variation. Here we lay out the position that small l
- Technology
- 2025/07/02 12:55
- SLM
- paper
- AI
- Read later
From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases
4 users
arxiv.org

Supply chain operations generate vast amounts of operational data; however, critical knowledge such as system usage practices, troubleshooting workflows, and resolution techniques often remains buried within unstructured communications like support tickets, emails, and chat logs. While RAG systems aim to leverage such communications as a knowledge base, their effectiveness is limited by raw data c
- Technology
- 2025/07/02 06:49
CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation
6 users
arxiv.org

Modeling human behavior in urban environments is fundamental for social science, behavioral studies, and urban planning. Prior work often relies on rigid, hand-crafted rules, limiting their ability to simulate nuanced intentions, plans, and adaptive behaviors. Addressing these challenges, we envision an urban simulator (CitySim), capitalizing on breakthroughs in human-level intelligence exhibited by
- Technology
- 2025/07/01 01:04
- AI
- Read later
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
21 users
arxiv.org

Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task pro
- Technology
- 2025/06/29 19:24
- AI
- Read later
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs
3 users
arxiv.org

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchma
- Learning
- 2025/06/28 07:56
Mercury: Ultra-Fast Language Models Based on Diffusion
3 users
arxiv.org

We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict multiple tokens in parallel. In this report, we detail Mercury Coder, our first set of diffusion LLMs designed for coding applications. Currently, Mercury Coder comes in two sizes: Mini and Small. These mode
- Learning
- 2025/06/25 11:47
Advanced linear algebra
31 users
arxiv.org

This is an introduction to advanced linear algebra, with emphasis on geometric aspects, and with some applications included too. We first review basic linear algebra, notably with the spectral theorem in its general form, and with the theory of the resultant and discriminant. Then we discuss the Jordan form and its basic applications to physics, and other advanced decomposition results for the mat
- Learning
- 2025/06/25 09:35
- math
- Read later
