
Project page: https://howiehwong.github.io/Agentic-Guardian/
Model: https://huggingface.co/Safiron/Safiron
Code: https://github.com/HowieHwong/Agentic-Guardian

\n","updatedAt":"2025-10-14T04:06:39.244Z","author":{"_id":"639d94ab7145123e0d44e48a","avatarUrl":"/avatars/5bb6a65b306d1383c4a8bcd9334b470a.svg","fullname":"Yue Huang","name":"HowieHwong","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6507483720779419},"editors":["HowieHwong"],"editorAvatarUrls":["/avatars/5bb6a65b306d1383c4a8bcd9334b470a.svg"],"reactions":[],"isReport":false}},{"id":"68eefaef2355fce20af2d84e","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false},"createdAt":"2025-10-15T01:37:51.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. 
\n\nThe following papers were recommended by the Semantic Scholar API \n\n* [TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning](https://huggingface.co/papers/2509.06278) (2025)\n* [SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents](https://huggingface.co/papers/2509.25885) (2025)\n* [SITCOM: Scaling Inference-Time COMpute for VLAs](https://huggingface.co/papers/2510.04041) (2025)\n* [From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents](https://huggingface.co/papers/2509.23071) (2025)\n* [SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models](https://huggingface.co/papers/2510.06871) (2025)\n* [Plan Verification for LLM-Based Embodied Task Completion Agents](https://huggingface.co/papers/2509.02761) (2025)\n* [UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios](https://huggingface.co/papers/2509.21766) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2025-10-15T01:37:51.700Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7242236137390137},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2510.09781","authors":[{"_id":"68edcbffde1fee572713a927","name":"Yue Huang","hidden":false},{"_id":"68edcbffde1fee572713a928","user":{"_id":"639f8277beb95d698de007dd","avatarUrl":"/avatars/57f223ccd9d3cb03166ccf0e41361c58.svg","isPro":false,"fullname":"HangHua","user":"hhua2","type":"user"},"name":"Hang Hua","status":"claimed_verified","statusLastChangedAt":"2025-10-14T07:29:51.627Z","hidden":false},{"_id":"68edcbffde1fee572713a929","name":"Yujun Zhou","hidden":false},{"_id":"68edcbffde1fee572713a92a","name":"Pengcheng Jing","hidden":false},{"_id":"68edcbffde1fee572713a92b","name":"Manish Nagireddy","hidden":false},{"_id":"68edcbffde1fee572713a92c","name":"Inkit Padhi","hidden":false},{"_id":"68edcbffde1fee572713a92d","name":"Greta Dolcetti","hidden":false},{"_id":"68edcbffde1fee572713a92e","name":"Zhangchen Xu","hidden":false},{"_id":"68edcbffde1fee572713a92f","name":"Subhajit Chaudhury","hidden":false},{"_id":"68edcbffde1fee572713a930","name":"Ambrish Rawat","hidden":false},{"_id":"68edcbffde1fee572713a931","user":{"_id":"668cfac466f73756c8555a39","avatarUrl":"/avatars/1f101c07a0503aca49bb047290902ada.svg","isPro":false,"fullname":"Liubov Nedoshivina","user":"nedshivina","type":"user"},"name":"Liubov 
Nedoshivina","status":"claimed_verified","statusLastChangedAt":"2026-02-09T21:07:26.663Z","hidden":false},{"_id":"68edcbffde1fee572713a932","name":"Pin-Yu Chen","hidden":false},{"_id":"68edcbffde1fee572713a933","name":"Prasanna Sattigeri","hidden":false},{"_id":"68edcbffde1fee572713a934","name":"Xiangliang Zhang","hidden":false}],"publishedAt":"2025-10-10T18:42:32.000Z","submittedOnDailyAt":"2025-10-14T02:36:39.228Z","title":"Building a Foundational Guardrail for General Agentic Systems via\n Synthetic Data","submittedOnDailyBy":{"_id":"639d94ab7145123e0d44e48a","avatarUrl":"/avatars/5bb6a65b306d1383c4a8bcd9334b470a.svg","isPro":false,"fullname":"Yue Huang","user":"HowieHwong","type":"user"},"summary":"While LLM agents can plan multi-step tasks, intervening at the planning\nstage-before any action is executed-is often the safest way to prevent harm,\nsince certain risks can lead to severe consequences once carried out. However,\nexisting guardrails mostly operate post-execution, which is difficult to scale\nand leaves little room for controllable supervision at the plan level. To\naddress this challenge, we highlight three critical gaps in current research:\ndata gap, model gap, and evaluation gap. To close the data gap, we introduce\nAuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii)\ninjects category-labeled risks with calibrated difficulty, and (iii) filters\noutputs via an automated reward model, producing large and reliable corpora for\npre-execution safety. To close the guardian model gap, we propose a\nfoundational guardrail Safiron, combining a cross-planner adapter with a\ncompact guardian model. The adapter unifies different input formats, while\nSafiron flags risky cases, assigns risk types, and generates rationales;\ntrained in two stages with a broadly explored data recipe, Safiron achieves\nrobust transfer across settings. 
To close the evaluation gap, we release\nPre-Exec Bench, a realistic benchmark covering diverse tools and branching\ntrajectories, which measures detection, fine-grained categorization,\nexplanation, and cross-planner generalization in human-verified scenarios.\nExtensive experiments demonstrate consistent gains of the proposed guardrail\nover strong baselines on Pre-Exec Bench, and ablations further distill\nactionable practices, providing a practical template for safer agentic systems.","upvotes":27,"discussionId":"68edcbffde1fee572713a935","githubRepo":"https://github.com/HowieHwong/Agentic-Guardian","githubRepoAddedBy":"user","ai_summary":"AuraGen and Safiron address pre-execution safety gaps in LLM agents by synthesizing benign trajectories, injecting risks, and using a cross-planner adapter for robust risk detection and explanation.","ai_keywords":["AuraGen","Safiron","cross-planner adapter","compact guardian model","automated reward model","Pre-Exec Bench","risk detection","risk categorization","explanation","cross-planner 
generalization"],"githubStars":40},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6512a4322f0aa026dd6dc9f8","avatarUrl":"/avatars/7cc88d2d8061a83a24bb4458d7cbb242.svg","isPro":false,"fullname":"wyf","user":"wyf23187","type":"user"},{"_id":"66e4aa8d4926518abbf5cae2","avatarUrl":"/avatars/dcff2521e0292b602f86c76fc4b5bbae.svg","isPro":false,"fullname":"XiangqiWang","user":"qisein","type":"user"},{"_id":"6347afbb8af48bc2fb542cdd","avatarUrl":"/avatars/62f2a03fa310bd4c7a6beb219e0ace61.svg","isPro":false,"fullname":"haomin","user":"DaydreamerMZM","type":"user"},{"_id":"64574d8e182c64e989846ba2","avatarUrl":"/avatars/db4bc496a745e1d7de48215b30f6fd3e.svg","isPro":false,"fullname":"Tyrannosaurus","user":"Tyrannosaurus","type":"user"},{"_id":"68263a4782eb9b05042f6892","avatarUrl":"/avatars/d17d90123d9d5e88038d940a9f049334.svg","isPro":false,"fullname":"anonymous","user":"myspace1","type":"user"},{"_id":"653df1323479e9ebbe3eb6cc","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/653df1323479e9ebbe3eb6cc/K_g-r1iMRNKj99LXPuYF3.jpeg","isPro":true,"fullname":"Zhangchen Xu","user":"zhangchenxu","type":"user"},{"_id":"6344c87f0f69ad8aa61dfcf6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6344c87f0f69ad8aa61dfcf6/tTVHu2l2aiAnK160vgT6u.jpeg","isPro":false,"fullname":"Yolo Y. 
Tang","user":"yunlong10","type":"user"},{"_id":"654d82b8d2db4280d9351bc5","avatarUrl":"/avatars/bcf1a67ea1282bf2124b6c964c717232.svg","isPro":false,"fullname":"Xinyi Liu","user":"Xinyi125","type":"user"},{"_id":"639f8277beb95d698de007dd","avatarUrl":"/avatars/57f223ccd9d3cb03166ccf0e41361c58.svg","isPro":false,"fullname":"HangHua","user":"hhua2","type":"user"},{"_id":"65a05abf07184d32fa002d41","avatarUrl":"/avatars/3a23e7e568d2024381ed31b56c1c461a.svg","isPro":false,"fullname":"Yujun Zhou","user":"yujunzhou","type":"user"},{"_id":"68215c609ce83e31a64ed37d","avatarUrl":"/avatars/c7a2f9266af3040e55c533e7ff738734.svg","isPro":false,"fullname":"Alan Rosston","user":"AlanRosston520","type":"user"},{"_id":"638e1cae4867bb0d7b52cd07","avatarUrl":"/avatars/fa9852d17c718a2881b1ef18e4fb2a40.svg","isPro":false,"fullname":"Gegedi","user":"Gegedi","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
arxiv:2510.09781

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Published on Oct 10, 2025 · Submitted by Yue Huang on Oct 14, 2025
Authors:
Yue Huang, Hang Hua, Yujun Zhou, Pengcheng Jing, Manish Nagireddy, Inkit Padhi, Greta Dolcetti, Zhangchen Xu, Subhajit Chaudhury, Ambrish Rawat, Liubov Nedoshivina, Pin-Yu Chen, Prasanna Sattigeri, Xiangliang Zhang
Abstract

AuraGen and Safiron address pre-execution safety gaps in LLM agents by synthesizing benign trajectories, injecting risks, and using a cross-planner adapter for robust risk detection and explanation.

AI-generated summary

While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, since certain risks can lead to severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. To address this challenge, we highlight three critical gaps in current research: data gap, model gap, and evaluation gap. To close the data gap, we introduce AuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii) injects category-labeled risks with calibrated difficulty, and (iii) filters outputs via an automated reward model, producing large and reliable corpora for pre-execution safety. To close the guardian model gap, we propose a foundational guardrail Safiron, combining a cross-planner adapter with a compact guardian model. The adapter unifies different input formats, while Safiron flags risky cases, assigns risk types, and generates rationales; trained in two stages with a broadly explored data recipe, Safiron achieves robust transfer across settings. To close the evaluation gap, we release Pre-Exec Bench, a realistic benchmark covering diverse tools and branching trajectories, which measures detection, fine-grained categorization, explanation, and cross-planner generalization in human-verified scenarios. Extensive experiments demonstrate consistent gains of the proposed guardrail over strong baselines on Pre-Exec Bench, and ablations further distill actionable practices, providing a practical template for safer agentic systems.
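The abstract describes a pre-execution guardrail that inspects an agent's plan before any step runs, flags risky cases, assigns a risk type, and produces a rationale. The sketch below illustrates that control flow only; every name in it (`GuardVerdict`, `toy_guardian`, `gate_plan`) is hypothetical and the keyword check is a stand-in for a trained guardian model such as Safiron, not the paper's actual method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardVerdict:
    """Hypothetical verdict: risk flag, category label, and rationale,
    mirroring the three outputs the abstract attributes to the guardian."""
    is_risky: bool
    risk_type: Optional[str]  # e.g. a category such as "destructive-command"
    rationale: str


def toy_guardian(plan: str) -> GuardVerdict:
    """Stand-in for the guardian model. A real guardian would be a trained
    classifier over the full plan; this keyword match is illustrative only."""
    if "rm -rf" in plan:
        return GuardVerdict(True, "destructive-command",
                            "Plan contains an irreversible delete.")
    return GuardVerdict(False, None, "No risk indicators found.")


def gate_plan(plan: str) -> str:
    """Run the guardian before any step executes; block risky plans."""
    verdict = toy_guardian(plan)
    if verdict.is_risky:
        return f"BLOCKED ({verdict.risk_type}): {verdict.rationale}"
    return "APPROVED"
```

The key design point is that the check runs on the plan text alone, so unsafe actions are stopped before execution rather than audited after the fact.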

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning (2025)
- SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents (2025)
- SITCOM: Scaling Inference-Time COMpute for VLAs (2025)
- From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents (2025)
- SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models (2025)
- Plan Verification for LLM-Based Embodied Task Completion Agents (2025)
- UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 0

No Collection including this paper
