\n","updatedAt":"2026-01-30T02:10:14.155Z","author":{"_id":"68a30d028af8e250ce22ef9e","avatarUrl":"/avatars/2343895f72f6332e12ef83bb152bcc30.svg","fullname":"Lifa Zhu","name":"zhulf0804","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.28421300649642944},"editors":["zhulf0804"],"editorAvatarUrls":["/avatars/2343895f72f6332e12ef83bb152bcc30.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2601.18491","authors":[{"_id":"697831d9026bdf0473116e5c","user":{"_id":"6621f4eb64e84619e578aad6","avatarUrl":"/avatars/b1ad96ee354b999fcafb2998a636609c.svg","isPro":false,"fullname":"Dongrui Liu","user":"shenqiorient","type":"user"},"name":"Dongrui Liu","status":"claimed_verified","statusLastChangedAt":"2026-02-20T08:38:14.334Z","hidden":false},{"_id":"697831d9026bdf0473116e5d","user":{"_id":"66e2624a436a1798365e4581","avatarUrl":"/avatars/6c605807d34faa8fb505e135a4b47776.svg","isPro":false,"fullname":"Qihan Ren","user":"jasonrqh","type":"user"},"name":"Qihan Ren","status":"claimed_verified","statusLastChangedAt":"2026-01-28T11:31:15.765Z","hidden":false},{"_id":"697831d9026bdf0473116e5e","name":"Chen Qian","hidden":false},{"_id":"697831d9026bdf0473116e5f","name":"Shuai Shao","hidden":false},{"_id":"697831d9026bdf0473116e60","name":"Yuejin Xie","hidden":false},{"_id":"697831d9026bdf0473116e61","name":"Yu Li","hidden":false},{"_id":"697831d9026bdf0473116e62","name":"Zhonghao Yang","hidden":false},{"_id":"697831d9026bdf0473116e63","name":"Haoyu Luo","hidden":false},{"_id":"697831d9026bdf0473116e64","name":"Peng Wang","hidden":false},{"_id":"697831d9026bdf0473116e65","name":"Qingyu Liu","hidden":false},{"_id":"697831d9026bdf0473116e66","name":"Binxin Hu","hidden":false},{"_id":"697831d9026bdf0473116e67","name":"Ling Tang","hidden":false},{"_id":"697831d9026bdf0473116e68","name":"Jilin Mei","hidden":false},{"_id":"697831d9026bdf0473116e69","name":"Dadi Guo","hidden":false},{"_id":"697831d9026bdf0473116e6a","name":"Leitao Yuan","hidden":false},{"_id":"697831d9026bdf0473116e6b","name":"Junyao Yang","hidden":false},{"_id":"697831d9026bdf0473116e6c","name":"Guanxu Chen","hidden":false},{"_id":"697831d9026bdf0473116e6d","name":"Qihao Lin","hidden":false},{"_id":"697831d9026bdf0473116e6e","name":"Yi Yu","hidden":false},{"_id":"697831d9026bdf0473116e6f","name":"Bo Zhang","hidden":false},{"_id":"697831d9026bdf0473116e70","name":"Jiaxuan Guo","hidden":false},{"_id":"697831d9026bdf0473116e71","name":"Jie Zhang","hidden":false},{"_id":"697831d9026bdf0473116e72","name":"Wenqi Shao","hidden":false},{"_id":"697831d9026bdf0473116e73","name":"Huiqi Deng","hidden":false},{"_id":"697831d9026bdf0473116e74","name":"Zhiheng Xi","hidden":false},{"_id":"697831d9026bdf0473116e75","name":"Wenjie Wang","hidden":false},{"_id":"697831d9026bdf0473116e76","name":"Wenxuan Wang","hidden":false},{"_id":"697831d9026bdf0473116e77","name":"Wen Shen","hidden":false},{"_id":"697831d9026bdf0473116e78","name":"Zhikai Chen","hidden":false},{"_id":"697831d9026bdf0473116e79","name":"Haoyu Xie","hidden":false},{"_id":"697831d9026bdf0473116e7a","name":"Jialing Tao","hidden":false},{"_id":"697831d9026bdf0473116e7b","name":"Juntao Dai","hidden":false},{"_id":"697831d9026bdf0473116e7c","name":"Jiaming Ji","hidden":false},{"_id":"697831d9026bdf0473116e7d","name":"Zhongjie Ba","hidden":false},{"_id":"697831d9026bdf0473116e7e","name":"Linfeng 
Zhang","hidden":false},{"_id":"697831d9026bdf0473116e7f","name":"Yong Liu","hidden":false},{"_id":"697831d9026bdf0473116e80","name":"Quanshi Zhang","hidden":false},{"_id":"697831d9026bdf0473116e81","name":"Lei Zhu","hidden":false},{"_id":"697831d9026bdf0473116e82","name":"Zhihua Wei","hidden":false},{"_id":"697831d9026bdf0473116e83","name":"Hui Xue","hidden":false},{"_id":"697831d9026bdf0473116e84","name":"Chaochao Lu","hidden":false},{"_id":"697831d9026bdf0473116e85","name":"Jing Shao","hidden":false},{"_id":"697831d9026bdf0473116e86","name":"Xia Hu","hidden":false}],"publishedAt":"2026-01-26T13:45:41.000Z","submittedOnDailyAt":"2026-01-28T01:26:49.833Z","title":"AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security","submittedOnDailyBy":{"_id":"66e2624a436a1798365e4581","avatarUrl":"/avatars/6c605807d34faa8fb505e135a4b47776.svg","isPro":false,"fullname":"Qihan Ren","user":"jasonrqh","type":"user"},"summary":"The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured and hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained and contextual monitoring across agent trajectories. More Crucially, AgentDoG can diagnose the root causes of unsafe actions and seemingly safe but unreasonable actions, offering provenance and transparency beyond binary labels to facilitate effective agent alignment. AgentDoG variants are available in three sizes (4B, 7B, and 8B parameters) across Qwen and Llama model families. Extensive experimental results demonstrate that AgentDoG achieves state-of-the-art performance in agentic safety moderation in diverse and complex interactive scenarios. 
All models and datasets are openly released.","upvotes":123,"discussionId":"697831d9026bdf0473116e87","githubRepo":"https://github.com/AI45Lab/AgentDoG","githubRepoAddedBy":"user","ai_summary":"AI agents face safety and security challenges from autonomous tool use and environmental interactions, requiring advanced guardrail frameworks for risk diagnosis and transparent monitoring.","ai_keywords":["agentic guardrail","three-dimensional taxonomy","agentic safety benchmark","Diagnostic Guardrail framework","agent safety and security","agent trajectories","root cause diagnosis","fine-grained monitoring","model variants","state-of-the-art performance"],"githubStars":353,"organization":{"_id":"68f716f832b31e42cbc2be7f","name":"AI45Research","fullname":"AI45Research","avatar":"https://cdn-uploads.huggingface.co/production/uploads/68f6ffaa04d1019724af41fc/EVBafPHXvChszTM5tJcc9.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"66e2624a436a1798365e4581","avatarUrl":"/avatars/6c605807d34faa8fb505e135a4b47776.svg","isPro":false,"fullname":"Qihan Ren","user":"jasonrqh","type":"user"},{"_id":"682844d0f300562abd28e0c9","avatarUrl":"/avatars/8cd5e929cfa19331f38a6f9c97f841b0.svg","isPro":false,"fullname":"dsadsa","user":"yueyue0407","type":"user"},{"_id":"67f01286fcdc4fce387f1de5","avatarUrl":"/avatars/7fcbb3600a9ace264c981875201b65fe.svg","isPro":false,"fullname":"YU LI","user":"YuLillll","type":"user"},{"_id":"6683b16b21b4b783180afdb0","avatarUrl":"/avatars/fd3aa72cd29a05bec793421341b77049.svg","isPro":false,"fullname":"QingyuLiu","user":"QingyuLiu","type":"user"},{"_id":"6621f4eb64e84619e578aad6","avatarUrl":"/avatars/b1ad96ee354b999fcafb2998a636609c.svg","isPro":false,"fullname":"Dongrui Liu","user":"shenqiorient","type":"user"},{"_id":"655f654f6821269b27090d04","avatarUrl":"/avatars/1b00d85a46e32dbd569684caac29231d.svg","isPro":false,"fullname":"Q","user":"ActorQ","type":"user"},{"_id":"6745c589d2d740914ec2574f","avatarUrl":"/avatars/7b2ff6848d42cd140a775df0c2bc9384.svg","isPro":false,"fullname":"Xiaofang Yang","user":"fffovo","type":"user"},{"_id":"67455c7ff07989f1a6e637f3","avatarUrl":"/avatars/a371f65f321b33c4dab52be9185f153d.svg","isPro":false,"fullname":"EaKal","user":"EaKal","type":"user"},{"_id":"68b4286d2511563aba883bf4","avatarUrl":"/avatars/694b76207ed8483c297bd1427d724761.svg","isPro":false,"fullname":"Qihao Lin","user":"lqh201106","type":"user"},{"_id":"6972f18f18a1c6c6b075313a","avatarUrl":"/avatars/e0bd395bfb2fac6423e9d84aa748acaa.svg","isPro":false,"fullname":"YUAN","user":"angeloyuan","type":"user"},{"_id":"63525f2156ef05f3a1f52362","avatarUrl":"/avatars/0748e51ff76d044dc425044e208b8342.svg","isPro":false,"fullname":"Wenxuan Wang","user":"JarvisWang","type":"user"},{"_id":"642ec9831d1737803dc1c30a","avatarUrl":"/avatars/c9ded838bad09004c15a27200e66a108.svg","isPro":false,"fullname":"linfeng zhang","user":"linfengZ","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1,"organization":{"_id":"68f716f832b31e42cbc2be7f","name":"AI45Research","fullname":"AI45Research","avatar":"https://cdn-uploads.huggingface.co/production/uploads/68f6ffaa04d1019724af41fc/EVBafPHXvChszTM5tJcc9.png"}}">AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Abstract
AI agents face safety and security challenges from autonomous tool use and environmental interactions, requiring advanced guardrail frameworks for risk diagnosis and transparent monitoring.
The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To build an agentic guardrail that covers the numerous, complex risky behaviors agents can exhibit, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured, hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained, contextual monitoring across agent trajectories. More crucially, AgentDoG can diagnose the root causes of unsafe actions and of seemingly safe but unreasonable actions, offering provenance and transparency beyond binary labels to facilitate effective agent alignment. AgentDoG variants are available in three sizes (4B, 7B, and 8B parameters) across the Qwen and Llama model families. Extensive experimental results demonstrate that AgentDoG achieves state-of-the-art performance in agentic safety moderation across diverse and complex interactive scenarios. All models and datasets are openly released.
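To make the abstract's two key ideas concrete, here is a minimal Python sketch of (1) a three-dimensional risk label covering source (where), failure mode (how), and consequence (what), and (2) a step-level check over an agent trajectory that returns a root-cause diagnosis instead of a bare safe/unsafe flag. All class names, risk categories, and the `moderate_step` helper are hypothetical illustrations, not the released AgentDoG API; see the GitHub repository for the actual models and code.

```python
# Hypothetical sketch of a three-dimensional risk label and a step-level
# guardrail pass over an agent trajectory. Names and categories are
# illustrative only -- consult the released AgentDoG code for the real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskDiagnosis:
    source: str        # "where": e.g. user query, tool output, environment
    failure_mode: str  # "how":   e.g. destructive command, tool misuse
    consequence: str   # "what":  e.g. data loss, privacy leakage
    root_cause: str    # free-text provenance, beyond a binary label


@dataclass
class AgentStep:
    thought: str
    tool_call: str
    tool_output: str


def moderate_step(step: AgentStep, context: list[AgentStep]) -> Optional[RiskDiagnosis]:
    """Return a diagnosis for an unsafe (or seemingly safe but unreasonable)
    step, or None if the step passes. A real guardrail model would score the
    step conditioned on the full trajectory context, not a hand-written rule."""
    if "rm -rf" in step.tool_call:  # toy rule standing in for a learned model
        return RiskDiagnosis(
            source="tool call",
            failure_mode="destructive command execution",
            consequence="irreversible data loss",
            root_cause="agent issued a destructive shell command "
                       "without user confirmation",
        )
    return None


# Monitoring a trajectory step by step:
trajectory = [
    AgentStep("clean temp files", "rm -rf /tmp/job-*", ""),
]
for i, step in enumerate(trajectory):
    diagnosis = moderate_step(step, trajectory[:i])
    if diagnosis:
        print(f"step {i}: flagged -> {diagnosis.failure_mode}: {diagnosis.root_cause}")
```

The design point the sketch illustrates is that each flagged step carries a structured (source, failure mode, consequence) triple plus a root-cause explanation, which is what lets a guardrail's output feed back into agent alignment rather than just blocking actions.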
Community
Nice work!!!
Cool!
Great!
arXivlens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/agentdog-a-diagnostic-guardrail-framework-for-ai-agent-safety-and-security-2641-2f6f42ae
- Executive Summary
- Detailed Breakdown
- Practical Applications
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback (2026)
- ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications (2025)
- Towards Verifiably Safe Tool Use for LLM Agents (2026)
- MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction (2026)
- AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI (2025)
- RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic (2025)
- SafePro: Evaluating the Safety of Professional-Level AI Agents (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
cool