The Art of Saying No: Contextual Noncompliance in Language Models
arXiv: 2407.12043 · Published: 2024-07-02

Authors: Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Comments

AK (2024-07-18): https://github.com/allenai/noncompliance

Librarian Bot (2024-07-19): This is an automated message from the Librarian Bot. I found the following papers similar to this paper. The following papers were recommended by the Semantic Scholar API:

* SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors (https://huggingface.co/papers/2406.14598) (2024)
* OR-Bench: An Over-Refusal Benchmark for Large Language Models (https://huggingface.co/papers/2405.20947) (2024)
* MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? (https://huggingface.co/papers/2406.17806) (2024)
* Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment (https://huggingface.co/papers/2406.11285) (2024)
* How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment (https://huggingface.co/papers/2406.11474) (2024)

Please give a thumbs up to this comment if you found it helpful! If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers. You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
AI-generated summary

A taxonomy for language model noncompliance is introduced, and experiments show that parameter-efficient fine-tuning can balance noncompliance with maintaining model capabilities.

Abstract
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test the noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories, with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter-efficient methods like low-rank adapters helps strike a good balance between appropriate noncompliance and other capabilities.
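
As a rough illustration of the parameter-efficient strategy described in the abstract, the sketch below attaches low-rank adapters (LoRA) to an instruction-tuned causal LM using the Hugging Face `peft` library before finetuning on noncompliance data. The base checkpoint, rank, target modules, and other hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: wrap an instruction-tuned model with low-rank adapters (LoRA)
# before finetuning on (request, expected noncompliant response) pairs.
# Assumes the `transformers` and `peft` libraries; names below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "allenai/tulu-2-7b"  # hypothetical base model, not necessarily the paper's
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Freeze the base weights and learn small rank-r updates on the attention
# projections; only these adapter weights are trained.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all parameters

# The wrapped model can then be passed to a standard supervised finetuning loop
# over the synthetically generated noncompliance training set.
```

Because the base weights stay frozen, the adapter mainly has to learn when to decline rather than relearn general behavior, which is consistent with the abstract's finding that this avoids the over-refusal and capability loss observed with direct finetuning.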