RLHN: Cleaned Training Datasets with False Negatives Identified and Relabeled as Ground Truth
Welcome to RLHN (EMNLP 2025 Findings)
RLHN (ReLabeling Hard Negatives) uses a cascading LLM framework to identify and relabel false negatives in IR training datasets.
This organization hosts the training datasets curated with RLHN and the models fine-tuned on them.
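The relabeling step can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function and field names are hypothetical, and the toy judge stands in for the cascading LLM calls described in the paper.

```python
def relabel_false_negatives(example, judge):
    """Promote hard negatives that the judge deems relevant into positives.

    `example` is a dict with "query", "positives", and "hard_negatives"
    (illustrative schema); `judge(query, passage)` returns True when the
    passage actually answers the query.
    """
    kept_negatives, promoted = [], []
    for passage in example["hard_negatives"]:
        if judge(example["query"], passage):
            promoted.append(passage)  # false negative -> true positive
        else:
            kept_negatives.append(passage)
    return {
        "query": example["query"],
        "positives": example["positives"] + promoted,
        "hard_negatives": kept_negatives,
    }

# Toy stand-in for the LLM judge (a real judge would prompt an LLM):
toy_judge = lambda query, passage: "capital" in passage

example = {
    "query": "capital of France",
    "positives": ["Paris is the capital of France."],
    "hard_negatives": [
        "France borders Spain.",
        "Paris, France's capital, hosts the Louvre.",
    ],
}
cleaned = relabel_false_negatives(example, toy_judge)
# cleaned["positives"] now has 2 entries; cleaned["hard_negatives"] has 1
```

The point of the promotion (rather than simply dropping the passage) is that a relevant passage mislabeled as a negative would otherwise push the model away from a correct match during contrastive training.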
List of Contributors:
- Nandan Thakur*
- Crystina Zhang*
- Xueguang Ma
- Jimmy Lin
Paper URL: https://aclanthology.org/2025.findings-emnlp.481/
Citation
@inproceedings{thakur-etal-2025-hard,
title = "Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with {LLM}s",
author = "Thakur, Nandan and
Zhang, Crystina and
Ma, Xueguang and
Lin, Jimmy",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.481/",
doi = "10.18653/v1/2025.findings-emnlp.481",
pages = "9064--9083",
ISBN = "979-8-89176-335-7",
abstract = "Training robust retrieval and reranker models typically relies on large-scale retrieval datasets; for example, the BGE collection contains 1.6 million query-passage pairs sourced from various data sources. However, we find that certain datasets can negatively impact model effectiveness {---} pruning 8 out of 15 datasets from the BGE collection, reduces the training set size by 2.35{\texttimes}, surprisingly increases nDCG@10 on BEIR by 1.0 point. This motivates a deeper examination of training data quality, with a particular focus on ``false negatives'', where relevant passages are incorrectly labeled as irrelevant. We utilize LLMs as a simple, cost-effective approach to \textit{identify} and \textit{relabel} false negatives in training datasets. Experimental results show that relabeling false negatives as true positives improves both E5 (base) and Qwen2.5-7B retrieval models by 0.7-1.4 points on BEIR and by 1.7-1.8 points at nDCG@10 on zero-shot AIR-Bench evaluation. Similar gains are observed for rerankers fine-tuned on the relabeled data, such as Qwen2.5-3B on BEIR. The reliability of LLMs to identify false negatives is supported by human annotation results. Our training dataset and code are publicly available."
}
Models (10 of 31 shown)
- rlhn/Qwen2.5-7B-hn-remove-400K
- rlhn/Qwen2.5-7B-default-400K
- rlhn/Qwen2.5-3B-rlhn-680K-reranker
- rlhn/Qwen2.5-3B-hn-remove-680K-reranker
- rlhn/Qwen2.5-3B-rlhn-400K-reranker
- rlhn/Qwen2.5-3B-rlhn-100K-reranker
- rlhn/Qwen2.5-3B-default-680K-reranker
- rlhn/Qwen2.5-3B-default-400K-reranker
- rlhn/Qwen2.5-3B-default-250K-reranker
- rlhn/Qwen2.5-3B-default-100K-reranker
Datasets (10 of 20 shown)
- rlhn/default-680K-bge-reranker-v2-gemma (680k rows)
- rlhn/default-680K-mxbai-rerank-large-v2 (680k rows)
- rlhn/remove-100K (61k rows)
- rlhn/remove-250K (151k rows)
- rlhn/remove-400K (248k rows)
- rlhn/remove-680K (324k rows)
- rlhn/hn-remove-250K (247k rows)
- rlhn/hn-remove-100K (93.3k rows)
- rlhn/hn-remove-680K (649k rows)
- rlhn/hn-remove-400K (389k rows)