arxiv:2409.10173

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Published on Sep 16, 2024 · Submitted by AK on Sep 17, 2024

Abstract

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.

AI-generated summary

jina-embeddings-v3, a large-scale text embedding model, achieves state-of-the-art performance in multilingual and long-context retrieval tasks using Low-Rank Adaptation and Matryoshka Representation Learning.
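Since the abstract describes two inference-time features, task-specific LoRA adapters and Matryoshka truncation, a short usage sketch may help. This is a minimal sketch assuming the custom `encode` API shipped with the model's remote code; the `task` names and the `truncate_dim` argument follow the model card, but treat the exact signatures as assumptions:

```python
# Minimal usage sketch for jina-embeddings-v3 (assumes the remote-code encode
# API from the model card: `task` selects a LoRA adapter, `truncate_dim`
# applies Matryoshka truncation of the embedding dimension).
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

queries = ["What is Matryoshka Representation Learning?"]
documents = ["Matryoshka Representation Learning trains nested sub-embeddings."]

# Asymmetric retrieval: distinct adapters for queries and passages.
q_emb = model.encode(queries, task="retrieval.query", truncate_dim=256)
d_emb = model.encode(documents, task="retrieval.passage", truncate_dim=256)

# Cosine similarity on the truncated 256-dim embeddings.
sim = (q_emb @ d_emb.T) / (
    np.linalg.norm(q_emb, axis=1, keepdims=True)
    * np.linalg.norm(d_emb, axis=1, keepdims=True).T
)
print(sim)
```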

Community

https://huggingface.co/jinaai/jina-embeddings-v3

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever](https://huggingface.co/papers/2408.16672) (2024)
* [mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval](https://huggingface.co/papers/2407.19669) (2024)
* [Ruri: Japanese General Text Embeddings](https://huggingface.co/papers/2409.07737) (2024)
* [NLLB-E5: A Scalable Multilingual Retrieval Model](https://huggingface.co/papers/2409.05401) (2024)
* [The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design](https://huggingface.co/papers/2408.12503) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Does it support multi-modal inputs?

Paper author

@HeNa111, jina-embeddings-v3 supports only text. However, we recently released [jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2), which is similar to jina-embeddings-v3 and additionally supports images.
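For the multimodal case, here is a minimal sketch of jina-clip-v2 usage, assuming the `encode_text` / `encode_image` helpers exposed by that model's remote code (the helper names are taken from its model card and should be treated as assumptions):

```python
# Sketch for jina-clip-v2: text and image embeddings share one space,
# so cross-modal similarity is a plain dot product.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

text_emb = model.encode_text(["a photo of a matryoshka doll"])
image_emb = model.encode_image(["matryoshka.jpg"])  # local path or URL

print(text_emb @ image_emb.T)  # cross-modal similarity
```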

Hi everyone!
I am currently working on a project focused on asymmetric semantic search involving hard negative sentences, and I would like to fine-tune a model using this approach. I am seeking a practical example to better understand the process.

Could you please provide an example?
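A minimal sketch of what such a setup could look like, using sentence-transformers' `MultipleNegativesRankingLoss` over (anchor, positive, hard negative) triplets. This is an illustrative recipe with toy data, not an official Jina example, and fine-tuning a remote-code model with LoRA adapters may need extra care:

```python
# Illustrative fine-tuning sketch for asymmetric retrieval with hard negatives,
# using sentence-transformers >= 3.0. Replace the toy rows with real triplets.
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Each row: a query, a relevant passage, and a hard negative for that query.
train_dataset = Dataset.from_dict({
    "anchor":   ["how do I reset my password?"],
    "positive": ["Open Settings and choose 'Reset password' to get a reset link."],
    "negative": ["Passwords must contain at least eight characters and one digit."],
})

# Uses in-batch negatives plus the explicit hard negative in each row.
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```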

Does it work for Arabic?


Models citing this paper: 11
Datasets citing this paper: 0
Spaces citing this paper: 47
Collections including this paper: 8