Retrieval | OpenAI API

Retrieval

Search your data using semantic similarity.

The Retrieval API lets you perform semantic search over your data. Semantic search surfaces semantically similar results, even when they share few or no keywords with the query. Retrieval is useful on its own, but it is especially powerful when combined with our models to synthesize responses.

The Retrieval API is powered by vector stores, which serve as indices for your data. This guide covers how to perform semantic search and goes into detail on vector stores.

Quickstart

  • Create a vector store and upload files.

    Create vector store with files
    from openai import OpenAI
    client = OpenAI()

    vector_store = client.vector_stores.create(        # Create vector store
        name="Support FAQ",
    )

    with open("customer_policies.txt", "rb") as f:     # Upload file and wait until it is processed
        client.vector_stores.files.upload_and_poll(
            vector_store_id=vector_store.id,
            file=f,
        )
  • Send a search query to get relevant results.

    Search query
    user_query = "What is the return policy?"
    
    results = client.vector_stores.search(
        vector_store_id=vector_store.id,
        query=user_query,
    )

    To learn how to use the results with our models, check out the synthesizing responses section.

    Semantic search is a technique that leverages vector embeddings to surface semantically relevant results. Importantly, this includes results with few or no shared keywords, which classical search techniques might miss.

    For example, let’s look at potential results for "When did we go to the moon?":

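    Text                                                | Keyword similarity | Semantic similarity
    ----------------------------------------------------|--------------------|--------------------
    The first lunar landing occurred in July of 1969.   | 0%                 | 65%
    The first man on the moon was Neil Armstrong.       | 27%                | 43%
    When I ate the moon cake, it was delicious.         | 40%                | 28%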

    (Keyword similarity is Jaccard similarity; semantic similarity is cosine similarity between text-embedding-3-small embeddings.)

    Notice how the most relevant result shares no keywords with the search query. This flexibility makes semantic search a powerful technique for querying knowledge bases of any size.
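    The two scores in the table can be illustrated with a short sketch. It is not how the table was produced: the guide does not say how it tokenized text, so the stop-word list below is an assumption, and the Jaccard values it produces differ from the table's. The cosine function expects precomputed embedding vectors (for example, from text-embedding-3-small); no API call is made here.

```python
import math
import re

# Assumed stop-word list: the guide does not say how it tokenized text.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "was", "did", "we", "i", "it"}


def keywords(text: str) -> set[str]:
    """Lowercase word tokens with punctuation and stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def jaccard_similarity(a: str, b: str) -> float:
    """Keyword similarity: size of the intersection over size of the union."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Semantic similarity between two embedding vectors (dot product over norms)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


query = "When did we go to the moon?"
candidates = [
    "The first lunar landing occurred in July of 1969.",
    "The first man on the moon was Neil Armstrong.",
    "When I ate the moon cake, it was delicious.",
]
for text in candidates:
    print(f"{jaccard_similarity(query, text):4.0%}  {text}")
```

    With this stop-word list the keyword scores come out as 0%, 14%, and 33%: the same ranking as the table's keyword column, though not the same values. For the semantic column you would embed the query and each candidate, then pass the vectors to cosine_similarity.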

    Semantic search is powered by vector stores, which we cover in detail later in the guide. This section will focus on the mechanics of semantic search.

    You can query a vector store by calling the search function with a natural-language query. It returns a list of results, each containing the relevant chunks, a similarity score, and the file the chunks came from.

    Search query
    results = client.vector_stores.search(
        vector_store_id=vector_store.id,
        query="How many woodchucks are allowed per passenger?",
    )
    Results
    {
      "object": "vector_store.search_results.page",
      "search_query": "How many woodchucks are allowed per passenger?",
      "data": [
        {
          "file_id": "file-12345",
          "filename": "woodchuck_policy.txt",
          "score": 0.85,
          "attributes": {
            "region": "North America",
            "author": "Wildlife Department"
          },
          "content": [
            {
              "type": "text",
              "text": "According to the latest regulations, each passenger is allowed to carry up to two woodchucks."
            },
            {
              "type": "text",
              "text": "Ensure that the woodchucks are properly contained during transport."
            }
          ]
        },
        {
          "file_id": "file-67890",
          "filename": "transport_guidelines.txt",
          "score": 0.75,
          "attributes": {
            "region": "North America",
            "author": "Transport Authority"
          },
          "content": [
            {
              "type": "text",
              "text": "Passengers must adhere to the guidelines set forth by the Transport Authority regarding the transport of woodchucks."
            }
          ]
        }
      ],
      "has_more": false,
      "next_page": null
    }
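    To pass results to a model or display them, you typically flatten each result's text chunks. A minimal sketch that works on a page shaped like the JSON above, parsed into a plain dict; format_results and its output layout are illustrative choices, not part of the API.

```python
import json


def format_results(page: dict, min_score: float = 0.0) -> str:
    """Flatten a search results page into a prompt-ready string.

    Each result becomes a block headed by its filename and score, followed by
    the text of its content chunks. Results scoring below min_score are skipped.
    """
    blocks = []
    for result in page.get("data", []):
        if result["score"] < min_score:
            continue
        texts = [part["text"] for part in result["content"] if part["type"] == "text"]
        header = f"[{result['filename']} (score {result['score']:.2f})]"
        blocks.append("\n".join([header, *texts]))
    return "\n\n".join(blocks)


# A trimmed copy of the example response above (attributes omitted).
page = json.loads("""
{
  "object": "vector_store.search_results.page",
  "search_query": "How many woodchucks are allowed per passenger?",
  "data": [
    {"file_id": "file-12345", "filename": "woodchuck_policy.txt", "score": 0.85,
     "content": [
       {"type": "text", "text": "According to the latest regulations, each passenger is allowed to carry up to two woodchucks."},
       {"type": "text", "text": "Ensure that the woodchucks are properly contained during transport."}
     ]},
    {"file_id": "file-67890", "filename": "transport_guidelines.txt", "score": 0.75,
     "content": [
       {"type": "text", "text": "Passengers must adhere to the guidelines set forth by the Transport Authority regarding the transport of woodchucks."}
     ]}
  ],
  "has_more": false,
  "next_page": null
}
""")

print(format_results(page, min_score=0.8))
```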

    By default, a response contains at most 10 results; you can raise this to as many as 50 with the max_num_results parameter.
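    If you route searches through your own helper, you can reject out-of-range values before a request is sent. A minimal sketch, assuming the limits described above (the lower bound of 1 is an assumption); search_with_limit is a hypothetical helper, and client is an OpenAI client like the one created in the quickstart.

```python
from typing import Any

MAX_RESULTS_LIMIT = 50  # upper bound stated in this guide


def search_with_limit(client: Any, vector_store_id: str, query: str,
                      max_num_results: int = 10) -> Any:
    """Run a vector store search after validating max_num_results (hypothetical helper)."""
    if not 1 <= max_num_results <= MAX_RESULTS_LIMIT:
        raise ValueError(
            f"max_num_results must be between 1 and {MAX_RESULTS_LIMIT}, got {max_num_results}"
        )
    return client.vector_stores.search(
        vector_store_id=vector_store_id,
        query=query,
        max_num_results=max_num_results,
    )
```

    For example, search_with_limit(client, vector_store.id, "What is the return policy?", max_num_results=25) returns up to 25 results, while a value of 51 fails locally instead of at the API.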

    Query rewriting

    Certain query styles yield better results, so we’ve provided a setting to automatically rewrite your queries for optimal performance. Enable this feature by setting rewrite_query=true when performing a search.

    The rewritten query will be available in the result’s search_query field.

    Original                                                              | Rewritten
    ----------------------------------------------------------------------|-------------------------------------------
    I’d like to know the height of the main office building.              | primary office building height
    What are the safety regulations for transporting hazardous materials? | safety regulations for hazardous materials
    How do I file a complaint about a service issue?                      | service complaint filing process
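    Because the rewritten query comes back in the search_query field, you can log it next to the original to see how your queries are being rewritten. A minimal sketch that works on a results page parsed from the JSON response (a plain dict shaped like the example earlier in this guide) of a search made with rewrite_query enabled; describe_rewrite is a hypothetical helper name.

```python
def describe_rewrite(original_query: str, page: dict) -> str:
    """Report whether the query was rewritten, using the page's search_query field."""
    rewritten = page.get("search_query", original_query)
    if rewritten == original_query:
        return f"query unchanged: {original_query!r}"
    return f"query rewritten: {original_query!r} -> {rewritten!r}"


page = {"search_query": "service complaint filing process", "data": []}
print(describe_rewrite("How do I file a complaint about a service issue?", page))
```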

    Attribute filtering

    Attribute filtering helps narrow down results by applying criteria, such as restricting searches to a specific date range. You can define and combine criteria in attribute_filter to target files based on their attributes before performing semantic search.

    Use comparison filters to compare a specific key in a file’s attributes with a given value, and compound filters to combine multiple filters with the logical operators "and" and "or".

    Comparison filter
    {
      "type": "eq" | "ne" | "gt" | "gte" | "lt" | "lte" | "in" | "nin",  // comparison operator
      "key": "attributes_key",                                            // key in the file's attributes
      "value": "target_value"                                             // value to compare against
    }
    Compound filter
    {
      "type": "and" | "or",    // logical operator
      "filters": [...]         // filters to combine
    }
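    Because filters are plain JSON objects, small helpers can catch typos in operator names before a request is sent. A minimal sketch; comparison() and compound() are hypothetical helper names, and the operator sets are copied from the two schemas above. Pass the resulting dict as the attribute filter of your search request.

```python
from typing import Any

COMPARISON_OPS = {"eq", "ne", "gt", "gte", "lt", "lte", "in", "nin"}
COMPOUND_OPS = {"and", "or"}


def comparison(op: str, key: str, value: Any) -> dict:
    """Build a comparison filter that tests one attribute key against a value."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unknown comparison operator: {op!r}")
    return {"type": op, "key": key, "value": value}


def compound(op: str, *filters: dict) -> dict:
    """Combine filters with a logical 'and' or 'or'."""
    if op not in COMPOUND_OPS:
        raise ValueError(f"unknown compound operator: {op!r}")
    return {"type": op, "filters": list(filters)}


# Files from the "us" region written by a specific author (illustrative values).
us_docs = compound(
    "and",
    comparison("eq", "region", "us"),
    comparison("eq", "author", "Wildlife Department"),
)
```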

    Below are some example filters.

    Filter for a region
    {
      "type": "eq",
      "key": "region",
      "value": "us"
    }
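    To reason about which files a filter will match, you can evaluate it locally against an attributes dict. This is a local approximation written for illustration, not the service's implementation; the guide does not describe how missing keys or mixed types are handled, so this sketch assumes a missing key never matches.

```python
import operator
from typing import Any, Callable

# Comparison operators from the schema above, mapped to Python equivalents.
COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda actual, target: actual in target,
    "nin": lambda actual, target: actual not in target,
}


def matches(flt: dict, attributes: dict[str, Any]) -> bool:
    """Locally check whether a file's attributes satisfy a comparison or compound filter."""
    kind = flt["type"]
    if kind == "and":
        return all(matches(f, attributes) for f in flt["filters"])
    if kind == "or":
        return any(matches(f, attributes) for f in flt["filters"])
    if kind not in COMPARISONS:
        raise ValueError(f"unknown filter type: {kind!r}")
    if flt["key"] not in attributes:
        return False  # assumption: a missing key never matches
    return COMPARISONS[kind](attributes[flt["key"]], flt["value"])


region_filter = {"type": "eq", "key": "region", "value": "us"}
print(matches(region_filter, {"region": "us", "author": "Wildlife Department"}))  # True
print(matches(region_filter, {"region": "eu"}))  # False
```

    A date range, the use case mentioned earlier, is an "and" of a "gte" and an "lt" comparison on the same key; this assumes the dates are stored as comparable values such as Unix timestamps.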