JP7626451B2

JP7626451B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7626451B2
Application number: JP2021147126A
Authority: JP
Inventors: 伸裕鍜治
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-09-09
Filing date: 2021-09-09
Publication date: 2025-02-07
Anticipated expiration: 2041-09-09
Also published as: JP2023039822A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Techniques exist for correcting (converting) character strings entered by a user. For example, when a query entered by a user contains a typo, some techniques correct the typo before providing search results to the user (see, e.g., Patent Document 1).

Patent Document 1: Japanese Patent No. 6529456

However, the conventional techniques described above cannot always correct a character string properly. For example, they merely convert character strings using a dictionary that associates a first character string with a second character string that may be entered as a typo of the first, so they have difficulty handling character strings that are not in the dictionary. To correct character strings, it is therefore desirable to determine the similarity between character strings appropriately.

The present application has been made in view of the above, and aims to provide an information processing device, an information processing method, and an information processing program that appropriately determine the similarity between character strings.

The information processing device according to the present application comprises: an acquisition unit that acquires a first variant-notation string, obtained by converting a first character string (a predetermined character string) into a variant notation, and a second variant-notation string, obtained by converting a second character string (a character string different from the first character string) into a variant notation; and a determination unit that determines the similarity between the first character string and the second character string based on the first variant-notation string and the second variant-notation string.

According to one aspect of the embodiment, the similarity between character strings can be determined appropriately.

FIG. 1 is a diagram showing an example of information processing according to the embodiment.
FIG. 2 is a diagram showing an example of the relationship between character strings and variant notations.
FIG. 3 is a diagram showing a configuration example of the information processing device according to the embodiment.
FIG. 4 is a diagram showing an example of the determination information storage unit according to the embodiment.
FIG. 5 is a diagram showing an example of the character string information storage unit according to the embodiment.
FIG. 6 is a flowchart showing an example of processing by the information processing device according to the embodiment.
FIG. 7 is a conceptual diagram showing an example relating to automata.
FIG. 8 is a diagram showing an example of a hardware configuration.

Modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter, "embodiments") are described in detail below with reference to the drawings. These embodiments do not limit the information processing device, information processing method, or information processing program according to the present application. In the following embodiments, identical parts are denoted by the same reference numerals, and duplicate descriptions are omitted.

(Embodiment)
[1. Information Processing]
An example of information processing according to the embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of information processing according to the embodiment. FIG. 1 shows, as an example, a case in which the information processing device 100 determines similarity for a query (character string) entered by a user and corrects (changes) the character string based on the determination result. The processing shown in FIG. 1 is merely an example: the target of the similarity determination and of the correction (change) is not limited to queries and may be any character string.

First, the configuration of the information processing system 1 is described. As shown in FIG. 1, the information processing system 1 includes a terminal device 10 and an information processing device 100. The terminal device 10 and the information processing device 100 are communicably connected, by wire or wirelessly, via a predetermined communication network (not shown). The information processing system 1 shown in FIG. 1 may include multiple terminal devices 10 and multiple information processing devices 100.

The information processing device 100 is a computer that determines the similarity between character strings based on strings obtained by converting those character strings into a variant notation (also called "variant-notation strings"). For example, the information processing device 100 determines the similarity between a first character string and a second character string based on a first variant-notation string, obtained by converting the first character string (a predetermined character string) into a variant notation, and a second variant-notation string, obtained by converting the second character string (a character string different from the first) into a variant notation. In the following description, a character string indicating a predetermined target is the first character string, and a query (character string) entered by a user is the second character string.

In FIG. 1, the information processing device 100 also functions as a search device that provides search results to the terminal device 10 used by the user who entered the query. For example, the information processing device 100 functions as a search engine that executes search processing on web pages. The processing performed after the need for correction has been determined and the query to be used has been fixed, namely searching with that query, is the same as in an ordinary search engine, so detailed description is omitted where appropriate. For example, the information processing device 100 has a database in which a group of web pages (the target information group searched with the query) is indexed and stored, and executes search processing on the information in that database. The search target is not limited to web pages and may be any information that can be searched with a query, for example items traded in electronic commerce such as products.

FIG. 1 describes the case where the information processing device 100 also functions as the search device, that is, where the two are integrated, but the information processing device 100 and the search device may be separate. In that case, the information processing system 1 includes a search device that provides a search service: it executes search processing for a query and provides the search results. For example, the information processing device 100 transmits the query entered by the user, or the corrected query, to the search device, receives the search results from the search device, and transmits them to the terminal device 10 used by the user. The search device may instead transmit the search results directly to the terminal device 10 used by the user.

The terminal device 10 is a device (computer) used by a user. The terminal device 10 accepts operations by the user, accepts input of a query by the user, and displays the search results provided by the information processing device 100.

The terminal device 10 also has an acceleration sensor, a gyro sensor, and the like, and detects the user's motion state. The terminal device 10 has a position sensor such as a GPS sensor and detects the user's position information. The terminal device 10 may have various functions such as a temperature sensor and an air pressure sensor, and may be able to detect and acquire information on the user's environment, such as temperature and air pressure. The terminal device 10 may also have various functions such as a heart rate sensor, and may be able to detect and acquire the user's biometric information. For example, a user of the terminal device 10 may wear a wearable device capable of communicating with the terminal device 10 so that the terminal device 10 can acquire the user's own context information. For example, the user may wear a wristband-type wearable device capable of communicating with the terminal device 10 so that the terminal device 10 can acquire information on the user's own heart rate (pulse). The terminal device 10 may also have an image sensor. The above are examples, and the terminal device 10 may have sensors that detect various kinds of information.

In the following, the terminal device 10 may be referred to as the user. That is, "user" may be read as "terminal device 10". The terminal device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). FIG. 1 shows the case where the terminal device 10 is a notebook PC.

An example of information processing is described below with reference to FIG. 1. FIG. 1 shows the case where the user is the user identified by the user ID "U1" (hereinafter sometimes "user U1"). The terminal device 10 accepts input of the query "ホゲ神津尾" from user U1 (step S11). Assume that "ホゲ神津尾" is a character string with no specific referent: for example, a string that user U1 mistyped (misspelled) when intending to enter "ホゲ行動". In the example shown in FIG. 1, user U1 enters the query "ホゲ神津尾" in the search box of a search site page displayed on the screen of the terminal device 10 and presses the search button.

The terminal device 10 then transmits the query "ホゲ神津尾" to the information processing device 100 (step S12). Having received the query "ホゲ神津尾" from the terminal device 10, the information processing device 100 acquires a keyword (first character string) to compare with the second character string CS2, which is the received query "ホゲ神津尾" (step S13). The information processing device 100 acquires this keyword (first character string) from the keyword group DB. Each character string in the keyword group DB indicates some target. For example, the keyword group DB contains new words and words that are prone to misspelling. For example, the keyword group DB is stored in the character string information storage unit 122 (see FIG. 5).

The information processing device 100 searches the keyword group DB and acquires from it the first character string to compare with "ホゲ神津尾". For example, the information processing device 100 acquires, as the first character string, a character string in the keyword group DB that matches the query "ホゲ神津尾" at least in part. In FIG. 1, the information processing device 100 acquires the character string "ホゲ行動", which shares the substring "ホゲ" with the query "ホゲ神津尾". That is, the information processing device 100 acquires the character string "ホゲ行動" as the first character string CS1 to be compared with the second character string CS2, the query "ホゲ神津尾". Assume that "ホゲ行動" is a character string indicating a specific target (for example, the title of an entertainment product).
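The partial-match lookup in step S13 could be sketched as follows. This is only an illustration: the text does not specify the matching rule, so the shared-substring test, the `min_len` parameter, the function names, and the sample keyword list are all assumptions.

```python
def shares_substring(a: str, b: str, min_len: int = 2) -> bool:
    """Return True if a and b share a substring of at least min_len characters."""
    grams = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in grams for i in range(len(b) - min_len + 1))


def candidate_first_strings(query: str, keyword_db: list[str], min_len: int = 2) -> list[str]:
    """Pick keywords that partially match the query (hypothetical matching rule)."""
    return [kw for kw in keyword_db
            if kw != query and shares_substring(query, kw, min_len)]


# Illustrative keyword group DB contents only.
keyword_db = ["ホゲ行動", "藤井君"]
print(candidate_first_strings("ホゲ神津尾", keyword_db))  # ['ホゲ行動']
```

Here "ホゲ行動" is selected because it shares the two-character substring "ホゲ" with the query, while "藤井君" shares nothing with it.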

To simplify the explanation, FIG. 1 shows the case where only one first character string is acquired. However, the information processing device 100 may acquire multiple first character strings, calculate the similarity between each of them and the query "ホゲ神津尾", and perform the similarity determination and the correction (change) of the character string on the first character string with the highest similarity.

The information processing device 100 converts the first character string CS1, the character string "ホゲ行動", into a variant notation (step S14). In FIG. 1, the information processing device 100 converts the first character string CS1 into Roman letters (romaji), generating the first variant-notation string DN1, the romanized string "hogekoudou". The information processing device 100 may use any information for this conversion as long as the first character string can be converted into the first variant-notation string. For example, it may perform the conversion using a list in which first character strings and first variant-notation strings are associated with each other.

The information processing device 100 also converts the second character string CS2, the character string "ホゲ神津尾", into a variant notation (step S15). In FIG. 1, the information processing device 100 converts the second character string CS2 into Roman letters, generating the second variant-notation string DN2, the romanized string "hogekouduo". The information processing device 100 may use any information for this conversion as long as the second character string can be converted into the second variant-notation string. For example, it may receive from the terminal device 10 input information indicating which keys user U1 touched (selected) when entering the character string "ホゲ神津尾", and perform the conversion using that input information.
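The two conversion routes mentioned for steps S14 and S15 (a correspondence list, and the key input information from the terminal) could be sketched as below. The table contents reuse the example strings from this description; the function name `to_variant`, the `typed_keys` parameter, and the lowercasing of the key sequence are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical correspondence list: character string -> romanized variant notation.
VARIANT_LIST = {
    "ホゲ行動": "hogekoudou",
    "ホゲ神津尾": "hogekouduo",
    "藤井君": "hujiikun",
    "ふしい君": "husiikun",
}


def to_variant(text: str, typed_keys: Optional[str] = None) -> Optional[str]:
    """Return the romanized variant of text, or None if no conversion is known.

    typed_keys, when given, is the raw key sequence reported by the terminal
    and is used in preference to the list (an assumption of this sketch).
    """
    if typed_keys:
        return typed_keys.lower()
    return VARIANT_LIST.get(text)


print(to_variant("ホゲ行動"))                            # hogekoudou
print(to_variant("ホゲ神津尾", typed_keys="hogekouduo"))  # hogekouduo
```

A real system would need a reading dictionary or morphological analyzer to romanize arbitrary kanji; the fixed table only stands in for that step.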

The information processing device 100 then calculates the similarity between the first character string and the second character string based on the first variant-notation string and the second variant-notation string (step S16). In FIG. 1, the information processing device 100 calculates the similarity between the first character string CS1 and the second character string CS2 based on the first variant-notation string DN1 and the second variant-notation string DN2. That is, it calculates the similarity between the character string "ホゲ行動" and the character string "ホゲ神津尾" based on the romanized string "hogekoudou" and the romanized string "hogekouduo".

For example, the information processing device 100 calculates the edit distance between the romanized string "hogekoudou" and the romanized string "hogekouduo" as the similarity between the character string "ホゲ行動" and the character string "ホゲ神津尾". In FIG. 1, the information processing device 100 calculates the similarity VL1 between the character string "ホゲ行動" and the character string "ホゲ神津尾". The similarity VL1 is assumed to be a specific numerical value. The above is merely an example, and the information processing device 100 may use various values as the similarity: for example, the reciprocal of the edit distance, or a normalized value of that reciprocal.
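As a concrete illustration of step S16, the sketch below computes an edit distance between the two romanized strings. The description does not name a specific edit-distance variant; this sketch assumes the optimal string alignment distance (insertion, deletion, substitution, and transposition of adjacent characters each cost 1), chosen because the description later counts the "uo" → "ou" transposition as a single operation. The `reciprocal_similarity` helper, including its +1 in the denominator to avoid division by zero, is likewise an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and adjacent transposition cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def reciprocal_similarity(distance: int) -> float:
    """Map a distance to (0, 1]; the +1 is an assumption of this sketch."""
    return 1.0 / (1.0 + distance)


dist = osa_distance("hogekoudou", "hogekouduo")
print(dist, reciprocal_similarity(dist))  # 1 0.5
```

With a plain Levenshtein distance (no transposition) the same pair would score 2, so the choice of variant affects where a threshold should sit.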

The information processing device 100 then determines the similarity between the first character string and the second character string using the calculated similarity (step S17). For example, when it determines that the similarity is high, it determines that the second character string is to be corrected to the first character string. The information processing device 100 makes this determination by comparing the calculated similarity with a threshold. The information processing device 100 may obtain the threshold from an external device, or may use a threshold stored in the determination information storage unit 121 (see FIG. 4).

In FIG. 1, the information processing device 100 uses a threshold TH1 to determine the similarity between the first character string and the second character string. The threshold TH1 is assumed to be a specific numerical value. The information processing device 100 compares the calculated similarity VL1 with the threshold TH1. In FIG. 1, because the calculated similarity VL1 is less than the threshold TH1, the information processing device 100 determines that the similarity between the character string "ホゲ行動" and the character string "ホゲ神津尾" is high. (The similarity in this example is an edit distance, so a smaller value means the strings are closer.) The information processing device 100 therefore determines, as shown in the determination result RS, that the query "ホゲ神津尾" needs correction. It then determines that the query "ホゲ神津尾" is to be changed to the character string "ホゲ行動", the first character string CS1, and that "ホゲ行動" is to be used as the corrected query.
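The threshold decision of step S17, combined with the option of comparing several first character strings and keeping the closest, might look like the following sketch. It assumes the similarity is an edit distance already computed on the variant-notation strings, so smaller is closer and correction applies when the value is below the threshold. The function name, the candidate "ホゲ工房", and all numeric values are hypothetical.

```python
def correct_query(query: str, distances: dict[str, int], threshold: int) -> str:
    """Return the corrected query, or the original query if no candidate is close enough.

    distances maps each candidate first string to its edit distance from the
    query, computed on the variant-notation (romanized) strings.
    """
    if not distances:
        return query
    best = min(distances, key=distances.get)  # candidate with the smallest distance
    return best if distances[best] < threshold else query


# Hypothetical distances and threshold, for illustration only.
print(correct_query("ホゲ神津尾", {"ホゲ行動": 1, "ホゲ工房": 4}, threshold=2))  # ホゲ行動
print(correct_query("ホゲ神津尾", {"ホゲ行動": 1}, threshold=1))                # ホゲ神津尾
```

If a reciprocal-style similarity were used instead, the comparison would flip to "greater than or equal to the threshold".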

The information processing device 100 corrects the query "ホゲ神津尾" to the corrected query "ホゲ行動". It then executes search processing using the corrected query "ホゲ行動".

The information processing device 100 then provides the search results for the corrected query "ホゲ行動" to the terminal device 10 used by user U1 (step S18). In FIG. 1, the information processing device 100 may provide the terminal device 10 with content that displays the search results for "ホゲ行動" together with a notice that the results are for the corrected query "ホゲ行動" rather than for the query "ホゲ神津尾". The way the information processing device 100 provides information is not limited to the above and may take various forms. For example, before providing search results, it may provide information asking user U1 to confirm whether to correct the query "ホゲ神津尾" to the corrected query "ホゲ行動". In that case, it may provide the terminal device 10 with the search results corresponding to the query that user U1 selected. The information processing device 100 may also provide the search results for the query "ホゲ神津尾" together with those for the corrected query "ホゲ行動".

As described above, the information processing device 100 can appropriately determine the similarity between character strings by basing the determination on variant-notation strings obtained by converting the character strings into a variant notation. For example, if the character string "ホゲ行動" and the character string "ホゲ神津尾" are compared as they are, they are entirely different apart from the leading "ホゲ". In contrast, when the variant-notation string "hogekoudou" of "ホゲ行動" is compared with the variant-notation string "hogekouduo" of "ホゲ神津尾", the two match up to "hogekoud". That is, the only difference between "hogekoudou" and "hogekouduo" is that the last two characters are "ou" and "uo" respectively, which is a transposition of adjacent characters. The information processing device 100 can therefore determine the similarity between the character strings appropriately by using the variant-notation strings.

For example, with existing methods that simply compare character strings, it is difficult to determine the similarity between strings when their character match rate is low. By determining similarity based on variant-notation strings, the information processing device 100 can determine similarity appropriately even between character strings with a low character match rate. The processing example above describes spelling correction in a search engine as one use case, but the processing by the information processing device 100 may be used for any purpose to which it is applicable, such as extracting misspellings and their corrections from a query log.

[1-1. Examples of the Relationship between Character Strings and Variant Notations]
Examples of the relationship between character strings and variant notations are described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the relationship between character strings and variant notations. The first case CS1 and the second case CS2 shown in FIG. 2 are examples in which the character match rate is low and similarity is difficult to determine with existing methods.

The first case CS1 in FIG. 2 is an example of a difference of only one character. Here, the first character string "藤井君" is a character string indicating a predetermined target (such as a celebrity). The second character string "ふしい君" is a character string that indicates no predetermined target, for example a mistyped "藤井君".

For example, if the character string "藤井君" and the character string "ふしい君" are compared as they are, they are entirely different apart from the final "君". In contrast, when the variant-notation string "hujiikun" of "藤井君" is compared with the variant-notation string "husiikun" of "ふしい君", the only difference is the third character, "j" versus "s": the strings differ by a single character.
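The contrast in the first case can be checked numerically. The sketch below uses a plain Levenshtein distance (an assumed choice; the description does not name one) to compare the strings before and after romanization.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertion, deletion, and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


# Direct comparison: only the final "君" matches.
print(levenshtein("藤井君", "ふしい君"))      # 3
# Variant-notation comparison: only the third character differs.
print(levenshtein("hujiikun", "husiikun"))  # 1
```

Relative to string length, the direct comparison changes nearly every character (3 edits against 3 to 4 characters), whereas the romanized comparison changes 1 of 8.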

The information processing device 100 can therefore determine the similarity between the character strings appropriately by using the variant-notation strings. The second case CS2 in FIG. 2 is the adjacent-character transposition example illustrated in FIG. 1, so a detailed description is omitted.

[1-2. Automata]
The information processing device 100 is not limited to the above and may determine the similarity of character strings using various techniques. For example, considering every possible romanization leaves room for improvement in computational efficiency.

The information processing device 100 may therefore determine the similarity of character strings using techniques relating to automata. An example is described with reference to FIG. 7. FIG. 7 is a conceptual diagram showing an example relating to automata.

Transition diagram GR1 in FIG. 7 is an automaton representation of the substring "神津尾" of the character string "ホゲ神津尾" described with FIG. 1. In transition diagram GR1, the path drawn with bold arrows (the first path) is the one selected. That is, FIG. 7 shows the case where the first path "k" → "o" → "u" → "d" → "u" → "o" is selected.

図７中の遷移図ＧＲ２は、図１で説明した文字列「ホゲ行動」中の文字列「行動」をオートマトンで表現した図を示す。例えば、遷移図ＧＲ２のうち点線になっている矢印の経路（第２経路）が選択されることを示す。すなわち、図７は「ｋ」→「ｏ」→「ｕ」→「ｄ」→「ｏ」→「ｕ」の順の第２経路が選択された場合を示す。 Transition diagram GR2 in FIG. 7 is an automaton representation of the character string "行動" within the character string "ホゲ行動" described with reference to FIG. 1. For example, the path indicated by the dotted arrows in transition diagram GR2 (second path) is selected. In other words, FIG. 7 shows the case where the second path, in the order "k" → "o" → "u" → "d" → "o" → "u", is selected.

図７の例では、遷移図ＧＲ１のうち太字になっている第１経路と、遷移図ＧＲ２のうち点線になっている第２経路の編集距離が１であり（ｕｏ→ｏｕの転置操作が発生）、これは全組み合わせの中で最小なので、「ホゲ神津尾」と「ホゲ行動」の編集距離は１となる。図７では、例えば、情報処理装置１００は、「ホゲ神津尾」と「ホゲ行動」との類似度を「１」と算出する。 In the example of FIG. 7, the edit distance between the first path (bold) in transition diagram GR1 and the second path (dotted) in transition diagram GR2 is 1 (a single transposition, uo → ou). This is the smallest value over all combinations of paths, so the edit distance between "ホゲ神津尾" and "ホゲ行動" is 1. In FIG. 7, for example, the information processing device 100 calculates the similarity between "ホゲ神津尾" and "ホゲ行動" as "1".

例えば、情報処理装置１００は、異表記候補をオートマトンで表現し、オートマトン間で効率的に計算可能な尺度（オートマトンの編集距離など）を類似度としてもよい。また、情報処理装置１００は、オートマトンは重み付き（ローマ字表記の確信度のようなもの）にして、重みを類似度に反映させても良い。 For example, the information processing device 100 may represent variant spelling candidates as automata, and use a measure that can be efficiently calculated between automata (such as the edit distance of the automata) as the similarity. The information processing device 100 may also weight the automata (such as the confidence level of romanization) and reflect the weight in the similarity.

例えば、情報処理装置１００は、全てのローマ字表記の可能性を、オートマトンを用いて簡潔に表現し、２つのオートマトンの編集距離（始点から終点までの経路ペアのうち編集距離が最小のもの）を類似度としてもよい。この場合、例えば、情報処理装置１００は、下記の文献に開示されている技術を用いてもよい。 For example, the information processing device 100 may express all possible romanizations concisely as an automaton, and may use the edit distance between two automata (the smallest edit distance over all pairs of paths from the start point to the end point) as the similarity. In this case, for example, the information processing device 100 may use the technology disclosed in the following document.
・Mehryar Mohri, “Edit-Distance of Weighted Automata: General Definitions and Algorithms”, International Journal of Foundations of Computer Science, 2003.
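
The definition of the edit distance between two automata (the minimum over all pairs of start-to-end paths) can be illustrated with a brute-force sketch. It is not the algorithm of the cited document, which avoids enumerating paths; it simply lists every accepted string of two small acyclic automata and takes the minimum pairwise distance. The automaton encoding, the function names, and the alternative readings given for each kanji are assumptions made for illustration.

```python
from itertools import product


def osa_distance(a: str, b: str) -> int:
    """Edit distance with insert, delete, substitute, and adjacent transpose."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def accepted_strings(fsa, state, final):
    """Enumerate every string spelled by a path from `state` to `final`."""
    if state == final:
        return [""]
    out = []
    for label, nxt in fsa.get(state, []):
        out.extend(label + rest for rest in accepted_strings(fsa, nxt, final))
    return out


def automaton_distance(fsa1, fsa2, final1, final2):
    """Minimum edit distance over all path pairs (brute force, start state 0)."""
    pairs = product(accepted_strings(fsa1, 0, final1),
                    accepted_strings(fsa2, 0, final2))
    return min(osa_distance(x, y) for x, y in pairs)


# Acyclic automata as {state: [(label, next_state), ...]}.
# The alternative readings are illustrative assumptions, not taken from FIG. 7.
GR1 = {0: [("kou", 1), ("kami", 1), ("sin", 1)],  # 神
       1: [("du", 2), ("tu", 2)],                 # 津
       2: [("o", 3), ("bi", 3)]}                  # 尾
GR2 = {0: [("kou", 1), ("gyou", 1)],              # 行
       1: [("dou", 2), ("ugo", 2)]}               # 動

print(automaton_distance(GR1, GR2, final1=3, final2=2))  # 1 (kouduo vs koudou)
```

Enumeration grows exponentially with the number of ambiguous segments, which is why a dedicated automaton algorithm is preferable for anything beyond toy inputs.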

なお、上記は一例に過ぎず、情報処理装置１００は、オートマトンに関する技術を適宜用いてもよい。情報処理装置１００は、カナとローマ字とのぞれぞれが別ルート（経路）で含まれているオートマトン等、複数種別の表記を含むオートマトンを用いてもよい。 Note that the above is merely an example, and the information processing device 100 may use automaton-related technology as appropriate. The information processing device 100 may use an automaton that includes multiple types of notation, such as an automaton that includes kana and romaji in separate routes.

〔１－３．文字列及び異表記の例〕 [1-3. Examples of character strings and variants]
なお、上記の例では、日本語を対象として処理を説明したが、情報処理装置１００は、日本語に限らず様々な言語が対象として処理を行ってもよい。例えば、情報処理装置１００は、英語、中国語等の様々な言語の文字列を対象として処理を行ってもよい。 In the above example, the processing was described for Japanese, but the information processing device 100 may process various languages other than Japanese. For example, the information processing device 100 may process character strings in various languages such as English and Chinese.

例えば、例えば、情報処理装置１００は、英語を対象とする場合、英語表記の文字列の発音を示す発音記号を異表記文字列として処理を行ってもよい。例えば、情報処理装置１００は、中国語を対象とする場合、中国語表記の文字列の発音を示すピンインを異表記文字列として処理を行ってもよい。なお、いずれの言語であっても、類似度を算出や類似性の判定については、上述した日本語の場合と同様に行えばよく、詳細な説明は省略する。 For example, when the information processing device 100 is targeting English, the information processing device 100 may process phonetic symbols indicating the pronunciation of a string written in English as a variant character string. For example, when the information processing device 100 is targeting Chinese, the information processing device 100 may process pinyin indicating the pronunciation of a string written in Chinese as a variant character string. Regardless of the language, the calculation of similarity and the determination of similarity can be performed in the same manner as in the case of Japanese described above, and detailed explanations will be omitted.
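
As a rough illustration of the same idea for English and Chinese, the sketch below looks up a pronunciation-based variant for each string and compares the variants instead of the surface forms. The lookup tables and the names `to_variant` and `levenshtein` are hypothetical; an actual system would presumably rely on a pronunciation dictionary or a grapheme-to-phoneme converter.

```python
# Hypothetical, hand-written pronunciation tables for illustration only.
ENGLISH_IPA = {"night": "naɪt", "knight": "naɪt", "light": "laɪt"}
CHINESE_PINYIN = {"北京": "beijing", "背景": "beijing", "北平": "beiping"}


def to_variant(text: str, language: str) -> str:
    """Return the pronunciation-based variant string for `text`."""
    table = {"en": ENGLISH_IPA, "zh": CHINESE_PINYIN}[language]
    return table[text]


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Surface forms share no characters, but the toneless pinyin is identical.
print(levenshtein("北京", "背景"))                                      # 2
print(levenshtein(to_variant("北京", "zh"), to_variant("背景", "zh")))  # 0
```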

〔２．情報処理装置の構成〕 [2. Configuration of the information processing device]
次に、図３を用いて、実施形態に係る情報処理装置１００の構成について説明する。図３は、実施形態に係る情報処理装置１００の構成例を示す図である。図３に示すように、情報処理装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。なお、情報処理装置１００は、情報処理装置１００の管理者等から各種操作を受け付ける入力部（例えば、キーボードやマウス等）や、各種情報を表示するための表示部（例えば、液晶ディスプレイ等）を有してもよい。 Next, the configuration of the information processing device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the configuration of the information processing device 100 according to the embodiment. As shown in FIG. 3, the information processing device 100 has a communication unit 110, a storage unit 120, and a control unit 130. Note that the information processing device 100 may have an input unit (e.g., a keyboard, a mouse, etc.) that accepts various operations from an administrator of the information processing device 100, and a display unit (e.g., a liquid crystal display, etc.) that displays various information.

（通信部１１０） (Communication unit 110)
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１０は、所定の通信網（ネットワーク）と有線または無線で接続され、端末装置１０との間で情報の送受信を行う。 The communication unit 110 is realized by, for example, a network interface card (NIC). The communication unit 110 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from the terminal device 10.

（記憶部１２０） (Storage unit 120)
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。実施形態に係る記憶部１２０は、図３に示すように、判定用情報記憶部１２１と、文字列情報記憶部１２２とを有する。なお、記憶部１２０は、上記以外にも様々な情報を記憶してもよい。 The storage unit 120 is realized by, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 3, the storage unit 120 according to the embodiment has a determination information storage unit 121 and a character string information storage unit 122. Note that the storage unit 120 may store various information other than the above.

（判定用情報記憶部１２１） (Determination information storage unit 121)
実施形態に係る判定用情報記憶部１２１は、判定に関する様々な情報を記憶する。例えば、判定用情報記憶部１２１は、閾値等、類似性の判定に用いる情報を記憶する。図４は、実施形態に係る判定用情報記憶部の一例を示す図である。図４では、判定用情報記憶部１２１は、「条件ＩＤ」、「条件情報」、「内容」といった項目を有する。 The determination information storage unit 121 according to the embodiment stores various information related to determination. For example, the determination information storage unit 121 stores information used for determining similarity, such as a threshold value. FIG. 4 is a diagram showing an example of the determination information storage unit according to the embodiment. In FIG. 4, the determination information storage unit 121 has items such as "condition ID", "condition information", and "content".

「条件ＩＤ」は、条件を識別する情報を示す。「条件情報」は、判定に用いる情報が記憶される。「内容」は、対応する条件情報がどのような処理に用いられるかを示す。 "Condition ID" indicates information that identifies the condition. "Condition information" stores information used for judgment. "Content" indicates what kind of processing the corresponding condition information is used for.

図４では、条件ＩＤ「ＣＤ１」により識別される条件は、閾値ＴＨ１であることを示す。なお、閾値ＴＨ１は、例えば０．７、５、１０等の具体的な数値であるものとする。また、閾値ＴＨ１は、類似性判定に用いられることを示す。 In FIG. 4, the condition identified by the condition ID "CD1" is the threshold value TH1. Note that the threshold value TH1 is a specific numerical value, such as 0.7, 5, 10, etc. Also, it is indicated that the threshold value TH1 is used for similarity determination.
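
The table of FIG. 4 could be modeled as a small keyed record store, as in the sketch below. The class and field names are hypothetical, and the value 0.7 for TH1 is simply one of the example values mentioned above, not a value fixed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    condition_id: str      # "condition ID": identifies the condition
    condition_info: float  # "condition information": value used for determination
    content: str           # "content": which processing the value is used for


# One row, mirroring the CD1 example; 0.7 is an assumed value for TH1.
DETERMINATION_INFO = {
    "CD1": Condition("CD1", 0.7, "similarity determination"),
}


def threshold_for(condition_id: str) -> float:
    """Look up the threshold registered under the given condition ID."""
    return DETERMINATION_INFO[condition_id].condition_info


print(threshold_for("CD1"))  # 0.7
```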

なお、判定用情報記憶部１２１は、上記に限らず、目的に応じて種々の情報を記憶してもよい。 The determination information storage unit 121 may store various types of information depending on the purpose, not limited to the above.

（文字列情報記憶部１２２） (Character string information storage unit 122)
実施形態に係る文字列情報記憶部１２２は、文字列に関する情報を記憶する。文字列情報記憶部１２２は、所定の対象を示す文字列を記憶する。例えば、文字列情報記憶部１２２は、新語など、所定の対象を示すがスペルミスが生じやすい文字列を記憶する。図５は、本開示の第１の実施形態に係る文字列情報記憶部の一例を示す図である。図５では、文字列情報記憶部１２２は、「文字列ＩＤ」、「文字列」といった項目が含まれる。 The character string information storage unit 122 according to the embodiment stores information related to character strings. The character string information storage unit 122 stores character strings indicating a predetermined target. For example, the character string information storage unit 122 stores character strings, such as new words, that indicate a predetermined target but are prone to spelling mistakes. FIG. 5 is a diagram illustrating an example of the character string information storage unit according to the first embodiment of the present disclosure. In FIG. 5, the character string information storage unit 122 includes items such as "character string ID" and "character string".

「文字列ＩＤ」は、所定の対象を示す文字列として登録された文字列を識別するための識別情報を示す。「文字列」は、文字列を示す。 "String ID" indicates identification information for identifying a string registered as a string indicating a specific object. "String" indicates a string.

図５に示す例では、文字列ＩＤ「ＫＷ１」により識別される文字列（文字列ＫＷ１）は、文字列「ホゲ行動」であることを示す。例えば、文字列「ホゲ行動」は、ユーザのスペルミスが多い文字列である。 In the example shown in FIG. 5, the character string identified by the character string ID "KW1" (character string KW1) is the character string "ホゲ行動". For example, the character string "ホゲ行動" is a character string that users frequently misspell.

なお、文字列情報記憶部１２２は、上記に限らず、目的に応じて種々の情報を記憶してもよい。例えば、文字列情報記憶部１２２に記憶される文字列（正解文字列）は、新語など、スペルミスが生じやすい文字列を管理する外部装置から情報処理装置１００が取得してもよい。 The string information storage unit 122 may store various information according to the purpose, not limited to the above. For example, the string (correct string) stored in the string information storage unit 122 may be acquired by the information processing device 100 from an external device that manages strings that are prone to spelling mistakes, such as new words.

（制御部１３０） (Control unit 130)
図３の説明に戻って、制御部１３０は、コントローラ（controller）であり、例えば、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）等によって、情報処理装置１００内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭを作業領域として実行されることにより実現される。また、制御部１３０は、コントローラであり、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現される。 Returning to the explanation of FIG. 3, the control unit 130 is a controller, and is realized, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 100, using a RAM as a working area. The control unit 130 is also a controller realized, for example, by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図３に示すように、制御部１３０は、取得部１３１と、算出部１３２と、判定部１３３と、処理部１３４と、提供部１３５とを有し、以下に説明する情報処理の機能や作用を実現または実行する。なお、制御部１３０の内部構成は、図３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、制御部１３０が有する各処理部の接続関係は、図３に示した接続関係に限られず、他の接続関係であってもよい。 As shown in FIG. 3, the control unit 130 has an acquisition unit 131, a calculation unit 132, a determination unit 133, a processing unit 134, and a providing unit 135, and realizes or executes the functions and actions of the information processing described below. Note that the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3, and may be any other configuration that performs the information processing described below. Likewise, the connection relationship between the processing units of the control unit 130 is not limited to the one shown in FIG. 3, and may be another connection relationship.

（取得部１３１） (Acquisition unit 131)
取得部１３１は、記憶部１２０から各種の情報を取得する。取得部１３１は、判定用情報記憶部１２１から類似性の判定に用いる情報を取得する。取得部１３１は、文字列情報記憶部１２２から、ユーザが入力したクエリとの比較対象となる文字列を取得する。 The acquisition unit 131 acquires various types of information from the storage unit 120. The acquisition unit 131 acquires information used for determining similarity from the determination information storage unit 121. The acquisition unit 131 acquires, from the character string information storage unit 122, a character string to be compared with a query input by a user.

取得部１３１は、通信部１１０を介して、外部の情報処理装置から各種情報を受信する。取得部１３１は、端末装置１０から各種情報を受信する。取得部１３１は、端末装置１０からユーザが入力したクエリを取得する。取得部１３１は、文字列を異表記に変換する外部装置（変換サーバ）から文字列が異表記に変換された異表記文字列を受信してもよい。例えば、取得部１３１は、提供部１３５が送信した文字列を受信した変換サーバが受信した文字列を異表記に変換した異表記文字列を変換サーバから受信してもよい。 The acquisition unit 131 receives various information from an external information processing device via the communication unit 110. The acquisition unit 131 receives various information from the terminal device 10. The acquisition unit 131 acquires a query input by a user from the terminal device 10. The acquisition unit 131 may receive a variant string, obtained by converting a character string into a variant notation, from an external device (conversion server) that performs such conversion. For example, when the conversion server receives a character string sent by the providing unit 135 and converts it into a variant notation, the acquisition unit 131 may receive the resulting variant string from the conversion server.

取得部１３１は、所定の文字列である第１文字列を取得する。取得部１３１は、所定の対象を示す第１文字列を取得する。取得部１３１は、第１文字列とは異なる文字列である第２文字列を取得する。取得部１３１は、対象を示さない第２文字列を取得する。取得部１３１は、ユーザが入力したクエリである第２文字列を取得する。 The acquisition unit 131 acquires a first string, which is a predetermined string. The acquisition unit 131 acquires a first string indicating a predetermined target. The acquisition unit 131 acquires a second string, which is a string different from the first string. The acquisition unit 131 acquires a second string that does not indicate a target. The acquisition unit 131 acquires a second string, which is a query input by a user.

取得部１３１は、所定の文字列である第１文字列を異表記に変換した第１異表記文字列を取得する。取得部１３１は、所定の対象を示す第１文字列の第１異表記文字列を取得する。取得部１３１は、第１文字列とは異なる文字列である第２文字列を異表記に変換した第２異表記文字列を取得する。取得部１３１は、対象を示さない第２文字列の第２異表記文字列を取得する。取得部１３１は、ユーザが入力したクエリである第２文字列の第２異表記文字列を取得する。 The acquisition unit 131 acquires a first variant string obtained by converting a first string, which is a predetermined string, into a variant. The acquisition unit 131 acquires a first variant string of a first string indicating a predetermined target. The acquisition unit 131 acquires a second variant string obtained by converting a second string, which is a string different from the first string, into a variant. The acquisition unit 131 acquires a second variant string of a second string that does not indicate a target. The acquisition unit 131 acquires a second variant string of a second string, which is a query input by a user.

取得部１３１は、第１文字列の発音を示す第１異表記文字列と、第２文字列の発音を示す第２異表記文字列とを取得する。取得部１３１は、第１文字列の発音記号である第１異表記文字列と、第２文字列の発音記号である第２異表記文字列とを取得する。取得部１３１は、日本語の表記体系に該当する第１文字列の発音を示す第１異表記文字列と、日本語の表記体系に該当する第２文字列の発音を示す第２異表記文字列とを取得する。 The acquisition unit 131 acquires a first variant character string indicating the pronunciation of the first string and a second variant character string indicating the pronunciation of the second string. The acquisition unit 131 acquires a first variant character string which is a phonetic symbol of the first string and a second variant character string which is a phonetic symbol of the second string. The acquisition unit 131 acquires a first variant character string which is an indication of the pronunciation of the first string corresponding to the Japanese writing system and a second variant character string which is an indication of the pronunciation of the second string corresponding to the Japanese writing system.

取得部１３１は、漢字、ひらがな、及びカタカナの少なくとも１つを含む第１文字列の発音を示す第１異表記文字列と、漢字、ひらがな、及びカタカナの少なくとも１つを含む第２文字列の発音を示す第２異表記文字列とを取得する。取得部１３１は、第１文字列がローマ字に変換された第１異表記文字列と、第２文字列がローマ字に変換された第２異表記文字列とを取得する。 The acquisition unit 131 acquires a first variant character string indicating the pronunciation of a first character string including at least one of kanji, hiragana, and katakana, and a second variant character string indicating the pronunciation of a second character string including at least one of kanji, hiragana, and katakana. The acquisition unit 131 acquires a first variant character string obtained by converting the first character string into roman characters, and a second variant character string obtained by converting the second character string into roman characters.

（算出部１３２） (Calculation unit 132)
算出部１３２は、各種情報を算出する。算出部１３２は、記憶部１２０に記憶された各種情報に基づいて、種々の情報を算出する。算出部１３２は、取得部１３１により取得された各種情報に基づいて、種々の情報を算出する。 The calculation unit 132 calculates various pieces of information. The calculation unit 132 calculates various pieces of information based on the various pieces of information stored in the storage unit 120. The calculation unit 132 calculates various pieces of information based on the various pieces of information acquired by the acquisition unit 131.

算出部１３２は、第１異表記文字列と、第２異表記文字列との類似度を算出する。算出部１３２は、第１異表記文字列と、第２異表記文字列との間の編集距離に基づいて、類似度を算出する。算出部１３２は、第１異表記文字列と、第２異表記文字列との間の編集距離を類似度として算出する。算出部１３２は、オートマトンにより導出した編集距離に基づいて、類似度を算出する。 The calculation unit 132 calculates the similarity between the first variant character string and the second variant character string. The calculation unit 132 calculates the similarity based on the edit distance between the first variant character string and the second variant character string. The calculation unit 132 calculates the edit distance between the first variant character string and the second variant character string as the similarity. The calculation unit 132 calculates the similarity based on the edit distance derived by the automaton.

算出部１３２は、各種情報を生成してもよい。算出部１３２は、文字列を変換した変換文字列を生成してもよい。算出部１３２は、第１文字列を異表記に変換した第１異表記文字列を生成してもよい。算出部１３２は、第２文字列を異表記に変換した第２異表記文字列を生成してもよい。算出部１３２は、文字列をローマ字表記に変換することにより、異表記文字列を生成してもよい。算出部１３２は、文字列を発音記号に変換することにより、異表記文字列を生成してもよい。 The calculation unit 132 may generate various information. The calculation unit 132 may generate a converted string by converting a string. The calculation unit 132 may generate a first variant string by converting a first string into a variant. The calculation unit 132 may generate a second variant string by converting a second string into a variant. The calculation unit 132 may generate a variant string by converting a string into romanized notation. The calculation unit 132 may generate a variant string by converting a string into phonetic symbols.
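
Generating romanized variant strings can be sketched as follows, assuming the kana reading of the string is already available (obtaining a reading from kanji would need a dictionary or morphological analyzer, which is outside this sketch). The table `ROMAJI` is a deliberately tiny, hypothetical mapping that lists more than one spelling where romanization systems differ.

```python
from itertools import product

# Partial, illustrative kana-to-romaji table with alternative spellings.
ROMAJI = {
    "ふ": ("hu", "fu"),
    "じ": ("zi", "ji"),
    "し": ("si", "shi"),
    "い": ("i",),
    "く": ("ku",),
    "ん": ("n",),
}


def romanizations(reading: str) -> set:
    """Return every romanized spelling of a hiragana reading."""
    options = [ROMAJI[ch] for ch in reading]
    return {"".join(parts) for parts in product(*options)}


print(sorted(romanizations("ふじいくん")))
# ['fujiikun', 'fuziikun', 'hujiikun', 'huziikun']
```

The number of candidates multiplies with each ambiguous kana, which is the efficiency concern that motivates the automaton representation in section 1-2.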

（判定部１３３） (Determination unit 133)
判定部１３３は、各種情報を判定する。例えば、判定部１３３は、取得部１３１により外部装置から取得された各種情報に基づいて、各種情報を判定する。例えば、判定部１３３は、記憶部１２０に記憶された情報に基づいて、各種情報を判定する。例えば、判定部１３３は、判定用情報記憶部１２１や文字列情報記憶部１２２に記憶された情報を用いて、判定を行う。 The determination unit 133 determines various pieces of information. For example, the determination unit 133 determines various pieces of information based on various pieces of information acquired from an external device by the acquisition unit 131. For example, the determination unit 133 determines various pieces of information based on information stored in the storage unit 120. For example, the determination unit 133 makes a determination using information stored in the determination information storage unit 121 or the character string information storage unit 122.

判定部１３３は、第１異表記文字列と、第２異表記文字列とに基づいて、第１文字列と第２文字列との類似性を判定する。判定部１３３は、算出部１３２により算出された類似度を用いて、第１文字列と第２文字列との類似性を判定する。判定部１３３は、類似度が所定値以上である場合、第１文字列と第２文字列との類似性が高いと判定する。判定部１３３は、第１文字列と第２文字列との類似性が高いと判定した場合、第２文字列を第１文字列に訂正すると判定する。 The determination unit 133 determines the similarity between the first string and the second string based on the first variant string and the second variant string. The determination unit 133 determines the similarity between the first string and the second string using the similarity calculated by the calculation unit 132. If the similarity is equal to or greater than a predetermined value, the determination unit 133 determines that the similarity between the first string and the second string is high. If the determination unit 133 determines that the similarity between the first string and the second string is high, it determines that the second string should be corrected to the first string.
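
The threshold test can be sketched as below. One assumption needs stating: a raw edit distance is smaller for closer strings, so to apply an "equal to or greater than a predetermined value" test this sketch first converts the distance into a normalized score in the range 0 to 1. That normalization, the function names, and the threshold 0.7 are illustrative choices, not details given in the source.

```python
from typing import Optional


def normalized_similarity(distance: int, len_a: int, len_b: int) -> float:
    """Map an edit distance to a score where 1.0 means identical strings."""
    longest = max(len_a, len_b)
    return 1.0 if longest == 0 else 1.0 - distance / longest


def decide_correction(first: str, second: str,
                      similarity: float, threshold: float) -> Optional[str]:
    """Return `first` as the correction for `second` when similarity is high."""
    return first if similarity >= threshold else None


# "hujiikun" vs "husiikun": edit distance 1 over 8 characters.
sim = normalized_similarity(1, len("hujiikun"), len("husiikun"))
print(sim)                                                          # 0.875
print(decide_correction("藤井君", "ふしい君", sim, threshold=0.7))  # 藤井君
```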

（処理部１３４） (Processing unit 134)
処理部１３４は、各種の処理を実行する。処理部１３４は、ユーザが入力したクエリに基づく検索処理を実行する。処理部１３４は、判定部１３３によりユーザが入力したクエリに訂正が不要と判定された場合、ユーザが入力したクエリを用いて検索処理を実行する。処理部１３４は、ユーザに提供する情報を生成する。処理部１３４は、ユーザに提供するコンテンツを生成する。 The processing unit 134 executes various processes. The processing unit 134 executes a search process based on a query input by the user. When the determination unit 133 determines that the query input by the user does not require correction, the processing unit 134 executes a search process using the query as input. The processing unit 134 generates information to be provided to the user. The processing unit 134 generates content to be provided to the user.

処理部１３４は、判定部１３３によりユーザが入力したクエリに訂正（変更）が必要と判定された場合、ユーザが入力したクエリの表記を訂正する。例えば、処理部１３４は、ユーザが入力したクエリの表記を、比較した文字列に訂正（変更）する。そして、処理部１３４は、訂正（変更）後の文字列を用いて検索処理を実行する。 When the determination unit 133 determines that the query input by the user needs to be corrected (changed), the processing unit 134 corrects the notation of the query input by the user. For example, the processing unit 134 corrects (changes) the notation of the query input by the user to the compared character string. Then, the processing unit 134 executes a search process using the corrected (changed) character string.
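
The branching described for the processing unit 134 (search with the query as entered, or with the corrected string) might look like the following sketch. The function `handle_query`, the in-memory `INDEX`, and the document identifiers are hypothetical stand-ins for an actual search backend.

```python
from typing import Callable, List, Optional


def handle_query(query: str, correction: Optional[str],
                 search: Callable[[str], List[str]]) -> List[str]:
    """Search with the corrected string if one was decided, else the raw query."""
    effective_query = correction if correction is not None else query
    return search(effective_query)


# Hypothetical toy index standing in for a search engine.
INDEX = {"ホゲ行動": ["doc-1", "doc-2"]}


def toy_search(q: str) -> List[str]:
    return INDEX.get(q, [])


print(handle_query("ホゲ神津尾", "ホゲ行動", toy_search))  # ['doc-1', 'doc-2']
print(handle_query("ホゲ神津尾", None, toy_search))        # []
```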

（提供部１３５） (Providing unit 135)
提供部１３５は、通信部１１０を介して、端末装置１０へ情報を送信する。提供部１３５は、ユーザが利用する端末装置１０へ検索サービスを提供する。例えば、提供部１３５は、処理部１３４による検索処理の結果である検索結果を端末装置１０へ送信する。提供部１３５は、処理部１３４により生成された情報を端末装置１０へ送信する。 The providing unit 135 transmits information to the terminal device 10 via the communication unit 110. The providing unit 135 provides a search service to the terminal device 10 used by the user. For example, the providing unit 135 transmits a search result, which is the result of a search process by the processing unit 134, to the terminal device 10. The providing unit 135 transmits information generated by the processing unit 134 to the terminal device 10.

提供部１３５は、文字列を異表記に変換する外部装置（変換サーバ）に文字列を送信してもよい。例えば、提供部１３５は、異表記への変換を要求する文字列を変換サーバへ送信してもよい。 The providing unit 135 may transmit the character string to an external device (conversion server) that converts the character string into a variant spelling. For example, the providing unit 135 may transmit the character string requesting conversion into a variant spelling to the conversion server.

〔３．処理フロー〕 [3. Processing flow]
次に、図６を用いて、実施形態に係る情報処理システム１による情報処理の手順について説明する。図６は、実施形態に係る情報処理装置による処理の一例を示すフローチャートである。 Next, the procedure of information processing by the information processing system 1 according to the embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of processing by the information processing device according to the embodiment.

図６に示すように、情報処理装置１００は、所定の文字列である第１文字列を異表記に変換した第１異表記文字列を取得する（ステップＳ１０１）。また、情報処理装置１００は、第１文字列とは異なる文字列である第２文字列を異表記に変換した第２異表記文字列を取得する（ステップＳ１０２）。 As shown in FIG. 6, the information processing device 100 obtains a first variant character string by converting a first character string, which is a predetermined character string, into a variant character string (step S101). The information processing device 100 also obtains a second variant character string by converting a second character string, which is a character string different from the first character string, into a variant character string (step S102).

情報処理装置１００は、第１異表記文字列と、第２異表記文字列とに基づいて、第１文字列と第２文字列との類似性を判定する（ステップＳ１０３）。 The information processing device 100 determines the similarity between the first string and the second string based on the first variant string and the second variant string (step S103).
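
Steps S101 to S103 can be strung together as in the sketch below. The lookup table `VARIANTS` stands in for whatever converter or conversion server supplies the variant strings, and the cutoff `max_distance` is an assumed parameter. Because this sketch compares a raw edit distance directly, the test is "distance at most a cutoff" rather than "similarity at least a threshold". Plain Levenshtein distance is used here, so a transposition counts as two edits.

```python
# Hypothetical variant strings; a real system would compute or fetch these.
VARIANTS = {
    "ホゲ行動": "hogekoudou",
    "ホゲ神津尾": "hogekouduo",
    "藤井君": "hujiikun",
}


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def determine_similarity(first: str, second: str, max_distance: int = 2) -> bool:
    first_variant = VARIANTS[first]    # step S101: acquire first variant string
    second_variant = VARIANTS[second]  # step S102: acquire second variant string
    # Step S103: judge similarity from the two variant strings.
    return levenshtein(first_variant, second_variant) <= max_distance


print(determine_similarity("ホゲ行動", "ホゲ神津尾"))  # True
print(determine_similarity("ホゲ行動", "藤井君"))      # False
```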

〔４．効果〕 [4. Effects]
上述してきたように、実施形態に係る情報処理装置１００は、取得部１３１と、判定部１３３とを有する。取得部１３１は、所定の文字列である第１文字列を異表記に変換した第１異表記文字列と、第１文字列とは異なる文字列である第２文字列を異表記に変換した第２異表記文字列とを取得する。判定部１３３は、第１異表記文字列と、第２異表記文字列とに基づいて、第１文字列と第２文字列との類似性を判定する。 As described above, the information processing device 100 according to the embodiment includes the acquisition unit 131 and the determination unit 133. The acquisition unit 131 acquires a first variant string obtained by converting a first string, which is a predetermined string, into a variant notation, and a second variant string obtained by converting a second string, which is a string different from the first string, into a variant notation. The determination unit 133 determines the similarity between the first string and the second string based on the first variant string and the second variant string.

このように、実施形態に係る情報処理装置１００は、文字列の異表記を用いて文字列間の類似性を判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between strings by determining the similarity between strings using different spellings of the strings.

また、実施形態に係る情報処理装置１００は、算出部１３２を有する。算出部１３２は、第１異表記文字列と、第２異表記文字列との類似度を算出する。判定部１３３は、算出部１３２により算出された類似度を用いて、第１文字列と第２文字列との類似性を判定する。 The information processing device 100 according to the embodiment also includes a calculation unit 132. The calculation unit 132 calculates the similarity between the first differently written character string and the second differently written character string. The determination unit 133 uses the similarity calculated by the calculation unit 132 to determine the similarity between the first character string and the second character string.

このように、実施形態に係る情報処理装置１００は、第１異表記文字列と、第２異表記文字列との類似度を算出し、算出した類似度を用いて、第１文字列と第２文字列との類似性を判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment calculates the similarity between a first differently written string and a second differently written string, and uses the calculated similarity to determine the similarity between the first string and the second string, thereby appropriately determining the similarity between the strings.

また、実施形態に係る情報処理装置１００において、算出部１３２は、第１異表記文字列と、第２異表記文字列との間の編集距離に基づいて、類似度を算出する。 In addition, in the information processing device 100 according to the embodiment, the calculation unit 132 calculates the similarity based on the edit distance between the first variant character string and the second variant character string.

このように、実施形態に係る情報処理装置１００は、第１異表記文字列と、第２異表記文字列との間の編集距離に基づいて、類似度を算出することにより、文字列間の類似度を適切に算出することができる。 In this way, the information processing device 100 according to the embodiment can appropriately calculate the similarity between strings by calculating the similarity based on the edit distance between the first variant string and the second variant string.

また、実施形態に係る情報処理装置１００において、算出部１３２は、第１異表記文字列と、第２異表記文字列との間の編集距離を類似度として算出する。 In addition, in the information processing device 100 according to the embodiment, the calculation unit 132 calculates the edit distance between the first variant character string and the second variant character string as the similarity.

このように、実施形態に係る情報処理装置１００は、第１異表記文字列と、第２異表記文字列との間の編集距離を類似度とすることにより、文字列間の類似度を適切に算出することができる。 In this way, the information processing device 100 according to the embodiment can appropriately calculate the similarity between strings by using the edit distance between the first differently written string and the second differently written string as the similarity.

また、実施形態に係る情報処理装置１００において、算出部１３２は、オートマトンにより導出した編集距離に基づいて、類似度を算出する。 In addition, in the information processing device 100 according to the embodiment, the calculation unit 132 calculates the similarity based on the edit distance derived by the automaton.

このように、実施形態に係る情報処理装置１００は、オートマトンにより導出した編集距離に基づいて、類似度を算出することにより、文字列間の類似度を適切に算出することができる。 In this way, the information processing device 100 according to the embodiment can appropriately calculate the similarity between strings by calculating the similarity based on the edit distance derived by the automaton.

また、実施形態に係る情報処理装置１００において、判定部１３３は、類似度が所定値以上である場合、第１文字列と第２文字列との類似性が高いと判定する。 In addition, in the information processing device 100 according to the embodiment, the determination unit 133 determines that the similarity between the first string and the second string is high if the similarity is equal to or greater than a predetermined value.

このように、実施形態に係る情報処理装置１００は、類似度が所定値以上である場合、第１文字列と第２文字列との類似性が高いと判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between the strings by determining that the similarity between the first string and the second string is high when the similarity is equal to or greater than a predetermined value.

また、実施形態に係る情報処理装置１００において、取得部１３１は、所定の対象を示す第１文字列の第１異表記文字列と、第２文字列の第２異表記文字列とを取得する。判定部１３３は、第１文字列と第２文字列との類似性が高いと判定した場合、第２文字列を第１文字列に訂正すると判定する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant character string of a first character string indicating a predetermined target and a second variant character string of a second character string. When the determination unit 133 determines that the first character string and the second character string are highly similar, it determines to correct the second character string to the first character string.

このように、実施形態に係る情報処理装置１００は、所定の対象を示す第１文字列と、第２文字列との類似性が高いと判定した場合、第２文字列を第１文字列に訂正すると判定することにより、文字列間の類似性に応じて文字列を適切に訂正可能にすることができる。 In this way, when the information processing device 100 according to the embodiment determines that there is a high similarity between a first string indicating a specific object and a second string, it determines to correct the second string to the first string, thereby making it possible to appropriately correct the string according to the similarity between the strings.

また、実施形態に係る情報処理装置１００において、取得部１３１は、第１文字列の発音を示す第１異表記文字列と、第２文字列の発音を示す第２異表記文字列とを取得する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant string indicating the pronunciation of the first string and a second variant string indicating the pronunciation of the second string.

このように、実施形態に係る情報処理装置１００は、文字列の発音を示す異表記を用いて文字列間の類似性を判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between strings by determining the similarity between strings using variants that indicate the pronunciation of the strings.

また、実施形態に係る情報処理装置１００において、取得部１３１は、第１文字列の発音記号である第１異表記文字列と、第２文字列の発音記号である第２異表記文字列とを取得する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant string, which is the phonetic symbol of the first string, and a second variant string, which is the phonetic symbol of the second string.

このように、実施形態に係る情報処理装置１００は、文字列の発音記号を用いて文字列間の類似性を判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between strings by using the phonetic symbols of the strings.

また、実施形態に係る情報処理装置１００において、取得部１３１は、日本語の表記体系に該当する第１文字列の発音を示す第１異表記文字列と、日本語の表記体系に該当する第２文字列の発音を示す第２異表記文字列とを取得する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant character string indicating the pronunciation of a first character string corresponding to the Japanese writing system, and a second variant character string indicating the pronunciation of a second character string corresponding to the Japanese writing system.

このように、実施形態に係る情報処理装置１００は、日本語の文字列を対象として、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between character strings when targeting Japanese character strings.

また、実施形態に係る情報処理装置１００において、取得部１３１は、漢字、ひらがな、及びカタカナの少なくとも１つを含む第１文字列の発音を示す第１異表記文字列と、漢字、ひらがな、及びカタカナの少なくとも１つを含む第２文字列の発音を示す第２異表記文字列とを取得する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant character string indicating the pronunciation of a first character string including at least one of kanji, hiragana, and katakana, and a second variant character string indicating the pronunciation of a second character string including at least one of kanji, hiragana, and katakana.

このように、実施形態に係る情報処理装置１００は、漢字、ひらがな、及びカタカナの少なくとも１つを含む文字列を対象として、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between character strings that include at least one of kanji, hiragana, and katakana.
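As a minimal sketch of handling character strings that mix kana scripts, the following Python function maps katakana to hiragana so that strings written in either script yield the same reading. The function name `katakana_to_hiragana` is hypothetical and does not appear in the embodiment. Kanji are left unchanged here, because obtaining their readings would require a dictionary or a morphological analyzer, which this sketch does not attempt.

```python
def katakana_to_hiragana(text: str) -> str:
    """Map katakana letters to hiragana; leave every other character as-is.

    Katakana U+30A1..U+30F6 sit exactly 0x60 code points above the
    corresponding hiragana U+3041..U+3096 in Unicode.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            converted.append(chr(code - 0x60))
        else:
            # Kanji, hiragana, the long-vowel mark, and so on pass through.
            converted.append(ch)
    return "".join(converted)


# Both spellings normalize to the same hiragana reading.
reading_a = katakana_to_hiragana("ラーメン")
reading_b = katakana_to_hiragana("らーめん")
```

With this normalization, a katakana query and a hiragana query for the same word can be compared on equal footing before any further conversion.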

また、実施形態に係る情報処理装置１００において、取得部１３１は、第１文字列がローマ字に変換された第１異表記文字列と、第２文字列がローマ字に変換された第２異表記文字列とを取得する。 In addition, in the information processing device 100 according to the embodiment, the acquisition unit 131 acquires a first variant character string in which the first character string is converted into roman characters, and a second variant character string in which the second character string is converted into roman characters.

このように、実施形態に係る情報処理装置１００は、文字列がローマ字に変換された異表記を用いて文字列間の類似性を判定することにより、文字列間の類似性を適切に判定することができる。 In this way, the information processing device 100 according to the embodiment can appropriately determine the similarity between character strings by using variant notations obtained by converting the character strings into Roman characters.
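The determination based on Roman-character variants can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the table `ROMAJI` is a deliberately tiny, hypothetical hiragana-to-romaji mapping, the function names are hypothetical, and normalizing the edit distance by the longer string length is one possible choice of similarity measure.

```python
# Hypothetical, deliberately small hiragana-to-romaji table (Hepburn style).
ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "ん": "n",
}


def to_romaji(text: str) -> str:
    """Convert each known hiragana to romaji; unknown characters pass through."""
    return "".join(ROMAJI.get(ch, ch) for ch in text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(first: str, second: str) -> float:
    """Similarity in [0, 1] between the romaji variants of two strings.

    Assumption: 1 - distance / longer length. The embodiment only states
    that the similarity is based on an edit distance.
    """
    a, b = to_romaji(first), to_romaji(second)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_similar(first: str, second: str, threshold: float = 0.8) -> bool:
    """True when the similarity reaches the (hypothetical) threshold."""
    return similarity(first, second) >= threshold
```

For example, "さくら" and "さくな" become "sakura" and "sakuna", which differ by one of six letters, so the similarity is about 0.83. Compared directly as kana, the same pair differs by one of three characters. The Roman-character variant therefore gives credit for the shared vowel, which the kana comparison cannot see.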

〔５．ハードウェア構成〕 [5. Hardware Configuration]
また、上述した実施形態に係る情報処理装置１００や端末装置１０は、例えば図８に示すような構成のコンピュータ１０００によって実現される。以下、情報処理装置１００を例に挙げて説明する。図８は、ハードウェア構成の一例を示す図である。コンピュータ１０００は、出力装置１０１０、入力装置１０２０と接続され、演算装置１０３０、一次記憶装置１０４０、二次記憶装置１０５０、出力Ｉ／Ｆ（Interface）１０６０、入力Ｉ／Ｆ１０７０、ネットワークＩ／Ｆ１０８０がバス１０９０により接続された形態を有する。 The information processing device 100 and the terminal device 10 according to the above-described embodiment are realized by, for example, a computer 1000 configured as shown in Fig. 8. The information processing device 100 is described below as an example. Fig. 8 is a diagram showing an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output I/F (Interface) 1060, an input I/F 1070, and a network I/F 1080 are connected by a bus 1090.

演算装置１０３０は、一次記憶装置１０４０や二次記憶装置１０５０に格納されたプログラムや入力装置１０２０から読み出したプログラム等に基づいて動作し、各種の処理を実行する。演算装置１０３０は、例えばＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等により実現される。 The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 and the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The arithmetic device 1030 is realized, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like.

一次記憶装置１０４０は、ＲＡＭ（Random Access Memory）等、演算装置１０３０が各種の演算に用いるデータを一次的に記憶するメモリ装置である。また、二次記憶装置１０５０は、演算装置１０３０が各種の演算に用いるデータや、各種のデータベースが登録される記憶装置であり、ＲＯＭ（Read Only Memory）、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、フラッシュメモリ等により実現される。二次記憶装置１０５０は、内蔵ストレージであってもよいし、外付けストレージであってもよい。また、二次記憶装置１０５０は、ＵＳＢメモリやＳＤ（Secure Digital）メモリカード等の取り外し可能な記憶媒体であってもよい。また、二次記憶装置１０５０は、クラウドストレージ（オンラインストレージ）やＮＡＳ（Network Attached Storage）、ファイルサーバ等であってもよい。 The primary storage device 1040 is a memory device, such as a RAM (Random Access Memory), that temporarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is realized by a ROM (Read Only Memory), an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The secondary storage device 1050 may be internal storage or external storage. The secondary storage device 1050 may also be a removable storage medium such as a USB memory or an SD (Secure Digital) memory card, or may be cloud storage (online storage), a NAS (Network Attached Storage), a file server, or the like.

出力Ｉ／Ｆ１０６０は、ディスプレイ、プロジェクタ、及びプリンタ等といった各種の情報を出力する出力装置１０１０に対し、出力対象となる情報を送信するためのインターフェイスであり、例えば、ＵＳＢ（Universal Serial Bus）やＤＶＩ（Digital Visual Interface）、ＨＤＭＩ（登録商標）（High Definition Multimedia Interface）といった規格のコネクタにより実現される。また、入力Ｉ／Ｆ１０７０は、マウス、キーボード、キーパッド、ボタン、及びスキャナ等といった各種の入力装置１０２０から情報を受信するためのインターフェイスであり、例えば、ＵＳＢ等により実現される。 The output I/F 1060 is an interface for transmitting information to be output to the output device 1010, which outputs various types of information and is, for example, a display, a projector, or a printer. The output I/F 1060 is realized by a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input I/F 1070 is an interface for receiving information from various input devices 1020, such as a mouse, a keyboard, a keypad, a button, or a scanner, and is realized by, for example, USB.

また、出力Ｉ／Ｆ１０６０及び入力Ｉ／Ｆ１０７０はそれぞれ出力装置１０１０及び入力装置１０２０と無線で接続してもよい。すなわち、出力装置１０１０及び入力装置１０２０は、ワイヤレス機器であってもよい。 In addition, the output I/F 1060 and the input I/F 1070 may be wirelessly connected to the output device 1010 and the input device 1020, respectively. That is, the output device 1010 and the input device 1020 may be wireless devices.

また、出力装置１０１０及び入力装置１０２０は、タッチパネルのように一体化していてもよい。この場合、出力Ｉ／Ｆ１０６０及び入力Ｉ／Ｆ１０７０も、入出力Ｉ／Ｆとして一体化していてもよい。 The output device 1010 and the input device 1020 may be integrated together, such as a touch panel. In this case, the output I/F 1060 and the input I/F 1070 may also be integrated together as an input/output I/F.

なお、入力装置１０２０は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、又は半導体メモリ等から情報を読み出す装置であってもよい。 The input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

ネットワークＩ／Ｆ１０８０は、ネットワークＮを介して他の機器からデータを受信して演算装置１０３０へ送り、また、ネットワークＮを介して演算装置１０３０が生成したデータを他の機器へ送信する。 The network I/F 1080 receives data from other devices via the network N and sends it to the arithmetic device 1030, and also transmits data generated by the arithmetic device 1030 to other devices via the network N.

演算装置１０３０は、出力Ｉ／Ｆ１０６０や入力Ｉ／Ｆ１０７０を介して、出力装置１０１０や入力装置１０２０の制御を行う。例えば、演算装置１０３０は、入力装置１０２０や二次記憶装置１０５０からプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行する。 The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output I/F 1060 and the input I/F 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

例えば、コンピュータ１０００が情報処理装置１００として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラムを実行することにより、制御部１３０の機能を実現する。また、コンピュータ１０００の演算装置１０３０は、ネットワークＩ／Ｆ１０８０を介して他の機器から取得したプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行してもよい。また、コンピュータ１０００の演算装置１０３０は、ネットワークＩ／Ｆ１０８０を介して他の機器と連携し、プログラムの機能やデータ等を他の機器の他のプログラムから呼び出して利用してもよい。 For example, when the computer 1000 functions as the information processing device 100, the arithmetic device 1030 of the computer 1000 realizes the functions of the control unit 130 by executing a program loaded onto the primary storage device 1040. The arithmetic device 1030 of the computer 1000 may also load a program acquired from another device via the network I/F 1080 onto the primary storage device 1040 and execute the loaded program. The arithmetic device 1030 of the computer 1000 may also cooperate with other devices via the network I/F 1080 and use program functions, data, and the like by calling them from other programs on those devices.

〔６．その他〕 [6. Others]
以上、本願の実施形態を説明したが、これら実施形態の内容により本発明が限定されるものではない。また、前述した構成要素には、当業者が容易に想定できるもの、実質的に同一のもの、いわゆる均等の範囲のものが含まれる。さらに、前述した構成要素は適宜組み合わせることが可能である。さらに、前述した実施形態の要旨を逸脱しない範囲で構成要素の種々の省略、置換又は変更を行うことができる。 Although the embodiments of the present application have been described above, the present invention is not limited to the contents of these embodiments. The above-described components include those that can be easily conceived by a person skilled in the art, those that are substantially the same, and those that are within the so-called range of equivalents. Furthermore, the above-described components can be combined as appropriate. Furthermore, various omissions, substitutions, or modifications of the components can be made without departing from the gist of the above-described embodiments.

また、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部又は一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部又は一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Furthermore, among the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or all or part of the processes described as being performed manually can be performed automatically using known methods. In addition, the information including the processing procedures, specific names, various data, and parameters shown in the above documents and drawings can be changed as desired unless otherwise specified. For example, the various information shown in each drawing is not limited to the information shown in the drawings.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部又は一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的又は物理的に分散・統合して構成することができる。 In addition, each component of each device shown in the figure is a functional concept, and does not necessarily have to be physically configured as shown in the figure. In other words, the specific form of distribution and integration of each device is not limited to that shown in the figure, and all or part of them can be functionally or physically distributed and integrated in any unit depending on various loads, usage conditions, etc.

例えば、上述した情報処理装置１００は、複数のサーバコンピュータで実現してもよく、また、機能によっては外部のプラットホーム等をＡＰＩ（Application Programming Interface）やネットワークコンピューティング等で呼び出して実現するなど、構成は柔軟に変更できる。 For example, the information processing device 100 described above may be realized by multiple server computers, and depending on the functions, the configuration can be flexibly changed, such as by calling an external platform using an API (Application Programming Interface) or network computing.

また、上述してきた実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 The above-described embodiments and variations can be combined as appropriate to the extent that they do not cause inconsistencies in the processing content.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、取得部は、取得手段や取得回路に読み替えることができる。 The term "unit" (section, module, unit) used above can be read as "means" or "circuit." For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

REFERENCE SIGNS LIST
１情報処理システム 1 Information processing system
１００情報処理装置 100 Information processing device
１２０記憶部 120 Storage unit
１２１判定用情報記憶部 121 Determination information storage unit
１２２文字列情報記憶部 122 Character string information storage unit
１３０制御部 130 Control unit
１３１取得部 131 Acquisition unit
１３２算出部 132 Calculation unit
１３３判定部 133 Determination unit
１３４処理部 134 Processing unit
１３５提供部 135 Provision unit
１０端末装置 10 Terminal device

Claims

1. An information processing device comprising:
an acquisition unit that acquires a first variant character string obtained by converting a first character string, which is a predetermined character string, into a variant notation, and a second variant character string obtained by converting a second character string, which is a character string different from the first character string and has been input by a user as a search query, into a variant notation;
a determination unit that determines a similarity between the first character string and the second character string based on the first character string and the second character string; and
a provision unit that provides the user with search results of a search process using a character string selected according to the determination result of the determination unit and, when the search query input by the user is corrected, provides a first search result obtained by the search process using a corrected query, which is the corrected character string, and a second search result obtained by the search process using the search query as input by the user.

2. The information processing device according to claim 1, further comprising a calculation unit that calculates a degree of similarity between the first variant character string and the second variant character string,
wherein the determination unit determines the similarity between the first character string and the second character string using the degree of similarity calculated by the calculation unit.

3. The information processing device according to claim 2, wherein the calculation unit calculates the degree of similarity based on an edit distance between the first variant character string and the second variant character string.

4. The information processing device according to claim 3, wherein the calculation unit calculates the edit distance between the first variant character string and the second variant character string as the degree of similarity.

5. The information processing device according to claim 3, wherein the calculation unit calculates the degree of similarity based on the edit distance derived by an automaton.

6. The information processing device according to claim 2, wherein the determination unit determines that the similarity between the first character string and the second character string is high when the degree of similarity is equal to or greater than a predetermined value.

7. The information processing device according to claim 6, wherein
the acquisition unit acquires the first variant character string of the first character string and the second variant character string of the second character string that indicate a predetermined object, and
the determination unit determines, when the similarity between the first character string and the second character string is determined to be high, that the second character string is to be corrected to the first character string.

8. The information processing device according to any one of claims 1 to 7, wherein the acquisition unit acquires the first variant character string indicating a pronunciation of the first character string and the second variant character string indicating a pronunciation of the second character string.

9. The information processing device according to claim 8, wherein the acquisition unit acquires the first variant character string, which consists of phonetic symbols of the first character string, and the second variant character string, which consists of phonetic symbols of the second character string.

10. The information processing device according to claim 8, wherein the acquisition unit acquires the first variant character string indicating a pronunciation of the first character string corresponding to a Japanese writing system and the second variant character string indicating a pronunciation of the second character string corresponding to a Japanese writing system.

11. The information processing device according to claim 10, wherein the acquisition unit acquires the first variant character string indicating a pronunciation of the first character string including at least one of kanji, hiragana, and katakana, and the second variant character string indicating a pronunciation of the second character string including at least one of kanji, hiragana, and katakana.

12. The information processing device according to claim 10 or 11, wherein the acquisition unit acquires the first variant character string obtained by converting the first character string into Roman characters and the second variant character string obtained by converting the second character string into Roman characters.

13. An information processing method executed by a computer, the method comprising:
an acquisition step of acquiring a first variant character string obtained by converting a first character string, which is a predetermined character string, into a variant notation, and a second variant character string obtained by converting a second character string, which is a character string different from the first character string and has been input by a user as a search query, into a variant notation;
a determination step of determining a similarity between the first character string and the second character string based on the first character string and the second character string; and
a provision step of providing the user with search results of a search process using a character string selected according to the determination result of the determination step and, when the search query input by the user is corrected, providing a first search result obtained by the search process using a corrected query, which is the corrected character string, and a second search result obtained by the search process using the search query as input by the user.

14. An information processing program causing a computer to execute:
an acquisition step of acquiring a first variant character string obtained by converting a first character string, which is a predetermined character string, into a variant notation, and a second variant character string obtained by converting a second character string, which is a character string different from the first character string and has been input by a user as a search query, into a variant notation;
a determination step of determining a similarity between the first character string and the second character string based on the first character string and the second character string; and
a provision step of providing the user with search results of a search process using a character string selected according to the determination result of the determination step and, when the search query input by the user is corrected, providing a first search result obtained by the search process using a corrected query, which is the corrected character string, and a second search result obtained by the search process using the search query as input by the user.