JP7615841B2

JP7615841B2 - Voice recognition program and voice recognition device

Info

Publication number: JP7615841B2
Application number: JP2021060947A
Authority: JP
Inventors: 正樹中村
Original assignee: Aisin Seiki Co Ltd; Aisin Corp
Current assignee: Aisin Corp
Priority date: 2021-01-29
Filing date: 2021-03-31
Publication date: 2025-01-17
Anticipated expiration: 2041-03-31
Also published as: JP2022117376A; JP2022117375A; JP7542826B2; JP2022117374A; JP7552481B2

Description

The present invention relates to a voice recognition program and a voice recognition device.

Patent Document 1 discloses a technique in which a recognition dictionary is provided for each type of phrase (address, facility name, telephone number, etc.). Speech recognition is performed by searching for the input speech in the recognition dictionary for the phrase type that the user has selected in advance. Specifically, each recognition dictionary stores character strings that represent pronunciations in kana, alphabetic characters, or the like (hereinafter "pronunciation strings"), together with the phrases corresponding to those pronunciation strings. The matching phrase is obtained by searching the pronunciation strings in the recognition dictionary for the input string, i.e., the input speech converted into a character string. Because a separate recognition dictionary is provided for each vocabulary type, each dictionary holds fewer pronunciation strings and phrases than a single dictionary covering every vocabulary type would. This shortens the search time.

Patent Document 1: JP 2006-162782 A (see, e.g., paragraphs 0012-0015 and 0033-0037, and FIGS. 1, 4, and 5)

However, although the recognition dictionaries are organized by phrase type, each dictionary still stores pronunciation strings of various lengths depending on the phrase. For example, for a five-character input string, the recognition dictionary also holds pronunciation strings of two, three, or some other number of characters. A search must therefore examine pronunciation strings whose character count differs from the input string's, not only those with the same count. As a result, the search takes time, and the response of voice recognition may become slow.

The present invention has been made to solve the above problem. Its object is to provide a voice recognition program and a voice recognition device that improve the response of voice recognition.

To achieve this object, the voice recognition program of the present invention causes a computer having a storage unit to execute voice recognition processing. The program causes the storage unit to function as a conversion dictionary storage means. That means stores a plurality of pieces of conversion dictionary data, each made up of combinations of a pronunciation string and the phrase corresponding to that pronunciation string, and divided according to a feature of the pronunciation string. The program comprises:
- a speech conversion step of converting input speech into an input string representing its pronunciation;
- a feature acquisition step of acquiring the feature of the input string converted in the speech conversion step;
- a dictionary acquisition step of acquiring, from the conversion dictionary storage means, the conversion dictionary data corresponding to the feature acquired in the feature acquisition step;
- a search step of searching the conversion dictionary data acquired in the dictionary acquisition step for a phrase corresponding to the input string converted in the speech conversion step; and
- an output step of outputting the found phrase if a phrase was found in the search step, and outputting the input string converted in the speech conversion step if no phrase was found.
The conversion dictionary data is created from full dictionary data, which stores every combination of a pronunciation string and its corresponding phrase contained in the plurality of pieces of conversion dictionary data. When the full dictionary data is updated, the combinations in the updated full dictionary data are rearranged based on the features of their pronunciation strings. Conversion dictionary data for each feature is then created from the rearranged combinations.
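The claimed flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation:
- The speech conversion step is omitted; the input string is assumed to already be a kana string.
- The full dictionary is assumed to be a list of `(pronunciation, phrase)` pairs.
- The names `build_conversion_dictionaries` and `recognize` are hypothetical.
- The feature function is passed in as a parameter, so any of the features named in the dependent claims can be plugged in.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]  # hypothetical layout: (pronunciation string, phrase)


def build_conversion_dictionaries(
    full_dict: List[Entry], feature: Callable[[str], object]
) -> Dict[object, Dict[str, str]]:
    """Create one conversion dictionary per feature value from the full dictionary.

    Call again whenever the full dictionary is updated.
    """
    # Rearrange: sort by feature so entries sharing a feature become contiguous.
    # (Assumes the feature values are mutually comparable, e.g. all ints.)
    ordered = sorted(full_dict, key=lambda entry: feature(entry[0]))
    dictionaries: Dict[object, Dict[str, str]] = {}
    # Split the rearranged sequence into one dictionary per feature value.
    for key, group in groupby(ordered, key=lambda entry: feature(entry[0])):
        # If a pronunciation occurs twice, the later entry wins (an assumption;
        # the text does not say how duplicates are resolved).
        dictionaries[key] = {pron: phrase for pron, phrase in group}
    return dictionaries


def recognize(input_string: str,
              dictionaries: Dict[object, Dict[str, str]],
              feature: Callable[[str], object]) -> str:
    key = feature(input_string)              # feature acquisition step
    candidates = dictionaries.get(key, {})   # dictionary acquisition step
    # Search and output steps: the phrase if found, otherwise the input string.
    return candidates.get(input_string, input_string)
```

With `len` as the feature, `recognize` consults only the dictionary whose pronunciations have the input's length. A word missing from that dictionary is returned unchanged, which mirrors the claimed fallback output.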

The voice recognition device of the present invention comprises:
- a conversion dictionary storage means that stores a plurality of pieces of conversion dictionary data, each made up of combinations of a pronunciation string and the phrase corresponding to that pronunciation string, and divided according to a feature of the pronunciation string;
- a speech input means for inputting speech;
- a speech conversion means for converting the speech input by the speech input means into an input string representing its pronunciation;
- a feature acquisition means for acquiring the feature of the input string converted by the speech conversion means;
- a dictionary acquisition means for acquiring, from among the conversion dictionary data stored in the conversion dictionary storage means, the conversion dictionary data corresponding to the feature acquired by the feature acquisition means;
- a search means for searching the conversion dictionary data acquired by the dictionary acquisition means for a phrase corresponding to the input string converted by the speech conversion means; and
- an output means for outputting the found phrase if the search means found a phrase, and outputting the input string converted by the speech conversion means if the search means found no phrase.
The conversion dictionary data is created from full dictionary data, which stores every combination of a pronunciation string and its corresponding phrase contained in the plurality of pieces of conversion dictionary data. When the full dictionary data is updated, the combinations in the updated full dictionary data are rearranged based on the features of their pronunciation strings. Conversion dictionary data for each feature is then created from the rearranged combinations.

According to the voice recognition program of claim 1, a plurality of pieces of conversion dictionary data are stored. Each is made up of combinations of a pronunciation string and its corresponding phrase, and they are divided according to the feature of the pronunciation string. The feature is acquired from the input string representing the pronunciation of the input speech, and the conversion dictionary data corresponding to that feature is acquired. The acquired conversion dictionary data is then searched for a phrase corresponding to the input string. If a phrase is found, it is output; if not, the input string itself is output.

That is, the conversion dictionary data used for the phrase search stores only pronunciation strings (and their phrases) that share the input string's feature, i.e., only pronunciation strings similar to the input string. The phrase whose pronunciation string matches the input string can therefore be found quickly in that conversion dictionary data, which improves the response of voice recognition.
In addition, the conversion dictionary data is created from full dictionary data that stores every combination of a pronunciation string and its corresponding phrase contained in the plurality of pieces of conversion dictionary data. When the full dictionary data is updated, the combinations in the updated full dictionary data are rearranged based on the features of their pronunciation strings. Conversion dictionary data for each feature is then created from the rearranged combinations.
Conversion dictionary data created in this way reliably includes those combinations updated in the full dictionary data whose pronunciation strings have the feature assigned to that conversion dictionary data. Furthermore, because the conversion dictionary data is created automatically whenever the full dictionary data is updated, the user's effort in maintaining the conversion dictionary data is reduced.

According to the voice recognition program of claim 2, the following effect is achieved in addition to the effect of the voice recognition program of claim 1. The input speech is converted into an input string broken down into words, and the feature of each word-level input string is acquired. The conversion dictionary data corresponding to each of these features is then acquired. Each acquired conversion dictionary data is searched for the phrase corresponding to its word-level input string.

In other words, a feature is acquired for each word-level input string, and the phrase is searched for in the conversion dictionary data corresponding to that feature. This gives finer-grained and more accurate phrase output than searching the conversion dictionary data for an input string containing multiple words as a whole.
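A sketch of the per-word variant follows. It makes two assumptions:
- Morphological analysis has already split the input into word-level strings. A real system would use a morphological analyzer, which is not shown here.
- The feature is the character count, chosen for concreteness.

`build_by_length` and `recognize_words` are hypothetical names.

```python
from typing import Dict, List, Tuple


def build_by_length(entries: List[Tuple[str, str]]) -> Dict[int, Dict[str, str]]:
    """Group (pronunciation, phrase) pairs by pronunciation length."""
    buckets: Dict[int, Dict[str, str]] = {}
    for pron, phrase in entries:
        buckets.setdefault(len(pron), {})[pron] = phrase
    return buckets


def recognize_words(words: List[str], buckets: Dict[int, Dict[str, str]]) -> List[str]:
    """Look up each word in the bucket for its own feature; keep unmatched words as-is."""
    result = []
    for word in words:
        bucket = buckets.get(len(word), {})  # uses this word's feature only
        result.append(bucket.get(word, word))
    return result
```

Because each word selects its own bucket, a short particle and a long word in the same utterance are searched in different, smaller dictionaries.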

According to the voice recognition program of claim 3, the following effects are achieved in addition to those of the voice recognition program of claim 1 or 2. The feature is the number of characters in the pronunciation string or the input string. Accordingly, the conversion dictionary data is stored by the number of characters in the pronunciation string. The number of characters in the input string is then obtained, and the conversion dictionary data for that number of characters is acquired and used for the phrase search.

This makes it possible to skip searching conversion dictionary data whose pronunciation strings do not match the number of characters in the input string, so the phrase matching the input string can be found quickly. In addition, the number of characters in an input string or pronunciation string can be obtained easily without complex analysis. The conversion dictionary data matching the feature (number of characters) of the input string can therefore be acquired quickly. Together, these improve the response of voice recognition.
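To make the saving concrete, the sketch below counts how many pronunciation strings a lookup must compare against, with and without length buckets. The sample strings are made up for illustration. Counting candidate entries is a simplification of the actual search cost.

```python
from collections import defaultdict
from typing import Dict, List


def candidates_without_buckets(prons: List[str], query: str) -> int:
    # A single undivided dictionary: every pronunciation string is a candidate.
    return len(prons)


def candidates_with_length_buckets(prons: List[str], query: str) -> int:
    # Buckets are rebuilt per call here for brevity; a real system builds them once.
    buckets: Dict[int, List[str]] = defaultdict(list)
    for p in prons:
        buckets[len(p)].append(p)
    # Only strings with the same number of characters as the query are compared.
    return len(buckets.get(len(query), []))


prons = ["は", "あめ", "きょう", "ほんじつ", "ありがとう", "ございます"]
print(candidates_without_buckets(prons, "ありがとう"))      # 6
print(candidates_with_length_buckets(prons, "ありがとう"))  # 2
```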

According to the voice recognition program of claim 4, the following effects are achieved in addition to those of the voice recognition program of claim 1 or 2. The feature is the first character of the pronunciation string or the input string. Accordingly, the conversion dictionary data is stored by the first character of the pronunciation string. The first character of the input string is then obtained, and the conversion dictionary data for that first character is acquired and used for the phrase search.

This makes it possible to skip searching conversion dictionary data whose pronunciation strings do not match the first character of the input string, so the phrase matching the input string can be found quickly. In addition, the first character of an input string or pronunciation string can be obtained easily without complex analysis. The conversion dictionary data matching the feature (first character) of the input string can therefore be acquired quickly. Together, these improve the response of voice recognition.
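Using the first character as the feature changes only the key function. The sketch below uses hypothetical names and sample data. The text does not say how an empty input string is handled, so this sketch simply maps it to `None`.

```python
from typing import Dict, List, Optional, Tuple


def first_char(s: str) -> Optional[str]:
    # Feature for claim 4: the first character, or None for an empty string.
    return s[0] if s else None


def build_by_first_char(entries: List[Tuple[str, str]]) -> Dict[Optional[str], Dict[str, str]]:
    buckets: Dict[Optional[str], Dict[str, str]] = {}
    for pron, phrase in entries:
        buckets.setdefault(first_char(pron), {})[pron] = phrase
    return buckets


def lookup(query: str, buckets: Dict[Optional[str], Dict[str, str]]) -> str:
    # Only the bucket sharing the query's first character is searched;
    # the query itself is returned if no phrase matches.
    return buckets.get(first_char(query), {}).get(query, query)
```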

According to the voice recognition program of claim 5, the following effects are achieved in addition to those of the voice recognition program of claim 1 or 2. The feature is the combination of the number of characters and the first character of the pronunciation string or the input string. Accordingly, the conversion dictionary data is stored by that combination for each pronunciation string. The same combination is then obtained for the input string, and the matching conversion dictionary data is acquired and used for the phrase search.

This makes it possible to skip searching conversion dictionary data whose pronunciation strings do not match both the number of characters and the first character of the input string, so the matching phrase can be found quickly. Furthermore, using the combination of the number of characters and the first character as the feature reduces the number of pronunciation strings and phrases in each piece of conversion dictionary data. Each piece is smaller than it would be with the number of characters alone or the first character alone as the feature. This also speeds up the search for the phrase matching the input string.

In addition, the number of characters and the first character of an input string or pronunciation string can be obtained easily without complex analysis. The conversion dictionary data corresponding to the feature of the input string (the combination of number of characters and first character) can therefore be acquired quickly. Together, these improve the response of voice recognition.
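The sketch below compares, for a single query, the size of the bucket that must be searched under each of the three features. The sample pronunciation strings are illustrative only. The combined key (length, first character) selects exactly the entries that match both single-feature keys, so its bucket is never larger than either of the others.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def bucket_size(prons: List[str], query: str, feature: Callable[[str], Hashable]) -> int:
    # Number of pronunciation strings that share the query's feature value.
    sizes = Counter(feature(p) for p in prons)
    return sizes[feature(query)]


def by_length(s: str) -> int:
    return len(s)


def by_first(s: str) -> str:
    return s[0]


def by_both(s: str) -> Tuple[int, str]:
    return (len(s), s[0])


prons = ["ありがとう", "あした", "あめ", "ほんじつ", "ほし", "ございます", "は"]
for name, feat in [("length", by_length), ("first char", by_first), ("both", by_both)]:
    print(name, bucket_size(prons, "ありがとう", feat))
# length 2, first char 3, both 1
```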

The voice recognition device of claim 6 achieves the same effects as the voice recognition program of claim 1.

FIG. 1 is an external view of a mobile terminal.
FIG. 2(a) schematically shows the creation of conversion dictionary data from full dictionary data, and FIG. 2(b) schematically shows the replacement of an input string using the conversion dictionary data.
FIG. 3 is a block diagram showing the electrical configuration of the mobile terminal.
FIG. 4(a) is a flowchart of the voice processing, and FIG. 4(b) is a flowchart of the conversion dictionary data creation process.
FIG. 5 is a flowchart of the dictionary application process.
FIG. 6(a) schematically shows the creation of conversion dictionary data from full dictionary data in a second embodiment, and FIG. 6(b) schematically shows the replacement of an input string using the conversion dictionary data in the second embodiment.
FIG. 7 is a block diagram showing the electrical configuration of the mobile terminal of the second embodiment.
FIG. 8(a) is a flowchart of the conversion dictionary data creation process of the second embodiment, and FIG. 8(b) is a flowchart of the dictionary application process of the second embodiment.

A preferred embodiment of the present invention will now be described with reference to the accompanying drawings. First, the configuration of a mobile terminal 1 according to this embodiment will be described with reference to FIG. 1, which is an external view of the mobile terminal 1. The mobile terminal 1 is an information processing device (computer) that performs voice recognition on speech uttered by a user H. It accepts a voice V as input and converts the voice V into an input string Ti, a character string representing its pronunciation. The mobile terminal 1 then looks up the input string Ti in conversion dictionary data Sd, described later with reference to FIG. 2. It replaces the input string Ti with an appropriate phrase W and displays the phrase on an LCD 16 (see FIG. 3).

Next, the conversion dictionary data Sd and the replacement of the input string Ti using the conversion dictionary data Sd will be described with reference to FIG. 2. FIG. 2(a) schematically shows the creation of the conversion dictionary data Sd from full dictionary data Ad. FIG. 2(b) schematically shows the replacement of the input string Ti using the conversion dictionary data Sd.

The mobile terminal 1 is provided with two types of dictionary data: full dictionary data Ad and conversion dictionary data Sd. The full dictionary data Ad stores a plurality of combinations, each pairing a pronunciation string Tp with a phrase W. A pronunciation string Tp is a string of phonetic characters such as hiragana or katakana. A phrase W is a string of kanji, alphabetic characters, or the like corresponding to that pronunciation string Tp. The full dictionary data Ad stores phrases W in units of words together with their pronunciation strings Tp. It contains every combination of phrase W and pronunciation string Tp used to replace the input string Ti.

The conversion dictionary data Sd is created from the full dictionary data Ad. The full dictionary data Ad stores combinations of a pronunciation string Tp and its corresponding phrase W (hereinafter "combinations of pronunciation strings Tp and phrases W"). The conversion dictionary data Sd is built by dividing these combinations into groups according to the number of characters in the pronunciation string Tp.

That is, the conversion dictionary data Sd includes one-character dictionary data Sd1, which stores combinations of one-character pronunciation strings Tp and their corresponding phrases W. It also includes two-character dictionary data Sd2, which stores the same kind of combinations for two-character pronunciation strings Tp. Likewise, three- to ten-character dictionary data Sd3 to Sd10 store the combinations for pronunciation strings Tp of three to ten characters, respectively. Hereinafter, the one- to ten-character dictionary data Sd1 to Sd10 are collectively referred to as "M-character dictionary data SdM."
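Building the M-character dictionaries Sd1 to Sd10 from Ad can be sketched as follows. The embodiment only describes M = 1 to 10 and does not say what happens to longer pronunciation strings. The hypothetical `build_m_character_dictionaries` therefore returns such entries separately instead of guessing.

```python
from typing import Dict, List, Tuple

MAX_M = 10  # the embodiment describes Sd1 through Sd10


def build_m_character_dictionaries(
    full_dict: List[Tuple[str, str]]
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str]]]:
    """Split (pronunciation, phrase) pairs into Sd1..Sd10 by pronunciation length."""
    tables: Dict[str, Dict[str, str]] = {f"Sd{m}": {} for m in range(1, MAX_M + 1)}
    out_of_range: List[Tuple[str, str]] = []  # entries the text does not cover
    for pron, phrase in full_dict:
        m = len(pron)
        if 1 <= m <= MAX_M:
            tables[f"Sd{m}"][pron] = phrase
        else:
            out_of_range.append((pron, phrase))
    return tables, out_of_range
```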

The input string Ti is replaced using the conversion dictionary data Sd configured in this way. Specifically, as shown in FIG. 2(b), the input string Ti is first decomposed by morphological analysis into word-level input strings TiN (N is a natural number). In the example of FIG. 2(b), the input string Ti is "ほんじつはありがとうございます" (honjitsu wa arigatou gozaimasu, "thank you very much for today"). Morphological analysis decomposes it into four words:
- input string Ti1 "ほんじつ" (honjitsu),
- input string Ti2 "は" (wa),
- input string Ti3 "ありがとう" (arigatou), and
- input string Ti4 "ございます" (gozaimasu).

Each word-level input string TiN is then looked up in the conversion dictionary data Sd to obtain its phrase W. Specifically, the number of characters of each input string TiN is obtained. The conversion dictionary data Sd whose pronunciation strings Tp have that number of characters is then acquired. Its pronunciation strings Tp are searched for the input string TiN, and the phrase W corresponding to the matching pronunciation string Tp is obtained.

In the example of FIG. 2(b):
- Input string Ti1 has 4 characters, so the four-character dictionary data Sd4 is acquired from the conversion dictionary data Sd, and the phrase W whose pronunciation string Tp matches Ti1 is obtained from Sd4.
- Input string Ti2 has 1 character, so the one-character dictionary data Sd1 is acquired, and the phrase W whose pronunciation string Tp matches Ti2 is obtained from it.
- Input strings Ti3 and Ti4 each have 5 characters, so the five-character dictionary data Sd5 is acquired, and the phrases W whose pronunciation strings Tp match Ti3 and Ti4 are obtained from it.
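The dictionary selection in this example reduces to counting characters. The text does not discuss one practical caveat: voiced kana such as じ and が can be stored either precomposed or as a base kana plus a combining dakuten, and the decomposed form counts as two code points. The sketch below (hypothetical `select_dictionary`) therefore normalizes to Unicode NFC before counting.

```python
import unicodedata


def select_dictionary(word: str) -> str:
    """Return the name of the M-character dictionary for a kana word."""
    # NFC composes e.g. し + U+3099 into じ, so each kana counts as one character.
    m = len(unicodedata.normalize("NFC", word))
    return f"Sd{m}"


for word in ["ほんじつ", "は", "ありがとう", "ございます"]:
    print(word, select_dictionary(word))
# ほんじつ Sd4
# は Sd1
# ありがとう Sd5
# ございます Sd5
```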

In this way, M-character dictionary data SdM is provided for each number of characters in the pronunciation string Tp. Each phrase W is searched for in the M-character dictionary data SdM whose character count matches that of the input string TiN. Searching only that SdM avoids comparing the input string TiN against pronunciation strings Tp of any other length, so the phrase W can be found quickly.

In addition, the number of characters in a pronunciation string Tp or an input string TiN can be obtained easily without complex analysis. Both creating the conversion dictionary data Sd and acquiring the M-character dictionary data SdM that matches the length of the input string TiN are therefore fast. As a result, the time from when the user H inputs the voice V until the phrase W corresponding to that voice V is displayed is kept short.

Furthermore, the input string Ti obtained from the user H's speech is decomposed into word-level input strings TiN, and the conversion dictionary data Sd is searched for each input string TiN. This yields phrases W word by word, with finer granularity and higher accuracy than searching the conversion dictionary data Sd for a multi-word input string Ti as a whole.

Next, the electrical configuration of the mobile terminal 1 will be described with reference to FIG. 3, which is a block diagram of that configuration. As shown in FIG. 3, the mobile terminal 1 has a CPU 10, a flash ROM 11, and a RAM 12, each connected to an input/output port 14 via a bus line 13. The input/output port 14 is also connected to:
- a microphone 15 for inputting the voice V,
- an LCD 16 for displaying the phrases W and other voice recognition results, and
- a touch panel 17 for inputting instructions from the user H.

The CPU 10 is an arithmetic unit that controls the components connected via the bus line 13. The flash ROM 11 is a rewritable non-volatile memory that holds:
- a voice recognition program 11a,
- full dictionary data 11b, which stores the full dictionary data Ad, and
- conversion dictionary data 11c, which stores the conversion dictionary data Sd.
When the CPU 10 executes the voice recognition program 11a, the voice processing of FIG. 4 is performed.

The conversion dictionary data 11c includes:
- one-character dictionary data 11c1, which stores the one-character dictionary data Sd1 described above,
- two-character dictionary data 11c2, which stores the two-character dictionary data Sd2, and
- likewise, three- to ten-character dictionary data 11c3 to 11c10, which store the three- to ten-character dictionary data Sd3 to Sd10, respectively.
Hereinafter, the one- to ten-character dictionary data 11c1 to 11c10 are collectively referred to as "M-character dictionary data 11cM."

The RAM 12 is a memory that rewritably stores various work data, flags, and the like while the CPU 10 executes the voice recognition program 11a. It includes an output character string memory 12a, which stores the words W obtained from the input character strings TiN.

Next, the processing executed by the CPU 10 of the mobile terminal 1 will be described with reference to FIGS. 4 and 5. FIG. 4(a) is a flowchart of the voice processing. The voice processing is executed when the user H inputs an instruction to execute the voice recognition program 11a via the touch panel 17 or the like.

The voice processing first checks whether the all dictionary data Ad in the all dictionary data 11b has been updated (S1). Specifically, the combinations of pronunciation character strings Tp and words W in the all dictionary data Ad can be updated by adding or deleting a combination, or by modifying its pronunciation character string Tp or word W, and S1 checks whether the user H has issued an instruction via the touch panel 17 to update the all dictionary data Ad.

If the all dictionary data Ad has been updated in S1 (S1: Yes), the combinations of pronunciation character strings Tp and words W in the all dictionary data Ad may no longer match those stored in the conversion dictionary data 11c, so the conversion dictionary data creation process (S2) is executed. The conversion dictionary data creation process will be described with reference to FIG. 4(b).

FIG. 4(b) is a flowchart of the conversion dictionary data creation process. The process first acquires the updated all dictionary data Ad from the all dictionary data 11b (S20). After S20, the combinations of pronunciation character strings Tp and words W in the acquired all dictionary data Ad are sorted by the number of characters in the pronunciation character string Tp (S21).

After S21, the combinations of pronunciation character strings Tp and words W are extracted, for each character count, from the all dictionary data Ad sorted by the number of characters in the pronunciation character string Tp, to create dictionary data; each dictionary data is stored in the M-character dictionary data 11cM of the corresponding character count in the conversion dictionary data 11c (S22). As a result, the M-character dictionary data 11cM of the conversion dictionary data 11c stores the same combinations of pronunciation character strings Tp and words W as the updated all dictionary data Ad.

After S22, the conversion dictionary data creation process ends.
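The creation process S20 to S22 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes the all dictionary data Ad is a list of (pronunciation character string Tp, word W) pairs and models the M-character dictionary data 11cM as a hypothetical mapping from character count M to a Tp-to-W table. The function name and sample entries are likewise hypothetical.

```python
from collections import defaultdict

# Assumption: the all dictionary data Ad is modelled as a list of (Tp, W) pairs.
Entry = tuple[str, str]


def create_length_dictionaries(all_dict: list[Entry]) -> dict[int, dict[str, str]]:
    """Hypothetical sketch of S20-S22 (first embodiment)."""
    # S21: sort the (Tp, W) combinations by the number of characters in Tp.
    ordered = sorted(all_dict, key=lambda entry: len(entry[0]))
    # S22: split the sorted list into one Tp -> W table per character count M.
    by_length: dict[int, dict[str, str]] = defaultdict(dict)
    for tp, w in ordered:
        by_length[len(tp)][tp] = w  # a later duplicate Tp overwrites an earlier one
    return dict(by_length)


# Hypothetical sample entries.
ad = [("くうき", "空気"), ("あさ", "朝"), ("すんでいる", "澄んでいる"), ("かさ", "傘")]
sd = create_length_dictionaries(ad)
print(sorted(sd))  # [2, 3, 5]
print(sd[2])       # {'あさ': '朝', 'かさ': '傘'}
```

Because the sketch groups entries into hash maps, the sort in S21 is not strictly needed for correctness here; it is kept only to mirror the flowchart.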

Returning to FIG. 4(a): if the all dictionary data Ad has not been updated in S1 (S1: No), S2 is skipped. After S1 and S2, the output character string memory 12a is cleared (S3). After S3, the voice V input from the microphone 15 is converted into a character string to obtain the input character string Ti described above (S4). A well-known method is used to convert the voice V into a character string, so a detailed description is omitted.

After S4, the acquired input character string Ti is subjected to morphological analysis to obtain an input character string TiN for each word (S5). A known method is used for the morphological analysis, so a detailed description is omitted. After S5, the dictionary application process (S6) is executed, which will be described with reference to FIG. 5.

FIG. 5 is a flowchart of the dictionary application process. The process first obtains the number of characters in each input character string TiN obtained in S5 of FIG. 4(a) (S30). After S30, a counter variable N, which indicates the position of an input character string TiN within the input character string Ti, is set to 1 (S31). For example, N = 1 designates the input character string Ti1 described above, and N = 2 designates the input character string Ti2. Hereinafter, the input character string TiN at position N within the input character string Ti is referred to as the "Nth input character string TiN."

After S31, the M-character dictionary data 11cM whose character count matches that of the Nth input character string TiN is acquired from the conversion dictionary data 11c (S32). After S32, the Nth input character string TiN is searched for in the M-character dictionary data 11cM acquired in S32 to obtain the word W corresponding to it (S33).

Specifically, the M-character dictionary data 11cM acquired in S32 is searched for a pronunciation character string Tp that matches the Nth input character string TiN; if a match is found, the corresponding word W is acquired. If no matching pronunciation character string Tp is found in the M-character dictionary data 11cM, the Nth input character string TiN itself is acquired as the word W.

After S33, the acquired word W is appended to the output character string memory 12a (S34). After S34, the counter variable N is incremented by 1 (S35), and it is checked whether N exceeds the number of input character strings TiN into which the input character string Ti was decomposed (S36). If N is less than or equal to that number (S36: No), the processing from S32 onward is repeated. If N exceeds that number (S36: Yes), the dictionary application process ends.
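A minimal Python sketch of the dictionary application loop (S30 to S36) follows. It assumes a hypothetical data model in which the M-character dictionary data is a mapping from character count to a Tp-to-W table, and the per-word input character strings TiN are taken as already produced by a morphological analyser (not shown). The function name and sample words are illustrative only.

```python
def apply_length_dictionaries(words: list[str],
                              by_length: dict[int, dict[str, str]]) -> str:
    """Hypothetical sketch of S30-S36 (first embodiment)."""
    output: list[str] = []  # stands in for output character string memory 12a
    for tin in words:  # N = 1, 2, ... (S31, S35, S36)
        # S30/S32: pick the M-character dictionary whose M equals len(TiN);
        # an empty table is used if no pronunciation has that length.
        table = by_length.get(len(tin), {})
        # S33: an exact match on Tp gives W; otherwise TiN itself is used as W.
        # S34: append the result to the output memory.
        output.append(table.get(tin, tin))
    return "".join(output)


sd = {2: {"あさ": "朝"}, 3: {"くうき": "空気"}, 5: {"すんでいる": "澄んでいる"}}
print(apply_length_dictionaries(["あさ", "は", "くうき", "が", "すんでいる"], sd))
# 朝は空気が澄んでいる
```

The sketch computes each length inside the loop rather than all at once in S30; the result is the same. Joining without separators assumes Japanese output, which does not put spaces between words.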

Returning to FIG. 4(a): after the dictionary application process of S6, the character string stored in the output character string memory 12a is displayed on the LCD 16 (S7). In this way, the voice V input by the user H is converted into the input character string Ti, and each part of the input character string Ti that matches a pronunciation character string Tp stored in the conversion dictionary data 11c is replaced with the corresponding word W before being displayed on the LCD 16.

After S7, it is checked whether an instruction to end the voice processing has been received from the user H via the touch panel 17 (S8). If no such instruction has been received (S8: No), the processing from S1 onward is repeated; if it has been received (S8: Yes), the voice processing ends.

Next, a mobile terminal 100 according to a second embodiment will be described with reference to FIGS. 6 to 8. In the first embodiment described above, the conversion dictionary data Sd is created separately for each number of characters in the pronunciation character string Tp; the dictionary data whose character count matches that of the input character string TiN (i.e., the M-character dictionary data SdM) is acquired from the conversion dictionary data Sd, and the word W is obtained using that dictionary data.

In contrast, in the second embodiment, the conversion dictionary data Sd is created separately for each first character of the pronunciation character string Tp; the dictionary data whose first character matches that of the input character string TiN is acquired from the conversion dictionary data Sd, and the word W is obtained using that dictionary data. Parts identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

FIG. 6(a) schematically shows the creation of the conversion dictionary data Sd from the all dictionary data Ad in the second embodiment, and FIG. 6(b) schematically shows the replacement of an input character string Ti using the conversion dictionary data Sd in the second embodiment. In the second embodiment, the conversion dictionary data Sd is created for each first character of the pronunciation character string Tp from the combinations of pronunciation character strings Tp and words W in the all dictionary data Ad.

Specifically, the conversion dictionary data Sd of the second embodiment is built by grouping the combinations of pronunciation character strings Tp and words W stored in the all dictionary data Ad by the first character of the pronunciation character string Tp. In the second embodiment, hiragana are used as the first characters. That is, the conversion dictionary data Sd includes "あ" (a) dictionary data Sd20, which stores the combinations of pronunciation character strings Tp beginning with "あ" and their corresponding words W, and "い" (i) dictionary data Sd21, which stores the combinations of pronunciation character strings Tp beginning with "い" and their corresponding words W; likewise, "う" (u) to "ん" (n) dictionary data Sd22 to Sd65 store the combinations of pronunciation character strings Tp beginning with "う" to "ん", respectively, and their corresponding words W. Hereinafter, the "あ" to "ん" dictionary data Sd20 to Sd65 are collectively referred to as "P dictionary data SdP."

In the second embodiment, a combination of a pronunciation character string Tp and a word W whose first character is a voiced kana such as "が" (ga) or a semi-voiced kana such as "ぴ" (pi) is stored in the P dictionary data SdP for the same kana without the voiced or semi-voiced mark (for example, "か" (ka) for "が" and "ひ" (hi) for "ぴ"). This is not limiting; separate P dictionary data SdP may instead be provided in the conversion dictionary data Sd for combinations whose pronunciation character string Tp begins with a voiced or semi-voiced kana.
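One possible way to fold voiced and semi-voiced initials is Unicode canonical decomposition (NFD), which splits a kana such as "が" into "か" followed by a combining voiced sound mark. The sketch below assumes Unicode input and uses Python's standard `unicodedata` module. The patent does not say how the folding is implemented, and the function name is hypothetical.

```python
import unicodedata


def first_char_key(tp: str) -> str:
    """Hypothetical bucket key: first kana of tp with any dakuten or
    handakuten removed, so that "が" falls into the "か" bucket."""
    if not tp:
        raise ValueError("pronunciation character string must not be empty")
    # NFD turns "が" (U+304C) into "か" (U+304B) + U+3099 and
    # "ぴ" (U+3074) into "ひ" (U+3072) + U+309A; keep only the base kana.
    return unicodedata.normalize("NFD", tp[0])[0]


print(first_char_key("がっこう"), first_char_key("ぴあの"), first_char_key("あさ"))
# か ひ あ
```

The alternative mentioned above, separate P dictionary data for voiced initials, would correspond to using `tp[0]` directly as the key.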

The input character string Ti is replaced using the conversion dictionary data Sd configured in this way. In the example of FIG. 6(b), the input character string Ti is "あさはくうきがすんでいる" (asahakuukigasundeiru). Morphological analysis decomposes it into five words: an input character string Ti1 "あさ" (asa) with first character "あ", Ti2 "は" (ha) with first character "は", Ti3 "くうき" (kuuki) with first character "く", Ti4 "が" (ga) with first character "が", and Ti5 "すんでいる" (sundeiru) with first character "す".

Then, for each decomposed input character string TiN, the conversion dictionary data Sd is referenced to obtain the word W. Specifically, the first character of each input character string TiN is obtained, and the P dictionary data SdP whose pronunciation character strings Tp begin with that character is acquired.

In the example of FIG. 6(b), the first character of the input character string Ti1 is "あ", so the "あ" dictionary data Sd20 is acquired from the conversion dictionary data Sd, and the word W of the pronunciation character string Tp matching Ti1 is obtained from it. Similarly, the first character of Ti2 is "は", so the "は" dictionary data Sd45 is acquired and the word W is obtained from it, and the first character of Ti3 is "く", so the "く" dictionary data Sd27 is acquired and the word W is obtained from it. The first character of Ti4 is "が", so the "か" dictionary data Sd25 is acquired and the word W is obtained from it, and the first character of Ti5 is "す", so the "す" dictionary data Sd32 is acquired and the word W is obtained from it.
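The dictionary numbers in this example are consistent with numbering the 46 basic hiragana in gojūon order starting at Sd20. The following sketch checks that reading. The ordering string and function name are assumptions inferred from the labels above, not taken from the patent, and kana outside the 46 (for example a small "ゃ") are not handled.

```python
import unicodedata

# Assumption: Sd20 to Sd65 follow gojuon order over the 46 basic hiragana.
GOJUON = ("あいうえお" "かきくけこ" "さしすせそ" "たちつてと" "なにぬねの"
          "はひふへほ" "まみむめも" "やゆよ" "らりるれろ" "わをん")


def p_dictionary_label(tin: str) -> str:
    """Return the hypothetical label of the P dictionary for TiN."""
    head = unicodedata.normalize("NFD", tin[0])[0]  # fold "が" -> "か"
    # str.index raises ValueError for characters outside GOJUON.
    return f"Sd{20 + GOJUON.index(head)}"


words = ["あさ", "は", "くうき", "が", "すんでいる"]
print([p_dictionary_label(w) for w in words])
# ['Sd20', 'Sd45', 'Sd27', 'Sd25', 'Sd32']
```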

In this way, P dictionary data SdP is provided for each first character of the pronunciation character string Tp, and each word W is searched for in the P dictionary data SdP matching the first character of the input character string TiN. Searching only that P dictionary data SdP skips comparisons with pronunciation character strings Tp whose first character differs from that of the input character string TiN, so the word W can be found quickly.

In addition, the first character of the pronunciation character string Tp or the input character string TiN can be obtained easily without complex string analysis, so the conversion dictionary data Sd can be created quickly and the P dictionary data SdP matching the first character of the input character string TiN can be acquired quickly. Together, these shorten the time from when the user H inputs the voice V until the word W corresponding to that voice V is displayed.

Next, the electrical configuration of the mobile terminal 100 of the second embodiment will be described with reference to FIG. 7. FIG. 7 is a block diagram showing the electrical configuration of the mobile terminal 100 of the second embodiment. In place of the M-character dictionary data 11cM, the conversion dictionary data 11c of the mobile terminal 100 contains "あ" to "ん" dictionary data 11c20 to 11c65, which store the "あ" to "ん" dictionary data Sd20 to Sd65, respectively. Hereinafter, the "あ" to "ん" dictionary data 11c20 to 11c65 are referred to as "P dictionary data 11cP."

Next, the processing executed by the CPU 10 of the mobile terminal 100 of the second embodiment will be described with reference to FIG. 8. FIG. 8(a) is a flowchart of the conversion dictionary data creation process of the second embodiment. In this process, after S20, the combinations of pronunciation character strings Tp and words W in the acquired all dictionary data Ad are sorted by the first character of the pronunciation character string Tp (S100).

After S100, the combinations of pronunciation character strings Tp and words W are extracted, for each first character, from the all dictionary data Ad sorted by first character, to create dictionary data; each dictionary data is stored in the P dictionary data 11cP for the corresponding first character in the conversion dictionary data 11c (S101). After S101, the conversion dictionary data creation process ends.

Next, the dictionary application process of the second embodiment will be described. FIG. 8(b) is a flowchart of the dictionary application process of the second embodiment. The process first obtains the first character of each input character string TiN obtained in S5 of FIG. 4(a) described above (S110). After S110, S31 described above is performed, and then the P dictionary data 11cP corresponding to the first character of the Nth input character string TiN is acquired from the conversion dictionary data 11c (S111).

After S111, the Nth input character string TiN is searched for in the P dictionary data 11cP acquired in S111 to obtain the word W corresponding to it (S112). The method of obtaining the word W from the P dictionary data 11cP is the same as in S33 of FIG. 5 described above, so a detailed explanation is omitted. After S112, the processing from S34 onward is executed.
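The second embodiment's creation (S100, S101) and application (S110 to S112, then S34 onward) can be sketched in Python by keying each table on the first character instead of the character count. This is a hedged sketch: modelling Ad as (Tp, W) pairs, folding voiced initials with Unicode NFD, the function names, and the sample entries are all assumptions for illustration.

```python
import unicodedata
from collections import defaultdict


def head(s: str) -> str:
    """First character with dakuten/handakuten folded ("が" -> "か")."""
    return unicodedata.normalize("NFD", s[0])[0]


def create_head_dictionaries(all_dict: list[tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Hypothetical sketch of S100-S101."""
    by_head: dict[str, dict[str, str]] = defaultdict(dict)
    # S100: sort by first character; S101: one Tp -> W table per first character.
    for tp, w in sorted(all_dict, key=lambda entry: head(entry[0])):
        by_head[head(tp)][tp] = w
    return dict(by_head)


def apply_head_dictionaries(words: list[str], by_head: dict[str, dict[str, str]]) -> str:
    """Hypothetical sketch of S110-S112 followed by S34-S36."""
    # S111: P dictionary for head(TiN); S112: exact match, else TiN itself.
    return "".join(by_head.get(head(tin), {}).get(tin, tin) for tin in words)


sd = create_head_dictionaries([("がっこう", "学校"), ("かさ", "傘"), ("あさ", "朝")])
print(sorted(sd["か"]))                                # ['かさ', 'がっこう']
print(apply_head_dictionaries(["がっこう", "へ"], sd))  # 学校へ
```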

Although the present invention has been described above based on the embodiments, it is in no way limited to them, and it can easily be inferred that various improvements and modifications are possible without departing from the spirit of the invention.

In the first embodiment, the conversion dictionary data Sd is created for each number of characters in the pronunciation character string Tp, and in the second embodiment for each first character of the pronunciation character string Tp, but the invention is not limited to these. For example, the conversion dictionary data Sd may be created for each combination of the number of characters and the first character of the pronunciation character string Tp. In this case, the combination of the number of characters and the first character of the input character string TiN is obtained, the dictionary data for the matching combination is acquired from the conversion dictionary data Sd, and the word W is searched for using that dictionary data.

This skips comparisons with pronunciation character strings Tp whose number of characters or first character differs from that of the input character string TiN, so the word W corresponding to the input character string TiN can be found even more quickly. Furthermore, creating the conversion dictionary data Sd for each combination of number of characters and first character reduces the number of pronunciation character strings Tp and words W stored in each dictionary data, compared with the per-length conversion dictionary data Sd of the first embodiment or the per-first-character conversion dictionary data Sd of the second embodiment. This also speeds up the search for the word W corresponding to the input character string TiN.
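The combined feature can be sketched with a tuple key (number of characters, first character). In this hypothetical example, the largest bucket holds two entries, versus three when grouping by length alone or by first character alone. The text does not say whether voiced initials are folded in this variant, so the sketch uses the raw first character; all names and entries are illustrative.

```python
from collections import defaultdict

Key = tuple[int, str]


def combined_key(s: str) -> Key:
    """Hypothetical feature: (number of characters, first character)."""
    return (len(s), s[0])


def create_combined_dictionaries(all_dict: list[tuple[str, str]]) -> dict[Key, dict[str, str]]:
    buckets: dict[Key, dict[str, str]] = defaultdict(dict)
    for tp, w in all_dict:
        buckets[combined_key(tp)][tp] = w
    return dict(buckets)


def lookup(tin: str, buckets: dict[Key, dict[str, str]]) -> str:
    # Only the bucket with the same length AND first character is compared.
    return buckets.get(combined_key(tin), {}).get(tin, tin)


ad = [("あさ", "朝"), ("あめ", "雨"), ("あした", "明日"), ("かさ", "傘")]
buckets = create_combined_dictionaries(ad)
print({k: len(v) for k, v in buckets.items()})
# {(2, 'あ'): 2, (3, 'あ'): 1, (2, 'か'): 1}
print(lookup("あした", buckets), lookup("いぬ", buckets))  # 明日 いぬ
```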

The conversion dictionary data Sd is also not limited to being created according to the number of characters or the first character; it may be created according to other features of the pronunciation character string Tp or the input character string TiN. For example, it may be created according to the sum of the character code values of the characters making up the pronunciation character string Tp or the input character string TiN. In this case, the sum of the character code values of the characters making up the input character string TiN is obtained, the dictionary data corresponding to that sum is acquired from the conversion dictionary data Sd, and the word W is searched for using that dictionary data.
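A sketch of the character-code-sum variant follows. It assumes Unicode code points as the "character code values" (the text does not name an encoding), and the names are hypothetical. Because different strings can have the same sum (any anagram, for instance), the exact comparison with Tp inside the selected bucket is still required.

```python
from collections import defaultdict


def code_sum(s: str) -> int:
    """Hypothetical feature: sum of the Unicode code points in s."""
    return sum(ord(c) for c in s)


def create_sum_dictionaries(all_dict: list[tuple[str, str]]) -> dict[int, dict[str, str]]:
    buckets: dict[int, dict[str, str]] = defaultdict(dict)
    for tp, w in all_dict:
        buckets[code_sum(tp)][tp] = w
    return dict(buckets)


def lookup(tin: str, buckets: dict[int, dict[str, str]]) -> str:
    # The sum only narrows the search; the Tp comparison decides the match.
    return buckets.get(code_sum(tin), {}).get(tin, tin)


buckets = create_sum_dictionaries([("かさ", "傘"), ("さか", "坂")])
print(len(buckets))                                    # 1 (anagrams share a sum)
print(lookup("さか", buckets), lookup("かさ", buckets))  # 坂 傘
```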

In the above embodiments, the all dictionary data Ad is provided and the conversion dictionary data Sd is created from it, but this is not limiting. For example, the all dictionary data Ad may be omitted and only the conversion dictionary data Sd provided. In this case, instead of S1 and S2 of the voice processing in FIG. 4, the conversion dictionary data Sd may be updated in response to an instruction from the user H.

In the above embodiments, the pronunciation character string Tp and the input character string TiN are written in hiragana, but this is not limiting; other phonetic scripts such as katakana, the Latin alphabet, or Chinese pinyin may be used. Likewise, the conversion dictionary data Sd of the second embodiment is created according to a first character in hiragana, but it may instead be created according to a first character in another phonetic script such as katakana, the Latin alphabet, or Chinese pinyin.

In the first embodiment, the conversion dictionary data Sd includes the one- to ten-character dictionary data Sd1 to Sd10, covering pronunciation character strings of up to 10 characters, but this is not limiting; dictionary data for pronunciation character strings longer than 10 characters may also be provided.

In the above embodiments, the input character string Ti is decomposed into per-word input character strings TiN and each of them is searched for in the conversion dictionary data Sd, but this is not limiting; the conversion dictionary data Sd may be searched using an input character string Ti that contains multiple words.

In the above embodiments, the input character string Ti is decomposed into per-word input character strings TiN by morphological analysis, but this is not limiting; the decomposition may be performed by other methods such as AI.

In the above embodiments, the character string in the output character string memory 12a is displayed on the LCD 16 in S7 of FIG. 4(a), but this is not limiting. For example, the character string in the output character string memory 12a may be transmitted via a communication device (not shown) to an information processing device such as another mobile terminal 1, or a printer (not shown) may be connected and the character string printed on paper.

In the above embodiments, the mobile terminal 1 incorporating the voice recognition program 11a is given as an example, but this is not limiting; the voice recognition program 11a may be executed by another information processing device (computer) such as a personal computer or a tablet terminal. The voice recognition program 11a may also be stored in a ROM, an IC chip, or the like, and the present invention may be applied to a dedicated device that executes only the voice recognition program 11a.

1, 100 Mobile terminal (computer)
11 Flash ROM (storage unit)
11a Voice recognition program
V Voice
Tp Pronunciation character string
W Word
Sd Conversion dictionary data
11c Conversion dictionary data (conversion dictionary storage means)
Ti, TiN Input character string
S4 Speech conversion step, speech conversion means
S30, S110 Feature acquisition step, feature acquisition means
S32, S111 Dictionary acquisition step, dictionary acquisition means
S33, S112 Search step, search means
S7 Output step, output means

Claims

1. A speech recognition program for causing a computer having a storage unit to execute a speech recognition process, the program causing the storage unit to function as conversion dictionary storage means for storing a plurality of sets of conversion dictionary data, each set being dictionary data made up of combinations of a pronunciation character string and the phrase corresponding to that pronunciation character string, the sets being divided according to a feature of the pronunciation character strings, and the program causing the computer to execute:
a speech conversion step of converting input speech into an input character string representing the pronunciation of the input speech;
a feature acquisition step of acquiring a feature of the input character string converted in the speech conversion step;
a dictionary acquisition step of acquiring, from the conversion dictionary storage means, the conversion dictionary data corresponding to the feature acquired in the feature acquisition step;
a search step of searching the conversion dictionary data acquired in the dictionary acquisition step for a phrase corresponding to the input character string converted in the speech conversion step; and
an output step of outputting the found phrase when a phrase is found in the search step, and outputting the input character string converted in the speech conversion step when no phrase is found in the search step,
wherein the conversion dictionary data are created from entire dictionary data that stores every combination of a pronunciation character string contained in the plurality of sets of conversion dictionary data and the phrase corresponding to that pronunciation character string, and
wherein, when the entire dictionary data is updated, the combinations of pronunciation character strings and corresponding phrases contained in the updated entire dictionary data are rearranged based on the features of the pronunciation character strings, and conversion dictionary data for each feature are created from the rearranged combinations.
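The processing recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the speech conversion step is not modeled (the input is assumed to already be a kana pronunciation string), the feature is assumed to be the character count (one of the options in claim 3), and the names `feature`, `build_conversion_dictionaries`, and `recognize` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def feature(pronunciation: str) -> int:
    """Hypothetical feature: number of characters (one option named in claim 3)."""
    return len(pronunciation)


def build_conversion_dictionaries(entire: dict[str, str]) -> dict[int, dict[str, str]]:
    """Divide the entire dictionary data into conversion dictionaries by feature."""
    by_feature: dict[int, dict[str, str]] = defaultdict(dict)
    for pronunciation, phrase in entire.items():
        by_feature[feature(pronunciation)][pronunciation] = phrase
    return dict(by_feature)


def recognize(input_string: str, conversion: dict[int, dict[str, str]]) -> str:
    """Feature acquisition, dictionary acquisition, search and output steps.

    Assumes the speech conversion step has already produced `input_string`.
    """
    key = feature(input_string)           # feature acquisition step
    dictionary = conversion.get(key, {})  # dictionary acquisition step
    # Search step and output step: the phrase if found, otherwise the input itself.
    return dictionary.get(input_string, input_string)


entire = {"とうきょう": "東京", "おおさか": "大阪", "なごや": "名古屋"}
conversion = build_conversion_dictionaries(entire)
print(recognize("なごや", conversion))    # 名古屋
print(recognize("きょうと", conversion))  # きょうと (no match, input returned)
```

Only the conversion dictionary whose feature matches the input is consulted. Where each dictionary is scanned entry by entry, this reduces the number of entries examined. A Python `dict` already hashes its keys, so here the partition mainly mirrors the structure of the claim.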

2. The speech recognition program according to claim 1, wherein:
the speech conversion step converts the input speech into a character string representing its pronunciation and decomposes that character string into words to obtain input character strings;
the feature acquisition step acquires a feature of each of the input character strings converted and decomposed into words in the speech conversion step;
the dictionary acquisition step acquires, from the conversion dictionary storage means, the conversion dictionary data corresponding to each feature, acquired in the feature acquisition step, of the input character strings decomposed into words; and
the search step searches each set of conversion dictionary data acquired in the dictionary acquisition step for a phrase corresponding to each of the input character strings converted and decomposed into words in the speech conversion step.
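Claim 2 applies the feature, dictionary, and search steps to each word of the input separately, so different words of one utterance can be looked up in different conversion dictionaries. A minimal sketch, assuming whitespace-separated input in place of a real word decomposition, a character-count feature, and a per-word fallback to the word's own pronunciation; `recognize_words` is a hypothetical name.

```python
from __future__ import annotations


def feature(pronunciation: str) -> int:
    """Hypothetical feature: number of characters (one option named in claim 3)."""
    return len(pronunciation)


def recognize_words(pronunciation: str,
                    conversion: dict[int, dict[str, str]]) -> list[str]:
    """Search each word of the input in the conversion dictionary for its own feature."""
    # Assumption: splitting on whitespace stands in for the word decomposition
    # of the speech conversion step (a real system would use morphological analysis).
    results = []
    for word in pronunciation.split():
        dictionary = conversion.get(feature(word), {})
        # Assumed per-word fallback: keep the word's pronunciation if nothing matches.
        results.append(dictionary.get(word, word))
    return results


conversion = {3: {"なごや": "名古屋"}, 2: {"えき": "駅"}}
print(recognize_words("なごや の えき", conversion))  # ['名古屋', 'の', '駅']
```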

3. The speech recognition program according to claim 1 or 2, wherein the feature is the number of characters in the pronunciation character string or the input character string.

4. The speech recognition program according to claim 1 or 2, wherein the feature is the first character of the pronunciation character string or the input character string.

5. The speech recognition program according to claim 1 or 2, wherein the feature is the combination of the number of characters and the first character of the pronunciation character string or the input character string.
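The three features of claims 3 to 5 differ only in the key used to divide the dictionary. A minimal sketch, assuming kana pronunciation strings; the function names and sample data are hypothetical, and `partition` simply groups each (pronunciation, phrase) pair under its feature value.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable, Hashable


def by_length(s: str) -> int:
    """Claim 3: the number of characters."""
    return len(s)


def by_first_char(s: str) -> str:
    """Claim 4: the first character (empty string for empty input)."""
    return s[:1]


def by_length_and_first_char(s: str) -> tuple[int, str]:
    """Claim 5: the character count combined with the first character."""
    return (len(s), s[:1])


def partition(entire: dict[str, str],
              feature: Callable[[str], Hashable]) -> dict[Hashable, dict[str, str]]:
    """Group (pronunciation, phrase) pairs into one conversion dictionary per feature value."""
    groups: dict[Hashable, dict[str, str]] = defaultdict(dict)
    for pronunciation, phrase in entire.items():
        groups[feature(pronunciation)][pronunciation] = phrase
    return dict(groups)


entire = {"なごや": "名古屋", "なら": "奈良", "おおさか": "大阪",
          "おおいた": "大分", "とやま": "富山"}
for f in (by_length, by_first_char, by_length_and_first_char):
    print(f.__name__, {k: len(v) for k, v in partition(entire, f).items()})
```

With this sample, either single feature yields three conversion dictionaries, while the combined key of claim 5 yields four, each a subset of the corresponding single-feature group.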

6. A speech recognition device comprising:
conversion dictionary storage means for storing a plurality of sets of conversion dictionary data, each set being made up of combinations of a pronunciation character string and the phrase corresponding to that pronunciation character string, the sets being classified according to a feature of the pronunciation character strings;
voice input means for inputting voice;
speech conversion means for converting the voice input by the voice input means into an input character string representing the pronunciation of the voice;
feature acquisition means for acquiring a feature of the input character string converted by the speech conversion means;
dictionary acquisition means for acquiring, from among the conversion dictionary data stored in the conversion dictionary storage means, the conversion dictionary data corresponding to the feature acquired by the feature acquisition means;
search means for searching the conversion dictionary data acquired by the dictionary acquisition means for a phrase corresponding to the input character string converted by the speech conversion means; and
output means for outputting the found phrase when the search means finds a phrase, and outputting the input character string converted by the speech conversion means when the search means finds no phrase,
wherein the conversion dictionary data are created from entire dictionary data that stores every combination of a pronunciation character string contained in the plurality of sets of conversion dictionary data and the phrase corresponding to that pronunciation character string, and
wherein, when the entire dictionary data is updated, the combinations of pronunciation character strings and corresponding phrases contained in the updated entire dictionary data are rearranged based on the features of the pronunciation character strings, and conversion dictionary data for each feature are created from the rearranged combinations.
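The update behavior recited at the end of claims 1 and 6 can be sketched as a small store class. Assumptions: the voice input and speech conversion means are not modeled, the feature is the (character count, first character) pair of claim 5, and `ConversionDictionaryStore`, `update`, `lookup`, and `feature_count` are hypothetical names. The rearrangement is modeled as a sort by feature followed by `itertools.groupby`, which groups only adjacent items and therefore needs the sorted order.

```python
from __future__ import annotations

from itertools import groupby


def feature(pronunciation: str) -> tuple[int, str]:
    """Hypothetical feature: (character count, first character), as in claim 5."""
    return (len(pronunciation), pronunciation[:1])


class ConversionDictionaryStore:
    """Hypothetical stand-in for the conversion dictionary storage means."""

    def __init__(self) -> None:
        self._by_feature: dict[tuple[int, str], dict[str, str]] = {}

    def update(self, entire: dict[str, str]) -> None:
        """Rebuild every conversion dictionary from updated entire dictionary data."""
        # Rearrange the (pronunciation, phrase) pairs so equal features are adjacent.
        pairs = sorted(entire.items(), key=lambda item: feature(item[0]))
        # Cut the rearranged sequence into one conversion dictionary per feature.
        self._by_feature = {
            key: dict(group)
            for key, group in groupby(pairs, key=lambda item: feature(item[0]))
        }

    def lookup(self, input_string: str) -> str:
        """Dictionary acquisition, search and output: the phrase if found, else the input."""
        dictionary = self._by_feature.get(feature(input_string), {})
        return dictionary.get(input_string, input_string)

    def feature_count(self) -> int:
        """Number of conversion dictionaries currently held."""
        return len(self._by_feature)


store = ConversionDictionaryStore()
store.update({"なごや": "名古屋", "おおさか": "大阪"})
print(store.lookup("ぎふ"))  # ぎふ (not yet in any conversion dictionary)
store.update({"なごや": "名古屋", "おおさか": "大阪", "ぎふ": "岐阜"})
print(store.lookup("ぎふ"))  # 岐阜
```

Rebuilding every conversion dictionary on each update keeps them consistent with the entire dictionary data at the cost of re-sorting all entries; an incremental update would also be possible but is not what this sketch shows.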