JP6619764B2

JP6619764B2 - Language processing apparatus, program and method for selecting language model according to user attribute

Info

Publication number: JP6619764B2
Application number: JP2017071895A
Authority: JP
Inventors: 安田 圭志
Original assignee: KDDI Research Inc
Current assignee: KDDI Research Inc
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2019-12-11
Anticipated expiration: 2037-03-31
Also published as: JP2018173846A

Description

The present invention relates to statistical machine translation technology.

A statistical machine translation system has a “language model” and a “translation model”, both based on statistical probability theory.
The “language model” is a statistical model that defines the likelihood of a sequence of words in the source language, and is learned from monolingual data in the source language.
The “translation model” is obtained by learning pairs of words and phrases that are likely to correspond semantically between sentences in two different languages.
In the learning stage, the language model and the translation model are trained on a corpus of training data; in the operation stage, they output the translated text with the highest probability for the input text.

A statistical machine translation system can be built quickly and at low cost as long as a corpus is available. Basically, a sentence is split little by little into units of words and phrases, and at each step the translation fragment assumed to have the highest probability is appended to generate the whole sentence.

Conventionally, there is a technique for reducing the size of a training corpus without degrading translation performance (see, for example, Patent Document 1). In this technique, an in-domain language model learned from an in-domain corpus (a corpus that matches the desired task) and an out-of-domain corpus (a corpus that differs from the desired task) are stored. The in-domain language model is used to calculate similarity information between the in-domain corpus and each sentence contained in the out-of-domain corpus. Using this similarity information, a plurality of parallel texts that are highly similar to the in-domain corpus are selected from the out-of-domain corpus.

Patent Document 1: JP 2009-064051 A

Non-Patent Document 1: Arianna Bisazza, Nick Ruiz, Marcello Federico, “Fill-up versus Interpolation Methods for Phrase-based SMT Adaptation”, Proc. of IWSLT 2011, [online], [retrieved March 26, 2017], Internet <URL: http://www.mt-archive.info/IWSLT-2011-Bisazza.pdf>
Non-Patent Document 2: Natural Language Processing, “Entropy and Perplexity”, Natural Language Processing Laboratory, Department of Electrical and Electronic Information Engineering, Nagaoka University of Technology, [online], [retrieved March 26, 2017], Internet <URL: http://www.jnlp.org/lab/graduates/okada/nlp/term/entropy>

In the prior art, the accuracy of the corpus itself affects the accuracy of the language model and the translation model. The corpus is therefore created by transcribing log text that was actually spoken, together with an appropriate parallel translation of that log text. Such work requires a great deal of cost and time.
In addition, effort on a corpus built to such a high standard is usually concentrated on creating a single version that covers one source language. The corpus is therefore provided as a standard, general-purpose resource.

However, a corpus created in this way becomes bloated and very large. For a language processing apparatus that uses such a corpus, this leads to increased memory consumption and computation. Furthermore, because the corpus targets all users, language models and translation models learned from it also suffer from spurious errors.

The inventors of the present application therefore considered that the problem may lie in creating a single-version, large-capacity corpus that targets all users. That is, they considered that if the language model and translation model best suited to a user's utterance text could be selected from among a plurality of language models and translation models, the accuracy of language processing as a whole could be improved.

Accordingly, an object of the present invention is to provide a language processing apparatus, a program, and a method capable of selecting, from among a plurality of language models, the language model best suited to a user.

According to the present invention, a language processing apparatus that selects a language model to be applied to utterance text from a user comprises:
language corpus storage means storing a language corpus in which “accompanying attributes” are associated with “example sentences”;
log corpus storage means storing a log corpus in which “user attributes” are associated with “log text” based on past utterances of a plurality of users;
a language model learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute;
attribute correspondence table creating means that, for each user attribute, inputs the group of log texts corresponding to that user attribute into the language model and associates with that user attribute the accompanying attribute for which the occurrence probability output from the language model is highest; and
language processing means that applies, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.

According to another embodiment of the language processing apparatus of the present invention, it is also preferable that:
the user attribute is any one of, or a combination of, the user's “national language”, “gender”, “age group”, and “nationality”; and
the accompanying attribute is any one of, or a combination of, the “regions” in which utterances were spoken, or any one of, or a combination of, the “topic areas” of the utterances.

According to another embodiment of the language processing apparatus of the present invention, it is also preferable that:
the apparatus further has a translation model learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute; and
the language processing means applies, to utterance text from a user, the translation model based on the accompanying attribute associated with the user attribute of that user.

According to another embodiment of the language processing apparatus of the present invention, it is also preferable that:
the language model outputs, as the occurrence probability, a perplexity equal to 2 raised to the power of the entropy; and
the attribute correspondence table creating means associates each user attribute with the accompanying attribute having the lowest perplexity.

According to another embodiment of the language processing apparatus of the present invention, it is also preferable that the language model is a language model for speech recognition based on n-grams or deep learning.

According to the present invention, a program causes a computer mounted on an apparatus that selects a language model to be applied to utterance text from a user to function as:
language corpus storage means storing a language corpus in which “accompanying attributes” are associated with “example sentences”;
log corpus storage means storing a log corpus in which “user attributes” are associated with “log text” based on past utterances of a plurality of users;
a language model learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute;
attribute correspondence table creating means that, for each user attribute, inputs the group of log texts corresponding to that user attribute into the language model and associates with that user attribute the accompanying attribute for which the occurrence probability output from the language model is highest; and
language processing means that applies, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.

According to the present invention, in a language processing method of an apparatus that selects a language model to be applied to utterance text from a user,
the apparatus has:
a language corpus storage unit storing a language corpus in which “accompanying attributes” are associated with “example sentences”;
a log corpus storage unit storing a log corpus in which “user attributes” are associated with “log text” based on past utterances of a plurality of users; and
a language model learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute,
and the apparatus:
as a learning stage, for each user attribute, inputs the group of log texts corresponding to that user attribute into the language model and associates with that user attribute the accompanying attribute for which the occurrence probability output from the language model is highest; and
as an operation stage, executes language processing that applies, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.

According to the language processing apparatus, program, and method of the present invention, the language model best suited to a user can be selected from among a plurality of language models.

FIG. 1 is a functional configuration diagram of the language processing apparatus according to the present invention.
FIG. 2 is an explanatory diagram showing the relationship between each functional component and the accompanying attributes and user attributes.
FIG. 3 is a table showing the data structure of the language corpus storage unit.
FIG. 4 is a table showing the data structure of the log corpus storage unit.
FIG. 5 is an explanatory diagram showing the relationship between the language model and the translation model in the present invention.
FIG. 6 is an explanatory diagram of the attribute correspondence table creation unit in the present invention.
FIG. 7 is an explanatory diagram of the attribute correspondence table creation unit in the present invention.
FIG. 8 is an explanatory diagram showing how the language model and the translation model are applied according to user attributes.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a functional configuration diagram of the language processing apparatus according to the present invention.
FIG. 2 is an explanatory diagram showing the relationship between each functional component and the accompanying attributes and user attributes.
Each function of the language processing apparatus is described below with reference to FIGS. 1 and 2.

According to FIG. 1, the language processing apparatus 1 is, for example, a statistical machine translation server that receives a user's “utterance text” from a terminal and returns the language processing result. When the user's “utterance speech data” is received from the terminal, the language processing apparatus 1 converts it into utterance text by speech recognition and returns the language processing result for that utterance text.

The language processing apparatus 1 executes language processing using a “language model” and a “translation model”. Here, the language processing apparatus 1 of the present invention selects and applies the language model (and translation model) to be applied to the utterance text from the user.
The language processing apparatus 1 has a language corpus storage unit 100, a log corpus storage unit 101, a language model 102, a translation model 103, an attribute correspondence table creation unit 11, and a language processing unit 12. These functional components are realized by executing a program that causes a computer mounted on the apparatus to function. The processing flow of these functional components can also be understood as a language processing method of the apparatus.

[Language Corpus Storage Unit 100]
The language corpus storage unit 100 stores a language corpus, that is, a structured, large-scale collection of natural language sentences (example sentences).
Here, the “language corpus” of the present invention associates “accompanying attributes” with “example sentences”. An “example sentence” may be an original sentence paired with its parallel translation.
An “accompanying attribute” is, for example, an attribute based on the “region” in which the sentence was spoken or on its “topic area”. The accompanying attributes are of course not limited to these, and may include or combine attributes based on various other viewpoints.

From the “regional” character of accompanying attributes such as “Toshima Ward”, “Shinjuku Ward”, and “Shibuya Ward”, the following can be assumed, for example.
(1) The language corpus for “Toshima Ward” (accompanying attribute) is assumed to contain many example sentences with words such as “Sunshine City” and names of duty-free goods.
(2) The language corpus for “Shinjuku Ward” (accompanying attribute) is assumed to contain many example sentences with words such as “Kabukicho” and names of duty-free goods.
(3) The language corpus for “Shibuya Ward” (accompanying attribute) is assumed to contain many example sentences with words such as “Scramble” and words related to entertainment districts.
(4) The language corpus for “Chuo Ward” (accompanying attribute) is assumed to contain many example sentences with words such as “Ginza” and names of luxury brands.

FIG. 3 is a table showing the data structure of the language corpus storage unit.

According to FIG. 3, the following language corpus data is described.
(1) The example sentence and parallel translation “サンシャインシティに行きたい” (“I want to go to Sunshine City”) refers to a tourist attraction in Toshima Ward, so the accompanying attribute “Toshima Ward” is assigned.
(2) The example sentence and parallel translation “化粧品はどこにありますか？” (“Where are the cosmetics?”) is assigned the accompanying attribute “Shinjuku Ward”, where questions about duty-free goods are common.
(3) The example sentence and parallel translation “近くに安いバーはありますか？” (“Is there a cheap bar nearby?”) is assigned the accompanying attribute “Shibuya Ward”, where questions about the entertainment district are common.
(4) The example sentence and parallel translation “温水洗浄便座を探しています” (“I am looking for a warm-water bidet toilet seat”) is assigned the accompanying attribute “Toshima Ward”, where questions about duty-free goods are common.
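As a rough illustration of the data structure in FIG. 3, the rows can be held as records and grouped by accompanying attribute so that each group can later train its own model. This is only a sketch: the class name `Example`, its field names, and the function `group_by_attribute` are assumptions made for this example, and the English glosses are illustrative renderings rather than text from the figure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str       # example sentence as listed for FIG. 3
    translation: str  # illustrative English gloss (assumption of this sketch)
    attribute: str    # accompanying attribute (here, a region)


# Rows modeled on FIG. 3; field names are hypothetical.
LANGUAGE_CORPUS = [
    Example("サンシャインシティに行きたい", "I want to go to Sunshine City", "Toshima Ward"),
    Example("化粧品はどこにありますか？", "Where are the cosmetics?", "Shinjuku Ward"),
    Example("近くに安いバーはありますか？", "Is there a cheap bar nearby?", "Shibuya Ward"),
    Example("温水洗浄便座を探しています", "I am looking for a warm-water bidet toilet seat", "Toshima Ward"),
]


def group_by_attribute(corpus):
    """Return {accompanying attribute: [examples]} for per-attribute training."""
    groups = defaultdict(list)
    for example in corpus:
        groups[example.attribute].append(example)
    return dict(groups)


groups = group_by_attribute(LANGUAGE_CORPUS)
```

Here `groups["Toshima Ward"]` holds the two Toshima Ward rows, which would form the training data for that region's models.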

[Log Corpus Storage Unit 101]
The log corpus storage unit 101 stores a log corpus in which “user attributes” are associated with “log text” based on past utterances of a plurality of users.
A “user attribute” is, for example, any one of, or a combination of, the user's “national language”, “gender”, “age group”, and “nationality”.
“Log text” means text (of actual conversation) based on users' past utterances.

FIG. 4 is a table showing the data structure of the log corpus storage unit.

According to FIG. 4, the following corpus data is described.
(1) The log text “I want to visit Asakusa.”, spoken in English, is assigned the user attribute of a woman (F) in her 30s.
(2) A log text spoken in Chinese is assigned the user attribute of a woman (F) in her 20s.
(3) The log text “Is there a good bar near here?”, spoken in English, is assigned the user attribute of a man (M) in his 20s.
(4) A log text spoken in Chinese is assigned the user attribute of a man (M) in his 40s.

<Relationships Inherent between Accompanying Attributes and User Attributes>
Here, it is assumed that relationships such as the following are inherent between the accompanying attributes and the user attributes.
(1) For the accompanying attribute “Toshima Ward”, it is assumed that there are many “female” travelers from the “English”-speaking sphere (user attributes) who want to go to tourist attractions.
For the accompanying attribute “Toshima Ward”, it is also assumed that there are many “male” travelers from the “Chinese”-speaking sphere (user attributes) who want to buy duty-free goods.
(2) For the accompanying attribute “Shibuya Ward”, it is assumed that there are many “male” travelers from the “English”-speaking sphere (user attributes) who want to enjoy the entertainment district.
(3) For the accompanying attribute “Shinjuku Ward”, it is assumed that there are many “female” travelers from the “Chinese”-speaking sphere (user attributes) who want to buy duty-free goods.

[Language Model 102]
The language model 102 is learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute, using the language corpus storage unit 100. For each accompanying attribute, the language model 102 learns a statistical model that defines the likelihood of the word sequence of input text.
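The idea of learning one language model per accompanying attribute can be sketched as follows. This is a minimal illustration, not the method of the source: it swaps in an add-one-smoothed unigram model for simplicity, and the function names `train_unigram` and `train_models` as well as the toy English sentences are assumptions made for this example.

```python
from collections import Counter


def train_unigram(sentences):
    """Train an add-one-smoothed unigram model from whitespace-tokenized sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserved for unseen words

    def prob(word):
        # Counter returns 0 for words that never appeared in training.
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def train_models(corpus_by_attribute):
    """Return {accompanying attribute: unigram probability function}."""
    return {attr: train_unigram(sents) for attr, sents in corpus_by_attribute.items()}


# Hypothetical toy corpus keyed by accompanying attribute.
corpus = {
    "Toshima Ward": ["i want to go to sunshine city", "where is sunshine city"],
    "Shibuya Ward": ["is there a cheap bar nearby", "where is a good bar"],
}
models = train_models(corpus)
```

With this toy data, the Toshima Ward model assigns a higher probability to `"sunshine"` than the Shibuya Ward model does, and the reverse holds for `"bar"`, which is the regional bias the per-attribute models are meant to capture.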

[Translation Model 103]
The translation model 103 is learned, for each accompanying attribute, from the language corpus corresponding to that accompanying attribute, using the language corpus storage unit 100. For each accompanying attribute, the translation model 103 outputs the most suitable parallel translation for the input text.
Each translation model 103 for an accompanying attribute takes log text in a first language as input and outputs translated text in a second language. That is, the translation model 103 is a probabilistic dictionary over units such as words and phrases. The relationship between the first language and the second language is bidirectional between national languages, for example Japanese <-> English, Japanese <-> Chinese, and English <-> Chinese.

FIG. 5 is an explanatory diagram showing the relationship between the language model and the translation model in the present invention.

In the language models and translation models of FIG. 5, the accompanying attribute type “region” is represented by any one of the accompanying attributes “Toshima Ward”, “Shibuya Ward”, “Shinjuku Ward”, and “Chuo Ward”, or by a combination of them. The language models and the translation models correspond to each other through these combinations of the accompanying attribute type “region”.

For both the language model 102 and the translation model 103, if a region-dependent corpus of sufficient size cannot be prepared for some “region” type used as an accompanying attribute, a region-independent corpus and the region-dependent corpus may be mixed at the text level to form the region-dependent corpus. Alternatively, a model created from the region-independent corpus and a model created from the region-dependent corpus may be linearly interpolated (see, for example, Non-Patent Document 1).
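Linear interpolation of a region-dependent model with a region-independent model can be sketched at the level of word probabilities. This is a simplified illustration under stated assumptions: the function name `interpolate`, the weight `lam`, and all probability values are hypothetical, and the source does not specify how the interpolation weight is chosen.

```python
def interpolate(p_dependent, p_independent, lam):
    """Mix two word-probability functions; lam weights the region-dependent model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")

    def prob(word):
        return lam * p_dependent(word) + (1.0 - lam) * p_independent(word)

    return prob


# Hypothetical word probabilities, for illustration only.
REGION_DEPENDENT = {"sunshine": 0.20, "bar": 0.01}
REGION_INDEPENDENT = {"sunshine": 0.02, "bar": 0.05}

mixed = interpolate(
    lambda w: REGION_DEPENDENT.get(w, 0.0),
    lambda w: REGION_INDEPENDENT.get(w, 0.0),
    lam=0.7,
)
```

With `lam=0.7`, a word that is frequent only in the small region-dependent corpus keeps most of its probability mass, while words seen only in the region-independent corpus still receive a nonzero probability.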

[Attribute Correspondence Table Creation Unit 11]
For each user attribute, the attribute correspondence table creation unit 11 inputs the group of log texts corresponding to that user attribute (from the log corpus storage unit 101) into the language model 102. It then associates each user attribute with the “accompanying attribute having the highest occurrence probability”.
When the occurrence probability is expressed as “perplexity”, the attribute correspondence table creation unit 11 associates each user attribute with the “accompanying attribute having the lowest perplexity”, because a lower perplexity means that the model assigns a higher probability to the text.

FIG. 6 is an explanatory diagram of the attribute correspondence table creation unit in the present invention.
In the learning stage, for each user attribute, the language model 102 receives the group of log texts corresponding to that user attribute from the log corpus storage unit 101 and outputs their occurrence probability.

<Perplexity Output from the Language Model 102>
The language model 102 may be a language model for speech recognition based on n-grams or deep learning. An “n-gram” model assumes that the occurrence of a word in a sentence depends only on the immediately preceding (n-1) words; it approximates the word generation probability with a Markov model. “Deep learning” is machine learning that uses a multi-layered neural network.
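For the case n = 2, the Markov assumption described above means each word is conditioned only on the single preceding word. The following sketch estimates such bigram probabilities by maximum likelihood; the function name `train_bigram`, the sentence-boundary markers `<s>` and `</s>`, and the toy sentences are assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(word | previous word) by maximum likelihood."""
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, word)] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # unseen context; a practical model would smooth here
        return pair_counts[(prev, word)] / context_counts[prev]

    return prob


p = train_bigram(["is there a bar", "is there a club", "is it far"])
```

In this toy data `"is"` occurs three times as a context and is followed by `"there"` twice, so `p("there", "is")` is 2/3.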

Here, the language model 102 may output, as the occurrence probability, a perplexity equal to 2 raised to the power of the entropy.

The entropy $H$ of an information source is calculated as follows, where $p_i$ is the probability that an event $E_i$ occurs (the base of the logarithm is 2).

$$H = -\sum_i p_i \log p_i$$

The entropy $H$ is the expected value of the amount of information and expresses the “ambiguity of the information obtained”. When the information source is replaced with a language and the generation probability of the word sequence $w_1, \ldots, w_n$ is $P(w_1, \ldots, w_n)$, the entropy $H$ of $W_j \mid M_i$ is calculated as follows.

$$H(W_j \mid M_i) = -\sum_{w_1, \ldots, w_n} P(w_1, \ldots, w_n) \log P(w_1, \ldots, w_n) = -\frac{1}{|W_j|} \sum_{w_j \in W_j} \log p(w_j \mid M_i)$$

$M_i$: language model for combination $i$ of regional corpora
$W_j$: utterance text aggregated for profile combination $j$
$w_j$: a word contained in the utterance text $W_j$

This represents the amount of information needed to identify a word generated by the language.
It also means that, on average, $2^{H(W_j \mid M_i)}$ words can follow a given word.
That is, the average branching factor of words in the information-theoretic sense is called “perplexity” and is expressed as follows.

$$\mathrm{PPL} = 2^{H(W_j \mid M_i)}$$

This represents the perplexity between the language model 102 and the utterance text aggregated for each profile. The greater the perplexity of a language, the harder it is to identify words and the more complex the language.
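The perplexity computation can be illustrated with a short sketch. It assumes that the entropy is taken as the average negative base-2 log probability per word of the aggregated utterance text, and that the model is available as a word-probability function; the names `perplexity` and `uniform` are hypothetical.

```python
import math


def perplexity(words, prob):
    """Return 2 ** H, with H the average negative log2 probability per word."""
    if not words:
        raise ValueError("need at least one word")
    entropy = -sum(math.log2(prob(w)) for w in words) / len(words)
    return 2.0 ** entropy


def uniform(word):
    """Hypothetical model in which four words are equally likely."""
    return 0.25


ppl = perplexity(["a", "b", "c", "d"], uniform)
```

Under the uniform model every word has probability 1/4, so the entropy is 2 bits and the perplexity is 4, matching the reading of perplexity as the average number of words that can follow.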

FIG. 7 is an explanatory diagram of the attribute correspondence table creation unit in the present invention.

According to FIG. 7, a perplexity is associated with each combination of a user attribute and an accompanying attribute. This makes it possible to identify, for each “user attribute”, the “accompanying attribute” with the lowest perplexity.
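Selecting the lowest-perplexity accompanying attribute for each user attribute amounts to a row-wise minimum over a table laid out like FIG. 7. The sketch below uses invented perplexity values and hypothetical names (`PERPLEXITY`, `build_attribute_table`); the actual values in the figure are not reproduced here.

```python
# Hypothetical perplexity values in the layout of FIG. 7:
# rows are user attributes, columns are accompanying attributes.
PERPLEXITY = {
    ("English", "F", "30s"): {
        "Toshima Ward": 85.0,
        "Shibuya Ward": 140.0,
        "Toshima Ward + Shinjuku Ward": 110.0,
    },
    ("English", "M", "20s"): {
        "Toshima Ward": 150.0,
        "Shibuya Ward": 90.0,
        "Toshima Ward + Shinjuku Ward": 130.0,
    },
    ("Chinese", "F", "20s"): {
        "Toshima Ward": 105.0,
        "Shibuya Ward": 160.0,
        "Toshima Ward + Shinjuku Ward": 80.0,
    },
}


def build_attribute_table(perplexity):
    """Map each user attribute to the accompanying attribute with the lowest perplexity."""
    return {user: min(row, key=row.get) for user, row in perplexity.items()}


ATTRIBUTE_TABLE = build_attribute_table(PERPLEXITY)
```

The resulting dictionary plays the role of the attribute correspondence table: it is computed once in the learning stage and only looked up in the operation stage.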

<Relationships Inherent among Log Text, Accompanying Attributes, and User Attributes>
Taking FIG. 4 described above as an example, it is assumed that relationships such as the following are inherent among the log text, the accompanying attributes, and the user attributes.
(1) For “female” users (in their 30s) in the “English”-speaking sphere, if there are many log texts containing phrases about wanting to visit tourist attractions, and their occurrence probability is high for the language corpus of the accompanying attribute “Toshima Ward”, then the accompanying attribute “Toshima Ward” is associated with the user attribute “English”, “30s”, “female”.
(2) For “female” users (in their 20s) in the “Chinese”-speaking sphere, if there are many log texts containing words that denote names of duty-free goods, and their occurrence probability is high for the language corpus of the accompanying attribute “Shinjuku Ward + Toshima Ward”, then the accompanying attribute “Toshima Ward + Shinjuku Ward” is associated with the user attribute “Chinese”, “20s”, “female”.
(3) For “male” users (in their 20s) in the “English”-speaking sphere, if there are many log texts containing words that refer to entertainment districts, and their occurrence probability is high for the language corpus of the accompanying attribute “Shibuya Ward”, then the accompanying attribute “Shibuya Ward” is associated with the user attribute “English”, “20s”, “male”.

[Language Processing Unit 12]
In the operation stage, the language processing unit 12 applies, to utterance text from a user, the language model (and translation model) based on the accompanying attribute associated with the user attribute of that user. The accompanying attribute associated with a user attribute can be identified from the attribute correspondence table.

The language processing apparatus 1 needs to receive the user attribute in advance, before receiving utterance text from the terminal operated by the user. For each terminal, the language processing unit 12 then identifies the accompanying attribute in advance from the attribute correspondence table according to the received user attribute.

FIG. 8 is an explanatory diagram showing how a language model and a translation model are applied according to the user attribute.

FIG. 8 shows an attribute correspondence table in which accompanying attributes are associated with user attributes. The attribute correspondence table is used to select the language model and translation model to be applied to utterance text from the user.
(1) For “English” utterance text uttered by a user with the user attribute “female”, the language model and translation model based on “Toshima-ku” are applied, and the translated text 「隅田川に行きたいです」 (“I want to go to the Sumida River”) is output.
(2) For “English” utterance text uttered by a user with the user attribute “male”, the language model and translation model based on “Shibuya-ku” are applied, and the translated text 「この近くに良いクラブはありますか？」 (“Is there a good club near here?”) is output.
(3) For “Chinese” utterance text uttered by a user with the user attribute “female”, the language model and translation model based on “Toshima-ku + Shinjuku-ku” are applied, and the translated text 「化粧品はどこにありますか？」 (“Where are the cosmetics?”) is output.
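
The operation-stage flow (receive the user attribute first, resolve the accompanying attribute per terminal, then apply the matching models to each utterance) might be sketched as below. This is an illustrative sketch under stated assumptions, not the apparatus itself: the class `LanguageProcessor`, its methods, the terminal identifier, and the stub translator that merely tags its input are all hypothetical stand-ins for real language and translation models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

UserAttr = Tuple[str, str]  # assumed shape: (language, gender)

@dataclass
class ModelPair:
    """Stand-in for a language model + translation model trained for one
    accompanying attribute. `translate` is a stub, not a real translator."""
    name: str
    translate: Callable[[str], str]

def make_stub(name: str) -> ModelPair:
    # The stub only tags the input so the selected model is visible.
    return ModelPair(name, lambda text: f"[{name}] {text}")

class LanguageProcessor:
    def __init__(self, table: Dict[UserAttr, str], models: Dict[str, ModelPair]):
        self.table = table      # user attribute -> accompanying attribute
        self.models = models    # accompanying attribute -> models
        self.session: Dict[str, str] = {}  # terminal id -> accompanying attribute

    def register_terminal(self, terminal_id: str, user_attr: UserAttr) -> None:
        """Called before any utterance arrives: resolve the accompanying
        attribute once per terminal from the attribute correspondence table."""
        self.session[terminal_id] = self.table[user_attr]

    def process(self, terminal_id: str, utterance: str) -> str:
        """Apply the models selected for this terminal to the utterance."""
        accompanying_attr = self.session[terminal_id]
        return self.models[accompanying_attr].translate(utterance)

# Attribute correspondence table mirroring the three FIG. 8 examples
table = {
    ("English", "female"): "Toshima-ku",
    ("English", "male"): "Shibuya-ku",
    ("Chinese", "female"): "Toshima-ku + Shinjuku-ku",
}
models = {name: make_stub(name) for name in set(table.values())}

processor = LanguageProcessor(table, models)
processor.register_terminal("terminal-1", ("English", "male"))
result = processor.process("terminal-1", "Is there a good club near here?")
print(result)
```

Resolving the accompanying attribute at registration time, rather than on every utterance, reflects the requirement that the user attribute be received in advance.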

As described above in detail, the language processing apparatus, program, and method of the present invention make it possible to select, from among a plurality of language models and translation models, the language model and translation model best suited to the user.
The language models and translation models are trained in advance for each accompanying attribute using the language corpus, and, for each user attribute, each language model outputs an occurrence probability for the log corpus of that user attribute. The language model and translation model with the highest occurrence probability can then be selected for each user attribute. That is, language processing can be performed on the user's utterance text using the language model and translation model best suited to the user attribute.
As a result, each language model and translation model can be made smaller, which reduces the amount of memory required and the emergence of errors. That is, language processing in a statistical machine learning system can be made both faster and more accurate.

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments of the present invention described above, within the scope of its technical idea and viewpoint. The above description is merely an example and is not intended to be restrictive. The invention is limited only as defined in the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Language processing apparatus
100 Language corpus storage unit
101 Log corpus storage unit
102 Language model
103 Translation model
11 Attribute correspondence table creation unit
12 Language processing unit

Claims

A language processing apparatus for selecting a language model to be applied to utterance text from a user, comprising:
language corpus storage means for storing a language corpus in which “accompanying attributes” are associated with “example sentences”;
log corpus storage means for storing a log corpus in which “user attributes” are associated with “log texts” based on past utterances of a plurality of users;
for each accompanying attribute, a language model learned from the language corpus corresponding to that accompanying attribute;
attribute correspondence table creating means for inputting, for each user attribute, the group of log texts corresponding to that user attribute into the language models, and associating with the user attribute the accompanying attribute having the highest occurrence probability output from the language models; and
language processing means for applying, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.

The language processing apparatus according to claim 1, wherein the user attribute is any one of, or a combination of, “national language”, “gender”, “age group”, and “nationality” of the user, and
the accompanying attribute is any one of, or a combination of, “regions” in which utterances are made, or any one of, or a combination of, “topic fields” of the utterances.

The language processing apparatus, further comprising, for each accompanying attribute, a translation model learned from the language corpus corresponding to that accompanying attribute,
wherein the language processing means applies, to utterance text from a user, the translation model based on the accompanying attribute associated with the user attribute of that user.

The language processing apparatus according to claim 1, wherein the language model outputs, as the occurrence probability, a perplexity, which is 2 raised to the power of the entropy, and
the attribute correspondence table creating means associates with each user attribute the accompanying attribute having the lowest perplexity.

The language processing apparatus according to claim 1, wherein the language model is for speech recognition based on n-gram or deep learning.

A program for operating a computer mounted on a device that selects a language model to be applied to utterance text from a user, the program causing the computer to function as:
language corpus storage means for storing a language corpus in which “accompanying attributes” are associated with “example sentences”;
log corpus storage means for storing a log corpus in which “user attributes” are associated with “log texts” based on past utterances of a plurality of users;
for each accompanying attribute, a language model learned from the language corpus corresponding to that accompanying attribute;
attribute correspondence table creating means for inputting, for each user attribute, the group of log texts corresponding to that user attribute into the language models, and associating with the user attribute the accompanying attribute having the highest occurrence probability output from the language models; and
language processing means for applying, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.

A language processing method of a device that selects a language model to be applied to utterance text from a user,
the device comprising:
a language corpus storage unit that stores a language corpus in which “accompanying attributes” are associated with “example sentences”;
a log corpus storage unit that stores a log corpus in which “user attributes” are associated with “log texts” based on past utterances of a plurality of users; and
for each accompanying attribute, a language model learned from the language corpus corresponding to that accompanying attribute,
the method comprising, by the device:
as a learning stage, inputting, for each user attribute, the group of log texts corresponding to that user attribute into the language models, and associating with the user attribute the accompanying attribute having the highest occurrence probability output from the language models; and
as an operation stage, applying, to utterance text from a user, the language model based on the accompanying attribute associated with the user attribute of that user.