JP6138065B2

JP6138065B2 - Program, apparatus and method for outputting search keywords suitable for different language systems

Info

Publication number: JP6138065B2
Application number: JP2014016510A
Authority: JP
Inventors: 裕幸川嶋; 安田　圭志; 圭志安田
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-01-31
Filing date: 2014-01-31
Publication date: 2017-05-31
Anticipated expiration: 2034-01-31
Also published as: JP2015143907A

Description

The present invention relates to a technique for selecting translation equivalents between different language systems.

People often learn a foreign language through media data such as images and audio. In education, too, when the meaning of a foreign word is difficult to express in the learner's native language, media data may be used to help the learner grasp it as a visual or auditory impression.

Conventionally, there is a foreign language learning apparatus that assigns exercises to a learner while taking into account confusion between word pairs that are similar in orthography, meaning, or phonology (see, for example, Patent Document 1). Based on rules for each of orthography, meaning, and phonology, this technique extracts candidate orthographic, semantic, and phonological interference words for a designated learning word. An integrated difficulty level is then estimated from the degrees of interference, and a series of questions is created. The generated questions are presented to the user together with images and spoken pronunciations.

There is also a technique that searches metadata associated in advance with images by using an input text query and presents images to the searcher (see, for example, Non-Patent Document 1). With this technique, an image search keyword for obtaining an appropriate image is generated from a given foreign word and the native-language word obtained by translating it.

Patent Document 1: Japanese Patent No. 4432079

Non-Patent Document 1: Google Image Search, [online], [retrieved January 22, 2014], Internet <URL: http://images.google.com/>
Non-Patent Document 2: Ranking preprocessing using a vector space model, [online], [retrieved January 22, 2014], Internet <URL: http://www.jpo.go.jp/shiryou/s_sonota/hyoujun_gijutsu/search_engine/b/b51.htm>

However, the technique described in Patent Document 1 requires images corresponding to the meanings of foreign words to be stored in advance. With the technique described in Non-Patent Document 1, when an adverb, an adjective, or a polysemous word is used as the search keyword, images that do not match the meaning of the foreign word, or multiple images reflecting different senses, may be presented.

The inventors of the present application considered that, even when media data such as images is searched with a keyword obtained by translating a foreign word into the native language, the retrieved media data may not match the original meaning of the foreign word. They further considered whether an appropriate native-language search keyword could be extracted by also taking into account the relationship between words that co-occur with the foreign word and words that co-occur with its native-language translation.

Accordingly, an object of the present invention is to provide a program, an apparatus, and a method for outputting search keywords suitable for different language systems. Such a search keyword can be used to search for media data that helps a learner understand the word.

According to the present invention, there is provided a program that causes a computer to receive a first original word of a first language system and a second original word of a second language system that is a translation of the first original word, and to output a second co-occurrence word of the second language system suitable as a usage for the first original word, the program causing the computer to function as:
first co-occurrence word extraction means for extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
first translated co-occurrence word extraction means for extracting, using a bilingual dictionary of the first language system, second translated co-occurrence words that are translations of the first co-occurrence words;
first occurrence frequency search means for searching, using a corpus dictionary of the second language system, for the occurrence frequency of each word string that combines the second original word with a second translated co-occurrence word; and
co-occurrence word determination means for determining the second translated co-occurrence word in the word string with the highest occurrence frequency as the second co-occurrence word suitable as a usage for the first original word.

According to another embodiment of the program of the present invention, it is also preferable that
the first language system is a foreign language and the second language system is a native language, and
the computer is caused to function so that a word string consisting of the second original word and the second co-occurrence word in the native language is output as a usage for the first original word in the foreign language.

According to another embodiment of the program of the present invention, it is also preferable that
the first language system is a native language and the second language system is a foreign language, and
the computer is caused to function so that a word string consisting of the second original word and the second co-occurrence word in the foreign language is output as a usage for the first original word in the native language.

According to another embodiment of the program of the present invention, it is also preferable that the computer is further caused to function as
search means for transmitting, as a key, the word string consisting of the second original word and the second co-occurrence word determined by the co-occurrence word determination means to a media server, and for outputting the search result.

According to another embodiment of the program of the present invention, it is also preferable that
the media server is a media server for text, images, or audio.

According to another embodiment of the program of the present invention, it is also preferable that the computer is further caused to function as
first co-occurrence word filter means for deleting, from the plurality of first co-occurrence words extracted by the first co-occurrence word extraction means, any co-occurrence word whose case is the same as or similar to that of the first original word, and for outputting the remaining first co-occurrence words to the first translated co-occurrence word extraction means.
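The filter step can be sketched as follows. The text does not say how "case" is determined, so this sketch assumes each word carries a single grammatical-category tag and that a table of "similar" tags is available. The tag names, the `similar_tags` table, and the function name are all hypothetical and chosen only for illustration.

```python
def filter_cooccurrences(original_word, cooccurrences, tags, similar_tags=None):
    """Drop co-occurrence words whose tag equals, or is listed as similar to,
    the tag of the original word. Words without a tag are kept."""
    similar_tags = similar_tags or {}
    original_tag = tags[original_word]
    blocked = {original_tag} | set(similar_tags.get(original_tag, ()))
    return [w for w in cooccurrences if tags.get(w) not in blocked]


# Hypothetical tags; a real system would take them from the corpus dictionary.
tags = {
    "much": "ADJ", "better": "ADJ", "pretty": "ADV",
    "money": "NOUN", "success": "NOUN", "music": "NOUN",
}
similar_tags = {"ADJ": ["ADV"]}  # assumption: adverbs count as "similar" here

kept = filter_cooccurrences(
    "much", ["better", "pretty", "money", "success", "music"], tags, similar_tags
)
print(kept)
```

Under these assumed tags, only the nouns survive and are passed on for translation.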

According to another embodiment of the program of the present invention, it is also preferable that the computer is further caused to function as:
second co-occurrence word extraction means for extracting, using the corpus dictionary of the second language system, a plurality of second co-occurrence words that co-occur with the second original word;
second translated co-occurrence word extraction means for extracting, using a bilingual dictionary of the second language system, first translated co-occurrence words that are translations of the second co-occurrence words; and
second occurrence frequency search means for searching, using the corpus dictionary of the first language system, for the occurrence frequency of each word string that combines the first original word with a first translated co-occurrence word,
wherein the co-occurrence word determination means determines the second translated co-occurrence word in the word string with the highest occurrence frequency as the second co-occurrence word suitable as a usage for the first original word, determines the first translated co-occurrence word in the word string with the highest occurrence frequency as the first co-occurrence word suitable as a usage for the second original word, and, according to which of the first co-occurrence word and the second co-occurrence word has the higher occurrence frequency, selects either the word string consisting of the first original word and the first co-occurrence word or the word string consisting of the second original word and the second co-occurrence word.
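A minimal sketch of this two-direction decision is given below. The frequency values are the illustrative ones used in the embodiments for "たくさん" and "much". The function names and the tie-breaking rule (prefer the second-language word string on a tie) are assumptions, and the raw counts are compared as-is because the text states no normalization.

```python
def best(frequencies):
    """Return the (word_string, frequency) pair with the highest frequency."""
    return max(frequencies.items(), key=lambda item: item[1])


def choose_direction(second_language_freqs, first_language_freqs):
    """Pick the top word string of whichever direction has the higher frequency."""
    second_best = best(second_language_freqs)
    first_best = best(first_language_freqs)
    # Assumption: on a tie, keep the second-language word string.
    return second_best if second_best[1] >= first_best[1] else first_best


japanese_freqs = {"たくさんのお金": 1500, "たくさんの音楽": 780, "たくさんの成功": 550}
foreign_freqs = {"much smile": 2500, "much money": 1200, "much music": 800}

chosen = choose_direction(japanese_freqs, foreign_freqs)
print(chosen)
```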

According to another embodiment of the program of the present invention, it is also preferable that the computer is further caused to function as
second co-occurrence word filter means for deleting, from the plurality of second co-occurrence words extracted by the second co-occurrence word extraction means, any co-occurrence word whose case is the same as or similar to that of the second original word, and for outputting the remaining second co-occurrence words to the second translated co-occurrence word extraction means.

According to another embodiment of the program of the present invention, it is also preferable that the computer is further caused to function as:
second translated co-occurrence word comparison means for comparing the first co-occurrence words output from the first co-occurrence word filter means with the first translated co-occurrence words output from the second occurrence frequency search means, and for outputting only the matching first translated co-occurrence words to the co-occurrence word determination means; and
first translated co-occurrence word comparison means for comparing the second co-occurrence words output from the second co-occurrence word filter means with the second translated co-occurrence words output from the first occurrence frequency search means, and for outputting only the matching second translated co-occurrence words to the co-occurrence word determination means.
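The comparison step amounts to keeping only those translated co-occurrence words that were also extracted directly from the other language's corpus. A minimal sketch follows, using the word lists from the embodiments' examples. The function name is hypothetical, and the lists are left unfiltered for simplicity, whereas the text compares against the output of the filter means.

```python
def keep_matching(translated_words, extracted_words):
    """Keep translated co-occurrence words that also appear among the words
    extracted directly from the corpus, preserving their original order."""
    extracted = set(extracted_words)
    return [w for w in translated_words if w in extracted]


# First co-occurrence words extracted for "much" from the foreign-language corpus.
first_cooccurrences = ["better", "pretty", "money", "success", "music"]
# First translated co-occurrence words obtained from the Japanese side.
first_translated = ["many", "money", "live", "music", "smile"]

# Second co-occurrence words extracted for "たくさん" from the Japanese corpus.
second_cooccurrences = ["多い", "お金", "命", "音楽", "笑顔"]
# Second translated co-occurrence words obtained from the foreign-language side.
second_translated = ["良い", "かわいい", "お金", "成功", "音楽"]

matched_first = keep_matching(first_translated, first_cooccurrences)
matched_second = keep_matching(second_translated, second_cooccurrences)
print(matched_first, matched_second)
```

With these lists, only the candidates supported in both languages ("money"/"お金" and "music"/"音楽") reach the determination step.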

According to the present invention, there is provided an apparatus that receives a first original word of a first language system and a second original word of a second language system that is a translation of the first original word, and outputs a second co-occurrence word of the second language system suitable as a usage for the first original word, the apparatus comprising:
first co-occurrence word extraction means for extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
first translated co-occurrence word extraction means for extracting, using a bilingual dictionary of the first language system, second translated co-occurrence words that are translations of the first co-occurrence words;
first occurrence frequency search means for searching, using a corpus dictionary of the second language system, for the occurrence frequency of each word string that combines the second original word with a second translated co-occurrence word; and
co-occurrence word determination means for determining the second translated co-occurrence word in the word string with the highest occurrence frequency as the second co-occurrence word suitable as a usage for the first original word.

According to the present invention, there is provided a method of using an apparatus to receive a first original word of a first language system and a second original word of a second language system that is a translation of the first original word, and to output a second co-occurrence word of the second language system suitable as a usage for the first original word, the method comprising:
a first step of extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
a second step of extracting, using a bilingual dictionary of the first language system, second translated co-occurrence words that are translations of the first co-occurrence words;
a third step of searching, using a corpus dictionary of the second language system, for the occurrence frequency of each word string that combines the second original word with a second translated co-occurrence word; and
a fourth step of determining the second translated co-occurrence word in the word string with the highest occurrence frequency as the second co-occurrence word suitable as a usage for the first original word.

According to the program, apparatus, and method of the present invention, search keywords suitable for different language systems can be output. Such a search keyword can be used to search for media data that helps a learner understand the word.

FIG. 1 is a system configuration diagram including the bilingual keyword extraction apparatus of the present invention.
FIG. 2 is a functional configuration diagram of the program of the present invention for extracting Japanese that is suitable as a usage for a foreign word.
FIG. 3 is a functional configuration diagram of the program of the present invention for extracting a foreign-language expression that is suitable as a usage for a Japanese word.
FIG. 4 is a functional configuration diagram in which a co-occurrence word filter unit is added to the configuration of FIG. 2.
FIG. 5 is a functional configuration diagram in which the configurations of FIG. 2 and FIG. 3 are arranged in parallel.
FIG. 6 is a functional configuration diagram in which co-occurrence words are compared between the different language systems within the configuration of FIG. 5.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a system configuration diagram including the bilingual keyword extraction apparatus of the present invention.

In FIG. 1, the bilingual keyword extraction apparatus is connected to a network such as the Internet. A media server that can be searched for images and audio is also connected to the Internet. The media server stores a large amount of media data (images, video, audio) in advance; it receives a query keyword, searches for media data appropriate to that keyword, and returns it. In the media server, multilingual text is associated with the media data as metadata.

In FIG. 1, suppose that a user enters the foreign word "much", which is to be learned, into a user terminal. The user terminal transmits the foreign word "much" to the bilingual keyword extraction apparatus, which translates it into the native-language word "たくさん" ("a lot"). Alternatively, the user terminal may itself translate "much" into "たくさん" and transmit both "much" and "たくさん" to the bilingual keyword extraction apparatus.

In response, suppose that the bilingual keyword extraction apparatus selects the native-language search keyword "たくさんのお金" ("a lot of money") as appropriate for helping the learner understand the foreign word "much". The apparatus then searches the media server using "たくさんのお金" as the query. The media server returns to the apparatus the media data (image, video, or audio) associated with metadata close to "たくさんのお金".

The bilingual keyword extraction apparatus then returns the media data received from the media server to the user terminal. When the user terminal plays back the media data, the user can grasp, by viewing or listening, the sense of the entered foreign word "much" through media data related to the native-language phrase "たくさんのお金".

Conversely, the bilingual keyword extraction apparatus may receive from the user terminal a native-language word to be learned and return foreign-language media data related to that word to the user terminal. In other words, the bilingual keyword extraction apparatus of the present invention takes a first original word of a first language system and a second original word of a second language system that is a translation of the first original word, and outputs a second co-occurrence word of the second language system suitable as a usage for the first original word.

FIG. 2 is a functional configuration diagram of the program of the present invention for extracting Japanese that is suitable as a usage for a foreign word.

FIG. 2 shows the configuration of a program that causes a computer mounted on the bilingual keyword extraction apparatus (server) to function. Here, the first language system is a foreign language and the second language system is Japanese. A word string consisting of the Japanese second original word and the second co-occurrence word is output as a usage for the foreign first original word.

As shown in FIG. 2, the bilingual keyword extraction apparatus has a foreign-language corpus dictionary 111, a foreign-language bilingual dictionary 112, and a Japanese corpus dictionary 121. A corpus is a large-scale, structured collection of natural-language text; here it is a database in which each word is associated with its part of speech and its co-occurrence words. The foreign-language corpus dictionary 111 associates each foreign word with its frequently used co-occurrence words. Likewise, the Japanese corpus dictionary 121 associates each Japanese word with its frequently used co-occurrence words. The foreign-language bilingual dictionary 112 associates each foreign word with the Japanese words that translate it.

As also shown in FIG. 2, the bilingual keyword extraction apparatus has a bilingual word input unit 2, a first co-occurrence word extraction unit 31, a first translated co-occurrence word extraction unit 32, a first occurrence frequency search unit 33, a co-occurrence word determination unit 5, and a search unit 6. These functional units are realized by executing a program that causes the computer mounted on the apparatus to function.

[Bilingual word input unit 2]
The bilingual keyword extraction apparatus receives a foreign original word and the Japanese original word that translates it. The Japanese original word may be produced by translation within the apparatus or received from the user terminal. In FIG. 2, for example, the following words are input.
(foreign original word) much <-> (Japanese original word) たくさん
The foreign original word "much" is output to the first co-occurrence word extraction unit 31, and the Japanese original word "たくさん" is output to the first occurrence frequency search unit 33.

[First co-occurrence word extraction unit 31]
The first co-occurrence word extraction unit 31 uses the foreign-language corpus dictionary 111 (the corpus dictionary of the first language system) to extract a plurality of first co-occurrence words that co-occur with the foreign original word "much" (the first original word). Suppose, for example, that the following words are extracted for "much".
(foreign original word) much -> (first co-occurrence words)
better
pretty
money
success
music
The extracted co-occurrence words are output to the first translated co-occurrence word extraction unit 32.

One or more co-occurrence words are extracted for the original word. The corpus may also associate each pair of an original word and a co-occurrence word with its occurrence frequency (co-occurrence frequency). Co-occurrence words include, for example, synonyms, antonyms, idioms, set phrases, and usages.
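As an illustration of how co-occurrence words and their co-occurrence frequencies might be derived, the sketch below counts the words that appear within a fixed token window around the original word. The text does not specify how the corpus dictionary is built, so the window-based counting, the function name, and the toy corpus are all assumptions.

```python
from collections import Counter


def extract_cooccurrences(sentences, target, window=2, top_n=5):
    """Return up to top_n (word, count) pairs for words that appear within
    `window` tokens of `target` in a tokenized corpus."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts.most_common(top_n)


# Toy, hand-made corpus (not taken from the patent).
corpus = [
    ["so", "much", "money"],
    ["much", "better", "now"],
    ["too", "much", "money", "spent"],
    ["much", "music", "today"],
]

top = extract_cooccurrences(corpus, "much", window=1, top_n=3)
print(top)
```

In this toy corpus "money" is the most frequent neighbour of "much". A production corpus dictionary would precompute such counts rather than scan the text per query.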

[First translated co-occurrence word extraction unit 32]
The first translated co-occurrence word extraction unit 32 uses the foreign-language bilingual dictionary 112 (the bilingual dictionary of the first language system) to extract second translated co-occurrence words that are translations of the first co-occurrence words. Suppose, for example, that the following words are extracted.
(foreign first co-occurrence word) -> (Japanese second translated co-occurrence word)
better -> 良い
pretty -> かわいい
money -> お金
success -> 成功
music -> 音楽
The extracted second translated co-occurrence words are output to the first occurrence frequency search unit 33.
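The dictionary lookup in this step can be sketched as below. The entries mirror the example above. The function name, the use of a list of translations per word, and the handling of words missing from the dictionary (they are skipped) are assumptions.

```python
def translate_cooccurrences(words, bilingual_dict):
    """Return (source_word, translation) pairs, skipping words that have
    no entry in the bilingual dictionary."""
    pairs = []
    for word in words:
        for translation in bilingual_dict.get(word, []):
            pairs.append((word, translation))
    return pairs


# Entries mirror the example; a real dictionary may list several translations.
foreign_to_japanese = {
    "better": ["良い"],
    "pretty": ["かわいい"],
    "money": ["お金"],
    "success": ["成功"],
    "music": ["音楽"],
}

pairs = translate_cooccurrences(
    ["better", "pretty", "money", "success", "music", "unlisted"],
    foreign_to_japanese,
)
print(pairs)
```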

[First occurrence frequency search unit 33]
The first occurrence frequency search unit 33 uses the Japanese corpus dictionary 121 (the corpus dictionary of the second language system) to search for the occurrence frequency of each word string that combines the Japanese second original word "たくさん" with a second translated co-occurrence word. For example, the following occurrence frequencies are obtained.
(Japanese original word and translated co-occurrence word) -> (occurrence frequency)
たくさんのお金 -> 1500
たくさんの音楽 -> 780
たくさんの成功 -> 550
たくさんの良い -> 30
たくさんのかわいい -> 20
The occurrence frequency may be obtained, for example, by deriving the frequency in the direction from "たくさんの" to "お金" and the frequency in the direction from "お金" to "たくさんの", and adding the two to give the frequency between the words.
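The two-direction frequency can be sketched as follows. The text does not define "direction" precisely, so this sketch assumes it means the order in which the two words appear within a sentence, and it counts sentences rather than individual occurrences. The function names and the toy tokenized corpus are assumptions.

```python
def directed_count(sentences, first, second):
    """Count sentences in which `first` occurs somewhere before `second`."""
    count = 0
    for tokens in sentences:
        if first in tokens and second in tokens[tokens.index(first) + 1:]:
            count += 1
    return count


def bidirectional_frequency(sentences, a, b):
    """Sum the a-before-b count and the b-before-a count."""
    return directed_count(sentences, a, b) + directed_count(sentences, b, a)


# Toy, hand-tokenized corpus (not taken from the patent).
corpus = [
    ["たくさん", "の", "お金"],
    ["お金", "が", "たくさん", "ある"],
    ["たくさん", "の", "音楽"],
    ["たくさん", "の", "お金", "を", "使う"],
]

money_freq = bidirectional_frequency(corpus, "たくさん", "お金")
music_freq = bidirectional_frequency(corpus, "たくさん", "音楽")
print(money_freq, music_freq)
```

Summing both directions lets a phrase such as "お金がたくさんある" count toward the pairing of "たくさん" and "お金" even though the word order is reversed.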

[Co-occurrence word determination unit 5]
The co-occurrence word determination unit 5 determines the second translated co-occurrence word in the most frequent word string, "たくさんのお金", as the second co-occurrence word suitable as a usage for the first original word. The determined "たくさんのお金" is output to the search unit 6.
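The determination itself is a maximum over the frequency table. A minimal sketch, using the frequencies from the example table of the first occurrence frequency search unit 33, is shown below. The function name and the error raised for an empty table are assumptions.

```python
def decide_cooccurrence(phrase_frequencies):
    """Return the (word_string, frequency) pair with the highest frequency."""
    if not phrase_frequencies:
        raise ValueError("no candidate word strings")
    return max(phrase_frequencies.items(), key=lambda item: item[1])


# Frequencies copied from the example table.
frequencies = {
    "たくさんのお金": 1500,
    "たくさんの音楽": 780,
    "たくさんの成功": 550,
    "たくさんの良い": 30,
    "たくさんのかわいい": 20,
}

best_phrase, best_count = decide_cooccurrence(frequencies)
print(best_phrase, best_count)
```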

In another embodiment, the first occurrence frequency search unit 33 may extract characteristic word strings by TF-IDF (Term Frequency - Inverse Document Frequency) rather than by raw occurrence frequency (see, for example, Non-Patent Document 2). TF-IDF assigns a weight to each word so that documents and queries can be represented as vectors in a vector space and documents can be ranked by their similarity to the query.
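To illustrate how TF-IDF weighting differs from a raw count, the sketch below scores a candidate word string with the classic form tf × log(N / df). The text does not state which TF-IDF variant is used, so the formula, the substring-based counting, and the toy documents are assumptions.

```python
import math


def tf_idf(phrase, document, documents):
    """Score `phrase` in `document` as tf * log(N / df), where tf is the
    number of non-overlapping occurrences in the document and df is the
    number of documents that contain the phrase."""
    tf = document.count(phrase)
    df = sum(1 for d in documents if phrase in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(documents) / df)


# Toy documents (not taken from the patent).
documents = [
    "much money much money much time",
    "much time",
    "much time and much effort",
]

money_score = tf_idf("much money", documents[0], documents)
time_score = tf_idf("much time", documents[0], documents)
print(money_score, time_score)
```

Here "much time" appears in every document, so its weight falls to zero, while "much money" is concentrated in one document and scores higher. That is the sense in which TF-IDF picks out characteristic word strings.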

[Search unit 6]
The search unit 6 transmits to the media server, as a key, the word string "たくさんのお金" consisting of the second original word and the second co-occurrence word determined by the co-occurrence word determination unit 5. In response, the media server returns the media data associated with metadata close to "たくさんのお金" to the search unit 6 of the bilingual keyword extraction apparatus. That media data is then returned to the user terminal as the search result.
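The text does not describe the media server's interface or how "close" metadata is measured. Purely as an illustration, the sketch below stands in for the server with an in-memory index and ranks entries by string similarity using `difflib.SequenceMatcher` from the Python standard library. The index contents, the identifiers, and the threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def search_media(query, media_index, threshold=0.5):
    """Return media IDs whose metadata text is similar to `query`,
    most similar first."""
    scored = []
    for media_id, metadata in media_index.items():
        score = SequenceMatcher(None, query, metadata).ratio()
        if score >= threshold:
            scored.append((score, media_id))
    return [media_id for _, media_id in sorted(scored, reverse=True)]


# Hypothetical index: media ID -> metadata text.
media_index = {
    "img_001": "たくさんのお金",
    "img_002": "たくさんの音楽",
    "snd_003": "静かな森",
}

results = search_media("たくさんのお金", media_index)
print(results)
```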

The user terminal plays back the media data returned from the bilingual keyword extraction apparatus. By viewing or listening to media data related to the Japanese phrase "たくさんのお金", the user can understand the sense of the foreign word "much".

FIG. 3 is a functional configuration diagram of the program of the present invention for extracting a foreign-language expression that is suitable as a usage for a Japanese word.

In FIG. 3, the first language system is Japanese and the second language system is a foreign language. A word string consisting of the foreign second original word and the second co-occurrence word is output as a usage for the Japanese first original word. In other words, the language systems being searched are reversed compared with FIG. 2.

[Bilingual word input unit 2]
The bilingual keyword extraction apparatus receives, for example, the following words.
(Japanese original word) たくさん <-> (foreign original word) much
The Japanese original word "たくさん" is output to the second co-occurrence word extraction unit 41, and the foreign original word "much" is output to the second occurrence frequency search unit 43.

[Second co-occurrence word extraction unit 41]
The second co-occurrence word extraction unit 41 uses the Japanese corpus dictionary 121 (the corpus dictionary of the second language system) to extract a plurality of second co-occurrence words that co-occur with the Japanese original word "たくさん" (the second original word). Suppose, for example, that the following words are extracted for "たくさん".
(Japanese original word) たくさん -> (second co-occurrence words)
多い
お金
命
音楽
笑顔
The extracted co-occurrence words are output to the second translated co-occurrence word extraction unit 42.

［第２の対訳共起語抽出部４２］
第２の対訳共起語抽出部４２は、日本語対訳辞書１２２（第２の言語体系の対訳辞書）を用いて、第２の共起語の対訳となる第１の対訳共起語を抽出する。例えば以下のような語が抽出されたとする。
（日本語の第２の共起語）（外国語の第１の対訳共起語）
多い -> many
お金 -> money
命 -> live
音楽 -> music
笑顔 -> smile
抽出された第１の対訳共起語は、第２の出現頻度検索部４３へ出力される。 [Second parallel co-occurrence word extraction unit 42]
The second parallel co-occurrence word extraction unit 42 uses the Japanese bilingual dictionary 122 (the bilingual dictionary of the second language system) to extract the first parallel co-occurrence words that are translations of the second co-occurrence words. Suppose, for example, that the following words are extracted.
(Japanese second co-occurrence word) -> (Foreign-language first parallel co-occurrence word)
多い -> many
お金 -> money
命 -> live
音楽 -> music
笑顔 -> smile
The extracted first parallel co-occurrence words are output to the second appearance frequency search unit 43.

［第２の出現頻度検索部４３］
第２の出現頻度検索部４３は、外国語コーパス辞書１１１（第１の言語体系のコーパス辞書）を用いて、外国語の第１の原語「much」と第１の対訳共起語とを組み合わせた語列毎に、出現頻度を検索する。例えば以下のような語列に対する出現頻度が抽出される。
（外国語の原語と対訳共起語）（出現頻度）
much smile -> 2500
much money -> 1200
much music -> 800
much live -> 340
much many -> 1 [Second appearance frequency search unit 43]
The second appearance frequency search unit 43 uses the foreign language corpus dictionary 111 (the corpus dictionary of the first language system) to search for the appearance frequency of each word string that combines the foreign-language first original word “much” with a first parallel co-occurrence word. For example, the following appearance frequencies are obtained.
(Foreign-language original word and parallel co-occurrence word) -> (Appearance frequency)
much smile -> 2500
much money -> 1200
much music -> 800
much live -> 340
much many -> 1

［共起語決定部５］
共起語決定部５は、最も出現頻度が多い語列「much smile」における第２の対訳共起語を、第１の原語に対する用法として適切な第２の共起語として決定する。決定された「much smile」は、検索部６へ出力される。 [Co-occurrence word determination unit 5]
The co-occurrence word determination unit 5 determines the second parallel co-occurrence word in the word string “much smile” having the highest appearance frequency as a second co-occurrence word suitable as a usage for the first original word. The determined “much smile” is output to the search unit 6.
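The flow of FIG. 3 described above (units 41, 42, 43 and 5) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the three dictionaries are small in-memory stand-ins filled with the example values from the text, and the function names `rank_word_strings` and `decide_word_string` are hypothetical, not names used by the patent.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins (assumptions) for the Japanese corpus dictionary 121,
# the Japanese bilingual dictionary 122 and the foreign language
# corpus dictionary 111, filled with the example values from the text.
JA_COOCCURRENCE: Dict[str, List[str]] = {
    "たくさん": ["多い", "お金", "命", "音楽", "笑顔"],
}
JA_TO_EN: Dict[str, str] = {
    "多い": "many", "お金": "money", "命": "live",
    "音楽": "music", "笑顔": "smile",
}
EN_PHRASE_FREQ: Dict[str, int] = {
    "much smile": 2500, "much money": 1200, "much music": 800,
    "much live": 340, "much many": 1,
}


def rank_word_strings(ja_word: str, en_word: str) -> List[Tuple[str, int]]:
    """Return (word string, frequency) pairs, most frequent first."""
    scored = []
    for co_word in JA_COOCCURRENCE.get(ja_word, []):   # role of unit 41
        translated = JA_TO_EN.get(co_word)             # role of unit 42
        if translated is None:
            continue  # no dictionary entry, so no word string to score
        phrase = f"{en_word} {translated}"
        scored.append((phrase, EN_PHRASE_FREQ.get(phrase, 0)))  # unit 43
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def decide_word_string(ja_word: str, en_word: str) -> Optional[str]:
    """Pick the most frequent word string (role of unit 5)."""
    ranked = rank_word_strings(ja_word, en_word)
    return ranked[0][0] if ranked else None


print(decide_word_string("たくさん", "much"))
```

With the example data, the selected word string is the one with the highest listed frequency, which matches the result stated in the text.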

［検索部６］
検索部６は、共起語決定部５によって決定された第２の原語と第２の共起語とからなる語列「much smile」をキーとして、メディアサーバへ送信する。これに対し、メディアサーバは、「much smile」に近いメタデータが対応付けられたメディアデータを、対訳キーワード抽出装置の検索部６へ返信する。そして、そのメディアデータは、ユーザ端末へ、検索結果として返信される。 [Search unit 6]
The search unit 6 transmits the word string “much smile” composed of the second original word and the second co-occurrence word determined by the co-occurrence word determination unit 5 to the media server as a key. In response to this, the media server returns the media data associated with metadata close to “much smile” to the search unit 6 of the bilingual keyword extraction device. Then, the media data is returned as a search result to the user terminal.

ユーザ端末は、対訳キーワード抽出装置から返信されたメディアデータを再生する。利用者は、外国語「much smile」に関連するメタデータを視聴することによって、日本語「たくさん」の意味合いを理解することができる。 The user terminal reproduces the media data returned from the bilingual keyword extraction device. Users can understand the meaning of Japanese “many” by viewing metadata related to the foreign language “much smile”.

図４は、図２の構成に、共起語フィルタ部を含めた機能構成図である。 FIG. 4 is a functional configuration diagram including a co-occurrence word filter unit in the configuration of FIG.

図４によれば、第１の共起語抽出部３１と、第１の対訳共起語抽出部３２との間に、第１の共起語フィルタ部３４が備えられている。 According to FIG. 4, a first co-occurrence word filter unit 34 is provided between the first co-occurrence word extraction unit 31 and the first parallel translation co-occurrence word extraction unit 32.

［第１の共起語フィルタ部３４］
第１の共起語フィルタ部３４は、第１の共起語抽出部３１から抽出された複数の第１の共起語の中で、第１の原語「much」と同一類似の格となる共起語を削除し、その他の第１の共起語を第１の対訳共起語抽出部３２へ出力する。例えば以下のように共起語が削除される。
（外国語の原語）much -> （第１の共起語）
×better（muchと同一類似の副詞格）
×pretty（muchと同一類似の副詞格）
money
success
music
尚、第１の共起語フィルタ部３４は、a, then等の定冠詞や、at, from等の前置詞を除外することも好ましい。 [First Co-occurrence Word Filter Unit 34]
The first co-occurrence word filter unit 34 deletes, from among the plurality of first co-occurrence words extracted by the first co-occurrence word extraction unit 31, any co-occurrence word whose case is the same as or similar to that of the first original word “much”, and outputs the remaining first co-occurrence words to the first parallel co-occurrence word extraction unit 32. For example, co-occurrence words are deleted as follows.
(Foreign-language original word) much -> (First co-occurrence words)
× better (adverbial, the same as or similar to “much”)
× pretty (adverbial, the same as or similar to “much”)
money
success
music
The first co-occurrence word filter unit 34 preferably also excludes articles such as “a” and “the” and prepositions such as “at” and “from”.
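The filtering performed by the first co-occurrence word filter unit 34 could be sketched as below. The text does not say how the “case” of a word is obtained, so this sketch assumes a small hand-written category table (`CATEGORY`); the table, the `EXCLUDED_CATEGORIES` set and the function name `filter_cooccurrences` are hypothetical.

```python
from typing import Dict, List

# Hypothetical category table; the patent does not specify how the
# category ("case") of a word is determined.
CATEGORY: Dict[str, str] = {
    "much": "adverb", "better": "adverb", "pretty": "adverb",
    "money": "noun", "success": "noun", "music": "noun",
    "a": "article", "at": "preposition", "from": "preposition",
}
# Function-word categories that the text says are preferably excluded.
EXCLUDED_CATEGORIES = {"article", "preposition"}


def filter_cooccurrences(original: str, candidates: List[str]) -> List[str]:
    """Drop candidates in the same category as the original word,
    and drop articles and prepositions; keep everything else."""
    original_category = CATEGORY.get(original)
    kept = []
    for word in candidates:
        category = CATEGORY.get(word)
        if category is not None and category == original_category:
            continue  # same category as the original word, e.g. "better"
        if category in EXCLUDED_CATEGORIES:
            continue  # article or preposition
        kept.append(word)
    return kept


print(filter_cooccurrences(
    "much", ["better", "pretty", "money", "success", "music", "a", "from"]))
```

Words missing from the table are kept here; whether unknown words should instead be dropped is a design choice the text leaves open.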

図４には図示していないが、第２の共起語抽出部４１と、第２の対訳共起語抽出部４２との間に、第２の共起語フィルタ部４４を備えることも好ましい。 Although not shown in FIG. 4, it is also preferable to provide a second co-occurrence word filter unit 44 between the second co-occurrence word extraction unit 41 and the second parallel co-occurrence word extraction unit 42.

［第２の共起語フィルタ部４４］
第２の共起語フィルタ部４４は、第２の共起語抽出部４１から抽出された複数の第２の共起語の中で、第２の原語と同一類似の格となる共起語を削除し、その他の第２の共起語を第２の対訳共起語抽出部４２へ出力する。例えば以下のように共起語が削除される。
（日本語の原語）たくさん -> （第２の共起語）
×多い（「たくさん」と同一類似の副詞格）
お金
命
音楽
笑顔 [Second Co-occurrence Word Filter Unit 44]
The second co-occurrence word filter unit 44 deletes, from among the plurality of second co-occurrence words extracted by the second co-occurrence word extraction unit 41, any co-occurrence word whose case is the same as or similar to that of the second original word, and outputs the remaining second co-occurrence words to the second parallel co-occurrence word extraction unit 42. For example, co-occurrence words are deleted as follows.
(Japanese original word) たくさん -> (Second co-occurrence words)
× 多い (many; adverbial, the same as or similar to “たくさん”)
お金 (money)
命 (life)
音楽 (music)
笑顔 (smile)

図５は、図２の構成と図３の構成とを並列に構成した機能構成図である。 FIG. 5 is a functional configuration diagram in which the configuration of FIG. 2 and the configuration of FIG. 3 are configured in parallel.

図５によれば、図２の構成と図３の構成とが、共起語決定部５によって統合されている。共起語決定部５は、以下のステップで共起語を決定する。
（Ｓ１）最も出現頻度が多い語列における第２の対訳共起語「お金」を、第１の原語「たくさん」に対する用法として適切な第２の共起語として決定する。即ち、「たくさんのお金」が決定される。
（Ｓ２）最も出現頻度が多い語列における第１の対訳共起語「smile」を、第２の原語に対する用法として適切な第１の共起語として決定する。即ち、「much smile」が決定される。
（Ｓ３）第１の共起語と第２の共起語における出現頻度の多い方の、第１の原語と第１の共起語とからなる語列、又は、第２の原語と第２の共起語とからなる語列の一方を決定する。ここでは、出現頻度が多い「much smile」が決定される。 According to FIG. 5, the configuration of FIG. 2 and the configuration of FIG. 3 are integrated by the co-occurrence word determination unit 5. The co-occurrence word determination unit 5 determines the co-occurrence word in the following steps.
(S1) The second parallel co-occurrence word “お金” (money) in the word string with the highest appearance frequency is determined as the second co-occurrence word suitable as a usage for the first original word “たくさん”. That is, “たくさんのお金” (a lot of money) is determined.
(S2) The first parallel co-occurrence word “smile” in the word string with the highest appearance frequency is determined as the first co-occurrence word suitable as a usage for the second original word. That is, “much smile” is determined.
(S3) Of the first co-occurrence word and the second co-occurrence word, the one with the higher appearance frequency is selected, and the corresponding word string is determined: either the word string composed of the first original word and the first co-occurrence word, or the word string composed of the second original word and the second co-occurrence word. Here, “much smile”, which has the higher appearance frequency, is determined.
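Steps S1 to S3 can be illustrated with the short sketch below. It assumes each direction has already produced a table of word strings and appearance frequencies; the values are the ones quoted in this description (a frequency for “たくさんの成功” is not quoted here, so it is omitted). The function names and the tie-breaking rule (prefer the first direction on equal frequency) are assumptions, not part of the patent text.

```python
from typing import Dict, Tuple

# Frequencies quoted in the description for each direction.
FORWARD: Dict[str, int] = {"たくさんのお金": 1500, "たくさんの音楽": 780}
REVERSE: Dict[str, int] = {
    "much smile": 2500, "much money": 1200, "much music": 800,
    "much live": 340, "much many": 1,
}


def best_word_string(freqs: Dict[str, int]) -> Tuple[str, int]:
    """S1 / S2: the word string with the highest appearance frequency."""
    return max(freqs.items(), key=lambda item: item[1])


def decide(forward: Dict[str, int], reverse: Dict[str, int]) -> Tuple[str, int]:
    """S3: keep whichever direction's best word string is more frequent.
    Ties go to the forward direction (an assumption)."""
    s1 = best_word_string(forward)
    s2 = best_word_string(reverse)
    return s1 if s1[1] >= s2[1] else s2


print(decide(FORWARD, REVERSE))
```

Comparing raw frequencies across two corpora of different sizes may favour the larger corpus; the description does not mention normalisation, so none is applied in this sketch.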

図６は、図５の構成の中で、異なる言語体系間で共起語を比較した機能構成図である。 FIG. 6 is a functional configuration diagram in which co-occurrence words are compared between different language systems in the configuration of FIG.

［第１の対訳共起語比較部３５］
第１の対訳共起語比較部３５は、第２の共起語フィルタ部４４から出力された第２の共起語と、第１の出現頻度検索部３３から出力された第２の対訳共起語とを比較し、一致した第２の対訳共起語のみを共起語決定部５へ出力する。図６によれば、以下のように比較できる。
（第１の出現頻度検索部３３の出力）（第２の共起語フィルタ部４４の出力）
○たくさんのお金お金
×たくさんの成功命
○たくさんの音楽音楽
笑顔
これによって、第１の対訳共起語比較部３５は、「たくさんのお金」「たくさんの音楽」を、共起語決定部５へ出力する。 [First parallel co-occurrence word comparison unit 35]
The first parallel co-occurrence word comparison unit 35 compares the second co-occurrence words output from the second co-occurrence word filter unit 44 with the second parallel co-occurrence words output from the first appearance frequency search unit 33, and outputs only the matching second parallel co-occurrence words to the co-occurrence word determination unit 5. According to FIG. 6, the comparison proceeds as follows.
(Output of first appearance frequency search unit 33) | (Output of second co-occurrence word filter unit 44)
○ たくさんのお金 (a lot of money) | お金 (money)
× たくさんの成功 (a lot of success) | 命 (life)
○ たくさんの音楽 (a lot of music) | 音楽 (music)
(no entry) | 笑顔 (smile)
As a result, the first parallel co-occurrence word comparison unit 35 outputs “たくさんのお金” and “たくさんの音楽” to the co-occurrence word determination unit 5.

［第２の対訳共起語比較部４５］
第２の対訳共起語比較部４５は、第１の共起語フィルタ部３４から出力された第１の共起語と、第２の出現頻度検索部４３から出力された第１の対訳共起語とを比較し、一致した第１の対訳共起語のみを共起語決定部５へ出力する。
（第２の出現頻度検索部４３の出力）（第２の共起語フィルタ部４４の出力）
×much smile success
○much money money
○much music music
×much live
これによって、第２の対訳共起語比較部４５は、「much money」「much music」を、共起語決定部５へ出力する。 [Second parallel co-occurrence word comparison unit 45]
The second parallel co-occurrence word comparison unit 45 compares the first co-occurrence words output from the first co-occurrence word filter unit 34 with the first parallel co-occurrence words output from the second appearance frequency search unit 43, and outputs only the matching first parallel co-occurrence words to the co-occurrence word determination unit 5.
(Output of second appearance frequency search unit 43) | (Output of first co-occurrence word filter unit 34)
× much smile | success
○ much money | money
○ much music | music
× much live | (no entry)
As a result, the second parallel co-occurrence word comparison unit 45 outputs “much money” and “much music” to the co-occurrence word determination unit 5.

図６によれば、共起語決定部５は、最も出現頻度の高い語列を決定する。
たくさんのお金 -> 1500
たくさんの音楽 -> 780
much money -> 1200
much music -> 800
最終的に、共起語決定部５は、「たくさんのお金」を検索キーワードとして決定し、検索部６へ出力する。 According to FIG. 6, the co-occurrence word determination unit 5 determines a word string having the highest appearance frequency.
たくさんのお金 (a lot of money) -> 1500
たくさんの音楽 (a lot of music) -> 780
much money -> 1200
much music -> 800
Finally, the co-occurrence word determination unit 5 determines “たくさんのお金” (a lot of money) as the search keyword and outputs it to the search unit 6.
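The comparison and final selection of FIG. 6 can be sketched as follows. The sketch assumes each search unit hands over a mapping from co-occurrence word to word string and each filter unit hands over a set of words; these data shapes and the names `keep_matching` and `decide_keyword` are hypothetical, and the values are the example values from the text.

```python
from typing import Dict, List, Set

# Output of the first appearance frequency search unit 33:
# second parallel co-occurrence word -> word string.
FORWARD: Dict[str, str] = {
    "お金": "たくさんのお金", "成功": "たくさんの成功", "音楽": "たくさんの音楽",
}
# Output of the second co-occurrence word filter unit 44.
FILTER_44: Set[str] = {"お金", "命", "音楽", "笑顔"}

# Output of the second appearance frequency search unit 43:
# first parallel co-occurrence word -> word string.
REVERSE: Dict[str, str] = {
    "smile": "much smile", "money": "much money",
    "music": "much music", "live": "much live",
}
# Output of the first co-occurrence word filter unit 34.
FILTER_34: Set[str] = {"money", "success", "music"}

# Appearance frequencies of the surviving word strings, as listed.
FREQ: Dict[str, int] = {
    "たくさんのお金": 1500, "たくさんの音楽": 780,
    "much money": 1200, "much music": 800,
}


def keep_matching(candidates: Dict[str, str], other_side: Set[str]) -> List[str]:
    """Roles of units 35 and 45: keep only word strings whose co-occurrence
    word was also found directly in the other language's corpus."""
    return [phrase for word, phrase in candidates.items() if word in other_side]


def decide_keyword() -> str:
    """Role of unit 5: the most frequent surviving word string."""
    survivors = keep_matching(FORWARD, FILTER_44) + keep_matching(REVERSE, FILTER_34)
    return max(survivors, key=lambda phrase: FREQ[phrase])


print(decide_keyword())
```

Note that “much smile”, the top candidate in FIG. 3, is dropped here because “smile” is absent from the first filter unit's output, so the cross-check changes the final keyword.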

検索部６は、語列「たくさんのお金」をキーとして、メディアサーバへ送信する。これに対し、メディアサーバは、「たくさんのお金」に近いメタデータが対応付けられたメディアデータを、対訳キーワード抽出装置の検索部６へ返信する。そして、そのメディアデータは、ユーザ端末へ、検索結果として返信される。 The search unit 6 transmits the word string “a lot of money” as a key to the media server. On the other hand, the media server returns media data associated with metadata close to “a lot of money” to the search unit 6 of the bilingual keyword extraction device. Then, the media data is returned as a search result to the user terminal.

以上、詳細に説明したように、本発明のプログラム、装置及び方法によれば、異なる言語体系に対して適切な検索キーワードを出力することができる。この検索キーワードは、その語を理解させるべきメディアデータの検索に用いることができる。 As described above in detail, the program, apparatus, and method of the present invention can output search keywords suitable for different language systems. Such a search keyword can be used to search for media data that helps a user understand the word.

特に、本発明によれば、例えば外国語の単語学習装置について、使用するメディアデータ（画像、映像又は音声）の学習素材を、予め記憶しておく必要がない。その都度、膨大な外部のメディアサーバを検索することによって、適切なメディアデータを取得することができる。また、本発明によれば、外国語と母国語との間で、できる限り母国語で利用頻度が多いキーワードを用いて、メディアデータを検索することができる。従って、外国語を母国語に単に直訳しただけでは理解できない外国語であっても、母国語として利用頻度が多いキーワードに自動的に補完され、利用者にとって適切なメディアデータが視聴されることとなる。 In particular, according to the present invention, a foreign-language word learning apparatus, for example, does not need to store in advance the learning materials (images, videos, or audio) that it uses as media data. Appropriate media data can instead be acquired each time by searching vast external media servers. In addition, according to the present invention, media data can be searched, between a foreign language and a native language, using keywords that are used as frequently as possible in the native language. Therefore, even a foreign word that cannot be understood from a mere literal translation into the native language is automatically supplemented with a keyword that is frequently used in the native language, so that media data appropriate for the user is presented.

前述した本発明の種々の実施形態について、本発明の技術思想及び見地の範囲の種々の変更、修正及び省略は、当業者によれば容易に行うことができる。前述の説明はあくまで例であって、何ら制約しようとするものではない。本発明は、特許請求の範囲及びその均等物として限定するものにのみ制約される。 Various changes, modifications, and omissions of the above-described various embodiments of the present invention can be easily made by those skilled in the art. The above description is merely an example, and is not intended to be restrictive. The invention is limited only as defined in the following claims and the equivalents thereto.

１対訳キーワード抽出装置
１１１外国語コーパス辞書
１１２外国語対訳辞書
１２１日本語コーパス辞書
１２２日本語対訳辞書
２対訳単語入力部
３１第１の共起語抽出部
３２第１の対訳共起語抽出部
３３第１の出現頻度検索部
３４第１の共起語フィルタ部
３５第１の対訳共起語比較部
４１第２の共起語抽出部
４２第２の対訳共起語抽出部
４３第２の出現頻度検索部
４４第２の共起語フィルタ部
４５第２の対訳共起語比較部
５共起語決定部
６検索部
DESCRIPTION OF SYMBOLS
1 Bilingual keyword extraction device
111 Foreign language corpus dictionary
112 Foreign language bilingual dictionary
121 Japanese corpus dictionary
122 Japanese bilingual dictionary
2 Parallel translation word input unit
31 First co-occurrence word extraction unit
32 First parallel co-occurrence word extraction unit
33 First appearance frequency search unit
34 First co-occurrence word filter unit
35 First parallel co-occurrence word comparison unit
41 Second co-occurrence word extraction unit
42 Second parallel co-occurrence word extraction unit
43 Second appearance frequency search unit
44 Second co-occurrence word filter unit
45 Second parallel co-occurrence word comparison unit
5 Co-occurrence word determination unit
6 Search unit

Claims

A program that causes a computer to function so as to receive, as input, a first original word of a first language system and a second original word of a second language system that is a parallel translation of the first original word, and to output a second co-occurrence word of the second language system suitable as a usage for the first original word, the program causing the computer to function as:
first co-occurrence word extraction means for extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
first parallel co-occurrence word extraction means for extracting, using a bilingual dictionary of the first language system, second parallel co-occurrence words that are parallel translations of the first co-occurrence words;
first appearance frequency search means for searching, using a corpus dictionary of the second language system, for an appearance frequency of each word string combining the second original word and a second parallel co-occurrence word; and
co-occurrence word determination means for determining the second parallel co-occurrence word in the word string having the highest appearance frequency as the second co-occurrence word suitable as a usage for the first original word.

The first language system is a foreign language, the second language system is a native language,
The computer is caused to function as a usage of the first original language of the foreign language so that a word string composed of the second original language and the second co-occurrence language of the native language is output. The program described in.

The program according to claim 1, wherein the first language system is a native language and the second language system is a foreign language, and
the computer is caused to function so as to output, as a usage for the first original word of the native language, a word string composed of the second original word and the second co-occurrence word of the foreign language.

The program according to any one of claims 1 to 3, wherein the computer is further caused to function as search means for transmitting to a media server, as a key, the word string composed of the second original word and the second co-occurrence word determined by the co-occurrence word determination means, and for outputting the search result.

The program according to any one of claims 1 to 4, wherein the media server causes a computer to function as a text, image, or audio media server.

Among the plurality of first co-occurrence words extracted from the first co-occurrence word extracting means, the co-occurrence word having the same and similar case as the first original word is deleted, and the other first co-occurrence words 6. The program according to claim 1, wherein the computer is further caused to function as first co-occurrence word filtering means for outputting to the first parallel co-occurrence word extracting means.

The program according to any one of claims 1 to 6, wherein the computer is further caused to function as:
second co-occurrence word extraction means for extracting, using the corpus dictionary of the second language system, a plurality of second co-occurrence words that co-occur with the second original word;
second parallel co-occurrence word extraction means for extracting, using a bilingual dictionary of the second language system, first parallel co-occurrence words that are parallel translations of the second co-occurrence words; and
second appearance frequency search means for searching, using the corpus dictionary of the first language system, for an appearance frequency of each word string combining the first original word and a first parallel co-occurrence word,
and wherein the co-occurrence word determination means determines the second parallel co-occurrence word in the word string having the highest appearance frequency as the second co-occurrence word suitable as a usage for the first original word, determines the first parallel co-occurrence word in the word string having the highest appearance frequency as the first co-occurrence word suitable as a usage for the second original word, and, according to which of the first co-occurrence word and the second co-occurrence word has the higher appearance frequency, determines one of the word string composed of the first original word and the first co-occurrence word and the word string composed of the second original word and the second co-occurrence word.

Among the plurality of second co-occurrence words extracted from the second co-occurrence word extracting means, the co-occurrence words having the same and similar case as the second original word are deleted, and the other second co-occurrence words The program according to claim 7, further causing the computer to function as second co-occurrence word filtering means for outputting to the second parallel co-occurrence word extraction means.

The program according to claim 8, wherein the computer is further caused to function as:
second parallel co-occurrence word comparison means for comparing the first co-occurrence words output from the first co-occurrence word filter means with the first parallel co-occurrence words output from the second appearance frequency search means, and outputting only the matching first parallel co-occurrence words to the co-occurrence word determination means; and
first parallel co-occurrence word comparison means for comparing the second co-occurrence words output from the second co-occurrence word filter means with the second parallel co-occurrence words output from the first appearance frequency search means, and outputting only the matching second parallel co-occurrence words to the co-occurrence word determination means.

A device that receives, as input, a first original word of a first language system and a second original word of a second language system that is a parallel translation of the first original word, and outputs a second co-occurrence word of the second language system suitable as a usage for the first original word, the device comprising:
first co-occurrence word extraction means for extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
first parallel co-occurrence word extraction means for extracting, using a bilingual dictionary of the first language system, second parallel co-occurrence words that are parallel translations of the first co-occurrence words;
first appearance frequency search means for searching, using a corpus dictionary of the second language system, for an appearance frequency of each word string combining the second original word and a second parallel co-occurrence word; and
co-occurrence word determination means for determining the second parallel co-occurrence word in the word string having the highest appearance frequency as the second co-occurrence word suitable as a usage for the first original word.

A method, performed using an apparatus, of receiving as input a first original word of a first language system and a second original word of a second language system that is a parallel translation of the first original word, and outputting a second co-occurrence word of the second language system suitable as a usage for the first original word, the method comprising:
a first step of extracting, using a corpus dictionary of the first language system, a plurality of first co-occurrence words that co-occur with the first original word;
a second step of extracting, using a bilingual dictionary of the first language system, second parallel co-occurrence words that are parallel translations of the first co-occurrence words;
a third step of searching, using a corpus dictionary of the second language system, for an appearance frequency of each word string combining the second original word and a second parallel co-occurrence word; and
a fourth step of determining the second parallel co-occurrence word in the word string having the highest appearance frequency as the second co-occurrence word suitable as a usage for the first original word.