JP6987003B2

JP6987003B2 - Text mining method, text mining program, and text mining device

Info

Publication number: JP6987003B2
Application number: JP2018052074A
Authority: JP
Inventors: 未希柿ノ木
Original assignee: Screen Holdings Co Ltd
Current assignee: Screen Holdings Co Ltd
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2021-12-22
Anticipated expiration: 2038-03-20
Also published as: KR20190110428A; CN110309290B; CN110309290A; KR102162779B1; TWI703457B; JP2019164593A; TW201945958A

Description

The present invention relates to text mining, and more particularly to a text mining method, a text mining program, and a text mining device that display a screen including a co-occurrence network of words.

In recent years, text mining, which analyzes free-form text data and derives useful information from the results, has attracted attention. In text mining, information is obtained by, for example, extracting words from the text data under analysis and analyzing how frequently and in what patterns those words appear.

When analyzing free-form text data, the analyst should not pick targets subjectively at the initial stage but first needs to grasp the overall picture of the text data. For this purpose, the analyst may use a co-occurrence network of the words contained in the text data.

FIG. 19 is a diagram showing an example of a co-occurrence network. A co-occurrence network is obtained by extracting from text data the pairs of words that often appear in the same sentence and representing the result as an undirected graph. When a word Wa and a word Wb often appear in the same sentence in the text data under analysis, the co-occurrence network contains a node corresponding to the word Wa, a node corresponding to the word Wb, and an edge connecting the two. The co-occurrence network shown in FIG. 19 includes a node corresponding to "staff", a node corresponding to "response", and an edge connecting them. This network shows that "staff" and "response" often appear in the same sentence in the text data under analysis.
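The graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_cooccurrence_network`, the sample sentences, and the `min_count` threshold used to decide what counts as "often" are all hypothetical and are not taken from the specification.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(sentences, min_count=2):
    """Return (nodes, edges) of an undirected co-occurrence graph.

    sentences: list of word lists, one list per sentence.
    min_count: assumed threshold for "often appear in the same sentence".
    """
    pair_counts = Counter()
    for words in sentences:
        # Count every unordered word pair once per sentence.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    # Keep only the pairs that co-occur at least min_count times.
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_count}
    # A node exists for every word that is an endpoint of a kept edge.
    nodes = {word for pair in edges for word in pair}
    return nodes, edges


sentences = [
    ["staff", "response", "good"],
    ["staff", "response", "slow"],
    ["room", "clean"],
]
nodes, edges = build_cooccurrence_network(sentences)
```

With this sample input, only the pair ("response", "staff") reaches the threshold, so the graph has two nodes joined by one edge, mirroring the "staff" and "response" example of FIG. 19.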

In general, a co-occurrence network is generated from the whole of the designated text data. Such a co-occurrence network is hereinafter referred to as a "whole co-occurrence network". Depending on the hypothesis and the purpose of the analysis, the analyst selects a plurality of words to focus on (hereinafter referred to as words of interest) from the whole co-occurrence network, and carries out the subsequent analysis with those words in mind.

When selecting a word of interest, the analyst examines how the word is used in the sentences that contain it, in order to judge whether the selected word suits the purpose of the analysis. For this reason, the analyst may use a co-occurrence network based on text data consisting of those sentences of the designated text data that contain the word of interest (hereinafter referred to as limited text data). Such a co-occurrence network is hereinafter referred to as a "limited co-occurrence network". Note that a "sentence containing the word of interest" here may mean not only a single sentence containing the word of interest but also a set of sentences divided into block units, such as a paragraph that includes a sentence containing the word of interest. By using the limited co-occurrence network, the analyst can grasp the content of the limited text data. The analyst refers to the whole co-occurrence network and the limited co-occurrence networks repeatedly until all words of interest have been selected.

In the following, consider a text mining device that generates a co-occurrence network of the words contained in text data and displays a screen including the generated network. Patent Document 1 describes a document database display device that generates a whole co-occurrence network for each of a plurality of documents and displays a screen including the generated networks. This display device searches the plurality of whole co-occurrence networks for a word entered by the user and highlights the word found on the screen.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H8-314980

A conventional text mining device generates a co-occurrence network from the whole of the designated text data. A conventional text mining device can therefore easily display a screen including a whole co-occurrence network.

On the other hand, displaying a screen including a limited co-occurrence network with a conventional text mining device requires cumbersome operations by the analyst. Specifically, each time the analyst selects one word of interest from the whole co-occurrence network, the analyst has to generate limited text data from the designated text data and supply the generated limited text data to the text mining device. In addition, the analyst refers to both the whole co-occurrence network and the limited co-occurrence networks when selecting words of interest, so the text mining device has to save the image data of both. When many co-occurrence networks have been generated, however, saving and managing the image data becomes difficult.

Therefore, an object of the present invention is to provide a text mining method, a text mining program, and a text mining device that can display, by a simple operation, a screen including the co-occurrence network obtained when a word of interest is specified.

A first aspect of the present invention is a text mining method for displaying a screen including an analysis result of text data, the method comprising:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction specifying a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting words extracts the words from limited text data consisting of the portion of the designated text data that contains the word of interest, the step of generating a co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating a co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying a screen displays a second screen including the second co-occurrence network.
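The flow of the first aspect, in which the same steps are run once on the whole text and again on the limited text data, might look roughly like the following sketch. It assumes pre-tokenized sentences, uses raw co-occurrence counts as the matrix elements, and omits the display step; the function names (`cooccurrence_counts`, `network`, `on_word_of_interest`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Sparse co-occurrence matrix: (word_a, word_b) -> number of shared sentences."""
    counts = Counter()
    for words in sentences:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def network(sentences, min_count=1):
    """Edges of the co-occurrence network derived from the matrix."""
    matrix = cooccurrence_counts(sentences)
    return {pair: n for pair, n in matrix.items() if n >= min_count}


def on_word_of_interest(sentences, word, min_count=1):
    """Handle the instruction entered in the first screen."""
    # Limited text data: only the sentences containing the word of interest.
    limited = [s for s in sentences if word in s]
    # Second matrix and second network are built from the limited data only.
    return network(limited, min_count)


designated = [
    ["staff", "response", "polite"],
    ["staff", "response", "slow"],
    ["room", "clean"],
    ["room", "staff", "helpful"],
]
first_network = network(designated, min_count=2)           # whole text
second_network = on_word_of_interest(designated, "room")   # limited text
```

The point of the sketch is that the analyst never prepares the limited text data by hand: specifying the word is enough to trigger the second pass.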

A second aspect of the present invention is the first aspect, wherein an instruction specifying the words corresponding to nodes as the words of interest is input by selecting one or more nodes included in the first co-occurrence network in the first screen and then selecting a start of analysis.

A third aspect of the present invention is the first aspect, wherein an instruction specifying the word corresponding to a node as the word of interest is input by selecting one node included in the first co-occurrence network in succession in the first screen.

A fourth aspect of the present invention is the first aspect, wherein an instruction specifying the words corresponding to the two nodes connected to an edge as the words of interest is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

A fifth aspect of the present invention is the first aspect, wherein an instruction specifying the words corresponding to the plurality of nodes connected to edges as the words of interest is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting a start of analysis.

A sixth aspect of the present invention is the first aspect, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying a screen displays the plurality of second co-occurrence networks in tab form.

A seventh aspect of the present invention is the sixth aspect, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.

An eighth aspect of the present invention is the first aspect, wherein the limited text data consists of those sentences of the designated text data that contain the word of interest.

A ninth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are specified, the limited text data consists of those sentences of the designated text data that contain all of the plurality of words of interest.

A tenth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are specified, the limited text data consists of those sentences of the designated text data that contain any of the plurality of words of interest.
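The difference between the ninth aspect (all words, an AND condition) and the tenth aspect (any word, an OR condition) can be illustrated with a small filter. This is a sketch under the assumption that each sentence is already available as a list of words; the function name `limited_text_data` and the `mode` parameter are hypothetical.

```python
def limited_text_data(sentences, words_of_interest, mode="and"):
    """Select the sentences that make up the limited text data.

    mode="and": keep sentences containing all words of interest.
    mode="or":  keep sentences containing at least one word of interest.
    """
    targets = set(words_of_interest)
    if mode == "and":
        # Subset test: every word of interest occurs in the sentence.
        return [s for s in sentences if targets <= set(s)]
    if mode == "or":
        # Non-empty intersection: some word of interest occurs in the sentence.
        return [s for s in sentences if targets & set(s)]
    raise ValueError("mode must be 'and' or 'or'")


sentences = [
    ["staff", "response", "polite"],
    ["staff", "slow"],
    ["room", "clean"],
]
and_result = limited_text_data(sentences, ["staff", "response"], mode="and")
or_result = limited_text_data(sentences, ["staff", "response"], mode="or")
```

Here the AND condition keeps only the first sentence, while the OR condition keeps the first two.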

An eleventh aspect of the present invention is the first aspect, wherein the step of generating a co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.
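For reference, the Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below assumes that, for each word, the set in question is the set of sentences containing that word; this reading, the function name `jaccard_matrix`, and the dictionary representation of the matrix are illustrative assumptions rather than details stated in this passage.

```python
def jaccard_matrix(sentences):
    """Return (vocab, matrix) where matrix[(a, b)] is the Jaccard coefficient.

    The coefficient is computed over the sets of sentence indices
    in which each word occurs (an assumed interpretation).
    """
    vocab = sorted({w for s in sentences for w in s})
    # For each word, the indices of the sentences that contain it.
    containing = {w: {i for i, s in enumerate(sentences) if w in s} for w in vocab}
    matrix = {}
    for a in vocab:
        for b in vocab:
            both = containing[a] & containing[b]
            either = containing[a] | containing[b]  # never empty for vocabulary words
            matrix[(a, b)] = len(both) / len(either)
    return vocab, matrix


sentences = [
    ["staff", "response"],
    ["staff", "response", "slow"],
    ["staff", "room"],
    ["room", "clean"],
]
vocab, matrix = jaccard_matrix(sentences)
```

In this sample, "staff" occurs in three sentences and "response" in two of those, so their coefficient is 2/3, whereas "staff" and "room" share one sentence out of four and score 0.25. Unlike a raw count, the coefficient is normalized to the range 0 to 1, so frequent words do not dominate simply because they appear everywhere.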

A twelfth aspect of the present invention is a text mining program for displaying a screen including an analysis result of text data, the program causing a computer to execute, by a CPU using a memory:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction specifying a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting words extracts the words from limited text data consisting of the portion of the designated text data that contains the word of interest, the step of generating a co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating a co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying a screen displays a second screen including the second co-occurrence network.

A thirteenth aspect of the present invention is the twelfth aspect, wherein an instruction specifying the words corresponding to nodes as the words of interest is input by selecting one or more nodes included in the first co-occurrence network in the first screen and then selecting a start of analysis.

A fourteenth aspect of the present invention is the twelfth aspect, wherein an instruction specifying the word corresponding to a node as the word of interest is input by selecting one node included in the first co-occurrence network in succession in the first screen.

A fifteenth aspect of the present invention is the twelfth aspect, wherein an instruction specifying the words corresponding to the two nodes connected to an edge as the words of interest is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

A sixteenth aspect of the present invention is the twelfth aspect, wherein an instruction specifying the words corresponding to the plurality of nodes connected to edges as the words of interest is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting a start of analysis.

A seventeenth aspect of the present invention is the twelfth aspect, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying a screen displays the plurality of second co-occurrence networks in tab form.

An eighteenth aspect of the present invention is the seventeenth aspect, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.

A nineteenth aspect of the present invention is a text mining device that displays a screen including an analysis result of text data, the device comprising:
a word extraction unit that extracts words from text data;
a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words;
a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and
a screen display unit that displays a screen including the co-occurrence network,
wherein, when an instruction specifying a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the word extraction unit extracts the words from limited text data consisting of the portion of the designated text data that contains the word of interest, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen including the second co-occurrence network.

A twentieth aspect of the present invention is the nineteenth aspect, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the screen display unit displays the plurality of second co-occurrence networks in tab form.

According to the first, twelfth, or nineteenth aspect, when an instruction specifying a word of interest is input in the first screen including the first co-occurrence network based on the whole of the designated text data, a second screen is displayed that includes a second co-occurrence network based on the portion of the designated text data that contains the word of interest. A screen including the co-occurrence network obtained when a word of interest is specified can therefore be displayed by a simple operation.

According to the second or thirteenth aspect, by selecting one or more nodes and the start of analysis in the first screen, an instruction specifying one or more words of interest can be input by a simple operation, and a screen including the co-occurrence network for those words of interest can be displayed.

According to the third or fourteenth aspect, by selecting one node in succession in the first screen, an instruction specifying one word of interest can be input by a simple operation, and a screen including the co-occurrence network for that word of interest can be displayed.

According to the fourth or fifteenth aspect, by selecting one edge in succession in the first screen, an instruction specifying two words of interest can be input by a simple operation, and a screen including the co-occurrence network for those two words of interest can be displayed.

According to the fifth or sixteenth aspect, by selecting one or more edges and the start of analysis in the first screen, an instruction specifying a plurality of words of interest can be input by a simple operation, and a screen including the co-occurrence network for those words of interest can be displayed.

According to the sixth, seventeenth, or twentieth aspect, the plurality of second co-occurrence networks are displayed in tab form when a merge instruction is input, so that they can be displayed compactly.

According to the seventh or eighteenth aspect, by grabbing and releasing a second co-occurrence network in the second screen, a merge instruction can be input by a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.

According to the eighth aspect, when an instruction specifying a word of interest is input, the designated text data is divided into sentences to obtain the limited text data, and a screen including the second co-occurrence network based on the obtained limited text data can be displayed.

According to the ninth or tenth aspect, a screen can be displayed that includes the second co-occurrence network obtained when AND processing or OR processing is applied to a plurality of words of interest.

According to the eleventh aspect, generating a co-occurrence matrix whose elements are Jaccard coefficients makes it possible to suitably analyze the co-occurrence of the words contained in the text data.

FIG. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a computer that functions as the text mining device shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the text mining device shown in FIG. 1.
FIG. 4 is a diagram showing an example of a co-occurrence matrix generated by the text mining device shown in FIG. 1.
FIG. 5 is a diagram showing an example of a window including a whole co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 6 is a diagram showing a first operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 7 is a diagram showing a second operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 8 is a diagram showing a third operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 9 is a diagram showing a fourth operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 10 is a diagram showing a fifth operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 11 is a diagram showing a sixth operation for specifying a word of interest in the window shown in FIG. 5.
FIG. 12 is a diagram showing an example of a window including a limited co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 13 is a diagram showing an example of a window including a limited co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 14 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 15 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 16 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 17 is a diagram showing an operation of merging windows in the text mining device shown in FIG. 1.
FIG. 18 is a diagram showing the display screen after the operation shown in FIG. 17 has been performed.
FIG. 19 is a diagram showing an example of a co-occurrence network.

Hereinafter, a text mining method, a text mining program, and a text mining device according to an embodiment of the present invention will be described with reference to the drawings. The text mining method according to the present embodiment is typically executed using a computer. The text mining program according to the present embodiment is a program for carrying out the text mining method using a computer. The text mining device according to the present embodiment is typically configured using a computer. A computer that executes the text mining program functions as the text mining device.

FIG. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention. The text mining device 10 shown in FIG. 1 includes an instruction input unit 11, a text data storage unit 12, a word extraction unit 13, a co-occurrence matrix generation unit 14, a co-occurrence network generation unit 15, and a screen display unit 16. The text mining device 10 generates a co-occurrence network as an analysis result of the text data stored in the text data storage unit 12, and displays a screen including the generated co-occurrence network.

The operation of the text mining device 10 is outlined as follows. Instructions from the user (the analyst of the text data) are input to the instruction input unit 11. The text data storage unit 12 stores one or more pieces of free-form text data. The word extraction unit 13 reads the designated text data from the text data storage unit 12 and extracts words from it by performing morphological analysis on the text data read. The co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted by the word extraction unit 13. The co-occurrence network generation unit 15 generates a co-occurrence network based on the co-occurrence matrix generated by the co-occurrence matrix generation unit 14. The screen display unit 16 displays a screen including the co-occurrence network generated by the co-occurrence network generation unit 15.
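The chain from the word extraction unit 13 to the co-occurrence network generation unit 15 can be sketched as three functions. Everything here is an illustrative assumption: the regular-expression tokenizer is only a stand-in for real morphological analysis, the matrix holds raw sentence-level counts, the `threshold` value is arbitrary, and the function names do not come from the specification. The screen display unit 16 is omitted.

```python
import re


def extract_words(text):
    """Stand-in for unit 13: split into sentences, then into lowercase words."""
    sentences = [s for s in re.split(r"[.!?。]+", text) if s.strip()]
    return [re.findall(r"\w+", s.lower()) for s in sentences]


def cooccurrence_matrix(sentences):
    """Stand-in for unit 14: symmetric matrix of sentence-level co-occurrence counts."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for s in sentences:
        present = set(s)
        for a in present:
            for b in present:
                if a != b:
                    matrix[index[a]][index[b]] += 1
    return vocab, matrix


def network_from_matrix(vocab, matrix, threshold=2):
    """Stand-in for unit 15: one edge per word pair whose count reaches the threshold."""
    return [
        (vocab[i], vocab[j], matrix[i][j])
        for i in range(len(vocab))
        for j in range(i + 1, len(vocab))
        if matrix[i][j] >= threshold
    ]


text = "Staff response was quick. Staff response felt kind. The room looked clean."
vocab, matrix = cooccurrence_matrix(extract_words(text))
edges = network_from_matrix(vocab, matrix)
```

For Japanese text, the tokenizer would have to be replaced by an actual morphological analyzer, since words are not separated by spaces.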

Using the instruction input unit 11, the user inputs instructions such as an instruction designating the text data to be analyzed and an instruction specifying a word of interest. The word extraction unit 13, the co-occurrence network generation unit 15, and the screen display unit 16 operate in accordance with the user's instructions to display a screen including a co-occurrence network. When an instruction designating text data is input, a whole co-occurrence network based on the whole of the designated text data is generated, and a screen including the whole co-occurrence network is displayed. When an instruction specifying a word of interest is input in the screen including the whole co-occurrence network, a limited co-occurrence network based on those sentences of the designated text data that contain the word of interest is generated, and a screen including the limited co-occurrence network is displayed.

FIG. 2 is a block diagram showing the configuration of a computer that functions as the text mining device 10. The computer 20 shown in FIG. 2 includes a CPU 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a DRAM. The storage unit 23 is, for example, a hard disk or a solid state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is a non-transitory recording medium such as a CD-ROM, a DVD-ROM, or a USB memory.

When the computer 20 executes a text mining program 31, the storage unit 23 stores the text mining program 31 and text data 32. The text mining program 31 and the text data 32 may be, for example, received from a server or another computer via the communication unit 26, or read from the recording medium 30 via the recording medium reading unit 27.

When the text mining program 31 is executed, the text mining program 31 and the text data 32 are copied to the main memory 22. Using the main memory 22 as working memory, the CPU 21 executes the text mining program 31 stored in the main memory 22 and thereby performs processing such as extracting words from the text data 32, generating a co-occurrence matrix for the extracted words, generating a co-occurrence network based on the generated co-occurrence matrix, and displaying a screen including the generated co-occurrence network. At this time, the computer 20 functions as the text mining device 10. The configuration of the computer 20 described above is merely an example, and the text mining device 10 can be configured using any computer.

FIG. 3 is a flowchart showing the operation of the text mining device 10. Before the operation shown in FIG. 3 is performed, the text data storage unit 12 stores one or more pieces of free-form text data. Each piece of text data contains a plurality of sentences. The text mining device 10 processes the text data that the user designates from among the text data stored in the text data storage unit 12.

図３において、指示入力部１１は、まず利用者からテキストデータを指定する指示を受け取る（ステップＳ１０１）。このとき、指示入力部１１は、テキストデータを指定する指示に加えて、共起行列の基準値（詳細は後述）を設定する指示、ＡＮＤ処理とＯＲ処理（詳細は後述）を切り替える指示、共起ネットワークの表示態様の詳細を設定する指示などを受け取ってもよい。受け取った指示は、テキストマイニング装置１０の各部に対して出力される。 In FIG. 3, the instruction input unit 11 first receives from the user an instruction designating text data (step S101). At this time, in addition to the instruction designating the text data, the instruction input unit 11 may receive an instruction to set a reference value of the co-occurrence matrix (described in detail later), an instruction to switch between AND processing and OR processing (described in detail later), an instruction to set details of the display mode of the co-occurrence network, and the like. The received instructions are output to each part of the text mining device 10.

次に、単語抽出部１３は、テキストデータ記憶部１２から指定されたテキストデータを読み出す（ステップＳ１０２）。次に、単語抽出部１３は、ステップＳ１０２で読み出したテキストデータに対して形態素解析を行うことにより、読み出したテキストデータから単語を抽出する（ステップＳ１０３）。このとき、単語抽出部１３は、読み出したテキストデータから、後の分析で必要となる単語だけを抽出する。次に、共起行列生成部１４は、ステップＳ１０３で抽出された単語について、ステップＳ１０２で読み出されたテキストデータを用いて共起行列を生成する（ステップＳ１０４）。 Next, the word extraction unit 13 reads out the designated text data from the text data storage unit 12 (step S102). Next, the word extraction unit 13 extracts words from the read text data by performing morphological analysis on the text data read in step S102 (step S103). At this time, the word extraction unit 13 extracts only the words necessary for the later analysis from the read text data. Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S103 using the text data read in step S102 (step S104).

図４は、共起行列生成部１４で生成された共起行列の例を示す図である。共起行列の要素は、単語のペアについて求めたＪａｃｃａｒｄ係数である。分析対象のテキストデータについて、単語Ｗａを含む文の集合をＡ、単語Ｗｂを含む文の集合をＢとする。単語のペア（Ｗａ，Ｗｂ）についてのＪａｃｃａｒｄ係数Ｋ（Ｗａ，Ｗｂ）は、次式（１）で与えられる。
Ｋ（Ｗａ，Ｗｂ）＝｜Ａ∩Ｂ｜／｜Ａ∪Ｂ｜ …（１）
ただし、式（１）において、記号∩は積集合を求める演算を表し、記号∪は和集合を求める演算を表し、｜Ｓ｜は集合Ｓに含まれる要素の個数を表す。
FIG. 4 is a diagram showing an example of a co-occurrence matrix generated by the co-occurrence matrix generation unit 14. Each element of the co-occurrence matrix is the Jaccard coefficient obtained for a pair of words. For the text data to be analyzed, let A be the set of sentences that include the word Wa, and let B be the set of sentences that include the word Wb. The Jaccard coefficient K(Wa, Wb) for the word pair (Wa, Wb) is given by the following equation (1).
K(Wa, Wb) = |A ∩ B| / |A ∪ B| … (1)
In equation (1), the symbol ∩ denotes the operation of taking an intersection, the symbol ∪ denotes the operation of taking a union, and |S| denotes the number of elements in the set S.

共起行列生成部１４は、ステップＳ１０４において、ステップＳ１０２で読み出されたテキストデータの全体から抽出された単語のペアのすべてについてＪａｃｃａｒｄ係数を求め、求めたＪａｃｃａｒｄ係数を要素とする共起行列を生成する。共起行列の行および列は、ステップＳ１０２で読み出されたテキストデータの全体から抽出された単語の種類に対応する。読み出されたテキストデータの全体からｎ種類の単語が抽出されたとき、ステップＳ１０４で生成される共起行列は、対角要素がすべて１であるｎ行ｎ列の対称行列である。 In step S104, the co-occurrence matrix generation unit 14 obtains the Jaccard coefficient for every pair of words extracted from the whole of the text data read in step S102, and generates a co-occurrence matrix whose elements are the obtained Jaccard coefficients. The rows and columns of the co-occurrence matrix correspond to the kinds of words extracted from the whole of the text data read in step S102. When n kinds of words are extracted from the whole of the read text data, the co-occurrence matrix generated in step S104 is an n-by-n symmetric matrix whose diagonal elements are all 1.
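
The construction of the co-occurrence matrix in step S104 can be illustrated with a minimal Python sketch. It assumes the words of each sentence have already been extracted (the morphological analysis of step S103 is not shown); the function name `jaccard_matrix` and the sample sentences are hypothetical and do not come from the patent.

```python
from itertools import combinations


def jaccard_matrix(sentences):
    """Build a Jaccard co-occurrence matrix.

    sentences: list of word lists, one list per sentence (assumed to be
    the output of an earlier word-extraction step).
    Returns (vocab, matrix), where matrix[i][j] = K(vocab[i], vocab[j]).
    """
    # Map each word to the set of indices of the sentences containing it.
    occurrences = {}
    for index, words in enumerate(sentences):
        for word in set(words):
            occurrences.setdefault(word, set()).add(index)

    vocab = sorted(occurrences)
    n = len(vocab)
    # Diagonal elements are 1 because K(W, W) = |A| / |A|.
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = occurrences[vocab[i]], occurrences[vocab[j]]
        k = len(a & b) / len(a | b)  # equation (1)
        matrix[i][j] = matrix[j][i] = k  # the matrix is symmetric
    return vocab, matrix


# Hypothetical sample data: three sentences, words already extracted.
sentences = [
    ["staff", "response", "good"],
    ["staff", "response"],
    ["bath", "good"],
]
vocab, matrix = jaccard_matrix(sentences)
```

Here "staff" and "response" appear in exactly the same sentences, so their coefficient is 1, while "staff" and "bath" never share a sentence, so their coefficient is 0.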

なお、共起行列生成部１４は、テキストデータを文以外の単位で分けてＪａｃｃａｒｄ係数を求めてもよい。例えば、共起行列生成部１４は、単語Ｗａを含む段落の集合をＡ、単語Ｗｂを含む段落の集合をＢとして、式（１）に従いＪａｃｃａｒｄ係数を求めてもよい。また、テキストデータに含まれる文が日付を有する場合には、共起行列生成部１４は、テキストデータを同じ日付を有する文からなる複数の部分に分け、単語Ｗａを含む部分の集合をＡ、単語Ｗｂを含む部分の集合をＢとして、式（１）に従いＪａｃｃａｒｄ係数を求めてもよい。また、共起行列生成部１４は、単語の共起性を示す他の値（例えば、Ｓｉｍｐｓｏｎ係数やコサイン距離など）を要素として含む共起行列を生成してもよい。 The co-occurrence matrix generation unit 14 may divide the text data into units other than sentences to obtain the Jaccard index. For example, the co-occurrence matrix generation unit 14 may obtain the Jaccard index according to the equation (1), where A is a set of paragraphs containing the word Wa and B is a set of paragraphs containing the word Wb. When the sentence included in the text data has a date, the co-occurrence matrix generation unit 14 divides the text data into a plurality of parts consisting of sentences having the same date, and sets a set of parts including the word Wa as A. The Jaccard index may be obtained according to the equation (1), where B is a set of parts including the word Wb. Further, the co-occurrence matrix generation unit 14 may generate a co-occurrence matrix including other values indicating the co-occurrence of words (for example, Simpson coefficient, cosine distance, etc.) as elements.
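
For comparison, the alternative measures mentioned above can be sketched in Python on the same kind of occurrence sets. This is only an illustration under assumptions: the text does not define the measures, so the sketch uses the common set-based definition of the Simpson (overlap) coefficient and the cosine similarity of binary occurrence vectors; a cosine distance would be 1 minus that similarity. The function names are hypothetical.

```python
import math


def jaccard(a, b):
    # Equation (1): size of the intersection over size of the union.
    return len(a & b) / len(a | b)


def simpson(a, b):
    # Simpson (overlap) coefficient: intersection over the smaller set.
    return len(a & b) / min(len(a), len(b))


def cosine_similarity(a, b):
    # Cosine similarity of the binary occurrence vectors of two words.
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical occurrence sets: indices of the units (sentences,
# paragraphs, or date groups) that contain each word.
units_with_wa = {0, 1, 2, 3}
units_with_wb = {0, 1}

k_jaccard = jaccard(units_with_wa, units_with_wb)
k_simpson = simpson(units_with_wa, units_with_wb)
k_cosine = cosine_similarity(units_with_wa, units_with_wb)
```

Because the Simpson coefficient divides by the smaller set, it stays high when a rare word always appears together with a frequent one, whereas the Jaccard coefficient penalizes the difference in frequency.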

次に、共起ネットワーク生成部１５は、ステップＳ１０４で生成された共起行列に基づき、全体共起ネットワークを生成する（ステップＳ１０５）。次に、画面表示部１６は、ステップＳ１０５で生成された全体共起ネットワークを含む画面を表示する（ステップＳ１０６）。図５は、ステップＳ１０６で表示される、全体共起ネットワークを含むウインドウの例を示す図である。図５に示すウインドウ４１は、全体共起ネットワーク５１と分析ボタン６１を含んでいる。分析ボタン６１は、分析開始を指示するために設けられる。 Next, the co-occurrence network generation unit 15 generates an entire co-occurrence network based on the co-occurrence matrix generated in step S104 (step S105). Next, the screen display unit 16 displays a screen including the entire co-occurrence network generated in step S105 (step S106). FIG. 5 is a diagram showing an example of a window including a whole co-occurrence network displayed in step S106. The window 41 shown in FIG. 5 includes a whole co-occurrence network 51 and an analysis button 61. The analysis button 61 is provided to instruct the start of analysis.

共起ネットワーク生成部１５は、共起行列の基準値（以下、Ｖとする）を有している。基準値Ｖは、予め決定された値でもよく、指示入力部１１を用いて利用者から設定された値でもよい。ステップＳ１０４で生成された共起行列において、単語Ｗａに対応する行に含まれるＪａｃｃａｒｄ係数Ｋ（Ｗａ，＊）の最大値が基準値Ｖ以上である場合、共起ネットワーク生成部１５は単語Ｗａに対応するノード（単語Ｗａと記載したノード）を全体共起ネットワークに含める。また、ステップＳ１０４で生成された共起行列において、単語のペア（Ｗａ，Ｗｂ）に係るＪａｃｃａｒｄ係数Ｋ（Ｗａ，Ｗｂ）が基準値Ｖ以上である場合、共起ネットワーク生成部１５は単語Ｗａに対応するノードと単語Ｗｂに対応するノードとを接続するエッジを全体共起ネットワークに含める。 The co-occurrence network generation unit 15 holds a reference value of the co-occurrence matrix (hereinafter referred to as V). The reference value V may be a predetermined value or a value set by the user using the instruction input unit 11. In the co-occurrence matrix generated in step S104, when the maximum value of the Jaccard coefficients K(Wa, *) in the row corresponding to the word Wa is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes the node corresponding to the word Wa (the node labeled with the word Wa) in the whole co-occurrence network. Likewise, when the Jaccard coefficient K(Wa, Wb) for the word pair (Wa, Wb) in the co-occurrence matrix generated in step S104 is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes, in the whole co-occurrence network, the edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb.
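
The thresholding rule can be sketched as follows in Python. One assumption is made explicit: the row maximum is taken over off-diagonal elements only, since every diagonal element is 1 and would otherwise admit every word whenever V is at most 1. Under that reading a node is included exactly when it has at least one edge. The function name `build_network` and the sample matrix are hypothetical.

```python
def build_network(vocab, matrix, v):
    """Select nodes and edges of a co-occurrence network.

    An edge (Wa, Wb) is kept when K(Wa, Wb) >= v. A node is kept when
    the largest off-diagonal coefficient in its row is >= v (assumption:
    the diagonal is ignored), which is the same as having an edge.
    """
    nodes = set()
    edges = []
    n = len(vocab)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle; matrix is symmetric
            if matrix[i][j] >= v:
                edges.append((vocab[i], vocab[j], matrix[i][j]))
                nodes.update((vocab[i], vocab[j]))
    return sorted(nodes), edges


# Hypothetical 4-word co-occurrence matrix (symmetric, diagonal of 1).
vocab = ["bath", "good", "response", "staff"]
matrix = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 1 / 3, 1 / 3],
    [0.0, 1 / 3, 1.0, 1.0],
    [0.0, 1 / 3, 1.0, 1.0],
]

nodes_low, edges_low = build_network(vocab, matrix, v=0.4)
nodes_high, edges_high = build_network(vocab, matrix, v=0.6)
```

Raising V from 0.4 to 0.6 drops the "bath"/"good" edge and with it both of those nodes, which shows how the reference value controls the density of the displayed network.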

図５に示す全体共起ネットワーク５１では、出現頻度が大きい単語に対応するノードは大きく表示されている。共起ネットワークを含む画面を表示するときに、Ｊａｃｃａｒｄ係数Ｋ（Ｗａ，Ｗｂ）が大きいときに、単語Ｗａに対応するノードと単語Ｗｂに対応するノードとを接続するエッジを太く表示してもよい。また、Ｊａｃｃａｒｄ係数に応じて、エッジの色を切り替えてもよく、エッジの太さと色の両方を切り替えてもよい。共起ネットワークは、エッジを介して到達可能な複数の部分に分けられる。共起ネットワークを含む画面を表示するときに、各部分に含まれる複数のノードを各部分に割り当てた色で表示してもよい。なお、共起ネットワークに含まれるノードとエッジの位置に意味はない。 In the whole co-occurrence network 51 shown in FIG. 5, nodes corresponding to words with a high appearance frequency are displayed larger. When a screen including a co-occurrence network is displayed, the edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb may be drawn thicker when the Jaccard coefficient K(Wa, Wb) is large. The color of an edge may also be switched according to the Jaccard coefficient, or both the thickness and the color of the edge may be switched. A co-occurrence network is divided into a plurality of parts, each consisting of nodes that are reachable from one another via edges. When a screen including a co-occurrence network is displayed, the nodes included in each part may be displayed in a color assigned to that part. Note that the positions of the nodes and edges in a co-occurrence network carry no meaning.
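
The per-part coloring can be sketched in Python as a connected-component search. This is an illustration only: the patent does not say how the parts are found, so breadth-first search is an assumed choice, and the function name, palette, and sample graph are hypothetical.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return {node: component_id} for an undirected graph."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    component_of = {}
    next_id = 0
    for start in nodes:
        if start in component_of:
            continue  # already reached from an earlier start node
        component_of[start] = next_id
        queue = deque([start])
        while queue:  # breadth-first search over one component
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in component_of:
                    component_of[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return component_of


# Hypothetical palette; any list of color names would do.
PALETTE = ["blue", "orange", "green"]

nodes = ["bath", "good", "response", "staff"]
edges = [("bath", "good"), ("response", "staff")]
component_of = connected_components(nodes, edges)
node_colors = {
    node: PALETTE[cid % len(PALETTE)] for node, cid in component_of.items()
}
```

Nodes in the same part share a color, so groups of mutually related words stand out even though node positions themselves carry no meaning.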

次に、指示入力部１１は、利用者から注目語を指定する指示を受け取る（ステップＳ１１１）。ステップＳ１１１を実行するときには、全体共起ネットワークを含む画面が表示されている。利用者は、マウス２９を操作して、全体共起ネットワークの要素を選択することにより、注目語を指定する指示を入力する。なお、利用者は、指示を入力するときに、マウス２９に代えてキーボード２８を用いてもよく、表示画面に直接触れるなどの操作を行ってもよい。以下、ステップＳ１１１を実行するときに、図５に示すウインドウ４１を含む画面が表示されているとする。 Next, the instruction input unit 11 receives an instruction to specify a word of interest from the user (step S111). When step S111 is executed, a screen including the whole co-occurrence network is displayed. The user inputs an instruction to specify a word of interest by operating the mouse 29 and selecting an element of the whole co-occurrence network. When inputting an instruction, the user may use the keyboard 28 instead of the mouse 29, or may perform an operation such as directly touching the display screen. Hereinafter, it is assumed that the screen including the window 41 shown in FIG. 5 is displayed when the step S111 is executed.

図６〜図１１は、それぞれ、ウインドウ４１内で注目語を指定する第１〜第６の操作を示す図である。図６〜図１１において、吹き出しは操作の手順を示し、白い矢印はマウスカーソル６２の移動を示す。吹き出しおよび矢印は、実際の画面には表示されない。以下、マウスカーソル６２が表示画面内のある要素の上にあるときにマウス２９のボタンをクリック（ダブルクリック）することを「要素をクリック（ダブルクリック）する」という。 6 to 11 are diagrams showing the first to sixth operations for designating a word of interest in the window 41, respectively. In FIGS. 6 to 11, the balloon indicates the operation procedure, and the white arrow indicates the movement of the mouse cursor 62. Callouts and arrows do not appear on the actual screen. Hereinafter, clicking (double-clicking) the button of the mouse 29 when the mouse cursor 62 is on a certain element in the display screen is referred to as "clicking (double-clicking) the element".

図６に示すように、利用者は、ウインドウ４１内でまず注目語として指定する単語（ここでは「露天風呂」）に対応するノードをクリックし（１回目のクリック）、次に分析ボタン６１をクリックする（２回目のクリック）。この操作により、１回目にクリックされたノードに対応する単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のノードを選択し、分析開始を選択することにより、１個の注目語を指定する指示が入力される。 As shown in FIG. 6, the user first clicks the node corresponding to the word (here, “open-air bath”) specified as the word of interest in the window 41 (first click), and then clicks the analysis button 61. Click (second click). By this operation, the word corresponding to the node clicked for the first time is designated as the word of interest. In this way, by selecting one node included in the whole co-occurrence network in the screen including the whole co-occurrence network and selecting the start of analysis, an instruction to specify one attention word is input.

図７に示すように、利用者は、ウインドウ４１内で注目語として指定する単語（ここでは「露天風呂」）に対応するノードをダブルクリックする。この操作により、ダブルクリックされたノードに対応する単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のノードを続けて選択することにより、１個の注目語を指定する指示が入力される。 As shown in FIG. 7, the user double-clicks the node corresponding to the word designated as the attention word (here, “open-air bath”) in the window 41. By this operation, the word corresponding to the double-clicked node is designated as the attention word. In this way, by continuously selecting one node included in the whole co-occurrence network in the screen including the whole co-occurrence network, an instruction to specify one attention word is input.

図８に示すように、利用者は、ウインドウ４１内でまず注目語として指定する単語（ここでは「露天風呂」）に対応するノードをクリックし（１回目のクリック）、次に注目語として指定する別の単語（ここでは「値段」）に対応するノードをクリックし（２回目のクリック）、最後に分析ボタン６１をクリックする（最後のクリック）。この操作により、１回目と２回目にクリックされたノードに対応する２個の単語が注目語として指定される。利用者は、ウインドウ４１内でｐ個（ｐは３以上の整数）のノードを順にクリックし、最後に分析ボタン６１をクリックしてもよい。この操作により、ｐ個のノードに対応するｐ個の単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる複数のノードを選択し、分析開始を選択することにより、複数の注目語を指定する指示が入力される。 As shown in FIG. 8, the user first clicks the node corresponding to the word designated as the attention word (here, “open-air bath”) in the window 41 (first click), and then designates it as the attention word. Click the node corresponding to another word (here, "price") (second click), and finally click the analysis button 61 (last click). By this operation, the two words corresponding to the nodes clicked the first time and the second time are designated as the words of interest. The user may click p nodes (p is an integer of 3 or more) in order in the window 41, and finally click the analysis button 61. By this operation, p words corresponding to p nodes are designated as attention words. In this way, by selecting a plurality of nodes included in the global co-occurrence network in the screen including the global co-occurrence network and selecting start analysis, instructions for designating a plurality of attention words are input.

図９に示すように、利用者は、ウインドウ４１内で注目語として指定する２個の単語（ここでは「露天風呂」と「階段」）に対応する２個のノードを接続するエッジをダブルクリックする。これにより、ダブルクリックされたエッジに接続された２個のノードに対応する２個の単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のエッジを続けて選択することにより、２個の注目語を指定する指示が入力される。 As shown in FIG. 9, the user double-clicks, in the window 41, the edge connecting the two nodes that correspond to the two words to be designated as words of interest (here, "open-air bath" and "stairs"). As a result, the two words corresponding to the two nodes connected to the double-clicked edge are designated as words of interest. In this way, by selecting one edge included in the whole co-occurrence network twice in succession in the screen including the whole co-occurrence network, an instruction designating two words of interest is input.

図１０に示すように、利用者は、ウインドウ４１内でまず注目語として指定する２個の単語（ここでは「露天風呂」と「階段」）に対応する２個のノードを接続するエッジをクリックし（１回目のクリック）、次に分析ボタン６１をクリックする（２回目のクリック）。これにより、１回目にクリックされたエッジに接続された２個のノードに対応する２個の単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のエッジを選択し、分析開始を選択することにより、２個の注目語を指定する指示が入力される。 As shown in FIG. 10, the user first clicks on the edge connecting the two nodes corresponding to the two words (here, "open-air bath" and "stairs") designated as the attention words in the window 41. Then click the analysis button 61 (second click). As a result, the two words corresponding to the two nodes connected to the edge clicked the first time are designated as the words of interest. In this way, by selecting one edge included in the whole co-occurrence network in the screen including the whole co-occurrence network and selecting the start of analysis, an instruction to specify two attention words is input.

図１１に示すように、利用者は、ウインドウ４１内でまず注目語として指定する２個の単語（ここでは「露天風呂」と「階段」）に対応する２個のノードを接続するエッジをクリックし（１回目のクリック）、次に注目語として指定する別の２個の単語（ここでは「値段」と「考える」）に対応する２個のノードを接続するエッジをクリックし（２回目のクリック）、最後に分析ボタン６１をクリックする（最後のクリック）。この操作により、１回目と２回目にクリックされた２個のエッジに接続された４個のノードに対応する４個の単語が注目語として指定される。利用者は、ウインドウ４１内でｑ本（ｑは３以上の整数）のエッジを順にクリックし、最後に分析ボタン６１をクリックしてもよい。この操作により、ｑ本のエッジに接続された２ｑ個のノードに対応する２ｑ個の単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる複数のエッジを選択し、分析開始を選択することにより、複数の注目語を指定する指示が入力される。 As shown in FIG. 11, the user first clicks on the edge connecting the two nodes corresponding to the two words (here, "open-air bath" and "stairs") designated as the attention words in the window 41. (1st click), then click the edge connecting the 2 nodes corresponding to the other 2 words (here "price" and "think") that you specify as the word of interest (2nd click) Click), and finally click the analysis button 61 (last click). By this operation, four words corresponding to the four nodes connected to the two edges clicked the first time and the second time are designated as the words of interest. The user may click q edges (q is an integer of 3 or more) in order in the window 41, and finally click the analysis button 61. By this operation, 2q words corresponding to 2q nodes connected to q edges are designated as attention words. In this way, by selecting a plurality of edges included in the whole co-occurrence network in the screen including the whole co-occurrence network and selecting start analysis, instructions for designating a plurality of attention words are input.
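
The six operations above all reduce a selection of nodes and edges to a list of words of interest. A minimal Python sketch of that reduction follows. The function name is hypothetical, and one behavior is an assumption of the sketch rather than something the text states: a word reached more than once (for example, through two selected edges that share a node) is listed only once, whereas the text simply counts 2q words for q edges.

```python
def words_of_interest(selected_nodes=(), selected_edges=()):
    """Turn a selection into an ordered list of words of interest.

    selected_nodes: words whose nodes were clicked.
    selected_edges: (word_a, word_b) pairs for the clicked edges.
    Duplicates are dropped while preserving click order (assumption).
    """
    words = []
    for word in selected_nodes:
        if word not in words:
            words.append(word)
    for word_a, word_b in selected_edges:
        for word in (word_a, word_b):
            if word not in words:
                words.append(word)
    return words


# First operation (FIG. 6): one node, then the analysis button.
single = words_of_interest(selected_nodes=["open-air bath"])

# Sixth operation (FIG. 11): two edges, then the analysis button.
from_edges = words_of_interest(
    selected_edges=[("open-air bath", "stairs"), ("price", "think")]
)
```

Treating node clicks and edge clicks uniformly keeps the later steps (S112 onward) independent of which operation the user chose.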

指示入力部１１は、ステップＳ１１１において、注目語を指定する指示に加えて、共起行列の基準値を設定する指示、ＡＮＤ処理とＯＲ処理を切り替える指示、共起ネットワークの表示態様の詳細を設定する指示などを受け取ってもよい。受け取った指示は、テキストマイニング装置１０の各部に対して出力される。 In step S111, in addition to the instruction designating the word of interest, the instruction input unit 11 may receive an instruction to set the reference value of the co-occurrence matrix, an instruction to switch between AND processing and OR processing, an instruction to set details of the display mode of the co-occurrence network, and the like. The received instructions are output to each part of the text mining device 10.

次に、単語抽出部１３は、ステップＳ１０２で読み出したテキストデータからステップＳ１１１で指定された注目語を含む文を抽出することにより、注目語を含む文からなる限定テキストデータを求める（ステップＳ１１２）。 Next, the word extraction unit 13 extracts, from the text data read in step S102, the sentences that include the word of interest designated in step S111, thereby obtaining limited text data consisting of the sentences that include the word of interest (step S112).

単語抽出部１３は、複数の注目語が指定された場合にＡＮＤ処理とＯＲ処理のうちいずれを行うかを示すフラグを有している。フラグの値は、予め決定された値でもよく、指示入力部１１を用いて利用者から設定された値でもよい。フラグがＡＮＤ処理を示す場合、単語抽出部１３は、読み出したテキストデータから指定された複数の注目語のすべてを含む文を抽出することにより、限定テキストデータを求める。フラグがＯＲ処理を示す場合、単語抽出部１３は、読み出したテキストデータから指定されたいずれかの注目語を含む文を抽出することにより、限定テキストデータを求める。 The word extraction unit 13 has a flag indicating which of the AND process and the OR process is to be performed when a plurality of noteworthy words are specified. The value of the flag may be a predetermined value or a value set by the user using the instruction input unit 11. When the flag indicates AND processing, the word extraction unit 13 obtains limited text data by extracting a sentence including all of a plurality of designated words of interest from the read text data. When the flag indicates OR processing, the word extraction unit 13 obtains limited text data by extracting a sentence including any of the designated words of interest from the read text data.
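
The difference between AND processing and OR processing in step S112 can be shown with a short Python sketch. It assumes each sentence is already represented as a list of its words; the function name `limit_sentences`, the `mode` parameter, and the sample data are hypothetical.

```python
def limit_sentences(sentences, words_of_interest, mode="AND"):
    """Keep only the sentences that match the words of interest.

    mode="AND": the sentence must contain every word of interest.
    mode="OR":  the sentence must contain at least one of them.
    """
    targets = set(words_of_interest)
    if mode == "AND":
        return [s for s in sentences if targets <= set(s)]
    if mode == "OR":
        return [s for s in sentences if targets & set(s)]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical sample: three sentences, words already extracted.
sentences = [
    ["open-air bath", "stairs", "steep"],
    ["open-air bath", "view"],
    ["price", "high"],
]
selected = ["open-air bath", "stairs"]

limited_and = limit_sentences(sentences, selected, mode="AND")
limited_or = limit_sentences(sentences, selected, mode="OR")
```

With a single word of interest the two modes give the same result; they differ only when several words are designated, where AND yields the smaller and more specific limited text data.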

次に、単語抽出部１３は、ステップＳ１１２で求めた限定テキストデータに対して形態素解析を行うことにより、限定テキストデータから単語を抽出する（ステップＳ１１３）。次に、共起行列生成部１４は、ステップＳ１１３で抽出された単語について、ステップＳ１１２で求められた限定テキストデータを用いて共起行列を生成する（ステップＳ１１４）。次に、共起ネットワーク生成部１５は、ステップＳ１１４で生成された共起行列に基づき、限定共起ネットワークを生成する（ステップＳ１１５）。なお、ステップＳ１０３〜Ｓ１０５とステップＳ１１３〜Ｓ１１５の間では、処理対象は異なるが、処理内容は同じである。 Next, the word extraction unit 13 extracts words from the limited text data by performing morphological analysis on the limited text data obtained in step S112 (step S113). Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S113 using the limited text data obtained in step S112 (step S114). Next, the co-occurrence network generation unit 15 generates a limited co-occurrence network based on the co-occurrence matrix generated in step S114 (step S115). The processing targets are different between steps S103 to S105 and steps S113 to S115, but the processing contents are the same.

一般に、ステップＳ１１２で求められた限定テキストデータから抽出される単語の種類は、ステップＳ１０２で読み出されたテキストデータから抽出される単語の種類よりも少ない。ステップＳ１１４で生成された共起行列は、ステップＳ１０４で生成された共起行列とは異なる。ステップＳ１１５で生成された限定共起ネットワークは、ステップＳ１０５で生成された全体共起ネットワークとは異なる。 Generally, the types of words extracted from the limited text data obtained in step S112 are smaller than the types of words extracted from the text data read in step S102. The co-occurrence matrix generated in step S114 is different from the co-occurrence matrix generated in step S104. The limited co-occurrence network generated in step S115 is different from the whole co-occurrence network generated in step S105.

次に、画面表示部１６は、ステップＳ１１５で生成された限定共起ネットワークを含む画面を表示する（ステップＳ１１６）。図１２および図１３は、ステップＳ１１６で表示される、限定共起ネットワークを含むウインドウの例を示す図である。図１２に示すウインドウ４２は、１個の注目語（ここでは「露天風呂」）を指定したときの限定共起ネットワーク５２を含んでいる。図１３に示すウインドウ４３は、２個の注目語（ここでは「露天風呂」と「浴場」）を指定したときの限定共起ネットワーク５３を含んでいる。 Next, the screen display unit 16 displays a screen including the limited co-occurrence network generated in step S115 (step S116). 12 and 13 are diagrams showing an example of a window containing a limited co-occurrence network displayed in step S116. The window 42 shown in FIG. 12 includes a limited co-occurrence network 52 when one noteworthy word (here, “open-air bath”) is specified. The window 43 shown in FIG. 13 includes a limited co-occurrence network 53 when two notable words (here, “open-air bath” and “bathhouse”) are specified.

図１４および図１５は、テキストマイニング装置１０の表示画面の例を示す図である。画面表示部１６は、全体共起ネットワークを含むウインドウと限定共起ネットワークを含むウインドウとを重ねずに並べて表示してもよく、両者を重ねて表示してもよい。図１４に示す画面７１では、全体共起ネットワーク５１を含むウインドウ４１と限定共起ネットワーク５２を含むウインドウ４２とは、重ねずに並べて表示されている。利用者は、画面７１において、全体共起ネットワーク５１と限定共起ネットワーク５２を同時に見ることができる。図１５に示す画面７２では、限定共起ネットワーク５２を含むウインドウ４２は、全体共起ネットワーク５１を含むウインドウ４１に重ねて表示されている。利用者は、画面７２において、全体共起ネットワーク５１と限定共起ネットワーク５２を切り替えて見ることができる。 14 and 15 are diagrams showing an example of a display screen of the text mining device 10. The screen display unit 16 may display the window including the whole co-occurrence network and the window including the limited co-occurrence network side by side without overlapping, or may display both of them in an overlapping manner. In the screen 71 shown in FIG. 14, the window 41 including the whole co-occurrence network 51 and the window 42 including the limited co-occurrence network 52 are displayed side by side without overlapping. The user can see the whole co-occurrence network 51 and the limited co-occurrence network 52 at the same time on the screen 71. In the screen 72 shown in FIG. 15, the window 42 including the limited co-occurrence network 52 is displayed superimposed on the window 41 including the whole co-occurrence network 51. The user can switch between the whole co-occurrence network 51 and the limited co-occurrence network 52 on the screen 72.

次に、指示入力部１１は、利用者から指示を受け取る（ステップＳ１２１）。次に、テキストマイニング装置１０は、ステップＳ１２１で受け取った指示が注目語を指定する指示か否かを判断する（ステップＳ１２２）。ステップＳ１２２でＹｅｓの場合、テキストマイニング装置１０の制御はステップＳ１１２へ進む。この場合、ステップＳ１２１で指定された注目語についてステップＳ１１２〜Ｓ１１６が実行され、ステップＳ１２１で指定された注目語を含む文からなる限定テキストデータに基づく限定共起ネットワークを含む画面が表示される。 Next, the instruction input unit 11 receives an instruction from the user (step S121). Next, the text mining device 10 determines whether or not the instruction received in step S121 is an instruction for designating a word of interest (step S122). If Yes in step S122, the control of the text mining device 10 proceeds to step S112. In this case, steps S112 to S116 are executed for the attention word specified in step S121, and a screen including a limited co-occurrence network based on limited text data consisting of sentences including the attention word specified in step S121 is displayed.

図１６は、テキストマイニング装置１０の表示画面の例を示す図である。図１６に示す画面７３では、全体共起ネットワーク５１を含むウインドウ４１と限定共起ネットワーク５２を含むウインドウ４２とに重ねて、注目語として「浴場」を指定したときの限定共起ネットワーク５４を含むウインドウ４４が表示されている。画面７３は、ステップＳ１１１で「露天風呂」を注目語として指定し、ステップＳ１２１で「浴場」を注目語として指定したときに表示される。利用者は、画面７３において、全体共起ネットワーク５１と限定共起ネットワーク５２、５４を切り替えて見ることができる。 FIG. 16 is a diagram showing an example of a display screen of the text mining device 10. In the screen 73 shown in FIG. 16, the window 41 including the whole co-occurrence network 51 and the window 42 including the limited co-occurrence network 52 are overlapped with each other, and the limited co-occurrence network 54 when "bath" is designated as a noteworthy word is included. The window 44 is displayed. The screen 73 is displayed when "open-air bath" is designated as a noteworthy word in step S111 and "bathhouse" is designated as a noteworthy word in step S121. The user can switch between the whole co-occurrence network 51 and the limited co-occurrence networks 52 and 54 on the screen 73.

ステップＳ１２２でＮｏの場合、テキストマイニング装置１０の制御はステップＳ１２３へ進む。この場合、ステップＳ１２１で受け取った指示は、例えば、ウインドウを移動させる指示、ウインドウを非表示にする指示、ウインドウを閉じる指示、ウインドウを併合する指示などである。利用者は、全体共起ネットワークと限定共起ネットワークを含む画面が表示されているときに指示入力部１１を操作することにより、これらの指示を入力する。画面表示部１６は、ステップＳ１２１で受け取った指示に従い、更新後の画面を表示する（ステップＳ１２３）。その後、テキストマイニング装置１０の制御は、ステップＳ１２１へ進む。 If No in step S122, the control of the text mining device 10 proceeds to step S123. In this case, the instructions received in step S121 are, for example, an instruction to move the window, an instruction to hide the window, an instruction to close the window, an instruction to merge the windows, and the like. The user inputs these instructions by operating the instruction input unit 11 when the screen including the whole co-occurrence network and the limited co-occurrence network is displayed. The screen display unit 16 displays the updated screen according to the instruction received in step S121 (step S123). After that, the control of the text mining device 10 proceeds to step S121.

図１７は、ウインドウを併合する操作を示す図である。図１７に示す画面７４には、「露天風呂」を注目語として指定したときの限定共起ネットワーク５２を含むウインドウ４２と、「浴場」を注目語として指定したときの限定共起ネットワーク５４を含むウインドウ４４とが表示されている。利用者は、画面７４において、２個の限定共起ネットワーク５２、５４を同時に見ることができる。 FIG. 17 is a diagram showing an operation of merging windows. The screen 74 shown in FIG. 17 includes a window 42 including a limited co-occurrence network 52 when "open-air bath" is designated as a noteworthy word, and a limited co-occurrence network 54 when "bathhouse" is designated as a noteworthy word. Window 44 is displayed. The user can simultaneously view the two limited co-occurrence networks 52 and 54 on the screen 74.

図１７に示すハッチング付き矢印は、マウス２９のボタンが押された状態でマウスカーソル６２が移動したことを示す。この矢印は、実際の画面には表示されない。利用者は、画面７４内で限定共起ネットワーク５２を掴んで限定共起ネットワーク５４内で離す操作（ドロップ操作）を行う。より詳細には、利用者は、マウスカーソル６２がウインドウ４２内にあるときにマウス２９のボタンを押し、マウス２９のボタンを押したままでマウスカーソル６２をウインドウ４４内まで移動させて、マウスカーソル６２がウインドウ４４内にあるときにマウス２９のボタンを離す。この操作により、ウインドウを併合する指示が入力される。 The hatched arrow shown in FIG. 17 indicates that the mouse cursor 62 has moved while the button of the mouse 29 is pressed. This arrow does not appear on the actual screen. The user performs an operation (drop operation) of grasping the limited co-occurrence network 52 in the screen 74 and releasing it in the limited co-occurrence network 54. More specifically, the user presses the button of the mouse 29 when the mouse cursor 62 is in the window 42, moves the mouse cursor 62 into the window 44 while holding down the button of the mouse 29, and causes the mouse cursor 62. Release the mouse 29 button when is in the window 44. By this operation, an instruction to merge windows is input.

図１８は、図１７に示す操作を行った後の表示画面を示す図である。図１８に示す画面７５には、複数の限定共起ネットワークをタブ形式で表示するウインドウ４５が表示されている。図１８では、「露天風呂」と記載したタブ６４が選択され、ウインドウ４５には「露天風呂」を注目語として指定したときの限定共起ネットワーク５２が表示されている。「浴場」と記載したタブ６３が選択されたときには、ウインドウ４５には図１７に示す限定共起ネットワーク５４が表示される。 FIG. 18 is a diagram showing the display screen after the operation shown in FIG. 17 has been performed. The screen 75 shown in FIG. 18 displays a window 45 that shows a plurality of limited co-occurrence networks in a tab format. In FIG. 18, the tab 64 labeled "open-air bath" is selected, and the window 45 displays the limited co-occurrence network 52 obtained when "open-air bath" is designated as the word of interest. When the tab 63 labeled "bathhouse" is selected, the window 45 displays the limited co-occurrence network 54 shown in FIG. 17.

利用者がウインドウ４５内の閉じるボタン（×印）をクリックしたときに、ウインドウ４５は閉じる。利用者がタブ６３内の閉じるボタンをクリックしたときには、タブ６３は表示されなくなる。利用者がタブ６４内の閉じるボタンをクリックしたときには、タブ６４は表示されなくなり、ウインドウ４５には限定共起ネットワーク５４が表示される。 When the user clicks the close button (x mark) in the window 45, the window 45 closes. When the user clicks the close button in the tab 63, the tab 63 disappears. When the user clicks the close button in the tab 64, the tab 64 is not displayed and the limited co-occurrence network 54 is displayed in the window 45.
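
The merge and close behavior of the tabbed window can be modeled with a small Python sketch. It captures only the state changes described above and draws nothing on screen; the class and function names are hypothetical, and the tab order and the rule for choosing the next selected tab are assumptions.

```python
class TabbedWindow:
    """Minimal state model of a window showing networks as tabs."""

    def __init__(self, tabs, selected):
        self.tabs = list(tabs)    # tab labels (words of interest)
        self.selected = selected  # label of the network currently shown

    def close_tab(self, label):
        self.tabs.remove(label)
        if self.selected == label:
            # Assumption: fall back to the first remaining tab, if any.
            self.selected = self.tabs[0] if self.tabs else None


def merge(dragged, target):
    """Merge two single-network windows after a drag-and-drop.

    The dragged network becomes the selected tab, matching FIG. 18,
    where the "open-air bath" tab is selected after the drop.
    """
    return TabbedWindow([target, dragged], selected=dragged)


window = merge(dragged="open-air bath", target="bathhouse")
selected_after_merge = window.selected

window.close_tab("open-air bath")
selected_after_close = window.selected
```

Closing the selected "open-air bath" tab leaves the "bathhouse" network on display, as in the description of tab 64.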

以上に示すように、本実施形態に係るテキストマイニング方法は、テキストデータから単語を抽出するステップ（ステップＳ１０２、Ｓ１０３、Ｓ１１２、Ｓ１１３）と、抽出した単語について共起行列を生成するステップ（ステップＳ１０４、Ｓ１１４）と、生成した共起行列に基づき共起ネットワークを生成するステップ（ステップＳ１０５、Ｓ１１５）と、共起ネットワークを含む画面を表示するステップ（ステップＳ１０６、Ｓ１１６）とを備えている。指定されたテキストデータの全体に基づく第１共起ネットワーク（全体共起ネットワーク５１）を含む第１画面（ウインドウ４１を含む画面）内で注目語を指定する指示が入力されたときに、単語を抽出するステップ（ステップＳ１１２、Ｓ１１３）は指定されたテキストデータのうち注目語を含む部分（注目語を含む文）からなる限定テキストデータから単語を抽出し、共起行列を生成するステップ（ステップＳ１１４）は抽出した単語について限定テキストデータを用いて第２共起行列を生成し、共起ネットワークを生成するステップ（ステップＳ１１５）は第２共起行列に基づき第２共起ネットワーク（限定共起ネットワーク５２〜５４）を生成し、画面を表示するステップ（ステップＳ１１６）は第２共起ネットワークを含む第２画面（ウインドウ４２〜４５を含む画面）を表示する。このように本実施形態に係るテキストマイニング方法では、指定されたテキストデータの全体に基づく第１共起ネットワークを含む第１画面内で注目語を指定する指示が入力されたときに、指定されたテキストデータのうち注目語を含む部分に基づく第２共起ネットワークを含む第２画面が表示される。したがって、注目語を指定したときの共起ネットワークを含む画面を簡単な操作で表示することができる。 As described above, the text mining method according to the present embodiment includes a step of extracting words from text data (steps S102, S103, S112, S113), a step of generating a co-occurrence matrix for the extracted words (steps S104, S114), a step of generating a co-occurrence network based on the generated co-occurrence matrix (steps S105, S115), and a step of displaying a screen including the co-occurrence network (steps S106, S116). When an instruction designating a word of interest is input in the first screen (the screen including the window 41) that includes the first co-occurrence network (the whole co-occurrence network 51) based on the whole of the designated text data, the step of extracting words (steps S112, S113) extracts words from limited text data consisting of the portions of the designated text data that include the word of interest (the sentences including the word of interest), the step of generating a co-occurrence matrix (step S114) generates a second co-occurrence matrix for the extracted words using the limited text data, the step of generating a co-occurrence network (step S115) generates a second co-occurrence network (limited co-occurrence networks 52 to 54) based on the second co-occurrence matrix, and the step of displaying a screen (step S116) displays a second screen (a screen including the windows 42 to 45) that includes the second co-occurrence network. Thus, in the text mining method according to the present embodiment, when an instruction designating a word of interest is input in the first screen including the first co-occurrence network based on the whole of the designated text data, the second screen including the second co-occurrence network based on the portions of the designated text data that include the word of interest is displayed. Therefore, a screen including the co-occurrence network for a designated word of interest can be displayed with a simple operation.

また、第１画面内で第１共起ネットワークに含まれる１個または複数のノードを選択し、分析開始を選択することにより、ノードに対応する単語を注目語として指定する指示が入力される（図６、図８）。このように第１画面内で１個または複数のノードと分析開始を選択することにより、１個または複数の注目語を指定する指示を簡単な操作で入力し、１個または複数の注目語を指定したときの共起ネットワークを含む画面を表示することができる。また、第１画面内で第１共起ネットワークに含まれる１個のノードを続けて選択することにより、ノードに対応する単語を注目語として指定する指示が入力される（図７）。このように第１画面内で１個のノードを続けて選択することにより、１個の注目語を指定する指示を簡単な操作で入力し、１個の注目語を指定したときの共起ネットワークを含む画面を表示することができる。 Further, by selecting one or more nodes included in the first co-occurrence network in the first screen and then selecting analysis start, an instruction designating the words corresponding to the nodes as words of interest is input (FIGS. 6 and 8). By selecting one or more nodes and analysis start in the first screen in this way, an instruction designating one or more words of interest can be input with a simple operation, and a screen including the co-occurrence network for the designated word or words of interest can be displayed. Further, by selecting one node included in the first co-occurrence network twice in succession in the first screen, an instruction designating the word corresponding to the node as the word of interest is input (FIG. 7). By selecting one node twice in succession in the first screen in this way, an instruction designating one word of interest can be input with a simple operation, and a screen including the co-occurrence network for that word of interest can be displayed.

In addition, an instruction to specify the words corresponding to the two nodes connected to an edge as words of interest is input by selecting one edge included in the first co-occurrence network in succession in the first screen (FIG. 9). By selecting one edge in succession in the first screen in this way, an instruction to specify two words of interest can be input by a simple operation, and a screen including the co-occurrence network obtained when the two words of interest are specified can be displayed. Further, an instruction to specify the words corresponding to the plurality of nodes connected to the edges as words of interest is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting start analysis (FIGS. 10 and 11). By selecting one or more edges and start analysis in the first screen in this way, an instruction to specify a plurality of words of interest can be input by a simple operation, and a screen including the co-occurrence network obtained when the plurality of words of interest are specified can be displayed.
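The mapping from selected nodes and edges to words of interest described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Edge` type and the `words_of_interest` function are hypothetical names, and it is assumed that each node is identified by its word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Undirected edge between two word nodes (hypothetical type)."""
    a: str
    b: str


def words_of_interest(selected_nodes, selected_edges):
    """Collect the words of interest implied by a selection.

    A selected node contributes its own word; a selected edge contributes
    the words of the two nodes it connects. Duplicates are dropped and
    first-seen order is kept.
    """
    words = []
    for word in selected_nodes:
        if word not in words:
            words.append(word)
    for edge in selected_edges:
        for word in (edge.a, edge.b):
            if word not in words:
                words.append(word)
    return words


# Selecting one edge yields two words of interest (compare FIG. 9).
one_edge = words_of_interest([], [Edge("staff", "service")])

# Selecting two edges that share a node yields three words (assumed example).
two_edges = words_of_interest(
    [], [Edge("staff", "service"), Edge("service", "room")]
)
```

Under this sketch, selecting edges that share a node does not duplicate the shared word, so the number of words of interest can be smaller than twice the number of selected edges.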

Further, when a merge instruction is input in the second screen (screen 74) including a plurality of second co-occurrence networks (limited co-occurrence networks 52 and 54) (FIG. 17), the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format (FIG. 18). As a result, the plurality of second co-occurrence networks can be displayed compactly. The merge instruction is input by grabbing one second co-occurrence network (limited co-occurrence network 52) in the second screen and releasing it within another second co-occurrence network (limited co-occurrence network 54). Therefore, the merge instruction can be input by a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.
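One way to model the tab-format merge is to let each window hold a list of tabs and to move the tabs of the dragged window into the window it is dropped on. The sketch below is an assumption made for illustration: the `Window` class, the `merge` function, and the choice to append the dragged network after the existing tabs are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare windows by identity, not by content
class Window:
    """A window showing one or more co-occurrence networks as tabs."""
    tabs: list = field(default_factory=list)


def merge(screen, source, target):
    """Handle a merge instruction: `source` was dropped inside `target`.

    The networks of `source` become additional tabs of `target`, and the
    now-empty `source` window is removed from the screen.
    """
    if source is target:
        return  # dropping a window onto itself does nothing
    target.tabs.extend(source.tabs)
    source.tabs.clear()
    screen.remove(source)


# Two windows, each holding one limited co-occurrence network.
window_a = Window(["limited co-occurrence network 52"])
window_b = Window(["limited co-occurrence network 54"])
screen = [window_a, window_b]

# Grab network 52 and release it within network 54.
merge(screen, window_a, window_b)
```

After the merge, the screen holds a single window whose two tabs correspond to the two networks, which is what makes the display more compact.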

The limited text data may be composed of the sentences in the designated text data that include the word of interest. In this case, when an instruction to specify a word of interest is input, the designated text data is divided into sentences to obtain the limited text data, and a screen including the second co-occurrence network based on the obtained limited text data can be displayed. When a plurality of words of interest are specified, the limited text data may be composed of the sentences in the designated text data that include all of the plurality of words of interest. In this case, a screen including the second co-occurrence network obtained by AND processing of the plurality of words of interest can be displayed. Alternatively, when a plurality of words of interest are specified, the limited text data may be composed of the sentences in the designated text data that include any of the plurality of words of interest. In this case, a screen including the second co-occurrence network obtained by OR processing of the plurality of words of interest can be displayed. Further, the step of generating a co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients. Therefore, the co-occurrence of words contained in the text data can be suitably analyzed.
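The AND/OR restriction and the Jaccard-based matrix can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text above: sentences are already tokenized into word lists (Japanese text would first need morphological analysis), the Jaccard coefficient of two words is taken as the number of sentences containing both divided by the number of sentences containing either, and the names `limit_sentences` and `jaccard_matrix` are hypothetical.

```python
from itertools import combinations


def limit_sentences(sentences, words_of_interest, mode="and"):
    """Return the limited text data as a list of sentences.

    mode="and": keep sentences containing all words of interest.
    mode="or":  keep sentences containing at least one word of interest.
    """
    targets = set(words_of_interest)
    if mode == "and":
        return [s for s in sentences if targets <= set(s)]
    if mode == "or":
        return [s for s in sentences if targets & set(s)]
    raise ValueError("mode must be 'and' or 'or'")


def jaccard_matrix(sentences):
    """Return (words, matrix) where matrix[(a, b)] is the Jaccard
    coefficient of words a < b, computed over sentences."""
    occurrences = {}  # word -> set of indices of sentences containing it
    for index, sentence in enumerate(sentences):
        for word in set(sentence):
            occurrences.setdefault(word, set()).add(index)
    words = sorted(occurrences)
    matrix = {}
    for a, b in combinations(words, 2):
        both = occurrences[a] & occurrences[b]
        either = occurrences[a] | occurrences[b]
        matrix[(a, b)] = len(both) / len(either)
    return words, matrix


# Assumed toy data, loosely following the "staff" example of FIG. 19.
sentences = [
    ["staff", "service", "good"],
    ["staff", "service", "slow"],
    ["room", "clean"],
    ["staff", "room"],
]

# First co-occurrence matrix: based on the entire text data.
_, first_matrix = jaccard_matrix(sentences)

# Second co-occurrence matrix: based on the limited text data (AND).
limited_and = limit_sentences(sentences, ["staff", "service"], mode="and")
_, second_matrix = jaccard_matrix(limited_and)

# Limited text data for OR processing.
limited_or = limit_sentences(sentences, ["service", "clean"], mode="or")
```

A co-occurrence network could then be drawn by adding an edge for each word pair whose coefficient exceeds some threshold; the threshold itself is a design choice that this sketch leaves open.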

The text mining device 10 and the text mining program 31 according to the present embodiment have the same features as the text mining method described above and provide the same effects. According to the text mining method, the text mining device 10, and the text mining program 31 according to the present embodiment, a screen including the co-occurrence network obtained when a word of interest is specified can be displayed by a simple operation.

10 ... Text mining device
11 ... Instruction input unit
12 ... Text data storage unit
13 ... Word extraction unit
14 ... Co-occurrence matrix generation unit
15 ... Co-occurrence network generation unit
16 ... Screen display unit
20 ... Computer
21 ... CPU
22 ... Main memory
29 ... Mouse
30 ... Recording medium
31 ... Text mining program
32 ... Text data
41 to 45 ... Window
51 ... Overall co-occurrence network
52 to 54 ... Limited co-occurrence network
61 ... Analysis button
62 ... Mouse cursor
63 to 64 ... Tab
71 to 75 ... Screen

Claims

A text mining method for displaying a screen containing analysis results of text data, the method comprising:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction to specify a word of interest is input in a first screen including a first co-occurrence network based on the entire designated text data, the step of extracting the words extracts the words from limited text data including the part of the designated text data that includes the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

The text mining method according to claim 1, wherein an instruction to specify the word corresponding to a node as the word of interest is input by selecting one or more nodes included in the first co-occurrence network in the first screen and selecting start analysis.

The text mining method according to claim 1, wherein an instruction to specify the word corresponding to a node as the word of interest is input by selecting one node included in the first co-occurrence network in succession in the first screen.

The text mining method according to claim 1, wherein an instruction to specify the words corresponding to the two nodes connected to an edge as words of interest is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

The text mining method according to claim 1, wherein an instruction to specify the words corresponding to the plurality of nodes connected to the edges as words of interest is input by selecting one or more edges included in the first co-occurrence network in the first screen and selecting start analysis.

The text mining method according to claim 1, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

The text mining method according to claim 6, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it within another second co-occurrence network.

The text mining method according to claim 1, wherein the limited text data is composed of the sentences in the designated text data that include the word of interest.

The text mining method according to claim 8, wherein, when a plurality of words of interest are specified, the limited text data is composed of the sentences in the designated text data that include all of the plurality of words of interest.

The text mining method according to claim 8, wherein, when a plurality of words of interest are specified, the limited text data is composed of the sentences in the designated text data that include any of the plurality of words of interest.

The text mining method according to claim 1, wherein the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.

A text mining program for displaying a screen containing analysis results of text data, the program causing a computer to execute, by a CPU using a memory:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction to specify a word of interest is input in a first screen including a first co-occurrence network based on the entire designated text data, the step of extracting the words extracts the words from limited text data including the part of the designated text data that includes the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

The text mining program according to claim 12, wherein an instruction to specify the word corresponding to a node as the word of interest is input by selecting one or more nodes included in the first co-occurrence network in the first screen and selecting start analysis.

The text mining program according to claim 12, wherein an instruction to specify the word corresponding to a node as the word of interest is input by selecting one node included in the first co-occurrence network in succession in the first screen.

The text mining program according to claim 12, wherein an instruction to specify the words corresponding to the two nodes connected to an edge as words of interest is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

The text mining program according to claim 12, wherein an instruction to specify the words corresponding to the plurality of nodes connected to the edges as words of interest is input by selecting one or more edges included in the first co-occurrence network in the first screen and selecting start analysis.

The text mining program according to claim 12, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

The text mining program according to claim 17, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it within another second co-occurrence network.

A text mining device for displaying a screen containing analysis results of text data, the device comprising:
a word extraction unit that extracts words from text data;
a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words;
a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and
a screen display unit that displays a screen including the co-occurrence network,
wherein, when an instruction to specify a word of interest is input in a first screen including a first co-occurrence network based on the entire designated text data, the word extraction unit extracts the words from limited text data including the part of the designated text data that includes the word of interest, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen including the second co-occurrence network.

The text mining device according to claim 19, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the screen display unit displays the plurality of second co-occurrence networks in a tab format.