JP7404694B2

JP7404694B2 - Relevance evaluation method, relevance evaluation device, program

Info

Publication number: JP7404694B2
Application number: JP2019138655A
Authority: JP
Inventors: 有杉崎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2019-07-29
Filing date: 2019-07-29
Publication date: 2023-12-26
Anticipated expiration: 2039-07-29
Also published as: JP2021022193A

Description

The present invention relates to a relevance evaluation method, a relevance evaluation device, and a program.

In organizations such as companies and associations, the relevance between people and words is sometimes calculated and evaluated, for example to efficiently search for people involved in a specific technology, field, product, or project.

One technique used for calculating such relevance is disclosed in, for example, Patent Document 1. Patent Document 1 describes an association score calculation system that includes a collection unit, a word extraction unit, and an association score calculation unit. According to Patent Document 1, the word extraction unit extracts words from file names recorded in operation logs that the collection unit has collected from terminal devices. The association score calculation unit then calculates, based on the operation logs, an association score representing the strength of association between a user and a word. Patent Document 1 further describes two methods of calculating the association score: a first method that quantifies the relevance of each word for each individual user, and a second method that quantifies the relevance while also taking into account the degree of association between words, on the premise that a user is also related to other words whose users are similarly distributed.

Patent Document 1: JP2019-86940A

When relevance is quantified by the second method described in Patent Document 1, a person who uses the word "スマホ" (sumaho, a colloquial abbreviation) is judged to be related to the word "スマートフォン" (smartphone) as well, so the method is expected to be robust to synonyms and spelling variations. With the second method, however, if user A often uses the word "B" and, across all users, those who use the word "A" also tend to use the word "B", user A's association score for "A" can become high even if user A has never used the word "A".

Thus, the second method, although robust to synonyms and spelling variations, has the problem that association scores can become high for words a user does not actually use. This problem was more pronounced for highly specialized words used by only a few users.

An object of the present invention is therefore to provide a relevance evaluation method, a relevance evaluation device, and a program that solve the problem that it is difficult to suppress high association scores for words that are not actually used while remaining robust to synonyms and spelling variations.

To achieve this object, a relevance evaluation method according to one aspect of the present invention is configured such that an information processing device:
calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
calculates a second score based on the first score and the degree of association between words; and
sorts information identifying users based on the calculated first score and second score.

A relevance evaluation device according to another aspect of the present invention includes:
a first score calculation unit that calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
a second score calculation unit that calculates a second score based on the first score and the degree of association between words; and
a sorting processing unit that sorts information identifying users based on the calculated first score and second score.

A program according to another aspect of the present invention causes an information processing device to implement:
a first score calculation unit that calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
a second score calculation unit that calculates a second score based on the first score and the degree of association between words; and
a sorting processing unit that sorts information identifying users based on the calculated first score and second score.

With the configuration described above, the present invention can provide a relevance evaluation method, a relevance evaluation device, and a program that solve the problem that it is difficult to suppress high association scores for words that are not actually used while remaining robust to synonyms and spelling variations.

FIG. 1 is a block diagram showing an example of the configuration of a relevance evaluation device according to a first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the reference time information shown in FIG. 1.
FIG. 3 is a diagram showing an example of the first score information shown in FIG. 1.
FIG. 4 is a diagram showing an example of the first table shown in FIG. 1.
FIG. 5 is a diagram showing an example of the second table shown in FIG. 1.
FIG. 6 is a diagram showing an example of the second score information shown in FIG. 1.
FIG. 7 is a diagram showing an example of a ranking table included in the ranking information shown in FIG. 1.
FIG. 8 is a diagram showing another example of a ranking table included in the ranking information shown in FIG. 1.
FIG. 9 is a diagram for explaining an example of processing performed by the first score calculation unit.
FIG. 10 is a diagram for explaining an example of processing performed by the second table generation unit.
FIG. 11 is a flowchart showing an example of the operation of the relevance evaluation device.
FIG. 12 is a flowchart showing an example of processing for generating a ranking table.
FIG. 13 is a flowchart showing an example of the operation of the relevance evaluation device when performing a search.
FIG. 14 is a block diagram showing an example of the configuration of a relevance evaluation device according to a second embodiment of the present invention.

[First embodiment]
A first embodiment of the present invention will be described with reference to FIGS. 1 to 13. FIG. 1 is a block diagram showing an example of the configuration of the relevance evaluation device 100. FIG. 2 is a diagram showing an example of the reference time information 111. FIG. 3 is a diagram showing an example of the first score information 112. FIG. 4 is a diagram showing an example of the first table 113. FIG. 5 is a diagram showing an example of the second table 114. FIG. 6 is a diagram showing an example of the second score information 115. FIGS. 7 and 8 are diagrams showing examples of ranking tables included in the ranking information 116. FIG. 9 is a diagram for explaining an example of processing performed by the first score calculation unit 120. FIG. 10 is a diagram for explaining an example of processing performed by the second table generation unit 140. FIGS. 11 to 13 are flowcharts showing examples of the operation of the relevance evaluation device 100.

The first embodiment of the present invention describes a relevance evaluation device 100 that evaluates the strength of the relevance between people and words (for example, that a certain person spends a lot of time using a certain word) by quantifying it and displaying it in ranking form. As described later, the relevance evaluation device 100 calculates a first score that quantifies, for each user, the relevance of each word, and a second score that incorporates the degree of association between words into the first score, on the premise that a user is also related to other words whose users are similarly distributed. The relevance evaluation device 100 then generates a ranking based on the calculated first score, the second score, and the usage status of the word, such as the number of users who have used it.

The relevance evaluation device 100 is an information processing device that evaluates the strength of the relevance between people and words by calculating the first score and second score described above and generating a ranking. The relevance evaluation device 100 is also configured to perform searches using the generated rankings.

FIG. 1 shows an example of the configuration of the relevance evaluation device 100. Referring to FIG. 1, the relevance evaluation device 100 includes, for example, a storage unit 110, a first score calculation unit 120, a first table generation unit 130, a second table generation unit 140, a second score calculation unit 150, a ranking generation unit 160, a keyword reception unit 170, a search unit 180, and an output unit 190.

The relevance evaluation device 100 includes, for example, an arithmetic device such as a CPU (Central Processing Unit), and implements each of the processing units described above by having the arithmetic device execute a program stored in a storage device such as the storage unit 110.

The storage unit 110 is a storage device such as a hard disk or memory. The main information stored in the storage unit 110 includes, for example, reference time information 111, first score information 112, a first table 113, a second table 114, second score information 115, and ranking information 116. The storage unit 110 can also store a program loaded in advance from an external device, a storage medium, or the like.

The reference time information 111 indicates, for each pair of a user and a word, the reference time, that is, the time during which the user referred to the word. The reference time information 111 is generated in advance based on, for example, file operation logs and schedule information, and is stored in the storage unit 110. Examples of the reference time of a word include the user's operation time while a file whose file name contains the word was open, the number of keystrokes, and the time from the start to the end of a schedule entry whose subject contains the word. The reference time of a word may be something other than these examples.

FIG. 2 shows an example of the reference time information 111. As shown in FIG. 2, the reference time information 111 associates, for example, a user name, a word, and a reference time. For example, the first row of FIG. 2 shows that the reference time of the word "work" for the user name "Mr. A" is "30".
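As an illustration only, the following minimal sketch shows how reference time information like that of FIG. 2 could be derived from a file operation log. The log format (`FileOpenEvent` with open and close times in minutes), the function name `reference_times`, and the rule that a word counts as referenced when it appears as a substring of the file name are all hypothetical assumptions; the description does not specify them.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class FileOpenEvent(NamedTuple):
    """Hypothetical operation-log entry: one user had one file open for an interval."""
    user: str
    file_name: str
    opened_at: int  # minutes
    closed_at: int  # minutes


def reference_times(events: Iterable[FileOpenEvent],
                    vocabulary: Iterable[str]) -> dict[tuple[str, str], int]:
    """Total, per (user, word), the minutes the user had open files whose names contain the word."""
    words = list(vocabulary)
    totals: defaultdict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        duration = event.closed_at - event.opened_at
        for word in words:
            if word in event.file_name:  # assumption: plain substring match
                totals[(event.user, word)] += duration
    return dict(totals)


events = [
    FileOpenEvent("A", "work_style_reform.pptx", 0, 30),
    FileOpenEvent("A", "overtime.xlsx", 40, 50),
    FileOpenEvent("B", "reform_plan.docx", 5, 25),
]
ref_time = reference_times(events, ["work", "reform", "overtime"])
```

With substring matching, one file name can contribute reference time to several words at once, as "work_style_reform.pptx" does here for both "work" and "reform".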

The first score information 112 indicates first scores, each of which quantifies the relevance of one word for one user. That is, the first score information 112 contains first scores indicating the degree of association between users and words. The first scores in the first score information 112 are calculated from the reference time information 111 by the first score calculation unit 120 described later.

FIG. 3 shows an example of the first score information 112. As shown in FIG. 3, the first score information 112 associates, for example, a user name, a word, and a first score. For example, the first row of FIG. 3 shows that the first score of the word "service" for the user name "Mr. A" is "0".

The first score indicated by the first score information 112 is 0 when the user's reference time for the word is 0, and becomes higher the longer the reference time. The first score is also lower for words used widely across all users and higher for words used only by specific users. Because of these characteristics, the first score takes a high value when the user and the word are judged to be strongly related, for example when the user uses the word often or when only specific users use the word.

The first table 113 is a table generated from the first score information 112 by assigning each user to a row and each word to a column. The first table 113 is generated by the first table generation unit 130 described later.

FIG. 4 shows an example of the first table 113. For example, the first row of FIG. 4 shows the first scores of the user name "Mr. A" for each word: "0" for "service", "0" for "vacation", "0" for "vacation table", and so on.

The second table 114 indicates the degree of association between words. The second table 114 is generated by the second table generation unit 140 described later, based on the first scores indicated by the first score information 112 or the first table 113.

FIG. 5 shows an example of the second table 114. For example, the first row of FIG. 5 shows the degree of association between the word "meeting" and each word: "1" between "meeting" and "meeting", "-0.65578" between "meeting" and "work", "-0.65578" between "meeting" and "work style", and so on.

The degree of association between words indicated by the second table 114 is an index that takes larger values for words used in similar ways, for example words whose first-score distributions are similar. For words x and y, the more similar their usage (for example, the more people who use both x and y, or the more people who use neither), the larger the value. The words "スマホ" and "スマートフォン" (both meaning smartphone), for instance, are expected to be used in similar ways and therefore to have a large value.

The second score information 115 indicates second scores obtained by incorporating the degree of association between words into the first scores, on the premise that a user is also related to other words whose users are similarly distributed. The second scores are calculated by the second score calculation unit 150 described later, based on the first scores indicated by the first score information 112 or the first table 113 and the degree of association between words indicated by the second table 114.

FIG. 6 shows an example of the second score information 115. As shown in FIG. 6, the second score information 115 associates a user name, a word, and a second score. For example, the first row of FIG. 6 shows that the second score of the word "service" for the user name "Mr. A" is "0.145650".

As described above, the second score indicated by the second score information 115 is calculated from the first score and the degree of association between words. Using the second score therefore allows, for example in the search described later, users strongly related to another word that is strongly associated with the word specified as the search keyword to be ranked high as well. In other words, the second score is more robust to synonyms and spelling variations than the first score.

The ranking information 116 shows, for each word, how strongly people are related to that word, in ranking form. As shown in FIGS. 7 and 8, the ranking information 116 contains, for each word, a ranking table generated in either a first format or a second format. The ranking tables are generated by the ranking generation unit 160 described later, which also decides for each word whether to generate the ranking table in the first format shown in FIG. 7 or the second format shown in FIG. 8.

As described later, the ranking table in the first format shown in FIG. 7 is generated by sorting the information identifying users in descending order of the second score. As shown in FIG. 7, in the first format, a higher second score gives a higher rank, and the first score is not considered. For example, the second row of FIG. 7 associates rank "2" with "Mr. B", whose second score is "0.043959" and first score is "0".

In contrast, the ranking table in the second format shown in FIG. 8 is generated by sorting the information identifying users in descending order of the first score and then sorting the users whose first score is 0 in descending order of the second score. As shown in FIG. 8, in the second format, a higher first score gives a higher rank, and the second score affects the order only among users whose first score is 0. For example, the second row of FIG. 8 associates rank "2" with "Mr. C", whose second score is "0.024045" and first score is "0.003854".

The information identifying a user in a ranking table includes, for example, the user name. It may also include the first score and the second score, as well as various other information.

Based on the reference time information 111, the first score calculation unit 120 calculates, for each user and each word, a first score that quantifies the relevance between the user and the word. The first score calculation unit 120 then stores the calculated first scores in the storage unit 110 as the first score information 112.

For example, the first score calculation unit 120 calculates the first score, which indicates the relevance between a user and a word, as the TF-IDF shown in FIG. 9. TF-IDF is a method for evaluating the importance of a word contained in a document. The first score calculation unit 120 treats the words used by each user as a document and applies the formula shown in FIG. 9 to obtain the TF-IDF as the first score. The calculation proceeds as follows.

First, referring to the reference time information 111, the first score calculation unit 120 sums the reference times of all words for each user to obtain a first total reference time, that is, the total reference time of all words for that user. The first score calculation unit 120 also sums the reference times of all users for each word to obtain a second total reference time, that is, the total reference time of all users for that word.

Next, the first score calculation unit 120 calculates TF by dividing a user's reference time for a word by the first total reference time, and calculates IDF by dividing the same reference time by the second total reference time. It then multiplies TF by IDF to obtain the TF-IDF for that user and word.

For example, the first score calculation unit 120 performs the above process for each combination of user and word to calculate the TF-IDF (that is, the first score) for every user and word, and stores the results in the storage unit 110 as the first score information 112. If a user's reference time for a word is 0, both TF and IDF are 0, so the first score for that user and word is also 0.
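The calculation described above can be sketched as follows, assuming the reference time information is held as a dictionary keyed by hypothetical (user, word) pairs. The sketch follows the description literally: TF is the reference time divided by the first total reference time, and the "IDF" factor is the same reference time divided by the second total reference time. Note that this differs from the logarithmic IDF of textbook TF-IDF; the formula in FIG. 9 is not reproduced here, so this is an interpretation of the text rather than a copy of the figure.

```python
from collections import defaultdict


def first_scores(ref_time: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """First score per (user, word): TF x IDF, with both factors as defined in the text."""
    user_total: defaultdict[str, float] = defaultdict(float)  # first total reference time
    word_total: defaultdict[str, float] = defaultdict(float)  # second total reference time
    for (user, word), t in ref_time.items():
        user_total[user] += t
        word_total[word] += t

    scores: dict[tuple[str, str], float] = {}
    for (user, word), t in ref_time.items():
        if t == 0:
            scores[(user, word)] = 0.0  # zero reference time: TF = IDF = 0
            continue
        tf = t / user_total[user]   # share of this user's total reference time
        idf = t / word_total[word]  # share of this word's total reference time
        scores[(user, word)] = tf * idf
    return scores


scores = first_scores({("A", "work"): 30, ("A", "reform"): 30,
                       ("B", "reform"): 30, ("B", "overtime"): 0})
```

In this toy data, "work" is used only by user A and scores 0.5 for A, while "reform", shared with user B, scores 0.25 for A, which matches the property that widely used words receive lower first scores.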

The first table generation unit 130 generates the first table 113 based on the first score information 112, for example by assigning each user included in the first score information 112 to a row and each word to a column. The first table generation unit 130 then stores the generated first table 113 in the storage unit 110.
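A minimal sketch of this pivot follows. The alphabetical ordering of rows and columns and the filling of missing (user, word) pairs with 0 (consistent with a zero reference time giving a zero first score) are assumptions; `first_table` is a hypothetical name.

```python
def first_table(scores: dict[tuple[str, str], float]) -> tuple[list[str], list[str], list[list[float]]]:
    """Pivot (user, word) -> first score records into rows of users and columns of words."""
    users = sorted({user for user, _ in scores})
    words = sorted({word for _, word in scores})
    # Pairs absent from the records are filled with 0 (assumption: no reference time).
    matrix = [[scores.get((user, word), 0.0) for word in words] for user in users]
    return users, words, matrix


users, words, matrix = first_table({("A", "work"): 0.5, ("A", "reform"): 0.25, ("B", "reform"): 0.5})
```

Each column of the resulting matrix is one word's first scores over all users, which is the per-word vector that the degree of association between words is computed from.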

Based on the first scores indicated by the first score information 112 or the first table 113, the second table generation unit 140 generates the second table 114, which indicates the degree of association between words, and stores it in the storage unit 110.

For example, the second table generation unit 140 calculates the degree of association between words using the formula for the degree of association r shown in FIG. 10. With n being the total number of users and x and y being two words selected from the target words, this formula calculates the degree of association between the words from the covariance S_xy of the two words' first scores over the n users and the standard deviations S_x and S_y of each word's first scores over the n users.

Here, $x_i$ is the first score of the i-th user for word x, and $y_i$ is the first score of the i-th user for word y. In the formula shown in FIG. 10, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean of the first scores of word x, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of the first scores of word y.
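Computing a degree of association from the covariance S_xy and the standard deviations S_x and S_y in this way corresponds to the Pearson correlation coefficient, r = S_xy / (S_x S_y). The sketch below assumes that form, since FIG. 10 itself is not reproduced here. The choice of dividing by n (rather than n - 1) cancels out in r. Returning 0.0 when a word's first scores are constant (zero standard deviation) is an assumption; the description does not cover that case.

```python
import math
from typing import Sequence


def association(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two words' first scores over the same n users."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n  # covariance
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)               # std. dev. of x
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)               # std. dev. of y
    if s_x == 0 or s_y == 0:
        return 0.0  # assumption: correlation undefined, treated as no association
    return s_xy / (s_x * s_y)
```

For example, two words used by the same single user and by nobody else (`[0, 0, 1]` and `[0, 0, 1]`) get r = 1, reflecting that both "use both" and "use neither" patterns raise the value.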

The degree of association r shown in FIG. 10 is an index that takes larger values for words used in similar ways. That is, the degree of association between words x and y is larger when many people use both x and y, or when many people use neither. For example, "スマホ" and "スマートフォン" are used in similar ways and are expected to have a large value.

The second table generation unit 140 calculates the degree of association between every pair of words using the above formula, and creates the second table 114 from all combinations of words so that the degrees of association form a symmetric matrix. The degree of association of a word with itself is set to 1.

Based on the first scores indicated by the first score information 112 or the first table 113 and the degree of association between words indicated by the second table 114, the second score calculation unit 150 calculates second scores that incorporate the degree of association between words into the first scores. The second score calculation unit 150 then stores the calculated second scores in the storage unit 110 as the second score information 115.

For example, the second score calculation unit 150 calculates the second score using the following formula:
$$\mathrm{score}_2(\alpha, X) = \sum_{w} r(X, w) \times \mathrm{score}_1(\alpha, w)$$
where $\mathrm{score}_2(\alpha, X)$ is the association score between user α and word X, $r(X, w)$ is the degree of association between word X and word w, $\mathrm{score}_1(\alpha, w)$ is the first score of user α for word w, and the sum runs over all words w.
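Because the sum runs over every word, computing the second scores for all users and words at once amounts to multiplying the first table (users by words) by the transpose of the second table (words by words). The sketch below assumes both tables are plain nested lists with the same word order; `second_scores` is a hypothetical name.

```python
def second_scores(first: list[list[float]], assoc: list[list[float]]) -> list[list[float]]:
    """second[u][x] = sum over w of assoc[x][w] * first[u][w].

    first: rows are users, columns are words (first table).
    assoc: assoc[x][w] is the degree of association between words x and w (second table),
    in the same word order as the columns of `first`.
    """
    return [
        [sum(a_xw * f_uw for a_xw, f_uw in zip(assoc_row, user_row)) for assoc_row in assoc]
        for user_row in first
    ]


second = second_scores([[0.0, 0.5], [0.2, 0.0]], [[1.0, 0.8], [0.8, 1.0]])
```

Here user 0 has a first score of 0 for word 0 but receives a second score of 0.4 for it, purely through word 0's association with word 1. This is the effect that makes the second score robust to synonyms and also the source of the problem of high scores for words that are not actually used.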

Specifically, suppose the second score for the user name "Mr. A" and the word "service" is calculated using the first table 113 shown in FIG. 4 and the second table 114 shown in FIG. 5. The second score calculation unit 150 then sums, over each word w, the product (degree of association between "service" and w) × (Mr. A's first score for w), taking w in turn as "meeting", "work", "work style", "suppression", "reform", "overtime", "overtime suppression", "service", "vacation", and "vacation table":

(-0.46217 × 0) + (0.941227 × 0.063492) + (0.941227 × 0.063492) + (-0.40357 × 0.027778) + (0.941227 × 0.063492) + (-0.40357 × 0.027778) + (-0.40357 × 0.027778) + (1 × 0) + (1 × 0) + (1 × 0) = 0.145650
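The worked calculation above can be reproduced with a short Python sketch. The two dictionaries are hypothetical excerpts that hold only the values quoted in the example (Mr. A's first scores, and the association of "service" with each word); the English keys are translations chosen for illustration, and `second_score` is a hypothetical helper name rather than anything defined in the source.

```python
from typing import Dict

# Hypothetical excerpt of the first table 113: Mr. A's first score per word.
first_scores_a: Dict[str, float] = {
    "meeting": 0.0,
    "work": 0.063492,
    "work style": 0.063492,
    "suppression": 0.027778,
    "reform": 0.063492,
    "overtime": 0.027778,
    "overtime suppression": 0.027778,
    "service": 0.0,
    "vacation": 0.0,
    "vacation table": 0.0,
}

# Hypothetical excerpt of the second table 114: association of "service" with each word.
association_service: Dict[str, float] = {
    "meeting": -0.46217,
    "work": 0.941227,
    "work style": 0.941227,
    "suppression": -0.40357,
    "reform": 0.941227,
    "overtime": -0.40357,
    "overtime suppression": -0.40357,
    "service": 1.0,
    "vacation": 1.0,
    "vacation table": 1.0,
}


def second_score(first_scores: Dict[str, float], association_row: Dict[str, float]) -> float:
    """Sum over words w of association(X, w) * first_score(user, w).

    Words missing from the association row contribute 0.
    """
    return sum(association_row.get(word, 0.0) * score
               for word, score in first_scores.items())


score = second_score(first_scores_a, association_service)
print(f"{score:.6f}")  # 0.145650
```

The three words with an association of 1 contribute nothing here because Mr. A's first score for each of them is 0.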

Based on the first score information 112 and the second score information 115, the ranking generation unit 160 generates, for each word, a ranking table indicating how strongly each person is associated with that word. The ranking table takes either a first format as shown in FIG. 7 or a second format as shown in FIG. 8. The ranking generation unit 160 stores the generated ranking tables in the storage unit 110 as the ranking information 116.

For example, for the target word, the ranking generation unit 160 counts the users whose first score is not 0 among the users for whom a first score has been calculated. In other words, it counts the number of users who have used the word.

If the counted number of users is equal to or greater than a predetermined threshold, the ranking generation unit 160 generates a ranking table in the first format by sorting the information identifying the users in descending order of the second score. If the count is less than the threshold, the ranking generation unit 160 sorts the information identifying the users in descending order of the first score and then sorts the users whose first score is 0 in descending order of the second score, thereby generating a ranking table in the second format. In this way, the ranking generation unit 160 sorts the information identifying users in a different format depending on the counted number of users: the first format sorts on the second score alone, whereas the second format sorts on both the first score and the second score.
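A minimal Python sketch of the two ranking formats for a single word, assuming both score mappings cover the same set of users. The function name `generate_ranking`, the user IDs, and the scores in the usage lines are hypothetical; ties keep input order because Python's `sorted` is stable, a detail the source does not specify.

```python
from typing import Dict, List


def generate_ranking(
    first: Dict[str, float],
    second: Dict[str, float],
    threshold: int,
) -> List[str]:
    """Order user IDs for one word in the first or second format.

    `first` and `second` map each user ID to that user's first and second
    score for the word; both are assumed to contain the same users.
    """
    # Number of users who have actually used the word (first score != 0).
    users_with_usage = sum(1 for s in first.values() if s != 0)

    if users_with_usage >= threshold:
        # First format: every user in descending order of the second score.
        return sorted(second, key=lambda u: second[u], reverse=True)

    # Second format: users who used the word, by descending first score ...
    used = sorted((u for u in first if first[u] != 0),
                  key=lambda u: first[u], reverse=True)
    # ... followed by users whose first score is 0, by descending second score.
    unused = sorted((u for u in first if first[u] == 0),
                    key=lambda u: second[u], reverse=True)
    return used + unused


# Hypothetical scores for one word.
first = {"A": 0.0, "B": 0.05, "C": 0.0, "D": 0.02}
second = {"A": 0.30, "B": 0.10, "C": 0.20, "D": 0.25}
print(generate_ranking(first, second, threshold=3))  # ['B', 'D', 'A', 'C']
print(generate_ranking(first, second, threshold=2))  # ['A', 'D', 'C', 'B']
```

With only two actual users of the word, a threshold of 3 selects the second format, so B and D (who used the word) rank above A and C even though A and C have higher second scores.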

The threshold may take any value, and it need not be a predetermined fixed value. For example, it may be expressed as a proportion, such as n% of the total number of people, or it may be changed dynamically, for example based on feedback from users.

In this way, the ranking generation unit 160 decides whether to generate the ranking table in the first format or the second format based on how the word is used, for example the number of users who have used it, and then generates the ranking table in the chosen format.

The ranking generation unit 160 generates such a ranking table for each word included in the reference time information 111 and the like. Consequently, the ranking information 116 contains a ranking table for each word, in either the first format or the second format.

The keyword reception unit 170 receives a search keyword from a searcher. For example, the keyword reception unit 170 receives a word as the search keyword.

The search unit 180 executes a search according to the search keyword. For example, the search unit 180 retrieves, from the ranking information 116, the ranking table for the word indicated by the search keyword.

The output unit 190 outputs the ranking table retrieved by the search unit 180. The output may, for example, be displayed on a screen display unit or transmitted to an external device.

As one example of how the keyword reception unit 170, the search unit 180, and the output unit 190 can carry out a search, the keyword reception unit 170 receives a search keyword via a communication network from an external device used by the searcher, and the output unit 190 outputs the search results to that external device. Alternatively, the search keyword may be entered with a keyboard or similar input device of the relevance evaluation device 100, and the results displayed on a screen display unit of the relevance evaluation device 100. The keyword reception unit 170 may also be configured to extract words from a given sentence and accept the extracted words as search keywords; that is, it need not receive a search keyword directly from the searcher. This embodiment does not restrict how the keyword reception unit 170 extracts words from a sentence; any known technique may be used.
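The keyword reception and search steps can be sketched in Python as a lookup into precomputed ranking tables. Everything here is hypothetical: the dictionary `ranking_info` stands in for the ranking information 116, and `extract_keywords` is a deliberately naive regex tokenizer standing in for the unspecified "known technique" (Japanese text would need a morphological analyzer instead).

```python
import re
from typing import Dict, List

# Hypothetical ranking information 116: word -> user IDs in ranking order.
ranking_info: Dict[str, List[str]] = {
    "overtime": ["B", "D", "A", "C"],
    "service": ["A", "D", "C", "B"],
}


def extract_keywords(sentence: str) -> List[str]:
    """Naive stand-in for keyword extraction: lowercase ASCII word tokens,
    duplicates removed, first-occurrence order kept."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return list(dict.fromkeys(tokens))


def search(keywords: List[str], rankings: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the stored ranking table for every keyword that has one."""
    return {kw: rankings[kw] for kw in keywords if kw in rankings}


results = search(extract_keywords("Who knows about overtime and service?"), ranking_info)
for word, users in results.items():
    print(word, users)
```

Keywords with no stored ranking table (such as "who" or "about" above) are silently skipped; a real system might instead report them or fall back to a related word.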

The above is an example of the configuration of the relevance evaluation device 100. Next, an example of its operation will be described with reference to FIGS. 11 to 13.

First, with reference to FIG. 11, an example of the operation of the relevance evaluation device 100 when generating ranking tables and storing them as the ranking information 116 will be described.

Referring to FIG. 11, the first score calculation unit 120 of the relevance evaluation device 100 calculates, based on the reference time information 111, a first score for each user and each word that quantifies the association between the user and the word (step S101). For example, the first score calculation unit 120 obtains the first score by calculating the TF-IDF shown in FIG. 8.
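The exact formula in the figure is not reproduced in this text, so the following Python sketch assumes one common TF-IDF variant: term frequency is a user's reference time for a word divided by that user's total reference time, and inverse document frequency is log(N / df), where N is the number of users and df is the number of users with a nonzero reference time for the word. The input layout and the name `compute_first_scores` are hypothetical.

```python
import math
from typing import Dict


def compute_first_scores(
    reference_time: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Return first scores as {user: {word: tf * idf}} (assumed TF-IDF variant).

    `reference_time` maps each user to {word: total time the user referred to it}.
    """
    users = list(reference_time)
    vocabulary = sorted({w for times in reference_time.values() for w in times})
    # Document frequency: how many users referred to each word at all.
    df = {w: sum(1 for u in users if reference_time[u].get(w, 0.0) > 0)
          for w in vocabulary}

    scores: Dict[str, Dict[str, float]] = {}
    for u in users:
        total = sum(reference_time[u].values())
        scores[u] = {}
        for w in vocabulary:
            tf = reference_time[u].get(w, 0.0) / total if total > 0 else 0.0
            idf = math.log(len(users) / df[w]) if df[w] > 0 else 0.0
            scores[u][w] = tf * idf
    return scores


# Hypothetical reference times (e.g. minutes) per user and word.
times = {
    "A": {"overtime": 30.0, "service": 0.0},
    "B": {"overtime": 10.0, "service": 20.0},
    "C": {"vacation": 5.0},
}
first_scores = compute_first_scores(times)
```

Under this variant, a word that every user has referred to receives an IDF of 0, so it cannot distinguish one user from another.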

The first table generation unit 130 generates the first table 113 based on the first score information 112 (step S102).

Based on the first table 113, the second table generation unit 140 generates the second table 114, which indicates the degree of association between words (step S103). For example, the second table generation unit 140 calculates the degree of association between words using the formula shown in FIG. 9, and then generates the second table 114 by arranging the calculated degrees of association in table form.
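The formula in FIG. 9 is not reproduced in this text. The example association values above lie between -1 and 1, with a word's association to itself equal to 1, which is consistent with (but does not prove) a correlation coefficient. The following Python sketch therefore assumes the Pearson correlation between two words' columns of first scores across users; `pearson` and `build_second_table` are hypothetical names.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def build_second_table(
    first_table: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Word-by-word association table from a user-by-word first table.

    Each word is represented by its column of first scores across users;
    a word that a user has no score for is treated as 0.
    """
    users = sorted(first_table)
    words = sorted({w for row in first_table.values() for w in row})
    columns = {w: [first_table[u].get(w, 0.0) for u in users] for w in words}
    return {w1: {w2: pearson(columns[w1], columns[w2]) for w2 in words}
            for w1 in words}


# Hypothetical first table: users A to C, words x, y, z.
table = build_second_table({
    "A": {"x": 1.0, "y": 2.0, "z": 3.0},
    "B": {"x": 2.0, "y": 4.0, "z": 1.0},
    "C": {"x": 3.0, "y": 6.0, "z": 2.0},
})
```

Words x and y are used in exactly proportional amounts, so their association is 1, while z is partly anti-correlated with x (association -0.5). Note that this sketch returns 0 rather than 1 for the self-association of a word whose column is constant.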

Based on the first score indicated by the first score information 112 or the first table 113 and the degree of association between words indicated by the second table 114, the second score calculation unit 150 calculates a second score, which incorporates the degree of association between words into the first score (step S104). For example, for a user A and a word X, the second score calculation unit 150 multiplies, for each word, the degree of association between word X and that word by user A's first score for that word, and takes the sum of these products as the second score.

Based on the first score information 112 and the second score information 115, the ranking generation unit 160 generates, for each word, a ranking table indicating how strongly each person is associated with the word (step S105). Whether the ranking table is generated in the first format or the second format is decided, for example, based on how the word is used.

The above is an example of the operation of the relevance evaluation device 100 when generating ranking tables and storing them as the ranking information 116. Next, the ranking generation process of step S105 will be described in more detail with reference to FIG. 12.

Referring to FIG. 12, the ranking generation unit 160 counts, for the target word, the number of users whose first score is not 0 (step S201).

If the counted number of users is equal to or greater than the predetermined threshold (step S202: Yes), the ranking generation unit 160 sorts the information identifying the users in descending order of the second score (step S203). The ranking generation unit 160 thereby generates a ranking table in the first format.

If the counted number of users is less than the predetermined threshold (step S202: No), the ranking generation unit 160 sorts the information identifying the users in descending order of the first score (step S204). It then sorts the users whose first score is 0 in descending order of the second score (step S205). The ranking generation unit 160 thereby generates a ranking table in the second format.

The above is an example of the ranking generation process shown in step S105 of FIG. 11. Next, with reference to FIG. 13, an example of the operation of the relevance evaluation device 100 when performing a search will be described.

Referring to FIG. 13, the keyword reception unit 170 receives a search keyword from a searcher (step S301). For example, the keyword reception unit 170 receives a word as the search keyword.

The search unit 180 executes a search according to the search keyword (step S302). For example, the search unit 180 retrieves, from the ranking information 116, the ranking table for the word indicated by the search keyword.

The output unit 190 outputs the ranking table retrieved by the search unit 180 (step S303). The output may, for example, be displayed on a screen display unit or transmitted to an external device.

The above is an example of the operation of the relevance evaluation device 100 when performing a search.

As described above, the relevance evaluation device 100 includes the first score calculation unit 120, which calculates the first score, the second score calculation unit 150, which calculates the second score, and the ranking generation unit 160. With this configuration, the ranking generation unit 160 can decide, according to how the word is used, whether to generate the ranking table in the first format, which sorts on the second score, or in the second format, which sorts on both the first score and the second score. Sorting on the second score absorbs spelling variations and similar differences, while for words with few users, such as highly specialized terms, the users who actually used the word can be ranked at the top. In other words, this configuration is robust to synonyms and spelling variations while suppressing the tendency for users to receive high association scores for words they have never actually used.

In this embodiment, the relevance evaluation device 100 is configured as a single information processing device. However, the relevance evaluation device 100 may instead be configured as a plurality of information processing devices connected via a network. For example, it may consist of one information processing device having the storage unit 110, the first score calculation unit 120, the first table generation unit 130, the second table generation unit 140, the second score calculation unit 150, and the ranking generation unit 160, and another information processing device having the keyword reception unit 170, the search unit 180, and the output unit 190.

Also, in this embodiment, the ranking tables are generated in advance and stored in the ranking information 116. However, a ranking table may instead be generated, for example, when the search unit 180 performs a search. The timing of ranking table generation is thus not limited to the example in this embodiment.

Also, in this embodiment, the format in which the ranking table is generated is decided based on the number of users whose first score is not 0. However, the ranking generation unit 160 may decide between the first format and the second format by other methods. For example, the ranking generation unit 160 may make the decision based on the number of users whose first score is equal to or greater than a predetermined reference threshold; that is, it may generate a ranking table in the first format when the number of users whose first score is equal to or greater than the reference threshold is equal to or greater than a predetermined threshold. The ranking generation unit 160 may also make the decision based on, for example, the number of users whose first score is below the reference threshold yet who appear at or above a predetermined rank when a ranking table is generated in the first format. The ranking generation unit 160 may further take the distribution of second scores and similar factors into account. The reference threshold may be any value.
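The two alternative decision criteria can be sketched in Python as predicates that return True when the first format should be used. The function and parameter names (`reference`, `top_k`, `max_weak`) are hypothetical. The source does not say in which direction the second criterion decides, so the sketch assumes one reading: too many weak users near the top of a tentative first-format ranking means falling back to the second format.

```python
from typing import Dict


def first_format_by_reference(
    first: Dict[str, float], reference: float, threshold: int
) -> bool:
    """Variant 1: use the first format when enough users have a first
    score at or above the reference threshold."""
    strong_users = sum(1 for s in first.values() if s >= reference)
    return strong_users >= threshold


def first_format_by_top_rank(
    first: Dict[str, float],
    second: Dict[str, float],
    reference: float,
    top_k: int,
    max_weak: int,
) -> bool:
    """Variant 2 (assumed reading): tentatively rank users by the second
    score as in the first format, count how many of the top_k users have a
    first score below the reference threshold, and keep the first format
    only if that count does not exceed max_weak."""
    tentative = sorted(second, key=lambda u: second[u], reverse=True)[:top_k]
    weak_users = sum(1 for u in tentative if first.get(u, 0.0) < reference)
    return weak_users <= max_weak


# Hypothetical scores for one word.
first = {"A": 0.0, "B": 0.05, "C": 0.0, "D": 0.02}
second = {"A": 0.30, "B": 0.10, "C": 0.20, "D": 0.25}
```

In the tentative first-format ranking, user A takes first place despite never having used the word, which is exactly the situation the second variant is meant to detect.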

The calculation of the degree of association between words by the second table generation unit 140 can become computationally expensive, depending on the number of words. The number of words for which the degree of association is calculated may therefore be limited, for example by setting a threshold on the usage rate.
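Because the association table grows with the square of the vocabulary, pruning rarely used words first can cut the cost substantially. The Python sketch below assumes that a word's usage rate is the fraction of users whose first score for it is nonzero; the source does not define the usage rate, and `words_to_associate` and `min_usage_rate` are hypothetical names.

```python
from typing import Dict, List


def words_to_associate(
    first_table: Dict[str, Dict[str, float]], min_usage_rate: float
) -> List[str]:
    """Keep only words whose usage rate reaches min_usage_rate.

    Usage rate is taken here to be the fraction of users whose first score
    for the word is nonzero (an assumed definition).
    """
    users = list(first_table)
    if not users:
        return []
    words = sorted({w for row in first_table.values() for w in row})
    kept = []
    for w in words:
        users_of_word = sum(1 for u in users if first_table[u].get(w, 0.0) != 0)
        if users_of_word / len(users) >= min_usage_rate:
            kept.append(w)
    return kept


# Hypothetical first table: x is used by 2 of 3 users, y by 1 of 3.
first_table = {
    "A": {"x": 0.1, "y": 0.0},
    "B": {"x": 0.2, "y": 0.0},
    "C": {"x": 0.0, "y": 0.3},
}
selected = words_to_associate(first_table, min_usage_rate=0.5)
```

Only the selected words would then be passed to the association calculation, so the table shrinks from |V| × |V| to roughly |selected| × |selected| entries.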

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG. 14. The second embodiment outlines the configuration of a relevance evaluation device 20.

FIG. 14 shows an example of the configuration of the relevance evaluation device 20. Referring to FIG. 14, the relevance evaluation device 20 includes a first score calculation unit 21, a second score calculation unit 22, and a sorting processing unit 23.

For example, the relevance evaluation device 20 has an arithmetic device such as a CPU and a storage device, and realizes each of the above processing units by having the arithmetic device execute a program stored in the storage device.

The first score calculation unit 21 calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word.

The second score calculation unit 22 calculates a second score based on the first score calculated by the first score calculation unit 21 and the degree of association between words.

The sorting processing unit 23 sorts the information identifying users based on the first score calculated by the first score calculation unit 21 and the second score calculated by the second score calculation unit 22.

As described above, the relevance evaluation device 20 includes the first score calculation unit 21, the second score calculation unit 22, and the sorting processing unit 23. With this configuration, the sorting processing unit 23 can sort the information identifying users based on the first score calculated by the first score calculation unit 21 and the second score calculated by the second score calculation unit 22. As a result, the sorting processing unit 23 can decide, for example, whether to sort on the second score alone or on both the first score and the second score. This makes the ranking robust to synonyms and spelling variations while suppressing high association scores for words that a user has never actually used.

The relevance evaluation device 20 described above can be realized by installing a predetermined program in it. Specifically, a program according to another aspect of the present invention causes an information processing device to realize: a first score calculation unit 21 that calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word; a second score calculation unit that calculates a second score based on the first score and the degree of association between words; and a sorting processing unit that sorts information identifying users based on the calculated first score and second score.

The relevance evaluation method executed by the relevance evaluation device 20 described above is a method in which an information processing device calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word; calculates a second score based on the first score and the degree of association between words; and sorts information identifying users based on the calculated first score and second score.

An invention of a program or a relevance evaluation method having the above configuration also achieves the above-described object of the present invention, because it has the same operation and effects as the relevance evaluation device 20.

<Supplementary Notes>
Part or all of the above embodiments may also be described as in the following supplementary notes. An outline of the relevance evaluation device and the like according to the present invention is given below. However, the present invention is not limited to the following configurations.

(Supplementary Note 1)
A relevance evaluation method in which an information processing device:
calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
calculates a second score based on the first score and the degree of association between words; and
sorts information identifying users based on the calculated first score and second score.
(Supplementary Note 2)
The relevance evaluation method according to Supplementary Note 1, comprising:
determining a sorting format based on how a word is used; and
sorting in the determined format.
(Supplementary Note 3)
The relevance evaluation method according to Supplementary Note 2, wherein
how a word is used is determined based on the number of users, among the users for whom the first score has been calculated, whose first score is equal to or greater than a predetermined reference threshold.
(Supplementary Note 4)
The relevance evaluation method according to any one of Supplementary Notes 1 to 3, wherein
sorting is performed in either a first format that sorts based on the second score or a second format that sorts based on the first score and the second score.
(Supplementary Note 5)
The relevance evaluation method according to Supplementary Note 4, wherein
in the first format, the information identifying users is sorted in descending order of the second score.
(Supplementary Note 6)
The relevance evaluation method according to Supplementary Note 4 or 5, wherein
in the second format, the information identifying users is sorted in descending order of the first score, and then the information identifying users whose first score is 0 is sorted in descending order of the second score.
(Supplementary Note 7)
The relevance evaluation method according to any one of Supplementary Notes 1 to 6, comprising:
searching the sorted results based on a search keyword and outputting the search results.
(Supplementary Note 8)
The relevance evaluation method according to any one of Supplementary Notes 1 to 7, comprising:
calculating the degree of association between words based on the first score.
(Supplementary Note 9)
A relevance evaluation device comprising:
a first score calculation unit that calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
a second score calculation unit that calculates a second score based on the first score and the degree of association between words; and
a sorting processing unit that sorts information identifying users based on the calculated first score and second score.
(Supplementary Note 10)
A program for causing an information processing device to realize:
a first score calculation unit that calculates, based on information indicating the reference time of words, a first score indicating the degree of association between a user and a word;
a second score calculation unit that calculates a second score based on the first score and the degree of association between words; and
a sorting processing unit that sorts information identifying users based on the calculated first score and second score.

The programs described in the above embodiments and supplementary notes are stored in a storage device or recorded on a computer-readable recording medium. The recording medium is, for example, a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the embodiments described above. Various modifications that can be understood by those skilled in the art may be made to the configuration and details of the present invention within its scope.

100 Relevance evaluation device
110 Storage unit
111 Reference time information
112 First score information
113 First table
114 Second table
115 Second score information
116 Ranking information
120 First score calculation unit
130 First table generation unit
140 Second table generation unit
150 Second score calculation unit
160 Ranking generation unit
170 Keyword reception unit
180 Search unit
190 Output unit
20 Relevance evaluation device
21 First score calculation unit
22 Second score calculation unit
23 Sorting processing unit

Claims

A relevance evaluation method in which an information processing device:
calculates, based on information indicating the reference time of words, a first score that is a value corresponding to the time for which the user referred to the word and that indicates the degree of association between the user and the word;
calculates, based on the first score and the degree of association between words, a second score obtained by incorporating the degree of association between words into the first score; and
generates a ranking that indicates the strength of association between persons and a word and that facilitates searching, by sorting information identifying users based on the calculated first score and second score and on how the word is used,
wherein, when generating the ranking, the information processing device counts, for the word for which the ranking is to be generated, the number of users whose first score is equal to or greater than a threshold value among the users for whom the first score has been calculated; if the counted number of users is equal to or greater than a threshold, generates the ranking by sorting the information identifying users in descending order of the second score; and if the counted number of users is less than the threshold, generates the ranking by sorting the information identifying users in descending order of the first score and then sorting the users whose first score is 0 in descending order of the second score.

The relevance evaluation method according to claim 1, wherein
the ranking obtained as the result of the sorting is searched based on an acquired search keyword, and the search result is output.

A relevance evaluation device comprising:
a first score calculation unit that calculates, based on information indicating the reference time of a word, a first score, which is a value according to the user's reference time of the word and indicates the degree of association between the user and the word;
a second score calculation unit that calculates, based on the first score and the degree of association between words, a second score by incorporating the degree of association between words into the first score; and
a sorting processing unit that sorts information identifying users based on the calculated first score, second score, and word usage, thereby generating a ranking that shows the strength of the relationship between people and words and makes searching easier,
wherein, when generating the ranking, the sorting processing unit counts, for the word for which the ranking is to be generated, the number of users whose first score is equal to or higher than a reference value among the users for whom the first score has been calculated; if the counted number of users is equal to or greater than a threshold, it generates the ranking by sorting the information identifying users in descending order of the second score; and if the counted number of users is less than the threshold, it generates the ranking by sorting the information identifying users in descending order of the first score and then sorting the users whose first score is 0 in descending order of the second score.
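The two score calculation units of the device can be sketched as below. The claims do not give formulas, so both are assumptions: the first score is taken as the plain sum of reference time per (user, word) pair, and the second score adds, for each other word the user has referenced, that word's first score weighted by its degree of association with the target word. The record layout and the names `first_scores` and `second_scores` are hypothetical.

```python
from collections import defaultdict


def first_scores(reference_log):
    """First score unit (sketch): total reference time per (user, word).

    reference_log: iterable of (user_id, word, seconds) records. Summing
    reference time is an assumption; the claim only says the score is a
    value according to the user's reference time of the word.
    """
    totals = defaultdict(float)
    for user_id, word, seconds in reference_log:
        totals[(user_id, word)] += seconds
    return dict(totals)


def second_scores(first, association):
    """Second score unit (sketch): first score plus association-weighted
    first scores of the user's other words.

    association: mapping of (word, other_word) -> degree of association
    (hypothetical, directional layout). The additive form is an assumption.
    """
    # Group first scores by user for quick lookup of related words.
    by_user = defaultdict(dict)
    for (user_id, word), score in first.items():
        by_user[user_id][word] = score

    # Score every known word, so a user can receive a nonzero second score
    # for a word they never referenced (their first score for it is 0).
    words = {w for (_, w) in first} | {w for pair in association for w in pair}
    result = {}
    for user_id, user_scores in by_user.items():
        for word in words:
            base = user_scores.get(word, 0.0)
            bonus = sum(association.get((word, other), 0.0) * score
                        for other, score in user_scores.items()
                        if other != word)
            result[(user_id, word)] = base + bonus
    return result
```

Scoring every known word, not only the words a user referenced, is what produces the users with a first score of 0 but a positive second score that the sorting rule places at the end of the fallback ranking.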

A program that causes an information processing device to implement:
a first score calculation unit that calculates, based on information indicating the reference time of a word, a first score, which is a value according to the user's reference time of the word and indicates the degree of association between the user and the word;
a second score calculation unit that calculates, based on the first score and the degree of association between words, a second score by incorporating the degree of association between words into the first score; and
a sorting processing unit that sorts information identifying users based on the calculated first score, second score, and word usage, thereby generating a ranking that shows the strength of the relationship between people and words and makes searching easier,
wherein, when generating the ranking, the sorting processing unit counts, for the word for which the ranking is to be generated, the number of users whose first score is equal to or higher than a reference value among the users for whom the first score has been calculated; if the counted number of users is equal to or greater than a threshold, it generates the ranking by sorting the information identifying users in descending order of the second score; and if the counted number of users is less than the threshold, it generates the ranking by sorting the information identifying users in descending order of the first score and then sorting the users whose first score is 0 in descending order of the second score.