JP7682525B2 - Content similarity calculation system and content search and recommendation system

Info

Publication number: JP7682525B2
Application number: JP2021112077A
Authority: JP
Inventors: 崇文中西
Original assignee: MUSASHINO UNIVERSITY
Current assignee: MUSASHINO UNIVERSITY
Priority date: 2021-07-06
Filing date: 2021-07-06
Publication date: 2025-05-26
Anticipated expiration: 2041-07-06
Also published as: JP2023008474A

Description

The present invention relates to a technique for calculating the similarity of content. In particular, it relates to a technique that is effective when applied to a content similarity calculation system, and to a content search and recommendation system, that calculate the similarity of media content whose development or plot changes over time.

In recent years, large amounts of digital media content (for example, novels, music, movies, and videos) have become available throughout the Internet, and users have more opportunities to enjoy it. It is therefore important to establish a method for effectively acquiring and recommending content that matches a user's interests, preferences, and intentions. In particular, rather than simple pattern matching, there is a need for a method that searches for and recommends content based on its substance, such as the meaning and impression of the media content, as a form of so-called sensibility-based (kansei) search.

In general, such media content has a development or plot that changes over time, and its meaning and impression shift as time passes. In novels, for example, some works share a similar development or plot even though their means and techniques of expression differ, and in some cases the author's background can be inferred from the development or plot. The semantic characteristics of media content are expressed not only by the features of the content itself but also by how those features change over time. When evaluating a work expressed as media content, it is therefore important to examine not only the meaning and impression of the work as a whole but also the meaning and impression evoked by each scene.

In the analysis of time-series data in general, it is important to use an appropriate method for conceptually grasping the similarity between two time series. Such methods fall broadly into two groups: those that calculate the similarity between the time series directly, and those that convert the time series into frequency data and then calculate the similarity.

Various methods of the former, direct kind are known, such as DTW (Dynamic Time Warping), ERP (Edit distance with Real Penalty), LCS (Longest Common Subsequence), EDR (Edit Distance on Real sequence), and FTSE (Fast Time Series Evaluation). Methods of the latter kind, which convert the data into frequency data before calculation, include methods using the Fourier transform (Non-Patent Document 1, Non-Patent Document 2, etc.) and methods using the Haar wavelet.
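For orientation, the direct approach can be illustrated with a minimal sketch of classic DTW. The function name `dtw_distance` and the absolute-difference cost are assumptions chosen for illustration; the source only names DTW as known prior art and does not specify an implementation.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses |a[i] - b[j]| as the local cost (an assumption for this sketch)
    and runs in O(len(a) * len(b)) time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]
```

Because the warping path may repeat samples, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero: the two sequences differ only in how long they dwell on the value 2. Such a measure compares the raw data values directly, which is the property the later paragraphs contrast with a comparison of semantic transitions.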

As a technique that goes beyond simple pattern matching and takes into account changes in the semantic content of content over time, JP 2021-9633 A (Patent Document 1), for example, describes an information retrieval method. The method analyzes the content of the search target data to calculate an emotion vector in which each emotion item is associated with its emotion amount, and calculates an emotion vector for the event requested by a user based on the user's responses to a reference sample. By comparing the emotion vector calculated for the search target data with the emotion vector calculated for the user, it extracts search target data whose emotion vector is close to that of the user. The document states that the time-series story can be taken into account by including, among the emotion items, an item whose emotion amount is increased when multiple specific conditions are determined to be satisfied in time series.

Patent Document 1: JP 2021-9633 A

Non-Patent Document 1: R. Agrawal, C. Faloutsos, A. Swami, "Efficient similarity search in sequence databases", In International conference on foundations of data organization and algorithms, pp. 69-84, 1993.
Non-Patent Document 2: D. Rafiei, A. O. Mendelzon, "Efficient retrieval of similar time sequences using DFT", In 1998 International Conference on Data Organization (FODO'98), 1998.

By applying conventional methods for calculating the similarity of time-series data, including those of Non-Patent Documents 1 and 2, to media content, it is possible to evaluate the similarity of the media content data itself. However, it is difficult with these methods to evaluate similarity in a way that takes semantic content into account, such as the transition of meaning and impression over time in the media content.

The technique described in Patent Document 1, on the other hand, can take changes in the semantic content of content over time into account to a certain extent. However, its handling of time series is limited to the order and frequency in which items appear; it cannot take into account the time intervals over which the semantic content changes.

An object of the present invention is therefore to provide a content similarity calculation system that grasps changes and transitions in the meaning and impression of media content along the time axis, in time series, and calculates a similarity based on the development and plot of the media content, and to provide a content search and recommendation system using the same.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

The following is a brief summary of a representative one of the inventions disclosed in this application.

A content similarity calculation system according to a representative embodiment of the present invention calculates the similarity between media contents whose contents change over time, and includes: a semantic waveform extraction unit that, for each of two or more input media contents, extracts changes in the degree of influence of each of a plurality of semantic items and outputs them as semantic waveform data; a time-frequency domain conversion unit that obtains and outputs a semantic frequency spectrum from the data of each semantic waveform of each media content by Fourier transform; and a similarity calculation unit that generates, for each media content, a semantic frequency spectrum vector whose elements are the data of the semantic frequency spectra, and calculates and outputs the cosine similarity between the semantic frequency spectrum vectors.

The effect obtained by a representative one of the inventions disclosed in this application is briefly described as follows.

That is, according to a representative embodiment of the present invention, it is possible to grasp changes and transitions in the meaning and impression of media content along the time axis, in time series, and to calculate a similarity based on the development and plot of the media content.

FIG. 1 is a diagram showing an overview of a configuration example of a content similarity calculation system according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating an overview of content similarity calculation processing according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating an overview of semantic waveform extraction processing according to the first embodiment of the present invention.
FIG. 4 is a diagram showing an overview of an example of extracting features in time series from media content in the first embodiment of the present invention.
FIG. 5 is a diagram illustrating an overview of time-frequency domain conversion processing according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an overview of a configuration example of a content search and recommendation system according to a second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. In all drawings used to explain the embodiments, identical parts are in principle given identical reference numerals, and repeated explanations of them are omitted. A part that has been described with a reference numeral in one drawing may be referred to by the same reference numeral in the explanation of another drawing without being shown again.

(Embodiment 1)
<Overview>
The content similarity calculation system according to the first embodiment of the present invention grasps the time-series changes and transitions (semantic transitions) in the meaning and impression of media content whose contents change over time, and extracts semantic waveforms that express these transitions as waveforms. The semantic waveforms extracted for each of a plurality of semantic items in the media content are converted to the frequency domain by Fourier transform to obtain frequency spectra, which are expressed as a vector. The system is an information processing system that then calculates the cosine similarity between the vectors obtained for the respective media contents, thereby calculating the similarity between media contents based on their semantic transitions.

<System Configuration>
FIG. 1 is a diagram showing an overview of a configuration example of the content similarity calculation system 1 according to the first embodiment of the present invention. The content similarity calculation system 1 is configured as, for example, a server system such as a server device or a virtual server built on a cloud computing service, or a computer system such as a PC (Personal Computer). A CPU (Central Processing Unit), not shown, executes an OS (Operating System), middleware such as a DBMS (DataBase Management System) and a Web server program, and software running on them, all loaded into memory from a storage device such as an HDD (Hard Disk Drive), thereby realizing the various functions for similarity calculation of media content described later.

The content similarity calculation system 1 has units such as a content input unit 11, a semantic waveform extraction unit 12, a time-frequency domain conversion unit 13, and a similarity calculation unit 14, implemented, for example, as software.

The content input unit 11 has a function of accepting input of the media content whose similarity is to be calculated. It may accept a data file of the media content directly via an input/output device (not shown), or accept it as an upload via a network 2 such as the Internet. The media contents subject to similarity calculation do not necessarily have to be of the same type (for example, two novels or two movies). In this embodiment, the data of the media contents are not compared directly; instead, semantic waveforms are extracted and converted to the frequency domain before the similarity is calculated, so similarity can be calculated even between media contents of different types.

The semantic waveform extraction unit 12 has a function of extracting, from the media content input via the content input unit 11, the changes of each of a plurality of feature items along the time axis, in time series. Based on these, for each of a plurality of semantic items representing the meaning and impression of the content, it outputs semantic waveform data showing the transition of that item's degree of influence along the time axis. The functions and processing of the semantic waveform extraction unit 12 are described later.

The time-frequency domain conversion unit 13 has a function of applying a Fourier transform to the semantic waveform data for each semantic item output from the semantic waveform extraction unit 12, converting the time-domain data into frequency-domain data, and thereby obtaining and outputting a semantic frequency spectrum. The functions and processing of the time-frequency domain conversion unit 13 are also described later.

The similarity calculation unit 14 has a function of calculating and outputting the similarity between the media contents under comparison, based on the semantic frequency spectrum data output for each of them from the time-frequency domain conversion unit 13. The way the similarity is calculated from the semantic frequency spectrum data is not particularly limited. One possible method is to generate, for each media content, a vector (semantic frequency spectrum vector) whose elements are the semantic frequency spectrum data for each semantic item, and to calculate the cosine similarity between these vectors.

<Content Similarity Calculation Processing>
FIG. 2 is a diagram illustrating an overview of the content similarity calculation processing in the first embodiment of the present invention. For media content A (41a) and media content B (41b), the contents to be compared that are input via the content input unit 11, the semantic waveform extraction unit 12 performs semantic waveform extraction processing (S1a, S1b) to obtain semantic waveform A (42a) and semantic waveform B (42b), respectively.

The time-frequency domain conversion unit 13 then performs time-frequency domain conversion processing (S2a, S2b) on semantic waveform A (42a) and semantic waveform B (42b) to obtain semantic frequency spectrum A (43a) and semantic frequency spectrum B (43b). The similarity calculation unit 14 vectorizes the data of semantic frequency spectrum A (43a) and semantic frequency spectrum B (43b) into semantic frequency spectrum vectors and performs similarity calculation processing (S3), which calculates the cosine similarity between these vectors, to obtain the similarity 45 between media content A (41a) and media content B (41b).

The details of each process are described below.

<Semantic Waveform Extraction Processing>
FIG. 3 is a diagram illustrating an overview of the semantic waveform extraction processing in the first embodiment of the present invention. In this embodiment, multiple semantic waveforms 42, one per semantic item, are extracted from media content 41 that has a development or plot, that is, whose meaning and impression change as time t passes. The example in FIG. 3 shows semantic waveforms 42 extracted for two semantic items, "joy" and "sadness". In this embodiment, the number of semantic waveforms 42 extracted thus equals the number of semantic items (viewpoints of meaning and impression) to be evaluated.

In the semantic waveform extraction processing, the media content 41 is first divided into multiple windows 51 of a predetermined time width (step 1). The window size is set to the smallest time width that can express a meaning, and the target media content 41 is divided into windows 51 of that size, starting from its beginning (or from a specified position).
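Step 1 can be sketched as follows. This is a minimal illustration, assuming the content has already been reduced to an ordered sequence of units (for example sentences, frames, or samples); the function name `split_into_windows` and its parameters are hypothetical and do not appear in the source.

```python
def split_into_windows(units, window_size, start=0):
    """Split an ordered sequence of content units into consecutive windows.

    units       -- ordered items of the content (sentences, frames, ...)
    window_size -- number of units per window (the chosen minimal time width)
    start       -- index of the unit to start from (0 = beginning of content)
    The last window may be shorter if the content does not divide evenly.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [units[i:i + window_size]
            for i in range(start, len(units), window_size)]
```

How a trailing partial window should be handled (kept, dropped, or padded) is not stated in the source; this sketch simply keeps it.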

Next, multiple feature items are extracted from the media content 41 that has been divided into windows 51 (step 2). This yields the time-series change of the feature items appearing in each window 51. The feature items here can be, for example, sentences, words, images, or music that appear directly in the media content 41 and relate to meaning or impression. The feature items of each window 51 are then converted into semantic items (for example, "joy" or "sadness") (step 3).

The magnitude or degree of influence of each semantic item is then quantified for each window 51 (step 4), and a semantic waveform 42 representing its change over time as a waveform is obtained (step 5). FIG. 3 shows an example of the semantic waveforms 42 obtained for the two semantic items "joy" and "sadness" in the media content 41.
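Steps 2 to 5 can be sketched for text content as below. The word-to-item weights in `LEXICON` are a toy dictionary invented for illustration, and summing weights per window is only one assumed way of quantifying the degree of influence; the source does not prescribe either.

```python
# Toy dictionary (an assumption): word -> {semantic item: weight}
LEXICON = {
    "happy": {"joy": 1.0},
    "smile": {"joy": 0.5},
    "tears": {"sadness": 1.0},
    "alone": {"sadness": 0.5},
}

def semantic_waveforms(windows, lexicon, items):
    """Return {semantic item: [score per window]} for windows of words.

    Each window's score for an item is the sum of the lexicon weights of
    the words in that window, so the list for an item is its waveform.
    """
    waves = {item: [] for item in items}
    for words in windows:
        totals = {item: 0.0 for item in items}
        for word in words:
            for item, weight in lexicon.get(word, {}).items():
                if item in totals:
                    totals[item] += weight
        for item in items:
            waves[item].append(totals[item])
    return waves

windows = [["happy", "smile"], ["tears"], ["alone", "happy"]]
waves = semantic_waveforms(windows, LEXICON, ["joy", "sadness"])
# waves["joy"] == [1.5, 0.0, 1.0]; waves["sadness"] == [0.0, 1.0, 0.5]
```

One waveform is produced per semantic item, matching the statement that the number of waveforms equals the number of semantic items evaluated.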

For converting feature items into semantic items in steps 2 and 3 above, existing general sentiment and emotion analysis methods can be used, such as positive/negative classification or sensibility classification of sentences and words. It is also possible, for example, to use a topic model, which is known as a method of statistical latent semantic analysis of text, to grasp how the share of each topic (semantic item) in the media content 41 changes over time.

It is also possible to use the Media-lexicon Transformation Operator, a framework for extracting impression words from media content proposed in the literature (T. Kitagawa and Y. Kiyoki, "Fundamental framework for media data retrieval system using media lexicon transformation operator", Information Modeling and Knowledge Bases, vol. 12, pp. 316-326, 2001). The Media-lexicon Transformation Operator is a mechanism that extracts the impression words a person receives from media content by drawing on research, criticism, statistics, and the like produced by experts in the field of the target media content.

FIG. 4 is a diagram showing an overview of an example of extracting features in time series from the media content 41 in the first embodiment of the present invention. The Media-lexicon Transformation Operator described above is originally intended for static, one-dimensional data; in this embodiment it is extended along the time axis. For example, a word matrix Y is obtained from the correspondence between the appearance frequencies of the words (w_1, w_2, ..., w_m) related to the feature items appearing in each window 51 of the media content 41 and the time (t_1, t_2, ...) of each window 51 (identified, for example, by the elapsed time from the start of the media content 41). Using a transformation matrix T (prepared in advance in the manner of dictionary data) that represents the relationship between the words (w_1, w_2, ..., w_m) and the semantic items (x_1, x_2, ..., x_n) indicating meanings and impressions, the word matrix Y is converted into the product of the transformation matrix T and a media content matrix X that represents the relationship between the semantic items (x_1, x_2, ..., x_n) and the time (t_1, t_2, ...) of each window 51. The media content matrix X makes it possible to grasp the time-series transition of each semantic item.
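The shapes of the three matrices can be made concrete with a small sketch. The source states that Y is expressed as the product of T and X but does not say how X is computed from Y; the code below assumes the simplest option, weighting each window's word counts by the dictionary weights (X = TᵀY), which is a simplification and not the source's stated procedure. All values are invented.

```python
def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Multiply matrix a (p x q) by matrix b (q x r), both lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# T: m words x n semantic items (columns: joy, sadness); toy dictionary data
T = [[1.0, 0.0],   # w_1 = "happy"
     [0.5, 0.0],   # w_2 = "smile"
     [0.0, 1.0]]   # w_3 = "tears"

# Y: m words x k windows; Y[i][j] = frequency of word w_i in window t_j
Y = [[2, 0, 1],
     [1, 0, 0],
     [0, 3, 1]]

# X: n semantic items x k windows; each row is one item's waveform
X = matmul(transpose(T), Y)
# X == [[2.5, 0.0, 1.0], [0.0, 3.0, 1.0]]
```

Each row of X is the waveform of one semantic item over the windows t_1, t_2, ..., which is what the subsequent frequency-domain conversion takes as input. A least-squares solution of Y = TX would be another plausible reading of the text.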

<Time-Frequency Domain Conversion Processing>
FIG. 5 is a diagram illustrating an overview of the time-frequency domain conversion processing in the first embodiment of the present invention. Here, a discrete Fourier transform is applied, using the Fast Fourier Transform (FFT) algorithm, to the data of the semantic waveform 42 of each semantic item ("joy" and "sadness" in the example of FIG. 5) to convert it from the time domain to the frequency domain. This yields a semantic frequency spectrum for each semantic item (in the example of FIG. 5, the semantic frequency spectrum 43a for "joy" and the semantic frequency spectrum 43b for "sadness").
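The conversion can be sketched with the standard library alone. For brevity the code evaluates the discrete Fourier transform directly from its definition in O(N²) time rather than with an FFT algorithm; both produce the same values. Taking the magnitude of each coefficient as the spectrum is an assumption, since the source does not say whether amplitude, power, or complex values are used.

```python
import cmath

def amplitude_spectrum(waveform):
    """Magnitudes of the discrete Fourier transform of a real sequence.

    Direct evaluation of X_k = sum_t x_t * exp(-2*pi*i*k*t/N);
    an FFT would give the same result faster.
    """
    n = len(waveform)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(waveform))
        spectrum.append(abs(coeff))
    return spectrum
```

A constant waveform such as `[1, 1, 1, 1]` puts all its energy in the zero-frequency component, while an alternating waveform such as `[1, 0, -1, 0]` has none there; in this sense the spectrum describes how quickly a semantic item rises and falls over the course of the content.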

<Similarity Calculation Processing>
In the similarity calculation processing, for each media content 41, the semantic frequency spectra obtained for the individual semantic items by the series of processes above are arranged as a single sequence of elements to form a semantic frequency spectrum vector 44. The similarity is then obtained by calculating the cosine similarity between the semantic frequency spectrum vectors 44 of the media contents 41. The cosine similarity value may be used as the similarity as is, or it may be compared with a predetermined reference value to judge whether the contents are similar or not, or to grade the degree of similarity in stages.
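The vectorization and comparison can be sketched as follows. The helper names are hypothetical. The sketch assumes both contents yield spectra of the same length for every semantic item (for example because their waveforms were resampled to the same number of windows); the source does not state how contents of different lengths are aligned.

```python
import math

def spectrum_vector(spectra_by_item, items):
    """Concatenate per-item spectra, in a fixed item order, into one vector."""
    vector = []
    for item in items:
        vector.extend(spectra_by_item[item])
    return vector

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)
```

Using the same item order for every content matters, because each position in the vector must refer to the same semantic item and frequency. A judgment of "similar or not" can then be made by comparing the returned value with a reference value chosen by the operator.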

<Conclusion>
As described above, the content similarity calculation system 1 according to the first embodiment of the present invention makes it possible to grasp the transitions of meaning and impression in the media content 41 along the time axis, in time series, and to calculate the similarity between contents based on the development and plot of the media content 41.

In addition, the similarity calculation does not compare the data of the media contents 41 directly. Instead, semantic waveforms 42 are extracted from the media contents 41, and the similarity is calculated from the semantic frequency spectra 43 obtained by converting them to the frequency domain. Similarity can therefore be calculated even between media contents 41 of different types, such as a novel and a movie.

Furthermore, the similarity calculation uses a simple method: for each media content 41, a semantic frequency spectrum vector 44 is generated by vectorizing its multiple semantic frequency spectra 43, and the similarity is obtained as the cosine similarity between these vectors. This makes it possible to calculate the similarity between media contents 41 at low cost.

(Embodiment 2)
The content search and recommendation system according to the second embodiment of the present invention is an information processing system that searches for and recommends media content using the content similarity calculation system of the first embodiment described above.

<System Configuration>
FIG. 6 is a diagram showing an overview of a configuration example of the content search and recommendation system according to the second embodiment of the present invention. The content search and recommendation system 3 is configured as, for example, a server system such as a server device or a virtual server built on a cloud computing service, or a computer system such as a PC. A CPU (not shown) executes an OS, middleware such as a DBMS and a Web server program, and software running on them, all loaded into memory from a storage device such as an HDD, thereby realizing the various functions described later for searching for and recommending media content based on conditions specified by a user.

The content search and recommendation system 3 has units such as a condition content input unit 31, a content retrieval unit 32, a content similarity calculation unit 33, and a result content output unit 34, implemented, for example, as software. It also has a content database (DB) 35 that stores multiple media contents. From the media contents stored in the content DB 35, the content search and recommendation system 3 extracts those similar to the media content specified by the user as the search condition and outputs them as search results or recommended content.

The condition content input unit 31 has a function of accepting input of the media content that serves as the condition, that is, the basis for comparison, when searching for or recommending media content. It may accept a data file of the media content directly via an input/output device (not shown), or accept it as an upload via the network 2 such as the Internet.

The content retrieval unit 32 has a function of sequentially retrieving the media contents stored in the content DB 35 as search targets. It may retrieve all media contents one after another, or it may narrow down the range of media contents to be retrieved from the content DB 35 based on the substance, metadata, or other information of the search-condition media content input via the condition content input unit 31, and retrieve sequentially from within that range.

The content similarity calculation unit 33 takes as input the search-condition media content input via the condition content input unit 31 and the media content sequentially taken out of the content DB 35 by the content extraction unit 32, and calculates the similarity between them. It can be implemented using, for example, the content similarity calculation system 1 described in embodiment 1 above (more precisely, its units implemented as software), and this embodiment adopts that configuration.
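
As a rough illustration of what the content similarity calculation unit 33 computes, the following Python sketch follows the steps stated in claim 1: a Fourier transform of each semantic waveform, a semantic frequency spectrum vector built from the resulting spectra, and a cosine similarity between the vectors. It assumes that the semantic waveforms have already been extracted, one list of samples per semantic item, with the same length and item order for both contents. The function names, the use of the magnitude of each frequency bin, and the naive discrete Fourier transform are assumptions made for illustration and are not taken from the specification.

```python
import cmath
import math


def magnitude_spectrum(waveform):
    """Naive DFT of one semantic waveform; returns the magnitude of each bin.

    Assumption: only magnitudes are kept. A practical implementation would
    use an FFT routine instead of this quadratic-time loop.
    """
    n = len(waveform)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(waveform)))
        for k in range(n)
    ]


def spectrum_vector(waveforms):
    """Concatenate the spectra of all semantic items into a single vector."""
    vector = []
    for waveform in waveforms:
        vector.extend(magnitude_spectrum(waveform))
    return vector


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity(waveforms_a, waveforms_b):
    """Similarity between two media contents given their semantic waveforms."""
    return cosine_similarity(spectrum_vector(waveforms_a),
                             spectrum_vector(waveforms_b))
```

Under the magnitude assumption, a waveform and a circularly shifted copy of it produce the same spectrum, so `content_similarity([[0, 1, 0, -1]], [[1, 0, -1, 0]])` is 1 up to floating-point rounding.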

Based on the similarity information calculated by the content similarity calculation unit 33, the result content output unit 34 outputs to the user, as search results or recommended content, a fixed number of media contents from the content DB 35 in descending order of similarity to the search-condition media content (or those media contents whose similarity is at or above a fixed value). The output method is not particularly limited: the results may be shown on a display (not shown) provided in the content search and recommendation system 3, taken out as a data file, or made available for the user to download via the network 2.
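
The flow through units 32 to 34 can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the specification: `search_similar`, its parameters, and the in-memory mapping standing in for the content DB 35 are all hypothetical, and `similarity` is a placeholder for whatever the content similarity calculation unit 33 computes.

```python
def search_similar(query, candidates, similarity, top_k=None, min_similarity=None):
    """Rank candidate contents by similarity to the query content.

    query:          the search-condition media content (any representation)
    candidates:     mapping from a content id to that content's data,
                    standing in for the contents taken out of the DB
    similarity:     function (query, content) -> float
    top_k:          keep at most this many results, if given
    min_similarity: keep only results at or above this value, if given
    """
    # Score every candidate against the query.
    scored = [(cid, similarity(query, content))
              for cid, content in candidates.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if min_similarity is not None:
        scored = [(cid, s) for cid, s in scored if s >= min_similarity]
    if top_k is not None:
        scored = scored[:top_k]
    return scored
```

The two optional parameters correspond to the two selection rules mentioned above: a fixed number of results in descending order of similarity, or all results at or above a fixed similarity value.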

<Conclusion>
As described above, the content search and recommendation system 3 according to embodiment 2 of the present invention can extract, from the media content stored in the content DB 35, media content similar to the media content specified by the user as a search condition, and output it as search results or recommended content.

The invention made by the present inventor has been described above in concrete terms on the basis of the embodiments. It goes without saying, however, that the present invention is not limited to those embodiments and can be modified in various ways without departing from its gist. The embodiments have been described in detail in order to explain the invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can also be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as integrated circuits. They may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

In each of the drawings described above, the control lines and information lines shown are those considered necessary for the explanation; they are not necessarily all of the control lines and information lines present in an implementation. In practice, almost all of the components may be regarded as interconnected.

The present invention can be used in content similarity systems and content search and recommendation systems that calculate the similarity of media content having a development or plot that changes over time.

1: content similarity calculation system, 2: network, 3: content search and recommendation system,
11: content input unit, 12: semantic waveform extraction unit, 13: time-frequency domain conversion unit, 14: similarity calculation unit,
31: condition content input unit, 32: content extraction unit, 33: content similarity calculation unit, 34: result content output unit, 35: content DB,
41: media content, 41a: media content A, 41b: media content B, 42: semantic waveform, 42a: semantic waveform A, 42b: semantic waveform B, 43: semantic frequency spectrum, 43a: semantic frequency spectrum A, 43b: semantic frequency spectrum B, 44: semantic frequency spectrum vector, 45: similarity,
51: window

Claims

1. A content similarity calculation system for calculating a similarity between media contents whose content changes over time, comprising:
a semantic waveform extraction unit that, for each of two or more input media contents, extracts changes in the degree of influence of a plurality of semantic items and outputs the changes as semantic waveform data;
a time-frequency domain transform unit that performs a Fourier transform on the semantic waveform data of each of the media contents to obtain and output a semantic frequency spectrum; and
a similarity calculation unit that generates, for each of the media contents, a semantic frequency spectrum vector having the data of each of the semantic frequency spectra as elements, and calculates and outputs a cosine similarity between the semantic frequency spectrum vectors.

2. The content similarity calculation system according to claim 1, wherein
the semantic waveform extraction unit obtains a media content matrix whose product with a transformation matrix becomes a word matrix, based on the word matrix, which shows the correspondence between the time of each window obtained by dividing the media content into a predetermined time width and the number of occurrences in that window of words related to feature items, and on the transformation matrix, which shows the correspondence between the words related to the feature items and each of the semantic items, and obtains the data of each of the semantic waveforms based on the data of the media content matrix.

3. A content search and recommendation system for searching a content storage unit, in which a plurality of first media contents whose content changes over time are stored, for media content similar to a specified second media content, comprising:
a content similarity calculation unit comprising the content similarity calculation system according to claim 1; and
a result content output unit that, based on the similarity between each of the first media contents and the second media content calculated by the content similarity calculation unit, outputs as a search result the first media content whose similarity satisfies a predetermined condition.