JP7549804B2

JP7549804B2 - Emotion estimation device

Info

Publication number: JP7549804B2
Application number: JP2021032323A
Authority: JP
Inventors: 裕紀中谷; 菜穂子萬; 雅之渡邊
Original assignee: Mazda Motor Corp
Current assignee: Mazda Motor Corp
Priority date: 2021-03-02
Filing date: 2021-03-02
Publication date: 2024-09-12
Anticipated expiration: 2041-03-02
Also published as: JP2022133570A

Description

The present invention relates to an emotion estimation device, and in particular to an improvement in the accuracy with which the device estimates a subject's emotional state in response to an auditory stimulus.

In fields such as vehicle driving assistance control, emotion estimation devices are known that estimate the emotional state of a subject (for example, a vehicle occupant) who has received various sensory stimuli (visual stimuli, auditory stimuli, and so on). As driving assistance control using such a device, a method is known in which auditory or visual stimuli are given to the occupant, based on the estimated emotional state, so that the occupant's emotional state becomes more favorable for driving. As an example of such technology, Patent Document 1 (Japanese Patent No. 6428748) discloses estimating the emotions of a vehicle occupant using a model that simulates the hierarchical sensory information processing in the human brain, and performing driving assistance control on that basis.

Patent Document 1: Japanese Patent No. 6428748

The effect of an auditory stimulus on human sensibility depends on the texture of the sound. In other words, how a person feels on receiving an auditory stimulus depends on the type of stimulus (the type of sound source, such as mechanical sound, natural sound, or electronic sound). In conventional emotion estimation devices (for example, the technology disclosed in Patent Document 1), however, the acquired sound data is processed as a whole with a single common emotion estimation formula. The effect that differences in sound source type have on human sensibility is therefore not properly reflected in the estimate, and the accuracy of emotion estimation is in some cases insufficient.

The present invention has been made in view of the above circumstances, and its object is to provide an emotion estimation device that estimates the emotion of a subject placed in a sound environment with improved accuracy.

To achieve the above object, the present invention adopts the following solution. As set out in claim 1, an emotion estimation device for estimating the emotional state of a subject placed in a sound environment comprises: sound data acquisition means for acquiring sound data in the sound environment; estimation means for estimating, based on a predetermined emotion estimation formula, the emotional state of the subject on receiving the auditory stimulus specified by the sound data; classification means for classifying the sound data, based on a texture feature of the sound data, into one of a set of categories preset for each type of sound source; and correction means for correcting the emotion estimation formula according to the category into which the classification means has classified the sound data. The texture feature is calculated based on the frequency components and the modulation frequency components of the sound data.

With this solution, the sound data is classified into a category according to the type of sound source, and the emotion estimation formula is corrected according to that category. The effect of the sound source type on the subject's emotion is therefore properly reflected in the estimate, and the accuracy of emotion estimation improves. In addition, the classification into categories and the emotion estimation based on it can accurately reflect human auditory characteristics (the hierarchical sensory information processing in the human brain).

Preferred embodiments based on the above solution are as set out in claim 2 and the subsequent claims. When the classification means classifies the sound data into the plosive sound category or the transport equipment category, the correction means corrects the emotion estimation formula in the direction in which the emotional state is estimated to be more unpleasant and more active (corresponding to claim 2). In this case, the tendency of human emotion to shift toward the unpleasant and active side on hearing a plosive sound or a transport equipment sound is properly reflected in the emotion estimation formula, which improves the accuracy of emotion estimation.

When the classification means classifies the sound data into the artificial sound category, the correction means corrects the emotion estimation formula in the direction in which the emotional state is estimated to be more unpleasant (corresponding to claim 3). In this case, the tendency of human emotion to shift toward the unpleasant side on hearing artificial sounds, such as alarms, which often signal an abnormal condition, is properly reflected in the emotion estimation formula, which improves the accuracy of emotion estimation.

According to the present invention, the components of the sound environment are classified by type of sound source based on texture features that are close to the characteristics of human hearing, and the emotion estimation result (the emotion estimation formula) is corrected by a predetermined correction method based on how the human brain perceives each class. This improves the accuracy of emotion estimation.

FIG. 1 is an emotion map showing the subject's emotional state.
FIG. 2 is a diagram that simulates the hierarchical auditory information processing in the human brain, showing the feature data at each layer and the calculation of the autocorrelation and cross-correlation of each piece of feature data.
FIG. 3 is a diagram for explaining a method of determining an evaluation model.
FIG. 4 is a block diagram showing an example of a control system for setting an evaluation model.
FIG. 5 is a diagram showing an example of the classification of sounds into categories.
FIG. 6 is a diagram showing the result of emotion estimation without correction based on category classification.
FIG. 7 is a diagram showing the result of emotion estimation with correction based on category classification.

An embodiment of the present invention is described below with reference to the attached drawings. FIG. 1 shows the emotion map that the emotion estimation device uses to represent the emotional state of the subject. As illustrated, the emotion map has two axes as parameters, pleasant/unpleasant and active/inactive, and the emotional state of the subject is represented by coordinate values on the map.

In FIG. 1, the emotion map is divided vertically and horizontally into nine regions, whose boundaries are shown as dashed lines. For the active/inactive parameter, A2 is the intermediate value, A1 is set on the inactive side, and A3 is set on the active side. That is, values at or below A1 indicate an inactive state, values at or above A3 indicate an active state, and values between A1 and A3 indicate an intermediate state.

For the pleasant/unpleasant parameter, B2 is the intermediate value, B1 is set on the unpleasant side, and B3 is set on the pleasant side. That is, values at or below B1 indicate an unpleasant state, values at or above B3 indicate a pleasant state, and values between B1 and B3 indicate an intermediate state.

When the emotion estimation device of this embodiment is used for vehicle driving assistance, the desired emotional state (target emotional state) is, for example, the part of the nine regions in which the pleasant/unpleasant level is greater than B1 and the active/inactive level is between A1 and A3 (region 2-2 and region 3-2). Driving assistance is then provided so as to bring the driver's emotional state closer to this target emotional state.
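The region lookup on the emotion map can be sketched as follows. This is a minimal illustration, not the device's implementation: the class name `EmotionMap`, the numeric thresholds in the usage example, and the reading of "region i-j" as (pleasantness band, activation band) are assumptions made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionMap:
    a1: float  # activation at or below a1: inactive
    a3: float  # activation at or above a3: active
    b1: float  # pleasantness at or below b1: unpleasant
    b3: float  # pleasantness at or above b3: pleasant

    @staticmethod
    def _band(value, low, high):
        """Return 1, 2, or 3 for the low, intermediate, or high band."""
        if value <= low:
            return 1
        if value >= high:
            return 3
        return 2

    def region(self, pleasantness, activation):
        """Return (pleasantness band, activation band); assumed region naming."""
        return (self._band(pleasantness, self.b1, self.b3),
                self._band(activation, self.a1, self.a3))

    def is_target(self, pleasantness, activation):
        """Target state: pleasantness above B1 and activation between A1 and A3."""
        p_band, a_band = self.region(pleasantness, activation)
        return p_band >= 2 and a_band == 2
```

For example, with hypothetical thresholds `EmotionMap(a1=-1.0, a3=1.0, b1=-1.0, b3=1.0)`, the point (pleasantness 2.0, activation 0.0) falls in region (3, 2) and counts as a target state.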

Next, with reference to FIG. 2, the calculation of feature data at each layer, based on processing that simulates auditory information processing in the human brain, is described together with the Moment and cross-correlation of each piece of feature data. Auditory information processing starts when sound data is input to the cochlea (basilar membrane) and ends with processing in the brainstem and thalamus, passing in sequence through first-layer, second-layer, and third-layer processing. FIG. 2 shows a simulation of this processing. In the following description, the circled numbers in FIG. 2 are written as O1, O2, and so on (for example, O1 denotes the circled number 1).

In FIG. 2, sound data is input to an input unit 1 consisting of a microphone. The input sound data is processed in a Fourier transform unit 2 and divided by frequency (first-layer processing). For simplicity, FIG. 2 shows a division into only two bands, high frequency and low frequency; in practice the data is divided into about 20 to 30 bands. This first-layer processing corresponds to processing at the cochlear basilar membrane.

In the second-layer processing, envelope extraction is applied to the feature data obtained in the first layer (the divided frequency bands), yielding the feature data O1 and O2. Subsequent nonlinear compression, in which strong sounds are weakened and weak sounds are strengthened, then yields the feature data O3 and O4. This second-layer processing corresponds to processing based on the amplitude characteristics of the cochlear basilar membrane and on the brainstem and thalamus.

In the third-layer processing, Fourier transform units 3 and 4 further divide the second-layer feature data O3 and O4, respectively, by frequency (this corresponds to the modulation frequency of the second-layer feature data). The feature data calculated in the third layer are denoted O5 to O8. The third-layer processing corresponds to processing in the brainstem and thalamus.
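The three layers described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the processing of FIG. 2 itself: the FFT-mask band split, the analytic-signal envelope, the power-law compression with exponent 0.3, and all function names are choices made here for illustration.

```python
import numpy as np


def band_split(signal, sample_rate, edges_hz):
    """First layer: split the signal into frequency bands with an FFT mask."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    bands = []
    for low, high in zip(edges_hz[:-1], edges_hz[1:]):
        mask = (freqs >= low) & (freqs < high)
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return np.array(bands)


def envelope(band):
    """Second layer, step 1: amplitude envelope via the analytic signal."""
    n = len(band)
    spectrum = np.fft.fft(band)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def compress(env, exponent=0.3):
    """Second layer, step 2: power-law compression (assumed exponent)."""
    return env ** exponent


def modulation_spectrum(compressed):
    """Third layer: magnitude spectrum of the compressed envelope (mean removed)."""
    return np.abs(np.fft.rfft(compressed - compressed.mean()))
```

Applied to a 1000 Hz tone whose amplitude is modulated at 4 Hz, the band containing the tone yields a modulation spectrum that peaks at the 4 Hz bin, which is the kind of slow envelope fluctuation the third layer is meant to capture.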

The Moment and the cross-correlation are calculated for each of the feature data O1 to O8 described above. The Moment is defined as one of the basic statistics (mean, variance, kurtosis, or skewness); where the squared value of the Moment is used at a given layer, the separate term Power is used. In FIG. 2, M2 and M3 denote Moment or Power, and C0, C1, and C2 denote cross-correlation. More specifically, M2 is the Moment of O3 and O4, the second-layer feature data, and M3 is the Power of O5 to O8, the third-layer feature data.
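The four basic statistics named above can be computed as in the following sketch. The population (divide-by-n) definitions and the non-excess kurtosis are assumptions made here, since the text does not say which variants are used, and the function name `moments` is hypothetical.

```python
def moments(values):
    """Return mean, variance, skewness, and kurtosis of a non-constant sequence.

    Population definitions are used (divide by n); kurtosis is not excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c ** 2 for c in centered) / n
    std = variance ** 0.5
    skewness = sum(c ** 3 for c in centered) / (n * std ** 3)
    kurtosis = sum(c ** 4 for c in centered) / (n * variance ** 2)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

In the scheme described here, such a function would be applied to each band's feature data (for example O3 and O4) to obtain the values grouped under M2.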

C0 is the cross-correlation between O3 and O4, the second-layer feature data. C1 denotes the cross-correlation between O5 and O7 and the cross-correlation between O6 and O8, that is, cross-correlations between different frequency bands as divided by the first-layer processing. C2 denotes the cross-correlation between O5 and O6 and the cross-correlation between O7 and O8, that is, cross-correlations within the same frequency band as divided by the first-layer processing.
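The pairing of feature data for C0, C1, and C2 can be sketched as follows. Using the Pearson correlation coefficient as the cross-correlation measure is an assumption made here, and the function names and the dictionary layout keyed by "O3" to "O8" are hypothetical.

```python
def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def layer_correlations(features):
    """Compute C0, C1, C2 from feature data keyed by the labels 'O3' to 'O8'."""
    f = features
    return {
        # Second layer, between the two first-layer bands.
        "C0": [correlation(f["O3"], f["O4"])],
        # Third layer, across different first-layer bands.
        "C1": [correlation(f["O5"], f["O7"]), correlation(f["O6"], f["O8"])],
        # Third layer, within the same first-layer band.
        "C2": [correlation(f["O5"], f["O6"]), correlation(f["O7"], f["O8"])],
    }
```

With 20 to 30 first-layer bands instead of two, the same pairing rule would produce many more values per group, which is why the later steps reduce them by principal component analysis.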

Next, an example of setting an evaluation model is described with reference to FIG. 3. Where appropriate, the following description takes "unpleasantness" as the sensibility being evaluated. That is, the description uses the example of rating, on a multi-level scale, the level of unpleasantness (or pleasantness) that an engine sound causes in the listener.

In FIG. 3, G1, G2, ... Gn denote a large number (for example, 500 to 1000) of different sound data items whose auditory sensibility is to be evaluated. Each of these sound data items is linked to an auditory sensibility evaluation value (for example, a multi-level rating of the unpleasantness level). The unpleasantness rating is the average of the ratings given by several expert evaluators. In FIG. 3, the thick vertical lines separate the individual Moments and cross-correlations shown in FIG. 2, and the thin vertical lines between two thick lines separate the individual data values (correlation coefficients) that represent the Moment or correlation level.

In FIG. 3, "2nd" denotes second-layer feature data and "3rd" denotes third-layer feature data. "Across" denotes a cross-correlation between different frequency bands among those divided in the first layer, and "Within" denotes a cross-correlation within the same frequency band among those divided in the first layer.

In FIG. 3, six types of statistic are set. From left to right, they are "2ndMoment", "2ndAcross", "3rdPower", "3rdAcross", "3rdWithin(Real)", and "3rdWithin(Image)". "3rdWithin" is subjected to a Hilbert transform; its real part is "3rdWithin(Real)" and its imaginary part is "3rdWithin(Image)".

The Moments and cross-correlations are calculated for each sound data item, and a single item has a large number of Moment and cross-correlation values. Principal component analysis (for example, to five dimensions) is applied to each Moment and each cross-correlation to extract (narrow down) the data, as shown in column A of FIG. 3. Then, as shown in column B of FIG. 3, the entries in column A that have at least a certain level of correlation with the subjective evaluation value (for example, the parameters that strongly influence the rating of unpleasantness) are selected.
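The two-stage reduction (column A, then column B) can be sketched with NumPy as follows. This is an illustration under assumptions: the SVD-based principal component analysis, the correlation threshold of 0.3, and the function names are chosen here and are not taken from the patent.

```python
import numpy as np


def pca_scores(stats, n_components=5):
    """Column A: project rows (sounds x statistics) onto leading principal components."""
    centered = stats - stats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def select_by_correlation(scores, ratings, threshold=0.3):
    """Column B: indices of components whose |correlation| with the ratings
    is at least the threshold (the threshold value is an assumption)."""
    selected = []
    for j in range(scores.shape[1]):
        r = np.corrcoef(scores[:, j], ratings)[0, 1]
        if abs(r) >= threshold:
            selected.append(j)
    return selected
```

Here `stats` would hold one group of statistics (for example all "2ndAcross" values) for every sound G1 to Gn, and `ratings` the averaged expert ratings for the same sounds.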

Based on the parameters in column A obtained by the principal component analysis, formula (1), which expresses the evaluation model, is determined as shown in Equation 1.

In formula (1), Y is the evaluation value for each sound data item; a0 to am, b1 to bm, and c (the suffix of c is omitted here) are coefficients (constants); and X1 to Xm are the Moment (or Power) and cross-correlation values in column A of FIG. 3. Initially, these coefficients are unknown. The coefficients are determined by a regression method, by fitting formula (1) to the subjective evaluation value Y and the Moment and cross-correlation values X1 to Xm of each sound data item G1 to Gn. In formula (1), which serves as the regression model, each coefficient is associated with one specific Moment or cross-correlation value among the many available.
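Formula (1) is referred to but not written out in this text. As an assumption based only on the symbol definitions given for it (a constant a0, first-order coefficients a1 to am, second-order coefficients b1 to bm, and interaction coefficients c), it may take a form such as the following, where the double suffix on c is notation introduced here for illustration:

```latex
Y = a_0 + \sum_{i=1}^{m} a_i X_i + \sum_{i=1}^{m} b_i X_i^{2}
    + \sum_{1 \le i < j \le m} c_{ij}\, X_i X_j
```

In this reading, the three sums are the first-order terms, the second-order terms, and the interaction terms, in that order.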

Formula (1) is determined so as to include the extracted Moments and cross-correlation values shown in column A. From formula (1), only the parameters that strongly influence, for example, the rating of unpleasantness (as shown in column B of FIG. 3) are retained, and the parameters that do not influence the rating are removed. The evaluation model formula for rating unpleasantness is thereby determined as formula (2). That is, formula (2) has an upper suffix n that is smaller than the upper suffix m in formula (1).

Formula (1) includes a second-order term and an interaction term (the third term on the right-hand side, not counting the initial value a0) for every first-order term. In practice, to keep the evaluation model as simple as possible, it is preferable to include second-order and interaction terms only for those specific first-order terms that influence the subjective evaluation by at least a predetermined amount; formula (2) is the result of this processing. Formula (2) is obtained by deleting from formula (1) the coefficients of the unnecessary parameters and then renumbering the suffixes of the remaining coefficients so that they are consecutive (so a coefficient with a given suffix in formula (1) does not necessarily mean the same thing as the coefficient with that suffix in formula (2)).

To rate the unpleasantness of a given sound data item, its Moment (Power) and cross-correlation values are calculated as shown in FIG. 3 and substituted into formula (2), which yields a numerical evaluation value for that item (quantifying the rating also makes it visible). The validity of the evaluation model in formula (2) can also be verified using sound data other than G1 to Gn of FIG. 3 (that is, the Moments and cross-correlation values calculated from them) together with the corresponding subjective evaluation values.
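The fitting and scoring steps can be sketched with NumPy as follows. This is a minimal sketch, assuming ordinary least squares as the regression method and assuming that interaction terms are formed only among the first-order terms chosen to receive second-order terms; the function names and the `quadratic_terms` argument are hypothetical.

```python
from itertools import combinations

import numpy as np


def design_matrix(x, quadratic_terms):
    """Columns: 1, every X_i, then X_i^2 and X_i*X_j for indices in quadratic_terms."""
    columns = [np.ones(len(x))]
    columns += [x[:, i] for i in range(x.shape[1])]
    columns += [x[:, i] ** 2 for i in quadratic_terms]
    columns += [x[:, i] * x[:, j] for i, j in combinations(quadratic_terms, 2)]
    return np.column_stack(columns)


def fit(x, y, quadratic_terms):
    """Estimate the coefficients by ordinary least squares (assumed method)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(x, quadratic_terms), y, rcond=None)
    return coeffs


def predict(x, coeffs, quadratic_terms):
    """Evaluation value for each row of x under the fitted model."""
    return design_matrix(x, quadratic_terms) @ coeffs
```

Validation on held-out sounds, as mentioned above, would amount to calling `predict` on statistics from sounds outside G1 to Gn and comparing the output with their subjective ratings.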

The above description focuses on auditory sensibility, but an evaluation model can be set for visual sensibility in the same way. When human visual information processing is simulated, the first layer performs division by frequency (that is, resolution), the second layer performs orientation selection, and the third layer processes orientation signal strength. Using the autocorrelation of each piece of feature data from the first to the third layer (expressed as a correlation value; a moment-based value can also be used) and the cross-correlation (also expressed as a correlation value), an evaluation model for visual sensibility is set by the same method as described for FIG. 3.

Next, with reference to FIG. 4, an example of the control system of the emotion estimation device in the embodiment of the present invention is described (an example of the overall control system for setting the evaluation model used for sensibility evaluation). In FIG. 4, reference numeral 11 denotes an input unit for the sound data whose auditory sensibility is to be evaluated; it consists of, for example, a microphone. The sound data input to the input unit 11 is processed by the method described with reference to FIG. 2 and FIG. 3, and the autocorrelation and cross-correlation are calculated as feature statistics of the sound data. That is, a sound feature calculation unit 12 calculates feature data for the input sound data, and a statistics calculation unit 13 calculates the autocorrelation and cross-correlation from that feature data.

Meanwhile, personal information (for example, gender, age, and occupation) is input at a personal information input unit 21. The input personal information is preprocessed in a personal information preprocessing unit 22 (for example, age is classified into several brackets) and then recorded (stored) in a personal information recording unit 23.

The biological information input at a biosensor input unit 24 and the subjective sensibility evaluation value input at a subjective evaluation unit 25 are passed to a measured sensibility value estimation unit 26, which estimates the measured sensibility value. The estimated value is recorded (stored) in a measured sensibility value recording unit 27.

Based on the evaluation target data (for example, sound data) input at an evaluation target information input unit 28, an evaluation target identification unit 29 identifies the evaluation target. The identified evaluation target is recorded (stored) in an evaluation target recording unit 30. In addition, the results calculated by the statistics calculation unit 13 are recorded (stored) in a statistics recording unit 31.

The recording units 23, 27, 30, and 31 described above constitute a database unit D, and the evaluation model is set based on the data in this database unit D. Specifically, the data in the database unit D is input to an evaluation parameter selection unit P, which selects appropriate parameters (evaluation axes) according to the type of sensibility to be evaluated (this corresponds to the parameter selection in column B of FIG. 3). The evaluation model is then set according to the selected parameters (that is, the evaluation model of formula (2) is set).

The contents of the evaluation target recording unit 30 are used as the many sound data items G1 to Gn shown in FIG. 3, and the contents of the measured sensibility value recording unit 27 are used as the subjective evaluation values for the sound data items G1 to Gn. The personal information input unit 21 serves to correct the evaluation value according to personal information (for example, gender and age bracket) and may be omitted. The statistics recording unit 31 is used to update the evaluation model by learning whenever a new sensibility evaluation is performed.

The evaluation parameter selection unit P selects evaluation parameters (evaluation axes) according to the content (type) of the sensibility to be evaluated. FIG. 4 shows, as examples, evaluation parameter A for visual stimuli and evaluation parameter B for auditory stimuli (sound data). An evaluation model construction unit C generates an evaluation model based on the evaluation parameters selected by the evaluation parameter selection unit P, and the emotional state of the subject is estimated based on this evaluation model.

The evaluation parameter selection unit P is provided with a category classification filter 32. When the sensibility data to be evaluated is sound data (an auditory stimulus), the category classification filter 32 classifies the acquired sound data, based on features that characterize the sound source (texture features), into one of a set of categories preset for each type of sound source. That is, the category classification filter 32 performs texture analysis on the sound data, selects the category to which the sound data most likely belongs, and assigns the sound data to that category. The category classification filter 32 is built by, for example, machine learning.
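The text says only that the filter is built by machine learning, for example. As one possible stand-in, chosen here purely for illustration, a nearest-centroid classifier over texture feature vectors could look like the following; the function names and the toy two-dimensional vectors in the usage note are hypothetical.

```python
import math


def train_centroids(samples):
    """samples: iterable of (feature_vector, category). Returns category -> mean vector."""
    sums, counts = {}, {}
    for vec, cat in samples:
        acc = sums.setdefault(cat, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: [v / counts[cat] for v in acc] for cat, acc in sums.items()}


def classify(vec, centroids):
    """Return the category whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda cat: math.dist(vec, centroids[cat]))
```

For example, after training on a few labeled vectors for "natural sound" and "electronic sound", `classify` assigns a new vector to whichever centroid it lies closer to. A real filter would use the full set of Moment and correlation statistics as the feature vector.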

Once the category classification filter 32 has classified the sound data, the parameter set used to form the evaluation model is corrected per category, that is, with a correction suited to the category to which the sound data belongs. In FIG. 4, for example, evaluation parameter A for visual stimuli is not corrected because it does not concern sound data, whereas evaluation parameter B for auditory stimuli is corrected according to the category to which the sound data belongs.

Specifically, the coefficients of formula (2), the emotion estimation formula, are corrected to values that reflect the characteristics of the assigned category (how the human brain perceives sounds of that category). That is, they are replaced with coefficients preset for each category.
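The per-category coefficient replacement can be sketched as follows. This is a simplified illustration: a linear model stands in for formula (2), all coefficient values are hypothetical, and the category names and axis labels are chosen here. The signs of the intercept shifts follow the directions stated for claims 2 and 3 (plosive and transport equipment sounds: more unpleasant and more active; artificial sounds: more unpleasant).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Simplified stand-in for the emotion estimation formula (first-order terms only)."""
    intercept: float
    weights: tuple

    def evaluate(self, x):
        return self.intercept + sum(w * v for w, v in zip(self.weights, x))


# Hypothetical default coefficient sets, one per axis of the emotion map.
DEFAULT = {
    "pleasantness": LinearModel(0.0, (0.5, -0.2)),
    "activation": LinearModel(0.0, (0.1, 0.4)),
}

# Hypothetical preset coefficient sets per category.
PER_CATEGORY = {
    "plosive": {
        "pleasantness": LinearModel(-0.5, (0.5, -0.2)),  # more unpleasant
        "activation": LinearModel(0.5, (0.1, 0.4)),      # more active
    },
    "artificial": {
        "pleasantness": LinearModel(-0.5, (0.5, -0.2)),  # more unpleasant
        "activation": DEFAULT["activation"],             # unchanged
    },
}
PER_CATEGORY["transport equipment"] = PER_CATEGORY["plosive"]


def estimate_emotion(x, category):
    """Evaluate both axes with the coefficient set preset for the category."""
    models = PER_CATEGORY.get(category, DEFAULT)
    return {axis: model.evaluate(x) for axis, model in models.items()}
```

Categories without a preset entry fall back to the default coefficients in this sketch, which is a design choice made here rather than something the text specifies.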

In this way, the emotion estimation device of this embodiment classifies sound data into categories and applies a per-category correction to the emotion estimation formula, which improves the accuracy of emotion estimation. In more detail, the way an auditory stimulus acts on human emotion (sensibility) differs greatly depending on the category of the sound data (for example, the type of sound source, such as natural sound or electronic sound); that is, the contribution of each sound feature to the emotion differs. Analyzing the sound data only as a whole does not sufficiently reflect this difference in the emotion estimate. In this embodiment, therefore, the parameter set for auditory stimuli is corrected according to the category to which the sound data belongs, so that differences between categories are properly reflected in emotion estimation (sensibility evaluation), enabling more accurate and finer-grained emotion estimation.

The category classification filter 32 classifies the sound data on the basis of its texture features. In this embodiment, the autocorrelation and cross-correlation calculated by the feature amount calculation unit 12 and the statistics calculation unit 13 are used as the texture features.

Specifically, the autocorrelation and cross-correlation are calculated by the procedure described with reference to FIGS. 2 and 3. The acquired sound data is processed in the order frequency decomposition → envelope extraction → modulation frequency decomposition. For the amplitudes of the frequency components, the mean, variance, skewness, and kurtosis are computed and the autocorrelation and cross-correlation are calculated, capturing modulation along the time axis. For the modulation frequency components (the frequency components of the sound envelope), the autocorrelation and cross-correlation are likewise calculated, capturing time/frequency modulation.
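The processing order above can be sketched roughly as follows, using NumPy only. This is an illustration under stated assumptions, not the filter bank of FIGS. 2 and 3: the brick-wall FFT band-pass, the analytic-signal envelope, the single autocorrelation lag, and all function names (`bandpass_fft`, `envelope`, `moments`, `autocorr`, `texture_features`) are stand-ins chosen for brevity.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall band-pass: zero every FFT bin outside lo <= f < hi."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x):
    """Amplitude envelope as the magnitude of the analytic signal."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def moments(e):
    """Mean, variance, skewness, and (non-excess) kurtosis of an envelope."""
    z = (e - e.mean()) / e.std()
    return e.mean(), e.var(), (z ** 3).mean(), (z ** 4).mean()

def autocorr(e, lag):
    """Normalized autocorrelation of a mean-removed envelope at one lag > 0."""
    z = e - e.mean()
    return float(np.dot(z[:-lag], z[lag:]) / np.dot(z, z))

def texture_features(x, fs, audio_bands, mod_bands, lag):
    # Step 1 and 2: frequency decomposition, then envelope extraction per band.
    envs = np.array([envelope(bandpass_fft(x, fs, lo, hi))
                     for lo, hi in audio_bands])
    # Step 3: modulation frequency decomposition of each envelope.
    mods = np.array([[bandpass_fft(e, fs, lo, hi) for lo, hi in mod_bands]
                     for e in envs])
    return {
        "moments": np.array([moments(e) for e in envs]),
        "env_autocorr": np.array([autocorr(e, lag) for e in envs]),
        "env_crosscorr": np.corrcoef(envs),  # between audio bands
        "mod_power": (mods ** 2).mean(axis=-1),
        # Correlation between audio bands within each modulation band.
        "mod_crosscorr": np.array([np.corrcoef(mods[:, k, :])
                                   for k in range(len(mod_bands))]),
    }
```

A real implementation would more likely use a perceptually spaced filter bank and several lags; the sketch only shows where each statistic named in the description enters the pipeline.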

Because the texture features are calculated from the frequency components and modulation frequency components of the sound, the classification into categories, and the emotion estimation based on it, can accurately reflect human auditory characteristics (the hierarchical sensory information processing in the human brain).

FIG. 5 shows an example of classifying sound data into categories. In this example, the sound sources are classified into ten categories: "animals," "plosive sounds," "everyday sounds," "electronic sounds," "music," "natural sounds," "conversation," "background sounds," "(electrical) sound effects," and "transportation equipment."

As shown in FIG. 5, the "animals" category includes the cries of animals such as dogs, cats, cows, horses, and birds. The "plosive sounds" category includes gunshots, explosions, and collision sounds. The "everyday sounds" category includes bells, alarms, doors opening and closing, and telephone sounds. The "electronic sounds" category includes video game, casino, and smartphone sounds. The "music" category includes music of various genres such as country, rock, and orchestral. The "natural sounds" category includes the sounds of the sea, valleys, fire, thunder, rain, and wind. The "conversation" category (the "human" category) includes human voices such as talking, crying, laughing, and screaming. The "background sounds" category includes the sounds of streets, offices, and the like. The "sound effects" category includes sounds used in scenes of video works such as horror and suspense. The "transportation equipment" category includes the sounds of cars, trains, airplanes, motorcycles, and the like.

The correction of the emotion estimation formula for each category follows the characteristics of the emotion values of sounds in that category, that is, how the actual emotion scores tend to deviate from estimates made with the uncorrected formula. For example, sounds in the "plosive sounds" and "transportation equipment" categories are characteristically unpleasant and active, so when sound data is classified into either category, the formula is corrected toward "more unpleasant" on the pleasant/unpleasant axis and toward "more active" on the active/inactive axis. Likewise, the correction is toward "more active" for "electronic sounds" and "background sounds," toward "more pleasant" for "music," and toward "more unpleasant" for "conversation."
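One way to read "how the actual emotion scores tend to deviate from the uncorrected estimates" is as a per-category mean residual on the (valence, arousal) plane. The sketch below computes that residual; the document does not state how its corrections were derived, so this is an assumed interpretation, and `category_bias` is a hypothetical helper name.

```python
from collections import defaultdict

def category_bias(samples):
    """Mean residual (measured minus estimated) per category.

    samples: iterable of (category, measured, estimated), where measured and
    estimated are (valence, arousal) pairs.
    Returns {category: (mean valence residual, mean arousal residual)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for category, measured, estimated in samples:
        acc = sums[category]
        acc[0] += measured[0] - estimated[0]
        acc[1] += measured[1] - estimated[1]
        acc[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}
```

Under this reading, a negative valence residual for a category means the formula should be corrected toward "more unpleasant" for that category, and a positive arousal residual means it should be corrected toward "more active."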

As an alternative classification, sound data may be divided into an artificial-sound category (for example, electronic sounds and warning sounds) and a non-artificial-sound category (natural sounds, human voices, and the like). Artificial sounds, such as alarms, often signal an abnormal condition and therefore tend to push human emotion toward the unpleasant side. Accordingly, for sounds classified as artificial, the emotion estimation formula is corrected toward "more unpleasant."

The description above covers the case in which the sound data is classified into a single category (a single sound source), but the present invention is also applicable when the sound data contains multiple sound sources. In that case, for example, the coefficients of formula (2) above may be set to reflect the influence of the category of each sound source, taking into account the proportion of each source in the sound data.
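A simple way to take the source proportions into account would be a proportion-weighted average of the per-category coefficient sets. The document does not specify the combination rule, so the linear weighting here is an assumption, and `blend_coefficients` and the example values are hypothetical.

```python
import numpy as np

def blend_coefficients(category_coeffs, proportions):
    """Proportion-weighted average of per-category coefficient arrays.

    category_coeffs: {category: array-like of coefficients}
    proportions: {category: non-negative share of that source in the sound}
    Shares are normalized so that they sum to 1.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    return sum((w / total) * np.asarray(category_coeffs[c], dtype=float)
               for c, w in proportions.items())
```

For example, a signal that is three parts music to one part vehicle noise would use coefficients lying three quarters of the way from the "transportation equipment" set toward the "music" set.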

FIGS. 6 and 7 show, as emotion maps for each sound category, the improvement in estimation accuracy obtained with the emotion estimation device of the present invention (the corrected emotion estimation formula). As FIG. 6 shows, with the uncorrected formula there are categories in which the estimated emotion values (filled diamonds) do not overlap the emotion scores (open circles; emotional states detected directly by a biosensor or the like), that is, the prediction is far off.

By contrast, as FIG. 7 shows, with the corrected formula the agreement (degree of overlap) between the estimated emotion values and the emotion scores improves, and the prediction accuracy improves substantially. In the "plosive sounds" category, for example, estimation without category correction is biased toward pleasant and inactive relative to the actual state on the pleasant/unpleasant (vertical) and active/inactive (horizontal) axes, whereas estimation according to the present invention estimates the degree of unpleasantness and activity accurately, bringing the estimated values closer to the emotion scores.

Next, driving assistance control of a vehicle using the emotion estimation device of the present invention is described. If, as in the prior art, the occupant's emotion is estimated with a common formula that ignores the sound category and the occupant's emotional state is then controlled on that basis (for example, by emitting a control sound intended to turn an unpleasant state into a pleasant one), misestimation is likely, because the contribution of sound features to emotion differs with the sound category (type of sound source).

For example, if music is wrongly judged to be unpleasant, a control sound that cancels it may be output even though the occupant is in a pleasant state, and the changed sound environment may make the occupant uncomfortable. Conversely, if driving noise is judged to be pleasant, no appropriate control may be applied even though the occupant is in an unpleasant state.

With driving assistance control using the emotion estimation device of the present invention, by contrast, sounds are classified into categories and emotion is estimated with a formula specific to each category, so the occupant's emotion can be estimated accurately and appropriate control can be performed (for example, outputting a control sound from a speaker or controlling vehicle behavior (co-pilot)).

The following are examples of driving assistance control corresponding to each of the ten categories above. Because the present invention allows accurate emotion estimation that reflects the characteristics of each category, these controls can be carried out more effectively; this is where the correction of the emotion estimation formula is particularly useful.

When the sound category is "animals": while the vehicle is traveling where animals are likely to appear (detected, for example, from map information), if a sound classified as "animals" is detected only by the interior microphone among the microphones inside and outside the vehicle (for example, an animal's cry heard from the radio), a control sound emphasized toward inactivity may be output to calm the occupants, together with an announcement that an animal is unlikely to run out.

If an animal is riding in the vehicle and its cries inside the cabin (for example, in the rear seat) are active and unpleasant, a sound that likewise emphasizes activity and unpleasantness may be output to the occupants to alert them to a problem with the animal (for example, that its condition is deteriorating).

If an animal may be lurking near the vehicle and the sound picked up by the exterior microphone falls in the "animals" category (that is, an animal outside the vehicle has been detected), vehicle behavior control may reduce the vehicle speed as a precaution against the animal running out.
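The three "animals" cases can be summarized as a small decision rule. The sketch below is only an illustration: the function name, the boolean inputs, the action labels, and the order in which the cases are checked are all assumptions, since the description lists the cases without specifying how they are prioritized.

```python
def animal_category_control(inside_is_animal, outside_is_animal,
                            animal_prone_area, animal_on_board=False,
                            cry_active_unpleasant=False):
    """Return a list of hypothetical action labels for the "animals" category.

    inside_is_animal / outside_is_animal: the interior / exterior microphone
    sound was classified as "animals".
    animal_prone_area: animals may be near the vehicle (e.g. from map data).
    """
    if outside_is_animal and animal_prone_area:
        # An animal outside the vehicle: slow down as a precaution.
        return ["reduce_speed"]
    if inside_is_animal and animal_on_board and cry_active_unpleasant:
        # The on-board animal sounds distressed: alert the occupants.
        return ["alert_sound_active_unpleasant"]
    if inside_is_animal and not outside_is_animal and animal_prone_area:
        # Cry heard only inside (e.g. from the radio): calm and reassure.
        return ["calming_sound_inactive", "announce_low_risk"]
    return []
```

Checking the exterior-microphone case first is a design choice made here so that a possible animal on the road always takes precedence over in-cabin sounds.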

When the sound category is "plosive sounds": this may occur, for example, when the vehicle has broken down or an accident has happened nearby. The occupants can be assumed to be confused, so a control sound emphasized toward inactivity may be output to calm them. A control sound prompting the occupants to check their surroundings may also be output. Further, if the sound results from a breakdown of or accident involving the vehicle itself, vehicle behavior control may bring the vehicle to an emergency stop on the road shoulder.

When the sound category is "everyday sounds" (bells, alarms, doors opening and closing, telephone ringtones, and the like): while a warning sound is sounding, a control sound that emphasizes an active, unpleasant emotional state may be output so that the occupant heeds the warning (checks the situation or stops the vehicle). If a siren is sounding near the vehicle, vehicle behavior control may pull the vehicle over to the road shoulder.

When the sound category is "electronic sounds": if smartphone sounds are detected at the driver's seat (suggesting that the driver is using a smartphone while driving), a control sound emphasized toward the unpleasant side may be output to make the driver stop (or to warn the driver), thereby preventing dangerous driving. If electronic sounds are detected in the rear seat (for example, a passenger playing a game), a masking sound may be output to the driver so that the game sounds do not reach the driver, helping the driver concentrate on driving.

When the sound category is "music" (for example, music from the in-vehicle audio system is detected): the occupant's preferred genres (country, rock, orchestral, and so on) may be registered in advance, and sounds with features matched to the preferred genre may be output together with the music, arranging it toward that genre.

When the sound category is "natural sounds": if the weather changes, for instance with thunder, a control sound emphasizing inactivity may be output to calm occupants who have been startled into an active state. If the exterior microphone detects that the vehicle is approaching the sea or mountains, a control sound that emphasizes the features of the natural sounds (for example, waves, rustling trees, or birdsong) may be output so that the occupants hear those sounds more prominently and feel closer to nature.

When the sound category is "conversation": if conversation between occupants or speech from the radio is detected, sounds other than human voices may be emphasized toward inactivity (and also reduced in level) so that the voices are easier to hear. If one occupant is speaking in a distressed voice because of illness, a control sound emphasizing activity and unpleasantness may be output to the other occupants to make them aware of that occupant's condition.

When the sound category is "background sounds": if the exterior microphone detects a crowd, a control sound emphasized toward activity may be output to direct attention to the movement of people. Vehicle behavior control may also reduce the vehicle speed when a crowd is detected.

When the sound category is "sound effects": if an occupant other than the driver is watching video (for example, horror or suspense), a control sound emphasized toward pleasantness may be output to the driver to make the driver less susceptible to the video. If sound from the audio system or radio is detected, other sounds may be emphasized toward inactivity (and also reduced in level) so that the audio or radio is easier to hear.

When the sound category is "transportation equipment": if the cabin sound while the vehicle is traveling is unpleasant, a control sound emphasizing pleasantness may be output to bring the occupants' emotional state toward pleasant.

If the exterior microphone detects another vehicle approaching, a control sound emphasized toward activity may be output so that the other vehicle sounds closer than it actually is, alerting the driver to its approach. As vehicle behavior control, the vehicle may also be moved to keep its distance from the other vehicle.

Embodiments of the present invention have been described above, but the present invention is not limited to them and may be modified as appropriate within the scope of the claims. For example, the sound categories disclosed in the embodiment are illustrative, and different categories may be defined.

The present invention can be used in vehicle driving assistance to accurately estimate the emotional state of occupants (drivers and passengers).

D Database unit
P Evaluation parameter selection unit
C Evaluation model construction unit
11 Evaluation target acquisition unit
12 Feature amount calculation unit
13 Statistics calculation unit
21 Personal information input unit
22 Personal information preprocessing unit
23 Personal information recording unit
24 Biosensor input unit
25 Subjective evaluation input unit
26 Measured sensibility value estimation unit
27 Measured sensibility value recording unit
28 Evaluation target information input unit
29 Evaluation target identification unit
30 Evaluation target recording unit
31 Statistics recording unit
32 Category classification filter

Claims

An emotion estimation device that estimates an emotional state of a subject placed in a sound environment, comprising:
a sound data acquisition means for acquiring sound data in the sound environment;
an estimation means for estimating an emotional state of the subject when the subject receives an auditory stimulus specified by the sound data based on a predetermined emotion estimation formula;
a classification means for classifying the sound data into categories preset for each type of sound source, based on a texture feature of the sound data; and
a correction means for correcting the emotion estimation formula according to the category into which the sound data has been classified by the classification means,
wherein the texture feature is calculated based on frequency components and modulation frequency components of the sound data.

The emotion estimation device according to claim 1,
wherein, when the classification means classifies the sound data into the plosive sound category or the transportation equipment category, the correction means corrects the emotion estimation formula in a direction in which the emotional state is estimated to be more unpleasant and more active.

The emotion estimation device according to claim 1,
wherein, when the classification means classifies the sound data into the artificial sound category, the correction means corrects the emotion estimation formula in a direction in which the emotional state is estimated to be more unpleasant.