JP7732938B2

JP7732938B2 - Target sound processing device, target sound processing method, and target sound processing program

Info

Publication number: JP7732938B2
Application number: JP2022056007A
Authority: JP
Inventors: 勝夫柳沼; 康一稲留
Original assignee: Okumura Corp
Current assignee: Okumura Corp
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2025-09-02
Anticipated expiration: 2042-03-30
Also published as: JP2023148133A

Description

The present invention relates to a target sound processing device, a target sound processing method, and a target sound processing program.

How sound reverberates and how well it is insulated are usually expressed as numerical values, and it is difficult for ordinary people who are unfamiliar with such values to imagine how a sound will actually be heard. For this reason, it is common practice, for example, to calculate reverberation and sound insulation performance from a building's design specifications and to generate a sample sound by applying the predicted results to a target sound, such as recorded noise. For example, Patent Document 1 discloses predicting the attenuation of environmental noise (the target sound) for each propagation path into a sound receiving room, such as the direct transmission path through a partition wall, the indirect propagation path via an opening, and the structure-borne propagation path through a side wall, and generating an evaluation sound (sample sound) by convolving the impulse response waveform obtained from the predicted attenuation with the source waveform of the environmental noise (Claim 1, etc.).
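As a rough illustration of the convolution step attributed to Patent Document 1 above, the following Python sketch convolves a source waveform with an impulse response. The sample values, the single delayed and attenuated path, and the function name `convolve` are illustrative assumptions and are not taken from Patent Document 1.

```python
def convolve(source, impulse_response):
    """Direct-form linear convolution of two finite sample sequences."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical data: a short source waveform and an impulse response that
# models one propagation path as a one-sample delay with a gain of 0.2.
source = [1.0, 0.5, -0.25]
impulse_response = [0.0, 0.2]
evaluation_sound = convolve(source, impulse_response)
```

With several propagation paths, one impulse response per path could be summed before convolving, since convolution is linear.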

Japanese Patent Application Laid-Open No. 2003-156388
Japanese Patent No. 4307622
Japanese Patent No. 4234257

However, in the technology described in Patent Document 1, the target sound to be processed is collected at the location where the sample sound is to be heard and is then processed to generate the sample sound. Because that technology holds no information about the location where the sound was collected, it cannot generate a sample sound based on location information.

In order to solve the above problem, the target sound processing device according to the present invention comprises:
a storage unit that stores, in association with each other, a target sound collected by a sound collection unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location;
an image receiving unit that receives a listening location image obtained by capturing a location that is different from the predetermined location and at which a user wishes to listen to a sample sound, the sample sound reproducing an absolute sound, which is the sound actually heard when the target sound is listened to under a predetermined environment at the predetermined location;
a search unit that searches the storage unit for the predetermined location image similar to the received listening location image;
a target sound acquisition unit that acquires the target sound associated with the predetermined location image found by the search;
a first characteristic acquisition unit that acquires a first characteristic of the sound collection unit;
a processed target sound generation unit that acquires, based on the first characteristic, a first processing correction value for processing the target sound, and processes the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generation unit that processes the generated processed target sound to generate the sample sound, which reproduces the absolute sound, that is, the sound actually heard when the target sound is listened to under a predetermined environment at the location different from the predetermined location;
a second characteristic acquisition unit that acquires a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generation unit that acquires, based on the second characteristic, a second processing correction value for processing the sample sound, and processes the sample sound using the acquired second processing correction value to generate a processed sample sound; and
an output control unit that controls the generated processed sample sound and causes the output unit to output it.

In order to solve the above problem, the target sound processing method according to the present invention includes:
a storing step of storing, in association with each other, a target sound collected by a sound collection unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location;
an image receiving step of receiving a listening location image obtained by capturing a location that is different from the predetermined location and at which a user wishes to listen to a sample sound, the sample sound reproducing an absolute sound, which is the sound actually heard when the target sound is listened to under a predetermined environment at the predetermined location;
a searching step of searching a storage unit for the predetermined location image similar to the received listening location image;
a target sound acquisition step of acquiring the target sound associated with the predetermined location image found by the search;
a first characteristic acquisition step of acquiring a first characteristic of the sound collection unit;
a processed target sound generation step of acquiring, based on the first characteristic, a first processing correction value for processing the target sound, and processing the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generation step of processing the generated processed target sound to generate the sample sound, which reproduces the absolute sound, that is, the sound actually heard when the target sound is listened to under a predetermined environment at the location different from the predetermined location;
a second characteristic acquisition step of acquiring a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generation step of acquiring, based on the second characteristic, a second processing correction value for processing the sample sound, and processing the sample sound using the acquired second processing correction value to generate a processed sample sound; and
an output control step of controlling the generated processed sample sound and causing the output unit to output it.
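The order of the processing steps above might be sketched as follows in Python. Modelling each correction and the environment as a single broadband gain in decibels is a simplifying assumption made only for illustration, and all function and parameter names are hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def apply_db(samples, db):
    """Scale every sample by the given level change."""
    gain = db_to_gain(db)
    return [s * gain for s in samples]


def generate_processed_sample_sound(target_sound, first_correction_db,
                                    environment_db, second_correction_db):
    # Processed target sound generation: compensate the sound collection unit.
    processed_target = apply_db(target_sound, first_correction_db)
    # Sample sound generation: apply the effect of the predetermined environment.
    sample_sound = apply_db(processed_target, environment_db)
    # Processed sample sound generation: compensate the output unit.
    return apply_db(sample_sound, second_correction_db)


# Hypothetical values: +6 dB microphone correction, 20 dB attenuation by the
# environment, and no speaker correction.
processed_sample_sound = generate_processed_sample_sound([1.0, -0.5], 6.0, -20.0, 0.0)
```

Keeping the three stages separate mirrors the text: the intermediate results correspond to the processed target sound and the sample sound.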

Furthermore, in order to solve the above problem, the target sound processing program according to the present invention causes a computer to execute:
a storing step of storing, in association with each other, a target sound collected by a sound collection unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location;
an image receiving step of receiving a listening location image obtained by capturing a location that is different from the predetermined location and at which a user wishes to listen to a sample sound, the sample sound reproducing an absolute sound, which is the sound actually heard when the target sound is listened to under a predetermined environment at the predetermined location;
a searching step of searching a storage unit for the predetermined location image similar to the received listening location image;
a target sound acquisition step of acquiring the target sound associated with the predetermined location image found by the search;
a first characteristic acquisition step of acquiring a first characteristic of the sound collection unit;
a processed target sound generation step of acquiring, based on the first characteristic, a first processing correction value for processing the target sound, and processing the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generation step of processing the generated processed target sound to generate the sample sound, which reproduces the absolute sound, that is, the sound actually heard when the target sound is listened to under a predetermined environment at the location different from the predetermined location;
a second characteristic acquisition step of acquiring a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generation step of acquiring, based on the second characteristic, a second processing correction value for processing the sample sound, and processing the sample sound using the acquired second processing correction value to generate a processed sample sound; and
an output control step of controlling the generated processed sample sound and causing the output unit to output it.

According to the present invention, the target sound collected for generating the sample sound is stored in association with information about the location where it was collected, so that the sample sound can be generated based on the location information.

A diagram for explaining an overview of a target sound processing system according to a preferred embodiment of the present invention.
A sequence diagram for explaining an outline of the operation of the target sound processing system according to a preferred embodiment of the present invention.
A diagram for explaining the configuration of a target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A diagram showing an example of a microphone correction value table held by the target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A diagram showing an example of a speaker correction value table held by the target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A diagram showing an example of a target sound image table held by the target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A diagram for explaining the hardware configuration of the target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A flowchart for explaining the processing procedure of the target sound processing device included in the target sound processing system according to a preferred embodiment of the present invention.
A flowchart for explaining the processing procedure of the mobile terminal included in the target sound processing system according to a preferred embodiment of the present invention.
A graph showing the frequency characteristics of collected sound data.
A graph showing the level range a microphone can pick up and its linearity.
A graph showing the frequency characteristics of sound reproduced through headphones.
A graph showing the level range headphones can reproduce and their linearity.
A diagram showing an outline of the experimental conditions.
A graph showing the sound pressure level difference between rooms.
A graph showing the sound pressure waveform on the conference room (sound source room) side.
A graph showing the octave band levels on the conference room (sound source room) side.
A graph showing the sound pressure waveform on the office (sound receiving room) side.
A graph showing the octave band levels on the office (sound receiving room) side.

Embodiments for carrying out the present invention are described below in detail, by way of example, with reference to the drawings. However, the configurations, numerical values, processing flows, functional elements, and the like described in the following embodiments are merely examples and may be freely modified or changed. They are not intended to limit the technical scope of the present invention to the following description.

A target sound processing system 100 as a preferred embodiment of the present invention will be described with reference to Figures 1A to 5B. The target sound processing system 100 is used, for example, to evaluate how a target sound collected at a predetermined location would be heard after passing through a building whose acoustic performance is determined by its construction specifications and the like. Figure 1A is a diagram for explaining an overview of the target sound processing system 100 according to this embodiment.

The target sound processing system 100 includes a target sound processing device 110 and a mobile terminal 120. A sound collection unit 130 (microphone) and an output unit 140 (speaker) are externally connected to the mobile terminal 120 by cable. The sound collection unit 130 and the output unit 140 may instead be connected to the mobile terminal 120 wirelessly, or units built into the mobile terminal 120 may be used.

For example, a worker or user who has the mobile terminal 120 collects a target sound at a predetermined location using the sound collection unit 130 (microphone) connected to the mobile terminal 120. Here, the target sound includes, but is not limited to, indoor sounds and outdoor sounds. The predetermined location is, for example, a planned construction site for a detached house or an apartment building, or an existing building, and is a place whose sound environment the parties concerned or prospective buyers want to know.

First, an operator collecting the target sound 131 at the predetermined location uses the sound collection unit 130 to collect the target sound 131 and saves it on the mobile terminal 120. The operator (or, for example, the owner of the mobile terminal 120) then transmits the target sound data saved on the mobile terminal 120 to the target sound processing device 110. The mobile terminal 120 and the target sound processing device 110 are connected wirelessly. The target sound processing device 110 may be, for example, a cloud server.

When transmitting the collected target sound 131 to the target sound processing device 110, the mobile terminal 120 also transmits a predetermined location image 160, which is an image of the predetermined location where the target sound 131 was collected, captured with an imaging device such as a camera.

The target sound processing device 110 stores the received target sound 131 and the predetermined location image 160 in association with each other. Specifically, the target sound processing device 110 associates the target sound 131 with the predetermined location image 160 based on, for example, GPS (Global Positioning System) position data of the predetermined location, and stores them. In this way, the target sound processing device 110 stocks pairs (combinations) of a target sound 131 and a predetermined location image 160, which is an image of the place where that target sound 131 was collected, in a predetermined storage or the like for later use.
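One way to picture this association is the following Python sketch, which pairs sound and image files whose GPS positions agree within a tolerance. The file names, coordinates, tolerance, and the function name `pair_by_position` are all hypothetical; the text only states that GPS position data may serve as the basis for the association.

```python
def pair_by_position(sounds, images, tolerance_deg=1e-4):
    """Pair sounds and images recorded at (nearly) the same GPS position.

    Both arguments are lists of (file_name, latitude, longitude) tuples.
    Returns a list of rows, each associating one image with one sound.
    """
    table = []
    for s_name, s_lat, s_lon in sounds:
        for i_name, i_lat, i_lon in images:
            same_place = (abs(s_lat - i_lat) <= tolerance_deg
                          and abs(s_lon - i_lon) <= tolerance_deg)
            if same_place:
                table.append({"image": i_name, "sound": s_name,
                              "position": (s_lat, s_lon)})
    return table


# Hypothetical uploads from the mobile terminal.
sounds = [("street.wav", 35.0000, 139.0000), ("park.wav", 35.0100, 139.0100)]
images = [("park.jpg", 35.01001, 139.01002), ("street.jpg", 35.00002, 139.00001)]
stock = pair_by_position(sounds, images)
```

Each row of `stock` plays the role of one stored pair of target sound and predetermined location image.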

Next, consider, for example, a case where a user wishes to listen to a sample sound that reproduces the absolute sound, that is, the sound actually heard when the target sound 131 is listened to under a predetermined environment at a location different from the predetermined location described above. This different location is the place where the user wishes to listen to the sample sound (the listening location). In this case, the user first captures an image of the listening location using, for example, the camera 150 of the user's own smartphone, and transmits the resulting listening location image 151 to the target sound processing device 110.

The target sound processing device 110 searches the stocked predetermined location images for one that is similar to the received listening location image. When the target sound processing device 110 finds a predetermined location image 160 similar to the listening location image 151, it acquires the target sound 131 associated with that predetermined location image 160. The target sound processing device 110 then generates a sample sound and the like from the acquired target sound 131, for example, according to the following procedure.
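The search described above could be sketched in Python as follows. The text does not say how image similarity is judged, so representing each image by a small feature vector, comparing vectors by cosine similarity, and using a threshold of 0.9 are all assumptions made for illustration; the names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_target_sound(table, listening_features, threshold=0.9):
    """Return the sound tied to the most similar stored image, or None."""
    best = max(table,
               key=lambda row: cosine_similarity(row["features"], listening_features),
               default=None)
    if best is None:
        return None
    if cosine_similarity(best["features"], listening_features) < threshold:
        return None  # nothing in stock is similar enough
    return best["sound"]


# Hypothetical stock: one feature vector per predetermined location image.
table = [
    {"features": (1.0, 0.0, 0.0), "sound": "street.wav"},
    {"features": (0.0, 1.0, 0.0), "sound": "park.wav"},
]
match = find_target_sound(table, (0.9, 0.1, 0.0))
no_match = find_target_sound(table, (1.0, 1.0, 0.0))
```

Returning `None` when no stored image clears the threshold reflects the wording "when the device finds a similar image"; how the device behaves otherwise is not stated in this section.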

That is, based on the target sound data of the acquired target sound 131, the target sound processing device 110 processes the target sound 131 collected at the predetermined location to generate an evaluation sound (processed sample sound 141). When transmitting the target sound data to the target sound processing device 110, the mobile terminal 120 preferably also transmits data on the characteristics of the sound collection unit 130 (microphone) and the output unit 140 (speaker) connected to it, although this data may instead be transmitted in response to a request from the target sound processing device 110.

This is because the sound collection characteristics of the sound collection unit 130 may cut some frequency components of the collected target sound 131, and the output characteristics of the output unit 140 may cause the output sample sound to have frequency components that differ from those of the sound actually heard. For this reason, the target sound processing system 100 takes the characteristics of the microphone and the speaker into account when processing the target sound 131 to generate the sample sound (processed sample sound 141).

For example, microphones and speakers built into the mobile terminal 120 are made smaller than external ones in order to keep down the price of the mobile terminal 120 or because of limited space inside its housing, and may therefore have restricted performance or lack certain functions. Even external microphones and speakers may, depending on their intended use and price, offer only a limited set of functions, have restricted performance in some respect, or conversely have certain functions enhanced. As a result, microphones and speakers with a wide variety of functions and performance exist.

If a microphone dedicated to collecting the target sound 131 and a speaker dedicated to outputting the sample sound were used, there would be no need to adjust for variations between individual microphones and speakers. However, if dedicated equipment had to be used, the dedicated microphone would have to be carried to the predetermined location every time the target sound 131 is collected, and anyone wishing to hear the sample sound would have to travel to where the dedicated speaker is installed, making it difficult to respond flexibly and quickly.

Therefore, to eliminate errors that depend on the characteristics of each device, the target sound processing system 100 generates, at each stage of processing the collected target sound 131, an absolute sound from which device-dependent errors have been removed (the processed target sound, the sample sound, and the processed sample sound 141). By processing these absolute sounds, the system reproduces a sound that is no different from the real sound so that the listener (user) can experience it.

Accordingly, in the target sound processing system 100, the characteristics of the sound collection unit 130 (microphone) and of the output unit 140 (speaker) are transmitted to the target sound processing device 110 together with the target sound data of the collected target sound 131 and the predetermined location image 160. Alternatively, the characteristics of the sound collection unit 130 and the output unit 140 may be transmitted separately from the target sound 131 and the predetermined location image 160, for example before or after they are transmitted. The target sound processing device 110 processes the acquired target sound data using a correction value that corresponds to the characteristics of the sound collection unit 130 that collected the sound, thereby generating a processed target sound. Next, from the generated processed target sound, the target sound processing device 110 generates a sample sound that reproduces the sound actually heard when the target sound 131 is listened to under a predetermined environment at the predetermined location.
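A correction based on microphone characteristics might look like the following Python sketch, which looks up per-octave-band correction values for a microphone model and adds them to measured band levels. The table contents, the model name `mic-A`, and the choice to work on band levels in decibels are hypothetical; the actual correction values used by the device are not given in this section.

```python
# Hypothetical correction table: microphone model -> {band centre (Hz): dB}.
MIC_CORRECTION_DB = {
    "mic-A": {125: 3.0, 250: 1.0, 500: 0.0, 1000: 0.0, 2000: -1.0},
}


def correct_band_levels(band_levels_db, mic_model):
    """Add the model-specific correction to each measured band level.

    Bands without an entry in the table are left unchanged.
    """
    correction = MIC_CORRECTION_DB[mic_model]
    return {band: level + correction.get(band, 0.0)
            for band, level in band_levels_db.items()}


# Hypothetical measurement made with "mic-A".
measured = {125: 50.0, 1000: 60.0, 2000: 55.0}
processed = correct_band_levels(measured, "mic-A")
```

The same lookup pattern could be reused for the output unit by swapping in a speaker correction table.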

Here, the predetermined environment includes, for example, a building such as an apartment building or a detached house planned for construction at the predetermined location where the target sound 131 was collected, and includes both the interior of such a building and outdoor spaces such as balconies. The predetermined environment may further include various factors that determine the living environment of the building, such as the position, area, and sound insulation performance (sound transmission loss) of its walls; the position, area, and sound insulation performance (sound transmission loss) of the windows installed in it; the area, number, and sound insulation performance (normalized sound transmission loss) of its air supply openings; and the surface area and sound absorption performance (sound absorption power) of its rooms.

The target sound processing device 110 then, for example, reproduces the predetermined environment (such as a building) in a virtual space and, using the data of the collected target sound 131, predicts the acoustic effect under that environment to generate a sample sound. The target sound processing device 110 processes the generated sample sound using a correction value that corresponds to the characteristics of the output unit 140, thereby generating a processed sample sound. Processing the sample sound with a correction value based on the characteristics of the output unit 140 in this way makes it possible to reliably reproduce the sound actually heard from the output unit 140 without using a dedicated speaker. The sample sound may also be generated again based on the result of listening to the processed sample sound. Regenerating the sample sound allows sample sounds under various environments to be reproduced, so that, for example, changing the window area or the grade of sound insulation performance makes it possible to simulate how the sound would be heard under different conditions.
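To illustrate how changing the window grade alters the predicted result, the Python sketch below computes the composite sound transmission loss of a facade made of several elements, using the standard area-weighted relation R = 10 log10(ΣS_i / ΣS_i·10^(-R_i/10)). The areas and transmission loss values are hypothetical, and this single-number calculation is only a stand-in for whatever prediction the device actually performs.

```python
import math


def composite_transmission_loss(elements):
    """Composite transmission loss (dB) of elements given as (area_m2, loss_db)."""
    total_area = sum(area for area, _ in elements)
    # Each element transmits a fraction 10^(-R/10) of the incident sound power.
    transmitted = sum(area * 10 ** (-loss_db / 10) for area, loss_db in elements)
    return 10 * math.log10(total_area / transmitted)


# Hypothetical facade: a 10 m2 wall (50 dB) with a 2 m2 window.
wall = (10.0, 50.0)
basic = composite_transmission_loss([wall, (2.0, 25.0)])     # 25 dB window
upgraded = composite_transmission_loss([wall, (2.0, 35.0)])  # 35 dB window
```

Under these assumed values the composite loss rises from roughly 33 dB to roughly 42 dB, which shows why the window, rather than the wall, tends to dominate the result and is a natural parameter to vary when regenerating the sample sound.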

An outline of the operation of the target sound processing system 100 will be described with reference to Figure 1B. In step S101, the sound collection unit 130 collects the target sound 131. In step S103, the collected target sound 131 is transmitted from the sound collection unit 130 to the mobile terminal 120. The target sound 131 collected by the sound collection unit 130 is, for example, analog data.

Then, in step S105, the predetermined location where the target sound 131 was collected is imaged using, for example, a camera provided on the mobile terminal 120, and a predetermined location image 160 is acquired. In step S107, the mobile terminal 120 transmits target sound data, obtained by digitally converting the acquired target sound 131, and the predetermined location image data to the target sound processing device 110. Because the predetermined location image 160 is captured with a digital camera or the like, it is already digital data. In step S109, the target sound processing device 110 stores the received target sound data and predetermined location image data.

In step S111, for example, the mobile terminal 120 transmits to the target sound processing device 110 a listening location image 151 obtained when the user images, with the camera 150, the listening location, which is a location different from the predetermined location and is where the user wishes to listen to the sample sound. In step S113, the target sound processing device 110 searches for a predetermined location image 160 similar to the received listening location image 151 and acquires the target sound 131 associated with that predetermined location image 160.

In step S115, the target sound processing device 110 processes the acquired target sound 131 according to predetermined conditions to generate a processed sample sound. In step S117, the target sound processing device 110 transmits the generated processed sample sound to the mobile terminal 120 (output unit 140). In step S119, the output unit 140 outputs the received processed sample sound. Here, the sound collection unit 130 and the output unit 140 may be built into the mobile terminal 120. The digital conversion of the target sound 131 may also be performed in the target sound processing device 110.

＜対象音加工装置１１０の構成＞ <Configuration of target sound processing device 110>
図２を参照して対象音加工装置１１０の構成について説明する。対象音加工装置１１０は、格納部２１１、画像受付部２１２、探索部２１３、対象音取得部２１４、第１特性取得部２１５および加工済対象音生成部２１６を有する。さらに、対象音加工装置１１０は、試聴音生成部２１７、第２特性取得部２１８、加工済試聴音生成部２１９および出力制御部２２０を有する。 The configuration of the target sound processing device 110 will be described with reference to Fig. 2. The target sound processing device 110 has a storage unit 211, an image receiving unit 212, a search unit 213, a target sound acquisition unit 214, a first characteristic acquisition unit 215, and a processed target sound generation unit 216. The target sound processing device 110 further has a preview sound generation unit 217, a second characteristic acquisition unit 218, a processed preview sound generation unit 219, and an output control unit 220.

格納部２１１は、所定場所において、収音部１３０により集音された対象音１３１と、所定場所を撮像した所定場所画像１６０とを関連付けて格納する。対象音１３１は、例えば、マイク（収音部１３０）により収音される。所定場所画像１６０は、カメラなどで撮像される。カメラは、スマートフォンやタブレット端末などの携帯端末１２０に内蔵されているものや、カメラとして単独で存在しているもの、いずれを用いてもよい。撮像された所定場所画像１６０は、デジタルデータとして、カメラの内部ストレージや外部ストレージなどに保存される。 The storage unit 211 associates and stores target sound 131 collected by the sound collection unit 130 at a predetermined location with a predetermined location image 160 captured of the predetermined location. The target sound 131 is collected, for example, by a microphone (sound collection unit 130). The predetermined location image 160 is captured by a camera or the like. The camera may be one built into a mobile terminal 120 such as a smartphone or tablet terminal, or a standalone camera. The captured predetermined location image 160 is saved as digital data in the camera's internal storage or external storage.

また、所定場所画像１６０は、所定撮像方法に従って撮像された画像としてもよい。すなわち、カメラにより所定場所を撮像する場合には、所定場所に存在する音源や被写体からの距離や露光時間（シャッタースピード）、画角、露出などの撮像条件を揃えて撮像する。このように、撮像条件を揃えておくと、後述する探索部２１３による探索の時間を短縮できたり、探索精度を向上させたりすることができる。 The predetermined location image 160 may also be an image captured according to a predetermined imaging method. That is, when a predetermined location is imaged with a camera, imaging conditions such as the distance from a sound source or subject present at the predetermined location, the exposure time (shutter speed), the angle of view, and the exposure are kept consistent. Keeping the imaging conditions consistent in this way can shorten the search time of the search unit 213 (described below) and improve its search accuracy.

さらに、カメラレンズの地面からの高さ、日光の入射角度、などの撮像条件を揃えて撮像してもよい。なお、様々な制約により、撮像条件を揃えられない場合も存在するので、このような場合には、所定の撮像条件に近い撮像条件で撮像すればよい。 Furthermore, images may be captured with other imaging conditions kept consistent, such as the height of the camera lens above the ground and the angle of incidence of sunlight. Because various constraints can make it impossible to match the imaging conditions exactly, in such cases images may be captured under conditions that are close to the predetermined imaging conditions.

画像受付部２１２は、所定場所において、所定環境下で対象音１３１を試聴した場合に、実際に聞こえる音である絶対音を再現する試聴音を試聴したい場所であって、当該所定場所とは異なる場所を撮像した試聴場所画像１５１を受け付ける。試聴場所画像１５１は、例えば、カメラ１５０を用いて撮像される。 The image receiving unit 212 receives a listening location image 151, which is an image of a location that is different from the predetermined location and where the user wishes to listen to a sample sound. The sample sound reproduces the absolute sound, that is, the sound that would actually be heard when the target sound 131 is listened to in a predetermined environment at the predetermined location. The listening location image 151 is captured using, for example, a camera 150.

また、試聴場所画像１５１は、上述した所定場所画像１６０と同様に、所定撮像方法に従って撮像された画像としてもよい。このように、試聴場所画像１５１と所定場所画像１６０とにおいて、撮像方法を統一しておくと、受け付けた試聴場所画像１５１に類似する所定場所画像１６０の探索時間を短縮できたり、探索精度を向上させたりすることが可能となる。 Furthermore, the listening location image 151 may be an image captured according to a predetermined imaging method, similar to the above-mentioned predetermined location image 160. In this way, by using the same imaging method for the listening location image 151 and the predetermined location image 160, it is possible to shorten the time required to search for a predetermined location image 160 that is similar to the received listening location image 151 and improve the search accuracy.

探索部２１３は、受け付けた試聴場所画像１５１に類似する所定場所画像１６０を格納部２１１から探索する。探索部２１３は、例えば、試聴場所画像１５１と所定場所画像１６０との一致度に基づいて、試聴場所画像１５１に類似する所定場所画像１６０を格納部２１１から探索する。探索部２１３は、例えば、両画像の特徴点を抽出し、抽出された特徴点のうち一致する特徴点の数に基づいて、試聴場所画像１５１に類似する所定場所画像１６０を探索してもよい。 The search unit 213 searches the storage unit 211 for a predetermined location image 160 that is similar to the received listening location image 151. The search unit 213 searches the storage unit 211 for a predetermined location image 160 that is similar to the listening location image 151, for example, based on the degree of match between the listening location image 151 and the predetermined location image 160. The search unit 213 may, for example, extract feature points from both images and search for a predetermined location image 160 that is similar to the listening location image 151 based on the number of matching feature points among the extracted feature points.
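The feature-point search described above can be sketched as follows. This is a minimal illustration and not the method of the embodiment: the descriptor format (tuples of floats), the distance threshold `max_distance`, and the function names `count_matches` and `find_most_similar` are all assumptions introduced here, and a real system would obtain descriptors from an image feature extractor.

```python
from math import dist


def count_matches(desc_a, desc_b, max_distance=0.5):
    """Count descriptors in desc_a whose nearest neighbour in desc_b
    lies within max_distance (hypothetical matching rule)."""
    if not desc_b:
        return 0
    matches = 0
    for a in desc_a:
        nearest = min(dist(a, b) for b in desc_b)
        if nearest <= max_distance:
            matches += 1
    return matches


def find_most_similar(query_desc, stored):
    """Return (image_id, match_count) for the stored image that shares
    the most matching feature points with the query image.

    stored maps an image ID to its list of descriptors.
    """
    best_id, best_count = None, -1
    for image_id, desc in stored.items():
        count = count_matches(query_desc, desc)
        if count > best_count:
            best_id, best_count = image_id, count
    return best_id, best_count


# Listening location image 151 (query) against two stored location images.
query = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
stored_images = {
    "img_a": [(0.1, 0.0), (1.0, 1.1), (5.0, 5.2)],
    "img_b": [(9.0, 9.0), (0.0, 0.2)],
}
best = find_most_similar(query, stored_images)
```

Here the stored image with the largest number of matching feature points is treated as the most similar one; a minimum match count could be added so that no image is returned when nothing is sufficiently similar.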

また、探索部２１３は、所定場所を撮像した所定場所画像１６０を人工知能（ＡＩ：ＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ）に入力して、機械学習させてもよい。人工知能による機械学習が終了すると、探索部２１３は、学習済み所定場所画像モデルを生成する。なお、探索部２１３は、生成した学習済み所定場所画像モデルを所定のストレージ等に保存しておいてもよい。この場合、新たな学習用画像を取得して、機械学習を行い、学習済み所定場所画像モデルを生成するたびに、保存された学習済み所定場所画像モデルを更新するようにしてもよい。 The search unit 213 may also input the predetermined location image 160, which is an image of the predetermined location, into artificial intelligence (AI) for machine learning. When the machine learning by the AI is complete, the search unit 213 generates a trained predetermined location image model. The search unit 213 may also store the generated trained predetermined location image model in a specified storage device, etc. In this case, the stored trained predetermined location image model may be updated each time a new learning image is acquired, machine learning is performed, and a trained predetermined location image model is generated.

人工知能による機械学習は、既知のアルゴリズムを用いて行われる。機械学習においては、損失関数は、重み指定を行い、事象数の逆数を採用する。また、探索部２１３は、人工知能による機械学習の精度を向上させて、より精度の高い類似画像探索用のモデルを生成するために、人工知能に学習させる所定場所画像の数を水増しする。探索部２１３は、例えば、左右反転を用いて水増しデータを得る。さらに、探索部２１３は、人工知能による機械学習の精度を向上させるために、転移学習を用いてもよい。ここで、転移学習とは、異なるデータセットを用いた学習済みモデルを、別の問題に転用し、部分的な学習をすることで、モデルの性能向上を狙う手法である。特に、教師データが十分でない場合に、推論性能の向上と学習時間の低減が期待できる手法でもある。 Machine learning by the artificial intelligence is performed using known algorithms. In the machine learning, the loss function is weighted, with the reciprocal of the number of events used as the weight. Furthermore, to improve the accuracy of the machine learning and generate a more accurate model for similar-image search, the search unit 213 augments the set of predetermined location images used for training. The search unit 213 obtains the augmented data by, for example, horizontal flipping. The search unit 213 may also use transfer learning to improve the accuracy of the machine learning. Here, transfer learning is a technique that aims to improve model performance by reusing a model trained on a different dataset for another problem and retraining only part of it. It can be expected to improve inference performance and reduce training time, particularly when training data is insufficient.
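The augmentation and loss weighting mentioned above can be sketched in a few lines. This is an illustration under stated assumptions: images are represented as lists of pixel rows, "number of events" is interpreted as the number of training samples per label, and the function names are hypothetical.

```python
from collections import Counter


def flip_horizontal(image):
    """Mirror an image left to right; image is a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def augment(samples):
    """Return the original (image, label) pairs plus a flipped copy of each."""
    return samples + [(flip_horizontal(img), label) for img, label in samples]


def inverse_count_weights(labels):
    """Loss weight per label = 1 / (number of samples with that label).

    Assumption: the 'number of events' is the per-label sample count.
    """
    counts = Counter(labels)
    return {label: 1.0 / n for label, n in counts.items()}


samples = [([[1, 2, 3], [4, 5, 6]], "site_a")]
augmented = augment(samples)
weights = inverse_count_weights(["site_a"] * 4 + ["site_b"])
```

With this weighting, a label that appears rarely contributes more per sample to the loss, which offsets an imbalance in how many images were stored for each location.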

対象音取得部２１４は、探索された所定場所画像１６０に関連付けられている対象音１３１を取得する。格納部２１１には、対象音１３１と所定場所画像１６０との組み合わせが格納されているため、対象音取得部２１４は、探索された所定場所画像１６０を基に、これに関連付けられている対象音１３１を取得する。 The target sound acquisition unit 214 acquires the target sound 131 associated with the predetermined location image 160 found by the search. Because the storage unit 211 stores each target sound 131 in combination with a predetermined location image 160, the target sound acquisition unit 214 can use the retrieved predetermined location image 160 to obtain the target sound 131 associated with it.

第１特性取得部２１５は、収音部１３０の第１特性を取得する。収音部１３０の第１特性は、対象音１３１を収音した際の諸々の条件であり、例えば、収音部１３０（マイク）の周波数特性などの機械特性や、収音部１３０に組み込まれているソフトウェアの特徴、所定場所の気温、湿度、風向、風量などの環境特性を含むものであるがこれらには限定されない。 The first characteristic acquisition unit 215 acquires the first characteristic of the sound collection unit 130. The first characteristic of the sound collection unit 130 is the various conditions when the target sound 131 is collected, and includes, for example, mechanical characteristics such as the frequency characteristics of the sound collection unit 130 (microphone), characteristics of the software built into the sound collection unit 130, and environmental characteristics such as the temperature, humidity, wind direction, and wind volume of a specified location, but is not limited to these.

加工済対象音生成部２１６は、第１特性に基づいて、対象音１３１を加工するための第１加工用補正値を取得し、取得した第１加工用補正値を用いて、対象音１３１を加工して加工済対象音を生成する。加工済対象音生成部２１６は、例えば、内部ストレージや外部ストレージに格納されている第１加工用補正値を取得する。第１加工用補正値は、対象音１３１に対して、取得した第１特性に応じた加工を施すための補正値である。 The processed target sound generation unit 216 acquires a first processing correction value for processing the target sound 131 based on the first characteristic, and processes the target sound 131 using the acquired first processing correction value to generate the processed target sound. The processed target sound generation unit 216 acquires, for example, the first processing correction value stored in internal storage or external storage. The first processing correction value is a correction value for processing the target sound 131 in accordance with the acquired first characteristic.

そして、加工済対象音生成部２１６は、取得した第１加工用補正値を用いて、対象音１３１（対象音データ）を加工して加工済対象音を生成する。加工済対象音生成部２１６は、例えば、特定の周波数成分をキャンセルなどして、対象音データを加工して、加工済対象音を生成する。 The processed target sound generation unit 216 then processes the target sound 131 (target sound data) using the acquired first processing correction value to generate the processed target sound. The processed target sound generation unit 216 processes the target sound data, for example, by canceling specific frequency components, to generate the processed target sound.

試聴音生成部２１７は、所定場所とは異なる場所において、所定環境下で、対象音１３１を聴取した場合に、実際に聞こえる音である絶対音を再現する試聴音を、生成された加工済対象音を加工して生成する。ここで、所定環境は、例えば、対象音１３１を収音した所定場所における建設予定の集合住宅や戸建住宅などの建物であり、これらの建物の室内やベランダ等の室外などを含む。さらに、所定環境には、当該建物の壁の位置や面積、遮音性能（音響透過損失）、建物に取り付けられる窓の位置、面積、遮音性能（音響透過損失）、給気口の面積、数、遮音性能（基準化音響透過損失）、室内の表面積、吸音性能（吸音力）、周辺建物からの音の反射など、当該建物の住環境等を実現する様々な要因が含まれてもよい。 The preview sound generation unit 217 processes the generated processed target sound to generate a preview sound that reproduces the absolute sound that will actually be heard when the target sound 131 is listened to in a predetermined environment at a location different from the predetermined location. Here, the predetermined environment is, for example, a building such as an apartment building or detached house that is scheduled to be constructed at the predetermined location where the target sound 131 is picked up, and includes the interior of these buildings and outdoor areas such as balconies. Furthermore, the predetermined environment may include various factors that realize the living environment of the building, such as the position and area of the building's walls, sound insulation performance (sound transmission loss), the position, area, and sound insulation performance (sound transmission loss) of windows installed in the building, the area and number of air intakes, sound insulation performance (normalized sound transmission loss), the surface area of the room, sound absorption performance (sound absorption capacity), and sound reflection from surrounding buildings.

また、生成される試聴音は、例えば、（１）外部騒音に起因する室内騒音、（２）空間遮音性能（隣室から伝搬する騒音）、（３）建物内外から敷地境界へ伝搬する騒音、（４）空調設備に起因する室内静ひつ性能、（５）床衝撃音遮断性能（床衝撃音）、（６）室内残響時間（音の響き）である。 The sample sounds generated include, for example, (1) indoor noise caused by external noise, (2) spatial sound insulation performance (noise propagating from adjacent rooms), (3) noise propagating from inside and outside the building to the site boundary, (4) indoor quieting performance caused by air conditioning equipment, (5) floor impact sound insulation performance (floor impact sound), and (6) indoor reverberation time (sound reverberation).

具体的には、（１）は、外部に鉄道などの騒音源がある場合に、その音が室内においてどのように聞こえるかについての音である。（２）は、ある部屋にＴＶや会議の音などの騒音源がある場合に、その音が隣室においてどのように聞こえるかについての音である。（３）は、建物敷地内（建物内や建物外）にある設備機械や作業音などの騒音源が敷地境界や近隣に対してどのように聞こえるかの音である。（４）は、エアコンや全熱交換器などの室内の空調設備が騒音源である場合に、室内の静粛性がどのように変化するか、あるいは、室内において空調設備の音がどのように聞こえるかについての音である。（５）は、上階の部屋からの飛び跳ねや走り回り音などが下階の部屋でどのように聞こえるかについての音である。（６）は、室内で会議や講演などをする場面で話し声や音響機器からの音が、当該室内においてどのように響いて聞こえるかについての音である。 Specifically, (1) refers to how a noise from an external noise source, such as a train, sounds indoors. (2) refers to how a noise from a TV or meeting sounds in a neighboring room. (3) refers to how noise sources such as equipment and machinery on the building premises (inside or outside the building) and operational noise sound to the property boundary or neighbors. (4) refers to how the quietness of a room changes or how the sound of an air conditioning unit, such as an air conditioner or total heat exchanger, sounds indoors. (5) refers to how sounds such as jumping or running from a room on an upper floor sound in a room on a lower floor. (6) refers to how voices and sounds from audio equipment resonate and sound in a room during a meeting, lecture, or other such occasion.

第２特性取得部２１８は、生成された試聴音を出力するための出力部１４０の第２特性を取得する。第２特性は、生成された試聴音を出力するための出力部１４０（スピーカ）の出力条件などであり、例えば、出力部１４０の機械特性や、出力部１４０に組み込まれているソフトウェアの特徴や出力場所の環境特性などを含むものであるが、これらには限定されない。 The second characteristic acquisition unit 218 acquires the second characteristic of the output unit 140 for outputting the generated preview sound. The second characteristic is the output condition of the output unit 140 (speaker) for outputting the generated preview sound, and includes, for example, the mechanical characteristics of the output unit 140, the characteristics of the software incorporated in the output unit 140, and the environmental characteristics of the output location, but is not limited to these.

加工済試聴音生成部２１９は、第２特性に基づいて、試聴音を加工するための第２加工用補正値を取得し、取得した第２加工用補正値を用いて、試聴音を加工して加工済試聴音を生成する。 The processed sample sound generation unit 219 acquires a second processing correction value for processing the sample sound based on the second characteristic, and processes the sample sound using the acquired second processing correction value to generate a processed sample sound.

出力部１４０から試聴音を出力した場合の、試聴音の再現性は、出力部１４０の特性に依存するため、同じ試聴音のデータを再生したとしても、特性の異なる複数の出力部１４０から出力させた場合には、聴取者にとっては、聞こえ方に差が生じることがある。このような事態が発生すると、実際に建物などを建築した後に、追加の工事が必要となる場合もある。そのため、事前に実際に聞こえる音を再現するために、出力部１４０の出力特性に応じて、生成した試聴音を加工する。試聴音の加工には、加工用の補正値（第２加工用補正値）が用いられる。 When a sample sound is output from the output unit 140, the reproducibility of the sample sound depends on the characteristics of the output unit 140. Therefore, even if the same sample sound data is played back, if it is output from multiple output units 140 with different characteristics, the listener may hear it differently. When this occurs, additional construction may be required after the building or other structure is actually constructed. Therefore, in order to reproduce the sound that will actually be heard in advance, the generated sample sound is processed according to the output characteristics of the output unit 140. A processing correction value (second processing correction value) is used to process the sample sound.

第２加工用補正値は、出力部１４０（スピーカ）のそれぞれについて、基準となる実験用の音を出力させて、それぞれの出力部１４０の周波数特性などの出力特性を確認した上で、決定される。そして、加工済試聴音生成部２１９は、上述のようにして決定された第２加工用補正値を用いて、試聴音を加工して、生成された試聴音を出力部１４０の特性に応じて加工された加工済試聴音を生成する。 The second processing correction value is determined after outputting a reference experimental sound from each output unit 140 (speaker) and checking the output characteristics, such as the frequency characteristics, of each output unit 140. The processed sample sound generation unit 219 then processes the sample sound using the second processing correction value determined as described above, and generates a processed sample sound that has been processed in accordance with the characteristics of the output unit 140.
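One simple way to turn such a reference measurement into correction values is to take, for each frequency band, the difference between the reference level and the level actually measured from the speaker. The sketch below assumes levels in dB per band; the band centre frequencies, the dB unit, and the function name `band_corrections` are assumptions for illustration and are not specified in the text.

```python
def band_corrections(reference_db, measured_db):
    """Per-band correction in dB: how much must be added to the measured
    output so that it matches the reference level."""
    return {band: reference_db[band] - measured_db[band] for band in reference_db}


# Reference test sound level and the level measured from one speaker,
# keyed by band centre frequency in Hz (hypothetical values).
reference = {125: 70.0, 1000: 70.0, 4000: 70.0}
measured = {125: 64.0, 1000: 70.5, 4000: 73.0}
corrections = band_corrections(reference, measured)
```

A positive value means the speaker under-reproduces that band and the sample sound should be boosted there; a negative value means it should be attenuated.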

出力制御部２２０は、生成された加工済試聴音１４１を制御して、出力部１４０から出力する。例えば、出力制御部２２０は、生成された加工済試聴音１４１を、携帯端末１２０へ送信して、出力部１４０から加工済試聴音１４１を出力する。また、出力制御部２２０は、生成された加工済試聴音１４１を出力部１４０へ直接送信することにより、加工済試聴音１４１を出力部１４０から出力するようにしてもよい。 The output control unit 220 controls the generated processed preview sound 141 and outputs it from the output unit 140. For example, the output control unit 220 transmits the generated processed preview sound 141 to the mobile terminal 120, and outputs the processed preview sound 141 from the output unit 140. The output control unit 220 may also transmit the generated processed preview sound 141 directly to the output unit 140, thereby outputting the processed preview sound 141 from the output unit 140.

＜携帯端末１２０の構成＞ <Configuration of mobile terminal 120>
次に図２を参照して、携帯端末１２０の構成について説明する。携帯端末１２０は、取得部２２１、送信部２２２、受信部２２３および出力制御部２２４を有する。なお、収音部１３０および出力部１４０は、携帯端末１２０に内蔵されていてもよい。 Next, the configuration of the mobile terminal 120 will be described with reference to Fig. 2. The mobile terminal 120 has an acquisition unit 221, a transmission unit 222, a reception unit 223, and an output control unit 224. Note that the sound collection unit 130 and the output unit 140 may be built into the mobile terminal 120.

取得部２２１は、所定場所において、収音部１３０（マイク）により収音された対象音１３１を取得する。ここで、対象音１３１は、室内の音および屋外の音の少なくとも１つが含まれる。取得された対象音１３１のデータは、アナログデータとなっているが、例えば、収音部１３０が、ＡＤコンバータを有していれば、収音部１３０において、収音した対象音１３１のアナログデータをデジタルデータに変換してもよい。また、収音部１３０が、ＡＤコンバータを有していない場合、携帯端末１２０が有するＡＤコンバータを用いて、デジタルデータに変換してもよい。さらに、収音部１３０および携帯端末１２０ともにＡＤコンバータを有していない場合、対象音加工装置１１０が有するＡＤコンバータにおいて、収音した対象音１３１のアナログデータをデジタルデータに変換してもよい。 The acquisition unit 221 acquires the target sound 131 collected by the sound collection unit 130 (microphone) at a predetermined location. Here, the target sound 131 includes at least one of indoor sound and outdoor sound. The data of the acquired target sound 131 is analog data, but if the sound collection unit 130 has an AD converter, for example, the analog data of the collected target sound 131 may be converted to digital data in the sound collection unit 130. Furthermore, if the sound collection unit 130 does not have an AD converter, the analog data may be converted to digital data using an AD converter provided in the mobile terminal 120. Furthermore, if neither the sound collection unit 130 nor the mobile terminal 120 has an AD converter, the analog data of the collected target sound 131 may be converted to digital data in an AD converter provided in the target sound processing device 110.

送信部２２２は、取得した対象音１３１のデータを対象音加工装置１１０へ送信する。対象音加工装置１１０へ送信される対象音１３１のデータは、アナログデータであっても、デジタルデータであってもよい。 The transmission unit 222 transmits the acquired data of the target sound 131 to the target sound processing device 110. The data of the target sound 131 transmitted to the target sound processing device 110 may be analog data or digital data.

受信部２２３は、対象音加工装置１１０から送信された加工済試聴音１４１を受信する。受信される加工済試聴音１４１のデータは、デジタルデータとなっている。 The receiving unit 223 receives the processed sample sound 141 transmitted from the target sound processing device 110. The data of the received processed sample sound 141 is digital data.

出力制御部２２４は、受信した加工済試聴音１４１を制御して出力部１４０から出力する。出力制御部２２４は、受信した加工済試聴音１４１のデジタルデータを、アナログデータに変換して、出力部１４０へ送り、出力部１４０において、加工済試聴音１４１のアナログデータを再生するように制御する。なお、上述したように、対象音加工装置１１０の出力制御部２２０から出力部１４０に対して、加工済試聴音を直接送信して、出力するようにしてもよい。 The output control unit 224 controls the received processed sample sound 141 so that it is output from the output unit 140. The output control unit 224 converts the received digital data of the processed sample sound 141 into analog data, sends it to the output unit 140, and controls the output unit 140 to play back the analog data of the processed sample sound 141. As described above, the processed sample sound 141 may also be sent directly from the output control unit 220 of the target sound processing device 110 to the output unit 140 for output.

次に、図３Ａを参照して、対象音加工装置１１０が有するマイク補正値テーブル３０１の一例について説明する。マイク補正値テーブル３０１は、マイクＩＤ（Ｉｄｅｎｔｉｆｉｅｒ）３１１に関連付けて特性３１２および補正値３１３を記憶する。マイクＩＤ３１１は、対象音１３１を収音可能なマイクを識別するための識別子である。特性３１２は、マイクの特性を示し、周波数特性および収音可能レベルを含む。補正値３１３は、周波数補正値およびレベル補正値を含む。そして、対象音加工装置１１０の加工済対象音生成部２１６は、マイク補正値テーブル３０１を参照して、マイク（収音部１３０）の特性に応じた補正値を抽出し、受信した対象音１３１を加工する。 Next, referring to Figure 3A, an example of the microphone correction value table 301 possessed by the target sound processing device 110 will be described. The microphone correction value table 301 stores characteristics 312 and correction values 313 in association with a microphone ID (Identifier) 311. The microphone ID 311 is an identifier for identifying a microphone capable of collecting the target sound 131. The characteristics 312 indicate the characteristics of the microphone, and include frequency characteristics and the sound collection level. The correction values 313 include a frequency correction value and a level correction value. The processed target sound generation unit 216 of the target sound processing device 110 then references the microphone correction value table 301, extracts a correction value according to the characteristics of the microphone (sound collection unit 130), and processes the received target sound 131.

図３Ｂを参照して、対象音加工装置１１０が有するスピーカ補正値テーブル３０２の一例について説明する。スピーカ補正値テーブル３０２は、スピーカＩＤ３２１に関連付けて特性３２２および補正値３２３を記憶する。スピーカＩＤ３２１は、対象音加工装置１１０から送信された加工済試聴音１４１を出力可能なスピーカを識別するための識別子である。特性３２２は、スピーカの特性を示し、周波数特性および出力可能レベルを含む。補正値３２３は、周波数補正値およびレベル補正値を含む。そして、対象音加工装置１１０の加工済試聴音生成部２１９は、スピーカ補正値テーブル３０２を参照して、スピーカ（出力部１４０）の特性に応じた補正値を抽出し、生成した試聴音を加工する。 With reference to Figure 3B, an example of the speaker correction value table 302 possessed by the target sound processing device 110 will be described. The speaker correction value table 302 stores characteristics 322 and correction values 323 in association with speaker IDs 321. The speaker IDs 321 are identifiers for identifying speakers capable of outputting the processed sample sound 141 transmitted from the target sound processing device 110. The characteristics 322 indicate the characteristics of the speaker, and include frequency characteristics and possible output levels. The correction values 323 include frequency correction values and level correction values. The processed sample sound generation unit 219 of the target sound processing device 110 then references the speaker correction value table 302, extracts correction values according to the characteristics of the speaker (output unit 140), and processes the generated sample sound.
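The table lookups described for Figures 3A and 3B can be sketched as a dictionary keyed by device ID. This is a minimal illustration that handles only the level correction value; the IDs, the dB unit, and the names `Correction`, `SPEAKER_TABLE`, `apply_level_correction` and `process_for_device` are assumptions, and the frequency correction value is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    level_db: float  # level correction value, assumed to be in dB


# Hypothetical contents of a speaker correction value table.
SPEAKER_TABLE = {
    "SPK-001": Correction(level_db=-6.0),
    "SPK-002": Correction(level_db=0.0),
}


def apply_level_correction(samples, level_db):
    """Scale audio samples by a gain given in dB (amplitude ratio)."""
    gain = 10 ** (level_db / 20)
    return [s * gain for s in samples]


def process_for_device(samples, device_id, table):
    """Look up the correction for device_id and apply it to the samples."""
    correction = table[device_id]
    return apply_level_correction(samples, correction.level_db)


sample_sound = [1.0, -0.5]
for_spk1 = process_for_device(sample_sound, "SPK-001", SPEAKER_TABLE)
for_spk2 = process_for_device(sample_sound, "SPK-002", SPEAKER_TABLE)
```

The same lookup shape would apply to a microphone table keyed by microphone ID, with the correction applied to the collected target sound instead of the generated sample sound.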

図３Ｃを参照して、対象音加工装置１１０が有する対象音画像テーブル３０３の一例について説明する。対象音画像テーブル３０３は、対象音ＩＤ３３１に関連付けて画像３３２、場所３３３、収音／撮像条件３３４を記憶する。対象音ＩＤ３３１は、収音部１３０により収音された対象音１３１のそれぞれを識別するための識別子である。画像３３２は、収音された対象音１３１に関連付けられて格納されている画像であり、対象音１３１を収音した場所を撮像した画像である。場所３３３は、対象音１３１を収音した場所を示す座標データであり、１つの場所に対して、複数の対象音１３１や画像が対応していることもある。収音／撮像条件３３４は、対象音１３１を収音した際の収音条件と収音場所の画像を撮像した際の撮像条件であり、被写体からの距離や露光時間、天候、気温などが含まれる。 An example of a target sound image table 303 held by the target sound processing device 110 will be described with reference to Figure 3C. The target sound image table 303 stores an image 332, a location 333, and sound collection/imaging conditions 334 in association with a target sound ID 331. The target sound ID 331 is an identifier for identifying each target sound 131 collected by the sound collection unit 130. The image 332 is an image stored in association with the collected target sound 131, and is an image of the location where the target sound 131 was collected. The location 333 is coordinate data indicating the location where the target sound 131 was collected, and multiple target sounds 131 and images may correspond to one location. The sound collection/imaging conditions 334 are the sound collection conditions when the target sound 131 was collected and the imaging conditions when an image of the sound collection location was captured, and include the distance from the subject, exposure time, weather, temperature, etc.
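A record of the target sound image table could be modelled as below. The field names, the file-name and coordinate formats, and the helper functions are hypothetical; the sketch only shows that one location may map to several target sounds and that a stored image leads back to its target sound.

```python
from dataclasses import dataclass, field


@dataclass
class TargetSoundRecord:
    sound_id: str            # target sound ID
    image: str               # stored image of the sound collection location
    location: tuple          # coordinates of the location (assumed lat, lon)
    conditions: dict = field(default_factory=dict)  # collection/imaging conditions


def sounds_at(records, location):
    """Return the IDs of all target sounds collected at the given location."""
    return [r.sound_id for r in records if r.location == location]


def sound_for_image(records, image):
    """Return the target sound ID associated with a stored image, or None."""
    for r in records:
        if r.image == image:
            return r.sound_id
    return None


records = [
    TargetSoundRecord("S001", "site1_morning.jpg", (35.0, 135.0),
                      {"distance_m": 10, "weather": "clear"}),
    TargetSoundRecord("S002", "site1_evening.jpg", (35.0, 135.0),
                      {"distance_m": 10, "weather": "rain"}),
    TargetSoundRecord("S003", "site2.jpg", (34.7, 135.5)),
]
```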

図４を参照して、対象音加工装置１１０のハードウェア構成について説明する。ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）４１０は、演算制御用のプロセッサであり、プログラムを実行することで図２の対象音加工装置１１０の各機能構成を実現する。ＣＰＵ４１０は複数のプロセッサを有し、異なるプログラムやモジュール、タスク、スレッドなどを並行して実行してもよい。ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）４２０は、初期データおよびプログラムなどの固定データおよびその他のプログラムを記憶する。また、ネットワークインタフェース４３０は、ネットワークを介して他の装置などと通信する。なお、ＣＰＵ４１０は１つに限定されず、複数のＣＰＵであっても、あるいは画像処理用のＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）を含んでもよい。また、ネットワークインタフェース４３０は、ＣＰＵ４１０とは独立したＣＰＵを有して、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）４４０の領域に送受信データを書き込みあるいは読み出しするのが望ましい。また、ＲＡＭ４４０とストレージ４５０との間でデータを転送するＤＭＡＣ（ＤｉｒｅｃｔＭｅｍｏｒｙＡｃｃｅｓｓＣｏｎｔｒｏｌｌｅｒ）を設けるのが望ましい（図示なし）。さらに、ＣＰＵ４１０は、ＲＡＭ４４０にデータが受信あるいは転送されたことを認識してデータを処理する。また、ＣＰＵ４１０は、処理結果をＲＡＭ４４０に準備し、後の送信あるいは転送はネットワークインタフェース４３０やＤＭＡＣに任せる。 The hardware configuration of the target sound processing device 110 will be described with reference to Figure 4. The CPU (Central Processing Unit) 410 is a processor for arithmetic and control, and realizes each functional component of the target sound processing device 110 in Figure 2 by executing programs. The CPU 410 may have multiple processors and execute different programs, modules, tasks, threads, etc. in parallel. The ROM (Read Only Memory) 420 stores fixed data such as initial data and programs, as well as other programs. The network interface 430 communicates with other devices via a network. The CPU 410 is not limited to a single CPU; there may be multiple CPUs, and a GPU (Graphics Processing Unit) for image processing may be included. It is desirable that the network interface 430 has a CPU independent of the CPU 410 and writes transmitted and received data to, or reads it from, an area of the RAM (Random Access Memory) 440. It is also desirable to provide a DMAC (Direct Memory Access Controller) (not shown) that transfers data between the RAM 440 and the storage 450. The CPU 410 recognizes that data has been received in or transferred to the RAM 440 and processes the data. The CPU 410 also prepares the processing results in the RAM 440 and leaves subsequent transmission or transfer to the network interface 430 or the DMAC.

ＲＡＭ４４０は、ＣＰＵ４１０が一時記憶のワークエリアとして使用するランダムアクセスメモリである。ＲＡＭ４４０には、本実施形態の実現に必要なデータを記憶する記憶領域が確保されている。対象音データ４４１は、所定場所において、収音部１３０を用いて収音された対象音のデータであり、デジタル変換されたデータである。マイク補正値４４２は、対象音１３１の収音に用いた収音部１３０（マイク）に依存するマイク感度（実際の音と収音された音との差）などを較正するために用いられる補正値である。 RAM 440 is a random access memory used by CPU 410 as a work area for temporary storage. RAM 440 has a storage area reserved for storing data necessary to implement this embodiment. Target sound data 441 is data on target sound picked up at a specified location using the sound pickup unit 130, and is digitally converted data. Microphone correction value 442 is a correction value used to calibrate microphone sensitivity (the difference between the actual sound and the picked-up sound), which depends on the sound pickup unit 130 (microphone) used to pick up target sound 131.

スピーカ補正値４４３は、生成した加工済試聴音を再生する出力部１４０（スピーカ）に依存するスピーカ固有の音響特性（生成された試聴音と再生される試聴音との差）などを解消するために用いられる補正値である。加工済対象音データ４４４は、収音された対象音１３１を第１加工用補正値により加工した音のデータである。 The speaker correction value 443 is a correction value used to eliminate speaker-specific acoustic characteristics (differences between the generated sample sound and the reproduced sample sound) that depend on the output unit 140 (speaker) that reproduces the generated processed sample sound. The processed target sound data 444 is sound data obtained by processing the collected target sound 131 using the first processing correction value.

試聴音データ４４５は、加工済対象音を加工して得られる、所定場所において、所定環境下で、対象音１３１を試聴した場合に実際に聞こえる音を再現する試聴音のデータである。加工済試聴音データ４４６は、生成された試聴音を出力するための出力部１４０の出力条件に基づいて、第２加工用補正値を用いて加工された試聴音である。画像データ４４７は、対象音１３１を収音した場所を撮像した画像のデータである。 Sample sound data 445 is the data of the sample sound obtained by processing the processed target sound; the sample sound reproduces the sound that would actually be heard when the target sound 131 is listened to in a predetermined environment at the predetermined location. Processed sample sound data 446 is the sample sound processed using the second processing correction value, based on the output conditions of the output unit 140 that outputs the generated sample sound. Image data 447 is the data of an image of the location where the target sound 131 was picked up.

送受信データ４４８は、ネットワークインタフェース４３０を介して送受信されるデータである。また、ＲＡＭ４４０は、各種アプリケーションモジュールを実行するためのアプリケーション実行領域４４９を有する。 Transmitted/received data 448 is data transmitted and received via the network interface 430. RAM 440 also has an application execution area 449 for executing various application modules.

ストレージ４５０には、データベースや各種パラメータ、あるいは本実施形態の実現に必要な以下のデータまたはプログラムが記憶されている。ストレージ４５０は、マイク補正値テーブル３０１、スピーカ補正値テーブル３０２および対象音画像テーブル３０３を格納する。マイク補正値テーブル３０１は、図３Ａに示した、マイクＩＤ３１１と補正値３１３などとの関係を管理するテーブルである。スピーカ補正値テーブル３０２は、図３Ｂに示した、スピーカＩＤ３２１と補正値３２３などとの関係を管理するテーブルである。対象音画像テーブル３０３は、図３Ｃに示した、対象音ＩＤ３３１と画像３３２などとの関係を管理するテーブルである。 Storage 450 stores databases, various parameters, and the following data or programs required to implement this embodiment. Storage 450 stores a microphone correction value table 301, a speaker correction value table 302, and a target sound image table 303. The microphone correction value table 301 is a table that manages the relationship between microphone ID 311 and correction value 313, etc., shown in Figure 3A. The speaker correction value table 302 is a table that manages the relationship between speaker ID 321 and correction value 323, etc., shown in Figure 3B. The target sound image table 303 is a table that manages the relationship between target sound ID 331 and image 332, etc., shown in Figure 3C.

ストレージ４５０は、さらに、画像受付モジュール４５１、探索モジュール４５２、対象音取得モジュール４５３、第１特性取得モジュール４５４、加工済対象音生成モジュール４５５、試聴音生成モジュール４５６、第２特性取得モジュール４５７、加工済試聴音生成モジュール４５８および出力制御モジュール４５９を格納する。 Storage 450 further stores an image reception module 451, a search module 452, a target sound acquisition module 453, a first characteristic acquisition module 454, a processed target sound generation module 455, a preview sound generation module 456, a second characteristic acquisition module 457, a processed preview sound generation module 458, and an output control module 459.

画像受付モジュール４５１は、所定場所において、所定環境下で対象音１３１を試聴した場合に、実際に聞こえる音である絶対音を再現する試聴音を試聴したい場所であって、当該所定場所とは異なる場所を撮像した試聴場所画像１５１を受け付けるモジュールである。探索モジュール４５２は、受け付けた試聴場所画像１５１に類似する所定場所画像１６０を格納部２１１から探索するモジュールである。対象音取得モジュール４５３は、探索された所定場所画像１６０に関連付けられている対象音１３１を取得するモジュールである。 The image reception module 451 is a module that receives a listening location image 151, which is an image of a location that is different from the predetermined location and where the user wishes to listen to a sample sound reproducing the absolute sound, that is, the sound actually heard when the target sound 131 is listened to in a predetermined environment at the predetermined location. The search module 452 is a module that searches the storage unit 211 for a predetermined location image 160 that is similar to the received listening location image 151. The target sound acquisition module 453 is a module that acquires the target sound 131 associated with the predetermined location image 160 found by the search.

第１特性取得モジュール４５４は、収音部１３０の第１特性を取得するモジュールである。加工済対象音生成モジュール４５５は、第１特性に基づいて、対象音１３１を加工するための第１加工用補正値を取得し、取得した第１加工用補正値を用いて、対象音１３１を加工して加工済対象音を生成するモジュールである。試聴音生成モジュール４５６は、試聴音を生成された加工済対象音を加工して生成するモジュールである。 The first characteristic acquisition module 454 is a module that acquires the first characteristic of the sound pickup unit 130. The processed target sound generation module 455 is a module that acquires a first processing correction value for processing the target sound 131 based on the first characteristic, and processes the target sound 131 using the acquired first processing correction value to generate a processed target sound. The preview sound generation module 456 is a module that processes the generated processed target sound to generate a preview sound.

第２特性取得モジュール４５７は、生成された試聴音を出力するための出力部１４０の第２特性を取得するモジュールである。加工済試聴音生成モジュール４５８は、第２特性に基づいて、試聴音を加工するための第２加工用補正値を取得し、取得した第２加工用補正値を用いて、試聴音を加工して加工済試聴音を生成するモジュールである。出力制御モジュール４５９は、生成した加工済試聴音を制御して、出力部１４０から出力するモジュールである。 The second characteristic acquisition module 457 is a module that acquires the second characteristic of the output unit 140 for outputting the generated preview sound. The processed preview sound generation module 458 is a module that acquires a second processing correction value for processing the preview sound based on the second characteristic, and processes the preview sound using the acquired second processing correction value to generate a processed preview sound. The output control module 459 is a module that controls the generated processed preview sound and outputs it from the output unit 140.

これらのモジュール４５１～４５９は、ＣＰＵ４１０によりＲＡＭ４４０のアプリケーション実行領域４４９に読み出され、実行される。制御プログラム４７０は、対象音加工装置１１０の全体を制御するためのプログラムである。 These modules 451 to 459 are loaded into the application execution area 449 of the RAM 440 by the CPU 410 and executed. The control program 470 is a program for controlling the entire target sound processing device 110.

入出力インタフェース４６０は、入出力機器との入出力データをインタフェースする。入出力インタフェース４６０には、表示部４６１、操作部４６２が接続される。また、入出力インタフェース４６０には、さらに、記憶媒体４６４が接続されてもよい。さらに、音声出力部であるスピーカ４６３や、音声入力部であるマイク（図示せず）、あるいは、ＧＰＳ位置判定部が接続されてもよい。なお、図４に示したＲＡＭ４４０やストレージ４５０には、対象音加工装置１１０が有する汎用の機能や他の実現可能な機能に関するプログラムやデータは図示されていない。 The input/output interface 460 interfaces input/output data with input/output devices. A display unit 461 and an operation unit 462 are connected to the input/output interface 460. A storage medium 464 may also be connected to the input/output interface 460. A speaker 463 serving as an audio output unit, a microphone (not shown) serving as an audio input unit, or a GPS position determination unit may also be connected. Note that the RAM 440 and storage 450 shown in Figure 4 do not include programs or data related to the general-purpose functions of the target sound processing device 110 or other feasible functions.

次に図５Ａに示したフローチャートを参照して、対象音加工装置１１０の処理手順について説明する。このフローチャートは、図４のＣＰＵ４１０がＲＡＭ４４０を使用して実行し、図２の対象音加工装置１１０の各機能構成を実現する。 Next, the processing procedure of the target sound processing device 110 will be described with reference to the flowchart shown in Figure 5A. This flowchart is executed by the CPU 410 in Figure 4 using the RAM 440, and realizes the various functional components of the target sound processing device 110 in Figure 2.

ステップＳ５０１において、格納部２１１は、所定場所において収音された対象音１３１と、対象音１３１を収音した所定場所を撮像した所定場所画像とを関連付けて格納する。ステップＳ５０３において、画像受付部２１２は、対象音１３１を収音した所定場所とは異なる場所であって、試聴音を試聴した場所を撮像した試聴場所画像１５１を受け付ける。 In step S501, the storage unit 211 associates and stores the target sound 131 picked up at a predetermined location with a predetermined location image capturing the predetermined location where the target sound 131 was picked up. In step S503, the image receiving unit 212 receives a listening location image 151 capturing an image of a location where the sample sound was listened to, which is a location different from the predetermined location where the target sound 131 was picked up.

ステップＳ５０５において、探索部２１３は、受け付けた試聴場所画像１５１に類似する所定場所画像を格納部２１１から探索する。探索の方法は、例えば、所定場所画像１６０と試聴場所画像１５１との一致度に基づいて行われる。ステップＳ５０７において、探索部２１３は、試聴場所画像１５１に類似する所定場所画像１６０が探索できたか否かを判断する。探索できなかったと判断した場合（ステップＳ５０７のＮＯ）、対象音加工装置１１０は、処理を終了する。探索できたと判断した場合（ステップＳ５０７のＹＥＳ）、対象音加工装置１１０は、次のステップへ進む。 In step S505, the search unit 213 searches the storage unit 211 for a predetermined location image similar to the received listening location image 151. The search is performed, for example, based on the degree of match between the predetermined location image 160 and the listening location image 151. In step S507, the search unit 213 determines whether a predetermined location image 160 similar to the listening location image 151 has been found. If no such image was found (NO in step S507), the target sound processing device 110 terminates processing. If such an image was found (YES in step S507), the target sound processing device 110 proceeds to the next step.
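The search and branch in steps S505 to S507 could be sketched as follows. This is only a minimal sketch under stated assumptions, not the method of the patent: it assumes each image has already been reduced to a numeric feature vector, it uses cosine similarity as the degree of match, and the names `search_location_image` and `threshold` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Degree of match between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_location_image(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical acceptance level for "similar"
) -> Optional[str]:
    """Return the id of the most similar stored image, or None (S507: NO)."""
    best_id, best_score = None, threshold
    for image_id, features in stored.items():
        score = cosine_similarity(query, features)
        if score >= best_score:
            best_id, best_score = image_id, score
    return best_id


# Hypothetical feature vectors for two stored predetermined location images.
stored_images = {"office": [1.0, 0.0, 0.0], "classroom": [0.0, 1.0, 0.0]}
found = search_location_image([0.9, 0.1, 0.0], stored_images)
not_found = search_location_image([1.0, 1.0, 1.0], stored_images)
```

When the function returns `None`, the flow corresponds to the NO branch of step S507 and processing would end; otherwise the returned id would be used to look up the associated target sound in step S509.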

ステップＳ５０９において、対象音取得部２１４は、探索された所定場所画像１６０に関連付けられている対象音１３１を取得する。ステップＳ５１１において、第１特性取得部２１５は、収音部１３０の第１特性を取得する。ステップＳ５１３において、加工済対象音生成部２１６は、取得した第１特性に基づいて、対象音１３１を加工するための第１加工用補正値を取得し、取得した第１加工用補正値を用いて対象音１３１を加工して、加工済対象音を生成する。 In step S509, the target sound acquisition unit 214 acquires the target sound 131 associated with the searched predetermined location image 160. In step S511, the first characteristic acquisition unit 215 acquires the first characteristic of the sound collection unit 130. In step S513, the processed target sound generation unit 216 acquires a first processing correction value for processing the target sound 131 based on the acquired first characteristic, and processes the target sound 131 using the acquired first processing correction value to generate the processed target sound.

ステップＳ５１５において、試聴音生成部２１７は、生成された加工済対象音を加工して、所定場所とは異なる場所において、所定環境下で、対象音１３１を聴取した場合に、実際に聞こえる音である絶対音を再現する試聴音を生成する。ステップＳ５１７において、第２特性取得部２１８は、生成された試聴音を出力するための出力部１４０の第２特性を取得する。ステップＳ５１９において、加工済試聴音生成部２１９は、取得した第２特性に基づいて、試聴音を加工するための第２加工用補正値を取得し、取得した第２加工用補正値を用いて、試聴音を加工して加工済試聴音を生成する。ステップＳ５２１において、生成した加工済試聴音を制御して、出力部１４０から出力させる。 In step S515, the sample sound generation unit 217 processes the generated processed target sound to generate a sample sound that reproduces the absolute sound, that is, the sound that would actually be heard when the target sound 131 is listened to under a predetermined environment at a location different from the predetermined location. In step S517, the second characteristic acquisition unit 218 acquires the second characteristic of the output unit 140 for outputting the generated sample sound. In step S519, the processed sample sound generation unit 219 acquires a second processing correction value for processing the sample sound based on the acquired second characteristic, and processes the sample sound using the acquired second processing correction value to generate the processed sample sound. In step S521, the generated processed sample sound is controlled and output from the output unit 140.

次に図５Ｂに示したフローチャートを参照して、携帯端末１２０の処理手順について説明する。このフローチャートは、不図示のＣＰＵが不図示のＲＡＭを使用して実行し、図２の携帯端末１２０の各機能構成を実現する。 Next, the processing procedure of the mobile terminal 120 will be described with reference to the flowchart shown in Figure 5B. This flowchart is executed by a CPU (not shown) using RAM (not shown), and realizes each functional configuration of the mobile terminal 120 shown in Figure 2.

ステップＳ５３１において、携帯端末１２０は、収音部１３０により収音された対象音１３１を取得する。ここで、携帯端末１２０は、取得した対象音１３１をデジタル変換してもよい。ステップＳ５３３において、携帯端末１２０は、取得した対象音１３１のデータを対象音加工装置１１０へ送信する。ステップＳ５３５において、携帯端末１２０は、対象音加工装置１１０から加工済試聴音１４１を受信する。受信した加工済試聴音１４１は、出力部１４０から出力される。 In step S531, the mobile terminal 120 acquires the target sound 131 collected by the sound collection unit 130. Here, the mobile terminal 120 may digitally convert the acquired target sound 131. In step S533, the mobile terminal 120 transmits data of the acquired target sound 131 to the target sound processing device 110. In step S535, the mobile terminal 120 receives the processed sample sound 141 from the target sound processing device 110. The received processed sample sound 141 is output from the output unit 140.

本実施形態によれば、対象音と所定場所画像とを関連付けて格納しているので、新たな場所の画像を撮像すれば、新たな場所で音を収音しなくても、試聴音を生成できる。すなわち、新たな場所に専用の収音用マイクを持っていって収音する必要がないので、気軽に利用することができる。 In this embodiment, the target sound and a predetermined location image are stored in association with each other, so by capturing an image of a new location, a sample sound can be generated without having to record the sound at the new location. In other words, there is no need to carry a dedicated sound recording microphone to the new location to record the sound, making it easy to use.

［実施例］ [Example]
次に、図６～図１５を参照して本発明の実施例について説明する。なお、本発明の範囲は以下の実施例には限定されない。 Next, examples of the present invention will be described with reference to Figures 6 to 15. It should be noted that the scope of the present invention is not limited to the following examples.

対象音加工システム１００の対象音加工装置１１０において、生成される試聴音は、［表１］に示した６項目（（ａ）～（ｆ））である。すなわち、（ａ）外部騒音に起因する室内騒音、（ｂ）空間遮蔽性能（隣室から伝搬する騒音）、（ｃ）建物内外から敷地境界へ伝搬する騒音、（ｄ）空間設備に起因する室内騒音、（ｅ）床衝撃音遮断性能（床衝撃音）、（ｆ）室内残響時間（音の響き）の６項目である。 The target sound processing device 110 of the target sound processing system 100 generates test sounds for the six items (a) to (f) shown in Table 1. These are: (a) indoor noise caused by external noise, (b) spatial shielding performance (noise propagating from adjacent rooms), (c) noise propagating from inside and outside the building to the site boundary, (d) indoor noise caused by spatial equipment, (e) floor impact sound insulation performance (floor impact sound), and (f) indoor reverberation time (sound reverberation).

６項目それぞれの具体例については、例えば、（ａ）交通騒音（道路交通騒音、鉄道騒音など）、（ｂ）会議室の話声やホテルのテレビ音などの空気伝搬音、（ｃ）屋外設備機械、室内設備機械、（ｄ）室内の空調設備、（ｅ）床衝撃音（重量床衝撃音・軽量床衝撃音）、（ｆ）会議室の話声、教室の授業の声などである。それぞれの計算には、例えば、特許文献１～３に記載の技術が利用される。 Specific examples of each of the six categories include (a) traffic noise (road traffic noise, railway noise, etc.), (b) airborne noise such as conversations in conference rooms and hotel television sounds, (c) outdoor equipment and machinery, indoor equipment and machinery, (d) indoor air conditioning equipment, (e) floor impact noise (heavy floor impact noise, light floor impact noise), and (f) conversations in conference rooms and classroom lectures. Calculations for each category utilize technologies described in, for example, Patent Documents 1 to 3.

また、６項目それぞれの対象周波数は、（ａ）５０Ｈｚ～５，０００Ｈｚ帯域、（ｂ）１００Ｈｚ～５，０００Ｈｚ帯域、（ｃ）および（ｄ）５０Ｈｚ～５，０００Ｈｚ帯域、（ｅ）重量床衝撃音：５０Ｈｚ～６３０Ｈｚ帯域、軽量床衝撃音：５０Ｈｚ～５，０００Ｈｚ帯域、（ｆ）１００Ｈｚ～５，０００Ｈｚ帯域となっている。 The target frequencies for each of the six items are: (a) 50Hz to 5,000Hz band, (b) 100Hz to 5,000Hz band, (c) and (d) 50Hz to 5,000Hz band, (e) heavy floor impact sound: 50Hz to 630Hz band, light floor impact sound: 50Hz to 5,000Hz band, and (f) 100Hz to 5,000Hz band.
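The evaluations that follow work with 1/3 octave band levels over these ranges. As a rough illustration, the band center frequencies inside a given range can be enumerated with the base-10 relation fm = 1000 × 10^(n/10) Hz. The function name `third_octave_centers`, the 2 % tolerance, and the use of exact rather than nominal center frequencies are assumptions of this sketch.

```python
from typing import List


def third_octave_centers(f_min: float, f_max: float, tol: float = 0.02) -> List[float]:
    """Exact 1/3 octave band center frequencies (Hz) between f_min and f_max.

    The tolerance lets a nominal limit such as 50 Hz match its exact
    center frequency of about 50.12 Hz.
    """
    centers = []
    for n in range(-20, 14):  # roughly 10 Hz to 20 kHz
        fm = 1000.0 * 10 ** (n / 10)
        if f_min * (1 - tol) <= fm <= f_max * (1 + tol):
            centers.append(fm)
    return centers


# Ranges quoted in the text: 50 Hz to 5,000 Hz, and 50 Hz to 630 Hz
# for heavy floor impact sound.
full_range = third_octave_centers(50, 5000)
heavy_impact_range = third_octave_centers(50, 630)
```

With these inputs the 50 Hz to 5,000 Hz range spans 21 bands and the heavy floor impact range spans 12 bands.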

そして、試聴音の生成については、（ａ）～（ｄ）においては、減音量フィルタを生成し、音源データ（収音した対象音データ）にフィルタ処理して試聴音を生成する。 To generate the sample sound for items (a) to (d), an attenuation filter is generated and applied to the sound source data (the picked-up target sound data).
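One way to picture the attenuation filter for items (a) to (d) is a per-band gain applied in the frequency domain. The sketch below is an assumption-laden illustration, not the filter design of the patent: it uses a naive discrete Fourier transform (adequate only for short signals), rectangular band edges, and hypothetical names such as `apply_attenuation`.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def apply_attenuation(signal: Sequence[float], fs: float,
                      bands: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Attenuate each band (f_low, f_high, attenuation_db) of the signal."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # mirror the negative-frequency bins
        for f_low, f_high, attenuation_db in bands:
            if f_low <= freq < f_high:
                spectrum[k] *= 10 ** (-attenuation_db / 20)
                break
    return idft(spectrum)


# Hypothetical case: a 100 Hz tone, 20 dB predicted attenuation in its band.
fs = 800.0
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(80)]
filtered = apply_attenuation(tone, fs, [(89.1, 112.2, 20.0)])
peak = max(abs(v) for v in filtered)
```

A 20 dB attenuation corresponds to an amplitude factor of 0.1, so the unit-amplitude tone comes out with a peak of about 0.1. A production system would more likely use an FFT library and smoother filter shapes.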

（ｅ）においては、ＪＩＳ規格による標準衝撃源（タイヤ、ボール、タッピング）の衝撃力から床衝撃音レベルを求め、スラブ厚や床仕上げ構造、仕上げ天井などの条件ごとに収音した標準衝撃源による床衝撃音から計算した条件に近い音源データを抽出し、その床衝撃音レベルと計算値のレベル差とを減音量としたフィルタを生成して、試聴音を生成する。なお、重量床衝撃音は、５０Ｈｚ～６３０ｈｚ帯域が評価対象であるので、対象外の周波数についてはフィルタ処理して再生しない。 For item (e), the floor impact sound level is calculated from the impact force of the standard impact sources defined by JIS (tire, ball, and tapping machine). From floor impact sounds recorded with the standard impact sources under various conditions, such as slab thickness, floor finish structure, and finished ceiling, the sound source data whose conditions are closest to the calculated ones is extracted. A filter whose attenuation equals the level difference between the floor impact sound level of that data and the calculated value is then generated, and the sample sound is generated with it. Note that heavy floor impact sound is evaluated in the 50 Hz to 630 Hz band, so frequencies outside this range are filtered out and not played back.

（ｆ）においては、無響室で収音した朗読音等のドライソースの音に、音響シミュレーション等により予測したインパルス応答やインパルス応答の実測値を畳み込み演算することにより、試聴音を生成する。 For item (f), the sample sound is generated by convolving a dry source sound, such as a reading recorded in an anechoic chamber, with an impulse response predicted by acoustic simulation or the like, or with a measured impulse response.
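The convolution used for item (f) can be written out directly. The following is a minimal sketch with toy data; the function name `convolve` and the sample values are hypothetical, and a real impulse response would contain many thousands of samples, for which FFT-based convolution is normally preferred.

```python
from typing import List, Sequence


def convolve(dry: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct linear convolution: y[n] = sum over k of dry[k] * h[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += sample * h
    return out


# Toy dry source and a two-tap "room": direct sound plus one reflection
# at half amplitude, one sample later.
dry_source = [1.0, 2.0, 3.0]
room_response = [1.0, 0.5]
wet = convolve(dry_source, room_response)  # [1.0, 2.5, 4.0, 1.5]
```

Each output sample is the dry signal plus a delayed, scaled copy of itself, which is how reverberation encoded in the impulse response is imprinted on the dry recording.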

次に、使用するマイク（収音部１３０や収音装置）やヘッドホン（出力部１４０やスピーカ）などはそれぞれ、固有の音響特性を有している。そのため、生成した試聴音を忠実に再生するためには、これらの音響特性を補正する必要があり、予め求めた補正値をもとに収音した音源（対象音１３１）や生成した試聴音を補正する。なお、使用機器の音響特性に対する補正値は、以下の方法で求められる。 Next, the microphones (sound pickup unit 130 or sound pickup device) and headphones (output unit 140 or speaker) used each have their own unique acoustic characteristics. Therefore, in order to faithfully reproduce the generated sample sound, these acoustic characteristics must be corrected, and the picked-up sound source (target sound 131) and the generated sample sound are corrected based on a correction value calculated in advance. The correction value for the acoustic characteristics of the equipment used can be calculated using the following method.

≪マイクの音響特性の補正方法≫ <<How to correct the acoustic characteristics of a microphone>>
＜マイクの周波数特性＞ <Microphone frequency characteristics>
マイク（収音部１３０）の音響特性の補正方法について、携帯端末１２０としてタブレット端末を使用した場合を例に説明する。まず、評価対象周波数の音をマイクで収音できることを実験的に確認する。無響室において、スピーカ（音源）からピンクノイズ（雑音）を発生させ、１ｍ離れた位置に精密騒音計を設置して収音する。図６には、収音データの１／３オクターブバンドレベルが示されている。 A method for correcting the acoustic characteristics of a microphone (sound pickup unit 130) will be described using an example in which a tablet device is used as the mobile terminal 120. First, experimentally confirm that the microphone can pick up the sound of the frequency to be evaluated. In an anechoic chamber, pink noise (noise) is generated from a speaker (sound source), and a precision sound level meter is placed 1 m away to pick up the sound. Figure 6 shows the 1/3 octave band levels of the picked-up sound data.

タブレット端末に外付けしたマイクと精密騒音計とは、対象周波数範囲である５０Ｈｚ～５，０００Ｈｚ帯域において概ね一致している。一方、タブレット端末に内蔵されたマイクでは、８０Ｈｚ帯域以下の低い周波数でレベル差が見られる。重量床衝撃音や設備音等の低音域で問題となる音を収音する場合には、外付けマイクを利用することが好ましい。 The microphone attached to the tablet device and the precision sound level meter generally match in the target frequency range of 50 Hz to 5,000 Hz. On the other hand, the microphone built into the tablet device shows level differences at low frequencies below 80 Hz. When capturing problematic low-frequency sounds such as heavy floor impact noise and equipment noise, it is preferable to use an external microphone.

＜マイクの収音可能レベルと線形性＞ <Microphone pickup level and linearity>
試聴音の試聴の際には、システム使用時は使用者の聴力障害への配慮や騒音対策が必要になる騒音の大きさを踏まえ、収音する音圧レベルを３０ｄＢ～８０ｄＢと想定する。また、マイクの周波数特性は、収音する音圧レベルに応じて線形変化するものを用いる。そのため、使用するマイクの収音可能レベルと周波数特性との線形性を実験的に確認する。 For listening to the sample sound, the sound pressure level to be picked up is assumed to be 30 dB to 80 dB, in view of the need to avoid damaging the user's hearing during system use and of the noise levels at which noise control measures become necessary. The microphone used must have frequency characteristics that change linearly with the sound pressure level being picked up. The linearity between the recordable level and the frequency characteristics of the microphone to be used is therefore confirmed experimentally.

ピンクノイズの音量を３０ｄＢ～８０ｄＢとし、タブレット端末の内蔵マイクと精密騒音計とを用いて収音した。収音した音を１／３オクターブバンド分析し、タブレット端末の内蔵マイクと精密騒音計との対応を検討する。図７には、代表例として、１，０００Ｈｚ帯域の結果が示されている。 Pink noise was recorded at a volume between 30 dB and 80 dB using the tablet's built-in microphone and a precision sound level meter. The recorded sound was analyzed in 1/3 octave bands to examine the correspondence between the tablet's built-in microphone and the precision sound level meter. Figure 7 shows the results for the 1,000 Hz band as a representative example.

精密騒音計の値を真値とした場合、タブレット端末の内蔵マイクの使用時には、３０ｄＢ～８０ｄＢまでは概ね線形に変化している。システム利用で想定している音圧レベル範囲で線形変化することから周波数帯域ごとに一定の補正値で補正が行える。 If the precision sound level meter value is taken as the true value, when using the tablet device's built-in microphone, the change is roughly linear from 30 dB to 80 dB. Because the change is linear within the sound pressure level range expected for system use, correction can be made using a fixed correction value for each frequency band.
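A linearity check of this kind can be illustrated with an ordinary least-squares line through the paired readings. The readings below are invented for illustration and are not the measured data behind Figure 7; the function name `linear_fit` is likewise hypothetical.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical 1,000 Hz band levels (dB): the precision sound level meter is
# taken as the true value, and the built-in microphone reads 3 dB low.
reference_db = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
microphone_db = [level - 3.0 for level in reference_db]
slope, intercept = linear_fit(reference_db, microphone_db)
```

A slope close to 1 indicates that the microphone tracks level changes linearly, and the intercept (here about -3 dB) is the constant offset that a fixed per-band correction value can remove.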

対象とするマイクが、対象音加工システム１００で利用できることを確認後、精度よく収音できるように、精密騒音計と対象とするマイクとの１／３オクターブバンドレベル差を補正値（第１加工用補正値）として利用する。 After confirming that the target microphone can be used with the target sound processing system 100, the 1/3 octave band level difference between the precision sound level meter and the target microphone is used as a correction value (first processing correction value) to ensure accurate sound pickup.
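The first processing correction value described above is a per-band level difference, which can be sketched as two small dictionary operations. The band levels are made-up numbers and the names `band_correction` and `apply_correction` are hypothetical. Applying the correction in the level (dB) domain is a simplification of this sketch, since the text does not specify how the correction is applied to the waveform.

```python
from typing import Dict


def band_correction(reference_levels: Dict[int, float],
                    device_levels: Dict[int, float]) -> Dict[int, float]:
    """Correction per band (dB): reference level minus device level."""
    return {band: reference_levels[band] - device_levels[band]
            for band in reference_levels}


def apply_correction(levels: Dict[int, float],
                     correction: Dict[int, float]) -> Dict[int, float]:
    """Add the per-band correction to measured band levels (dB)."""
    return {band: level + correction.get(band, 0.0)
            for band, level in levels.items()}


# Hypothetical 1/3 octave band levels (dB) for the same pink noise,
# keyed by nominal band center frequency in Hz.
sound_level_meter = {50: 60.0, 63: 61.0}
tablet_microphone = {50: 55.5, 63: 60.0}
first_correction = band_correction(sound_level_meter, tablet_microphone)

# Correct a later recording made with the same microphone.
recorded = {50: 40.0, 63: 42.0}
corrected = apply_correction(recorded, first_correction)
```

Because the microphone was confirmed to be linear over the assumed level range, one fixed value per band is sufficient in this sketch regardless of how loud the recording is.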

≪ヘッドホンの音響特性の補正方法≫ <<How to correct the acoustic characteristics of headphones>>
＜ヘッドホンの周波数特性＞ <Headphone frequency characteristics>
ヘッドホンは、用途や好みに応じたチューニングがなされているものが多く、生成した試聴音を忠実に再生するには、個々のヘッドホンの周波数特性をキャンセルしなければならない。そのため、無響室において、ダミーヘッドにヘッドホンを装着し、タブレット端末経由でピンクノイズ（音源ソース）を再生する。図８には、ピンクノイズとヘッドホンの再生音との１／３オクターブバンドレベルが示されている。平坦な、周波数特性のピンクノイズに対して、使用したヘッドホンの再生音は、周波数ごとに異なり、特に、４００Ｈｚ～１，０００Ｈｚ帯域の音が強調される特性となっている。 Many headphones are tuned according to purpose and preference, and to faithfully reproduce the generated sample sound, the frequency characteristics of each headphone must be canceled. Therefore, in an anechoic chamber, headphones were attached to a dummy head, and pink noise (sound source) was played via a tablet device. Figure 8 shows the 1/3 octave band levels of the pink noise and the sound reproduced by the headphones. Compared to the pink noise, which has a flat frequency response, the sound reproduced by the headphones used varies by frequency, with sounds in the 400 Hz to 1,000 Hz band being particularly emphasized.

＜ヘッドホンの再生可能レベルと線形性＞ <Headphone playback level and linearity>
ピンクノイズのレベルを変えてヘッドホンで再生し、ヘッドホンの再生可能レベルと線形性とを検討する。図９には、代表例として、１，０００Ｈｚ帯域の１／３オクターブバンドレベルが示されている。 Pink noise is reproduced through headphones at various levels to examine the reproducible level and linearity of the headphones. Figure 9 shows the 1/3 octave band level of a 1,000 Hz band as a typical example.

２０ｄＢ～８０ｄＢの範囲において、ヘッドホンで再生できており、ピンクノイズの５ｄＢピッチのレベル変化に応じて線形に変化している。他の周波数帯域においても同様の結果が得られた。２０ｄＢまで再生が行えることから、例えば、空間の遮音を検討する際に、音源室において、７５ｄＢで発生している音源に対して、空間遮音性能でＤｒ－５５までの効果を試聴音として表現できる。ここで、Ｄｒは、室間音圧レベル差等級であり、Ｄｒ値が大きいほど空間の遮音性能が高く、空気伝搬音が伝わりにくいことを示す。 The headphones can reproduce sound over the range of 20 dB to 80 dB, and the reproduced level changes linearly as the pink noise level is changed in 5 dB steps. Similar results were obtained in the other frequency bands. Because reproduction down to 20 dB is possible, when examining the sound insulation of a space, for example, a spatial sound insulation effect of up to Dr-55 can be expressed as a sample sound for a source emitting 75 dB in the sound source room (75 dB - 20 dB = 55 dB). Here, Dr is the grade of the sound pressure level difference between rooms; the higher the Dr value, the higher the sound insulation performance of the space and the less airborne sound is transmitted.

対象とするヘッドホンが対象音加工システム１００で利用できることを確認後、生成した試聴音を忠実に再生できるように、ピンクノイズ（音源ソース）とヘッドホンの再生音との１／３オクターブバンドレベル差を補正値（第２加工用補正値）として、ヘッドホン固有の音響特性をキャンセルする。 After confirming that the target headphones can be used with the target sound processing system 100, the 1/3 octave band level difference between the pink noise (sound source) and the sound reproduced by the headphones is used as a correction value (second processing correction value) to cancel the acoustic characteristics unique to the headphones, so that the generated sample sound can be reproduced faithfully.

≪実建物におけるシステム精度の検証事例≫ <<Verification example of system accuracy in an actual building>>
＜検証概要＞ <Verification overview>
オフィスの室内を利用し、空間遮音性能に関する対象音加工装置１１０による予測計算と試聴音、加工済試聴音の生成精度を検証した。測定室の概要を図１０に示す。 Using rooms in an office, the accuracy of the prediction calculation for spatial sound insulation performance by the target sound processing device 110 and the accuracy with which the sample sound and the processed sample sound are generated were verified. An outline of the measurement rooms is shown in Figure 10.

会議室と執務室との間の界壁には、乾式二重壁（遮音性能ＴＬ_Ｄ－４０）となっている。会議室と執務室との廊下扉は、エアタイトが設置されていない一般的なスチール製親子扉である。会議室を音源室とし、スピーカから対象音（ピンクノイズまたは男性朗読音）を再生した。再生された対象音を対象音加工装置１１０（タブレット端末の内蔵マイクを使用）でそれぞれ収音して、収音された対象音の音源データとした。なお、収音精度検証のため、精密騒音計（ＲＩＯＮ社製ＮＡ－２８）でも収音を行った。受音室にはダミーヘッドを設置して会議室からの伝搬音を収音した。また、ＪＩＳＡ１４１８：２０００「建築物の空気音遮断性能の測定方法」における室間音圧レベル差の測定も行った。 The partition wall between the conference room and the office is a dry double wall (sound insulation performance TL_D-40). The corridor doors of the conference room and the office are ordinary steel double doors without airtight seals. The conference room was used as the sound source room, and the target sounds (pink noise or a male reading) were played from a speaker. Each played target sound was picked up by the target sound processing device 110 (using the built-in microphone of a tablet device) and used as the sound source data of the target sound. To verify the sound pickup accuracy, the sound was also picked up with a precision sound level meter (NA-28 manufactured by RION). A dummy head was installed in the sound receiving room to pick up the sound propagating from the conference room. The sound pressure level difference between the rooms was also measured in accordance with JIS A 1418:2000, "Method for measuring the airborne sound insulation performance of buildings."

＜予測計算精度＞ <Prediction calculation accuracy>
対象音加工システム１００による、予測計算精度を確認するために、室間音圧レベル差の予測値と実測値とを比較した。図１１には、オクターブバンドレベルの比較結果が示されている。実測値に対して、予測値は、１２５Ｈｚ帯域～４，０００Ｈｚ帯域において概ね対応しており、精度よく室間音圧レベル差を予測できている。 To confirm the accuracy of the prediction calculations performed by the target sound processing system 100, the predicted values of the sound pressure level differences between rooms were compared with the actual measured values. Figure 11 shows the comparison results for octave band levels. The predicted values roughly correspond to the actual measured values in the 125 Hz to 4,000 Hz bands, and the sound pressure level differences between rooms were predicted with high accuracy.

収音精度が試聴音の生成精度に影響を及ぼすため、対象音加工システム１００（タブレット内蔵マイク）の収音データと精密騒音計とによる収音データを比較した。精密騒音計の音圧波形と対象音加工システム１００（タブレット内蔵マイク）で収音した音圧波形（加工済対象音）を図１２に、それぞれの音圧波形のオクターブバンドレベルを図１３に示す。 Because sound pickup accuracy affects the accuracy of generating the sample sound, we compared the sound pickup data from the target sound processing system 100 (tablet's built-in microphone) with the sound pickup data from a precision sound level meter. Figure 12 shows the sound pressure waveform from the precision sound level meter and the sound pressure waveform (processed target sound) picked up by the target sound processing system 100 (tablet's built-in microphone), and Figure 13 shows the octave band levels of each sound pressure waveform.

対象音加工システム１００（タブレット内蔵マイク）で収音した音圧波形の形状や振幅は、ピンクノイズおよび男性朗読音ともに精密騒音計で収音した音圧波形と同波形、同振幅である。オクターブバンドレベルでは、最大１ｄＢ程度の誤差であり、普通騒音計と同等の精度で収音できている。 The shape and amplitude of the sound pressure waveforms picked up by the target sound processing system 100 (tablet's built-in microphone) are the same as those picked up by a precision sound level meter for both pink noise and a male reading aloud. At the octave band level, the error is a maximum of about 1 dB, meaning that the sound is picked up with the same accuracy as a standard sound level meter.

＜試聴音の精度＞ <Accuracy of the sample sound>
対象音加工システム１００の収音データに対して予測計算結果を加味した試聴音の音圧波形に対してヘッドホン固有の音響特性をキャンセルする補正処理を行い、加工済試聴音を生成した。 The sample sound was obtained by applying the prediction calculation results to the sound pickup data of the target sound processing system 100. A correction that cancels the acoustic characteristics specific to the headphones was then applied to the sound pressure waveform of this sample sound to generate the processed sample sound.

試聴音の音圧波形、加工済試聴音（ヘッドホン再生音）の音圧波形、受音室の執務室でダミーヘッドを用いて収音した音圧波形（実測値）を図１４に、それぞれのオクターブバンドレベルを図１５に示した。なお、受音室の暗騒音が図１５に示されているが、男性朗読再生時では、１，０００Ｈｚ帯域以上の受音室の暗騒音の影響を受けるため評価から除外している。 Figure 14 shows the sound pressure waveform of the sample sound, the sound pressure waveform of the processed sample sound (sound played back through headphones), and the sound pressure waveform (actual measured values) picked up with a dummy head in the office serving as the sound receiving room, and Figure 15 shows the octave band levels of each. Figure 15 also shows the background noise in the sound receiving room. For playback of the male reading, the bands at 1,000 Hz and above are affected by this background noise and are therefore excluded from the evaluation.

加工済試聴音（ヘッドホン再生音）は、試聴音および実測値の音圧波形とほぼ同形状、同振幅であり、オクターブバンドレベルも同程度である。 The processed sample sound (sound played through headphones) has roughly the same shape and amplitude as the sample sound and the actual measured sound pressure waveform, and the octave band levels are also roughly the same.

システムの対象音加工装置１１０で生成した試聴音をヘッドホンで正しく再生できており、加工済試聴音は実際の音を再現できている。 The sample sound generated by the target sound processing device 110 of the system is played back correctly through the headphones, and the processed sample sound reproduces the actual sound.

以上、実施形態を参照して本発明を説明したが、本発明は、上述した実施形態に制限されず適宜変更可能である。本発明の構成や詳細には、本発明のスコープ内で当業者が理解し得る様々な変更をすることができる。また、それぞれの実施形態に含まれる別々の特徴を如何様に組み合わせた外装容器及び包装箱も、本発明の範疇に含まれる。 The present invention has been described above with reference to the embodiments, but the present invention is not limited to the above-described embodiments and can be modified as appropriate. The configuration and details of the present invention can be modified in various ways that are understandable to those skilled in the art and are within the scope of the present invention. Furthermore, outer containers and packaging boxes that combine the separate features included in each embodiment in any way are also included in the scope of the present invention.

また、本発明は、複数の機器から構成されるシステムに適用されてもよいし、単体の装置に適用されてもよい。さらに、本発明は、実施形態の機能を実現する情報処理プログラムが、システムあるいは装置に供給され、内蔵されたプロセッサによって実行される場合にも適用可能である。したがって、本発明の機能をコンピュータで実現するために、コンピュータにインストールされるプログラム、あるいはそのプログラムを格納した媒体、そのプログラムをダウンロードさせるＷＷＷ(World Wide Web)サーバも、プログラムを実行するプロセッサも本発明の技術的範囲に含まれる。特に、少なくとも、上述した実施形態に含まれる処理ステップをコンピュータに実行させるプログラムを格納した非一時的コンピュータ可読媒体（non-transitory computer readable medium）は本発明の技術的範囲に含まれる。
The present invention may be applied to a system consisting of multiple devices or to a single device. Furthermore, the present invention may also be applied when an information processing program that realizes the functions of the embodiments is supplied to a system or device and executed by a built-in processor. Therefore, the technical scope of the present invention also includes a program installed on a computer to realize the functions of the present invention, a medium storing the program, a World Wide Web (WWW) server from which the program is downloaded, and a processor that executes the program. In particular, the technical scope of the present invention also includes a non-transitory computer-readable medium storing a program that causes a computer to execute at least the processing steps included in the above-described embodiments.

Claims

a storage unit that stores a target sound collected by a sound collection unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location in association with each other;
an image receiving unit that receives a listening location image, the listening location image being an image of a location different from the predetermined location, where a user wishes to listen to a sample sound that reproduces an absolute sound, which is a sound that is actually heard when the target sound is listened to under a predetermined environment at the predetermined location;
a search unit that searches the storage unit for the predetermined location image similar to the received listening location image;
a target sound acquisition unit that acquires a target sound associated with the searched predetermined location image;
a first characteristic acquisition unit that acquires a first characteristic of the sound collection unit;
a processed target sound generation unit that acquires a first processing correction value for processing the target sound based on the first characteristic, and processes the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generation unit that processes the generated processed target sound to generate the sample sound that reproduces an absolute sound that is a sound that is actually heard when the target sound is listened to in a predetermined environment at a location different from the predetermined location;
a second characteristic acquisition unit that acquires a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generating unit that obtains a second processing correction value for processing the sample sound based on the second characteristic, and processes the sample sound using the obtained second processing correction value to generate a processed sample sound;
an output control unit that controls the generated processed sample sound and outputs it from the output unit;
A target sound processing device comprising:

A model generation unit that generates a learned predetermined location image model by training an artificial intelligence to learn the predetermined location image,
The target sound processing device according to claim 1 , wherein the search unit searches the storage unit for the predetermined location image similar to the listening location image by using the learned predetermined location image model.

The target sound processing device described in claim 1 or 2, wherein the search unit searches the storage unit for a predetermined location image similar to the listening location image based on the degree of similarity between the listening location image and the predetermined location image.

The target sound processing device described in any one of claims 1 to 3, wherein the processed sample sound includes at least one of indoor noise caused by external noise, noise propagating from an adjacent room, noise propagating from inside or outside the building to the site boundary, indoor noise caused by air conditioning equipment, floor impact sound insulation performance, and indoor reverberation time.

The target sound processing device described in claim 2, wherein the model generation unit generates the trained predetermined location image model using left-right flipped images as augmentation data.

The target sound processing device described in claim 2, wherein the model generation unit generates the trained predetermined location image model using transfer learning.

The target sound processing device described in any one of claims 1 to 6, wherein the listening location image is an image captured according to a predetermined imaging method.

a storing step of storing a target sound collected by a sound collecting unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location in association with each other;
an image receiving step of receiving a listening location image of a location different from the predetermined location where a user wishes to listen to a sample sound that reproduces an absolute sound, which is a sound that is actually heard when the target sound is listened to in a predetermined environment at the predetermined location;
a searching step of searching a storage unit for an image of the predetermined location similar to the received image of the listening location;
a target sound acquisition step of acquiring a target sound associated with the searched predetermined location image;
a first characteristic acquisition step of acquiring a first characteristic of the sound collection unit;
a processed target sound generation step of acquiring a first processing correction value for processing the target sound based on the first characteristic, and processing the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generating step of processing the generated processed target sound to generate the sample sound that reproduces an absolute sound that is a sound that is actually heard when the target sound is listened to in a predetermined environment at a location different from the predetermined location;
a second characteristic acquisition step of acquiring a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generating step of acquiring second processing correction values for processing the sample sound based on the second characteristic, and processing the sample sound using the acquired second processing correction values to generate a processed sample sound;
an output control step of controlling the generated processed sample sound to output it from the output unit;
A target sound processing method including:

a storing step of storing a target sound collected by a sound collecting unit at a predetermined location and a predetermined location image obtained by capturing the predetermined location in association with each other;
an image receiving step of receiving a listening location image of a location different from the predetermined location where a user wishes to listen to a sample sound that reproduces an absolute sound, which is a sound that is actually heard when the target sound is listened to in a predetermined environment at the predetermined location;
a searching step of searching a storage unit for an image of the predetermined location similar to the received image of the listening location;
a target sound acquisition step of acquiring a target sound associated with the searched predetermined location image;
a first characteristic acquisition step of acquiring a first characteristic of the sound collection unit;
a processed target sound generation step of acquiring a first processing correction value for processing the target sound based on the first characteristic, and processing the target sound using the acquired first processing correction value to generate a processed target sound;
a sample sound generating step of processing the generated processed target sound to generate the sample sound that reproduces an absolute sound that is a sound that is actually heard when the target sound is listened to in a predetermined environment at a location different from the predetermined location;
a second characteristic acquisition step of acquiring a second characteristic of an output unit for outputting the generated sample sound;
a processed sample sound generating step of acquiring second processing correction values for processing the sample sound based on the second characteristic, and processing the sample sound using the acquired second processing correction values to generate a processed sample sound;
an output control step of controlling the generated processed sample sound to output it from the output unit;
A target sound processing program that causes a computer to execute the above.