JP4702043B2

JP4702043B2 - Digital watermark encoding apparatus, digital watermark decoding apparatus, digital watermark encoding method, digital watermark decoding method, and program

Info

Publication number: JP4702043B2
Application number: JP2005372651A
Authority: JP
Inventors: 邦博須賀
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-12-26
Filing date: 2005-12-26
Publication date: 2011-06-15
Anticipated expiration: 2025-12-26
Also published as: JP2007171833A

Description

The present invention relates to a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program.

One technique for synthesizing speech is called the recording-and-editing method. It is used in station voice guidance systems, in-vehicle navigation devices, and the like.
In the recording-and-editing method, each word is associated in advance with audio data representing a voice reading that word aloud. The text to be synthesized is divided into words, and the audio data associated with those words is retrieved and concatenated (see, for example, Patent Document 1).
Patent Document 1: JP 10-49193 A

One conceivable way to allow the speaker of the synthesized speech obtained by the recording-and-editing method to be changed, or otherwise to diversify the synthesized speech, is to record the audio data on removable media (portable recording media) and to swap, as needed, among multiple removable media holding different audio data. However, audio data recorded on removable media is vulnerable to unauthorized copying, tampering, and other unauthorized use.

The present invention has been made in view of the above circumstances. Its object is to provide a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program that effectively protect audio data while allowing data representing speech to be supplied freely.

To achieve the above object, a digital watermark encoding apparatus according to a first aspect of the present invention comprises:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for processing the acquired audio signal so that the pattern, on the time axis, of frequency shifts of the pitch component of the speech represented by the audio signal represents the value of the data to be embedded as a digital watermark.

The audio signal processing means may comprise, for example:
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal;
non-pitch component extraction means for extracting, from the acquired audio signal, a non-pitch component signal representing the components of the speech other than the pitch component;
pitch component frequency shift means for acquiring the pitch component signal and shifting its frequency so that the shifts form a pattern on the time axis representing the value of the data to be embedded as a digital watermark; and
adding means for acquiring the frequency-shifted pitch component signal and the non-pitch component signal and adding them together to generate a digitally watermarked audio signal.

The pitch component extraction means may comprise, for example:
pitch frequency specifying means for specifying the frequency of the pitch component of the speech represented by the audio signal; and
a band-pass filter that extracts, as the pitch component, the components of the audio signal near the specified frequency.
The non-pitch component extraction means may be, for example, a band rejection filter that removes the components near the specified frequency from the audio signal and extracts the remaining components as the non-pitch component.

A digital watermark decoding apparatus according to a second aspect of the present invention comprises:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
means for specifying the pattern of frequency shifts on the time axis in the extracted pitch component signal and, based on the specified pattern, specifying the data embedded in the audio signal as a digital watermark.

A digital watermark encoding method according to a third aspect of the present invention is executed in an apparatus having an audio signal acquisition unit and an audio signal processing unit, and comprises:
an audio signal acquisition step in which the audio signal acquisition unit acquires an audio signal representing speech; and
an audio signal processing step in which the audio signal processing unit processes the acquired audio signal so that the pattern, on the time axis, of frequency shifts of the pitch component of the speech represented by the audio signal represents the value of the data to be embedded as a digital watermark.

A digital watermark decoding method according to a fourth aspect of the present invention is executed in an apparatus having an audio signal acquisition unit, a pitch component extraction unit, and a specifying unit, and comprises:
an audio signal acquisition step in which the audio signal acquisition unit acquires a digitally watermarked audio signal representing speech;
a pitch component extraction step in which the pitch component extraction unit extracts, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
a specifying step in which the specifying unit specifies the pattern of frequency shifts on the time axis in the extracted pitch component signal and, based on the specified pattern, specifies the data embedded in the audio signal as a digital watermark.

A program according to a fifth aspect of the present invention causes a computer to function as:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for processing the acquired audio signal so that the pattern, on the time axis, of frequency shifts of the pitch component of the speech represented by the audio signal represents the value of the data to be embedded as a digital watermark.

A program according to a sixth aspect of the present invention causes a computer to function as:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
means for specifying the pattern of frequency shifts on the time axis in the extracted pitch component signal and, based on the specified pattern, specifying the data embedded in the audio signal as a digital watermark.

According to the present invention, a digital watermark encoding apparatus, a digital watermark decoding apparatus, a digital watermark encoding method, a digital watermark decoding method, and a program are realized that effectively protect audio data while allowing data representing speech to be supplied freely.

Embodiments of the present invention are described below with reference to the drawings, taking a digital watermark encoding apparatus and a digital watermark decoding apparatus as examples.

(Digital watermark encoding apparatus)
FIG. 1 shows the configuration of a digital watermark encoding apparatus E according to an embodiment of the present invention. As illustrated, the digital watermark encoding apparatus E comprises an audio data input unit E1, a pitch specifying unit E2, a BRF (band rejection filter) E3, a BPF (band-pass filter) E4, a pitch shift unit E5, and an adding unit E6.

The audio data input unit E1, the pitch specifying unit E2, the BRF E3, the BPF E4, the pitch shift unit E5, and the adding unit E6 each consist of, for example, a processor such as a CPU (central processing unit) and a memory that stores the program the processor executes.
A single processor may perform some or all of the functions of the audio data input unit E1, the pitch specifying unit E2, the BRF E3, the BPF E4, the pitch shift unit E5, and the adding unit E6.

The audio data input unit E1 acquires, from outside, input audio data in a digital format (for example, PCM (pulse code modulation) format) representing a speech waveform, and supplies it to the pitch specifying unit E2, the BRF E3, and the BPF E4.

The audio data input unit E1 may acquire the input audio data in any manner. For example, it may acquire the data from an external device (such as a hard disk device) or a network via an interface circuit (not shown), or it may read the data from a recording medium (such as a flexible disk or CD-ROM) set in a recording medium drive device (not shown) via that drive device.

The pitch specifying unit E2 specifies how the frequency of the pitch component of the speech represented by the input audio data supplied from the audio data input unit E1 changes over time, and supplies data representing that change (hereinafter, pitch frequency data) continuously and simultaneously to the BRF E3 and the BPF E4.

Specifically, the pitch specifying unit E2 may specify the change over time of the pitch component frequency by, for example, applying cepstrum analysis to the input audio data. That is, the waveform represented by the input audio data is divided on the time axis into many small segments, and the intensity of each segment is converted to a value substantially equal to the logarithm of its original value (the base of the logarithm is arbitrary). The spectrum of the converted segment (that is, the cepstrum) is then obtained by the fast Fourier transform (or by any other technique that generates data representing the Fourier transform of a discrete variable). Finally, the lowest of the frequencies at which the cepstrum has a local maximum is specified as the frequency of the pitch component in that segment.
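The cepstrum step can be illustrated with a minimal sketch. It uses the common textbook form of the real cepstrum (inverse transform of the log-magnitude spectrum) and, as a simplifying assumption, picks the largest cepstral peak within a plausible pitch range rather than applying the exact selection rule described above. The function names, the Hann window, the 80 to 400 Hz search range, and the direct O(N^2) DFT are all illustrative choices, not details from the patent.

```python
import math

def dft_magnitude(frame):
    """Magnitude spectrum of a real frame via a direct O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the pitch of one frame from its real cepstrum.

    fmin/fmax bound the search and are assumptions of this sketch.
    """
    n = len(frame)
    # Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = [math.log(m + 1e-9) for m in dft_magnitude(windowed)]
    q_lo = int(sample_rate / fmax)   # shortest pitch period, in samples
    q_hi = int(sample_rate / fmin)   # longest pitch period, in samples
    best_q, best_val = q_lo, float("-inf")
    for q in range(q_lo, q_hi + 1):
        # The log-magnitude spectrum of a real signal is even,
        # so its inverse DFT reduces to a cosine sum.
        c = sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                for k in range(n)) / n
        if c > best_val:
            best_q, best_val = q, c
    return sample_rate / best_q

def harmonic_frame(f0, sample_rate, n, harmonics=19):
    """Synthetic voiced-like test frame: harmonics of f0 with 1/k amplitudes."""
    return [sum(math.sin(2 * math.pi * f0 * k * t / sample_rate) / k
                for k in range(1, harmonics + 1))
            for t in range(n)]
```

Calling `estimate_pitch_hz` on successive frames would yield a pitch track comparable in role to the pitch frequency data that the pitch specifying unit E2 supplies to the filters.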

Good results can be expected if the change over time of the pitch component frequency is specified from pitch waveform data obtained by first converting the input audio data according to, for example, the technique disclosed in JP 2003-108172 A. Specifically, the input audio data is filtered to extract a pitch signal; based on the extracted pitch signal, the waveform represented by the input audio data is divided into sections of unit pitch length; and for each section the phase offset is identified from its correlation with the pitch signal and the phases of the sections are aligned, thereby converting the input audio data into a pitch waveform signal. The resulting pitch waveform signal is then treated as the input audio data and subjected to cepstrum analysis or the like to specify the change over time of the pitch component frequency.

The BRF E3 filters its input signal so as to substantially block the components at and near its own center frequency and pass the other components.
Specifically, it sets its center frequency to be substantially equal to the value indicated by the pitch frequency data supplied from the pitch specifying unit E2 (that is, the current value of the pitch component frequency of the input audio data). It then filters the input audio data supplied from the audio data input unit E1 and supplies the filtered input audio data (hereinafter, the non-pitch component) to the adding unit E6.

The BPF E4 filters its input signal so as to pass the components at and near its own center frequency and substantially block the other components.
Specifically, it sets its center frequency to be substantially equal to the value indicated by the pitch frequency data supplied from the pitch specifying unit E2. It then filters the input audio data supplied from the audio data input unit E1 and supplies the filtered input audio data (hereinafter, the pitch component) to the pitch shift unit E5.

The BRF E3 and the BPF E4 perform their filtering so that adding the non-pitch component and the pitch component together yields audio data substantially identical to the input audio data.
The non-pitch component and the pitch component both consist of data in the same format, for example digital data whose sampling interval is substantially the same as that of the input audio data supplied to the BRF E3 and the BPF E4.
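One way to satisfy the requirement that the two components sum back to the input is sketched below. It is an illustration under stated assumptions, not the filters of the embodiment: the band-pass uses the widely published second-order "cookbook" biquad coefficients, the center frequency is fixed for the whole call (the embodiment tracks the pitch over time), and the non-pitch component is formed as input minus band-pass output. With these particular coefficients that difference is itself a second-order notch, so it plays the role of the band rejection filter. The names `bandpass`, `split_pitch`, and the default `q` are hypothetical.

```python
import math

def bandpass(samples, sample_rate, center_hz, q=2.0):
    """Second-order band-pass biquad with 0 dB gain at center_hz."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        # Direct form I difference equation, normalized by a0.
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def split_pitch(samples, sample_rate, pitch_hz):
    """Split a signal into (pitch, non_pitch) whose sum is the input."""
    pitch = bandpass(samples, sample_rate, pitch_hz)
    # Subtracting the band-pass output leaves everything else,
    # so pitch + non_pitch reconstructs the input by construction.
    non_pitch = [x - p for x, p in zip(samples, pitch)]
    return pitch, non_pitch
```

Defining the non-pitch component by subtraction makes the reconstruction property hold regardless of the filter's exact response, which is convenient when the center frequency is updated frame by frame.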

The pitch shift unit E5 acquires the pitch component from the BPF E4 and also acquires, from outside, the binary bit string to be embedded as a digital watermark (for example, data representing the name of the creator of the input audio data as a binary bit string). It then shifts the frequency of the acquired pitch component according to the value of each bit in the bit string, thereby encoding the pitch component so that it carries the information represented by the bit string. The encoded pitch component is supplied to the adding unit E6.

Specifically, as shown in FIG. 2 for example, the pitch shift unit E5 examines the value of each bit in order from the beginning of the bit string. If the bit is “1”, it shifts the frequency of the pitch component by a fixed width (the frequency shown as “Δf” in FIG. 2) for a fixed time (the time shown as “ΔT” in FIG. 2). If the bit is “0”, it leaves the frequency of the pitch component unshifted for the time ΔT. This is repeated until the last bit of the bit string is reached.
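The rule of FIG. 2 amounts to mapping each bit to a per-interval shift amount. A minimal sketch follows; the function names and the most-significant-bit-first byte ordering are assumptions made for illustration, since the text only says the bits are processed from the beginning of the bit string.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first (assumed order)."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def shift_schedule(bits, delta_f):
    """Shift to apply during each interval of length delta-T.

    A 1 bit shifts the pitch component by delta_f; a 0 bit leaves it unshifted.
    """
    return [delta_f if b == 1 else 0.0 for b in bits]
```

For example, `shift_schedule(bytes_to_bits(b"A"), 30.0)` gives one shift value per ΔT interval for the eight bits of the byte; the value 30.0 is purely illustrative.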

The frequency of the pitch component may be shifted by, for example, generating a local oscillation signal whose frequency equals the above value Δf, mixing this local oscillation signal with the pitch component, and filtering the resulting signal to extract the component whose frequency is the sum of, or the difference between, the frequency of the local oscillation signal and the frequency of the pitch signal.
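Why mixing produces the sum and difference frequencies can be seen from the product-to-sum identity, written here for the idealized case of a purely sinusoidal pitch component. The symbol $f_p$ for the pitch frequency is introduced only for this illustration.

```latex
\cos(2\pi f_p t)\,\cos(2\pi \Delta f\, t)
  = \tfrac{1}{2}\cos\bigl(2\pi (f_p + \Delta f)\, t\bigr)
  + \tfrac{1}{2}\cos\bigl(2\pi (f_p - \Delta f)\, t\bigr)
```

With, say, $f_p = 200$ Hz and an illustrative $\Delta f = 30$ Hz, the mixed signal contains components at 230 Hz and 170 Hz; the subsequent filtering keeps whichever of the two is wanted.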

The above value Δf must be large enough that, as a result of the frequency shift applied by the pitch shift unit E5, a frequency difference of at least a certain amount (the amount shown as “ΔF” in FIG. 4, described later) always arises at the boundary between a section of the pitch component that has been shifted and a section that has not. A frequency Δf satisfying this condition may be determined empirically, for example by experiment.

The pitch shift unit E5 may acquire the bit string in any manner, for example in substantially the same manner as the audio data input unit E1 acquires the input audio data. Alternatively, the processor that performs the processing realizing the function of the pitch shift unit E5 may pass a bit string generated by another process it executes to that processing as the bit string to be embedded as a digital watermark.

The adding unit E6 acquires the non-pitch component supplied from the BRF E3 and, while a pitch component is being supplied, acquires that pitch component as well. It then generates and outputs output audio data whose value corresponds to the sum of the simultaneously acquired values of the non-pitch component and the pitch component (the value of the pitch component being taken as 0 while no pitch component is supplied). The output audio data has the same format as the input audio data acquired by the audio data input unit E1.

The output audio data corresponds to the input audio data acquired by the audio data input unit E1 with a digital watermark applied; that is, the information represented by the bit string acquired by the pitch shift unit E5 is embedded in it.

The adding unit E6 may output the output audio data in any manner. For example, it may store the data in a storage area of an external device (such as a hard disk device) via an interface circuit (not shown), send it to a network, or write it to a recording medium set in a recording medium drive device (not shown) via that drive device.

(Digital watermark decoding apparatus)
FIG. 3 shows the configuration of a digital watermark decoding apparatus D according to the embodiment of the present invention. As illustrated, the digital watermark decoding apparatus D comprises an audio data input unit D1, a pitch specifying unit D2, a BPF D3, and a bit string extraction unit D4.

The audio data input unit D1, the pitch specifying unit D2, the BPF D3, and the bit string extraction unit D4 each consist of, for example, a processor such as a CPU and a memory that stores the program the processor executes. A single processor may perform some or all of the functions of the audio data input unit D1, the pitch specifying unit D2, the BPF D3, and the bit string extraction unit D4.

The audio data input unit D1 acquires, from outside, digitally watermarked audio data in digital format corresponding to the output audio data described above (hereinafter, decoding target audio data), and supplies it to the pitch specifying unit D2 and the BPF D3. The audio data input unit D1 may acquire the decoding target audio data in any manner, for example in substantially the same manner as the audio data input unit E1 of the digital watermark encoding apparatus E acquires the input audio data.

The pitch specifying unit D2 specifies the change over time of the frequency of the pitch component of the speech represented by the decoding target audio data supplied from the audio data input unit D1, for example by substantially the same technique as that used by the pitch specifying unit E2 of the digital watermark encoding apparatus E. It then continuously supplies pitch frequency data representing the specified change to the BPF D3.

For the portions of the pitch frequency data that correspond to sections of the decoding target audio data containing no pitch component, the pitch specifying unit D2 may interpolate the data by, for example, Lagrange interpolation or linear interpolation.

The BPF D3 performs substantially the same function as, for example, the BPF E4 of the digital watermark encoding apparatus E: it filters its input signal so as to pass the components at and near its own center frequency and substantially block the other components.
It then supplies the filtered decoding target audio data (hereinafter, the decoding pitch component) to the bit string extraction unit D4.

The bit string extraction unit D4 acquires the decoding pitch component supplied from the BPF D3 and, based on the pattern of frequency shifts in the decoding pitch component, extracts the bit string embedded in the decoding target audio data. It then outputs the extracted bit string to the outside.

Specifically, when the decoding target audio data has been watermarked on its pitch component in the manner described above with reference to FIG. 2, the bit string extraction unit D4 may proceed as shown in FIG. 4, for example. Each time a time point arriving at intervals of ΔT from the beginning of the decoding pitch component passes, it determines whether the frequency of the pitch component immediately after that time point is higher by ΔF or more, or lower by ΔF or more, than the frequency of the pitch component at or immediately before that time point (the example in FIG. 4 represents the case where ΔF is positive). If the frequency is higher by ΔF or more, the unit judges that the section of length ΔT following that time point represents a bit of value “1” and generates a “1” bit. If the frequency is lower by ΔF or more, it judges that the section of length ΔT following that time point represents a bit of value “0” and generates a “0” bit. The bit string is extracted by performing this processing in sequence.
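The boundary test of FIG. 4 can be sketched as follows. The sketch makes several assumptions that the text does not state: the pitch track has already been reduced to one representative frequency per ΔT interval, the value of the first bit is supplied by the caller (there is no earlier interval to compare against), and when the jump at a boundary is smaller than ΔF in magnitude the previous bit is repeated, which is how consecutive equal bits would appear under the FIG. 2 rule. The function name is hypothetical.

```python
def decode_level_shift(freqs, delta_F, first_bit=0):
    """Recover bits from per-interval pitch frequencies (FIG. 2 style embedding).

    freqs     -- one representative pitch frequency per delta-T interval (assumed input)
    delta_F   -- minimum jump treated as a deliberate shift
    first_bit -- assumed value of the first interval's bit
    """
    bits = [first_bit]
    for prev, cur in zip(freqs, freqs[1:]):
        jump = cur - prev
        if jump >= delta_F:
            bits.append(1)          # frequency rose: shifted section begins
        elif jump <= -delta_F:
            bits.append(0)          # frequency fell: unshifted section begins
        else:
            bits.append(bits[-1])   # no significant jump: same bit as before (assumption)
    return bits
```

Because only jumps larger than ΔF count, slow natural drift of the pitch between intervals does not by itself flip a bit in this sketch.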

The bit string extraction unit D4 may output the bit string in any manner. For example, it may store the bit string in a storage area of an external device via an interface circuit (not shown), send it to a network, or write it to a recording medium set in a recording medium drive device (not shown) via that drive device.

The digital watermark encoding apparatus E described above separates audio data into a pitch component and a non-pitch component, shifts the frequency of the pitch component according to the values of the bit string to be embedded, and then adds the pitch component and the non-pitch component together, thereby embedding the information represented by the bit string in the audio data (that is, applying a digital watermark to the audio data). The digital watermark decoding apparatus D extracts from the audio data the bit string embedded in it by the digital watermark encoding apparatus E.

In general, the frequency of the pitch component of human speech is around 200 Hz. Although the human audible range extends from approximately 20 to 20,000 Hz, sensitivity to components around 200 Hz is low, and changes made to such components are difficult to hear. The difference between speech reproduced from audio data watermarked by the digital watermark encoding apparatus E and speech reproduced from the same audio data before watermarking is therefore difficult for a human listener to detect. In other words, the effect of the watermark applied by the digital watermark encoding apparatus E on the speech is kept to a negligible level.

The digital watermark encoding apparatus E applies the watermark by shifting the frequency of part of the pitch component of the audio data. A person who does not know the width of this shift therefore cannot fully restore, from audio data watermarked by the digital watermark encoding apparatus E, the audio data as it was before watermarking.
The embedded bit string can also be extracted from data obtained by making an analog copy of audio data watermarked by the digital watermark encoding apparatus E.

The configurations of the digital watermark encoding apparatus E and the digital watermark decoding apparatus D are not limited to those described above.
For example, the audio data input unit E1 may include a microphone, an amplifier, a sampling circuit, an A/D (Analog-to-Digital) converter, a PCM encoder, and the like. In this case, the audio data input unit E1 may create the audio data by amplifying the audio signal representing the sound picked up by its microphone, sampling and A/D-converting it, and then applying PCM modulation to the sampled audio signal.

Further, the rule by which the pitch shift unit E5 shifts the frequency of the pitch component is not limited to that shown in FIG. 2. For example, as shown in FIG. 5, the pitch shift unit E5 may examine the value of each bit in order from the beginning of the bit string to be embedded. If the value of the bit is "1", it shifts the frequency of the pitch component by a fixed width, for a fixed time ΔT, in the direction opposite to the previous shift (that is, by (−Δf) if the previous shift was Δf, and by Δf if the previous shift was (−Δf)). If the value is "0", it continues the fixed-width shift in the same direction as the previous shift for the time ΔT. This operation is repeated until the last bit of the bit string is reached.
In this case, as shown for example in FIG. 6, each time a time point arriving at intervals of ΔT from the beginning of the decoding pitch component has passed, the bit string extraction unit D4 may determine whether the frequencies of the pitch component before and after that time point differ by ±ΔF or more. When it determines that there is a difference of ±ΔF or more, it judges that the change in the frequency of the pitch component at that time point represents a bit of value "1" and generates a bit of value "1". When it determines that there is no such difference, it judges that the frequency of the pitch component at that time point represents a bit of value "0" and generates a bit of value "0". The bit string is extracted by performing this processing sequentially.
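The alternative rule of FIG. 5 and FIG. 6 behaves like a differential code: a "1" reverses the shift direction and a "0" keeps it, so the decoder only has to look for a jump at each ΔT boundary. The following is a minimal sketch of that idea, not the implementation of the pitch shift unit E5 or the bit string extraction unit D4. It works on one frequency value per ΔT slot rather than on audio samples, and it rests on assumptions the description does not state: a reference slot precedes the first bit so that the first boundary has a "previous shift", the unshifted pitch is constant, and the threshold ΔF is set equal to Δf. The function names `encode_shifts` and `decode_shifts` and all numeric values are hypothetical.

```python
from typing import List


def encode_shifts(bits: List[int], delta_f: float) -> List[float]:
    """Return one frequency offset per slot of length ΔT.

    The first entry is an assumed reference slot shifted by +delta_f;
    the description does not say how the first "previous shift" is chosen.
    """
    shift = delta_f
    shifts = [shift]
    for bit in bits:
        if bit == 1:
            shift = -shift  # "1": reverse the direction of the previous shift
        # "0": keep shifting in the same direction as before
        shifts.append(shift)
    return shifts


def decode_shifts(slot_freqs: List[float], threshold: float) -> List[int]:
    """Compare the pitch frequency before and after each ΔT boundary."""
    bits = []
    for before, after in zip(slot_freqs, slot_freqs[1:]):
        # A jump of at least the threshold (ΔF) is read as "1", otherwise "0".
        bits.append(1 if abs(after - before) >= threshold else 0)
    return bits


base_pitch = 200.0  # Hz, assumed constant for this illustration
delta_f = 5.0       # Hz, hypothetical shift width Δf
payload = [1, 0, 0, 1, 1, 0]

shifts = encode_shifts(payload, delta_f)
slot_freqs = [base_pitch + s for s in shifts]
recovered = decode_shifts(slot_freqs, threshold=delta_f)  # assumes ΔF = Δf

print(shifts)
print(recovered)
```

Because a reversal changes the frequency by 2Δf, any threshold above the natural slot-to-slot pitch variation and no greater than 2Δf would separate the two cases in this idealized setting. Real speech has a varying pitch, so an actual decoder would need a more robust comparison than this one.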

Although an embodiment of the present invention has been described above, the digital watermark encoding apparatus and the digital watermark decoding apparatus according to the present invention can be realized with an ordinary computer system rather than a dedicated system.

For example, the digital watermark encoding apparatus E that executes the processing described above can be configured by installing, on a personal computer, a program for causing it to execute the operations of the audio data input unit E1, the pitch identification unit E2, the BRF E3, the BPF E4, the pitch shift unit E5, and the addition unit E6, from a recording medium (a CD-ROM, a flexible disk, or the like) that stores the program.

Similarly, the digital watermark decoding apparatus D that executes the processing described above can be configured by installing, on a personal computer, a program for causing it to execute the operations of the audio data input unit D1, the pitch identification unit D2, the BPF D3, and the bit string extraction unit D4, from a recording medium (a CD-ROM, a flexible disk, or the like) that stores the program.

A program that causes a personal computer to perform the functions of the digital watermark encoding apparatus E or the digital watermark decoding apparatus D may, for example, be uploaded to a bulletin board system (BBS) on a communication line and distributed via the communication line. Alternatively, a carrier wave may be modulated with a signal representing the program, the resulting modulated wave may be transmitted, and the apparatus that receives the modulated wave may demodulate it to restore the program.
The processing described above can then be executed by starting the program and running it under the control of the OS in the same way as other application programs.

When the OS takes on part of the processing, or when the OS constitutes part of one component of the present invention, the recording medium may store the program with that part excluded. In this case as well, in the present invention, the recording medium is regarded as storing a program for executing each function or step that the computer executes.

FIG. 1 is a block diagram showing the configuration of the digital watermark encoding apparatus according to the embodiment of the present invention.
FIG. 2 is a graph showing how the pitch component is processed.
FIG. 3 is a block diagram showing the configuration of the digital watermark decoding apparatus according to the embodiment of the present invention.
FIG. 4 is a graph explaining how the bit string is extracted.
FIG. 5 is a graph showing another way in which the pitch component is processed.
FIG. 6 is a graph explaining another way in which the bit string is extracted.

Explanation of symbols

E Digital watermark encoding apparatus
E1 Audio data input unit
E2 Pitch identification unit
E3 BRF
E4 BPF
E5 Pitch shift unit
E6 Addition unit
D Digital watermark decoding apparatus
D1 Audio data input unit
D2 Pitch identification unit
D3 BPF
D4 Bit string extraction unit

Claims

A digital watermark encoding apparatus comprising:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for processing the acquired audio signal so that the pattern, on the time axis, of the frequency shift of the pitch component of the speech represented by the audio signal represents the value of data to be embedded as a digital watermark.

The digital watermark encoding apparatus according to claim 1, wherein the audio signal processing means comprises:
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal;
non-pitch component extraction means for extracting, from the acquired audio signal, a non-pitch component signal representing the components other than the pitch component of the speech represented by the audio signal;
pitch component frequency shift means for acquiring the pitch component signal and shifting its frequency so that the shift has a pattern on the time axis that represents the value of the data to be embedded as a digital watermark; and
adding means for acquiring the frequency-shifted pitch component signal and the non-pitch component signal and adding them together to generate a digitally watermarked audio signal.

The digital watermark encoding apparatus according to claim 2, wherein
the pitch component extraction means comprises:
pitch frequency specifying means for specifying the frequency of the pitch component of the speech represented by the audio signal; and
a band-pass filter that extracts, as the pitch component, the component of the audio signal in the vicinity of the specified frequency, and
the non-pitch component extraction means comprises a band rejection filter that extracts, as the non-pitch component, the residual component obtained by removing the component in the vicinity of the specified frequency from the audio signal.

A digital watermark decoding apparatus comprising:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
specifying means for specifying the pattern, on the time axis, of the frequency shift of the extracted pitch component signal and, based on the specified pattern, specifying the data embedded as a digital watermark in the audio signal.

A digital watermark encoding method executed in an apparatus having an audio signal acquisition unit and an audio signal processing unit, the method comprising:
an audio signal acquisition step in which the audio signal acquisition unit acquires an audio signal representing speech; and
an audio signal processing step in which the audio signal processing unit processes the acquired audio signal so that the pattern, on the time axis, of the frequency shift of the pitch component of the speech represented by the audio signal represents the value of data to be embedded as a digital watermark.

A digital watermark decoding method executed in an apparatus having an audio signal acquisition unit, a pitch component extraction unit, and a specifying unit, the method comprising:
an audio signal acquisition step in which the audio signal acquisition unit acquires a digitally watermarked audio signal representing speech;
a pitch component extraction step in which the pitch component extraction unit extracts, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
a specifying step in which the specifying unit specifies the pattern, on the time axis, of the frequency shift of the extracted pitch component signal and, based on the specified pattern, specifies the data embedded as a digital watermark in the audio signal.

A program for causing a computer to function as:
audio signal acquisition means for acquiring an audio signal representing speech; and
audio signal processing means for processing the acquired audio signal so that the pattern, on the time axis, of the frequency shift of the pitch component of the speech represented by the audio signal represents the value of data to be embedded as a digital watermark.

A program for causing a computer to function as:
audio signal acquisition means for acquiring a digitally watermarked audio signal representing speech;
pitch component extraction means for extracting, from the acquired audio signal, a pitch component signal representing the pitch component of the speech represented by the audio signal; and
specifying means for specifying the pattern, on the time axis, of the frequency shift of the extracted pitch component signal and, based on the specified pattern, specifying the data embedded as a digital watermark in the audio signal.