JPH0442679B2


Info

Publication number: JPH0442679B2
Application number: JP60243566A
Authority: JP
Inventors: Shinsuke Sakai; Katsunobu Fushikida
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1985-10-30
Filing date: 1985-10-30
Publication date: 1992-07-14
Also published as: JPS62102296A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech editing device for use in voice storage services and the like.

(Prior Art and Its Problems) Conventional voice storage services have applied processing such as reducing the amount of data by encoding, but have not processed the content of the speech itself. It has therefore been impossible, for example, to extract and store only the keywords from input speech. For instance, the paper titled "Concept of a Voice Storage Service," published in the Institute of Electronics and Communication Engineers technical research report SE79-98, discusses the storage and playback of speech but performs no processing that looks into the content of the speech.

Accordingly, an object of the present invention is to provide a speech editing device with a keyword extraction function that detects accent phrase boundaries in input speech and selects and stores only the accent phrases uttered with emphasis.

(Means for Solving the Problems) To solve the problems described above, the speech editing device provided by the present invention comprises: pitch extraction means for extracting a time series of pitch frequencies of input speech; a waveform temporary memory for temporarily storing the input speech; means for detecting the boundaries of the accent phrases of the input speech from the time series of pitch frequencies; means for calculating the degree of emphasis of each accent phrase; and means for selecting accent phrases according to the magnitude of their degree of emphasis and outputting the speech waveforms corresponding to the selected accent phrases from the waveform temporary memory.

(Principle of the Invention) The principle of the present invention is now described. Japanese speech is composed of accent phrases, each of which is a unit of utterance containing one accent nucleus. The speech editing device according to the present invention estimates accent phrase boundaries from the intervals in which the pitch frequency forms a valley, and selects an estimated accent phrase if its average pitch level is sufficiently high. In this way it extracts, stores, and outputs only the emphasized accent phrases. The principle is explained below using an example.

FIG. 2 shows an example of how the pitch frequency changes over time for the utterance 「子供の時から損ばかりしている」 ("I have been doing nothing but losing out ever since I was a child"). The vertical axis is the pitch frequency of the input speech, and the horizontal axis is time. This pitch frequency time series can be obtained, for example, by the method described in the paper titled "A Method for Optimal Selection of Pitch Period Sequences," Proceedings of the Acoustical Society of Japan (October 1977), pp. 35-36. Portions where the pitch frequency contour forms a gentle valley are detected as accent phrase boundaries. A gentle valley can be obtained, for example, as a local minimum of the moving average $\bar{p}(i) = \frac{p(i-1) + 2p(i) + p(i+1)}{4}$ of the pitch time series $p(i)$, $i = 1, \ldots, N$. Points a and b in FIG. 2 are examples of accent phrase boundaries. The dash-dotted lines indicate the average pitch level of each accent phrase. An accent phrase is selected if, for example, its average pitch level exceeds a threshold θ.
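The valley search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `smooth_pitch` and `find_valleys`, the edge handling (the first and last values are repeated so that the moving average is defined at both ends), the tie-breaking rule for flat minima, and the sample contour are all hypothetical.

```python
from typing import List


def smooth_pitch(p: List[float]) -> List[float]:
    """Weighted moving average (p[i-1] + 2*p[i] + p[i+1]) / 4.

    Assumption: the contour is padded by repeating its first and last
    values, so the output has the same length as the input.
    """
    if not p:
        return []
    padded = [p[0]] + list(p) + [p[-1]]
    return [
        (padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
        for i in range(1, len(padded) - 1)
    ]


def find_valleys(p: List[float]) -> List[int]:
    """Return frame indices where the smoothed contour has a local minimum.

    Assumption: on a flat minimum, only the first frame is reported.
    """
    s = smooth_pitch(p)
    return [
        i
        for i in range(1, len(s) - 1)
        if s[i] < s[i - 1] and s[i] <= s[i + 1]
    ]


# Hypothetical pitch contour in Hz, one value per analysis frame.
contour = [120, 140, 150, 130, 100, 110, 150, 170, 140, 105, 120, 130]
print(find_valleys(contour))  # [4, 9]
```

Smoothing before the minimum search suppresses small frame-to-frame fluctuations, so that only the gentle valleys between phrases are reported as boundary candidates.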

As described above, in the present invention the input speech is divided into accent phrases, and only those whose pitch level is sufficiently high are selected, stored, and output. The portions of the speech uttered with emphasis can thus be stored and output selectively, yielding a speech editing device that also has a data compression effect.
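The selection step and its data compression effect might look like the sketch below. It assumes that the average pitch of each phrase serves as the degree of emphasis, that every frame carries a pitch value (unvoiced frames are not modeled), and that boundaries are given as frame indices. The names `select_emphasized` and `retained_fraction` and the sample values are hypothetical.

```python
from typing import List, Tuple


def select_emphasized(
    pitch: List[float], boundaries: List[int], theta: float
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges whose mean pitch exceeds theta."""
    edges = [0] + sorted(boundaries) + [len(pitch)]
    selected = []
    for start, end in zip(edges, edges[1:]):
        if end <= start:
            continue  # skip empty segments
        mean_pitch = sum(pitch[start:end]) / (end - start)
        if mean_pitch > theta:
            selected.append((start, end))
    return selected


def retained_fraction(selected: List[Tuple[int, int]], total_frames: int) -> float:
    """Fraction of frames kept after selection (smaller means more compression)."""
    kept = sum(end - start for start, end in selected)
    return kept / total_frames if total_frames else 0.0


# Hypothetical contour with three phrases; only the middle one is emphasized.
pitch = [100, 110, 100, 160, 180, 170, 120, 110]
selected = select_emphasized(pitch, boundaries=[3, 6], theta=130)
print(selected)                                 # [(3, 6)]
print(retained_fraction(selected, len(pitch)))  # 0.375
```

In this toy case only three of the eight frames are kept, which illustrates how discarding non-emphasized phrases reduces the amount of stored speech.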

(Embodiment) An embodiment of the present invention is now described with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention. Input speech 1 is held in waveform temporary memory 5 and is also supplied to pitch extraction unit 2. Pitch extraction unit 2 computes the time series of pitch frequencies from the supplied input speech 1 and supplies it to accent phrase boundary detection unit 3. Accent phrase boundary detection unit 3 detects the valley portions in the supplied pitch frequency time series and outputs the waveform addresses of those portions to boundary address memory 4. Accent phrase boundary detection unit 3 also outputs the pitch frequency time series, divided into accent phrases, to pitch level calculation unit 6. Pitch level calculation unit 6 calculates the average pitch level of each accent phrase. Accent phrase selection unit 7 determines whether the average pitch level of each accent phrase supplied from pitch level calculation unit 6 is sufficiently high, and outputs information indicating whether each accent phrase is selected. From the speech waveform held in waveform temporary memory 5, only the waveforms corresponding to the selected accent phrases are output to waveform memory 10, based on the waveform addresses of the accent phrase boundaries supplied from boundary address memory 4 and the selection information supplied from accent phrase selection unit 7. When output command 8 is input, control circuit 9 causes waveform memory 10 to output the speech waveform, one accent phrase at a time, to boundary processing unit 11. Boundary processing unit 11 adds pause intervals as appropriate to the supplied per-phrase speech waveforms and produces speech output 12.
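The waveform side of this data flow (boundary addresses, copying of selected phrases, and pause insertion) can be illustrated with the following sketch. It is not the circuit of FIG. 1: it assumes that frame i of the pitch series starts at sample i * hop, that the selection information is one boolean per phrase, and that the pause added at each boundary is a fixed run of zero samples. All function names and values are hypothetical.

```python
from typing import List, Sequence


def frames_to_addresses(frame_indices: Sequence[int], hop: int) -> List[int]:
    """Convert boundary frame indices to waveform sample addresses.

    Assumption: frame i starts at sample i * hop.
    """
    return [i * hop for i in frame_indices]


def copy_selected(
    waveform: Sequence[float], addresses: Sequence[int], keep: Sequence[bool]
) -> List[List[float]]:
    """Copy only the selected phrases out of the temporary waveform buffer."""
    edges = [0] + list(addresses) + [len(waveform)]
    segments = list(zip(edges, edges[1:]))
    return [list(waveform[s:e]) for (s, e), k in zip(segments, keep) if k]


def join_with_pauses(phrases: Sequence[Sequence[float]], pause_len: int) -> List[float]:
    """Concatenate phrases, inserting pause_len zero samples between them."""
    out: List[float] = []
    for n, phrase in enumerate(phrases):
        if n > 0:
            out.extend([0] * pause_len)
        out.extend(phrase)
    return out


# Hypothetical 10-sample waveform with boundaries at frames 1 and 3 (hop = 2).
waveform = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
addresses = frames_to_addresses([1, 3], hop=2)      # [2, 6]
phrases = copy_selected(waveform, addresses, keep=[False, True, True])
print(join_with_pauses(phrases, pause_len=2))
# [12, 13, 14, 15, 0, 0, 16, 17, 18, 19]
```

Keeping the selected phrases as separate lists mirrors the per-phrase output from the waveform memory, so the pause length can be chosen at output time rather than at storage time.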

(Effects of the Invention) According to the present invention, a speech editing device is obtained that has a keyword extraction function that detects accent phrase boundaries in input speech and selects and stores only the accent phrases uttered with emphasis.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a diagram showing an example of a pitch frequency time series, used to explain the principle of the present invention.

2: pitch extraction unit; 3: accent phrase boundary detection unit; 4: boundary address memory; 5: waveform temporary memory; 6: pitch level calculation unit; 7: accent phrase selection unit; 9: control circuit; 10: waveform memory; 11: boundary processing unit.

Claims

[Claims]

1. A speech editing device comprising: pitch extraction means for extracting a time series of pitch frequencies of input speech; a waveform temporary memory for temporarily storing the input speech; means for detecting boundaries of accent phrases of the input speech from the time series of pitch frequencies; means for calculating a degree of emphasis of each accent phrase; and means for selecting accent phrases according to the magnitude of their degree of emphasis and outputting, from the waveform temporary memory, the speech waveforms corresponding to the selected accent phrases.