JP5228300B2 - Audio expansion device, audio expansion method, and program

Info

Publication number: JP5228300B2
Application number: JP2006218887A
Authority: JP
Inventors: 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2006-08-10
Filing date: 2006-08-10
Publication date: 2013-07-03
Anticipated expiration: 2026-08-10
Also published as: JP2008046160A

Abstract

PROBLEM TO BE SOLVED: To reduce the degradation of voice quality caused by voice expansion and contraction processing.

SOLUTION: An audio processing device 1101 expands or contracts a voice signal by the factor specified via an input device 1111. Expansion and contraction are applied per predetermined unit waveform. A control section 1121 assigns to each predetermined unit waveform a priority order for inserting a new unit waveform or replacing it with a new unit waveform, and performs the insertion or replacement in that order, so that the voice signal is expanded or contracted gradually. Each new unit waveform is created from the original unit waveforms, but identical unit waveforms are never placed consecutively, so that the processed voice does not sound unnatural.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an audio expansion device, an audio expansion method, and a program that process an audio signal in the time domain to expand or contract it.

One process for transforming an audio signal is to expand or contract its length (number of samples) without changing its amplitude or frequency characteristics.

For example, slowing down the playback of a hard-to-hear passage in English conversation teaching material requires expanding the audio signal.

Conversely, grasping the content of a recorded meeting or lecture in a short time requires contracting the audio signal.

To expand an audio signal, a method has been disclosed that generates a new partial waveform based on two consecutive partial waveforms and inserts it between them (Patent Document 1). Hereinafter, a partial waveform obtained by dividing an audio signal according to some rule may be called a segment waveform. One disclosed way of performing this division uses the pitch waveform as the unit (Patent Document 2).

The simplest way to expand a speech waveform by inserting a new segment waveform is to use an unmodified copy of one of the two consecutive segment waveforms as the new segment waveform.

However, the expanded speech then contains intervals in which exactly the same segment waveform repeats, and it sounds unnatural.

This drawback can be overcome by making the new segment waveform an intermediate waveform between the segment waveforms before and after the insertion point, for example a weighted average waveform.

Even so, care is needed when the expansion factor exceeds 2. In that case, some boundaries between two consecutive segment waveforms in the original audio signal require the insertion of multiple new waveforms. If those new waveforms are identical, the expanded speech again contains intervals in which exactly the same segment waveform repeats, and it sounds unnatural.

To contract an audio signal, on the other hand, the simplest approach is to thin out segment waveforms as appropriate.

However, if segment waveforms are simply thinned out, segment waveforms that were not consecutive in the original audio signal become adjacent. The waveform of the audio signal then changes abruptly at that junction, and the contracted speech sounds unnatural.

Patent documents cited: JP 2005-275010 A; JP-A-9-258762

As described above, expansion risks degrading speech quality because identical waveforms appear consecutively, while contraction risks degrading it because abrupt changes arise in the audio signal.

The present invention has been made in view of the above problems, and its object is to provide an audio expansion device, an audio expansion method, and a program that reduce degradation of the speech waveform even when the expansion factor exceeds 2.

The audio expansion device according to the present invention comprises:
a storage unit that stores a speech waveform that is a time series of segment waveforms;
new segment waveform generation means that, for two consecutive segment waveforms in the speech waveform stored in the storage unit, generates one new segment waveform by weighted addition, using a weighting coefficient that increases linearly in time for the front segment waveform and a weighting coefficient that decreases linearly in time for the rear segment waveform;
insertion segment waveform generation means that expands or contracts, on the time axis, the new segment waveform generated by the new segment waveform generation means, thereby generating a plurality of insertion segment waveforms whose lengths differ from one another and lie between the length of the front segment waveform and the length of the rear segment waveform;
insertion means that inserts the plurality of insertion segment waveforms generated by the insertion segment waveform generation means between the front segment waveform and the rear segment waveform; and
speech waveform output means that outputs the speech waveform expanded by the insertion of the insertion segment waveforms by the insertion means.

The audio expansion method according to the present invention is a method for expanding a speech waveform that is a time series of segment waveforms, and comprises:
a new segment waveform generation step of generating, for two consecutive segment waveforms in the speech waveform, one new segment waveform by weighted addition, using a weighting coefficient that increases linearly in time for the front segment waveform and a weighting coefficient that decreases linearly in time for the rear segment waveform;
an insertion segment waveform generation step of expanding or contracting, on the time axis, the new segment waveform generated in the new segment waveform generation step, thereby generating a plurality of insertion segment waveforms whose lengths differ from one another and lie between the length of the front segment waveform and the length of the rear segment waveform; and
an insertion step of inserting the plurality of insertion segment waveforms generated in the insertion segment waveform generation step between the front segment waveform and the rear segment waveform.

The program according to the present invention causes a computer of an audio expansion device, which expands a speech waveform that is a time series of segment waveforms stored in a storage unit, to function as:
new segment waveform generation means that, for two consecutive segment waveforms in the speech waveform stored in the storage unit, generates one new segment waveform by weighted addition, using a weighting coefficient that increases linearly in time for the front segment waveform and a weighting coefficient that decreases linearly in time for the rear segment waveform;
insertion segment waveform generation means that expands or contracts, on the time axis, the new segment waveform generated by the new segment waveform generation means, thereby generating a plurality of insertion segment waveforms whose lengths differ from one another and lie between the length of the front segment waveform and the length of the rear segment waveform;
insertion means that inserts the plurality of insertion segment waveforms generated by the insertion segment waveform generation means between the front segment waveform and the rear segment waveform; and
speech waveform output means that outputs the speech waveform expanded by the insertion of the insertion segment waveforms by the insertion means.

According to the present invention, degradation of the expanded speech waveform can be reduced even when the speech waveform is expanded by a factor of 2 or more.

(Embodiment 1)
Embodiment 1 of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an audio processing device, in particular an audio expansion device, according to Embodiment 1 of the present invention.

As shown in FIG. 1, the audio processing device 1101 consists of an information processing device such as a computer. An input device 1111, an output device 1113, and a recording medium 1115 are connected to the audio processing device 1101. On receiving an instruction from the input device 1111, the audio processing device 1101 expands the audio waveform data read from the recording medium 1115 to the specified multiple of its length and outputs the result to the recording medium 1115.

Here, audio waveform data is sample-value data in which analog audio has been quantized at a predetermined sampling frequency (for example, 8 kHz).

The recording medium 1115 is, for example, a CD-RW (Compact Disk ReWritable) disc. It stores the original audio waveform data as well as the audio waveform data obtained by expanding that data to the specified multiple of its length.

The audio processing device 1101 includes a control unit 1121, an input control unit 1131, an output control unit 1133, a program storage unit 1135, a storage unit 1137, and a data recording unit 1139.

The control unit 1121 includes, for example, a CPU (Central Processing Unit), registers, and a RAM (Random Access Memory). Based on an operation program stored in advance in the program storage unit 1135, it controls each part of the audio processing device 1101, reads the audio waveform data stored on the recording medium 1115 via the data recording unit 1139, writes the expanded audio waveform data to the recording medium 1115, and executes the waveform expansion process described later.

The input control unit 1131 is connected to an input device 1111 such as a keyboard or a pointing device, accepts instructions for the control unit 1121 entered on the input device 1111, and passes them to the control unit 1121.

The output control unit 1133 is connected to an output device 1113 such as a display or a speaker, and outputs the processing results of the control unit 1121 to the output device 1113 as necessary.

The program storage unit 1135 consists of a ROM (Read Only Memory) or the like and stores the program executed by the control unit 1121.

The storage unit 1137 consists of a storage device such as a hard disk drive or a RAM (Random Access Memory), and temporarily stores the audio waveform data sent from the data recording unit 1139 and the audio waveform data after waveform expansion. The storage unit 1137 sends the temporarily stored audio waveform data to the control unit 1121, or to the data recording unit 1139 via the control unit 1121.

The data recording unit 1139 is, for example, a CD-RW drive. Following instructions from the control unit 1121, it reads the audio waveform data stored on the recording medium 1115 and writes the expanded audio waveform data to the recording medium 1115.

The control unit 1121 performs the waveform expansion process on the audio waveform data temporarily stored in the storage unit 1137 and stores the expanded audio waveform data in the storage unit 1137. In this process, the control unit 1121 divides the audio waveform data into segment waveforms, one per repetition unit. At a boundary between segment waveforms it generates a new segment waveform based on the segment waveforms before and after that boundary and inserts it there, thereby lengthening the audio waveform data. Repeating this insertion eventually yields audio waveform data of the specified multiple. Because the waveform data is discrete, when the specified multiple is M, the expanded audio waveform data has M times as many samples as the original audio waveform data being processed.

In the present embodiment, the control unit 1121 divides the original speech waveform using the pitch as the repetition unit. As a result, inside the audio processing device 1101 the original audio signal is treated as a sequence of $N$ consecutive pitch waveforms $S_0, S_1, \ldots, S_{N-1}$, as shown in FIG. 2.

As shown in FIG. 2, let the sampling time interval be $q$. For example, if the sampling frequency is the 8 kHz mentioned above, $q$ is its reciprocal, 125 μs.

The pitch waveform $S_j$ ($0 \le j \le N-1$) is represented inside the audio processing device 1101 by $pl(j)$ discrete data values. The pitch waveform $S_j$ therefore has the time length $pl(j) \times q$. Hereinafter, this may be stated simply as: the length of the pitch waveform is $pl(j)$.

Thus, while the sampling time interval $q$ is constant, the lengths of the pitch waveforms are generally close to one another but not constant. As explained in detail later, this is useful when expanding the audio signal by a factor of 2 or more.

The pitch waveform $S_j$ ($0 \le j \le N-1$) is represented by the sequence of wave heights at its $pl(j)$ sampling points, $\{s_{j,0}, s_{j,1}, \ldots, s_{j,i}, \ldots, s_{j,pl(j)-1}\}$. Below, this is written $S_j = \{s_{j,0}, \ldots, s_{j,pl(j)-1}\}$.

For example, in FIG. 2, which shows the time variation of the audio signal, the $i$-th sampling point of the $j$-th pitch waveform $S_j$ (the white square in FIG. 2) lies at a time $(i-1) \times q$ after the start of $S_j$, and its wave height is $s_{j,i-1}$. More concretely, if the start of the audio signal is taken as the time origin, the wave height at time $7q$, while the first pitch waveform is still in progress, is $s_{0,7}$ (the eighth white circle from the left in FIG. 2).
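
The indexing convention above can be sketched in a few lines of Python. This is only an illustration: the list-of-lists representation and the names `SAMPLING_HZ`, `Q` and `sample_time` are assumptions made for the example and do not appear in the patent.

```python
# Hypothetical representation: the signal is a list of pitch waveforms,
# and each pitch waveform S_j is a list of wave heights s_{j,0} .. s_{j,pl(j)-1}.
SAMPLING_HZ = 8000          # example sampling frequency from the text
Q = 1.0 / SAMPLING_HZ       # sampling interval q in seconds (125 microseconds)


def sample_time(pitch_waveforms, j, i):
    """Time in seconds of sample s_{j,i}, measured from the start of the signal."""
    preceding = sum(len(w) for w in pitch_waveforms[:j])  # samples before S_j
    return (preceding + i) * Q
```

With a first pitch waveform of at least eight samples, `sample_time(waves, 0, 7)` returns `7 * Q`, matching the wave height $s_{0,7}$ at time $7q$ in the example above.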

Waveform expansion is performed in units of pitch. Basically, the signal is expanded by repeatedly inserting some pitch waveform between two adjacent pitch waveforms. In principle, such repetition can expand the original audio waveform data to any length.

Clearly, the inserted pitch waveform should be correlated in some way with one or both of the two pitch waveforms between which it is inserted. If an abrupt pitch waveform entirely uncorrelated with those two pitch waveforms were inserted, the audio signal would lose its smoothness at that point and the sound quality would deteriorate.

One way to give the new pitch waveform a correlation with one or both of the two existing pitch waveforms is simply to use a copy of either of them as the new pitch waveform.

However, inserting such a copy makes exactly the same pitch waveform occur twice in succession. In general, when human speech is expanded by repeating identical waveforms, the result sounds unnatural.

As described above, when expanding in units of pitch waveforms, inserting a completely abrupt waveform between the two original pitch waveforms is obviously inappropriate, and inserting a mere copy of either of them is also undesirable. In other words, the new waveform to be inserted must not be entirely unrelated in shape to the two pitch waveforms that were adjacent before the insertion, nor may it resemble either of them too closely.

To meet this requirement, in the present embodiment, as shown in FIG. 3, the two original pitch waveforms that bound the intended insertion point (FIG. 3(a)) are each subjected to a suitable waveform modification (FIG. 3(b)), the modified waveforms are superimposed to generate a new pitch waveform with a shape intermediate between the two original pitch waveforms (FIG. 3(c)), and the new pitch waveform is inserted to expand the audio signal (FIG. 3(d)).

The waveform modification described above (FIG. 3(b)) may be any process, provided that the new pitch waveform produced by the subsequent superposition (FIG. 3(c)) has a shape intermediate between the two original pitch waveforms (FIG. 3(a)). A simple method using weighting is described later.

Even if one new pitch waveform is inserted at every boundary between the original $N$ pitch waveforms, the expansion factor reaches only 2. To expand beyond a factor of 2, multiple new pitch waveforms must be inserted at one or more of the boundaries between the original pitch waveforms.

Here, the requirement stated above, that exactly identical waveforms must not occur in succession, becomes an issue again. When multiple new pitch waveforms are inserted at one boundary between original pitch waveforms, they must of course differ from every original pitch waveform, and they must also differ from one another, at least where they become adjacent after expansion. This means that when new waveforms are generated from the two original pitch waveforms by the procedure shown in FIG. 3, more than one way of generating them is needed. Because a single generation method does not suffice, the processing becomes more complex.

Accordingly, the simpler case, in which the expansion factor is smaller than 2, is described first, followed by the case in which it is larger than 2.

When the expansion factor is smaller than 2, a new waveform is inserted at some of the boundaries between the original pitch waveforms, and nothing is inserted at the rest.

To bring the audio signal to the specified expansion factor, it is therefore necessary to decide at which boundaries between the original pitch waveforms a new pitch waveform should be inserted and at which it should not.

A simple policy is to assign a priority to each boundary between the original pitch waveforms according to some criterion, insert new pitch waveforms in descending order of priority to lengthen the audio signal, and stop inserting once the signal has reached the specified expansion factor.

The next question is the criterion by which to set the priorities. Qualitatively, a boundary should clearly receive higher priority the more similar the two consecutive original pitch waveforms on either side of it are. At such a boundary the pitch waveform changes only gradually over time to begin with, so inserting an artificial new pitch waveform there is not very noticeable; that is, the speech quality is unlikely to deteriorate.

The next question is how to define an index that quantitatively indicates how similar two pitch waveforms are. Any index suffices as long as it can determine the priority for inserting new pitch waveforms; this embodiment adopts the mean squared error for its simplicity and calls it the divergence. The smaller the divergence at a boundary, the more similar the two pitch waveforms forming that boundary, that is, the more gradual the change of the pitch waveform over time. New pitch waveforms are therefore inserted preferentially at boundaries with small divergence. As in FIG. 2, the divergence $e_j$ between pitch waveforms $S_j$ and $S_{j+1}$ is defined as

$$e_j = \frac{(s_{j,0} - s_{j+1,0})^2 + \cdots + (s_{j,pl(j)-1} - s_{j+1,pl(j)-1})^2}{pl(j)}$$
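
As a minimal sketch, the divergence $e_j$ can be computed as below. The function names are hypothetical. The text does not say what happens when $S_{j+1}$ has fewer than $pl(j)$ samples; treating the missing samples as 0 is an assumption made only for this example.

```python
def divergence(front, rear):
    """Mean squared error e_j between S_j (front) and S_{j+1} (rear).

    The sum runs over the pl(j) samples of the front waveform.
    Assumption (not from the patent): if the rear waveform is shorter,
    its missing samples are treated as 0; extra samples are ignored.
    """
    n = len(front)  # pl(j)
    padded = list(rear[:n]) + [0.0] * max(0, n - len(rear))
    return sum((a - b) ** 2 for a, b in zip(front, padded)) / n


def all_divergences(pitch_waveforms):
    """One divergence per boundary between consecutive pitch waveforms."""
    return [
        divergence(pitch_waveforms[j], pitch_waveforms[j + 1])
        for j in range(len(pitch_waveforms) - 1)
    ]
```

A boundary between two identical pitch waveforms yields a divergence of 0 and therefore the highest insertion priority.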

The concrete procedure for expanding an audio signal using the divergence is described with reference to FIG. 4. Assume that the storage unit 1137 of FIG. 1 already holds the audio signal to be processed, divided into $N$ pitch waveforms $S_0, \ldots, S_{N-1}$. Each pitch waveform $S_j$ ($0 \le j \le N-1$) is the sequence of wave heights $\{s_{j,0}, \ldots, s_{j,pl(j)-1}\}$ sampled at the time interval $q$.

First, the control unit 1121 reserves in the storage unit 1137 an area for storing a structure $S_{\mathrm{magnified}}$ whose elements are the pitch waveforms after expansion. As its initial value $S_{\mathrm{init}}$, it uses an unmodified copy of the pitch waveform sequence $\{S_0, \ldots, S_{N-1}\}$ of the original audio signal stored in the storage unit 1137. That is, $S_{\mathrm{magnified}} = S_{\mathrm{init}} = \{S_0, \ldots, S_{N-1}\}$ (step S1401).

As noted above, the procedure differs depending on whether the expansion factor is smaller or larger than 2. The control unit therefore determines whether the factor is smaller than 2 (step S1403). If it is not (step S1403; No), the process for expansion by a factor of 2 or more, described in detail later, is performed (step S1419). If the factor is determined to be smaller than 2 (step S1403; Yes), the process proceeds to step S1405 to decide where new waveforms are to be inserted.

As described above, the insertion points for new waveforms are decided using the adjacent-pitch-waveform divergences $e_0, \ldots, e_N$. The divergences are assumed to have been calculated in advance and stored in the storage unit 1137. Because new pitch waveforms are inserted preferentially at boundaries with smaller divergence, the control unit 1121 keeps in a register, as a counter, a variable $k$ that indicates position in the ascending order of divergence, with initial value 1 (step S1405). If, when the divergences $e_0, \ldots, e_N$ are sorted in ascending order, $e_j$ comes $k$-th, then the boundary between pitch waveforms $S_j$ and $S_{j+1}$ is the $k$-th candidate for the insertion of a new pitch waveform.

Next, the control unit 1121 searches the divergences $e_0, \ldots, e_N$ stored in the storage unit 1137 for the $k$-th smallest, extracts its subscript, and loads it into a counter register separate from the one holding $k$ (step S1407). For example, if the $k$-th smallest divergence is $e_j$, it loads $j$. This requires sorting the divergences, for which any known sorting method, such as bubble sort, may be used.
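
The priority ordering and the stop condition for factors below 2 might be sketched as follows. This is an illustrative simplification, not the flowchart of FIG. 4 itself: the function names are hypothetical, Python's built-in `sorted` stands in for "any known sorting method", and the sketch assumes each inserted waveform has $pl(j)$ samples, like the front waveform at its boundary.

```python
def insertion_order(divergences):
    """Boundary indices j in ascending order of e_j (k = 1 is the first entry).

    sorted() is stable, so equal divergences keep the earlier boundary first.
    """
    return sorted(range(len(divergences)), key=lambda j: divergences[j])


def boundaries_to_fill(pitch_waveforms, divergences, factor):
    """Choose boundaries in priority order until the target length is reached.

    Intended for expansion factors below 2, where each boundary receives
    at most one new pitch waveform.
    """
    total = sum(len(w) for w in pitch_waveforms)   # current sample count
    target = factor * total                        # requested sample count
    chosen = []
    for j in insertion_order(divergences):
        if total >= target:
            break
        chosen.append(j)
        total += len(pitch_waveforms[j])  # assumed length pl(j) of the new waveform
    return chosen
```

For four pitch waveforms of four samples each with divergences `[0.5, 0.1, 0.3]` and a factor of 1.5, the sketch selects boundaries 1 and 2, the two with the smallest divergence.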

Continuing the example, if $j$ was loaded in step S1407, a new pitch waveform is to be inserted at the boundary between pitch waveforms $S_j$ and $S_{j+1}$. As shown in FIG. 3, the new pitch waveform is generated from the two original pitch waveforms. To generate it in accordance with the program stored in the program storage unit 1135, the control unit 1121 must first examine the source pitch waveforms $S_j$ and $S_{j+1}$. It therefore loads $S_j$ and $S_{j+1}$ from the storage unit 1137 into general-purpose registers (step S1409).

次に、制御部１１２１は、ピッチ波形Ｓ_jの波高列データｓ_j、0、・・・、ｓ_j、pl(j)-1と、ピッチ波形Ｓ_j+1の波高列データｓ_j+1、0、・・・、ｓ_{j+1、pl(j+1)-1}と、から、新しいピッチ波形Ｄ_j＝｛ｄ_j、0、・・・、ｄ_j、pl(j)-1｝を、プログラム格納部１１３５に格納されたプログラムに従って生成する（Ｓ１４１１）。 Next, from the wave height sequence s_{j,0}, ..., s_{j,pl(j)-1} of pitch waveform S_j and the wave height sequence s_{j+1,0}, ..., s_{j+1,pl(j+1)-1} of pitch waveform S_{j+1}, the control unit 1121 generates a new pitch waveform D_j = {d_{j,0}, ..., d_{j,pl(j)-1}} according to the program stored in the program storage unit 1135 (step S1411).

前述のように、新しいピッチ波形Ｄ_jは、元となる２つのピッチ波形Ｓ_jとＳ_j+1との中間的な形を有するものでなければならない。一方で、中間的な形といえるものが生成されれば、いかなる生成法であってもよい。本実施形態においては、重み付け加算により、新たな波形を生成するものとする。重み付け加算には、計算が簡単であるという利点がある。 As described above, the new pitch waveform D _j must have an intermediate form between the two original pitch waveforms S _j and S _{j + 1} . On the other hand, any generation method may be used as long as an intermediate form is generated. In the present embodiment, a new waveform is generated by weighted addition. Weighted addition has the advantage of being easy to calculate.

具体的には、元の２つのピッチ波形に図３（ａ）の鎖線で示すような重み付けを行ってから、重ね合わせる。伸張後に新規ピッチ波形からみて時間的に過去のピッチ波形となるＳ_ｊには、０から始まり１で終わる、直線的に変化する重み付け係数を乗じることにより、図３（ｂ）の左側に示すような波形
｛ｓ_j、0×０／（ｐｌ（ｊ）−１）、ｓ_j、1×１／（ｐｌ（ｊ）−１）、・・・、ｓ_j、pl(j)-1×（ｐｌ（ｊ）−１）／（ｐｌ（ｊ）−１）｝
を生成する。一方、伸張後に新規ピッチ波形からみて時間的に未来のピッチ波形となるＳ_j+1には、１から始まり０で終わる、直線的に変化する重み付け係数を乗じることにより、図３（ｂ）の右側に示すような波形
｛ｓ_j+1、0×（ｐｌ（ｊ）−１）／（ｐｌ（ｊ）−１）、ｓ_j+1、1×（ｐｌ（ｊ）−２）／（ｐｌ（ｊ）−１）、・・・、ｓ_{j+1、pl(j)-1}×０／（ｐｌ（ｊ）−１）｝
を生成する。その後、重み付けの完了した２つの波形を重ね合わせて新しいピッチ波形
Ｄ_j＝｛ｄ_j、i（０≦ｉ≦ｐｌ（ｊ）−１）｜ｄ_j、i＝（ｓ_j、i×ｉ＋ｓ_j+1、i×（ｐｌ（ｊ）−１−ｉ））／（ｐｌ（ｊ）−１）｝
を生成する。 Specifically, the two original pitch waveforms are weighted as indicated by the chain lines in FIG. 3(a) and then superimposed. S_j, which becomes the temporally past pitch waveform relative to the new pitch waveform after expansion, is multiplied by a linearly varying weighting coefficient that starts at 0 and ends at 1, producing the waveform shown on the left side of FIG. 3(b):
{ s_{j,0} × 0 / (pl(j) - 1), s_{j,1} × 1 / (pl(j) - 1), ..., s_{j,pl(j)-1} × (pl(j) - 1) / (pl(j) - 1) }
S_{j+1}, which becomes the temporally future pitch waveform relative to the new pitch waveform after expansion, is multiplied by a linearly varying weighting coefficient that starts at 1 and ends at 0, producing the waveform shown on the right side of FIG. 3(b):
{ s_{j+1,0} × (pl(j) - 1) / (pl(j) - 1), s_{j+1,1} × (pl(j) - 2) / (pl(j) - 1), ..., s_{j+1,pl(j)-1} × 0 / (pl(j) - 1) }
The two weighted waveforms are then superimposed to generate the new pitch waveform
D_j = { d_{j,i} (0 ≤ i ≤ pl(j) - 1) | d_{j,i} = (s_{j,i} × i + s_{j+1,i} × (pl(j) - 1 - i)) / (pl(j) - 1) }
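The weighted addition defined above can be written as a short function. The sketch below follows the formula for d_{j,i} term by term. The function name and the two error checks are assumptions added for illustration: the text does not say what happens when S_{j+1} has fewer than pl(j) samples, so this sketch simply rejects that case.

```python
def blend_pitch_waveforms(s_j, s_next):
    """Weighted addition d_i = (s_j[i]*i + s_next[i]*(L-1-i)) / (L-1), L = len(s_j).

    s_j is weighted by a ramp from 0 to 1 and s_next by a ramp from 1 to 0,
    so the result has the same length pl(j) as s_j.
    """
    length = len(s_j)
    if length < 2:
        raise ValueError("need at least two samples so that L-1 is non-zero")
    if len(s_next) < length:
        # Assumption of this sketch: the source does not cover this case.
        raise ValueError("s_next must have at least len(s_j) samples")
    return [
        (s_j[i] * i + s_next[i] * (length - 1 - i)) / (length - 1)
        for i in range(length)
    ]


d = blend_pitch_waveforms([4.0, 4.0, 4.0], [2.0, 2.0, 2.0])
print(d)  # [2.0, 3.0, 4.0]
```

In the example, the first sample of the result comes entirely from `s_next` and the last entirely from `s_j`, which is the linear cross-weighting the formula prescribes.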

図３（ａ）の鎖線で示すような山状の重み付け係数を用いる理由は、挿入箇所に近い波形の形状ほど重視して新しいピッチ波形を作れば、人工的な波形を挿入することにより生じ得る音声としての不自然さを最小限に止めることが期待できるからである。 The mountain-shaped weighting coefficients indicated by the chain lines in FIG. 3(a) are used because building the new pitch waveform with greater emphasis on the waveform shape closer to the insertion point can be expected to minimize the unnaturalness, as speech, that inserting an artificial waveform may cause.

新しいピッチ波形Ｄ_jの生成が完了したら、図３（ｄ）に示すように、それを元のピッチ波形Ｓ_jとＳ_j+1の間に挿入することにより、音声信号の長さをｐｌ（ｊ）だけ伸張する。この操作を音声処理装置１１０１の内部で行うには、制御部１１２１は、記憶部１１３７からピッチ波形列Ｓ_magnifiedを取り出し、Ｄ_jをＳ_jとＳ_j+1の間に挿入してＳ_magnifiedをＳ_magnified＝｛Ｓ₀、・・・、Ｓ_j、Ｄ_j、Ｓ_j+1、・・・、Ｓ_N-1｝のように更新する。さらに、制御部１１２１は、伸張の目標値との比較のために、この更新されたＳ_magnifiedの長さを測った後、更新されたＳ_magnifiedを記憶部１１３７に保存する（ステップＳ１４１３）。 Once the new pitch waveform D_j has been generated, it is inserted between the original pitch waveforms S_j and S_{j+1} as shown in FIG. 3(d), which extends the audio signal by pl(j). To perform this operation inside the voice processing device 1101, the control unit 1121 retrieves the pitch waveform sequence S_magnified from the storage unit 1137, inserts D_j between S_j and S_{j+1}, and updates S_magnified to S_magnified = {S_0, ..., S_j, D_j, S_{j+1}, ..., S_{N-1}}. The control unit 1121 then measures the length of the updated S_magnified for comparison with the expansion target value and saves the updated S_magnified in the storage unit 1137 (step S1413).

伸張の目標値は、伸張処理の開始の前に、例えばユーザが入力装置１１１１を用いて入力制御部１１３１を介して制御部１１２１に伝えてあるものとする。ステップＳ１４１３で測られたＳ_magnifiedの長さが、伸張の目標値に到達しているかどうかは、ステップＳ１４１５において判別される。目標値に到達しているならば（ステップＳ１４１５；Ｙｅｓ）、これ以上の伸張は不要なので、制御部１１２１は伸張処理を終了し、この時点におけるＳ_magnifiedを伸張の最終結果としてデータ記録部１１３９により記録媒体１１１５に記録したり、伸張処理完了の旨を出力制御部１１３３を介して出力装置１１１３に出力してユーザに知らせたりする。 The expansion target value is assumed to have been given to the control unit 1121 before the expansion process starts, for example by the user through the input device 1111 and the input control unit 1131. Step S1415 determines whether the length of S_magnified measured in step S1413 has reached the expansion target value. If it has (step S1415; Yes), no further expansion is needed, so the control unit 1121 ends the expansion process. It then, for example, has the data recording unit 1139 record the S_magnified at that point on the recording medium 1115 as the final expansion result, or outputs a completion notice to the output device 1113 via the output control unit 1133 to inform the user.

それに対して、ステップＳ１４１３で更新されたＳ_magnifiedが目標長に到達していないと判別された場合（ステップＳ１４１５；Ｎｏ）、制御部１１２１はさらに新たなピッチ波形を生成してＳ_magnifiedに追加しＳ_magnifiedを伸張すべきであるから、新規ピッチ波形挿入先となる境界を検索するステップＳ１４０７に戻る。このとき、まだ新規ピッチ波形が挿入されていない境界のうちから挿入の優先度が最も高い境界を選び出すために、優先順位を表すカウンタｋを１だけ増加する（ステップＳ１４１７）。 On the other hand, when it is determined that the S _magnified updated in step S1413 has not reached the target length (step S1415; No), the control unit 1121 generates a new pitch waveform and adds it to the S _magnified. Since S _magnified should be expanded, the process returns to step S1407 to search for a boundary to which a new pitch waveform is to be inserted. At this time, in order to select a boundary having the highest insertion priority from boundaries where a new pitch waveform has not yet been inserted, the counter k representing the priority is incremented by 1 (step S1417).

伸張の程度は２倍より小さいので（ステップＳ１４０３；Ｙｅｓ）、元のピッチ波形同士がなす境界の全てに新規ピッチ波形を挿入し尽くすまでに、Ｓ_magnifiedは必ず目標値に到達（ステップＳ１４１５；Ｙｅｓ）し、処理が完了する。 Because the degree of expansion is less than 2x (step S1403; Yes), S_magnified always reaches the target value (step S1415; Yes) before new pitch waveforms have been inserted at all boundaries between the original pitch waveforms, and the process completes.
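The loop of FIG. 4 (steps S1405 to S1417) can be summarized in a short sketch. It is an illustration under stated assumptions rather than the actual program: the names `blend` and `expand_below_2x` are hypothetical, `divergences` is taken to hold one value per boundary, every waveform is assumed to have at least two samples, and samples missing from a shorter S_{j+1} are treated as 0.0, a case the text does not address.

```python
def blend(s_j, s_next):
    """Weighted addition of two pitch waveforms; the result has len(s_j) samples."""
    n = len(s_j)  # assumed to be at least 2
    # Assumption: samples missing from a shorter s_next are treated as 0.0.
    padded = list(s_next) + [0.0] * max(0, n - len(s_next))
    return [(s_j[i] * i + padded[i] * (n - 1 - i)) / (n - 1) for i in range(n)]


def expand_below_2x(waveforms, divergences, target_length):
    """Insert one new waveform per boundary, smallest divergence first,
    until the total sample count reaches target_length."""
    order = sorted(range(len(waveforms) - 1), key=lambda j: divergences[j])
    inserted = {}  # boundary index j -> new waveform D_j
    total = sum(len(w) for w in waveforms)
    for j in order:  # k = 1, 2, ... in the text
        inserted[j] = blend(waveforms[j], waveforms[j + 1])  # S1411
        total += len(inserted[j])  # S1413: measure the new length
        if total >= target_length:  # S1415
            break
    result = []
    for j, w in enumerate(waveforms):
        result.append(list(w))
        if j in inserted:
            result.append(inserted[j])
    return result


waves = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]
out = expand_below_2x(waves, divergences=[0.9, 0.2], target_length=12)
print(out[2])  # [0.0, 3.0, 0.0]
```

In the example, the boundary between the second and third waveforms has the smaller divergence, so the single new waveform needed to go from 9 to 12 samples is placed there.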

次に、指定された伸張が、２倍より大きい場合（ステップＳ１４０３；Ｎｏ）の処理（ステップＳ１４１９）について、図を改めて説明する。 Next, the process (step S1419) performed when the specified expansion is greater than 2x (step S1403; No) is described with reference to a separate figure.

図５が、２倍より大きい場合の伸張処理の具体的な手順を示したフローチャートである。２倍より小さく伸張する場合に比べると、元の連続する２つのピッチ波形の境界の一つ以上について、新規ピッチ波形を複数挿入しなければならないぶんだけ、処理が煩雑になる。基本的には、２倍より小さく伸張する場合と同じく、優先度の高い境界から順に新規ピッチ波形を挿入していく。しかし、全ての境界にひとつずつ新規ピッチ波形を挿入し終えても、信号を目標長にまで伸張させることができない。よって、さらにもう一回り、再び優先度の高い境界から順に、今度は２つずつ、新規ピッチ波形を挿入していく必要がある。その手順を終えてもなお目標長に到達しない場合には、さらにもう一回り、今度は３つずつ新規ピッチ波形を挿入する。このため、２倍より大きい場合の伸張処理（図５）は、おおまかにいって、２倍より小さい場合の伸張処理（図４）に対して、全境界を何回巡回しているかをカウントするループがひとつ増えた処理となる。 FIG. 5 is a flowchart showing the specific procedure of the expansion process for expansion greater than 2x. Compared with expansion below 2x, the process is more complicated to the extent that multiple new pitch waveforms must be inserted at one or more of the boundaries between two consecutive original pitch waveforms. Basically, as with expansion below 2x, new pitch waveforms are inserted in order starting from the boundary with the highest priority. However, even after one new pitch waveform has been inserted at every boundary, the signal has not been extended to the target length. A second pass is therefore needed, again starting from the boundary with the highest priority, this time inserting two new pitch waveforms at each boundary. If the target length is still not reached after that pass, a further pass inserts three new pitch waveforms at each boundary. Roughly speaking, then, the expansion process for more than 2x (FIG. 5) is the expansion process for less than 2x (FIG. 4) with one additional loop that counts how many passes have been made over all boundaries.

上述の、全境界を何回巡回しているかをカウントするカウンタを、全ピッチ波形間巡回回数カウンタｍと呼ぶことにする。初期値は、１巡目を表すために、ｍ＝１とする（ステップＳ１５０１）。 The above-mentioned counter that counts how many times the entire boundary is circulated will be referred to as an all pitch waveform cycle number counter m. The initial value is set to m = 1 to represent the first round (step S1501).

次に、元のピッチ波形のなす境界に、新規ピッチ波形の挿入先としての優先順位を割り当てるために、優先順位を表すカウンタｋをｋ＝１に初期化し（ステップＳ１５０３）、ｋ番目の候補である境界を表す添字ｊを求め（ステップＳ１５０５）、ｊに対応した元のピッチ波形であるＳ_j及びＳ_j+1を取り出す（ステップＳ１５０７）。これらは、２倍以下の伸張の際に行った手続（図４のステップＳ１４０５、Ｓ１４０７、Ｓ１４０９）と同一である。 Next, to assign the boundaries between the original pitch waveforms a priority as insertion points for new pitch waveforms, the counter k representing the priority is initialized to k = 1 (step S1503), the subscript j of the boundary that is the k-th candidate is obtained (step S1505), and the original pitch waveforms S_j and S_{j+1} corresponding to j are retrieved (step S1507). These steps are identical to the procedure used for expansion of 2x or less (steps S1405, S1407, and S1409 in FIG. 4).

この後、ステップＳ１５０９においては、Ｓ_jとＳ_j+1とから、新しいピッチ波形を生成する。全ピッチ波形間の巡回が１回目のとき、すなわち、ｍ＝１のときには、新しいピッチ波形をひとつだけ生成すればよいので、２倍以下の伸張手続における新規波形生成方法（図４のステップＳ１４１１）と変わるところがない。前述のとおり、適切な重み付けを行った上で、加算して新規ピッチ波形をひとつ生成すればよい。 Then, in step S1509, a new pitch waveform is generated from S_j and S_{j+1}. On the first pass over all pitch waveform boundaries, that is, when m = 1, only one new pitch waveform needs to be generated, so the method is no different from the new waveform generation in the procedure for expansion of 2x or less (step S1411 in FIG. 4). As described above, one new pitch waveform is generated by applying appropriate weights and adding.

ところが、全ピッチ波形間の巡回を１回だけ行って伸張することができるのは、たかだか２倍までである。２倍以上の伸張処理の場合には、全ピッチ波形間の巡回は、１回では済まない。すなわち、全ピッチ波形間巡回回数カウンタｍの最大値は、必ず、２以上となる。ｍ回目の全ピッチ波形間巡回に際しては、すでに（ｍ−１）個の新規ピッチ波形が挿入済みである境界の中から、乖離度の小ささにより定まる優先度の高い境界を順に選び取り、該境界に挿入済みの（ｍ−１）個の新規ピッチ波形をいったん破棄し、その代わりに、新たにｍ個の新規ピッチ波形を生成して該境界に挿入する。これにより、ピッチ波形１個分だけ、音声信号を伸張することができる。 However, a single pass over all pitch waveform boundaries can extend the signal to at most 2x. For expansion of 2x or more, one pass is not enough; that is, the maximum value of the pass counter m is always 2 or greater. On the m-th pass, boundaries at which (m-1) new pitch waveforms have already been inserted are selected in order of priority, which is determined by how small the divergence is. The (m-1) new pitch waveforms already inserted at the selected boundary are discarded, and m new pitch waveforms are generated and inserted there instead. This extends the audio signal by one pitch waveform.

ここで問題となるのは、新規ピッチ波形は、元となる２つのピッチ波形のいずれとも異なっていなければならないばかりでなく、新規ピッチ波形同士も、複数生成された場合には、少なくとも挿入後に隣り合うことになるもの同士は異なっていなければならないということである。これは、前に説明したとおり、全く同じ波形の繰り返しにより人の音声として不自然なものとなってしまう事態を、避けるための要請である。 The issue here is that a new pitch waveform must differ not only from both of the two original pitch waveforms; when multiple new pitch waveforms are generated, at least those that end up adjacent after insertion must also differ from one another. As explained earlier, this requirement avoids the situation in which repetition of exactly the same waveform makes the result unnatural as human speech.

かかる問題のうち、元となる２つのピッチ波形のいずれとも異なっていなければならないという要請については、伸張度が２倍より小さい場合と同じ要請である。よって、伸張度が２倍より小さい場合と同じく、重み付き加算により新たなピッチ波形の生成を行えばよい。 Of these problems, the request that must be different from any of the two original pitch waveforms is the same request as when the degree of expansion is less than twice. Therefore, a new pitch waveform may be generated by weighted addition as in the case where the degree of expansion is smaller than twice.

それに対して、新たに生成する複数の新規ピッチ波形を、相互に異なるものとしなければならないという要請は、伸張度が２倍以上の場合になって初めて生じたものである。かかる要請に応えるためには、例えば、新規ピッチ波形を元の２つのピッチ波形から生成する方法を何種類も用意して、新規ピッチ波形の生成の度に新たな生成法を採用することにより、複数の新規ピッチ波形の形が重なってしまうのを避けるようにする、ということが考えられる。 On the other hand, a request that a plurality of new pitch waveforms to be newly generated must be different from each other occurs only when the degree of expansion is twice or more. In order to respond to such a request, for example, by preparing various types of methods for generating a new pitch waveform from the original two pitch waveforms, and adopting a new generation method every time a new pitch waveform is generated, It is conceivable to avoid overlapping a plurality of new pitch waveform shapes.

しかし、生成方法の種類をあらかじめ複数用意しておくのは煩雑である上、伸張度が大きい場合には用意された種類の生成方法を全て使い切ってしまい、一度生成した新規ピッチ波形と同一の新規ピッチ波形を再び生成せざるを得ず、上述の要請に応えられない可能性がある。 However, preparing multiple generation methods in advance is cumbersome. Moreover, when the degree of expansion is large, all of the prepared methods may be used up, forcing a new pitch waveform identical to one already generated to be produced again, so the above requirement may not be met.

そこで、本実施例においては、より簡便かつ確実な方法を採用する。前述のとおり、元のピッチ波形１個１個の長さは、オーダーとしては同程度ではあるものの、一般には、異なることが期待される。特に隣接ピッチ波形同士で、長さが同一になる可能性はほぼゼロである。本実施例に係る音声処理装置１１０１の処理対象としては主として人の音声が想定されているのであって、人の音声が長時間に渡り定常状態を保つことは考えにくいからである。 This embodiment therefore adopts a simpler and more reliable method. As described above, the lengths of the individual original pitch waveforms are of the same order of magnitude but are generally expected to differ. In particular, the probability that adjacent pitch waveforms have the same length is almost zero. This is because the voice processing device 1101 according to this embodiment is mainly intended to process human speech, and human speech is unlikely to remain in a steady state for a long time.

本実施例では、このような、元の隣接ピッチ波形同士の長さが異なる性質を利用する。すなわち、新規ピッチ波形の生成方法としては、元の２つのピッチ波形の重み付け加算を行うという、上述の方法１種類だけにする。これにより、処理が簡潔になる。そして、新規ピッチ波形相互に差をつけなければならないという要請を満たすためには、生成された新規ピッチ波形を時間軸方向に何通りも伸縮することにより、同じ新規ピッチ波形が生じないようにする。 In the present embodiment, such a property that the lengths of the original adjacent pitch waveforms are different is used. That is, as a method for generating a new pitch waveform, only one type of the above-described method of performing weighted addition of the original two pitch waveforms is used. This simplifies the process. In order to satisfy the requirement that there is a difference between the new pitch waveforms, the generated new pitch waveform is expanded and contracted in the time axis direction so that the same new pitch waveform does not occur. .

新規ピッチ波形を時間軸方向に伸縮するということは、波形の長さを変化させるということである。そこで次は、どのような範囲で変化させるか、という点が問題となる。伸張後の音声の品質の劣化を最小限にするためには、明らかに、元の２つのピッチ波形の長さの間に収まる範囲で必要な個数の長さを選ぶのが適切である。 Expansion and contraction of the new pitch waveform in the time axis direction means changing the length of the waveform. Therefore, the next issue is the range of changes. Obviously, in order to minimize degradation of the quality of the voice after decompression, it is appropriate to select the required number of lengths within a range that falls between the lengths of the original two pitch waveforms.

本実施例でもそのような方法で複数の新規ピッチ波形を生成する。すなわち、ｍ個の新規ピッチ波形を生成する場合（ステップＳ１５０９）、まず、２倍以下伸張時（図３、図４のステップＳ１４１１）と同じく重み付け加算により新たなピッチ波形Ｄ_jを生成する。このＤ_jを時間軸上で伸縮することにより、ｍ個のバリエーションのピッチ波形
Ｄ_1、j＝｛ｄ_1、j、0、・・・、ｄ_{1、j、pl(j)-1}｝、・・・、Ｄ_m、j＝｛ｄ_m、j、0、・・・、ｄ_{m、j、pl(j+1)-1}｝
を生成する。すなわち、元の２つの隣接ピッチ波形Ｓ_jの長さｐｌ（ｊ）とＳ_j+1の長さｐｌ（ｊ＋１）とが前述のようにほぼ確実に異なることに着目し、Ｄ_1、j、・・・、Ｄ_m、jの長さを、ｐｌ（ｊ）からｐｌ（ｊ＋１）までのｍ段階の長さに調節する。ｐｌ（ｊ）からｐｌ（ｊ＋１）の間をどのようにｍ段階に分割するかについては、本実施例では最も簡潔に、比例的に分割するものとする。すなわち、ｉ番目（１≦ｉ≦ｍ）の新規ピッチ波形Ｄ_i、jの長さを、
ｐｌ（ｊ）＋［｛ｐｌ（ｊ＋１）−ｐｌ（ｊ）｝×（ｉ−１）／（ｍ−１）〕
とする。 This embodiment also generates multiple new pitch waveforms in that way. That is, when m new pitch waveforms are to be generated (step S1509), a new pitch waveform D_j is first generated by weighted addition, as in expansion of 2x or less (FIG. 3; step S1411 in FIG. 4). D_j is then stretched or compressed along the time axis to produce m variations of the pitch waveform:
D_{1,j} = {d_{1,j,0}, ..., d_{1,j,pl(j)-1}}, ..., D_{m,j} = {d_{m,j,0}, ..., d_{m,j,pl(j+1)-1}}
That is, noting that the length pl(j) of S_j and the length pl(j+1) of S_{j+1}, the two original adjacent pitch waveforms, almost certainly differ as described above, the lengths of D_{1,j}, ..., D_{m,j} are set to m steps from pl(j) to pl(j+1). In this embodiment the interval from pl(j) to pl(j+1) is divided into m steps in the simplest way, proportionally. That is, the length of the i-th (1 ≤ i ≤ m) new pitch waveform D_{i,j} is
pl(j) + [{pl(j+1) - pl(j)} × (i-1) / (m-1)]
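The proportional division of lengths can be checked with a few lines of code. This is a sketch under assumptions: the function name `variant_lengths` is hypothetical, the square bracket in the formula is read as rounding to an integer sample count (the text does not say whether it rounds or truncates), and m = 1 is handled separately because the formula would otherwise divide by zero.

```python
def variant_lengths(pl_j, pl_next, m):
    """Lengths of D_{1,j} .. D_{m,j}: m proportional steps from pl(j) to pl(j+1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # Single new waveform: same length as S_j, as in the below-2x case.
        return [pl_j]
    # Assumption: round() converts the proportional step to whole samples.
    return [
        pl_j + round((pl_next - pl_j) * (i - 1) / (m - 1))
        for i in range(1, m + 1)
    ]


print(variant_lengths(100, 106, 4))  # [100, 102, 104, 106]
```

The first length equals pl(j) and the last equals pl(j+1), so the inserted waveforms step gradually from the length of S_j to the length of S_{j+1}.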

挿入する複数の新規ピッチ波形の長さを、このように、元のピッチ波形のうち時間的に過去のピッチ波形の長さから、時間的に未来のピッチ波形の長さにまで、漸増又は漸減させることには、処理が簡単になるばかりでなく、挿入による音声信号の変化が緩やかであるために伸張後の音声の品質の劣化を最小限に抑える利点もあると考えられる。 The length of a plurality of new pitch waveforms to be inserted is gradually increased or decreased from the length of the past pitch waveform in the original pitch waveform to the length of the pitch waveform in the future in time. In addition to simplifying the process, it is considered that there is an advantage of minimizing degradation of the quality of the voice after expansion because the change of the voice signal due to the insertion is gentle.

なお、ピッチ波形の長さを変化させるに方法は、様々なものが考えられるが、サンプリング位置の変更を行うのが簡便である。この方法については、後に詳しく説明する。 There are various methods for changing the length of the pitch waveform, but it is easy to change the sampling position. This method will be described in detail later.

結局、元の２つのピッチ波形Ｓ_jとＳ_j+1とから、ｍ個の新しいピッチ波形Ｄ_i、j（１≦ｉ≦ｍ）が生成される（ステップＳ１５０９）。Ｄ_1、jの長さはＳ_ｊと同じくｐｌ（ｊ）であり、Ｄ_m、jの長さはＳ_j+1と同じくｐｌ（ｊ＋１）であり、Ｄ_2、j〜Ｄ_m-1、jの長さはｐｌ（ｊ）とｐｌ（ｊ＋１）との間である。 In the end, m new pitch waveforms D_{i,j} (1 ≤ i ≤ m) are generated from the two original pitch waveforms S_j and S_{j+1} (step S1509). The length of D_{1,j} is pl(j), the same as S_j; the length of D_{m,j} is pl(j+1), the same as S_{j+1}; and the lengths of D_{2,j} through D_{m-1,j} lie between pl(j) and pl(j+1).
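As an illustration of how m variants of different lengths might be produced from a single blended waveform, the sketch below resamples it by linear interpolation. Linear interpolation is an assumption chosen for this example only; the text says the length is changed by altering the sampling positions and leaves the details to a later section. The names `resample` and `make_length_variants` are hypothetical.

```python
def resample(wave, new_length):
    """Resample wave to new_length samples by linear interpolation
    (an assumed method, used here only for illustration)."""
    if new_length == 1 or len(wave) == 1:
        return [wave[0]] * new_length
    scale = (len(wave) - 1) / (new_length - 1)
    out = []
    for n in range(new_length):
        pos = n * scale
        lo = min(int(pos), len(wave) - 2)  # keep lo + 1 inside the waveform
        frac = pos - lo
        out.append(wave[lo] * (1 - frac) + wave[lo + 1] * frac)
    return out


def make_length_variants(d_j, pl_j, pl_next, m):
    """Stretch or compress d_j into m waveforms whose lengths run from pl_j to pl_next."""
    if m == 1:
        return [list(d_j)]
    lengths = [pl_j + round((pl_next - pl_j) * (i - 1) / (m - 1))
               for i in range(1, m + 1)]
    return [resample(d_j, n) for n in lengths]


variants = make_length_variants([0.0, 2.0, 4.0], pl_j=3, pl_next=5, m=3)
print([len(v) for v in variants])  # [3, 4, 5]
print(variants[2])                 # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Because every variant has a different length, no two adjacent inserted waveforms are sample-for-sample identical, provided pl(j) and pl(j+1) differ by enough samples to give m distinct lengths.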

ステップＳ１５０９においてｍ個の新規ピッチ波形が生成されたら、これらを、この時点での音声伸張結果であるＳ_magnifiedに付加してＳ_magnifiedを更新する。Ｓ_magnifiedは、伸張処理の際に用意されている（図４のステップＳ１４０１）、伸張信号の候補である。より具体的には、制御部１１２１は、記憶部からピッチ波形列Ｓ_magnifiedを取り出し、Ｄ_1、j、・・・、Ｄ_m、jをＳ_jとＳ_j+1との間に挿入することにより、Ｓ_magnified＝｛Ｓ₀、・・・、Ｓ_j、Ｄ_1、j、・・・、Ｄ_m、j、Ｓ_j+1、・・・、Ｓ_N-1｝のように更新し、記憶部１１３７に保存する。また、同時に、目標長との比較に役立てるために、Ｓ_magnifiedの長さを測っておく（ステップＳ１５１１）。 Once the m new pitch waveforms have been generated in step S1509, they are added to S_magnified, the expansion result at that point, to update it. S_magnified is the candidate expanded signal prepared for the expansion process (step S1401 in FIG. 4). More specifically, the control unit 1121 retrieves the pitch waveform sequence S_magnified from the storage unit, inserts D_{1,j}, ..., D_{m,j} between S_j and S_{j+1} to update it to S_magnified = {S_0, ..., S_j, D_{1,j}, ..., D_{m,j}, S_{j+1}, ..., S_{N-1}}, and saves it in the storage unit 1137. At the same time, the length of S_magnified is measured for comparison with the target length (step S1511).

上述のように、ｍ回目の全ピッチ波形間巡回においてｍ個の新規ピッチ波形を生成し適切な位置に挿入する際には、前回のピッチ波形巡回において該位置に挿入された（ｍ−１）個の新規波形は、破棄される。一方、この時点での挿入対象になっていない境界にすでに挿入されている（ｍ−１）個またはｍ個の新規ピッチ波形は、そのままにしておく。 As described above, when m new pitch waveforms are generated and inserted at an appropriate position in the m-th cycle between all pitch waveforms, they are inserted at the positions in the previous pitch waveform cycle (m−1). The new waveforms are discarded. On the other hand, the (m−1) or m new pitch waveforms already inserted at the boundary that is not the insertion target at this time are left as they are.

すると、挿入されたピッチ波形が１個増えたことになるため、音声信号Ｓ_magnifiedは、ピッチ波形の長さ１個ぶんだけ伸張したことになる。そこで、ステップＳ１５１１にて測っておいたＳ_magnifiedの長さが、この時点で伸張度の目標値に達しているか否かを判別し（ステップＳ１５１３）、達している場合には（ステップＳ１５１３；Ｙｅｓ）、伸張処理を完了し、Ｓ_magnifiedを最終的な伸張結果とする。 The number of inserted pitch waveforms has thus increased by one, so the audio signal S_magnified has been extended by the length of one pitch waveform. It is then determined whether the length of S_magnified measured in step S1511 has reached the target degree of expansion at this point (step S1513). If it has (step S1513; Yes), the expansion process is completed and S_magnified becomes the final expansion result.

目標値に達していない場合（ステップＳ１５１３；Ｎｏ）、次に、優先順位のカウンタであるｋが、元のピッチ波形の個数Ｎより小さいかどうかを判別する。小さい場合には（ステップＳ１５１５；Ｙｅｓ）、まだｍ個の新規波形を挿入していない境界が存在するので、ｋを１だけ増やして（ステップＳ１５１９）、次の優先順位の挿入箇所を検索する処理に戻る（ステップＳ１５０５）。 If the target value has not been reached (step S1513; No), it is next determined whether k, which is a priority counter, is smaller than the number N of the original pitch waveforms. If it is smaller (step S1515; Yes), there are boundaries where m new waveforms have not yet been inserted, so k is incremented by 1 (step S1519), and a process of searching for the next priority insertion position is searched. Return to (step S1505).

一方、ｋがＮになった場合には（ステップＳ１５１５；Ｎｏ）、全ての境界にｍ個の新規波形を挿入し終えたので、ｍを１だけ増加させて（ステップＳ１５１７）、全ピッチ波形間を乖離度により定まる優先順に巡回する操作を繰り返す（ステップＳ１５０３）。 If, on the other hand, k has reached N (step S1515; No), m new waveforms have been inserted at every boundary, so m is incremented by 1 (step S1517) and the pass over all pitch waveform boundaries, in the priority order determined by the divergences, is repeated (step S1503).

全ピッチ波形間巡回を繰り返すことにより、すなわち、ｍを大きくしていくことにより、原理的には、Ｓ_magnifiedをいくらでも長くすることができる。よって、以上の手順により、処理は無限ループに陥ることなく、指定された伸張度に必ず達し、処理は完了する。 By repeating the pass over all pitch waveform boundaries, that is, by increasing m, S_magnified can in principle be made arbitrarily long. With the above procedure, therefore, the process never falls into an infinite loop; it always reaches the specified degree of expansion and completes.
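The overall flow of FIG. 5 (steps S1501 to S1519) can be sketched as two nested loops: an outer loop over the pass counter m and an inner loop over the boundaries in priority order. The code below is an illustration under assumptions, not the patented program. All function names are hypothetical, waveforms are assumed to have at least two samples, samples missing from a shorter S_{j+1} are treated as 0.0, and the length variants are produced by linear interpolation, which the text does not specify.

```python
def blend(a, b):
    """Weighted addition of two pitch waveforms; the result has len(a) samples."""
    n = len(a)
    padded = list(b) + [0.0] * max(0, n - len(b))  # assumption: pad with 0.0
    return [(a[i] * i + padded[i] * (n - 1 - i)) / (n - 1) for i in range(n)]


def resample(wave, new_length):
    """Linear-interpolation resampling (assumed method)."""
    scale = (len(wave) - 1) / (new_length - 1)
    out = []
    for n in range(new_length):
        pos = n * scale
        lo = min(int(pos), len(wave) - 2)
        out.append(wave[lo] * (1 - (pos - lo)) + wave[lo + 1] * (pos - lo))
    return out


def make_variants(a, b, m):
    """m new waveforms whose lengths step from len(a) to len(b)."""
    base = blend(a, b)
    if m == 1:
        return [base]
    lengths = [len(a) + round((len(b) - len(a)) * (i - 1) / (m - 1))
               for i in range(1, m + 1)]
    return [resample(base, n) for n in lengths]


def expand_beyond_2x(waveforms, divergences, target_length):
    """Repeat passes over all boundaries; pass m leaves m new waveforms per boundary."""
    if len(waveforms) < 2:
        raise ValueError("need at least one boundary")
    order = sorted(range(len(waveforms) - 1), key=lambda j: divergences[j])
    inserted = {j: [] for j in order}
    total = sum(len(w) for w in waveforms)
    m = 1
    while True:
        for j in order:
            old = sum(len(d) for d in inserted[j])
            # Discard the (m-1) waveforms at this boundary and insert m new ones.
            inserted[j] = make_variants(waveforms[j], waveforms[j + 1], m)
            total += sum(len(d) for d in inserted[j]) - old
            if total >= target_length:
                result = []
                for i, w in enumerate(waveforms):
                    result.append(list(w))
                    result.extend(inserted.get(i, []))
                return result
        m += 1


waves = [[0.0, 1.0, 0.0], [0.0, 2.0, 2.0, 0.0], [0.0, 3.0, 0.0]]
out = expand_beyond_2x(waves, divergences=[0.2, 0.5], target_length=21)
print([len(w) for w in out])  # [3, 3, 4, 4, 4, 3]
```

In the example the first pass raises the 10-sample signal to 17 samples, and the second pass stops after replacing the single waveform at the highest-priority boundary with two, which reaches the 21-sample target.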

（実施形態２） (Embodiment 2)
本発明に係る実施形態２を、以下、図面を参照して説明する。実施形態２に係る音声処理装置は、音声縮小装置である。その構成は、図１に示す実施形態１に係る音声処理装置とほぼ同じであるが、音声伸張処理の代わりに音声縮小処理を行う。 Embodiment 2 according to the present invention will be described below with reference to the drawings. The voice processing device according to Embodiment 2 is an audio reduction device. Its configuration is almost the same as that of the voice processing device according to Embodiment 1 shown in FIG. 1, but it performs an audio reduction process instead of the audio expansion process.

波形縮小処理において、制御部１１２１は、処理対象である元の音声波形データをピッチ波形単位に分割する。そして、連続する２つのピッチ波形に基づいて、新たなピッチ波形を生成し、元の連続する２つのピッチ波形を、該新たなピッチ波形で置き換えることにより、ピッチ波形の長さ１個ぶんずつ、音声波形データを縮小していく。かかる置換動作の繰り返しにより、最終的には、指定の長さとなるような音声波形データが生成される。 In the waveform reduction process, the control unit 1121 divides the original audio waveform data to be processed into pitch waveform units. It then generates a new pitch waveform based on two consecutive pitch waveforms and replaces those two consecutive original pitch waveforms with the new one, shortening the audio waveform data by the length of one pitch waveform at a time. Repeating this replacement ultimately produces audio waveform data of the specified length.

波形縮小のためには、ピッチ波形を適宜間引きするのが最も簡単ではあるが、単に間引きしただけだと、間引きした結果隣り合うことになったピッチ波形同士の違いが大きすぎる箇所が生じる可能性がある。すると音声信号に不連続部分が生じたのと同様な効果が生じ、その結果、縮小後の音声信号の品質劣化につながる可能性がある。 In order to reduce the waveform, it is easiest to thin out the pitch waveform appropriately. However, if the thinning is simply performed, the difference between the pitch waveforms adjacent to each other as a result of the thinning may be too large. There is. Then, an effect similar to that in which a discontinuous portion occurs in the audio signal is produced, and as a result, there is a possibility that the quality of the audio signal after the reduction is deteriorated.

単なる間引きではなく置換を行うことにした以上、置換のために新たに生成されるピッチ波形は、それに置換される２個のピッチ波形の一方又は両方となんらかの相関を有していることが望ましいのは明らかである。仮に、とり除かれる２個のピッチ波形と全く相関のない唐突な形状のピッチ波形を配置すれば、その部分で音声信号に滑らかさが失われ、音質が劣化してしまう。 Since it is decided to perform replacement rather than simple decimation, it is desirable that the newly generated pitch waveform for replacement has some correlation with one or both of the two pitch waveforms to be replaced. Is clear. If an abrupt pitch waveform having no correlation with the two pitch waveforms to be removed is arranged, the audio signal is not smooth at that portion and the sound quality is deteriorated.

置換のために生成された新たなピッチ波形に、置換される２個の既存のピッチ波形の一方又は両方との相関関係を有せしめるためには、例えば、２個の既存のピッチ波形のうちどちらかを単にコピーしたものを、新たなピッチ波形とすればよい。 To give the new pitch waveform generated for replacement a correlation with one or both of the two existing pitch waveforms it replaces, one could, for example, simply use a copy of either of the two existing pitch waveforms as the new pitch waveform.

しかし、このようにすると、新たなピッチ波形を割り込ませた結果、全く同一のピッチ波形が連続することになってしまう。一般に、全く同一の波形が連続すると、不自然な音声になってしまう。 However, if it does in this way, as a result of interrupting a new pitch waveform, the completely same pitch waveform will be continued. Generally, when exactly the same waveform continues, unnatural sound is produced.

以上のように、ピッチ波形単位での縮小に際しては、元の２つのピッチ波形の代わりに、全く唐突な波形を配置するのは不適切であることはもちろんのこと、かかる２つのピッチ波形のいずれかの単なるコピーを配置することも、望ましくない。つまり、置換に用いるべき新たなピッチ波形は、該置換前には隣接関係にあった２個のピッチ波形の形状と比べて、全く無関係な形状であってはならないし、元の２個のピッチ波形の一方と似すぎていてもいけない。 As described above, when reducing in units of pitch waveforms, placing a completely abrupt waveform in place of the two original pitch waveforms is of course inappropriate, and placing a mere copy of either of those two pitch waveforms is also undesirable. In other words, the new pitch waveform used for replacement must not have a shape entirely unrelated to the shapes of the two pitch waveforms that were adjacent before the replacement, nor may it be too similar to either of them.

かかる要請に応えるべく、図６に示すように、本実施形態においては、新規ピッチ波形の配置予定先に存在している元の２つのピッチ波形（図６（ａ））をそれぞれ適当に波形変形処理し（図６（ｂ））、かかる処理済みの波形を重ね合わせることにより、元の２つのピッチ波形の中間的な形状を有する新規ピッチ波形を生成し、該新規ピッチ波形を元の２つのピッチ波形の代わりに配置して音声信号を縮小する（図６（ｃ））。 In order to meet such a demand, as shown in FIG. 6, in the present embodiment, the original two pitch waveforms (FIG. 6A) existing at the planned placement destination of the new pitch waveform are appropriately deformed. Processing (FIG. 6 (b)) and superimposing such processed waveforms to generate a new pitch waveform having an intermediate shape between the two original pitch waveforms, The audio signal is reduced by arranging it instead of the pitch waveform (FIG. 6C).

上述の波形変形処理（図６（ｂ））は、その後の重ね合わせの結果生成される新規ピッチ波形（図６（ｃ））が元の２つのピッチ波形（図６（ａ））の中間的な形状となるものであれば任意の処理でかまわない。後に、重み付けを用いた簡単な処理方法を説明する。 In the above-described waveform deformation process (FIG. 6B), the new pitch waveform (FIG. 6C) generated as a result of subsequent superposition is intermediate between the two original pitch waveforms (FIG. 6A). Any processing can be used as long as it has a simple shape. A simple processing method using weighting will be described later.

元のＮ個のピッチ波形の全ての境界に新たなピッチ波形をひとつずつ配置したとしても、縮小度は０．５倍にしかならない。０．５倍より短く縮小したい場合が問題となる。 Even if new pitch waveforms are arranged one by one at all boundaries of the original N pitch waveforms, the reduction degree is only 0.5 times. A problem arises when it is desired to reduce the length to be shorter than 0.5 times.

かかる問題は、実施形態１において、２倍以上の伸張処理には特別な配慮が必要となった問題と呼応関係にある。 Such a problem is related to the problem in the first embodiment in which special consideration is required for the extension processing of twice or more.

しかしながら、縮小処理は拡大処理と異なり、ピッチ波形の数を減らしていくだけの処理である。よって、実施形態１で問題となったような、複数生成した新規ピッチ波形を相互に異なるものとしなければならないという配慮は、不要である。 However, unlike the enlargement process, the reduction process is a process that only reduces the number of pitch waveforms. Therefore, there is no need to consider that a plurality of generated new pitch waveforms that are problematic in the first embodiment must be different from each other.

したがって、本実施形態において行われる、０．５倍より短く縮小する処理は、実施形態１において行われた、２倍より長く縮小する処理よりも簡単である。すなわち、０．５倍より短く縮小することが指定されている場合には、０．５倍までの縮小を終えた直後の縮小音声信号を、音声信号の新たな初期値であると考え、同じ手順を繰り返せば足りる。２回繰り返せば０．２５倍、ｉ回繰り返せば（０．５）ⁱ倍、というように、原理的には、ピッチ波形長程度の長さにまで縮小することを要求しない限りは、いくらでも縮小することができる。 Accordingly, the process of reducing to shorter than 0.5x in this embodiment is simpler than the process of expanding to longer than 2x in Embodiment 1. That is, when a reduction to shorter than 0.5x is specified, it suffices to treat the reduced audio signal obtained immediately after the reduction to 0.5x as the new initial audio signal and repeat the same procedure. Repeating twice gives 0.25x and repeating i times gives (0.5)^i x, so in principle the signal can be reduced as far as desired, unless reduction down to roughly the length of a single pitch waveform is requested.
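The arithmetic of repeated halving can be made concrete with a small helper. This is only a sketch of the bookkeeping implied by the paragraph above: the function name `halving_plan` and its return convention are assumptions. It reports how many full reductions to 0.5x are needed and what ratio remains for the final pass, which the 0.5x-or-greater procedure then handles.

```python
def halving_plan(target_ratio):
    """Split a reduction ratio into i full 0.5x passes plus one final pass.

    Returns (i, final_ratio) with target_ratio == (0.5 ** i) * final_ratio
    and 0.5 <= final_ratio <= 1.0.
    """
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in (0, 1]")
    passes = 0
    remaining = target_ratio
    while remaining < 0.5:
        remaining *= 2.0  # one full reduction to 0.5x has been accounted for
        passes += 1
    return passes, remaining


passes, final_ratio = halving_plan(0.2)
print(passes)  # 2
```

For a target of 0.2x, two full passes bring the signal to 0.25x, and a final pass at 0.8x of that length reaches the target.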

ゆえに、以下では、元の音声信号を０．５倍まで縮小可能な手順だけを説明する。 Therefore, only the procedure that can reduce the original audio signal to 0.5 times will be described below.

要求される縮小度が０．５倍より大きい場合には、元のピッチ波形のうちには、新規波形に置換されるものと、そのまま残るものとが、生じることになる。 When the required reduction degree is larger than 0.5 times, an original pitch waveform that is replaced with a new waveform and a waveform that remains as it is are generated.

そこで、音声信号を指定された縮小度に達せしめるために、元のピッチ波形のうち、新規ピッチ波形に置換すべきものと、そのまま残すべきものとを判別する必要が生じる。 Therefore, in order to make the audio signal reach the specified degree of reduction, it is necessary to discriminate between the original pitch waveform to be replaced with the new pitch waveform and the one to be left as it is.

そこで、実施形態１の場合と同様に、元のピッチ波形の境界毎に乖離度ｅ_j（０≦ｊ≦Ｎ−１）を計算し、この値が小さい境界を挟む２個のピッチ波形から優先して、新たなピッチ波形に置き換えることにする。そして、優先順位の高い所から順番に置換して音声信号を縮小していき、指定された縮小度まで縮小した時点で置換を止める、という方針を採る。 Therefore, as in the case of the first embodiment, the divergence degree e _j (0 ≦ j ≦ N−1) is calculated for each boundary of the original pitch waveform, and priority is given to the two pitch waveforms sandwiching the boundary where this value is small. Therefore, it is replaced with a new pitch waveform. Then, a policy is adopted in which the audio signal is reduced in order from the place with the highest priority, and the replacement is stopped when the audio signal is reduced to a specified reduction degree.

乖離度が小さい箇所ほど、元々、音声信号の時間変化が緩やかであったといえる。そこで、かかる箇所ほど、置換という人為的操作を加えても音声信号の品質が劣化しづらいと期待される。このことから、乖離度に基づいて置換箇所の優先順位を決めることは、合理的であるといえる。 It can be said that the time change of the audio signal was originally gentler as the divergence degree was smaller. Therefore, it is expected that the quality of the audio signal is less likely to deteriorate in such a portion even if an artificial operation of replacement is applied. From this, it can be said that it is reasonable to determine the priority order of replacement locations based on the degree of deviation.

本実施形態における音声信号縮小処理の具体的な手順を、図７を参照しつつ説明する。すでに図１の記憶部１１３７には、処理対象となる音声信号がＮ個のピッチ波形Ｓ₀、・・・、Ｓ_N-1に分割された状態で格納されているとする。各ピッチ波形Ｓ_j（０≦ｊ≦Ｎ−１）は、サンプリング時間間隔ｑで採取された波高の列｛ｓ_j、0、・・・、ｓ_j、pl(j)-1｝である。 The specific procedure of the audio signal reduction process in this embodiment is described with reference to FIG. 7. Assume that the audio signal to be processed is already stored in the storage unit 1137 of FIG. 1, divided into N pitch waveforms S_0, ..., S_{N-1}. Each pitch waveform S_j (0 ≤ j ≤ N-1) is a sequence of wave heights {s_{j,0}, ..., s_{j,pl(j)-1}} sampled at the sampling interval q.

まず、制御部１１２１は、記憶部１１３７に、伸張後ピッチ波形を構成要素とする構造体Ｓ_reducedを格納する領域を確保する。そして、Ｓ_reducedの初期値Ｓ_initとしては、記憶部１１３７に格納されている元の音声信号のピッチ波形の列｛Ｓ₀、・・・、Ｓ_N-1｝をそのままコピーしたものを採用する。すなわち、Ｓ_reduced＝Ｓ_init＝｛Ｓ₀、・・・、Ｓ_N-1｝とする（ステップＳ１７０１）。 First, the control unit 1121 reserves an area in the storage unit 1137 for the structure S_reduced, whose elements are the pitch waveforms after reduction. As the initial value S_init of S_reduced, it uses a direct copy of the pitch waveform sequence {S_0, ..., S_{N-1}} of the original audio signal stored in the storage unit 1137. That is, S_reduced = S_init = {S_0, ..., S_{N-1}} (step S1701).

As described above, the adjacent pitch waveform divergence degrees e_0, ..., e_N are used to determine where each new waveform is placed. The divergence degrees are assumed to have been calculated in advance and stored in the storage unit 1137. Because pitch waveform boundaries with smaller divergence degrees are replaced first, the control unit 1121 holds in a register, as a counter, a variable k that indicates the rank when the divergence degrees are arranged in ascending order, with an initial value of 1 (step S1703). If e_j is the k-th element when e_0, ..., e_N are sorted in ascending order, the boundary between pitch waveform S_j and pitch waveform S_{j+1} is the k-th candidate for the replacement operation.

Next, the control unit 1121 searches the divergence degrees e_0, ..., e_N stored in the storage unit 1137 for the k-th smallest one, extracts its subscript, and loads it into a counter register separate from the register holding k (step S1705). For example, if the k-th smallest divergence degree is e_j, then j is loaded.
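The lookup in step S1705 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `kth_smallest_boundary` and the use of a plain list for the divergence degrees are assumptions made for the example, and ties are resolved in favor of the lower index, which the description does not specify.

```python
def kth_smallest_boundary(divergences, k):
    """Return the index j of the k-th smallest divergence degree (k starts at 1)."""
    if not 1 <= k <= len(divergences):
        raise ValueError("k must be between 1 and the number of boundaries")
    # Sort boundary indices by divergence degree. Python's sort is stable,
    # so boundaries with equal divergence keep their original order.
    order = sorted(range(len(divergences)), key=lambda j: divergences[j])
    return order[k - 1]


e = [0.8, 0.1, 0.5, 0.3]             # hypothetical divergence degrees e_0 .. e_3
print(kth_smallest_boundary(e, 1))   # 1
print(kth_smallest_boundary(e, 2))   # 3
```

Sorting once and indexing by k is equivalent to searching for the k-th smallest value on each pass, and avoids repeating the search as k is incremented.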

If j has been loaded in step S1705 as in the example above, pitch waveforms S_j and S_{j+1} are to be replaced with a new pitch waveform. As shown in FIG. 6, the new pitch waveform is generated from the two original pitch waveforms. To generate it in accordance with the program stored in the program storage unit 1135, the control unit 1121 must first examine the original pitch waveforms S_j and S_{j+1}. The control unit 1121 therefore loads pitch waveforms S_j and S_{j+1} from the storage unit 1137 into general-purpose registers (step S1707).

Next, in accordance with the program stored in the program storage unit 1135, the control unit 1121 generates a new pitch waveform C_j = {c_{j,0}, ..., c_{j,pl(j)−1}} from the wave height sequence s_{j,0}, ..., s_{j,pl(j)−1} of pitch waveform S_j and the wave height sequence s_{j+1,0}, ..., s_{j+1,pl(j+1)−1} of pitch waveform S_{j+1} (step S1709).

In this embodiment, the new waveform is generated by weighted addition, which has the advantage of being simple to compute.

Specifically, the two original pitch waveforms are weighted as indicated by the chain lines in FIG. 6(a) and then superposed. S_j is multiplied by a linearly varying weighting coefficient that starts at 1 and ends at 0, which yields the waveform shown on the left side of FIG. 6(b):

{s_{j,0} × (pl(j)−1)/(pl(j)−1), s_{j,1} × (pl(j)−2)/(pl(j)−1), ..., s_{j,pl(j)−1} × 0/(pl(j)−1)}

S_{j+1} is multiplied by a linearly varying weighting coefficient that starts at 0 and ends at 1, which yields the waveform shown on the right side of FIG. 6(b):

{s_{j+1,0} × 0/(pl(j)−1), s_{j+1,1} × 1/(pl(j)−1), ..., s_{j+1,pl(j)−1} × (pl(j)−1)/(pl(j)−1)}

The two weighted waveforms are then superposed to generate the new pitch waveform

C_j = {c_{j,i} (0 ≤ i ≤ pl(j)−1)}, where c_{j,i} = {s_{j,i} × (pl(j)−1−i) + s_{j+1,i} × i} / (pl(j)−1).
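The weighted addition can be illustrated with a short Python sketch. It assumes, as the formulas do by using pl(j) for both waveforms, that the two pitch waveforms have the same number of samples; the function name `weighted_addition` is hypothetical and chosen only for this example.

```python
def weighted_addition(s_j, s_j1):
    """Combine two equal-length pitch waveforms with linear weights.

    The weight on s_j falls from 1 to 0 and the weight on s_j1 rises
    from 0 to 1, i.e. c_i = (s_j[i] * (n - 1 - i) + s_j1[i] * i) / (n - 1).
    """
    n = len(s_j)
    if n != len(s_j1) or n < 2:
        raise ValueError("waveforms must have the same length of at least 2")
    return [(s_j[i] * (n - 1 - i) + s_j1[i] * i) / (n - 1) for i in range(n)]


# The first sample comes entirely from s_j, the last entirely from s_j1.
print(weighted_addition([4, 4, 4], [0, 2, 8]))   # [4.0, 3.0, 8.0]
```

Because the first output sample equals the first sample of S_j and the last equals the last sample of S_{j+1}, the result joins the neighboring waveforms at values the original signal already had at those positions.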

The valley-shaped weighting coefficients indicated by the chain lines in FIG. 6(a) are used for the following reason. Of the pitch waveforms S_j and S_{j+1} removed by the replacement, the portions nearer to the pitch waveforms S_{j−1} and S_{j+2}, which remain unreplaced, are weighted more heavily in forming the new pitch waveform. This can be expected to minimize the unnaturalness in the sound that placing an artificial waveform might otherwise cause.

When generation of the new pitch waveform C_j is complete, it is placed in the position of the original pitch waveforms S_j and S_{j+1}, as shown in FIG. 6(c), which shortens the audio signal by pl(j+1). To carry out this operation inside the audio processing device 1101, the control unit 1121 reads the pitch waveform sequence S_reduced = {S_{r,0}, ..., S_{r,N−k}} from the storage unit 1137. Because S_reduced has already undergone (k−1) replacements, the number of pitch waveforms has decreased by (k−1), so that it consists of (N−k+1) pitch waveforms in total. The control unit 1121 deletes S_{r,j+1} from S_reduced to generate a first provisional pitch waveform sequence S_tmp = {S_{r,0}, ..., S_{r,j}, S_{r,j+2}, ..., S_{r,N−k}}, and then assigns C_j to S_{r,j}. It next assigns the value of S_{r,j+2} to S_{r,j+1}, the value of S_{r,j+3} to S_{r,j+2}, and so on, until the value of S_{r,N−k} has been assigned to S_{r,N−k−1}. This yields a second provisional pitch waveform sequence S'_tmp = {S_{r,0}, ..., S_{r,N−k−1}}, and S_reduced is updated to S_reduced = S'_tmp. The control unit 1121 then measures the length of the updated S_reduced for comparison with the reduction target value, and stores the updated S_reduced in the storage unit 1137 (step S1711).
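The update of the pitch waveform sequence in step S1711 amounts to replacing two adjacent list elements with one and then measuring the total length. The Python sketch below shows this under the assumption that the sequence is held as a list of sample lists; `replace_pair` and `total_length` are hypothetical names, and list slicing stands in for the element-by-element shifting described in the text.

```python
def replace_pair(waveforms, j, new_waveform):
    """Return a copy of the sequence in which waveforms[j] and
    waveforms[j + 1] are replaced by the single new_waveform."""
    if not 0 <= j < len(waveforms) - 1:
        raise IndexError("j must index a waveform that has a right neighbor")
    # Everything before j, the new waveform, then everything after j + 1.
    return waveforms[:j] + [new_waveform] + waveforms[j + 2:]


def total_length(waveforms):
    """Total number of samples in the sequence."""
    return sum(len(w) for w in waveforms)


seq = [[1, 1], [2, 2], [3, 3], [4, 4]]
reduced = replace_pair(seq, 1, [9, 9])
print(reduced)                 # [[1, 1], [9, 9], [4, 4]]
print(total_length(reduced))   # 6
```

The sequence shrinks by one waveform per call, which matches the count of (N−k+1) waveforms remaining after (k−1) replacements.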

It is assumed that, before the reduction process starts, the reduction target value has been communicated to the control unit 1121, for example by the user through the input device 1111 and the input control unit 1131. Step S1713 determines whether the length of S_reduced measured in step S1711 has reached the reduction target value. If it has (step S1713; Yes), no further reduction is needed, so the control unit 1121 ends the reduction process. It then has the data recording unit 1139 record S_reduced as of that point on the recording medium 1115 as the final result of the reduction, or notifies the user that the reduction process is complete by output to the output device 1113 through the output control unit 1133.

If, on the other hand, the S_reduced updated in step S1711 is determined not to have reached the target length (step S1713; No), the control unit 1121 must generate a further new pitch waveform and perform another replacement to shorten S_reduced, so processing returns to step S1705, which searches for the boundary where the next new pitch waveform is to be placed. At this point, the counter k representing the priority rank is incremented by 1 (step S1715) so that the boundary with the highest replacement priority is selected from among the boundaries not yet involved in the placement of a new pitch waveform.

As stated above, the specified reduction degree is assumed to be greater than 0.5. S_reduced therefore always reaches the target value (step S1713; Yes) before replacement has been performed at every boundary between the original pitch waveforms, and the process completes.
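The overall loop of FIG. 7 (steps S1703 to S1715) can be sketched as follows. This is an illustrative outline under several assumptions not fixed by the description: the sequence is a Python list of sample lists, the waveform generation of step S1709 is passed in as a hypothetical `merge` callable, the target is given as a sample count, and a boundary is skipped when either adjacent waveform has already taken part in a replacement.

```python
def reduce_signal(waveforms, divergences, target_length, merge):
    """Replace waveform pairs in ascending order of divergence degree
    until the total sample count is at most target_length."""
    slots = list(waveforms)              # slots[j] becomes None once absorbed
    used = [False] * len(waveforms)      # True after taking part in a replacement
    length = sum(len(w) for w in waveforms)
    # Boundary j lies between waveforms j and j + 1.
    order = sorted(range(len(waveforms) - 1), key=lambda j: divergences[j])
    for j in order:                      # corresponds to k = 1, 2, ...
        if length <= target_length:      # target reached (step S1713; Yes)
            break
        if used[j] or used[j + 1]:       # assumption: skip touched boundaries
            continue
        removed = len(slots[j]) + len(slots[j + 1])
        slots[j] = merge(slots[j], slots[j + 1])
        slots[j + 1] = None
        used[j] = used[j + 1] = True
        length += len(slots[j]) - removed
    return [w for w in slots if w is not None]


def average(a, b):
    """Stand-in for the waveform generation step, for the example only."""
    return [(x + y) / 2 for x, y in zip(a, b)]


waves = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
e = [0.0, 4.0, 0.0]                      # hypothetical divergence degrees
print(reduce_signal(waves, e, 6, average))
# [[1.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
```

In the example, only the boundary with the smallest divergence degree is replaced because one replacement already brings the signal down to the target of six samples.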

As also stated above, even when the specified reduction degree is smaller than 0.5, the signal can be reduced to the target length by repeating the process shown in FIG. 7.

(Embodiment 3)
A device according to Embodiment 3 of the present invention combines the audio expansion function of the device according to Embodiment 1 with the audio reduction function of the device according to Embodiment 2. In other words, the device according to this embodiment is an audio expansion/contraction device.

Its configuration is the audio processing device 1101 shown in FIG. 1, the same as for the devices according to Embodiments 1 and 2. The program storage unit 1135, however, stores a program that supports both audio expansion and audio reduction.

The device according to this embodiment first determines whether expansion or reduction is requested. For expansion it performs the same operation as the device according to Embodiment 1, and for reduction it performs the same operation as the device according to Embodiment 2.

The user specifies an expansion or reduction magnification M (M > 0) with the input device 1111 shown in FIG. 1. In accordance with the program stored in the program storage unit 1135, the control unit 1121 first determines whether 0 < M < 1, M > 1, or M = 1. If 0 < M < 1, the control unit 1121 judges that reduction is requested and operates the device according to this embodiment as the device according to Embodiment 2. If M > 1, the same expansion process as that performed by the device according to Embodiment 1 is executed. If M = 1, the magnification is unity, so an unmodified copy of the original audio waveform data to be processed is taken as the processing result.
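The three-way branch on the magnification M can be written as a small Python function. The function name `select_mode` and the returned labels are hypothetical and serve only to illustrate the decision described above.

```python
def select_mode(m):
    """Map the magnification M to the processing branch of Embodiment 3."""
    if m <= 0:
        raise ValueError("magnification M must be greater than 0")
    if m < 1:
        return "reduce"    # 0 < M < 1: behave as the device of Embodiment 2
    if m > 1:
        return "expand"    # M > 1: behave as the device of Embodiment 1
    return "copy"          # M = 1: output an unmodified copy


print(select_mode(0.75))   # reduce
print(select_mode(1.5))    # expand
print(select_mode(1))      # copy
```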

(Example of a method for changing the length of a pitch waveform)
In the device according to Embodiment 1, expansion by a factor of two or more requires processing that changes the pitch waveform length to various values. As already noted, a simple way to change the pitch waveform length is to change the sampling positions. This method is described in detail below with reference to FIG. 8.

FIG. 8(a) schematically shows the time dependence of pitch waveform S_j. The horizontal axis represents time. With a sampling period of q, pitch waveform S_j is represented inside the audio processing device 1101 of FIG. 1 as an array of wave heights taken at intervals of q (white circles in FIG. 8(a)), and the pitch waveform length is pl(j) × q.

The principle of changing the pitch waveform length by changing the sampling positions is the same whether the pitch waveform is lengthened or shortened. Shortening is used as the example here.

Suppose that resampling is performed with a sampling period r (where r > q). The times at which the new samples are taken are marked by black triangles in FIG. 8(a), and the waveform values obtained by the new sampling are marked by white squares in FIG. 8(a). To simplify the explanation, the relationship between r and q is further assumed to be as shown in FIG. 8: followed in time order from the start of pitch waveform S_j, the new sampling times fall progressively further behind the original sampling times, and the new and original sampling times coincide at the time pitch waveform S_j ends.

Although the new sampling period r is actually longer than the original sampling period q, for computation the wave heights sampled at the new period r are treated as if they had been sampled at the original period q. That is, starting from the state in which a pitch waveform of length pl(j) × q is represented discretely by a wave height array with the new sampling period r (FIG. 8(b)), the wave height array is left unchanged and the sampling period is regarded as q. In FIG. 8(b), this corresponds to moving the sampling points marked by white squares as indicated by the white arrows.

This movement produces the new pitch waveform shown in FIG. 8(c). As the figure makes clear, the new pitch waveform is the original pitch waveform S_j compressed along the time axis. The waveform length pl' of the new pitch waveform is shorter by time q than the waveform length pl(j) × q of the original pitch waveform S_j. That is,

pl' = {pl(j) − 1} × q.

Equivalently, the waveform has been reduced by a factor of

{pl(j) − 1} / pl(j).
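As a worked check of these two expressions, the short Python sketch below evaluates them for a pitch waveform of 100 samples. The sample count is a made-up figure for illustration, lengths are expressed in units of the sampling period q, and the function names are hypothetical.

```python
def shortened_length_in_samples(pl):
    """Length of the shortened waveform in units of q: pl' / q = pl - 1."""
    if pl < 2:
        raise ValueError("a pitch waveform needs at least 2 samples")
    return pl - 1


def reduction_factor(pl):
    """Ratio of the new length to the original length: (pl - 1) / pl."""
    if pl < 2:
        raise ValueError("a pitch waveform needs at least 2 samples")
    return (pl - 1) / pl


pl = 100                                  # hypothetical number of samples pl(j)
print(shortened_length_in_samples(pl))    # 99
print(reduction_factor(pl))               # 0.99
```

Dropping one sample from a waveform of 100 samples changes its length by only 1 percent, which shows why this method suits the fine length adjustments mentioned for Embodiment 1.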

The description so far has assumed that resampling is performed faithfully on the original pitch waveform. Depending on how the audio processing device 1101 operates, however, the pitch waveform S_j as analog data may be discarded once it has been sampled at period q. Even if it is not discarded, it may be desirable to skip the procedure that resampling would require.

In such cases, sample values are needed at positions where no sample originally exists. Various methods are possible, but the simplest is the linear interpolation described below.

In FIG. 8(d), the pitch waveform as analog data is drawn with a chain line, and the points of the initial sampling are marked by white circles. Suppose that resampling at the times marked by black triangles in FIG. 8(d) is attempted for waveform reduction, but the pitch waveform as analog data has already been lost or the analog data is not to be processed again. Exact resampling is then no longer possible, except at times that happen to coincide with the initial sampling times. As the next best approach, as shown in FIG. 8(d), when a resampled value is needed at a time that lies between the original wave heights a and b and internally divides the original sampling interval in the ratio t:u, it is approximated by the linearly interpolated value

a + {(b − a) × t} / (t + u).
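The interpolation formula, and one way it might be used to change the number of samples in a pitch waveform, can be sketched in Python as follows. The function names are hypothetical. The placement of the new sampling positions (evenly spaced, with the first and last positions coinciding with the first and last original samples) is an assumption made for this example and is not taken from FIG. 8.

```python
def lerp(a, b, t, u):
    """Value at the point dividing the interval between a and b in the ratio t:u."""
    return a + (b - a) * t / (t + u)


def resample_linear(samples, new_count):
    """Resample to new_count values using linear interpolation between
    neighboring original samples (assumed even spacing, endpoints kept)."""
    if len(samples) < 2 or new_count < 2:
        raise ValueError("need at least 2 input samples and 2 output samples")
    step = (len(samples) - 1) / (new_count - 1)
    out = []
    for n in range(new_count):
        pos = n * step                          # position in original sample units
        i = min(int(pos), len(samples) - 2)     # left neighbor index
        t = pos - i                             # distance from the left neighbor
        out.append(lerp(samples[i], samples[i + 1], t, 1.0 - t))
    return out


print(lerp(2.0, 6.0, 1, 3))                          # 3.0
print(resample_linear([0.0, 1.0, 2.0, 3.0], 3))      # [0.0, 1.5, 3.0]
```

Treating the resampled values as if they were spaced at the original period q then yields a waveform with fewer (or more) samples, in the manner described for FIG. 8(b) and FIG. 8(c).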

The present invention is not limited to the embodiments described above, and various modifications and applications are possible.
For example, in Embodiment 1, when the audio signal is expanded by more than a factor of two, a plurality of new pitch waveforms with slightly different pitch waveform lengths are generated. Alternatively, even for expansion by more than a factor of two, a process closely resembling the one used in Embodiment 2 when reduction to half the length or less is requested may be used. That is, when expansion by a factor of two or more is requested, the signal is first expanded up to twofold, the result is then treated as the original audio data, and expansion limited to twofold is performed again. Expansion up to fourfold relative to the true original audio data is thereby possible without adjusting the pitch waveform length. Repeating this allows, in principle, expansion to any magnification.
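The repeated-pass approach can be illustrated by computing how a requested magnification might be split into passes of at most twofold each. The schedule below (as many full twofold passes as possible, then one final pass for the remainder) is an assumption made for this sketch, since the description only states that passes limited to twofold are repeated; `expansion_passes` is a hypothetical name.

```python
def expansion_passes(m):
    """Split a magnification m >= 1 into per-pass factors, each at most 2."""
    if m < 1:
        raise ValueError("expansion requires a magnification of at least 1")
    factors = []
    remaining = float(m)
    while remaining > 2:
        factors.append(2.0)      # a full twofold pass
        remaining /= 2
    factors.append(remaining)    # final pass covers what is left
    return factors


print(expansion_passes(3))       # [2.0, 1.5]
print(expansion_passes(4))       # [2.0, 2.0]
print(expansion_passes(1.5))     # [1.5]
```

The product of the returned factors equals the requested magnification, so two passes suffice for anything up to fourfold and three for anything up to eightfold.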

The audio processing device 1101 may further include a communication control unit that communicates with other devices over a communication network such as the Internet, and may transmit the expanded or contracted audio waveform data to another device through this communication control unit. It may also receive audio waveform data from another device through this communication control unit and expand or contract that data.

Alternatively, the output device 1113 may be capable of audio output, so that the audio signal resulting from the expansion/contraction process is played back immediately and the user can judge the result on the spot.

The information processing device that implements the audio processing device 1101 according to the embodiments of the present invention does not require a dedicated system and can be realized with an ordinary computer system. For example, the audio processing device 1101 that executes the processing described above can be configured by distributing a program for executing the above operations on a computer-readable recording medium (FD, CD-ROM, DVD, etc.) and installing the program on a general-purpose computer. The program may also be stored on a disk device of a server on a communication network such as the Internet and, for example, downloaded to a computer.

When the OS takes on part of the processing described above, or when the OS constitutes part of a constituent element of the present invention, the program excluding that part may be stored on the recording medium and distributed, or may be downloaded to a computer. In this case as well, the recording medium stores a program for executing each function or each step to be executed by the computer.

FIG. 1 is a block diagram of the audio processing device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an audio waveform to be processed that has been time-divided into pitch waveforms.
FIG. 3 is a diagram showing how a new pitch waveform is generated from two consecutive pitch waveforms and inserted.
FIG. 4 is a flowchart of the audio expansion process according to Embodiment 1 of the present invention, mainly for the case where expansion by less than a factor of two is requested.
FIG. 5 is a flowchart of the audio expansion process according to Embodiment 1 of the present invention for the case where expansion by more than a factor of two is requested.
FIG. 6 is a diagram showing how a new pitch waveform is generated from two consecutive pitch waveforms and replaces the two original pitch waveforms.
FIG. 7 is a flowchart of the audio reduction process according to Embodiment 2 of the present invention.
FIG. 8 is a diagram showing how the length of a pitch waveform is changed by changing the sampling points and by linear interpolation.

Explanation of symbols

1101: audio processing device; 1111: input device; 1113: output device; 1115: recording medium; 1121: control unit; 1131: input control unit; 1133: output control unit; 1135: program storage unit; 1137: storage unit; 1139: data recording unit

Claims

1. An audio expansion device comprising:
a storage unit that stores an audio waveform that is a time series of segment waveforms;
new segment waveform generation means for generating one new segment waveform by weighted addition of two consecutive segment waveforms in the audio waveform stored in the storage unit, using a weighting coefficient that increases linearly with time for the preceding segment waveform and a weighting coefficient that decreases linearly with time for the following segment waveform;
insertion segment waveform generation means for generating a plurality of insertion segment waveforms, each having a different length between the length of the preceding segment waveform and the length of the following segment waveform, by expanding or contracting on the time axis the new segment waveform generated by the new segment waveform generation means;
insertion means for inserting the plurality of insertion segment waveforms generated by the insertion segment waveform generation means between the preceding segment waveform and the following segment waveform; and
audio waveform output means for outputting the audio waveform expanded by the insertion of the insertion segment waveforms by the insertion means.

2. The audio expansion device according to claim 1, further comprising:
similarity calculation means for calculating a similarity indicating how similar the two consecutive segment waveforms are; and
target length arrival determination means for determining whether the length of the audio waveform expanded by the insertion of the insertion segment waveforms by the insertion means has reached a predetermined target audio waveform length,
wherein the insertion means inserts the insertion segment waveforms in descending order of the similarity between the two consecutive segment waveforms, based on the similarity calculated by the similarity calculation means,
the target length arrival determination means determines, each time an insertion segment waveform is inserted by the insertion means, whether the length of the expanded audio waveform has reached the predetermined target audio waveform length, and
the insertion by the insertion means ends when the target length arrival determination means determines that the target length has been reached, and continues in said order when it determines that the target length has not been reached.

3. The audio expansion device according to claim 2, wherein the insertion segment waveform generation means increases the number of insertion segment waveforms to be generated when the insertion means has inserted insertion segment waveforms between every pair of consecutive segment waveforms included in the audio waveform stored in the storage unit and the target length arrival determination means nevertheless determines that the target length has not been reached.

4. The audio expansion device according to any one of claims 1 to 3, wherein the insertion segment waveform generation means generates the plurality of insertion segment waveforms, each having a different length, by changing the sampling positions of the new segment waveform.

5. The audio expansion device according to claim 1, further comprising audio division means for dividing an audio waveform to be processed into a time series of segment waveforms.

6. The audio expansion device according to claim 5, wherein the audio division means detects pitch breaks in the audio and generates the time series of segment waveforms by dividing the audio waveform at those breaks.

7. An audio expansion method for expanding an audio waveform that is a time series of segment waveforms, the method comprising:
a new segment waveform generation step of generating one new segment waveform by weighted addition of two consecutive segment waveforms in the audio waveform, using a weighting coefficient that increases linearly with time for the preceding segment waveform and a weighting coefficient that decreases linearly with time for the following segment waveform;
an insertion segment waveform generation step of generating a plurality of insertion segment waveforms, each having a different length between the length of the preceding segment waveform and the length of the following segment waveform, by expanding or contracting on the time axis the new segment waveform generated in the new segment waveform generation step; and
an insertion step of inserting the plurality of insertion segment waveforms generated in the insertion segment waveform generation step between the preceding segment waveform and the following segment waveform.

8. The audio expansion method according to claim 7, further comprising:
a similarity calculation step of calculating a similarity indicating how similar the two consecutive segment waveforms are; and
a target length arrival determination step of determining whether the length of the audio waveform expanded by the insertion of the insertion segment waveforms in the insertion step has reached a predetermined target audio waveform length,
wherein the insertion step inserts the insertion segment waveforms in descending order of the similarity between the two consecutive segment waveforms, based on the similarity calculated in the similarity calculation step,
the target length arrival determination step determines, each time an insertion segment waveform is inserted in the insertion step, whether the length of the expanded audio waveform has reached the predetermined target audio waveform length, and
the insertion in the insertion step ends when the target length arrival determination step determines that the target length has been reached, and continues in said order when it determines that the target length has not been reached.

9. The audio expansion method according to claim 8, wherein the insertion segment waveform generation step increases the number of insertion segment waveforms to be generated when insertion segment waveforms have been inserted in the insertion step between every pair of consecutive segment waveforms included in the audio waveform and the target length arrival determination step nevertheless determines that the target length has not been reached.

10. The audio expansion method according to claim 7, wherein the insertion segment waveform generation step generates the plurality of insertion segment waveforms, each having a different length, by changing the sampling positions of the new segment waveform.

11. The audio expansion method according to any one of claims 7 to 10, further comprising an audio division step of dividing an audio waveform to be processed into a time series of segment waveforms.

A program causing a computer of an audio expansion device, which expands a speech waveform that is a time series of segment waveforms stored in a storage unit, to function as:
New segment waveform generation means for generating, for two consecutive segment waveforms in the speech waveform stored in the storage unit, one new segment waveform by weighted addition using a weighting factor that increases linearly with time for the front segment waveform and a weighting factor that decreases linearly with time for the rear segment waveform;
Insertion segment waveform generation means for generating a plurality of insertion segment waveforms, each having a different length between the length of the front segment waveform and the length of the rear segment waveform, by expanding or contracting, on the time axis, the new segment waveform generated by the new segment waveform generation means;
Insertion means for inserting the plurality of insertion segment waveforms generated by the insertion segment waveform generation means between the front segment waveform and the rear segment waveform; and
Speech waveform output means for outputting the speech waveform expanded by the insertion of the insertion segment waveforms by the insertion means.
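For illustration only, the generation of the new segment waveform and of the insertion segment waveforms recited above might be sketched in Python as follows. The function names, the use of linear interpolation for the expansion or contraction on the time axis, the resampling of the rear segment to the front segment's length before the weighted addition, and the even spacing of the insertion lengths are all assumptions of this sketch and are not recited in the claim. Segments are assumed to be non-empty lists of samples.

```python
def resample(wave, n):
    """Expand or contract `wave` to n samples by linear interpolation,
    i.e. by reading it at n evenly spaced sampling positions."""
    if n == 1:
        return [wave[0]]
    m = len(wave)
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)      # fractional sampling position
        k = int(pos)
        frac = pos - k
        nxt = wave[min(k + 1, m - 1)]
        out.append(wave[k] * (1 - frac) + nxt * frac)
    return out


def new_segment(front, rear):
    """Weighted addition of two consecutive segments: the weight on the
    front segment rises linearly from 0 to 1 while the weight on the
    rear segment falls linearly from 1 to 0."""
    n = len(front)
    rear_n = resample(rear, n)           # assumption: align lengths first
    out = []
    for i in range(n):
        w_front = i / (n - 1) if n > 1 else 0.5
        out.append(w_front * front[i] + (1 - w_front) * rear_n[i])
    return out


def insertion_segments(front, rear, count):
    """Generate `count` insertion segments whose lengths are spaced evenly
    between len(front) and len(rear). Lengths can coincide after rounding
    when the two segment lengths are close together."""
    base = new_segment(front, rear)
    lf, lr = len(front), len(rear)
    segs = []
    for j in range(1, count + 1):
        length = round(lf + (lr - lf) * j / (count + 1))
        segs.append(resample(base, length))
    return segs
```

With a front segment of 8 samples, a rear segment of 12 samples, and `count = 3`, the sketch yields insertion segments of 9, 10, and 11 samples, so the segment length changes gradually from the front segment to the rear segment.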

The program according to claim 12, further causing the computer to function as:
Similarity calculation means for calculating a similarity indicating how similar the two consecutive segment waveforms are to each other; and
Target length arrival determination means for determining whether the length of the speech waveform expanded by the insertion of the insertion segment waveforms by the insertion means has reached a predetermined target length of the speech waveform,
wherein the insertion means inserts the insertion segment waveforms between pairs of consecutive segment waveforms in descending order of the similarity calculated by the similarity calculation means,
the target length arrival determination means determines, each time an insertion segment waveform is inserted by the insertion means, whether the length of the expanded speech waveform has reached the predetermined target length, and
when the target length arrival determination means determines that the target length has been reached, the insertion by the insertion means is terminated, and when it determines that the target length has not been reached, the insertion means continues inserting according to the order.

The program according to claim 13, wherein, when insertion segment waveforms have been inserted by the insertion means between every pair of consecutive segment waveforms included in the speech waveform stored in the storage unit and the target length arrival determination means still determines that the target length has not been reached, the insertion segment waveform generation means increases the number of insertion segment waveforms to be generated.
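As a rough illustration of the ordering and stopping rule in claims 13 and 14, the following Python sketch inserts between the most similar pairs first, checks the length after every insertion, and raises the number of insertion segment waveforms per pair once every pair has been used. The normalized correlation used as the similarity measure, the `max_count` safety limit, the replacement of a pair's earlier insertions when the count is raised, and the placeholder generator `average_inserts` are assumptions of this sketch; the claims do not fix any of them.

```python
import math


def similarity(a, b):
    """Assumed measure: normalized correlation over the common length."""
    n = min(len(a), len(b))
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a[:n]) * sum(y * y for y in b[:n]))
    return num / den if den else 0.0


def average_inserts(front, rear, count):
    """Hypothetical placeholder generator: `count` copies of the sample-wise
    mean. A real generator would follow claim 12 instead."""
    n = min(len(front), len(rear))
    avg = [(front[k] + rear[k]) / 2 for k in range(n)]
    return [list(avg) for _ in range(count)]


def assemble(segments, inserts):
    """Concatenate segments, placing each pair's insertions after its front."""
    out = []
    for i, seg in enumerate(segments):
        out.extend(seg)
        for w in inserts.get(i, []):
            out.extend(w)
    return out


def expand(segments, target_len, make_inserts=average_inserts, max_count=8):
    base_len = sum(len(s) for s in segments)
    inserts = {}                       # pair index -> insertion segments
    if base_len >= target_len:
        return assemble(segments, inserts)
    # Pairs of consecutive segments, most similar first.
    order = sorted(range(len(segments) - 1),
                   key=lambda i: similarity(segments[i], segments[i + 1]),
                   reverse=True)
    count = 1
    while order and count <= max_count:
        for i in order:
            inserts[i] = make_inserts(segments[i], segments[i + 1], count)
            total = base_len + sum(len(w) for ws in inserts.values() for w in ws)
            if total >= target_len:    # checked after every insertion
                return assemble(segments, inserts)
        count += 1                     # all pairs used: generate more per pair
    return assemble(segments, inserts)  # best effort if the limit is hit
```

Inserting at the most similar pairs first places the added waveforms where neighbouring segments differ least, which is where an interpolated waveform is least likely to be audible.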

The program according to any one of claims 12 to 14, wherein the insertion segment waveform generation means generates the plurality of insertion segment waveforms, each having a different length, by changing the sampling positions of the new segment waveform.

The program according to any one of claims 12 to 15, further causing the computer to function as speech division means for dividing a speech waveform to be processed into a time series of segment waveforms.