JP6546288B2

JP6546288B2 - Voice noise detection device and voice noise detection method

Info

Publication number: JP6546288B2
Application number: JP2017555099A
Authority: JP
Inventors: Hiroyuki Tanaka (田中 宏幸)
Original assignee: Hitachi Kokusai Electric Inc; Kokusai Denki Electric Inc
Current assignee: Kokusai Denki Electric Inc
Priority date: 2015-12-08
Filing date: 2016-12-07
Publication date: 2019-07-17
Anticipated expiration: 2036-12-07
Also published as: WO2017099123A1; JPWO2017099123A1

Description

The present invention relates to an audio noise detection device and an audio noise detection method.

Video and audio in video server systems for broadcast use have several distinctive characteristics.
For example, broadcast stations generally use a video frame rate of about 29.97 fps (drop frame) and an audio sampling rate of 48 kHz (48,000 samples per second).

In video server systems, material data is usually handled in units of one frame. At 29.97 fps and 48 kHz, the number of audio samples per unit time (one frame) is 1601.6, which is not an integer. The fraction is absorbed by varying the sample count between 1601 and 1602 per frame according to a fixed rule. For example, a sequence with a period of five frames is used, in which the first, third, and fifth frames carry 1602 samples and the second and fourth frames carry 1601 samples. As a result, once every five frames two 1602-sample frames occur in a row (the fifth frame of one cycle and the first frame of the next).
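The arithmetic behind this sequence can be checked with a short sketch. It assumes that "about 29.97 fps" means the exact rate 30000/1001 fps; the names `SEQUENCE` and `samples_in_frame` are illustrative and do not come from the patent.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000                   # audio samples per second
FRAME_RATE = Fraction(30_000, 1_001)   # assumed exact rate, about 29.97 fps

# Average number of audio samples per video frame (exactly 1601.6).
samples_per_frame = Fraction(SAMPLE_RATE) / FRAME_RATE

# Five-frame sequence from the description: frames 1, 3, 5 carry 1602
# samples and frames 2, 4 carry 1601 samples.
SEQUENCE = (1602, 1601, 1602, 1601, 1602)


def samples_in_frame(frame_index: int) -> int:
    """Return the sample count for a zero-based frame index."""
    return SEQUENCE[frame_index % len(SEQUENCE)]


if __name__ == "__main__":
    print(float(samples_per_frame))   # 1601.6
    print(sum(SEQUENCE))              # 8008, i.e. 5 * 1601.6
    # Last frame of one cycle and first frame of the next: both 1602.
    print([samples_in_frame(i) for i in range(4, 6)])  # [1602, 1602]
```

Five frames carry 8008 samples in total, which matches 5 × 1601.6, so the sequence introduces no long-term drift.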

The sender and the receiver of the audio data therefore have to agree on whether the current frame carries 1602 samples or 1601 samples. To signal this, ANSI/SMPTE 272M, for example, allows an integer Audio frame number between 1 and 5 to be inserted into the Audio control packet.

However, verification means such as a CRC (Cyclic Redundancy Check) or a checksum are not always provided for the audio data on the source HD-SDI (High Definition Serial Digital Interface) signal or for the audio data stored on the storage server. Once audio data has been received, it is generally held in memory as raw payload with no header. As a result, if the sender or the receiver misidentifies the sequence for some reason, or if the amount of input audio data differs from the amount dictated by the sequence, synchronization is lost and data may be dropped. This can happen when audio sources are switched or edited in current-affairs news or time-shifted playout systems, where perfect editing is difficult.

FIG. 3 is a diagram for explaining audio data transferred to a server or the like. Here it is assumed that an encoder accumulates uncompressed PCM audio data at 16 bits per sample for each frame and transfers it to a server together with encoded video data and the like. The encoder application reads the PCM audio data written to a shared memory frame by frame. The amount read varies according to the sequence described above. The receiving server reserves a sufficiently large recording area for the audio of each frame and records the received audio as it is. FIG. 3(A) shows the case where the amount of data transferred from the encoder and stored on the server is smaller than the actual amount of audio data.

For example, suppose the encoder takes the current frame to be a 1601-sample frame, while 1602 samples of PCM audio data have been written to the shared memory. The encoder application then transfers only 3202 bytes of the 3204 bytes of PCM audio data, and the video server loses 2 bytes of audio data that it was never able to record (store).

FIG. 3(B) shows the case where the amount of stored data is larger than the amount of audio data. For example, the encoder takes the current frame to be a 1602-sample frame, while only 1601 samples of PCM audio data have been written to the shared memory. This can happen, for instance, when video and audio are input through separate physical interfaces. The encoder application then transfers 3204 bytes, consisting of the actual 3202 bytes of PCM audio data followed by 2 bytes of Null value (0x00), and the video server records (stores) 2 extra bytes of Null value (0x00) that were not in the original. The last 2 bytes are Null because the memory is initialized for every frame.
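The two failure cases of FIG. 3 can be reproduced with a small simulation. This is only a sketch of the behavior described above, not the encoder's actual code; `transfer_frame` and its parameters are hypothetical names, and the per-frame buffer is assumed to be zero-initialized as the description states.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, as assumed in the description


def transfer_frame(shared: bytes, expected_samples: int) -> bytes:
    """Copy one frame out of shared memory (hypothetical helper).

    The frame buffer is zero-initialized for every frame, so a short
    input leaves trailing 0x00 bytes, and a long input is truncated.
    """
    size = expected_samples * BYTES_PER_SAMPLE
    buffer = bytearray(size)          # every byte starts as 0x00
    payload = shared[:size]           # bytes beyond `size` are lost
    buffer[:len(payload)] = payload
    return bytes(buffer)


if __name__ == "__main__":
    # FIG. 3(A): 1602 samples written, encoder expects 1601.
    lost = transfer_frame(b"\x01\x02" * 1602, 1601)
    print(len(lost))                  # 3202 (2 bytes dropped)

    # FIG. 3(B): 1601 samples written, encoder expects 1602.
    padded = transfer_frame(b"\x01\x02" * 1601, 1602)
    print(len(padded), padded[-2:])   # 3204 b'\x00\x00'
```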

Because the audio data is variable in length in this way, surpluses and shortfalls of data tend to occur at frame boundaries. They are recorded in the material as audio noise and, if played back as is, contribute to broadcast faults. Identifying the audio noise is difficult, however, even with sophisticated analysis. In addition, real-time equipment in a video server system must avoid heavy processing loads such as analysis as far as possible. A data surplus means that initialization data has been appended to valid data. Even when the initialization data is 0x00, silence is itself a run of 0x00, so a 0x00 sample does not necessarily indicate noise.

As prior art, Patent Document 1, for example, discloses an invention that performs sampling conversion by treating the number of audio samples in one frame, referenced to an output frame synchronization signal, as one frame unit.

Japanese Patent No. 3825677
JP 2010-028223 A
Japanese Patent No. 4122624
JP 2000-307645 A
Japanese Patent No. 4812171

An object of the present invention is to identify, with a low processing load, noise that occurs at frame boundaries of audio data.

The audio noise detection device of the present invention detects noise that arises when audio data is divided at the video frame period and a sample of a known value is inserted at a boundary of the division. The device has a boundary portion determination unit that detects the known value near a frame boundary, a volume determination unit that determines the volume of each frame based on samples other than those near the frame boundary, and a noise determination unit that counts frames in which the determined volume exceeds a predetermined value and the known value has been detected, and detects noise based on the count value.

The known value is 0, NULL, or the value of the initialization data. The boundary portion determination unit locates the frame boundaries at a non-integer interval obtained by dividing the audio sample rate by the video frame rate. The volume determination unit determines that the volume is equal to or less than the predetermined value when a sample other than those near the frame boundary equals the known value.

Furthermore, the audio noise detection method of the present invention detects noise at frame boundaries of audio data. The method checks the data at a frame boundary of the audio data, checks data other than at the frame boundary, determines whether the volume exceeds a predetermined value when the data at the frame boundary is 0x00, counts the 0x00 when the volume exceeds the predetermined value, and determines that noise is present when the count value is equal to or greater than a predetermined value.

According to the present invention, noise that occurs at frame boundaries of audio data can be identified with a low processing load.

FIG. 1 is a block diagram of an audio noise detection device according to an embodiment.
FIG. 2 is a flowchart for explaining the operation of the audio noise detection device according to the embodiment.
FIG. 3 is a diagram for explaining audio data transferred to a server or the like.

Embodiments of the present invention are described in detail below with reference to the drawings. In FIG. 1, the audio noise detection device 1 consists of a boundary portion determination unit 11, a volume determination unit 12, a noise determination unit 14, and a noise reduction unit 15. It is assumed here that the Null-valued initialization data that causes noise appears as a single sample at a frame boundary and does not occur in runs. The audio noise detection device 1 may detect audio noise by software processing on a CPU (Central Processing Unit) or the like.

A stream of audio data is written to the shared memory 10 from an interface such as SDI or storage (not shown), or from another process executed by the CPU (for example, a decoding process), and the shared memory 10 makes that audio data readable by the audio noise detection device 1. Besides shared memory, inter-process communication methods at various levels of abstraction, such as memory-mapped I/O and memory-mapped files, and their associated APIs (Application Programming Interfaces) can be used. The APIs available with these methods are limited to opening and closing the stream, an initialization command, a flag indicating whether data is ready to be read, a flag indicating whether reading has reached the end of the data, and the like, and in some cases even these are only partially available.

The boundary portion determination unit 11 checks the audio data read from the shared memory only at the sample positions that form the frame boundary (the first and last of the approximately 1600 samples) and passes the audio data it has read to the volume determination unit 12. Frame boundaries are placed at intervals of 1601.6 samples, and one to several consecutive samples centered on each boundary are extracted for checking. That is, when a boundary at the 1601.6-sample interval falls between two samples, those two samples are checked, and when it coincides exactly with the timing of a sample, that one sample is checked. To allow for compounded boundary shifts and repeated insertion of initialization data, three or more samples may always be extracted. When audio data arrives in bursts of one frame each, the frame boundary is unambiguous, and it suffices to check only the last sample of the burst, or three samples consisting of that sample and its neighbors. If the check finds a Null-valued sample or other initialization data in the target data, the unit notifies the noise determination unit 14.
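The boundary selection rule can be sketched as follows. The sketch assumes the boundary interval is 48000 / (30000/1001) = 1601.6 samples (the per-frame average given earlier in the description), that sample i sits at position i from the start of the stream, and that the known value is 0. The function names and the `extra` parameter are illustrative, not taken from the patent.

```python
import math
from fractions import Fraction

SAMPLES_PER_FRAME = Fraction(8008, 5)  # 1601.6 samples per video frame


def boundary_sample_indices(frame_number: int, extra: int = 0) -> list[int]:
    """Indices of the samples to inspect around one frame boundary.

    If the boundary falls between two samples, both neighbors are
    returned; if it coincides with a sample, only that sample is
    returned. `extra` widens the window on each side.
    """
    position = frame_number * SAMPLES_PER_FRAME
    low = math.floor(position)
    high = math.ceil(position)
    return list(range(max(low - extra, 0), high + extra + 1))


def boundary_has_known_value(samples, frame_number, known=0, extra=0) -> bool:
    """True if any inspected boundary sample equals the known value."""
    indices = boundary_sample_indices(frame_number, extra)
    return any(samples[i] == known for i in indices if i < len(samples))


if __name__ == "__main__":
    print(boundary_sample_indices(1))            # [1601, 1602]
    print(boundary_sample_indices(5))            # [8008]
    print(boundary_sample_indices(5, extra=1))   # [8007, 8008, 8009]
```

Using exact fractions instead of floating point keeps the boundary positions from drifting over long streams, and only a handful of samples per frame are ever read.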

To avoid falsely detecting 0x00 as noise during silent intervals, the volume determination unit 12 checks at least one sample of the audio data received from the boundary portion determination unit 11 that lies outside the frame boundary portion, for example the sample in the middle of the approximately 1600 samples, to see whether its value exceeds a predetermined volume. For example, if that sample is the Null value, the unit judges the frame and its vicinity to be silent (volume equal to or less than the predetermined value); otherwise it judges the frame to be voiced (volume exceeding the predetermined value). It outputs the result to the noise determination unit 14. The probability that a given sample in a voiced frame is 0x00 is negligibly small. The volume determination unit 12 also passes the audio data received from the boundary portion determination unit 11 unchanged to the noise reduction unit 15.
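A minimal sketch of this silence check is shown below. It treats a frame as a list of integer samples and assumes the known value is 0; the function name `frame_is_voiced` and the `probes` parameter (how many interior samples to look at) are illustrative assumptions.

```python
def frame_is_voiced(frame, known_value=0, probes=1) -> bool:
    """Judge a frame as voiced from a few interior samples.

    The first and last samples (the boundary portion) are excluded.
    With probes=1 only the sample near the middle is examined, which
    mirrors the example in the description.
    """
    interior = frame[1:-1]
    if not interior:
        return False
    step = len(interior) / (probes + 1)
    picks = [interior[int(step * (k + 1))] for k in range(probes)]
    return any(value != known_value for value in picks)


if __name__ == "__main__":
    silent = [0] * 1602
    voiced_with_padding = [500] * 1601 + [0]   # zero only at the boundary
    print(frame_is_voiced(silent))               # False
    print(frame_is_voiced(voiced_with_padding))  # True
```

Reading one or a few samples per frame is far cheaper than computing a level over the whole frame, which fits the stated goal of keeping the load low.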

The noise determination unit 14 has a counter and increments it each time the boundary portion determination unit 11 reports that initialization data such as a Null value has been detected. However, when the result from the volume determination unit 12 indicates that the volume is equal to or less than the predetermined value, the unit treats the frame as a silent interval and does not increment the counter. A run of Null values is therefore not regarded as noise. When the frame is not silent and the count is equal to or greater than a predetermined value relative to the number of frames, the unit concludes that noise caused by a processing abnormality has occurred and outputs a noise detection result. If needed, the noise detection result includes information indicating the timing of the Null-valued or similar sample detected by the boundary portion determination unit 11. Meanwhile, the noise determination unit 14 unconditionally decrements the counter once every predetermined number of frames, that number being two or more (the counter never becomes negative). This allows the decision to be made by simple thresholding of the count.
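The counter with periodic decrement can be sketched as below. The class name, the default threshold of 3, and the decay period of 10 frames are illustrative assumptions, since the patent only speaks of predetermined values. The sketch also chooses to evaluate the threshold before applying the decrement, which is one possible ordering.

```python
class NoiseCounter:
    """Counts boundary hits in voiced frames, with a periodic decrement."""

    def __init__(self, threshold: int = 3, decay_period: int = 10):
        self.threshold = threshold        # assumed value
        self.decay_period = decay_period  # assumed value, must be >= 2
        self.count = 0
        self.frames_seen = 0

    def update(self, boundary_hit: bool, voiced: bool) -> bool:
        """Process one frame; return True when noise is reported."""
        self.frames_seen += 1
        detected = False
        if boundary_hit and voiced:       # silent frames never count
            self.count += 1
            detected = self.count >= self.threshold
        # Unconditional decrement every `decay_period` frames, floor at 0.
        if self.frames_seen % self.decay_period == 0 and self.count > 0:
            self.count -= 1
        return detected


if __name__ == "__main__":
    counter = NoiseCounter(threshold=3, decay_period=10)
    print([counter.update(True, True) for _ in range(3)])
    # [False, False, True]
```

Because the counter leaks at a fixed rate, isolated coincidental zeros fade away, while hits that recur faster than the decay period accumulate and cross the threshold.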

On receiving a noise detection result indicating that noise has occurred, the noise reduction unit 15 replaces the sample of initialization data, such as a Null value, with an estimate based on the surrounding samples. For real-time processing the estimate s2 must be computed within a specified processing delay, and it is obtained, for example, as s2 = 2 × s1 − s0, where s1 is the sample immediately before s2 and s0 is the sample two positions before s2. If low delay is not required, an estimate obtained by feeding many samples before and after the sample to be replaced into an FIR filter or the like may be used instead.
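The low-delay estimate s2 = 2 × s1 − s0 is a linear extrapolation from the two preceding samples. A minimal sketch follows; the function name `repair_sample` is hypothetical, and clamping to the signed 16-bit range is an added assumption that the patent does not mention.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def repair_sample(samples: list[int], index: int) -> int:
    """Overwrite samples[index] with the estimate s2 = 2*s1 - s0.

    s1 is the sample just before `index` and s0 the one before that.
    The result is clamped to the 16-bit PCM range (an assumption).
    """
    if index < 2:
        raise ValueError("need two preceding samples to extrapolate")
    s0, s1 = samples[index - 2], samples[index - 1]
    estimate = max(INT16_MIN, min(INT16_MAX, 2 * s1 - s0))
    samples[index] = estimate
    return estimate


if __name__ == "__main__":
    audio = [100, 200, 0, 400]      # the 0 is an inserted Null sample
    print(repair_sample(audio, 2))  # 300
    print(audio)                    # [100, 200, 300, 400]
```

Only past samples are used, so the repair adds no look-ahead delay, at the cost of a rougher estimate than a filter that also sees the following samples.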

FIG. 2 is a flowchart for explaining the operation of the audio noise detection device according to an embodiment of the present invention.
The audio noise detection device 1 reads the audio data (S201).
In S202, a data check is performed only on the sample positions that form the frame boundary (the first and last of the approximately 1600 samples).

In S203, about one sample outside the frame boundary portion, in the middle of the approximately 1600 samples, is referenced and checked.
In S204, it is determined whether the frame-boundary data is 0x00. If it is 0x00 (YES), processing proceeds to S205; if it is not 0x00 (NO), processing proceeds to S210.

In S205, to avoid falsely detecting 0x00 as noise during a silent interval, it is determined whether the data in the middle of the approximately 1600 samples exceeds a predetermined volume. If the volume is equal to or less than the predetermined value (NO), processing proceeds to S206; if the volume exceeds the predetermined value (YES), processing proceeds to S207.
In S206, the frame is judged to be a silent interval, a run of 0x00 or the like is not regarded as noise, and processing proceeds to S210.

In S207, the occurrence is counted as a case of 0x00 or of other initialization data, and processing proceeds to S208.
In S208, it is determined whether the count is equal to or greater than a predetermined value. If it is (YES), processing proceeds to S209; if the count is less than the predetermined value (NO), processing proceeds to S210.

In S209, a noise detection result is output indicating that noise caused by a processing abnormality is present at the frame boundary.
In S210, processing moves on to the next step.

When the audio noise detection device according to an embodiment of the present invention is applied to a recording device, it performs detection on the audio data being recorded, sends an alarm notification to a monitoring device, and, as the case requires, either corrects the dropped samples before recording or issues (outputs) a correction task command to the device that manages the system.

When the audio noise detection device according to an embodiment of the present invention is applied to a device that uses material, such as a playback device or an editing device, it performs detection on the audio data used for playback, editing, and so on, sends an alarm notification to a monitoring device, and, as the case requires, either corrects the dropped samples before use or issues (outputs) a correction task command to the device that manages the system.

The audio noise detection device and audio noise detection method according to the embodiment of the present invention perform stateless verification: they hold no internal state corresponding to the sequence of 1601-sample and 1602-sample frames, so there is no risk of the device or method itself misidentifying the sequence and malfunctioning. In addition, noise occurring at frame boundaries of audio data can be identified with an extremely low processing load.

One embodiment of the present invention has been described in detail above, but the present invention is not limited to that embodiment and can be carried out with various modifications without departing from its spirit.

The invention is applicable to uses in which noise can arise from the way audio data is handled, and it is suitable for systems and devices compliant with SMPTE 272M and 299M, which embed audio data in an SDI signal, and with SMPTE 2022-6, which encapsulates that SDI signal in IP packets.

1: audio noise detection device, 10: shared memory, 11: boundary portion determination unit, 12: volume determination unit, 14: noise determination unit, 15: noise reduction unit.

Claims

An audio noise detection device that detects noise caused when audio data is divided at a video frame period and a sample of a known value is inserted at a boundary of the division, the device comprising:
a boundary portion determination unit that detects the known value near a frame boundary;
a volume determination unit that determines a volume for each frame based on samples other than those near the frame boundary; and
a noise determination unit that counts frames in which the determined volume exceeds a predetermined value and the known value is detected, and determines the presence or absence of noise based on the count value.

The audio noise detection device according to claim 1, wherein:
the known value is 0, NULL, or a value of initialization data;
the boundary portion determination unit determines the frame boundary portion at a non-integer interval obtained by dividing the audio sample rate by the video frame rate; and
the volume determination unit determines that the volume is equal to or less than the predetermined value when a sample other than those near the frame boundary has the same value as the known value.

The audio noise detection device according to claim 1, further comprising a noise reduction unit that, when the noise determination unit determines that noise has occurred, replaces the known value in real time with an estimated value based on a plurality of samples around the known value.

An audio noise detection method for detecting noise at frame boundaries of audio data, comprising:
checking data at a frame boundary portion of the audio data; checking data other than at the frame boundary portion of the audio data; determining, when the data at the frame boundary portion is 0x00, whether the volume exceeds a predetermined value; counting the 0x00 when the volume exceeds the predetermined value; and determining that noise is present when the count value is equal to or greater than a predetermined value.