JP4465221B2

JP4465221B2 - Speech data summarization system and its program

Info

Publication number: JP4465221B2
Application number: JP2004141588A
Authority: JP
Inventors: 享邦西田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2004-05-11
Filing date: 2004-05-11
Publication date: 2010-05-19
Anticipated expiration: 2024-05-11
Also published as: JP2005321731A

Abstract

PROBLEM TO BE SOLVED: To provide a speech data summarization system that, when applied to a lecture or the like, can summarize and reproduce a specific portion of the speech, for example a portion in which listeners showed interest, based on notes and the like written by many listeners. SOLUTION: The system comprises a plurality of writing detection terminals 2 that detect writing on a recording medium, and a speech data summarization terminal 4 that receives speech input and summarizes the speech data representing that speech based on the writing detection information detected by the writing detection terminals 2. Each writing detection terminal 2 detects writing and transmits the resulting writing detection information. The speech data summarization terminal 4 stores the speech data in association with the input time at which the speech was input. It receives the writing detection information transmitted by the writing detection terminals 2, stores writing frequency data that associates the received writing detection information with time, acquires the frequency time at which the writing frequency data exceeds a threshold, and summarizes the stored speech data based on the acquired frequency time. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a speech data summarization system, and more particularly to a speech data summarization system that summarizes recorded speech based on writing data obtained with a tablet or the like.

Conventionally, character recognition devices are known that recognize characters using stroke information written with a pen, or that capture image information of characters written on paper with a scanner or camera and recognize the characters from that image information (see, for example, Patent Document 1). Systems that combine such a character recognition device with a recording and playback device for recording and reproducing audio have also been considered. Such a system continuously records audio in a meeting or the like, stores the written data and the speech data in synchronization as notes for preparing minutes, and reproduces the corresponding audio when the written data is designated after the meeting.
Patent Document 1: JP-A-8-161435

However, the conventional technology described above leaves a problem unsolved: when applied to a lecture or the like, it cannot use the notes written by many listeners to summarize and reproduce, for example, the part of the speech in which the listeners showed interest.

The present invention has been made to solve this problem. Its object is to provide a speech data summarization system and a program for it that, when applied to a lecture or the like, can summarize and reproduce a specific part of the speech, for example the part in which listeners showed interest, based on notes and the like written by many listeners.

The speech data summarization system according to claim 1 comprises a plurality of writing detection terminals, each carried by a listener, that detect writing on a recording medium, and a speech data summarization terminal that receives speech input and summarizes the speech data representing the input speech based on the writing detection information detected by the writing detection terminals.
Each writing detection terminal comprises writing detection means for detecting that the listener is writing, and transmission means for transmitting the writing detection information detected by the writing detection means.
The speech data summarization terminal comprises: time acquisition means for measuring time; speech data storage means for storing the speech data representing the input speech in association with the input time, measured by the time acquisition means, at which the speech was input; receiving means for receiving the writing detection information transmitted by the transmission means; writing frequency data storage means for associating the writing detection information received by the receiving means with the time measured by the time acquisition means and storing writing frequency data separately for each listener attribute; frequency time acquisition means for acquiring, using the writing frequency data that is determined by the user requesting the summary from among the per-attribute writing frequency data stored by the writing frequency data storage means, the frequency time at which the writing frequency data exceeds a threshold; and summarizing means for summarizing the speech data stored in the speech data storage means based on the frequency time acquired by the frequency time acquisition means.
As in the speech data summarization system according to claim 2, in the system according to claim 1 the writing frequency data determined by the user requesting the summary may be the writing frequency data that best matches the attributes of that user.

According to the present invention, speech data can be summarized based on writing frequency data derived from the writing detection information transmitted by the plurality of writing detection terminals. When the system is applied to a lecture or the like, for example, the part of the speech in which the listeners showed interest can therefore be summarized and reproduced based on the notes written by many listeners.

In the speech data summarization system according to claim 3, in the system according to claim 1 or 2, the summarizing means determines the unvoiced sections and voiced sections of the speech data accumulated in the speech data storage means, and calculates the playback start time of the speech data based on the voiced sections.

According to the present invention, the unvoiced and voiced sections of the accumulated speech data are determined and the playback start time of the speech data is calculated from the voiced sections, so the summary can be restricted to speech data in voiced sections only.

The speech data summarization system described above can also be realized with a computer and a program. That is, a program that causes a computer to function as the means constituting the speech data summarization system according to any one of claims 1 to 3 can be recorded on a computer-readable recording medium or provided over a network, and the system can then be implemented by the computer. Examples of the recording medium include a flexible disk, MO, ROM, memory card, CD, DVD, and removable disk.

As described above, the speech data summarization system and program of the present invention provide a system that, when applied to a lecture or the like, can summarize and reproduce a specific part of the speech, for example the part in which listeners showed interest, based on notes and the like written by many listeners.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a speech data summarization system according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the functional configuration of the speech data summarization system.

The speech data summarization system 1 shown in FIGS. 1 and 2 includes a plurality of writing detection terminals 2 that detect writing on a recording medium, a microphone device 3 serving as speech input means, a speech data summarization terminal 4 that summarizes the speech data representing the input speech based on the writing detection information detected by the writing detection terminals 2, and a speaker device 5 serving as speech output means that outputs the speech data summarized by the speech data summarization terminal 4.

For example, when the system is applied to a lecture, the lecturer's speech is recorded continuously through the microphone device 3. Each listener carries a writing detection terminal 2, places a recording medium 6 such as paper on the writing detection terminal 2, and takes notes on the recording medium 6 with a pen 7 while listening to the lecture. The writing detection terminal 2 may be a pointing device such as a tablet.

The writing detection terminal 2 shown in FIG. 2 includes writing detection means 20 that detects that writing is taking place, and transmission means 21 that transmits the writing detection information detected by the writing detection means 20.

The writing detection information contains an ID that identifies the writing detection terminal 2 and information indicating that writing has started. The writing detection means 20 detects the start of writing when the tip of the pen 7 touches the recording medium 6 placed on the writing detection terminal 2, and generates writing detection information that includes the information indicating that writing has started.

If the writing detection terminal 2 has a clock, the writing detection means 20 may include time information in the writing detection information. When the writing detection information includes time information, the transmission means 21 may temporarily accumulate the writing detection information and transmit several items of writing detection information together.
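The record layout and the optional batching described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `WritingDetectionInfo`, `BatchingSender`, `batch_size`, and the in-memory `sent` list that stands in for the radio or USB link are all assumptions. Records without a timestamp are sent immediately, on the assumption that the receiving terminal stamps them on arrival.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WritingDetectionInfo:
    """One writing detection record (field names are hypothetical)."""
    terminal_id: str                   # ID identifying the writing detection terminal
    writing_started: bool = True       # information that writing has started
    timestamp: Optional[float] = None  # present only if the terminal has a clock


class BatchingSender:
    """Stand-in for the transmission means; `sent` replaces a real link."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[WritingDetectionInfo] = []
        self.sent: List[List[WritingDetectionInfo]] = []

    def submit(self, info: WritingDetectionInfo) -> None:
        if info.timestamp is None:
            # No terminal-side time: send at once so the receiver can
            # associate the record with its own clock.
            self.sent.append([info])
            return
        # Timestamped records can be accumulated and sent together.
        self._buffer.append(info)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.sent.append(self._buffer)
            self._buffer = []
```

Batching is only safe for timestamped records, because delaying an unstamped record would shift the time the receiver assigns to it.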

The transmission means 21 may include an antenna, a driver, and a receiver and exchange information wirelessly, or it may include an interface such as USB (Universal Serial Bus) or a network interface and exchange information over a wired connection.

The speech data summarization terminal 4, in turn, includes: time acquisition means 41 that measures time; speech data storage means 42 that stores the speech data representing the speech input through the microphone device 3 in association with the input time, obtained by the time acquisition means 41, at which the speech was input; receiving means 43 that receives the writing detection information transmitted by the transmission means 21; writing frequency data storage means 44 that stores writing frequency data associating the writing detection information received by the receiving means 43 with the time obtained by the time acquisition means 41; frequency time acquisition means 45 that acquires the frequency time at which the writing frequency data stored by the writing frequency data storage means 44 exceeds a threshold; summarizing means 46 that summarizes the speech data stored in the speech data storage means 42 based on the frequency time acquired by the frequency time acquisition means 45; and speech output control means 48 that controls output of the speech data summarized by the summarizing means 46 to the speaker device 5.

For example, the speech data storage means 42 continuously accumulates the speech data of a lecture or the like in real time. The speech data may be, for example, uncompressed PCM data or data encoded in conformity with MP3.

The speech data storage means 42 stores the speech data in association with its input time by associating the storage address of the speech data with the input time.

Table 1 shows an example of the data, stored by the speech data storage means 42, that associates the input time with the storage address. The storage time is the time at which the speech data was accumulated at the storage address and is denoted by time Tn (n is a natural number). The storage address is the address of the storage area in which the speech data was stored at the storage time and is denoted by address n (n is a natural number). The data shown in Table 1 may be accumulated by associating storage times and storage addresses at regular time intervals.
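The association between storage time and storage address can be pictured as a sorted lookup table. The sketch below is an assumption-laden illustration: the class `AudioIndex`, its method names, and the use of numeric seconds and byte offsets are hypothetical and are not taken from Table 1.

```python
import bisect
from typing import List


class AudioIndex:
    """Maps storage times Tn to storage addresses (hypothetical layout)."""

    def __init__(self) -> None:
        self._times: List[float] = []    # storage times, appended in increasing order
        self._addresses: List[int] = []  # address where data stored at that time begins

    def record(self, storage_time: float, address: int) -> None:
        # Called at regular intervals while speech is being accumulated.
        self._times.append(storage_time)
        self._addresses.append(address)

    def address_at(self, t: float) -> int:
        """Return the address of the latest entry stored at or before time t."""
        i = bisect.bisect_right(self._times, t) - 1
        if i < 0:
            raise ValueError("time precedes the first stored entry")
        return self._addresses[i]
```

With such a table, a playback start time can be turned into a read position without scanning the audio itself.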

In addition to the speech data described above, the speech data storage means 42 stores analysis data that is computed from the speech data for each analysis time. The speech data summarization terminal 4 is further provided with speech input control means 47 that controls the storing of the analysis data in the speech data storage means 42.

FIG. 5 is a graph of the analysis data obtained for each analysis time.

For example, as shown in FIG. 5, the speech data storage means 42 stores the analysis data obtained for each analysis time, that is, for section A, section B, section C, and so on. Here, the analysis data is the mean of the squared sound pressure values obtained from the speech data within one analysis time.
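As a rough sketch of the analysis data just described, the mean of the squared sample values can be computed per analysis section. The function name `analysis_data` and the choice to express the analysis time as a sample count (`window`) are assumptions made for illustration.

```python
from typing import List, Sequence


def analysis_data(samples: Sequence[float], window: int) -> List[float]:
    """Mean of squared sound pressure values for each analysis section.

    `window` is the number of samples per analysis time (an assumed unit).
    A final, shorter section is averaged over the samples it contains.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        result.append(sum(s * s for s in chunk) / len(chunk))
    return result
```

For instance, `analysis_data([1, -1, 2, -2, 0, 0], 2)` yields one value per two-sample section; a silent section produces a value of zero.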

FIG. 6 is a graph of the writing frequency data obtained for each analysis time.

The writing frequency data stored in the writing frequency data storage means 44 associates the writing detection information received by the receiving means 43 with the time obtained by the time acquisition means 41. As shown in FIG. 6, it is the number of times the receiving means 43 received writing detection information, accumulated for each analysis time.

The frequency time acquisition means 45 acquires the frequency times at which the writing frequency data exceeds the threshold. In the example shown in FIG. 6, the writing frequency data exceeds the threshold in section J and section P, so the frequency time acquisition means 45 acquires time T10 and time T16.
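The counting and thresholding steps can be sketched as below. The function names, the use of seconds for receive times, and the decision to report the index of each section rather than a timestamp are assumptions for illustration; how a section is mapped to a frequency time such as T10 is not specified here.

```python
from collections import Counter
from typing import Iterable, List


def writing_frequency(receive_times: Iterable[float], interval: float,
                      n_sections: int) -> List[int]:
    """Count received writing detection records per analysis section."""
    counts = Counter(int(t // interval) for t in receive_times)
    return [counts.get(i, 0) for i in range(n_sections)]


def sections_over_threshold(frequency: List[int], threshold: int) -> List[int]:
    """Indices of the sections whose writing frequency exceeds the threshold."""
    return [i for i, count in enumerate(frequency) if count > threshold]
```

In terms of FIG. 6, the indices returned by `sections_over_threshold` would play the role of section J and section P.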

The summarizing means 46 has a function for distinguishing unvoiced sections from voiced sections. That is, it determines the unvoiced and voiced sections of the speech data accumulated in the speech data storage means 42 and obtains the playback start time of the speech data from the voiced sections. Methods for distinguishing unvoiced and voiced sections are known; see, for example, "Acoustics and Speech Engineering" (Sadaoki Furui, Kindai Kagaku Sha, 1992). The summarization process based on sound pressure is described below.

The summarizing means 46 selects, from the speech regions in which the analysis data exceeds a threshold, the speech region corresponding to the frequency time, and summarizes the speech data stored in the speech data storage means 42 based on the start time and end time at which the analysis data exceeds the threshold within the selected speech region.

Alternatively, the summarizing means 46 may group the speech regions in which the analysis data exceeds the threshold into speech region groups, select from those groups the speech region group corresponding to the frequency time acquired by the frequency time acquisition means 45, and summarize the speech data stored in the speech data storage means 42 based on the start time and end time at which the analysis data exceeds the threshold within the selected speech region group.

Here, speech region groups are delimited by runs of a predetermined number of consecutive sections whose average data does not exceed the threshold. For example, if two consecutive sections whose average data does not exceed the threshold are taken as a delimiter, then, as shown in FIG. 5, sections C and D and sections L and M act as delimiters, and the speech region groups are divided into sections E through K, and section N and section Q.
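The grouping rule can be sketched as follows, taking a list of per-section flags that say whether the analysis data exceeds the threshold. The function names and the rule that a group "corresponds" to a frequency time when that section index falls between the group's first and last sections are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def group_regions(above: Sequence[bool], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Group above-threshold sections; `min_gap` consecutive below-threshold
    sections close the current group. Returns (first, last) section indices."""
    groups = []
    start: Optional[int] = None  # first section of the open group
    last = 0                     # last above-threshold section seen in it
    gap = 0                      # current run of below-threshold sections
    for i, flag in enumerate(above):
        if flag:
            if start is None:
                start = i
            last = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                groups.append((start, last))
                start = None
    if start is not None:
        groups.append((start, last))
    return groups


def select_group(groups: Sequence[Tuple[int, int]],
                 section: int) -> Optional[Tuple[int, int]]:
    """Pick the group whose span contains the given section index, if any."""
    for first, last in groups:
        if first <= section <= last:
            return (first, last)
    return None
```

A single below-threshold section inside a group, such as a short pause, does not split the group when `min_gap` is 2.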

For example, as shown in FIG. 6, the writing frequency data exceeds the threshold in section J, so the frequency time acquisition means 45 acquires frequency time T10, and the summarizing means 46 selects sections E through K, which correspond to frequency time T10. Likewise, the writing frequency data exceeds the threshold in section P, so the frequency time acquisition means 45 acquires frequency time T16, and the summarizing means 46 selects section N and section Q, which correspond to frequency time T16.

When summarizing the speech data of the selected speech region or speech region group, the summarizing means 46 adjusts the start time and end time described above. For example, the adjustment can be set to the user's preference by operating the operating means 4a provided on the speech data summarization terminal 4.

The summarizing means 46 uses as the threshold the value obtained by taking an arbitrary fraction of the difference between the maximum and minimum values of the analysis data and adding the minimum value to it.

This threshold is only an example. Instead, the summarizing means 46 may obtain the maximum and minimum values of the analysis data at arbitrary time intervals and use as the threshold the value obtained by taking an arbitrary fraction of the difference between the obtained maximum and minimum values and adding the minimum value to it.

The summarizing means 46 may also obtain the maximum and average values of the analysis data at arbitrary time intervals and use as the threshold the value obtained by taking an arbitrary fraction of the difference between the obtained maximum and average values and adding the average value to it.
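The two threshold rules can be written out directly. In this sketch the function names and the parameter name `ratio` (the "arbitrary fraction") are assumptions; passing only the analysis data from one time interval gives the per-interval variants.

```python
from typing import Sequence


def threshold_min_max(values: Sequence[float], ratio: float) -> float:
    """minimum + ratio * (maximum - minimum)"""
    lo, hi = min(values), max(values)
    return lo + ratio * (hi - lo)


def threshold_mean_max(values: Sequence[float], ratio: float) -> float:
    """average + ratio * (maximum - average)"""
    mean = sum(values) / len(values)
    return mean + ratio * (max(values) - mean)
```

With `ratio` between 0 and 1 the first rule places the threshold between the quietest and loudest sections, while the second always places it at or above the average level, which makes it stricter when most sections are quiet.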

The operation of the speech data summarization system 1 configured as described above will now be described with reference to the flowchart shown in FIG. 3.

The speech data summarization control operation in the speech data summarization system 1 starts when accumulation of the writing frequency data in the writing frequency data storage means 44 is complete.

First, the frequency time acquisition means 45 analyzes the writing frequency data stored in the writing frequency data storage means 44 (S1) and acquires the frequency time from the analyzed writing frequency data (S2). The summarizing means 46 then summarizes the speech data based on the acquired frequency time (S3).

Next, the summarization procedure performed by the summarizing means 46 will be described with reference to FIG. 4a. FIG. 4a is a flowchart illustrating a first example of the speech data summarization process performed by the summarizing means 46.

First, the speech regions that exceed the threshold are extracted based on the analysis data computed from the speech data stored in the speech data storage means 42 (S11). Next, a speech region is selected from the extracted speech regions based on the frequency time acquired by the frequency time acquisition means 45 (S12). Next, the start time and end time at which the threshold is exceeded are calculated for the selected speech region (S13). The calculated start time and end time are then adjusted (S14), and the speech data stored in the speech data storage means 42 is summarized based on the adjusted start time and end time (S15).
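Steps S11 through S14 of this first example can be sketched end to end as follows. Several points are assumptions rather than details from the description: the function and parameter names, representing frequency times as section indices, treating a region as "corresponding" when it contains such an index, and modelling the adjustment in S14 as fixed margins `pre` and `post` in seconds. The returned time ranges are what S15 would then cut from the stored speech data.

```python
from typing import List, Sequence, Tuple


def summary_segments(analysis: Sequence[float], threshold: float,
                     frequency_sections: Sequence[int], interval: float,
                     pre: float = 0.0, post: float = 0.0) -> List[Tuple[float, float]]:
    # S11: extract maximal runs of sections whose analysis data exceeds the threshold.
    regions = []
    start = None
    for i, value in enumerate(analysis):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(analysis) - 1))

    # S12: keep the regions that contain a section with high writing frequency.
    selected = [r for r in regions
                if any(r[0] <= k <= r[1] for k in frequency_sections)]

    # S13 and S14: convert to start/end times, then widen by the margins.
    segments = []
    for first, last in selected:
        begin = max(0.0, first * interval - pre)
        end = (last + 1) * interval + post
        segments.append((begin, end))
    return segments
```

Starting playback slightly before the region begins is one plausible use of the adjustment, since listeners often start writing a little after the remark that prompted the note.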

Next, another speech data summarization process performed by the summarizing means 46 will be described.

FIG. 4b is a flowchart illustrating a second example of the speech data summarization process performed by the summarizing means 46.

First, the speech regions that exceed the threshold are extracted based on the analysis data computed from the speech data stored in the speech data storage means 42 (S21). Next, the extracted speech regions are grouped (S22). Next, a speech region group is selected from the grouped speech region groups based on the frequency time acquired by the frequency time acquisition means 45 (S23). Next, the start time and end time at which the threshold is exceeded are calculated for the selected speech region group (S24). The calculated start time and end time are then adjusted (S25), and the speech data stored in the speech data storage means 42 is summarized based on the adjusted start time and end time (S26).

According to the speech data summarization system 1 of this embodiment, speech data can be summarized based on the analysis data and on the writing frequency data derived from the writing detection information transmitted by the plurality of writing detection terminals 2. When the system is applied to a lecture or the like, for example, the part of the speech in which the listeners showed interest can therefore be summarized and reproduced based on the notes written by many listeners.

The writing frequency data storage means 44 may also store writing frequency data separately by listener technical field, sex, and age. When a summary is requested, either the user selects which data to use, or the speech data requesting terminal selects the frequency data that best matches the user's technical field, sex, age, and so on, so that summarized speech suited to the user's preferences can be reproduced.
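One simple way to pick the "best matching" writing frequency data is to count how many attributes a stored listener group shares with the requesting user. The sketch below assumes exactly that scoring rule, along with hypothetical names (`best_matching_group`, the attribute keys, and the group labels); the description does not define how a match is measured.

```python
from typing import Dict


def best_matching_group(user: Dict[str, str],
                        groups: Dict[str, Dict[str, str]]) -> str:
    """Return the label of the listener group sharing the most attributes
    with the user. Ties go to the group listed first."""
    def score(attributes: Dict[str, str]) -> int:
        return sum(1 for key, value in attributes.items() if user.get(key) == value)

    return max(groups, key=lambda label: score(groups[label]))


# Hypothetical attribute groups under which frequency data was stored.
groups = {
    "audio_20s": {"field": "audio", "age": "20s"},
    "network_30s": {"field": "network", "age": "30s"},
}
chosen = best_matching_group({"field": "network", "age": "30s", "sex": "F"}, groups)
```

The chosen label would then be used to look up the corresponding writing frequency data before the frequency times are acquired.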

The speech data summarization system of the present invention can also be realized by supplying a computer with a computer-readable recording medium on which is recorded the program code of software that implements the functions of the apparatus according to the embodiment described above, and having the computer's processing means read and execute the program code stored on the recording medium. In this case, the program code read from the recording medium itself implements the functions of the embodiment described above, and the storage medium storing this program code, for example a flexible disk, CD-ROM, DVD-ROM, CD-R, CD-RW, MO, or HDD, constitutes the present invention.

FIG. 1: Schematic configuration diagram of a speech data summarization system according to an embodiment of the present invention.
FIG. 2: Block diagram showing the functional configuration of the speech data summarization system according to an embodiment of the present invention.
FIG. 3: Flowchart illustrating the operation of the speech data summarization system according to an embodiment of the present invention.
FIG. 4a: Flowchart illustrating the operation of the speech data summarization system according to an embodiment of the present invention.
FIG. 4b: Flowchart illustrating the operation of the speech data summarization system according to an embodiment of the present invention.
FIG. 5: Graph of the analysis data obtained for each analysis time.
FIG. 6: Graph of the writing frequency data obtained for each analysis time.

Explanation of symbols

1: Speech data summarization system
2: Writing detection terminal
3: Microphone device
4: Speech data summarization terminal
5: Speaker device
20: Writing detection means
21: Transmission means
41: Time acquisition means
42: Speech data storage means
43: Receiving means
44: Writing frequency data storage means
45: Frequency time acquisition means
46: Summarizing means
47: Speech input control means
48: Speech output control means

Claims

1. A speech data summarization system comprising:
a plurality of writing detection terminals, each carried by a listener, that detect writing on a recording medium; and
a speech data summarization terminal that receives speech input and summarizes speech data representing the input speech based on writing detection information detected by the writing detection terminals,
wherein each writing detection terminal comprises:
writing detection means for detecting that the listener is writing; and
transmission means for transmitting the writing detection information detected by the writing detection means,
and wherein the speech data summarization terminal comprises:
time acquisition means for measuring time;
speech data storage means for storing the speech data representing the input speech in association with the input time, measured by the time acquisition means, at which the speech was input;
receiving means for receiving the writing detection information transmitted by the transmission means;
writing frequency data storage means for associating the writing detection information received by the receiving means with the time measured by the time acquisition means and storing writing frequency data separately for each listener attribute;
frequency time acquisition means for acquiring, using the writing frequency data that is determined by the user requesting the summary of the speech data from among the per-attribute writing frequency data stored by the writing frequency data storage means, the frequency time at which the writing frequency data exceeds a threshold; and
summarizing means for summarizing the speech data stored in the speech data storage means based on the frequency time acquired by the frequency time acquisition means.

2. The speech data summarization system according to claim 1, wherein the writing frequency data determined by the user requesting the summary of the speech data is the writing frequency data that best matches the attributes of that user.

3. The speech data summarization system according to claim 1 or 2, wherein the summarizing means determines the unvoiced sections and voiced sections of the speech data accumulated in the speech data storage means and calculates the playback start time of the speech data based on the voiced sections.

4. A speech data summarization program for causing a computer to function as the means constituting the speech data summarization system according to any one of claims 1 to 3.