
JP3364569B2 - Voice information processing device

Info

Publication number: JP3364569B2
Application number: JP28889496A
Authority: JP
Inventors: 力三好
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1996-10-31
Filing date: 1996-10-31
Publication date: 2003-01-08
Anticipated expiration: 2016-10-31
Also published as: JPH10133679A

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention: The present invention relates to a voice information processing apparatus that is applied to interface devices such as answering machines, voice mail systems, and portable information terminals, and that performs processing such as displaying and editing voice information.

[0002]

Description of the Related Art: Various sound editors that edit sound data by transforming its waveform are currently in practical use. Multimedia editors are also in practical use in which a button-like mark is placed in a document and the associated sound data is played back when that mark is selected.

[0003] For example, Japanese Unexamined Patent Application Publication No. H1-172900 discloses an audio data processing device that displays recorded sound data with its intensity on the vertical axis and time on the horizontal axis, and allows operations such as cut, copy, and paste.

[0004] On the other hand, devices such as portable terminals have been put into practical use that rely on pen input, ultra-compact keyboards, and similar data input methods, sacrificing input efficiency in favor of portability. Voice input/output is the input/output method most familiar to users, and it has the advantage that the required equipment can be made very compact without sacrificing input efficiency, which makes it well suited to portable devices of this kind.

[0005] However, because voice information processing such as displaying the content of voice data, editing it, and reusing messages is difficult, voice input/output is at present not fully exploited in portable terminals. The same cause has also hindered the spread of voice mail.

[0006] Answering machines suffer from the same difficulty in voice information processing: none has yet been realized with functions for correcting and editing recorded input, and many callers avoid leaving a message and hang up as soon as they realize they have reached an answering machine. For example, if a caller makes an input error, repeats something, or utters meaningless fillers such as "ah" or "um", that input cannot be corrected or deleted, so many callers, fearing such mistakes, give up on leaving a message at all. Adding a voice information processing function to such an answering machine would allow input errors, repetitions, and meaningless expressions to be corrected or deleted, and is therefore expected to reduce the number of callers who avoid leaving messages.

[0007] However, it is difficult to apply the technique disclosed in Japanese Unexamined Patent Application Publication No. H1-172900 to portable terminals, voice mail, and answering machines. One could therefore consider adopting the technique disclosed in Japanese Examined Patent Publication No. S63-30645. That publication discloses an information processing system that uses indicator marks to display voice data mixed with text, and allows the voice data to be freely moved, deleted, inserted, played back, and so on, just like text data. In other words, the system disclosed in Publication No. S63-30645 makes it possible to edit and reuse voice messages by displaying and editing character strings and voice together.

[0008]

Problem to Be Solved by the Invention: However, because the system disclosed in Publication No. S63-30645 handles voice in large blocks called indicator marks, editing the voice data is very likely to create breaks in the middle of words. This makes it difficult to treat the voice as meaningful units during editing, and the quality of the edited voice therefore becomes very low. Furthermore, although data storage is suppressed while there is no voice input, intermittently input voice data is treated as one continuous input unless the user manually marks the breaks, so the boundaries in the speech cannot be seen and voice editing becomes very difficult.

[0009] The present invention was made to solve the above problems. Its object is to provide a voice information processing apparatus that greatly improves the display of voice information, which has been a weakness of conventional approaches, and thereby makes voice information easy to edit and reuse.

[0010]

Means for Solving the Problem: To solve the above problems, the present invention provides a voice information processing apparatus comprising a data analysis unit that analyzes voice information, and a symbol display control unit that performs control to display the voice information analyzed by the data analysis unit as symbols whose size corresponds to its time length. The symbol display control unit controls symbol display using a time length of voice information between 0.1 seconds and 0.3 seconds as one unit, and displays, for approximately 0.05 seconds of voice information, a symbol whose height-to-width ratio is 4:1.

[0011]

[0012]

[0013]

[0014] The basic processing of the present invention configured as above is as follows. The data analysis unit divides the voice information to be displayed into segments of a specific time length and passes them to the symbol display control unit. The symbol display control unit selects the symbol corresponding to each piece of data and, in order to display the symbols two-dimensionally within the display area, calculates the display position of the next symbol from the current display state (if anything has already been displayed) and passes it to the display unit. At this time, the symbol display control unit controls symbol display so that a time length of 0.1 to 0.3 seconds of voice information forms one unit. The display unit displays symbols based on the voice information at the designated positions.
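The division step can be sketched as below. This is a hedged illustration rather than the patent's implementation: it assumes the voice information is a flat sequence of PCM samples with a known sample rate, and `split_into_segments` is a hypothetical name. The default segment length of 0.05 seconds is borrowed from the per-strip duration in [0010]; the paragraph above only says "a specific time length".

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        segment_s: float = 0.05) -> List[Sequence[float]]:
    """Divide samples into consecutive segments of segment_s seconds.

    The last segment is shorter when the input length is not an exact
    multiple of the segment length.
    """
    seg_len = max(1, round(sample_rate * segment_s))  # samples per segment
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


# One second of silence at 8 kHz yields twenty 0.05-s segments of 400 samples.
segments = split_into_segments([0.0] * 8000, 8000)
```

Each segment would then be handed to the symbol display control unit as the data for one strip symbol.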

[0015] The reason the present invention uses a time length of 0.1 to 0.3 seconds of voice information as one unit of symbol display is as follows. The speaking rate of Japanese is said to be 8 to 12 morae per second for announcers and 5 to 8 morae per second for ordinary speakers. Expressed in hiragana, this means an announcer utters the equivalent of 8 to 12 characters per second and an ordinary speaker 5 to 8 characters per second. Turning to written text, the ratio of display lengths between a mixed kanji-kana rendering and an all-hiragana rendering of the same content is said to be roughly 2:3 to 4:5 (kanji-kana text : hiragana text). From these conditions, the relationship between the time length of speech and the display length of symbols can be calculated. From the speaking rates, one second of speech corresponds to 5 to 12 hiragana characters.

[0016] In mixed kanji-kana text, on the other hand, this corresponds to 3.3 to 8 characters at a ratio of 2:3 and 4 to 9.6 characters at a ratio of 4:5, that is, 3.3 to 9.6 characters overall. Since the time length per character is the reciprocal of the number of characters per second, each character corresponds to 1/9.6 to 1/3.3 seconds, that is, about 0.1 to 0.3 seconds. Therefore, by making one unit of symbol display length correspond to 0.1 to 0.3 seconds of voice information, the content of an utterance can be expressed with a display length similar to that of mixed kanji-kana text, giving a visually very natural display. Moreover, since Japanese display characters are generally square, with an aspect ratio of 1:1, it is preferable that one unit of symbol display also have an aspect ratio of 1:1. Representing one unit by several smaller parts further allows natural editing operations.
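The arithmetic of [0015] and [0016] can be checked with a short sketch. It uses exact fractions so that the intermediate value 10/3 (written as 3.3 above) is not rounded; the constant and function names are illustrative, not from the source.

```python
from fractions import Fraction

# Speaking rates from [0015], in hiragana characters (about one per mora) per
# second: 5 at the slow end (ordinary speakers), 12 at the fast end (announcers).
HIRAGANA_PER_S = (Fraction(5), Fraction(12))
# Display-length ratios, kanji-kana text : hiragana text, from [0015].
RATIOS = (Fraction(2, 3), Fraction(4, 5))


def seconds_per_char_range():
    """Return (shortest, longest) seconds of speech per kanji-kana character."""
    rates = [h * r for h in HIRAGANA_PER_S for r in RATIOS]  # chars per second
    fastest, slowest = max(rates), min(rates)                # 48/5 and 10/3
    return 1 / fastest, 1 / slowest                          # 5/48 s and 3/10 s


shortest, longest = seconds_per_char_range()
```

The result, 5/48 s (about 0.104 s) to 3/10 s, is the 0.1 to 0.3 second range used as one display unit.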

[0017] Further, according to the present invention, the above voice information processing apparatus is provided with a text input unit into which a character string is entered, and the symbol display control unit controls symbol display so that the character string entered in the text input unit is multiplexed (superimposed) onto the symbols based on the voice information.

[0018] The basic processing of the present invention configured as above is as follows. The data analysis unit divides the voice information to be displayed into segments of a specific time length and passes them to the symbol display control unit. The symbol display control unit selects the symbol corresponding to each piece of data and, in order to display the symbols two-dimensionally within the display area, calculates the display position of the next symbol from the current display state (if anything has already been displayed) and passes it to the display unit. The display unit displays symbols based on the voice information at the designated positions. Meanwhile, when a character string related to the voice information is entered from the text input unit and that character string is detected, the target voice information and the related character string are passed to the symbol display control unit. The symbol display control unit selects the symbols corresponding to the target voice information, calculates the display position of the next symbol from the previous display state, and passes that position to the display unit together with the related character string for the same position. The display unit then displays the symbols and the character string at the designated position, multiplexing (superimposing) them.

[0019] Further, according to the present invention, in the above voice information processing apparatus, the symbol display control unit controls the display length of the symbols so that it corresponds to the display length of the character string.

[0020] According to the present invention, in cases where fitting the character string to the display length of the symbols would shrink the characters and make them hard to read, the display length of the symbols is instead controlled to match the display length of the character string, so the characters remain easy to read.

[0021] Further, according to the present invention, in the above voice information processing apparatus, the symbol display control unit determines display attributes of the symbols according to attributes of the voice information and controls the display of the symbols accordingly.

[0022] According to the present invention, the symbol display control unit determines display attributes of the symbols, such as line width, display size, display spacing, display color, underline, slant, and shading, according to attributes of the voice information, such as volume, frequency, and pitch (periodicity), and controls symbol display accordingly. This produces a display in which the content of the voice information can easily be recognized visually.

[0023] Further, according to the present invention, in the above voice information processing apparatus, the symbol display control unit determines a plurality of display attributes of the symbols according to a plurality of attributes of the voice information and controls the display of the symbols accordingly.

[0024] According to the present invention, since the symbol display control unit determines a plurality of display attributes of the symbols according to a plurality of attributes of the voice information and controls symbol display accordingly, several aspects of the content of the voice information can be recognized visually at a glance.

[0025]

Embodiments of the Invention: Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of a first embodiment of the voice information processing apparatus of the present invention. This apparatus assumes an interface similar to that of a text editor.

[0026] In FIG. 1, reference numeral 101 denotes a voice input unit for inputting voice information; 102 a data analysis unit that analyzes the voice information input from the voice input unit 101; 103 a symbol display control unit that controls the display of symbols representing the voice data analyzed by the data analysis unit 102; 104 a silent portion display control unit that controls the display representing silent portions detected by the data analysis unit 102; and 105 a display unit that displays the symbols representing the voice data under the control of the symbol display control unit 103. The voice input unit 101 may be an external memory holding previously recorded voice data, a microphone, or the like. However, when a device without a memory function, such as a microphone, is used as the voice input unit 101, a memory for storing the voice data is required in order to edit or reuse the voice data later.

[0027] As shown in FIG. 1, voice data input from the voice input unit 101 is analyzed by the data analysis unit 102. Of the analysis results, information on the time length of the voice data is passed to the symbol display control unit 103. The symbol display control unit 103 uses, as one unit of displayed symbol, a strip whose width is 1/4 of its height, and controls the display so that one strip is shown for approximately 0.05 seconds of voice information. A square (1:1) block of four strips therefore corresponds to about 0.2 seconds of speech, so the display approximates the speaking rate and character count obtained when a Japanese sentence written in mixed kanji and kana is read aloud. An example of this display is shown in FIG. 2, which shows the symbols corresponding to the text 「テキストから音声合成する」 ("synthesize speech from text"), spoken as "tekisuto kara onsei gōsei suru", aligned with the text and the voice. It indicates that uttering this phrase takes about 2.8 seconds.
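The strip arithmetic can be sketched as follows, assuming durations are given in seconds; the constants restate [0027] and the function names are hypothetical.

```python
STRIP_S = 0.05         # seconds of voice per strip (width:height = 1:4)
STRIPS_PER_SQUARE = 4  # four quarter-width strips make one 1:1 square unit


def strip_count(duration_s: float) -> int:
    """Number of strip symbols for an utterance of duration_s seconds."""
    # round() absorbs floating-point error (2.8 / 0.05 is slightly below 56).
    return round(duration_s / STRIP_S)


def square_units(duration_s: float) -> float:
    """Display length in square, character-sized units."""
    return strip_count(duration_s) / STRIPS_PER_SQUARE


# The FIG. 2 utterance lasts about 2.8 s.
n_strips = strip_count(2.8)
n_units = square_units(2.8)
n_chars = len("テキストから音声合成する")
```

The 2.8-second utterance thus maps to 56 strips, or 14 character-sized units, close to the 12 characters of the written sentence.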

[0028] In this embodiment, to give an appearance similar to a text editor, display positions are assigned linearly from the upper left toward the right; when the right end of the display area is reached, display continues from the left end of the next line.
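A minimal sketch of this left-to-right, wrap-at-the-right-edge placement, assuming each symbol has a known display width (in any common unit, such as character widths) and that the display area has a fixed line width; `layout_positions` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def layout_positions(widths: Sequence[float],
                     line_width: float) -> List[Tuple[float, int]]:
    """Assign (x, row) positions like a text editor: fill each row left to
    right and wrap to the left end of the next row at the right edge."""
    positions: List[Tuple[float, int]] = []
    x, row = 0.0, 0
    for w in widths:
        # Wrap if this symbol would overrun the row (never wrap an empty row).
        if x > 0 and x + w > line_width:
            x, row = 0.0, row + 1
        positions.append((x, row))
        x += w
    return positions
```

Each new position depends only on the current (x, row) state, which mirrors the description of computing the next symbol's position from the existing display state.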

[0029] When the voice data contains silent portions, punctuation marks, spaces, line breaks, and the like are displayed as symbols representing them. That is, when the data analysis unit 102 detects a silent interval, it passes the time length of that interval to the silent portion display control unit 104, which selects a symbol representing the silence according to its duration (a comma, a period, a line break, and so on) and passes it to the symbol display control unit 103. Blank silence symbols are used in units of 0.005 seconds, up to a maximum of three; a comma (、) corresponds to 0.2 seconds of silence and a period (。) to 0.4 seconds. If the silence lasts 0.4 seconds or more, a line break is inserted after the period, followed by further line breaks whose number corresponds to the length of the silent interval in units of 0.8 seconds. The result is a display such as that shown in FIGS. 3 and 4: FIG. 3 shows a two-dimensional display using strip-shaped symbols, and FIG. 4 shows part of those symbols in correspondence with the voice data.

[0030] As further display control of silent portions, a setting may also be made such that, once two or more line breaks have occurred in succession, any subsequent silence is ignored.
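A hedged sketch of the silence-to-symbol rules in [0029] and [0030]. Two points are interpretations, not statements from the source: the extra line breaks are counted as floor(duration / 0.8) after the first one, and the optional rule in [0030] is modelled as a `max_newlines` cap (2 matches "two or more line breaks"). The 0.005-second blank unit is kept exactly as written. `silence_symbols` is a hypothetical name.

```python
import math
from typing import Optional


def silence_symbols(duration_s: float,
                    space_unit_s: float = 0.005,  # value as written in [0029]
                    max_spaces: int = 3,
                    comma_s: float = 0.2,
                    period_s: float = 0.4,
                    newline_unit_s: float = 0.8,
                    max_newlines: Optional[int] = None) -> str:
    """Map one silent interval (in seconds) to display symbols."""
    if duration_s >= period_s:
        # Period, one line break after it, then one more per newline unit.
        newlines = 1 + math.floor(duration_s / newline_unit_s)
        if max_newlines is not None:
            # Optional rule from [0030]: ignore silence beyond the cap.
            newlines = min(newlines, max_newlines)
        return "。" + "\n" * newlines
    if duration_s >= comma_s:
        return "、"
    # Short pauses: one blank per space unit, at most max_spaces blanks.
    return " " * min(max_spaces, math.floor(duration_s / space_unit_s))
```

For example, a 1.7-second pause becomes a period followed by three line breaks, or by two when `max_newlines=2`.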

[0031] Next, as a second embodiment, a voice information processing apparatus is described that, in addition to the features of the first embodiment, allows the user to display a character string multiplexed onto the strip-shaped symbols displayed from the input voice data.

[0032] As shown in FIG. 5, the voice information processing apparatus of the second embodiment adds to the configuration of the first embodiment a text input unit 107 for entering text such as character strings, a processing instruction unit 108 through which the user gives processing instructions, a data processing unit 106 that processes the text data entered from the text input unit 107 according to instructions from the processing instruction unit 108, and a voice output unit 109 that outputs voice data as sound.

[0033] As described in the first embodiment, the strip-shaped symbols can be determined by analyzing the voice data; to multiplex a character string onto the symbols, however, the character string must be held in association with the voice data. This association is described first, with reference to FIG. 6, which shows a display example according to the second embodiment. Suppose that symbols S1001 and S1002 are displayed on the display unit 105 based on voice data V10001 and V10002 input to the voice input unit 101 (FIG. 6(a)).

[0034] Next, the user infers which part is the desired voice data and selects it via the processing instruction unit 108. The voice data for that part is then played back by the voice output unit 109, from the data analysis unit 102 through the data processing unit 106, so the user can confirm whether it is the intended part. If there are unwanted portions before or after it, or the selected region is too short, the user cancels the selection with the processing instruction unit 108 and selects again. Once the selection is correct, that is, when symbol S1002 has been selected with the processing instruction unit 108, the selected range is reported to the data processing unit 106.

[0035] When text T1002 is then entered in the text input unit 107 while symbol S1002 is selected with the processing instruction unit 108, the data processing unit 106 associates symbol S1002 with text T1002. The data analysis unit 102 detects the associated text T1002 and passes it to the symbol display control unit 103. The display unit 105 then adjusts the display length of the symbols to equal the display length of the character string of the text data, and displays the symbols multiplexed with the character string as shown in FIG. 6(b). In FIG. 6(b), text T1002 is, as an example, 「テキストを重ねる」 ("overlay text").
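The association and length fitting described above might be modelled as below. `TextAnnotation` and `fitted_strip_width` are hypothetical names; the 40-strip (2-second) extent of the selection is an invented example value, since the source does not give the duration of S1002, and display widths are measured in character widths.

```python
from dataclasses import dataclass


@dataclass
class TextAnnotation:
    """Text entered for a selected run of voice symbols (hypothetical record)."""
    first_strip: int  # index of the first 0.05-s strip in the selection
    n_strips: int     # number of strips selected, e.g. the strips of S1002
    text: str         # character string from the text input unit


def fitted_strip_width(ann: TextAnnotation, char_width: float = 1.0) -> float:
    """Width each strip is drawn at so the strip run spans exactly the text.

    By default a strip is char_width / 4 wide; here the run is stretched or
    shrunk to len(text) * char_width so the characters stay full size.
    """
    text_width = len(ann.text) * char_width
    return text_width / ann.n_strips


# A hypothetical 2-second selection (40 strips) with an 8-character string.
ann = TextAnnotation(first_strip=0, n_strips=40, text="テキストを重ねる")
width = fitted_strip_width(ann)  # 8 / 40 = 0.2 of a character width
```

Under the alternative in [0036], the text would instead be scaled so that its characters span the strips' default width of n_strips / 4 character widths.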

[0036] In the second embodiment, the display length of the symbols is adjusted to equal the display length of the character string of the text data before the two are multiplexed; this prevents the characters from becoming small and hard to read. Where this problem does not arise, the display length of the character string may instead be adjusted to equal the display length of the symbol string derived from the voice data, and the symbols multiplexed with the character string.

[0037] Next, the overall symbol display in the second embodiment is described with reference to FIG. 7, which shows an example of two-dimensional display using strip-shaped symbols. In this embodiment, voice data is represented by 16 kinds of symbols whose display colors are achromatic shades from white to black, obtained by dividing lightness (density) into 16 equal steps. To determine the display color of a symbol, the average volume of the voice data corresponding to that symbol is calculated and compared with a predefined table in which larger values map to darker shades, and the symbol of the corresponding display color is used. An example of such a display is shown in FIG. 7.
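The volume-to-gray mapping might look like this. The source says only that a predefined table maps larger average volumes to darker shades; the equally spaced thresholds, the assumption that samples are floats normalized to -1..1 (so the average absolute volume falls in 0..1), and the function names are all illustrative assumptions.

```python
import bisect
from typing import List, Sequence

LEVELS = 16
# Hypothetical threshold table: 15 ascending boundaries split 0..1 into
# 16 equal bins.
THRESHOLDS: List[float] = [i / LEVELS for i in range(1, LEVELS)]


def mean_volume(segment: Sequence[float]) -> float:
    """Average absolute amplitude of one symbol's samples (0 if empty)."""
    return sum(abs(s) for s in segment) / len(segment) if segment else 0.0


def gray_level(volume: float) -> int:
    """Table lookup: 0 = white (quietest) ... 15 = black (loudest)."""
    return bisect.bisect_right(THRESHOLDS, volume)


def gray_value(level: int) -> int:
    """8-bit gray value for a level: 255 (white) down to 0 (black)."""
    return 255 - level * (255 // (LEVELS - 1))  # 17 per step
```

Using `bisect` keeps the lookup a true table comparison, so a non-uniform table (for example, one tuned to perceived loudness) could be dropped in without changing the code.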

[0038] In the second embodiment above, the display color (display density) of the symbols is determined by the volume of the voice information (voice data); alternatively, as shown in FIG. 8, different kinds of shading (hatching) may be used as the display attribute of the symbols. Other display attributes such as line width, display size, display spacing, underline, and slant may also be used. Further, although the second embodiment uses volume as the voice attribute for symbol display, other voice attributes such as frequency components or pitch may be used instead.

[0039] The voice output unit 109 can be used as needed to play back (output) voice data.

[0040] As a third embodiment, an apparatus is described that displays symbols with a plurality of display attributes corresponding to a plurality of voice attributes. The processing in this embodiment is almost the same as in the second embodiment, except that symbol display is controlled through a plurality of symbol attributes, such as display color, line width, display size, display spacing, underline, and slant, based on a plurality of attributes of the voice information (voice data), such as volume, frequency, and pitch.

[0041] FIG. 9 shows an example of symbol display in the third embodiment. In FIG. 9, the display size varies with volume and the shading varies with pitch. More specifically, the region of voice data V701 has a high volume and a coarse pitch (periodicity), so the corresponding symbol S701 has a large display size and light shading. The region of voice data V702 has a high volume and a fine pitch, so the corresponding symbol S702 has a large display size and dark shading. The region of voice data V703 has a low volume and a coarse pitch, so the corresponding symbol S703 has a small display size and light shading. Finally, the region of voice data V704 has a low volume and a fine pitch, so the corresponding symbol S704 has a small display size and dark shading.
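The two-attribute mapping of FIG. 9 reduces to a small lookup. This sketch assumes volume and a "pitch fineness" score have already been computed and normalized to 0..1 (how periodicity is measured is left open in the source), and uses hypothetical binary thresholds; a real display could instead scale size and shading continuously.

```python
from typing import NamedTuple


class SymbolStyle(NamedTuple):
    size: str   # "large" or "small", driven by volume
    shade: str  # "light" or "dark", driven by pitch (periodicity)


def style_for(volume: float, pitch_fineness: float,
              volume_threshold: float = 0.5,
              pitch_threshold: float = 0.5) -> SymbolStyle:
    """Map two voice attributes to two display attributes (cf. FIG. 9).

    Higher pitch_fineness means a finer pitch; thresholds are hypothetical.
    """
    size = "large" if volume >= volume_threshold else "small"
    shade = "dark" if pitch_fineness >= pitch_threshold else "light"
    return SymbolStyle(size, shade)
```

The four combinations reproduce the pattern of S701 to S704: loud and coarse gives large and light, loud and fine gives large and dark, and so on.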

[0042]

Effects of the Invention: As described above, the present invention makes it possible to display voice information so that its content can easily be recognized visually.

[0043] Therefore, adopting the present invention makes voice data easy to edit and search, and also makes it easy to reuse voice data, for example by editing part of it.

[0044] Further, according to the present invention, silent portions are automatically displayed as silence symbols, punctuation marks, and line breaks, which produces a display from which the content of the speech can be inferred more easily and thus makes voice data easier to reuse.

[0045] Further, according to the present invention, combining voice symbols with text allows the content of voice data to be displayed as character information, improving the reusability of the voice data.

[0046] Furthermore, the present invention is well suited to small systems that emphasize portability, such as portable terminals. When applied to such systems, it allows data to be input and output without sacrificing input efficiency.

[0047] In addition, if the present invention is combined with a pen interface, a touch panel, or the like, an interface can be built that is intuitive and easy for beginners to use.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a voice information processing device according to a first embodiment of the present invention.

FIG. 2 is a diagram showing the voice and the corresponding display symbols when the text of the Japanese kanji-kana mixed sentence 「テキストから音声合成する」 ("synthesize speech from text") is read aloud.

FIG. 3 is a diagram showing an example of two-dimensional display using strip-shaped symbols.

FIG. 4 is a diagram showing part of the symbol display of FIG. 3 in correspondence with the audio data.

FIG. 5 is a block diagram showing the schematic configuration of a voice information processing device according to a second embodiment of the present invention.

FIG. 6 is a diagram illustrating the process of displaying symbols based on text multiplexed onto symbols based on voice data, with the text and the symbols shown in correspondence with one another.

FIG. 7 is a diagram showing an example of two-dimensional display using symbols that differ in display color (display density).

FIG. 8 is a diagram showing an example of two-dimensional display using symbols that differ in shading.

FIG. 9 shows an example of a display in the voice information processing device according to the third embodiment of the present invention, in which a plurality of display attributes of a symbol are associated with a plurality of attributes of the voice information. It shows the audio data and the symbols when the display size corresponds to the volume of the voice information and the shading corresponds to its pitch (periodicity).

[Explanation of Reference Numerals]

101 Voice input unit
102 Data analysis unit
103 Symbol display control unit
104 Silent-portion display control unit
105 Display unit
106 Data processing unit
107 Text input unit
108 Processing instruction unit
109 Voice output unit

Claims

(57) [Claims]

1. A voice information processing device comprising: a data analysis unit that analyzes voice information; and a symbol display control unit that displays the time length of the voice information analyzed by the data analysis unit as a symbol of corresponding size, wherein the symbol display control unit performs symbol display control in units of a time length of 0.1 to 0.3 seconds of voice information, and displays, for approximately 0.05 seconds of voice information, a symbol whose height-to-width ratio is 4:1.

2. The voice information processing device according to claim 1, further comprising a text input unit for inputting a character string, wherein the symbol display control unit performs control to display the character string input through the text input unit multiplexed onto the symbols.

3. The voice information processing device according to claim 2, wherein the symbol display control unit controls the display length of the symbols in accordance with the display length of the character string.

4. The voice information processing device according to any one of claims 1 to 3, wherein the symbol display control unit determines display attributes of the symbols according to attributes of the voice information and controls the symbol display accordingly.

5. The voice information processing device according to claim 4, wherein the symbol display control unit determines a plurality of display attributes of the symbols according to a plurality of attributes of the voice information and controls the symbol display accordingly.
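Claims 1 and 3 specify only proportions. Symbols are laid out in display units of 0.1 to 0.3 s, approximately 0.05 s of voice corresponds to a symbol with a 4:1 height-to-width ratio, and, under claim 3, the symbol display length follows the display length of the multiplexed character string. The Python sketch below shows one reading of these constraints. It assumes that symbol width grows linearly with duration and that "matching" the text means rescaling the widths proportionally to the width of fixed-width text. `layout_symbols`, `fit_to_text`, the 200 ms default unit, the 40 px height, and the 16 px character width are all hypothetical.

```python
from dataclasses import dataclass

# Claim 1: about 0.05 s of voice is drawn with height:width = 4:1.
REFERENCE_DURATION_S = 0.05
REFERENCE_ASPECT = 4.0  # height / width at the reference duration


@dataclass
class Symbol:
    start_ms: int
    duration_ms: int
    width_px: float
    height_px: float


def symbol_width(duration_s: float, height_px: float) -> float:
    """Assumed linear scaling: 0.05 s yields width = height / 4."""
    return (duration_s / REFERENCE_DURATION_S) * (height_px / REFERENCE_ASPECT)


def layout_symbols(total_ms: int, unit_ms: int = 200,
                   height_px: float = 40.0) -> list[Symbol]:
    """Cut total_ms of voice into display units of unit_ms (claim 1: 100-300 ms)."""
    if not 100 <= unit_ms <= 300:
        raise ValueError("display unit must be between 0.1 s and 0.3 s")
    symbols = []
    for start in range(0, total_ms, unit_ms):
        dur = min(unit_ms, total_ms - start)  # the last unit may be shorter
        symbols.append(Symbol(start, dur, symbol_width(dur / 1000, height_px), height_px))
    return symbols


def fit_to_text(symbols: list[Symbol], text: str,
                char_width_px: float = 16.0) -> list[float]:
    """One reading of claim 3: rescale widths so the symbols span the text."""
    text_width = len(text) * char_width_px  # assumes fixed-width characters
    total = sum(s.width_px for s in symbols)
    if total <= 0:
        raise ValueError("symbols must have positive total width")
    return [s.width_px * text_width / total for s in symbols]
```

Under this linear scaling, a 0.2 s unit drawn 40 px tall is square. It spans four 0.05 s reference widths of 10 px each.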