JP7624571B2

JP7624571B2 - Voice recognition system, voice recognition device, voice recognition method, and program used for facility inspection, etc.

Info

Publication number: JP7624571B2
Application number: JP2020076808A
Authority: JP
Inventors: 健太郎山本; 潤一千嶋; 佑記片瀬; 和輝西山; 勝彦須賀; 恵里宮田; 浩一郎武田
Original assignee: Takasago Thermal Engineering Co Ltd
Current assignee: Takasago Thermal Engineering Co Ltd
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2025-01-31
Anticipated expiration: 2040-04-23
Also published as: JP2021173841A; JP2025028327A

Description

The present invention relates to a voice recognition system, a voice recognition device, a voice recognition method, and a program used for equipment inspection and the like.

Technologies that assist inspectors when they inspect equipment and the like are known.

For example, one known technology uses a tablet terminal or handheld terminal to scan barcodes attached to equipment and manage the equipment through them. Such barcodes let the status of the equipment be determined quickly. Because the inspection data is digitized, aggregated, and managed, this technology also prevents human errors such as forgetting an inspection or missing an inspection item, and supports reliable inspections (for example, Non-Patent Document 1).

Non-Patent Document 1: "Facility Patrol Inspection System" (設備巡回点検システム), [online], [retrieved March 11, 2020], Internet <URL: http://www.tm-es.co.jp/service-product/services/products/mimawari-kun-mit.html>

With conventional technology, the user may be unable to input inspection results by voice until the device has finished outputting all of the audio that explains the inspection. Until that audio ends, the user cannot speak, voice recognition processing cannot start, and no recognition result can be output. This can lengthen the wait before voice input becomes possible.

The present invention was made in view of the above problem and aims to shorten the time required for voice input.

The voice recognition system and related subject matter according to each embodiment of the present invention include the following configurations.

A voice recognition system (for example, voice recognition system 1) includes:
a voice input means (for example, step S03) for inputting a first voice (for example, first voice SD1);
a voice recognition means (for example, step S05) for performing voice recognition based on the first voice;
a registration means (for example, step S01) for registering a second voice (for example, second voice SD2);
an output means (for example, step S02) for outputting the second voice; and
a limiting means (for example, step S04) for limiting output of the second voice when the first voice is input while the second voice is being output.

This configuration shortens the time required for voice input.

The voice recognition system is preferably used for equipment inspection (see, for example, FIG. 6).

This configuration allows inspections to be carried out efficiently.

The voice recognition system preferably further includes:
an input data generation means for generating, based on the voice recognition result of the first voice, first input data indicating the content of the first voice;
a storage means for storing second input data used to check the first input data; and
a determination means for comparing the first input data with the second input data to determine whether an abnormality exists.

In the voice recognition system, it is preferable that:
the second input data is data input before the first input data (for example, previous result V21); and
the determination means determines that an abnormality exists when the first input data differs from the second input data, or when the first input data is outside an allowable range (for example, allowable range V22) relative to the second input data.

In the voice recognition system, it is also preferable that:
the second input data is a normal value or a value indicating a normal range (for example, normal value V23); and
the determination means determines that an abnormality exists when the first input data differs from the second input data, or when the first input data is outside the normal range.
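Here is a minimal sketch of the determination against a normal value or normal range. `NormalSpec` is a hypothetical stand-in for normal value V23 that holds either an exact expected answer or optional lower and upper bounds. Treating a non-numeric answer to a numeric item as abnormal is an assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

Reading = Union[float, str]


@dataclass
class NormalSpec:
    """Hypothetical stand-in for normal value V23."""
    expected: Optional[Reading] = None  # exact normal value, e.g. "OK"
    low: Optional[float] = None         # lower bound of the normal range
    high: Optional[float] = None        # upper bound of the normal range


def is_abnormal_vs_normal(reading: Reading, spec: NormalSpec) -> bool:
    """True if the reading differs from the normal value or leaves the normal range."""
    if spec.low is not None or spec.high is not None:
        if not isinstance(reading, (int, float)):
            return True  # assumption: a non-numeric answer to a ranged item is abnormal
        if spec.low is not None and reading < spec.low:
            return True
        if spec.high is not None and reading > spec.high:
            return True
        return False
    return reading != spec.expected
```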

Thus, when the first input data differs from previous result V21, that is, the most recent inspection result, the equipment is determined to be abnormal. This check lets voice recognition system 1 notify the user of the equipment in which the abnormality has occurred.

Allowable range V22 may also be used to give previous result V21 a margin. When a numerical value is inspected in particular, small fluctuations in the value are often not abnormal. Setting such an allowable range therefore lets abnormalities be determined accurately.
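The comparison with previous result V21 can be sketched as follows. The sketch assumes that allowable range V22 is a symmetric margin around the previous numeric reading; the source does not specify the shape of the range. Non-numeric results, such as judgment marks, are simply compared for equality. The function name is hypothetical.

```python
from typing import Union

Reading = Union[float, str]


def is_abnormal_vs_previous(current: Reading, previous: Reading, margin: float = 0.0) -> bool:
    """Flag a reading that differs from previous result V21.

    For numbers, `margin` plays the role of allowable range V22 (assumed
    symmetric); margin=0.0 means any difference is abnormal. Other readings,
    such as judgment marks, must match exactly.
    """
    both_numeric = isinstance(current, (int, float)) and isinstance(previous, (int, float))
    if both_numeric:
        return abs(current - previous) > margin
    return current != previous
```

For example, `is_abnormal_vs_previous(12.3, 12.0, margin=0.5)` returns `False`. With the default margin of zero, the same readings are flagged.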

The voice recognition system is also preferably a voice recognition system that includes a first information processing device and a second information processing device, in which:
the first information processing device performs voice recognition using a first dictionary (for example, first dictionary D211);
the second information processing device includes the voice recognition means, which recognizes the first voice using a second dictionary (for example, second dictionary D212) (see, for example, FIG. 17); and
the second dictionary is a dictionary for the inspection field.

Because the second information processing device is carried by the user, its storage area MEM tends to be smaller than that of the first information processing device.

Second dictionary D212 is a dictionary for the inspection field, so its data size can be made smaller than that of first dictionary D211. It is therefore preferable that second dictionary D212 be small enough to be stored even on a device such as the second information processing device, whose storage area MEM has less capacity than that of the first information processing device.

A dictionary for the inspection field is one suited to recognizing terms frequently used in inspections. For example, it contains numerical values, terms used in inspection results, and equipment names. Restricting the dictionary to terms used often in inspections lets voice recognition run in a small storage area while still recognizing speech accurately enough to generate first input data V10.
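A real recognizer would restrict its vocabulary or grammar inside the decoder. The sketch below substitutes a simpler technique, fuzzy post-matching with Python's standard `difflib`, only to show how a small inspection-field word list can correct near-miss hypotheses. `INSPECTION_TERMS` is a hypothetical stand-in for second dictionary D212, and the 0.75 cutoff is an arbitrary illustrative value.

```python
import difflib
import re
from typing import List, Optional

# Hypothetical inspection-field vocabulary standing in for second dictionary D212.
INSPECTION_TERMS: List[str] = [
    "fire pump", "pressure", "exterior", "liquid leak", "no abnormality", "OK", "NG",
]

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def normalize_with_domain_dictionary(hypothesis: str,
                                     terms: List[str] = INSPECTION_TERMS,
                                     cutoff: float = 0.75) -> Optional[str]:
    """Snap a raw hypothesis to the closest inspection term.

    Numbers pass through unchanged. If no term is similar enough, None is
    returned so the caller can ask the user to repeat.
    """
    text = hypothesis.strip()
    if _NUMBER.fullmatch(text):
        return text
    matches = difflib.get_close_matches(text, terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Returning `None` instead of the nearest term keeps an out-of-vocabulary utterance from being silently recorded as a valid inspection result.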

The voice recognition system also preferably further includes a noise canceling means for canceling noise contained in the first voice (for example, first noise NZ1, second noise NZ2, and third noise NZ3), in which:
the voice recognition means performs voice recognition using a third voice, which is the first voice with its noise attenuated by the noise canceling means; and
the frequency bands to be canceled by the noise canceling means (for example, first frequency band FR1 and second frequency band FR2) are set for each site or position (see, for example, FIG. 18).

This allows noise to be canceled in a way suited to each site or position. Voice recognition is then performed on the third voice, in which the noise has been attenuated, so recognition accuracy improves.
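The per-site band setting can be sketched as follows. This is not the source's filter. The sketch assumes a whole buffer of samples is available and zeroes every DFT bin that falls inside a configured band, using a naive DFT so that only the standard library is needed. `SITE_BANDS`, the site names, and the band edges are hypothetical stand-ins for first frequency band FR1 and second frequency band FR2. A production system would more likely apply streaming band-stop filters frame by frame, because zeroing bins on non-periodic audio can introduce artifacts.

```python
import cmath
import math
from typing import Dict, List, Tuple

Band = Tuple[float, float]  # (low Hz, high Hz)

# Hypothetical per-site settings; the band edges are illustrative only.
SITE_BANDS: Dict[str, List[Band]] = {
    "machine_room_a": [(50.0, 70.0)],                    # e.g. FR1: motor hum
    "machine_room_b": [(50.0, 70.0), (900.0, 1100.0)],   # e.g. FR1 and FR2
}


def _dft(x: List[complex], sign: int) -> List[complex]:
    # Naive O(n^2) DFT: sign=-1 is the forward transform, +1 the unscaled inverse.
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cancel_bands(samples: List[float], rate: float, bands: List[Band]) -> List[float]:
    """Return the 'third voice': samples with every DFT bin inside a band zeroed."""
    n = len(samples)
    spectrum = _dft([complex(s) for s in samples], -1)
    for k in range(n):
        freq = min(k, n - k) * rate / n  # bins above n/2 mirror negative frequencies
        if any(lo <= freq <= hi for lo, hi in bands):
            spectrum[k] = 0
    return [v.real / n for v in _dft(spectrum, +1)]
```

Usage: `cancel_bands(samples, 4000.0, SITE_BANDS["machine_room_a"])` applies only the bands configured for that site. Switching the key switches the cancellation profile without changing any code.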

In the voice recognition system, the limiting means preferably limits the output of the second voice (for example, step S04) in one of these ways: stopping the output of the second voice, lowering its volume, lowering its volume gradually, starting output of the voice that follows it, or increasing its output speed.

Performing such limiting process PR1 reduces the output of unnecessary audio.
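The five limiting options listed above can be sketched as state changes on a playback object. Everything in this sketch is hypothetical: `Playback`, `LimitMode`, and the default values (ducking to 20% volume, 1.5x speed, a 4-step fade) are assumptions chosen for illustration. A real player would consume `fade_schedule` one tick at a time.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class LimitMode(Enum):
    STOP = auto()       # stop the second voice
    DUCK = auto()       # lower its volume immediately
    FADE = auto()       # lower its volume gradually
    SKIP_NEXT = auto()  # start the voice queued after it
    SPEED_UP = auto()   # speak the rest faster


@dataclass
class Playback:
    current: Optional[str]                                     # prompt being output as the second voice
    pending: List[str] = field(default_factory=list)           # prompts queued after it
    volume: float = 1.0
    rate: float = 1.0
    fade_schedule: List[float] = field(default_factory=list)   # per-tick volumes


def apply_limit(p: Playback, mode: LimitMode, duck_to: float = 0.2,
                speed: float = 1.5, steps: int = 4) -> Playback:
    """Apply one limiting option (a sketch of limiting process PR1) in place."""
    if mode is LimitMode.STOP:
        p.current = None
    elif mode is LimitMode.DUCK:
        p.volume = duck_to
    elif mode is LimitMode.FADE:
        start = p.volume
        # Linear ramp from the current volume down to duck_to over `steps` ticks.
        p.fade_schedule = [start + (duck_to - start) * (i + 1) / steps for i in range(steps)]
    elif mode is LimitMode.SKIP_NEXT:
        p.current = p.pending.pop(0) if p.pending else None
    elif mode is LimitMode.SPEED_UP:
        p.rate = speed
    return p
```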

In the voice recognition system, it is also preferable that:
the registration means registers a plurality of inspection items in association with the second voice;
the system further includes a group setting means for setting a group that collects inspection items (for example, group GS; see, for example, FIG. 13), and an omission operation means for inputting an omission operation (for example, omission operation C3) that omits a group; and
when the omission operation is input, the second voice is not output for the inspection items belonging to the group that the omission operation designates.

For example, equipment that is not operating may not need to be inspected. Being able to skip the inspection of such equipment makes inspections more efficient.
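Group omission can be sketched as a filter over the registered inspection items, applied before their prompts are output as the second voice. The `InspectionItem` fields, the example group labels, and the "equipment: detail" prompt format (modeled on the "fire pump: pressure" example) are hypothetical. The source states only that items in an omitted group are skipped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class InspectionItem:
    number: int      # item number in the registration database
    equipment: str   # e.g. "fire pump"
    detail: str      # e.g. "pressure"
    group: str       # hypothetical label standing in for group GS


def prompts_to_speak(items: Iterable[InspectionItem], omitted_groups: Set[str]) -> List[str]:
    """Second-voice prompts in item order, skipping items in omitted groups."""
    ordered = sorted(items, key=lambda it: it.number)
    return [f"{it.equipment}: {it.detail}" for it in ordered if it.group not in omitted_groups]
```

For instance, if all items of a stopped unit share one group, a single omission operation removes every prompt for that unit.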

A voice recognition device (for example, mobile terminal 11) includes:
a voice input means for inputting a first voice;
a voice recognition means for performing voice recognition based on the first voice;
a registration means for registering a second voice;
an output means for outputting the second voice; and
a limiting means for limiting output of the second voice when the first voice is input while the second voice is being output.

A voice recognition method (see, for example, FIG. 3) performed by a voice recognition system includes:
a voice input procedure in which the voice recognition system inputs a first voice;
a voice recognition procedure in which the voice recognition system performs voice recognition based on the first voice;
a registration procedure in which the voice recognition system registers a second voice;
an output procedure in which the voice recognition system outputs the second voice; and
a limiting procedure in which the voice recognition system limits output of the second voice when the first voice is input while the second voice is being output.

A program causes a computer to execute the voice recognition method (see, for example, FIG. 3).

According to each embodiment of the present invention, the time required for voice input can be shortened.

FIG. 1 is a diagram illustrating an example of the system configuration of voice recognition system 1.
FIG. 2 is a diagram illustrating an example of the hardware configuration of an information processing device.
FIG. 3 is a diagram illustrating an example of the overall processing in the first embodiment.
FIG. 4 is a diagram showing an example of registration database D1.
FIG. 5 is a diagram illustrating an example of processing results in the first embodiment.
FIG. 6 is a diagram showing an example of screen displays on the mobile terminal in the first embodiment.
FIG. 7 is a diagram illustrating an example of the functional configuration in the first embodiment.
FIG. 8 is a diagram illustrating an example of the overall processing in the second embodiment.
FIG. 9 is a diagram illustrating an example of processing results in the second embodiment.
FIG. 10 is a diagram showing an example of screen displays on the mobile terminal in the second embodiment.
FIG. 11 is a diagram illustrating an example of the functional configuration in the second embodiment.
FIG. 12 is a diagram showing a modified example in which group setting and omission are performed.
FIG. 13 is a diagram illustrating an example of group setting.
FIG. 14 is a diagram showing an example in which the second input data is the previous result.
FIG. 15 is a diagram illustrating an example of setting an allowable range.
FIG. 16 is a diagram showing an example in which the second input data is a normal value or a normal range.
FIG. 17 is a diagram showing a modified example in which first dictionary D211 and second dictionary D212 are used.
FIG. 18 is a diagram illustrating an example of setting, for each site, the frequency band to be canceled.
FIG. 19 is a diagram showing a first modified example of interruption and release.
FIG. 20 is a diagram showing a second modified example of interruption and release.
FIG. 21 is a diagram showing a modified example of voice input and voice output.

The best and minimal modes for carrying out the invention are described below with reference to the drawings. In the drawings, identical reference numerals denote similar configurations, and duplicate explanations are omitted. The specific examples illustrated are merely examples, and configurations other than those illustrated may also be included.

<First Embodiment>
<Overall configuration example>
FIG. 1 is a diagram illustrating an example of the system configuration of voice recognition system 1. For example, voice recognition system 1 includes a server 10, a mobile terminal 11 (an example of a voice recognition device), and an earphone 12.

Server 10, mobile terminal 11, and earphone 12 are connected via a network NW.

As illustrated, user 13 wears earphone 12 and carries mobile terminal 11 while inspecting the equipment. Server 10, on the other hand, is installed at a location away from the inspection site. Setting values, data, and the like are therefore entered into server 10 in advance, and during the inspection, mobile terminal 11 retrieves them via network NW.

The equipment to be inspected is, for example, air conditioning equipment, so the inspection site is, for example, a machine room in a company building. During the inspection, user 13 checks the values shown by measuring instruments such as pressure gauges, voltmeters, ammeters, chemical level gauges, thermometers, and hygrometers.

Server 10 and mobile terminal 11 are information processing devices, each with, for example, the following hardware configuration.

<Hardware configuration example>
FIG. 2 is a diagram illustrating an example of the hardware configuration of an information processing device. For example, an information processing device such as server 10 has a hardware configuration that includes a CPU (Central Processing Unit; hereinafter "CPUHW1"), a storage device HW2, a network interface HW3, an input device HW4, an output device HW5, and an interface HW6.

CPUHW1 is an example of an arithmetic device and a control device.

Storage device HW2 is, for example, a main storage device and an auxiliary storage device.

Network interface HW3 is a communication device that sends and receives data to and from external devices over a network.

Input device HW4 is a device, such as a mouse and a keyboard, for inputting user operations.

Output device HW5 is a device, such as a display, for outputting processing results to the user.

Interface HW6 is a connector or similar component for connecting peripheral devices.

Earphone 12 serves as an input device for voice. Voice uttered by user 13 is picked up through earphone 12 and passed to mobile terminal 11, which performs voice recognition and other processing. Earphone 12 also serves as an output device for voice and plays to user 13 the audio produced by mobile terminal 11.

Hereinafter, the voice that user 13 inputs through earphone 12 is called "first voice SD1," and the voice output to user 13 through earphone 12 is called "second voice SD2."

<Overall processing example>
FIG. 3 is a diagram illustrating an example of the overall processing in the first embodiment. As illustrated, the overall processing is described below in two parts, "pre-processing" and "main processing." Pre-processing runs before main processing and prepares voice recognition system 1 for operation. The two therefore need not run back to back; pre-processing only has to be complete before main processing starts.

<Example of pre-processing>
In pre-processing, the following processing is performed.

In step S01, voice recognition system 1 builds registration database D1 by registering, for example, inspection items and data for second voice SD2. Registration database D1 thus holds the content to be read aloud as second voice SD2 during the inspection. Registration database D1 also records how this data maps to the inspection items, that is, which data is to be output as second voice SD2 and at what timing. This is one example of the registration procedure. In pre-processing, then, voice recognition system 1 enters and configures the data it needs to output second voice SD2 during main processing. Specifically, registration database D1 is constructed as follows.

FIG. 4 is a diagram showing an example of registration database D1. As illustrated, registration database D1 consists of the fields "item number" and "inspection item."

For example, the content output as second voice SD2 is the content entered in the "inspection item" field. Specifically, this field contains the name of the equipment to be inspected, for example "fire pump" for item number "1." When the equipment name is read aloud as second voice SD2, user 13 can tell from it which equipment is to be inspected next.

As illustrated, the "inspection item" field also contains the detailed inspection content, for example "pressure value" for item number "1." When this detail is read aloud as second voice SD2, user 13 learns that the task is to check the pressure value and enter it as first voice SD1.

Also in step S01, voice recognition system 1 inputs dictionary D2. In pre-processing, voice recognition system 1 thus enters and configures the data that enables voice recognition during main processing, that is, recognizing the speech input as first voice SD1 as words.

<Example of main processing>
Main processing starts when the inspection begins.

In step S02, voice recognition system 1 performs the output procedure and outputs the registered second voice SD2.

The following assumes that first voice SD1 is input while step S02 is running, that is, while second voice SD2 is being output. In this case, voice recognition system 1 proceeds to step S03.

In step S03, voice recognition system 1 performs the voice input procedure and inputs first voice SD1; that is, it detects that user 13 has started speaking. Once first voice SD1 has been input, voice recognition system 1 proceeds to step S04.

In step S04, voice recognition system 1 performs the limiting procedure and limits the output of second voice SD2.

In step S05, voice recognition system 1 performs the voice recognition procedure on first voice SD1.

In step S06, voice recognition system 1 outputs the recognition result of first voice SD1 as second voice SD2.
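Steps S02 to S06 can be sketched as a simple discrete-time simulation. The sketch assumes the second voice is played in word-sized chunks and that voice activity is checked between chunks. Real systems run playback and voice detection concurrently. The chunking, the `speech_onset` parameter, and the `recognize` callback are simplifications introduced here, and limiting is modeled as stopping the prompt, which is only one of the options the source lists.

```python
from typing import Callable, List, Optional


def run_item(prompt_words: List[str],
             speech_onset: Optional[int],
             recognize: Callable[[], str]) -> List[str]:
    """Simulate one inspection item and return a log of what is spoken or heard.

    prompt_words: second voice SD2 split into chunks (hypothetical granularity).
    speech_onset: index of the chunk during which the user starts speaking,
                  or None if the user waits for the prompt to finish.
    recognize:    stand-in for the recognizer applied to first voice SD1.
    """
    log: List[str] = []
    for i, word in enumerate(prompt_words):      # step S02: output SD2 chunk by chunk
        if speech_onset is not None and i == speech_onset:
            log.append("SD1 detected")            # step S03: voice input detected
            log.append("SD2 stopped")             # step S04: limiting (here: stop)
            break
        log.append(f"SD2: {word}")
    else:
        log.append("SD1 detected")                # no barge-in: user speaks after the prompt
    result = recognize()                          # step S05: voice recognition
    # Step S06: repeat the full prompt, then give the recognition result.
    log.append(f"SD2: {' '.join(prompt_words)} {result}")
    return log
```

With `speech_onset=1`, only the first chunk of "fire pump: exterior" is played before the user's answer cuts it off, which mirrors the second item execution example.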

When the overall processing described above is performed, the results are, for example, as follows.

FIG. 5 is a diagram illustrating an example of processing results in the first embodiment. The following compares a "first item execution example," in which no limiting is applied, with a "second item execution example," in which limiting is applied.

In the first item execution example, voice recognition system 1 first outputs eleventh output EX11 as second voice SD2. Here, EX11 states the name of the equipment to be inspected followed by the inspection content, for example "fire pump: pressure."

In response to EX11, user 13 inputs, as first voice SD1, twelfth output EX12, which gives the result of inspecting the equipment named in EX11. Specifically, user 13 reads aloud "12.3," the "pressure" value shown by the pressure gauge of the "fire pump." The voice of this reading is EX12.

Next, voice recognition system 1 performs voice recognition on EX12. It then outputs, as second voice SD2, thirteenth output EX13, which states the completed inspection content together with the recognition result. As illustrated, the first half of EX13 repeats EX11, and the second half gives the recognition result, that is, the same content as EX12. When no limiting is applied and recognition succeeds, voice recognition system 1 therefore proceeds in this order: output of the inspection item, input of the inspection result, voice recognition, and output of the recognition result.

In the second item execution example, with the same voice recognition system 1, user 13 inputs first voice SD1 while second voice SD2 is still being output, and this limits the output of second voice SD2.

In the illustrated example, the second voice SD2 being limited is twenty-first output EX21 (step S02). EX21 announces the inspection item that follows the one announced in EX11. Without limiting, its content is "fire pump: exterior."

As illustrated, in the second item execution example, user 13 inputs, while EX21 is being output (step S02), twenty-second output EX22 as first voice SD1 (step S03). EX22 gives the result of inspecting the equipment named in EX21.

When EX22 is input while EX21 is being output in this way, voice recognition system 1 performs limiting process PR1, which limits the output of EX21 (step S04). For example, limiting process PR1 stops the output of EX21.

User 13 reads aloud "× liquid leak," the result of checking the exterior of the "fire pump" (step S03). The voice of this reading is EX22.

Next, voice recognition system 1 performs voice recognition on EX22 (step S05). Then, as with EX13, it outputs twenty-third output EX23 as second voice SD2; EX23 states the completed inspection content together with the recognition result (step S06). As illustrated, the first half of EX23 repeats EX21, and the second half gives the recognition result, that is, the same content as EX22.

<Screen display example>
FIG. 6 is a diagram showing an example of screen displays on the mobile terminal in the first embodiment. An inspection proceeds, for example, in the following order. The screens, their order, and the input items shown are not essential, however.

Figure 6(A) shows the "Main Menu" screen, hereinafter referred to as the 11th screen PN11. When "Inspection Flow" is pressed on the 11th screen PN11, the display transitions to Figure 6(B), a screen for preparing the inspection.

Figure 6(B) is the screen for selecting from the "Inspection Flow List", hereinafter referred to as the 12th screen PN12. On the 12th screen PN12, the equipment to be inspected is selected. When equipment is selected, the display transitions to Figure 6(C), a screen for specifying the user 13 who will perform the inspection.

Figure 6(C) is the "Select Worker" screen, hereinafter referred to as the 13th screen PN13. On the 13th screen PN13, the user 13 who will carry out the inspection as the worker is selected. When the user 13 is selected, the inspection starts and the display transitions to Figure 6(D).

Figure 6(D) is the screen for the first inspection item, hereinafter referred to as the 14th screen PN14. Under "Inspection Item List", the 14th screen PN14 shows the name of the equipment to be inspected and the progress of the inspection. Under "Inspection Item", it shows the content to be inspected, "boiler internal pressure". For example, in accordance with the display of the 14th screen PN14, the second voice SD2 conveying the same content is output (step S02). Next, when the first voice SD1 indicating the inspection result is input (step S03), voice recognition is performed (step S05) and the display transitions to Figure 6(E).

Figure 6(E) is the screen showing the first inspection result, hereinafter referred to as the 15th screen PN15. The 15th screen PN15 differs from the 14th screen PN14 in that the value "0.66" is displayed as the recognition result. This "0.66" is the result of voice recognition of the first voice SD1. Displaying the voice recognition result on the screen in this way makes it easy for the user 13 to check it.

The first inspection item is inspected as described above, for example. The second and subsequent inspection items are processed in the same manner.

Figure 6(F) is the screen for the second inspection item, hereinafter referred to as the 16th screen PN16. The first inspection item was "boiler internal pressure"; the inspection item on the 16th screen PN16 is "forced-draft fan current". As with the first inspection item, the second voice SD2 conveying the same content as the screen is output (step S02). Next, when the first voice SD1 indicating the inspection result is input (step S03), voice recognition is performed (step S05) and the display transitions to Figure 6(G).

Figure 6(G) is the screen showing the second inspection result, hereinafter referred to as the 17th screen PN17. The 17th screen PN17 differs from the 16th screen PN16 in that the value "43" is displayed as the recognition result. This "43" is the result of voice recognition of the first voice SD1.

For example, suppose the first voice SD1 is input on the 14th screen PN14 or the 16th screen PN16 while the second voice SD2 conveying the same content as the screen is still being output. In that case, the voice recognition system 1 restricts the output of the second voice SD2 (step S04), starts voice recognition, and transitions to the next screen.

<Functional configuration example>
Figure 7 is a diagram showing an example of the functional configuration in the first embodiment. For example, the voice recognition system 1 has a functional configuration including the following:
- a voice input means 1F11
- a voice recognition means 1F12
- a registration means 1F13
- an output means 1F14
- a restriction means 1F15
- an input data generation means 1F16
- a storage means 1F17
- a determination means 1F18
- a noise cancellation means 1F19
- a group setting means 1F20
- an omission operation means 1F21

The voice input means 1F11 and the output means 1F14 are implemented by, for example, the earphones 12. The remaining means are implemented by the arithmetic, storage, input, and output devices of the mobile terminal 11 or the server 10 operating in cooperation. These devices include the CPU HW1, the storage device HW2, the input device HW4, the output device HW5, and the interface HW6. The remaining means are the voice recognition means 1F12, the registration means 1F13, the restriction means 1F15, the input data generation means 1F16, the storage means 1F17, the determination means 1F18, the noise cancellation means 1F19, the group setting means 1F20, and the omission operation means 1F21.

In particular, a user 13 who is accustomed to inspections may already know the details, such as the content and sequence of the inspection, without guidance from the second voice SD2. Such a user 13 may want to finish the inspection quickly by inputting the results as the first voice SD1 before the second voice SD2 has finished. Therefore, as in the above example, the output of the second voice SD2 should preferably be restricted when the first voice SD1 is input during that output. The user 13 then does not have to wait for the second voice SD2 to finish, which shortens the time required for voice input.

<Second Embodiment>
The second embodiment is implemented by a voice recognition system 1 with the same system configuration as in the first embodiment. The hardware configurations of the mobile terminal 11 and the server 10 are also the same as in the first embodiment. The second embodiment differs from the first embodiment in its overall processing. The following description focuses on the differences and omits redundant explanation.

<Overall processing example>
Figure 8 is a diagram showing an example of the overall processing in the second embodiment. The second embodiment differs from the first embodiment in that steps S21 to S23 are performed. The "pre-processing" in the second embodiment is the same as in the first embodiment.

In step S02, the voice recognition system 1 outputs the registered second voice SD2.

The following describes an example in which an interrupt operation C1 is input while step S02 is being executed, that is, while the second voice SD2 is being output. In this case, the voice recognition system 1 proceeds to step S21. Which operation serves as the interrupt operation C1 is set in advance.

In step S21, the voice recognition system 1 receives the interrupt operation C1. When the interrupt operation C1 is input in this way, the voice recognition system 1 interrupts the output of the second voice SD2.

If a cancel operation C2 for canceling the interruption is input while the output of the second voice SD2 is interrupted, the voice recognition system 1 proceeds to step S22. Which operation serves as the cancel operation C2 is set in advance.

In step S22, the voice recognition system 1 cancels the interruption based on the cancel operation C2.

In step S23, the voice recognition system 1 resumes output of the second voice SD2.

Figure 9 is a diagram showing an example of a processing result in the second embodiment. The following describes an example in which the 21st output EX21 of the "second item execution example" is interrupted.

For example, suppose that the interrupt operation C1 is input while the output voice for the 21st output EX21 is being output (step S02). When the interrupt operation C1 is input, the voice recognition system 1 performs an interruption process PR2 (step S21). Hereinafter, the point at which output is interrupted by the interrupt operation C1 is referred to as the "first point in time".

As shown in the figure, the interruption process PR2 stops the output partway through the 21st output EX21. In addition, after the interruption, the interruption process PR2 suppresses any further output until the cancel operation C2 is input.

Next, when the cancel operation C2 is input, the voice recognition system 1 cancels the interruption (step S22). Specifically, the voice recognition system 1 outputs the output voice for the 200th output EX200 and then resumes the output voice for the 21st output EX21.

The 200th output EX200 is an output voice indicating the inspection content already completed at the first point in time. It reminds the user 13 how far the inspection had progressed.

Then, after the 200th output EX200 is output, output of the 21st output EX21 resumes (step S23).
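The interrupt and resume flow (steps S21 to S23) can be modeled as a small state machine. The sketch below is hypothetical in several respects:
- `InspectionSession` and its methods are illustrative names, not from the source.
- The recap wording is invented.
- The sketch replays the pending item from its beginning, because the source does not say whether EX21 resumes from the cut point or from the start.

```python
class InspectionSession:
    """Hypothetical sketch of interruption process PR2 (step S21),
    cancellation (step S22), and resumption (step S23)."""

    def __init__(self, items):
        self.items = list(items)   # inspection items in guidance order
        self.index = 0             # item currently being guided
        self.completed = []        # (item, result) pairs recorded so far
        self.interrupted = False
        self.spoken = []           # what would be sent to the earphones

    def announce(self):
        # While interrupted, nothing is output until the cancel operation C2.
        if not self.interrupted and self.index < len(self.items):
            self.spoken.append(self.items[self.index])

    def record(self, result):
        self.completed.append((self.items[self.index], result))
        self.index += 1

    def interrupt(self):           # interrupt operation C1 (first point in time)
        self.interrupted = True

    def cancel_interrupt(self):    # cancel operation C2
        self.interrupted = False
        done = ", ".join(f"{item}: {res}" for item, res in self.completed)
        self.spoken.append(f"Completed: {done or 'none'}")  # recap, like EX200
        self.announce()            # replay the pending item, like EX21


s = InspectionSession(["boiler internal pressure", "forced-draft fan current"])
s.announce()
s.record("0.66")
s.announce()
s.interrupt()
s.announce()            # suppressed while interrupted
s.cancel_interrupt()
print(s.spoken)
```

The recap is generated from the results recorded before the interrupt, so it always reflects the state at the first point in time.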

<Screen display example>
Figure 10 is a diagram showing an example of screen displays on the mobile terminal in the second embodiment. For example, the inspection proceeds in the following order. However, the screen displays, order, and input items shown in the figure are not essential.

Figure 10(A) is the screen showing the first inspection result, hereinafter referred to as the 21st screen PN21. The 21st screen PN21 is an example of the screen displayed when the first inspection item is "steam temperature" and "120" is input as the result. As shown in the figure, the voice recognition result is displayed as text output B2. If the user sees from text output B2 that the recognition result is wrong, the user should preferably be able to press the re-input button B1 and enter the result again.

Similarly, the second, third, and fourth inspections are performed in that order. In this example, the 22nd screen PN22 shown in Figure 10(B) is an example of the screen displayed when the second inspection result is input.

The 23rd screen PN23 shown in Figure 10(C) is an example of the screen displayed when the third inspection result is input. The 24th screen PN24 shown in Figure 10(D) is an example of the screen displayed when the fourth inspection result is input.

<Functional configuration example>
Figure 11 is a diagram showing an example of the functional configuration in the second embodiment. For example, the voice recognition system 1 has a functional configuration including a voice input means 1F11, a voice recognition means 1F12, an output means 1F14, a release means 1F101, and an interruption means 1F100.

The voice input means 1F11 and the output means 1F14 are implemented by, for example, the earphones 12. The voice recognition means 1F12, the release means 1F101, and the interruption means 1F100 are implemented by the arithmetic, storage, input, and output devices of the mobile terminal 11 or the server 10 operating in cooperation. These devices include the CPU HW1, the storage device HW2, the input device HW4, the output device HW5, and the interface HW6.

For example, as shown in Figure 10, an inspection is performed based on multiple inspection items set in advance. During such an inspection, an interrupting task may arise; for example, a telephone rings or someone speaks to the user. When this happens, the user 13 should preferably be able to pause the inspection with the interrupt operation C1 and handle the interrupting task. After completing that task, the user 13 can resume the inspection with the cancel operation C2.

<Modification>
The inspection items may include two kinds of items:
- "standardized" items, which are input and checked as numerical values or as "〇" or "×" (or "YES" or "NO", etc.);
- "non-standardized" items, which the user 13 inputs as a comment such as "leak present".

Images taken by the user 13 may also be attachable to the inspection results.
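One way to separate the two kinds of items is to map each recognized utterance onto a typed result. The sketch below makes several assumptions not taken from the source:
- Numbers are parsed with a simple regular expression.
- A small hypothetical `PASS_FAIL` vocabulary covers the marks named above.
- Any other text is treated as a free comment.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical vocabulary for pass/fail marks; the text names 〇/× and YES/NO.
PASS_FAIL = {"〇": True, "○": True, "YES": True, "×": False, "NO": False}
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


@dataclass
class InspectionResult:
    kind: str                       # "standardized" or "non-standardized"
    value: Union[float, bool, str]


def classify(text: str) -> InspectionResult:
    """Map a recognized utterance onto a standardized value or a free comment."""
    t = text.strip()
    if NUMBER.fullmatch(t):
        return InspectionResult("standardized", float(t))
    if t.upper() in PASS_FAIL:
        return InspectionResult("standardized", PASS_FAIL[t.upper()])
    return InspectionResult("non-standardized", t)


print(classify("0.66"))          # numeric reading
print(classify("×"))             # fail mark
print(classify("leak present"))  # free comment
```

Mixed utterances such as "× Liquid leaking" fall through to the comment branch in this sketch. Splitting them into a mark plus a comment would need an additional rule.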

The voice recognition system 1 may be used for purposes other than equipment inspection; that is, the object to be inspected may be an apparatus other than facility equipment. The voice recognition system 1 may also be used for purposes other than inspection.

When the voice recognition system 1 is used for equipment inspection, it shortens the working time required for the inspection and thus saves labor.

The voice recognition system 1 is preferably used for inspections in the machine rooms of air-conditioning equipment. In such machine rooms, the inspection points are often scattered throughout the room. When many large pieces of equipment are installed, the meters to be inspected on each piece are often scattered as well, and the inspection items tend to be diverse. Performing this many inspections requires the user to move around a great deal, in some cases climbing ladders and the like.

During an inspection, the user also performs incidental tasks such as turning the lights on (and off afterward) or opening the door of a power panel. There is therefore a need to minimize manual input of inspection results. If the voice recognition system 1 allows inspection results and the like to be input by voice, the user can carry out the inspection more easily.

When the inspection involves a lot of movement, as in the machine rooms described above, other workers and managers are also more likely to speak to the user during the inspection. Because the voice recognition system 1 supports interrupting the inspection and canceling the interruption, the inspection can be paused and resumed, and thus carried out efficiently.

Depending on the layout of the equipment and meters, multiple instruments may be installed in one location, so many inspection results are often input at a single spot. If the user had to wait for the device to finish its audio output before each voice input, the waiting time would add up and the inspection would take longer. If the device instead restricts its audio output so that voice can be input at once, the user does not have to wait, and the time required for voice input is shortened. The inspection can therefore be performed efficiently.

The earphones 12 used to output audio are preferably of the in-ear type, which is shaped to block external sound when worn. At sites where equipment is running, for example, the environment is often noisy. With in-ear earphones 12, the output voice is easy to hear even in such an environment. In-ear earphones can also often be made smaller than bone-conduction earphones.

The order of the inspection items and whether each item needs to be inspected may be configurable. For example, the second voice SD2 is output in the order of the "item numbers" in the registration database D1. The order of inspection may therefore be changed in the "pre-processing" before the inspection. This is done by changing the "item number" settings or the correspondence between the "inspection items" and the "item numbers".

It may also be possible to set "inspection items" that are not to be inspected to "OFF".
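A minimal sketch of the ordering and OFF settings follows, assuming each registered item carries an item number, a name, and an enabled flag. The `Item` record and both helper functions are hypothetical illustrations of registration database D1, not its actual schema.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    number: int            # "item number" in registration database D1
    name: str              # "inspection item"
    enabled: bool = True   # False corresponds to the "OFF" setting


def renumber(items: List[Item], new_numbers: Dict[str, int]) -> List[Item]:
    """Reassign item numbers by item name during pre-processing."""
    return [replace(i, number=new_numbers.get(i.name, i.number)) for i in items]


def guidance_order(items: List[Item]) -> List[str]:
    """Names announced by SD2: ascending item number, OFF items skipped."""
    return [i.name for i in sorted(items, key=lambda i: i.number) if i.enabled]


items = [
    Item(1, "boiler internal pressure"),
    Item(2, "forced-draft fan current"),
    Item(3, "steam temperature", enabled=False),
]
print(guidance_order(items))
print(guidance_order(renumber(items, {"boiler internal pressure": 5})))
```

Keeping the records immutable and deriving the guidance order on demand means a day-specific configuration never alters the registered master list.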

The items to be inspected may vary depending on the operating status of the equipment. If the order and necessity of inspections can be set, the inspection content can be flexibly adjusted to the operating status on the day of the inspection.

The information processing device is not limited to the hardware configuration described above. For example, it may further include arithmetic devices, control devices, storage devices, input devices, output devices, and peripheral devices other than those described above. The input device and the output device may also be integrated, as in a touch panel. The mobile terminal 11 and the server 10 may have different hardware configurations. Connections to peripheral devices and other devices may be wired or wireless.

The mobile terminal 11 may also be an information processing device such as a tablet, a smartphone, or a mobile computer.

<Modification of restriction process>
The restriction process PR1 is not limited to stopping the output of the second voice SD2. For example, it may restrict the second voice SD2 by lowering its volume, or by lowering its volume gradually. In this way, the restriction process PR1 may restrict the output of the second voice SD2 by reducing the volume below the normal level.

Alternatively, the restriction process PR1 may restrict the second voice SD2 by starting output of the voice that follows it. That is, it may advance from the inspection item currently announced by the second voice SD2 to the next inspection item.

The restriction process PR1 may also restrict the second voice SD2 by increasing its playback speed, that is, by fast-forwarding it.
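The restriction variants described above can be summarized as a function that returns playback parameters for SD2 after user speech is detected. This is a sketch under stated assumptions:
- The enum and function names are hypothetical.
- The fade is assumed to be linear over a fixed duration.
- The default gain, fade duration, and speed-up factor are illustrative values, not taken from the source.

```python
from enum import Enum
from typing import Tuple


class Restriction(Enum):
    STOP = "stop"   # stop the second voice SD2
    DUCK = "duck"   # lower the volume
    FADE = "fade"   # lower the volume gradually
    SKIP = "skip"   # start the next inspection item's voice
    FAST = "fast"   # play faster (fast-forward)


def restricted_output(mode: Restriction, t: float,
                      duck_gain: float = 0.3, fade_s: float = 1.0,
                      fast_rate: float = 1.5) -> Tuple[float, float, bool]:
    """Return (gain, playback_rate, skip_to_next) for SD2, t seconds after
    the first voice SD1 was detected. Default values are illustrative."""
    if mode is Restriction.STOP:
        return 0.0, 1.0, False
    if mode is Restriction.DUCK:
        return duck_gain, 1.0, False
    if mode is Restriction.FADE:
        # Linear ramp from full volume to silence over fade_s seconds.
        return max(0.0, 1.0 - t / fade_s), 1.0, False
    if mode is Restriction.SKIP:
        return 0.0, 1.0, True
    if mode is Restriction.FAST:
        return 1.0, fast_rate, False
    raise ValueError(f"unknown mode: {mode}")


for m in Restriction:
    print(m.name, restricted_output(m, t=0.5))
```

Returning parameters rather than touching an audio device keeps the policy testable separately from whatever playback API the terminal actually uses.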

<Modification with group setting and omission>
Part of the inspection may be omitted. For example, omission is performed as follows.

Figure 12 is a diagram showing a modification with group setting and omission. The following describes an example in which the "first equipment", "second equipment", "third equipment", and "fourth equipment" are inspected in this order.

In this example, as in the first embodiment, the inspection items for the first equipment are output as the 31st output EX31. The inspection of the first equipment, including the 31st output EX31, is to be omitted.

To omit an inspection, the user 13 performs an omission operation C3. For example, the omission operation C3 is pressing a predetermined button or inputting a preset word such as "omit" as the first voice SD1. Which operation serves as the omission operation C3 is set in advance.

As shown in the figure, the omission operation C3 may be performed while the 31st output EX31 is being output. The voice recognition system 1 then performs an omission process PR3 that omits the inspection of the first equipment, including EX31. For example, the omission process PR3 stops the output of EX31 and proceeds to the inspection of the second equipment. The system therefore starts outputting the 32nd output EX32, which announces the inspection items for the second equipment.

Which of the pre-registered inspection items correspond to the inspection of the first equipment is determined based on, for example, a group setting such as the following.

Figure 13 is a diagram showing an example of group settings. Consider a case where inspection items such as those shown in the figure have been registered in advance. In this example, the items are assigned as follows:
- "item numbers" "1" and "2": inspections of the first equipment;
- "item numbers" "3" through "5": inspections of the second equipment;
- "item number" "6": inspection of the third equipment;
- "item number" "7": inspection of the fourth equipment.

A group GS is set for these inspection items. For example, as shown in the figure, the voice recognition system 1 performs a group setting procedure in which a value is entered in the "group" column. In this example, the "inspection items" with "item numbers" "1" and "2" form group "G1", and those with "item numbers" "3" through "5" form group "G2". Groups GS are not limited to one per piece of equipment. As with "G3", the inspection items for the third and fourth equipment may be combined into a single group.

As shown in Figure 12, for example, the omission operation C3 may be performed on the 31st output EX31, which corresponds to the inspection of the first equipment. In that case, the voice recognition system 1 determines that the omission operation C3 applies to group "G1".

As shown in Figure 13, group "G1" contains the "inspection items" with "item numbers" "1" and "2". Therefore, the voice recognition system 1 performs an omission procedure that omits the second voice SD2 for those two items.
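Group-based omission reduces to filtering the remaining items by group. The sketch below encodes the grouping described for Figure 13 (item numbers 1 and 2 in G1, 3 to 5 in G2, 6 and 7 in G3). The function names are hypothetical, and the sketch assumes items are announced in ascending item-number order.

```python
from typing import Dict, List

# Group setting described for Figure 13: item number -> group GS.
GROUPS: Dict[int, str] = {1: "G1", 2: "G1", 3: "G2", 4: "G2", 5: "G2", 6: "G3", 7: "G3"}
ORDER: List[int] = [1, 2, 3, 4, 5, 6, 7]


def items_after_omission(current: int, order: List[int],
                         groups: Dict[int, str]) -> List[int]:
    """Omission operation C3 during item `current`: drop every remaining
    item in the same group and continue with the rest, in order."""
    group = groups[current]
    start = order.index(current)
    return [n for n in order[start:] if groups[n] != group]


def items_without_group(order: List[int], groups: Dict[int, str],
                        name: str) -> List[int]:
    """Variant in which C3 names the group to omit."""
    return [n for n in order if groups[n] != name]


print(items_after_omission(1, ORDER, GROUPS))   # omit G1 while item 1 plays
print(items_without_group(ORDER, GROUPS, "G2"))
```

Under this sketch, omitting during item 1 leaves items 3 to 7, so guidance continues with the second equipment, as the omission process PR3 describes.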

The omission operation C3 may also be an operation of inputting the name of the group or the equipment to be omitted.

<Modification with checking>
The voice recognition system 1 is preferably configured to check the inspection results and determine whether the equipment has an abnormality. Hereinafter, the value to be checked is referred to as "first input data". This is the data input as the first voice SD1 that represents the voice recognition result. The data used to check the first input data is referred to as "second input data".

When the first input data is generated by the first data generation procedure, that is, by voice recognition, the voice recognition system 1 performs a determination procedure. This procedure compares the first input data with the second input data to determine whether there is an abnormality.

The second input data only needs to have been stored by the storage procedure before the first input data is checked. For example, the second input data is stored as follows.

Figure 14 is a diagram showing an example in which the second input data is the previous result. The following describes an example using the same inspection items as in the first embodiment. For example, the second input data is the previous inspection result stored for each "inspection item", shown under "second input data" in the figure. It is hereinafter referred to as the "previous result V21".

なお、前回結果Ｖ２１は、前回の点検で異常がなかったのを前提とする。したがって、前回の点検で異常があった場合には、それ以前の点検結果又は正常値が前回結果Ｖ２１に入力されてもよい。 Note that the previous result V21 assumes that no abnormality was found in the previous inspection. Therefore, if an abnormality was found in the previous inspection, an earlier inspection result or a normal value may be entered as the previous result V21.

つまり、音声認識システム１は、第１入力データ、すなわち、今回の点検結果を前回結果Ｖ２１と比較して違いがあるか否かを判断する。そして、音声認識システム１は、第１入力データと第２入力データが異なると、異常であると判断する。 In other words, the voice recognition system 1 compares the first input data, i.e., the current inspection result, with the previous result V21 to determine whether there is a difference. Then, if the first input data and the second input data differ, the voice recognition system 1 determines that there is an abnormality.

なお、正常と判断する範囲には、ある程度の許容範囲があってもよい。すなわち、音声認識システム１は、第１入力データが第２入力データに対して許容範囲外であると、異常であると判断する。例えば、許容範囲は、以下のように設定される。 The range judged to be normal may include a certain allowable range. In other words, the voice recognition system 1 determines that there is an abnormality if the first input data is outside the allowable range relative to the second input data. For example, the allowable range is set as follows:

図１５は、許容範囲を設定する例を示す図である。例えば、許容範囲Ｖ２２は、図示する「許容範囲」のように、前回結果Ｖ２１から結果が外れても「正常」と判断する範囲を示す。なお、この例は、「項目番号」が「２」の場合のように、前回結果Ｖ２１と一致しない場合をすべて「異常」と判断させるために、許容範囲Ｖ２２に「なし」と設定する例である。 Figure 15 is a diagram showing an example of setting an allowable range. For example, the allowable range V22, shown in the figure as "allowable range", indicates how far a result may deviate from the previous result V21 and still be judged "normal". For an item such as item number "2", the allowable range V22 is set to "none". As a result, any result that does not match the previous result V21 is judged "abnormal".

このように、許容範囲Ｖ２２によって、前回結果Ｖ２１に対して幅を持たせる構成であってもよい。特に、数値が点検の対象となる場合には、数値の微小な変動が異常でない場合が多い。したがって、このように許容できる範囲が設定できると、異常を精度良く判断できる。 In this way, the allowable range V22 may give the previous result V21 a margin. In particular, when a numerical value is inspected, small fluctuations in the value are often not abnormal. Setting an allowable range in this way therefore allows abnormalities to be determined with high accuracy.

このように、前回結果Ｖ２１、すなわち、直近の点検結果と比較して、違いがあるような場合には、設備に異常があると判断される。このようなチェックが行われると、音声認識システム１は、異常が発生しているような設備を知らせることができる。 In this way, if there is a difference when compared with the previous result V21, i.e., the most recent inspection result, it is determined that there is an abnormality in the equipment. When such a check is performed, the voice recognition system 1 can notify the user of any equipment in which an abnormality has occurred.
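The comparison against the previous result V21 and the allowable range V22 can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the patent's implementation. It assumes V22 is a symmetric ± margin around a numeric V21, that "none" is represented as `None`, and that text results (for example, a visual check) must match exactly. The names `CheckItem` and `is_abnormal` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

Value = Union[float, str]


@dataclass
class CheckItem:
    """One inspection item (hypothetical structure)."""
    item_no: int
    previous: Value                    # previous result V21
    allowable: Optional[float] = None  # allowable range V22 as a +/- margin; None means "none"


def is_abnormal(current: Value, item: CheckItem) -> bool:
    """Compare the first input data (current) with the second input data (V21)."""
    # Text results, or items whose allowable range is "none", must match exactly.
    if isinstance(current, str) or isinstance(item.previous, str) or item.allowable is None:
        return current != item.previous
    # Numeric results are normal while they stay within +/-allowable of V21.
    return abs(current - item.previous) > item.allowable


pressure = CheckItem(item_no=1, previous=12.3, allowable=0.5)
appearance = CheckItem(item_no=2, previous="no abnormality")
print(is_abnormal(12.6, pressure))      # False: within +/-0.5
print(is_abnormal(13.0, pressure))      # True: deviates by about 0.7
print(is_abnormal("leak", appearance))  # True: does not match V21
```

An asymmetric range, such as separate lower and upper margins, would be an equally valid reading of Figure 15. The symmetric margin is only the simplest assumption.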

なお、チェックは、以下のように第２入力データに正常値又は正常範囲を示す値を設定して行われてもよい。 The check may also be performed by setting a value indicating a normal value or normal range in the second input data as follows:

図１６は、第２入力データを正常値又は正常範囲とする例を示す図である。例えば、図示する「第２入力データ正常値又は正常範囲」のような値（以下「正常値Ｖ２３」という。）が事前に設定されてもよい。 Figure 16 is a diagram showing an example in which the second input data is set to a normal value or normal range. For example, a value such as the illustrated "second input data normal value or normal range" (hereinafter referred to as "normal value V23") may be set in advance.

正常値Ｖ２３が設定されると、音声認識システム１は、正常とする値又は範囲を把握できる。したがって、音声認識システム１は、第１入力データを正常値Ｖ２３と比較して、第１入力データと第２入力データが異なる、又は、第１入力データが正常範囲の範囲外であると、異常であると判断する。このようなチェックが行われると、音声認識システム１は、異常が発生しているような設備を知らせることができる。 When the normal value V23 is set, the voice recognition system 1 knows which value or range is considered normal. The voice recognition system 1 therefore compares the first input data with the normal value V23. It determines that there is an abnormality if the first input data differs from the second input data or is outside the normal range. When such a check is performed, the voice recognition system 1 can notify the user of equipment in which an abnormality is occurring.
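A check against a preset normal value V23 might look like the following sketch. It assumes V23 is stored either as an exact expected value or as a `(low, high)` tuple for a normal range. The representation and the function name `is_abnormal_vs_normal` are illustrative assumptions.

```python
from typing import Tuple, Union

# A normal value V23 is either an exact expected value or a (low, high) normal range.
NormalSpec = Union[str, float, Tuple[float, float]]


def is_abnormal_vs_normal(current: Union[str, float], normal: NormalSpec) -> bool:
    """Return True if the first input data deviates from the normal value V23."""
    if isinstance(normal, tuple):
        low, high = normal
        # A text answer cannot lie inside a numeric normal range.
        if isinstance(current, str):
            return True
        return not (low <= current <= high)
    # Exact normal value: any difference is treated as an abnormality.
    return current != normal


print(is_abnormal_vs_normal(25.0, (20.0, 30.0)))  # False
print(is_abnormal_vs_normal(31.5, (20.0, 30.0)))  # True
print(is_abnormal_vs_normal("closed", "open"))    # True
```

Unlike the previous-result check, this version needs no history. The normal value can therefore be set before the very first inspection.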

＜第１辞書及び第２辞書を用いる変形例＞ <Modification using first and second dictionaries>
なお、携帯端末１１にデータ及びプログラム等がダウンロードされて、携帯端末１１で全体処理が行われてもよい。例えば、点検を行う現場は、通信環境が良くない場合がある。すなわち、現場は、携帯端末１１とサーバ１０の間で通信を行うのが難しい通信環境である場合がある。 In addition, data, programs, and the like may be downloaded to the mobile terminal 11, and the entire processing may be performed by the mobile terminal 11. For example, the communication environment at the inspection site may be poor; that is, it may be difficult for the mobile terminal 11 and the server 10 to communicate at the site.

そこで、点検を開始する前に、サーバ１０から事前に入力される設定値等がダウンロードされて、点検を行っている間は、携帯端末１１とサーバ１０の間で通信が行われなくとも全体処理が完了できる構成であってもよい。なお、この場合には、点検が完了し、携帯端末１１とサーバ１０の間で通信が行える環境となった場合に、携帯端末１１からサーバ１０へ点検結果等をアップロードしてもよい。 Therefore, setting values and the like entered in advance may be downloaded from the server 10 before the inspection starts. The entire process can then be completed during the inspection without any communication between the mobile terminal 11 and the server 10. In this case, once the inspection is complete and the mobile terminal 11 can communicate with the server 10 again, the mobile terminal 11 may upload the inspection results and the like to the server 10.
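The download-then-upload flow can be sketched as a small session object kept on the mobile terminal. This is a hypothetical illustration, not the patent's design. The `upload` callback stands in for whatever server API is actually used, and `OfflineInspectionSession` is an invented name. Results stay queued until an upload succeeds.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class OfflineInspectionSession:
    """Hypothetical on-terminal session that works without a server connection."""
    settings: Dict[str, Any]                                 # downloaded before the inspection
    pending: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, item: str, result: Any) -> None:
        # Results are kept locally while the site has no connectivity.
        self.pending.append({"item": item, "result": result})

    def sync(self, online: bool, upload: Callable[[List[Dict[str, Any]]], bool]) -> int:
        """Upload queued results once online; return how many were uploaded."""
        if not online or not self.pending:
            return 0
        batch = list(self.pending)
        if not upload(batch):  # keep everything if the upload fails
            return 0
        del self.pending[: len(batch)]
        return len(batch)


session = OfflineInspectionSession(settings={"items": ["pump A pressure", "valve B"]})
session.record("pump A pressure", 12.3)
session.record("valve B", "open")
print(session.sync(online=False, upload=lambda batch: True))  # 0: still offline
print(session.sync(online=True, upload=lambda batch: True))   # 2
```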

このような場合等において、音声認識を行うために用いる辞書が、以下のように点検の分野用の辞書であるのが望ましい。 In such cases, it is desirable that the dictionary used for voice recognition is a dictionary for the inspection field, as follows:

図１７は、第１辞書Ｄ２１１及び第２辞書Ｄ２１２を用いる変形例を示す図である。以下、サーバ１０のようにネットワークＮＷを介して利用できる情報処理装置を「第１情報処理装置」という。第１情報処理装置に対して、現場でユーザ１３が利用できる携帯端末１１のような情報処理装置を「第２情報処理装置」という。 Figure 17 is a diagram showing a modified example using a first dictionary D211 and a second dictionary D212. Hereinafter, an information processing device that can be used via a network NW, such as the server 10, is referred to as a "first information processing device." In contrast to the first information processing device, an information processing device such as a mobile terminal 11 that can be used by a user 13 on-site is referred to as a "second information processing device."

図示するように、第１情報処理装置で音声認識に用いられる辞書を「第１辞書Ｄ２１１」という。一方で、第２情報処理装置で音声認識に用いられる辞書を「第２辞書Ｄ２１２」という。 As shown in the figure, the dictionary used for voice recognition in the first information processing device is called the "first dictionary D211." On the other hand, the dictionary used for voice recognition in the second information processing device is called the "second dictionary D212."

第２辞書Ｄ２１２は、点検の分野用の辞書である。したがって、第１辞書Ｄ２１１より、第２辞書Ｄ２１２は、データの容量を小さくできる。そのため、第２辞書Ｄ２１２は、第２情報処理装置のように、第１情報処理装置と比較して、記憶装置の記憶できる容量が小さい記憶領域ＭＥＭの情報処理装置であっても、記憶できる辞書が用いられるのが望ましい。 The second dictionary D212 is a dictionary for the inspection field, so its data size can be smaller than that of the first dictionary D211. It is therefore preferable that the second dictionary D212 be small enough to be stored in a device whose memory area MEM has less capacity than that of the first information processing device. The second information processing device is one such device.

＜ノイズキャンセルの変形例＞ <Modification of noise cancellation>
音声認識システム１は、第１音声ＳＤ１に含まれるノイズをキャンセルする構成が望ましい。そして、第１音声ＳＤ１ノイズをキャンセルした音声（以下「第３音声」という。）に基づいて、音声認識が行われるのが望ましい。 The voice recognition system 1 is preferably configured to cancel noise contained in the first voice SD1. It preferably performs voice recognition based on the voice obtained by canceling that noise (hereinafter referred to as the "third voice").

ノイズのキャンセルは、例えば、ローパスフィルタ、及び、バンドパスフィルタ等のフィルタ又は複数のフィルタの組み合わせによって、対象する周波数帯域に含まれるノイズを減衰するようにして行われる。例えば、フィルタは、デジタルフィルタ、又は、フィルタリングを行う回路等によって実現する。 Noise cancellation is performed by attenuating noise in the target frequency band using, for example, a low-pass filter, a band-pass filter, or a combination of multiple filters. For example, the filter is realized by a digital filter or a filtering circuit.
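One way to realize such a digital filter is a second-order notch (band-stop) IIR filter that attenuates a target band. The sketch below uses the coefficient formulas from Robert Bristow-Johnson's "Audio EQ Cookbook". This is a swapped-in design chosen for illustration, since the patent names low-pass and band-pass filters without specifying one. The mapping from a band to a notch (centre = midpoint, `Q = centre / bandwidth`) and all function names are assumptions. One notch is cascaded per band, so a site with several noise bands can be handled.

```python
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float]]


def notch_coefficients(f0: float, q: float, fs: float) -> Coeffs:
    """Biquad notch coefficients (b0, b1, b2), (a1, a2), normalised so that a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x: Sequence[float], b: Tuple[float, float, float], a: Tuple[float, float]) -> List[float]:
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def cancel_bands(x: Sequence[float], bands: Sequence[Tuple[float, float]], fs: float) -> List[float]:
    """Cascade one notch per (low_hz, high_hz) band to produce the 'third voice'."""
    y = list(x)
    for low, high in bands:
        center = (low + high) / 2.0
        y = biquad(y, *notch_coefficients(center, center / (high - low), fs))
    return y
```

For example, `cancel_bands(samples, [(800.0, 1200.0)], 16000)` removes a steady 1 kHz hum. After a short start-up transient, a 200 Hz component passes almost unchanged. A steeper, flatter stop band would need a higher-order design, such as a cascaded Butterworth band-stop filter.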

また、キャンセルの対象となる周波数帯域は、現場又は位置ごとに個別に設定されるのが望ましい。以下、現場ごとに、キャンセルの対象とする周波数帯域を設定する場合の例を説明する。 Furthermore, the frequency bands to be canceled are preferably set individually for each site or location. An example of setting the frequency bands to be canceled for each site is described below.

図１８は、現場ごとにキャンセルの対象とする周波数帯域を設定する例を示す図である。以下、図１８（Ａ）を「第１現場」用とし、図１８（Ｂ）を「第２現場」用とする。なお、キャンセルの対象とする周波数帯域は、ＧＰＳ等を用いて位置情報に基づいて設定されてもよい。 Figure 18 shows an example of setting frequency bands to be cancelled for each site. Hereinafter, Figure 18(A) is for the "first site" and Figure 18(B) is for the "second site." Note that the frequency bands to be cancelled may be set based on location information using GPS or the like.

例えば、第１現場において、一定時間の音声（ユーザ１３が発する音声はない状態であるとする。）を入力し、音声を入力したデータに対して周波数解析（例えば、ＦＦＴ（高速フーリエ変換、ＦａｓｔＦｏｕｒｉｅｒＴｒａｎｓｆｏｒｍ）等である。）を行うと、図１８（Ａ）及び図１８（Ｂ）のような周波数解析結果が得られる。 For example, suppose sound is input at the first site for a fixed period while the user 13 is not speaking. A frequency analysis, such as an FFT (Fast Fourier Transform), is then performed on the input sound data. This yields frequency analysis results such as those shown in Figures 18(A) and 18(B).

図１８（Ａ）は、第１現場で、図１８（Ａ）に示すような周波数帯域（以下「第１周波数帯域ＦＲ１」という。）に、ノイズ（以下、第１現場で発生するノイズを「第１ノイズＮＺ１」という。）が発生する例である。 Figure 18 (A) shows an example in which noise (hereinafter, the noise generated at the first site is referred to as "first noise NZ1") occurs in the frequency band shown in Figure 18 (A) (hereinafter, referred to as "first frequency band FR1") at the first site.

図１８（Ｂ）は、第２現場で、図１８（Ｂ）に示すような周波数帯域（以下「第２周波数帯域ＦＲ２」という。）に、ノイズ（以下、第２現場で発生する２つのノイズを「第２ノイズＮＺ２」及び「第３ノイズＮＺ３」という。）が発生する例である。 Figure 18 (B) is an example where noise (hereinafter, the two noises occurring at the second site are referred to as "second noise NZ2" and "third noise NZ3") occurs at the second site in the frequency band shown in Figure 18 (B) (hereinafter, referred to as "second frequency band FR2").

ノイズは、現場又は位置ごとに異なる場合がある。例えば、現場ごとに、稼働している設備の種類が異なると、設備は、異なる音を発する場合があるため、ノイズの発生する周波数帯域も異なるようになる場合がある。この例では、第１現場は、第１ノイズＮＺ１が発生するのに対して、第２現場は、第２ノイズＮＺ２及び第３ノイズＮＺ３が発生する。この例では、第１現場は、第２現場よりも低い周波数帯域でノイズが発生する。 Noise may vary from site to site or location. For example, if different types of equipment are in operation at each site, the equipment may emit different sounds, and the frequency band in which noise occurs may also be different. In this example, the first site generates a first noise NZ1, while the second site generates a second noise NZ2 and a third noise NZ3. In this example, the first site generates noise in a lower frequency band than the second site.

そのため、第１現場では、第１ノイズＮＺ１を減衰させるため、第１周波数帯域ＦＲ１がキャンセルの対象となるのが望ましい。一方で、第２現場では、第２ノイズＮＺ２及び第３ノイズＮＺ３を減衰させるため、第２周波数帯域ＦＲ２がキャンセルの対象となるのが望ましい。 Therefore, at the first site, it is desirable to target the first frequency band FR1 for cancellation in order to attenuate the first noise NZ1. On the other hand, at the second site, it is desirable to target the second frequency band FR2 for cancellation in order to attenuate the second noise NZ2 and the third noise NZ3.

第１周波数帯域ＦＲ１及び第２周波数帯域ＦＲ２は、例えば、点検を行う前に現場又は位置ごとに音声を解析して設定される。なお、周波数帯域の設定は、数値で入力できてもよい。 The first frequency band FR1 and the second frequency band FR2 are set, for example, by analyzing the sound for each site or location before an inspection is performed. The frequency bands may be set by inputting numerical values.
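Setting a site's bands from an ambient recording could be automated roughly as follows. The sketch is an assumption-laden illustration, not the patent's procedure. It computes a magnitude spectrum and flags bins that exceed a multiple of the mean magnitude; the factor of 5 is a hypothetical choice. It then merges adjacent flagged bins into `(low_hz, high_hz)` bands and stores them per site. A naive DFT keeps the example dependency-free; `numpy.fft.rfft` computes the same bins far faster.

```python
import math
from typing import Dict, List, Sequence, Tuple


def magnitude_spectrum(x: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mags.append(math.hypot(re, im))
    return mags


def estimate_noise_bands(ambient: Sequence[float], fs: float,
                         factor: float = 5.0) -> List[Tuple[float, float]]:
    """Return (low_hz, high_hz) bands whose magnitude exceeds factor x the mean magnitude."""
    mags = magnitude_spectrum(ambient)
    df = fs / len(ambient)  # bin spacing in Hz
    threshold = factor * sum(mags) / len(mags)
    bands: List[Tuple[float, float]] = []
    start = None
    for k, m in enumerate(mags + [0.0]):  # trailing 0.0 closes a run at the top bin
        if m > threshold and start is None:
            start = k
        elif m <= threshold and start is not None:
            bands.append((max(0.0, (start - 0.5) * df), (k - 0.5) * df))
            start = None
    return bands


# Per-site table of bands to cancel, filled in before the inspection (hypothetical name).
SITE_NOISE_BANDS: Dict[str, List[Tuple[float, float]]] = {}

fs = 8000
ambient = [0.5 * math.sin(2 * math.pi * 1000 * i / fs) for i in range(256)]
SITE_NOISE_BANDS["first site"] = estimate_noise_bands(ambient, fs)
print(SITE_NOISE_BANDS["first site"])  # [(984.375, 1015.625)]
```

The resulting bands could then be edited numerically, as the text allows, before being handed to whatever filter performs the cancellation.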

＜中断及び解除の変形例＞ <Modifications of Interruption and Cancellation>
中断処理ＰＲ２は、以下のように、音声認識システム１が、音声認識によって所定の言葉を認識する場合に行われてもよい。 The interruption process PR2 may be performed when the voice recognition system 1 recognizes a predetermined word by voice recognition, as follows.

図１９は、中断及び解除の第１変形例を示す図である。以下、第２実施形態と同様の第２音声ＳＤ２を出力する場合を例に説明する。第２実施形態と比較すると、この変形例は、中断及び解除に所定の言葉を用いる点が異なる。 Figure 19 shows a first modified example of interruption and cancellation. The following example describes outputting the second voice SD2 in the same way as in the second embodiment. This modified example differs from the second embodiment in that predetermined words are used for interruption and cancellation.

この例では、音声認識システム１は、第１音声ＳＤ１で「ポーズ」という言葉（以下「中断音声Ｃ１１」という。）を入力すると、第２音声ＳＤ２の出力を中断する。 In this example, when the word "pause" (ポーズ; hereinafter the "interruption voice C11") is input as the first voice SD1, the voice recognition system 1 interrupts output of the second voice SD2.

そして、中断の後、音声認識システム１は、第１音声ＳＤ１で「解除」という言葉（以下「解除音声Ｃ２１」という。）を入力すると、中断を解除する。 After the interruption, when the word "cancel" (解除; hereinafter the "cancellation voice C21") is input as the first voice SD1, the voice recognition system 1 cancels the interruption.

なお、中断音声Ｃ１１及び解除音声Ｃ２１がどのような言葉かは、事前に設定される。 The words used as the interruption voice C11 and the cancellation voice C21 are set in advance.
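The keyword-driven interruption can be sketched as a two-state controller fed with each recognized utterance. This is a hypothetical structure, not the patent's code. The default words mirror the example above (ポーズ, 解除), but they are configurable, matching the statement that the words are set in advance. Any utterance that is not a control word for the current state is passed through as an ordinary inspection answer.

```python
from enum import Enum
from typing import Iterable


class Playback(Enum):
    PLAYING = "playing"
    PAUSED = "paused"


class KeywordPauseControl:
    """Pause/resume the guidance output (second voice) on preset control words."""

    def __init__(self, pause_words: Iterable[str] = ("ポーズ",),
                 resume_words: Iterable[str] = ("解除",)):
        self.pause_words = set(pause_words)    # interruption voice C11
        self.resume_words = set(resume_words)  # cancellation voice C21
        self.state = Playback.PLAYING

    def on_recognized(self, text: str) -> bool:
        """Return True if the recognized text was consumed as a control word."""
        word = text.strip()
        if self.state is Playback.PLAYING and word in self.pause_words:
            self.state = Playback.PAUSED
            return True
        if self.state is Playback.PAUSED and word in self.resume_words:
            self.state = Playback.PLAYING
            return True
        return False  # anything else is treated as an ordinary inspection answer


ctl = KeywordPauseControl()
print(ctl.on_recognized("ポーズ"), ctl.state)  # True Playback.PAUSED
print(ctl.on_recognized("解除"), ctl.state)    # True Playback.PLAYING
```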

このように、所定の言葉で中断及び解除が操作できると、ユーザ１３は、例えば、手に道具を持つような場合等でも操作を行うことができ、操作性を向上できる。 In this way, being able to pause and cancel using specific words allows the user 13 to perform the operation even when, for example, holding a tool in their hand, improving operability.

図２０は、中断及び解除の第２変形例を示す図である。この例は、図示するように、イヤホン１２が有するボタンを押す操作が中断操作となる例である。 Figure 20 shows a second modified example of interruption and cancellation. In this example, as shown in the figure, the operation of pressing a button on the earphone 12 is the interruption operation.

例えば、１回目のボタンを押す操作（以下「第１操作Ｃ１２」という。）によって、音声認識システム１は、第２音声ＳＤ２の出力を中断する。 For example, the first button press (hereinafter referred to as the "first operation C12") causes the voice recognition system 1 to interrupt the output of the second voice SD2.

次に、２回目のボタンを押す操作（以下「第２操作Ｃ２２」という。）によって、音声認識システム１は、中断を解除する。 Next, by pressing the button a second time (hereinafter referred to as the "second operation C22"), the voice recognition system 1 releases the interruption.

このようなイヤホン１２で操作ができると、携帯端末１１を取り出す手間等を省ける。 Being able to operate using such earphones 12 eliminates the need to take out the mobile terminal 11.

なお、図示するように、第２００出力ＥＸ２００が省略されてもよい。この例では、音声認識システム１は、中断が行われた第１時点より、前の時点である第２１出力ＥＸ２１が開始される時点（以下「第２時点」という。）から出力音声を出力する。なお、第２時点は、事前に設定される、又は、第１時点で出力されていた出力音声の最初の時点等である。このように、出力音声が言い直しされる構成でもよい。 As shown in the figure, the 200th output EX200 may be omitted. In this example, the voice recognition system 1 resumes the output voice from the point at which the 21st output EX21 starts (hereinafter the "second point in time"). This point precedes the first point in time, at which the interruption occurred. The second point in time is, for example, set in advance, or is the beginning of the output voice that was being output at the first point in time. In this way, the output voice may be repeated from an earlier point.
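The button toggle with restart at the "second point in time" could be modeled as below. This is a hypothetical sketch. It assumes the guidance is split into segments, and the labels `EX21` and `EX22` are borrowed only as placeholders. It also assumes that the second point is the start of the segment that was playing when the button was first pressed. The offset at the first point in time is recorded, but resumption deliberately ignores it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GuidancePlayer:
    """Hypothetical player for guidance output split into segments."""
    segments: List[str]
    index: int = 0            # segment currently being output
    paused: bool = False
    paused_at: float = 0.0    # offset in seconds within the segment (first point in time)

    def press_button(self, offset_s: float = 0.0) -> Optional[Tuple[str, float]]:
        """First press interrupts; second press resumes.

        On resume, playback restarts at the beginning of the interrupted segment
        (the second point in time) rather than at the offset where it stopped.
        """
        if not self.paused:
            self.paused = True
            self.paused_at = offset_s
            return None
        self.paused = False
        return self.segments[self.index], 0.0


player = GuidancePlayer(segments=["EX21", "EX22"], index=0)
print(player.press_button(offset_s=1.8))  # None: first operation C12 interrupts
print(player.press_button())              # ('EX21', 0.0): second operation C22 restates EX21
```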

また、中断及び解除は、例えば、所定のボタンを押している間、中断し、ボタンが押されるのが終わると解除するといったように、ボタン等で実現してもよい。 The interruption and cancellation may also be implemented with a button or the like. For example, output may be interrupted while a predetermined button is held down, and the interruption canceled when the button is released.

他にも、中断及び解除は、携帯端末１１等による所定の動作の開始及び終了に連動してもよい。例えば、携帯端末１１に電話がかかってきたのを感知すると、音声認識システム１は、第２音声ＳＤ２の出力を中断する。なお、所定の動作は、事前に設定される。また、所定の動作は、外部装置による動作であってもよい。 Additionally, the interruption and release may be linked to the start and end of a specified operation by the mobile terminal 11 or the like. For example, when the voice recognition system 1 detects an incoming call to the mobile terminal 11, it interrupts the output of the second voice SD2. The specified operation is set in advance. The specified operation may also be an operation by an external device.

＜音声入力及び音声出力の変形例＞ <Modifications of Audio Input and Audio Output>
図２１は、音声入力及び音声出力の変形例を示す図である。以下、図示するように音声で「１２．３」という数値（以下「対象数値２０」という。）を扱う場合を例に説明する。 Figure 21 is a diagram showing a modified example of voice input and voice output. The following example handles the numerical value "12.3" (hereinafter "target numerical value 20") by voice, as shown in the figure.

第１音声ＳＤ１、すなわち、入力では、対象数値２０は、１桁ずつ入力されるのが望ましい。具体的には、図示するように、第１音声ＳＤ１となる発音２１は、対象数値２０を分解して、「いち」、「に」、「てん」、及び、「さん」というように、１桁ずつ読み上げられるのが望ましい。 For the first voice SD1, i.e., the input, the target numerical value 20 is preferably input one digit at a time. Specifically, as shown in the figure, the pronunciation 21 that forms the first voice SD1 preferably breaks down the target numerical value 20. It reads the value out one digit at a time: "ichi" (1), "ni" (2), "ten" (decimal point), and "san" (3).

第２音声ＳＤ２、すなわち、出力では、対象数値２０は、数値全体を表現するように出力されるのが望ましい。具体的には、図示するように、第２音声ＳＤ２となるデータ音声２２は、対象数値２０の全体を表現して「じゅうにてんさん」というように、数値全体が表現されるように出力されるのが望ましい。 For the second voice SD2, i.e., the output, the target numerical value 20 is preferably output so that the entire value is expressed. Specifically, as shown in the figure, the data voice 22 that forms the second voice SD2 preferably expresses the whole target numerical value 20, as in "jūni ten san" (twelve point three).

発音２１のように、１桁ずつ読み上げられる音声であると、音声認識を精度良く行うことができる。一方で、データ音声２２が、数値全体を表現すると、ユーザ１３は、１桁ずつ出力されるより、数値を音声で理解しやすい。 When the voice reads out one digit at a time, as in pronunciation 21, voice recognition can be performed with high accuracy. On the other hand, when data voice 22 expresses the entire number, it is easier for user 13 to understand the number by voice rather than when it is output one digit at a time.
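The asymmetry between digit-by-digit input and whole-number output can be sketched in two functions. One turns recognized digit readings into a numeric string; the other produces a whole-number reading for playback. This is an illustrative assumption, not the patent's method. It covers integers 0 to 9999 with the common irregular readings of hundreds and thousands. It reads decimals digit by digit and does not apply contractions such as "itten" for 1.x.

```python
from typing import Iterable

# Digit readings accepted in the first voice SD1 (one token per digit).
DIGITS = ["ぜろ", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
READING_TO_CHAR = {reading: str(i) for i, reading in enumerate(DIGITS)}
READING_TO_CHAR.update({"れい": "0", "し": "4", "しち": "7", "く": "9", "てん": "."})

# Irregular readings for hundreds and thousands.
HUNDREDS = {1: "ひゃく", 3: "さんびゃく", 6: "ろっぴゃく", 8: "はっぴゃく"}
THOUSANDS = {1: "せん", 3: "さんぜん", 8: "はっせん"}


def parse_digit_readings(tokens: Iterable[str]) -> str:
    """["いち", "に", "てん", "さん"] -> "12.3" (raises KeyError on unknown tokens)."""
    return "".join(READING_TO_CHAR[t] for t in tokens)


def read_integer(n: int) -> str:
    """Whole-number reading for 0..9999, e.g. 12 -> "じゅうに"."""
    if not 0 <= n <= 9999:
        raise ValueError("sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    th, h, t, o = n // 1000, n // 100 % 10, n // 10 % 10, n % 10
    parts = []
    if th:
        parts.append(THOUSANDS.get(th, DIGITS[th] + "せん"))
    if h:
        parts.append(HUNDREDS.get(h, DIGITS[h] + "ひゃく"))
    if t:
        parts.append("じゅう" if t == 1 else DIGITS[t] + "じゅう")
    if o:
        parts.append(DIGITS[o])
    return "".join(parts)


def read_number(text: str) -> str:
    """Reading for the second voice SD2: integer part as a whole, decimals digit by digit."""
    int_part, _, frac = text.partition(".")
    reading = read_integer(int(int_part))
    if frac:
        reading += "てん" + "".join(DIGITS[int(d)] for d in frac)
    return reading


value = parse_digit_readings(["いち", "に", "てん", "さん"])
print(value)               # 12.3
print(read_number(value))  # じゅうにてんさん
```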

＜オフラインで音声認識を行う変形例＞ <Modification of Offline Speech Recognition>
オフラインの環境であっても、入力された音声を音声認識できる構成が望ましい。 It is desirable to have a configuration that can recognize input speech even in an offline environment.

建築・保守現場は、通信環境によってインターネットにつながりにくい環境である場合も多い。そのため、常時クラウドを用いるのが困難な場合も多い。ゆえに、常時、クラウドにある音声認識エンジンを用いる構成であると、作業現場で音声認識等が実行できない場合がある。このような事態を避けるため、携帯端末１１内で動作する音声認識エンジンを用いる構成が望ましい。特に、定型であって、短い言葉は、携帯端末１１内で動作する音声認識エンジンで音声認識される構成が望ましい。 In construction and maintenance sites, the communication environment often makes it difficult to connect to the Internet. As a result, it is often difficult to use the cloud all the time. Therefore, if a voice recognition engine in the cloud is always used, voice recognition and the like may not be able to be performed at the work site. To avoid such situations, a configuration that uses a voice recognition engine that operates within the mobile terminal 11 is desirable. In particular, a configuration in which standard, short words are voice-recognized by the voice recognition engine that operates within the mobile terminal 11 is desirable.

このような構成であると、オフラインでも音声認識システム１を用いることができる。ゆえに、インターネットにつながりにくい環境であっても、音声認識システム１を用いて音声を入力することができる。 With this configuration, the voice recognition system 1 can be used offline. Therefore, even in an environment where it is difficult to connect to the Internet, voice can be input using the voice recognition system 1.

さらに、ユーザ１３によるコメント等といった非定型な音声入力は、携帯端末１１が録音する構成であるのが望ましい。そして、事務所等といったインターネットにつながる環境下において、携帯端末１１は、録音済みの音声をクラウド上の音声認識エンジンに送信してテキスト化する構成が望ましい。 Furthermore, it is preferable that the mobile terminal 11 is configured to record non-standard voice input such as comments by the user 13. In an environment connected to the Internet, such as an office, the mobile terminal 11 is preferably configured to transmit the recorded voice to a voice recognition engine on the cloud and convert it into text.

このように、オフラインで使用できる音声認識エンジンと、オンラインで使用できる音声認識エンジンを使い分ける構成が望ましい。 In this way, it is desirable to have a configuration that allows separate use of a voice recognition engine that can be used offline and a voice recognition engine that can be used online.
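The division of labor between the two engines might be sketched as a router. It sends fixed, short answers to an on-device engine and records free-form comments for later cloud transcription. It also keeps every input as a recording, as a backup. Both engines are stand-in callables, and the `is_comment` flag assumes the current inspection step tells the app whether a comment is expected. These names are hypothetical, not a real engine API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Engine = Callable[[bytes], str]  # audio bytes -> text (hypothetical engine interface)


@dataclass
class InputRouter:
    local_engine: Engine                                  # on-device engine for short, fixed phrases
    backup: List[bytes] = field(default_factory=list)
    pending_comments: List[bytes] = field(default_factory=list)

    def handle(self, audio: bytes, is_comment: bool) -> Optional[str]:
        self.backup.append(audio)          # every input is also kept as a recording
        if is_comment:
            self.pending_comments.append(audio)
            return None                    # free-form: transcribe later in the cloud
        return self.local_engine(audio)

    def transcribe_pending(self, cloud_engine: Engine) -> List[str]:
        """Call when connected (e.g. at the office) to convert recorded comments to text."""
        texts = [cloud_engine(a) for a in self.pending_comments]
        self.pending_comments.clear()
        return texts
```

In use, `handle` runs on site for every utterance. `transcribe_pending` is called only once the terminal is back on a network.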

また、音声認識エンジンの使い分けは、通信環境を考慮して切り替えられてもよい。具体的には、携帯端末１１は、電波強度を計測して通信環境の良し悪しを判断する。なお、通信環境良し悪しは、現場ごとにあらかじめ登録又は以前の判断結果等が記憶されてもよい。このように、通信環境を考慮する構成であると、クラウド上の音声認識エンジンが使用できないといったトラブルを防ぐことができる。 The voice recognition engine may be switched depending on the communication environment. Specifically, the mobile terminal 11 measures radio wave strength to determine whether the communication environment is good or bad. The quality of the communication environment may be registered in advance for each site, or previous judgment results may be stored. In this way, a configuration that takes the communication environment into consideration can prevent problems such as the voice recognition engine on the cloud being unable to be used.
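Switching engines on measured signal strength, with a per-site fallback, could look like the sketch below. The −100 dBm threshold is a placeholder, not a value from the source. It is also an assumption that an unmeasurable signal falls back to the last stored judgment for the site, defaulting to on-device recognition.

```python
from typing import Dict, Optional

# Hypothetical threshold; a real value would be tuned per device and network.
RSSI_THRESHOLD_DBM = -100.0


def choose_engine(rssi_dbm: Optional[float], site: str, site_history: Dict[str, bool]) -> str:
    """Return "cloud" or "on_device" for the next recognition request.

    rssi_dbm is the measured signal strength, or None if it cannot be measured;
    site_history stores the last judgment (True = good) for each site.
    """
    if rssi_dbm is None:
        good = site_history.get(site, False)  # fall back to the stored judgment
    else:
        good = rssi_dbm >= RSSI_THRESHOLD_DBM
        site_history[site] = good             # remember for the next visit
    return "cloud" if good else "on_device"
```

Defaulting to the on-device engine when nothing is known errs toward availability. This matches the aim of avoiding situations where the cloud engine cannot be used.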

また、上記のようにオフライン等で録音を行うため、携帯端末１１は、入力した音声データを録音し、録音データを生成及び録音データを記憶できる記憶部及び録音データ生成部を有する構成であるのが望ましい。 In addition, to make recordings offline and in similar situations as described above, the mobile terminal 11 preferably has two units. A recording data generation unit records input voice data to generate recording data, and a storage unit stores the recording data.

なお、記憶部及び録音データ生成部は、オフラインに用いられるに限られない。すなわち、記憶部及び録音データ生成部は、クラウド上の音声認識エンジンが使用できる環境であっても、録音データを生成及び録音データを記憶してもよい。 The storage unit and the recording data generation unit are not limited to being used offline. In other words, the storage unit and the recording data generation unit may generate and store recording data even in an environment where a cloud-based voice recognition engine can be used.

通信は、突然切断される場合もあるため、クラウド上の音声認識エンジンが使用できる場合であっても、録音できる構成が望ましい。このような構成であると、バックアップを行うことができる。 Since communication may be suddenly cut off, it is desirable to have a configuration that allows recording even if a cloud-based voice recognition engine is available. This configuration allows for backup.

＜その他の実施形態＞ <Other embodiments>
実施形態は、上記の例に限られない。例えば、装置の数は、上記の例に示す台数に限られない。したがって、上記の例における各装置は、２台以上のシステムであってもよい。一方で、装置は、１台の構成でもよい。また、情報処理装置の種類及び組み合わせも、上記に示す装置でなくともよい。 The embodiment is not limited to the above examples. For example, the number of devices is not limited to the number shown in the above examples. Each device in the above examples may therefore be implemented as a system of two or more devices. Conversely, the devices may be configured as a single device. In addition, the types and combinations of information processing devices are not limited to those shown above.

実施形態は、上記の処理に限られない。例えば、本発明に係る音声認識方法は、上記に説明した以外の順序で行われてもよい。また、音声認識方法は、複数の情報処理装置で実行されてもよい。つまり、音声認識方法における各ステップは、冗長、分散、並列、仮想化又はこれらの組み合わせで実行されてもよい。 The embodiment is not limited to the above processing. For example, the voice recognition method according to the present invention may be performed in an order other than that described above. The voice recognition method may also be executed on multiple information processing devices. That is, each step of the voice recognition method may be executed redundantly, in a distributed manner, in parallel, in a virtualized environment, or in a combination of these.

実施形態は、プログラムによって実現されてもよい。すなわち、情報処理装置等のコンピュータは、プログラムに基づいて、演算装置及び記憶装置等を制御して、上記の方法を実行してもよい。また、プログラムは、コンピュータが読み取り可能な記録媒体に記録されて頒布することができる。なお、記録媒体は、磁気テープ、フラッシュメモリ、光ディスク、光磁気ディスク又は磁気ディスク等のメディアである。さらに、プログラムは、電気通信回線を通じて頒布することができる。 The embodiment may be realized by a program. That is, a computer such as an information processing device may control an arithmetic unit and a storage device based on the program to execute the above-mentioned method. The program may be recorded on a computer-readable recording medium and distributed. The recording medium may be a medium such as a magnetic tape, a flash memory, an optical disk, a magneto-optical disk, or a magnetic disk. Furthermore, the program may be distributed through a telecommunications line.

なお、上記に示す実施形態の構成等に、その他の要素との組み合わせ等、上記の構成に本発明が限定されるものではない。これらの点に関しては、本発明の趣旨を逸脱しない範囲で変更することが可能であり、その応用形態に応じて適切に定めることができる。 The present invention is not limited to the configurations of the embodiments described above, including their combinations with other elements. These aspects may be changed without departing from the spirit of the present invention and may be determined as appropriate for the application.

１音声認識システム
１Ｆ１００中断手段
１Ｆ１０１解除手段
１Ｆ１１音声入力手段
１Ｆ１２音声認識手段
１Ｆ１３登録手段
１Ｆ１４出力手段
１Ｆ１５制限手段
１Ｆ１６入力データ生成手段
１Ｆ１７記憶手段
１Ｆ１８判断手段
１Ｆ１９ノイズキャンセル手段
１Ｆ２０グループ設定手段
１Ｆ２１省略操作手段
１０サーバ
１１携帯端末
１２イヤホン
１３ユーザ
２０対象数値
２１発音
２２データ音声
Ｃ１中断操作
Ｃ２解除操作
Ｃ３省略操作
Ｃ１１中断音声
Ｃ１２第１操作
Ｃ２１解除音声
Ｃ２２第２操作
Ｄ２１１第１辞書
Ｄ２１２第２辞書
ＥＸ１１第１１出力
ＥＸ１２第１２出力
ＥＸ１３第１３出力
ＥＸ２１第２１出力
ＥＸ２２第２２出力
ＥＸ２３第２３出力
ＥＸ３１第３１出力
ＥＸ３２第３２出力
ＥＸ２００第２００出力
ＦＲ１第１周波数帯域
ＦＲ２第２周波数帯域
ＧＳグループ
ＭＥＭ記憶領域
ＮＷネットワーク
ＮＺ１第１ノイズ
ＮＺ２第２ノイズ
ＮＺ３第３ノイズ
ＰＲ１制限処理
ＰＲ２中断処理
ＰＲ３省略処理
ＳＤ１第１音声
ＳＤ２第２音声
Ｖ１０第１入力データ
Ｖ２１前回結果
Ｖ２２許容範囲
Ｖ２３正常値

1 Speech recognition system
1F100 Interruption means
1F101 Cancellation means
1F11 Speech input means
1F12 Speech recognition means
1F13 Registration means
1F14 Output means
1F15 Restriction means
1F16 Input data generation means
1F17 Storage means
1F18 Judgment means
1F19 Noise cancellation means
1F20 Group setting means
1F21 Omission operation means
10 Server
11 Mobile terminal
12 Earphone
13 User
20 Target numerical value
21 Pronunciation
22 Data voice
C1 Interruption operation
C2 Cancellation operation
C3 Omission operation
C11 Interruption voice
C12 First operation
C21 Cancellation voice
C22 Second operation
D211 First dictionary
D212 Second dictionary
EX11 Eleventh output
EX12 Twelfth output
EX13 Thirteenth output
EX21 Twenty-first output
EX22 Twenty-second output
EX23 Twenty-third output
EX31 Thirty-first output
EX32 Thirty-second output
EX200 200th output
FR1 First frequency band
FR2 Second frequency band
GS Group
MEM Memory area
NW Network
NZ1 First noise
NZ2 Second noise
NZ3 Third noise
PR1 Restriction process
PR2 Interruption process
PR3 Omission process
SD1 First voice
SD2 Second voice
V10 First input data
V21 Previous result
V22 Allowable range
V23 Normal value

Claims

1. A voice recognition system for use in equipment inspection, comprising:
a voice input means for inputting a first voice;
a voice recognition means for performing voice recognition based on the first voice;
a registration means for registering a plurality of inspection items, an inspection order, and a second voice in association with each other;
an output means for outputting the second voice; and
a limiting means for limiting output of the second voice when the first voice is input while the second voice is being output,
wherein the output means outputs the second voice indicating the inspection items in accordance with the inspection order, and, when the first voice indicating an inspection result of an inspection item is input, outputs the second voice indicating the inspection item and the voice recognition result of the first voice, and then outputs the second voice indicating the next inspection item, and
the limiting means stops output of the second voice and starts output of the voice to be output after the second voice.

2. The voice recognition system according to claim 1, wherein a voice recognition result of the first voice indicating the inspection result is displayed on a screen of a mobile terminal, and, when a re-input operation is performed on the mobile terminal, re-input of the inspection result displayed on the screen is accepted.

3. A voice recognition system comprising:
a voice input means for inputting a first voice;
a voice recognition means for performing voice recognition based on the first voice;
a registration means for registering a second voice;
an output means for outputting the second voice;
a limiting means for limiting output of the second voice when the first voice is input while the second voice is being output;
a group setting means for setting a group of inspection items; and
an omission operation means for inputting an omission operation for omitting the group,
wherein the registration means registers a plurality of the inspection items in association with the second voice, and
when the omission operation is input, output of the second voice based on the inspection items belonging to the group corresponding to the omission operation is omitted.

4. The voice recognition system according to claim 3, which is used for inspecting facilities.

5. The voice recognition system according to claim 1, further comprising:
an input data generating means for generating first input data indicating the content of the first voice based on a voice recognition result of the first voice;
a storage means for storing second input data used to check the first input data; and
a determination means for comparing the first input data with the second input data to determine whether or not the first input data is abnormal.

6. The voice recognition system according to claim 5, wherein the second input data is input before the first input data, and the determination means determines that there is an abnormality if the first input data and the second input data are different, or if the first input data is outside an allowable range for the second input data.

7. The voice recognition system according to claim 5, wherein the second input data is a value indicating a normal value or a normal range, and the determination means determines that there is an abnormality if the first input data and the second input data are different, or if the first input data is outside the normal range.

8. The voice recognition system according to claim 1, including a first information processing device and a second information processing device, wherein
the first information processing device performs voice recognition using a first dictionary,
the second information processing device includes the voice recognition means, which recognizes the first voice using a second dictionary, and
the second dictionary is a dictionary for the field of inspection.

9. The voice recognition system according to claim 1, further comprising a noise canceling means that cancels noise included in the first voice to generate a third voice, wherein
the voice recognition means performs voice recognition using the third voice, and
the noise canceling means sets a frequency band to be canceled for each site or position.

10. A voice recognition device used for inspecting equipment, comprising:
a voice input means for inputting a first voice;
a voice recognition means for performing voice recognition based on the first voice;
a registration means for registering a plurality of inspection items, an inspection order, and a second voice in association with each other;
an output means for outputting the second voice; and
a limiting means for limiting output of the second voice when the first voice is input while the second voice is being output,
wherein the output means outputs the second voice indicating the inspection items in accordance with the inspection order, and, when the first voice indicating an inspection result of an inspection item is input, outputs the second voice indicating the inspection item and the voice recognition result of the first voice, and then outputs the second voice indicating the next inspection item, and
the limiting means stops output of the second voice and starts output of the voice to be output after the second voice.

11. A voice recognition device comprising:
a voice input means for inputting a first voice;
a voice recognition means for performing voice recognition based on the first voice;
a registration means for registering a second voice;
an output means for outputting the second voice;
a limiting means for limiting output of the second voice when the first voice is input while the second voice is being output;
a group setting means for setting a group of inspection items; and
an omission operation means for inputting an omission operation for omitting the group,
wherein the registration means registers a plurality of the inspection items in association with the second voice, and
when the omission operation is input, output of the second voice based on the inspection items belonging to the group corresponding to the omission operation is omitted.

12. A voice recognition method performed by a voice recognition system used for inspecting equipment, comprising:
a voice input step in which the voice recognition system inputs a first voice;
a voice recognition step in which the voice recognition system performs voice recognition based on the first voice;
a registration step in which the voice recognition system registers a plurality of inspection items, an inspection order, and a second voice in association with each other;
an output step in which the voice recognition system outputs the second voice; and
a limiting step of limiting output of the second voice when the first voice is input while the second voice is being output,
wherein the output step includes outputting the second voice indicating the inspection items according to the inspection order, and, when the first voice indicating an inspection result of an inspection item is input, outputting the second voice indicating the inspection item and the voice recognition result of the first voice, and then outputting the second voice indicating the next inspection item, and
the limiting step includes stopping output of the second voice and starting output of the voice to be output after the second voice.

13. A voice recognition method performed by a voice recognition system, comprising:
a voice input step in which the voice recognition system inputs a first voice;
a voice recognition step in which the voice recognition system performs voice recognition based on the first voice;
a registration step in which the voice recognition system registers a second voice;
an output step in which the voice recognition system outputs the second voice;
a limiting step of limiting output of the second voice when the first voice is input while the second voice is being output;
a group setting step of setting a group that includes inspection items; and
an omission operation step of inputting an omission operation for omitting the group,
wherein the registration step includes registering a plurality of the inspection items in association with the second voice, and
when the omission operation is input, output of the second voice based on the inspection items belonging to the group corresponding to the omission operation is omitted.

14. A program for causing a computer to execute the voice recognition method according to claim 12 or 13.