JP7778201B2

JP7778201B2 - system

Info

Publication number: JP7778201B2
Application number: JP2024161872A
Authority: JP
Inventors: 敬子千野
Original assignee: SoftBank Group Corp
Current assignee: SoftBank Group Corp
Priority date: 2023-09-19
Filing date: 2024-09-19
Publication date: 2025-12-01
Anticipated expiration: 2044-09-19
Also published as: JP2025044289A

Description

本開示の技術は、システムに関する。 The technology disclosed herein relates to a system.

特許文献１には、少なくとも一つのプロセッサにより遂行される、ペルソナチャットボット制御方法であって、ユーザ発話を受信するステップと、前記ユーザ発話を、チャットボットのキャラクターに関する説明と関連した指示文を含むプロンプトに追加するステップと前記プロンプトをエンコードするステップと、前記エンコードしたプロンプトを言語モデルに入力して、前記ユーザ発話に応答するチャットボット発話を生成するステップ、を含む、方法が開示されている。 Patent document 1 discloses a persona chatbot control method executed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to a description of the chatbot's character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

特開２０２２－１８０２８２号公報Japanese Patent Application Laid-Open No. 2022-180282

特殊詐欺は社会問題となっており、被害者が詐欺であることに気づくのが難しい場合が多い。特に、電話による詐欺では、被害者が詐欺であることに気づくまでに時間がかかり、その間に大きな損害を受けることがある。 Special frauds have become a social problem, and it is often difficult for victims to realize they are being scammed. In particular, with telephone fraud, it can take time for victims to realize they are being scammed, and in the meantime they may suffer significant losses.

本発明は、詐欺電話の入電を検知し、スマートフォンアプリケーションまたは家庭用電話に設置したマイクから会話を聞き取る。その会話データを文字化し、生成系ＡＩへ送信する。生成系ＡＩは会話データから詐欺が疑われると判断し、詐欺が疑われる場合には、前記マイクから音声アラートを発する。これにより、ユーザーは詐欺の可能性に早期に気づくことができ、被害を防ぐことが可能となる。 This invention detects incoming fraudulent calls and listens to the conversation through a smartphone application or a microphone installed on a home phone. The conversation data is converted into text and sent to a generative AI. The generative AI determines from the conversation data that fraud is suspected, and if fraud is suspected, an audio alert is issued from the microphone. This allows users to become aware of the possibility of fraud early on and prevent damage.

第１実施形態に係るデータ処理システムの構成の一例を示す概念図である。1 is a conceptual diagram showing an example of the configuration of a data processing system according to a first embodiment. 第１実施形態に係るデータ処理装置及びスマートデバイスの要部機能の一例を示す概念図である。1 is a conceptual diagram showing an example of main functions of a data processing device and a smart device according to a first embodiment. 第２実施形態に係るデータ処理システムの構成の一例を示す概念図である。FIG. 10 is a conceptual diagram illustrating an example of the configuration of a data processing system according to a second embodiment. 第２実施形態に係るデータ処理装置及びスマート眼鏡の要部機能の一例を示す概念図である。FIG. 10 is a conceptual diagram showing an example of main functions of a data processing device and smart glasses according to a second embodiment. 第３実施形態に係るデータ処理システムの構成の一例を示す概念図である。FIG. 10 is a conceptual diagram illustrating an example of the configuration of a data processing system according to a third embodiment. 第３実施形態に係るデータ処理装置及びヘッドセット型端末の要部機能の一例を示す概念図である。FIG. 11 is a conceptual diagram showing an example of main functions of a data processing device and a headset-type terminal according to a third embodiment. 第４実施形態に係るデータ処理システムの構成の一例を示す概念図である。FIG. 10 is a conceptual diagram illustrating an example of the configuration of a data processing system according to a fourth embodiment. 第４実施形態に係るデータ処理装置及びロボットの要部機能の一例を示す概念図である。FIG. 10 is a conceptual diagram showing an example of main functions of a data processing device and a robot according to a fourth embodiment. 複数の感情がマッピングされる感情マップを示す。1 shows an emotion map onto which multiple emotions are mapped. 複数の感情がマッピングされる感情マップを示す。1 shows an emotion map onto which multiple emotions are mapped. 形態例１の実施例１におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing a flow of processing in the data processing system according to the first embodiment of the first form example. 形態例１の応用例１におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing a processing flow of the data processing system in Application Example 1 of Form Example 1. 形態例２の実施例２におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing a processing flow of a data processing system in a second embodiment of the second form example. 形態例２の応用例２におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in Application Example 2 of Form Example 2. 形態例３の実施例３におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing a processing flow of a data processing system in a third embodiment of the third form example. 形態例３の応用例３におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 13 is a sequence diagram showing the flow of processing in the data processing system in Application Example 3 of Form Example 3. 感情エンジンを組み合わせた場合の形態例１の実施例１におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in the first embodiment of the first form example when an emotion engine is combined. 感情エンジンを組み合わせた場合の形態例１の応用例１におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in Application Example 1 of Form Example 1 when an emotion engine is combined. 感情エンジンを組み合わせた場合の形態例２の実施例２におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in the second embodiment of the second form example when an emotion engine is combined. 感情エンジンを組み合わせた場合の形態例２の応用例２におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in Application Example 2 of Form Example 2 when an emotion engine is combined. 感情エンジンを組み合わせた場合の形態例３の実施例３におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in the third embodiment of the third form example when an emotion engine is combined. 感情エンジンを組み合わせた場合の形態例３の応用例３におけるデータ処理システムの処理の流れを示すシーケンス図である。FIG. 10 is a sequence diagram showing the flow of processing in the data processing system in Application Example 3 of Form Example 3 when an emotion engine is combined.

以下、添付図面に従って本開示の技術に係るシステムの実施形態の一例について説明する。 Below, an example of an embodiment of a system relating to the technology disclosed herein will be described with reference to the accompanying drawings.

先ず、以下の説明で使用される文言について説明する。 First, let me explain the terminology used in the following explanation.

以下の実施形態において、符号付きのプロセッサ（以下、単に「プロセッサ」と称する）は、１つの演算装置であってもよいし、複数の演算装置の組み合わせであってもよい。また、プロセッサは、１種類の演算装置であってもよいし、複数種類の演算装置の組み合わせであってもよい。演算装置の一例としては、ＣＰＵ（Central Processing Unit）、ＧＰＵ（Graphics Processing Unit）、ＧＰＧＰＵ（General-Purpose computing on Graphics Processing Units）、ＡＰＵ（Accelerated Processing Unit）、又はＴＰＵ（ＴＥＮＳＯＲＰＲＯＣＥＳＳＩＮＧＵＮＩＴ（登録商標））等が挙げられる。 In the following embodiments, a coded processor (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Furthermore, a processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose Computing on Graphics Processing Units), an APU (Accelerated Processing Unit), or a TPU (TENSOR PROCESSING UNIT (registered trademark)).

以下の実施形態において、符号付きのＲＡＭ（Random Access Memory）は、一時的に情報が格納されるメモリであり、プロセッサによってワークメモリとして用いられる。 In the following embodiments, coded random access memory (RAM) is memory in which information is temporarily stored and is used as work memory by the processor.

以下の実施形態において、符号付きのストレージは、各種プログラム及び各種パラメータ等を記憶する１つ又は複数の不揮発性の記憶装置である。不揮発性の記憶装置の一例としては、フラッシュメモリ（ＳＳＤ（Solid State Drive））、磁気ディスク（例えば、ハードディスク）、又は磁気テープ等が挙げられる。 In the following embodiments, the coded storage refers to one or more non-volatile storage devices that store various programs, parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

以下の実施形態において、符号付きの通信Ｉ／Ｆ（Interface）は、通信プロセッサ及びアンテナ等を含むインタフェースである。通信Ｉ／Ｆは、複数のコンピュータ間での通信を司る。通信Ｉ／Ｆに対して適用される通信規格の一例としては、５Ｇ（5th Generation Mobile Communication System）、Ｗｉ－Ｆｉ（登録商標）、又はＢｌｕｅｔｏｏｔｈ（登録商標）等を含む無線通信規格が挙げられる。 In the following embodiments, a communication I/F (Interface) with a symbol is an interface that includes a communication processor, an antenna, etc. The communication I/F controls communication between multiple computers. Examples of communication standards that can be applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

以下の実施形態において、「Ａ及び／又はＢ」は、「Ａ及びＢのうちの少なくとも１つ」と同義である。つまり、「Ａ及び／又はＢ」は、Ａだけであってもよいし、Ｂだけであってもよいし、Ａ及びＢの組み合わせであってもよい、という意味である。また、本明細書において、３つ以上の事柄を「及び／又は」で結び付けて表現する場合も、「Ａ及び／又はＢ」と同様の考え方が適用される。 In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." In other words, "A and/or B" means that it may be just A, just B, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" also applies when three or more things are expressed connected by "and/or."

［第１実施形態］ [First embodiment]

図１には、第１実施形態に係るデータ処理システム１０の構成の一例が示されている。 Figure 1 shows an example of the configuration of a data processing system 10 according to the first embodiment.

図１に示すように、データ処理システム１０は、データ処理装置１２及びスマートデバイス１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

データ処理装置１２は、コンピュータ２２、データベース２４、及び通信Ｉ／Ｆ２６を備えている。コンピュータ２２は、本開示の技術に係る「コンピュータ」の一例である。コンピュータ２２は、プロセッサ２８、ＲＡＭ３０、及びストレージ３２を備えている。プロセッサ２８、ＲＡＭ３０、及びストレージ３２は、バス３４に接続されている。また、データベース２４及び通信Ｉ／Ｆ２６も、バス３４に接続されている。通信Ｉ／Ｆ２６は、ネットワーク５４に接続されている。ネットワーク５４の一例としては、ＷＡＮ（Wide Area Network）及び／又はＬＡＮ（Local Area Network）等が挙げられる。 The data processing device 12 includes a computer 22, a database 24, and a communication I/F 26. The computer 22 is an example of a "computer" according to the technology of the present disclosure. The computer 22 includes a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and communication I/F 26 are also connected to the bus 34. The communication I/F 26 is connected to a network 54. Examples of the network 54 include a WAN (Wide Area Network) and/or a LAN (Local Area Network).

スマートデバイス１４は、コンピュータ３６、受付装置３８、出力装置４０、カメラ４２、及び通信Ｉ／Ｆ４４を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、受付装置３８、出力装置４０、及びカメラ４２も、バス５２に接続されている。 The smart device 14 includes a computer 36, a reception device 38, an output device 40, a camera 42, and a communication I/F 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

受付装置３８は、タッチパネル３８Ａ及びマイクロフォン３８Ｂ等を備えており、ユーザ入力を受け付ける。タッチパネル３８Ａは、指示体（例えば、ペン又は指等）の接触を検出することにより、指示体の接触によるユーザ入力を受け付ける。マイクロフォン３８Ｂは、ユーザの音声を検出することにより、音声によるユーザ入力を受け付ける。制御部４６Ａは、タッチパネル３８Ａ及びマイクロフォン３８Ｂによって受け付けたユーザ入力を示すデータをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が、ユーザ入力を示すデータを取得する。 The reception device 38 is equipped with a touch panel 38A, a microphone 38B, etc., and receives user input. The touch panel 38A detects contact with an indicator (e.g., a pen or finger) to receive user input via the indicator. The microphone 38B detects the user's voice to receive user input via voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

出力装置４０は、ディスプレイ４０Ａ及びスピーカ４０Ｂ等を備えており、データをユーザ２０が知覚可能な表現形（例えば、音声及び／又はテキスト）で出力することでデータをユーザ２０に対して提示する。ディスプレイ４０Ａは、プロセッサ４６からの指示に従ってテキスト及び画像等の可視情報を表示する。スピーカ４０Ｂは、プロセッサ４６からの指示に従って音声を出力する。カメラ４２は、レンズ、絞り、及びシャッタ等の光学系と、ＣＭＯＳ（Complementary Metal-Oxide-Semiconductor）イメージセンサ又はＣＣＤ（Charge Coupled Device）イメージセンサ等の撮像素子とが搭載された小型デジタルカメラである。 The output device 40 is equipped with a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible by the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images in accordance with instructions from the processor 46. The speaker 40B outputs audio in accordance with instructions from the processor 46. The camera 42 is a compact digital camera equipped with an optical system including a lens, aperture, and shutter, and an imaging element such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

通信Ｉ／Ｆ４４は、ネットワーク５４に接続されている。通信Ｉ／Ｆ４４及び２６は、ネットワーク５４を介してプロセッサ４６とプロセッサ２８との間の各種情報の授受を司る。 The communication I/F 44 is connected to the network 54. The communication I/Fs 44 and 26 are responsible for the exchange of various information between the processor 46 and the processor 28 via the network 54.

図２には、データ処理装置１２及びスマートデバイス１４の要部機能の一例が示されている。 Figure 2 shows an example of the main functions of the data processing device 12 and smart device 14.

図２に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。特定処理プログラム５６は、本開示の技術に係る「プログラム」の一例である。プロセッサ２８は、ストレージ３２から特定処理プログラム５６を読み出し、読み出した特定処理プログラム５６をＲＡＭ３０上で実行する。特定処理は、プロセッサ２８がＲＡＭ３０上で実行する特定処理プログラム５６に従って特定処理部２９０として動作することによって実現される。 As shown in FIG. 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" according to the technology of the present disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

ストレージ３２には、データ生成モデル５８及び感情特定モデル５９が格納されている。データ生成モデル５８及び感情特定モデル５９は、特定処理部２９０によって用いられる。 Storage 32 stores a data generation model 58 and an emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

スマートデバイス１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。受付出力プログラム６０は、データ処理システム１０によって特定処理プログラム５６と併用される。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores a reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

次に、データ処理装置１２の特定処理部２９０による特定処理について説明する。 Next, we will explain the identification process performed by the identification processing unit 290 of the data processing device 12.

「形態例１」 "Example 1"

本発明の一実施形態として、スマートフォンアプリケーションを用いる形態がある。この場合、スマートフォンにインストールされたアプリケーションが、電話の入電を検知し、会話をマイクから聞き取る。聞き取った会話データは文字化され、生成系ＡＩへ送信される。生成系ＡＩは、会話データから詐欺が疑われると判断し、詐欺が疑われる場合には、スマートフォンのスピーカーから音声アラートを発する。 One embodiment of the present invention uses a smartphone application. In this case, the application installed on the smartphone detects an incoming phone call and listens to the conversation through the microphone. The listened-to conversation data is converted into text and sent to the generative AI. The generative AI determines from the conversation data that fraud is suspected, and if fraud is suspected, issues an audio alert from the smartphone's speaker.

「形態例２」 "Example 2"

本発明の別の実施形態として、家庭用電話にマイクを設置する形態がある。この場合、家庭用電話のマイクが電話の会話を聞き取り、聞き取った会話データは文字化され、生成系ＡＩへ送信される。生成系ＡＩは、会話データから詐欺が疑われると判断し、詐欺が疑われる場合には、家庭用電話のスピーカーから音声アラートを発する。 Another embodiment of the present invention involves installing a microphone on a home phone. In this case, the home phone's microphone listens to telephone conversations, and the listened conversation data is transcribed and sent to the generative AI. The generative AI determines from the conversation data that fraud is suspected, and if fraud is suspected, issues an audio alert from the home phone's speaker.

「形態例３」 "Example 3"

本発明のさらなる実施形態として、生成系ＡＩが特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出する形態がある。具体的には、生成系ＡＩは「振り込め詐欺」の典型的なフレーズや「オレオレ詐欺」の典型的なパターンなどを学習し、これらを検出することで詐欺の可能性を判断する。 A further embodiment of the present invention is one in which the generative AI detects specific phrases or patterns that indicate the possibility of specialized fraud. Specifically, the generative AI learns typical phrases for "bank transfer fraud" and typical patterns for "it's me" fraud, and by detecting these, determines the possibility of fraud.

以下に、各形態例の処理の流れについて説明する。 The processing flow for each example is explained below.

「形態例１」 "Example 1"

ステップ１：スマートフォンにインストールされたアプリケーションが電話の入電を検知する。 Step 1: The application installed on the smartphone detects an incoming call.

ステップ２：同アプリケーションがスマートフォンのマイクから会話を聞き取る。 Step 2: The application listens to your conversation through your smartphone's microphone.

ステップ３：聞き取った会話データは文字化され、生成系ＡＩへ送信される。 Step 3: The conversation data that was heard is transcribed and sent to the generative AI.

ステップ４：生成系ＡＩは、会話データから詐欺が疑われると判断する。 Step 4: The generative AI determines that fraud is suspected based on the conversation data.

ステップ５：詐欺が疑われる場合には、スマートフォンのスピーカーから音声アラートを発する。 Step 5: If fraud is suspected, an audio alert will be issued through your smartphone's speaker.

「形態例２」 "Example 2"

ステップ１：家庭用電話に設置されたマイクが電話の会話を聞き取る。 Step 1: A microphone installed on a home phone listens to the phone conversation.

ステップ２：聞き取った会話データは文字化され、生成系ＡＩへ送信される。 Step 2: The conversation data that is heard is transcribed and sent to the generative AI.

ステップ３：生成系ＡＩは、会話データから詐欺が疑われると判断する。 Step 3: The generative AI determines that fraud is suspected based on the conversation data.

ステップ４：詐欺が疑われる場合には、家庭用電話のスピーカーから音声アラートを発する。 Step 4: If fraud is suspected, an audio alert will be issued over the home phone speaker.

「形態例３」 "Example 3"

ステップ１：生成系ＡＩは特殊詐欺の可能性を示す特定のフレーズまたはパターンを学習する。 Step 1: The generative AI learns specific phrases or patterns that indicate potential fraud.

ステップ２：生成系ＡＩは、会話データから詐欺が疑われると判断するために、学習した特定のフレーズまたはパターンを検出する。 Step 2: The generative AI detects specific phrases or patterns it has learned to identify suspected fraud in the conversation data.

ステップ３：詐欺が疑われる場合には、音声アラートを発する。 Step 3: If fraud is suspected, an audio alert will be issued.

（実施例１） (Example 1)

次に、形態例１の実施例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Example 1 of Form Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

近年、電話を利用した詐欺が増加しており、特に高齢者を狙った特殊詐欺が社会問題となっている。従来の対策では、詐欺電話を受けた際にユーザーが自ら詐欺を見抜く必要があり、詐欺の手口が巧妙化する中でその判断が難しくなっている。したがって、ユーザーが詐欺電話を受けた際に自動的に詐欺の可能性を検知し、警告を発するシステムが求められている In recent years, telephone fraud has been on the rise, with special frauds targeting the elderly becoming a particular social problem. Conventional countermeasures require users to identify fraudulent calls themselves, but as fraud methods become more sophisticated, this becomes increasingly difficult. Therefore, there is a need for a system that automatically detects potential fraud and issues a warning when users receive a fraudulent call.

実施例１におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置したマイクから会話を聞き取る手段と、会話データを文字化し生成AIモデルへ送信する手段と、生成AIモデルが会話データから詐欺が疑われると判断した場合に、前記通信端末のスピーカーから音声アラートを発する手段と、を含む。これにより、ユーザーが詐欺電話を受けた際に自動的に詐欺の可能性を検知し、即座に警告を発することが可能となる。 The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means. In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to the conversation from a microphone installed in the communication terminal, means for converting the conversation data into text and sending it to the generative AI model, and means for issuing an audio alert from the speaker of the communication terminal if the generative AI model determines that fraud is suspected based on the conversation data. This makes it possible to automatically detect the possibility of fraud when a user receives a fraudulent call and immediately issue a warning.

「詐欺電話の入電を検知する手段」とは、通信端末において電話の着信を検知するための機能や装置である。 "Means for detecting incoming fraudulent calls" refers to functions or devices in a communications terminal for detecting incoming calls.

「通信端末に設置したマイクから会話を聞き取る手段」とは、通信端末に内蔵または接続されたマイクを用いて、通話中の音声をリアルタイムで収集する機能や装置である。 "Means for listening to conversations using a microphone installed in a communications terminal" refers to a function or device that uses a microphone built into or connected to a communications terminal to collect audio during a call in real time.

「会話データを文字化し生成AIモデルへ送信する手段」とは、収集した音声データをテキストデータに変換し、そのテキストデータを生成AIモデルに送信するための機能や装置である。 "Means for converting conversation data into text and sending it to the generative AI model" refers to functions or devices for converting collected voice data into text data and sending that text data to the generative AI model.

「生成AIモデル」とは、入力されたデータを解析し、特定のパターンやフレーズを検出するための人工知能モデルである。 A "generative AI model" is an artificial intelligence model that analyzes input data and detects specific patterns and phrases.

「詐欺が疑われると判断した場合に、前記通信端末のスピーカーから音声アラートを発する手段」とは、生成AIモデルが詐欺の可能性を検出した際に、通信端末のスピーカーを通じて警告音やメッセージを再生するための機能や装置である。 "Means for issuing an audio alert from the speaker of the communication terminal when it is determined that fraud is suspected" refers to a function or device for playing an alert sound or message through the speaker of the communication terminal when the generative AI model detects the possibility of fraud.

「特殊詐欺の可能性を示す特定のフレーズまたはパターン」とは、詐欺行為に関連する特定の言葉や文の構造であり、生成AIモデルがこれを検出することで詐欺の可能性を判断する基準となるものである。 "Specific phrases or patterns that indicate the possibility of specialized fraud" are specific words or sentence structures associated with fraudulent acts, and the generative AI model's detection of these serves as the basis for determining the possibility of fraud.

「音声アラート」とは、ユーザーに対して音声で警告を発するためのメッセージや音である。 "Audio alert" is a message or sound that warns the user audibly.

この発明は、通信端末にインストールされたアプリケーションを用いて、詐欺電話を自動的に検知し、ユーザーに警告を発するシステムである。以下に、このシステムの具体的な実施形態を説明する。 This invention is a system that automatically detects fraudulent calls and issues a warning to the user using an application installed on a communication terminal. A specific embodiment of this system is described below.

まず、ユーザは通信端末に専用のアプリケーションをインストールする。このアプリケーションは、電話の入電を検知する機能を持つ。具体的には、通信端末の電話APIを利用して、入電イベントをキャッチする。 First, the user installs a dedicated application on their communication device. This application has the function of detecting incoming phone calls. Specifically, it uses the communication device's phone API to catch incoming call events.

ユーザが電話に出ると、端末はマイクを通じて会話をリアルタイムで聞き取る。この際、アプリケーションはバックグラウンドで動作し、会話内容をキャプチャする。聞き取った会話データは、Google（登録商標） Speech-to-Text APIを使用して文字化される。音声データをAPIに送信し、テキストデータとして受け取る。 When a user answers a call, the device listens to the conversation in real time through the microphone. During this time, the application runs in the background and captures the conversation. The listened conversation data is transcribed using the Google (registered trademark) Speech-to-Text API. The audio data is sent to the API and received as text data.

文字化された会話データは、インターネットを通じてサーバに送信される。送信にはHTTPSプロトコルを使用し、データのセキュリティを確保する。サーバは、受信した会話データを生成AIモデル（例えば、OpenAI（登録商標）のGPT-4（登録商標））に入力する。生成AIモデルは、プロンプト文を用いて詐欺の可能性を解析する。具体的なプロンプト文の例は以下の通りである： The transcribed conversation data is sent to a server via the internet. The HTTPS protocol is used for transmission to ensure data security. The server inputs the received conversation data into a generative AI model (e.g., OpenAI's GPT-4 (registered trademark)). The generative AI model uses prompt text to analyze the possibility of fraud. Specific examples of prompt text are as follows:

以下の会話データを解析し、詐欺の可能性があるかどうかを判断してください。 Analyze the conversation data below to determine if it may be a scam.

会話データ: "あなたの銀行口座が不正利用されています。確認のために口座番号を教えてください" Conversation data: "Your bank account has been compromised. Please provide your account number for verification."

生成AIモデルは、会話データを解析し、詐欺が疑われるかどうかを判断する。詐欺が疑われると判断された場合、サーバはその情報を端末に返送する。返送には再びHTTPSプロトコルを使用する。 The generative AI model analyzes the conversation data and determines whether fraud is suspected. If fraud is suspected, the server sends that information back to the device, again using the HTTPS protocol.

端末は、サーバから受信した判定結果を解析し、詐欺が疑われる場合には通信端末のスピーカーから音声アラートを発する。具体的には、「詐欺の可能性があります。注意してください」といった音声メッセージを再生する。このアラートにより、ユーザは詐欺の可能性に気づき、適切な対応を取ることができる。 The device analyzes the results received from the server, and if fraud is suspected, an audio alert is emitted from the communication device's speaker. Specifically, an audio message such as "There is a possibility of fraud. Please be careful" is played. This alert makes the user aware of the possibility of fraud and allows them to take appropriate action.

具体例として、ユーザが「あなたの銀行口座が不正利用されています。確認のために口座番号を教えてください」という電話を受けた場合を考える。この会話がマイクで聞き取られ、文字化される。文字化されたデータは「あなたの銀行口座が不正利用されています。確認のために口座番号を教えてください」となる。このデータが生成AIモデルに送信され、モデルは詐欺の可能性が高いと判断する。サーバはその結果を端末に返送し、端末は音声アラートを発する。ユーザはこのアラートを聞いて、詐欺の可能性があることに気づくことができる。 As a concrete example, consider the case where a user receives a phone call saying, "Your bank account has been fraudulently used. Please provide your account number to confirm." This conversation is picked up by a microphone and transcribed. The transcribed data becomes, "Your bank account has been fraudulently used. Please provide your account number to confirm." This data is sent to a generative AI model, which determines that there is a high possibility of fraud. The server sends the result back to the device, which then issues an audio alert. The user hears this alert and becomes aware of the possibility of fraud.

このようにして、この発明はユーザが詐欺電話を受けた際に自動的に詐欺の可能性を検知し、即座に警告を発することが可能となる。 In this way, the invention can automatically detect possible fraud when a user receives a fraudulent call and immediately issue a warning.

実施例１における特定処理の流れについて図１１を用いて説明する。 The flow of the identification process in Example 1 will be explained using Figure 11.

ステップ１：入電の検知 Step 1: Detect incoming calls

端末は、通信端末にインストールされたアプリケーションを通じて電話の入電を検知する。具体的には、通信端末の電話APIを利用して、入電イベントをキャッチする。入力は電話の着信信号であり、出力は入電の検知イベントである。 The terminal detects incoming calls through an application installed on the communication terminal. Specifically, it uses the communication terminal's telephone API to catch incoming call events. The input is the incoming call signal, and the output is the incoming call detection event.

ステップ２：会話の聞き取り Step 2: Listen to the conversation

ユーザが電話に出ると、端末はマイクを通じて会話をリアルタイムで聞き取る。アプリケーションはバックグラウンドで動作し、会話内容をキャプチャする。入力は通話中の音声データであり、出力は収集された音声データである。 When a user answers a call, the device listens to the conversation in real time through the microphone. The application runs in the background and captures the conversation. The input is the audio data during the call, and the output is the collected audio data.

ステップ３：会話データの文字化 Step 3: Transcribe the conversation data

端末は、聞き取った会話データを文字化するためにGoogle Speech-to-Text APIを使用する。音声データをAPIに送信し、テキストデータとして受け取る。入力は収集された音声データであり、出力は文字化されたテキストデータである。 The device uses the Google Speech-to-Text API to transcribe the conversation data it hears. It sends the audio data to the API and receives it as text data. The input is the collected audio data, and the output is the transcribed text data.

ステップ４：文字化データの送信 Step 4: Send the transcribed data

端末は、文字化された会話データをサーバに送信する。送信にはHTTPSプロトコルを使用し、データのセキュリティを確保する。入力は文字化されたテキストデータであり、出力はサーバへのデータ送信イベントである。 The terminal sends the transcribed conversation data to the server. The HTTPS protocol is used for transmission to ensure data security. The input is transcribed text data, and the output is a data transmission event to the server.

ステップ５：詐欺判定 Step 5: Fraud Detection

サーバは、受信した会話データを生成AIモデルに入力する。生成AIモデルは、プロンプト文を用いて詐欺の可能性を解析する。具体的なプロンプト文の例は以下の通りである： The server inputs the received conversation data into a generative AI model. The generative AI model uses prompts to analyze the possibility of fraud. Specific examples of prompts are as follows:

入力は文字化されたテキストデータであり、出力は詐欺の可能性に関する判定結果である。 The input is transcribed text data, and the output is a judgment result regarding the possibility of fraud.

ステップ６：判定結果の受信 Step 6: Receive the results

サーバは、生成AIモデルからの判定結果を受け取り、その結果を端末に返送する。返送には再びHTTPSプロトコルを使用する。入力は生成AIモデルからの判定結果であり、出力は端末へのデータ送信イベントである。 The server receives the judgment results from the generative AI model and returns them to the device. The return is again done using the HTTPS protocol. The input is the judgment result from the generative AI model, and the output is a data transmission event to the device.

ステップ７：音声アラートの発生 Step 7: Trigger an audio alert

端末は、サーバから受信した判定結果を解析し、詐欺が疑われる場合には通信端末のスピーカーから音声アラートを発する。具体的には、「詐欺の可能性があります。注意してください」といった音声メッセージを再生する。入力はサーバからの判定結果であり、出力は音声アラートの発生である。 The device analyzes the judgment results received from the server, and if fraud is suspected, it issues an audio alert from the communication device's speaker. Specifically, it plays an audio message such as "There is a possibility of fraud. Please be careful." The input is the judgment result from the server, and the output is the generation of an audio alert.

（応用例１） (Application Example 1)

次に、形態例１の応用例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Application Example 1 of Form Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

近年、電話を利用した詐欺が増加しており、特に高齢者を狙った特殊詐欺が社会問題となっている。従来の詐欺防止策では、ユーザーが詐欺の可能性に気づくことが難しく、被害を未然に防ぐことが困難であった。そこで、電話の会話内容をリアルタイムで解析し、詐欺の可能性がある場合に即座に警告を発するシステムが求められている In recent years, telephone fraud has been on the rise, with special fraud targeting the elderly becoming a particular social problem. Conventional fraud prevention measures make it difficult for users to detect potential fraud, making it difficult to prevent damage before it occurs. Therefore, there is a need for a system that can analyze telephone conversations in real time and immediately issue a warning if there is a possibility of fraud.

応用例１におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声出力装置から音声アラートを発する手段と、前記生成系ＡＩが、詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段と、前記音声アラートが、ユーザーが詐欺の可能性に気づくことを促す内容である手段と、を含む。これにより、ユーザーが詐欺の可能性に即座に気づき、被害を未然に防ぐことが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication terminal, means for converting the conversation data into text and sending it to the generative AI, means for issuing an audio alert from the voice output device if the generative AI determines that fraud is suspected from the conversation data, means for the generative AI to detect specific phrases or patterns that indicate the possibility of fraud, and means for the audio alert to contain content that encourages the user to become aware of the possibility of fraud. This allows the user to immediately become aware of the possibility of fraud and prevent damage from occurring.

「詐欺電話」とは、不正な手段で金銭や個人情報を騙し取ることを目的とした電話である。 A "fraudulent call" is a call made with the intent of defrauding someone of money or personal information through fraudulent means.

「入電」とは、電話がかかってくることを指す。 "Incoming call" refers to an incoming phone call.

「通信端末」とは、音声やデータの送受信を行うための装置であり、スマートフォンや家庭用電話などが含まれる。 A "communications terminal" is a device for sending and receiving voice and data, including smartphones and home telephones.

「音声入力装置」とは、音声をデジタルデータに変換するための装置であり、マイクロフォンがこれに該当する。 An "audio input device" is a device that converts audio into digital data, such as a microphone.

「会話データ」とは、音声入力装置によって取得された音声を文字化したデータである。 "Conversation data" refers to data that has been converted into text from speech captured by a voice input device.

「生成系ＡＩ」とは、入力されたデータを解析し、特定のパターンやフレーズを検出する人工知能である。 "Generative AI" is artificial intelligence that analyzes input data and detects specific patterns and phrases.

「音声出力装置」とは、デジタルデータを音声として出力するための装置であり、スピーカーがこれに該当する。 An "audio output device" is a device for outputting digital data as audio, and a speaker is an example of this.

「音声アラート」とは、音声出力装置から発せられる警告音やメッセージである。 "Audio alert" is a warning sound or message emitted from an audio output device.

「特定のフレーズまたはパターン」とは、詐欺の可能性を示す特徴的な言葉や文の構造である。 "Specific phrases or patterns" are characteristic words or sentence structures that indicate possible fraud.

「ユーザー」とは、このシステムを利用する個人または団体を指す。 "User" refers to an individual or organization that uses this system.

この発明を実施するためのシステムは、詐欺電話の入電を検知し、通信端末に設置された音声入力装置から会話を聞き取る機能を持つ。会話データはリアルタイムで文字化され、生成系AIに送信される。生成系AIは、会話データから詐欺の可能性を示す特定のフレーズまたはパターンを検出し、詐欺が疑われる場合には音声出力装置から音声アラートを発する。 A system for implementing this invention has the ability to detect incoming fraudulent calls and listen to the conversation from a voice input device installed on the communication terminal. The conversation data is transcribed in real time and sent to a generative AI. The generative AI detects specific phrases or patterns from the conversation data that indicate the possibility of fraud, and if fraud is suspected, issues a voice alert from a voice output device.

使用するハードウェアおよびソフトウェア Hardware and software used

ハードウェア：スマートフォンや家庭用電話などの通信端末、マイクロフォン（音声入力装置）、スピーカー（音声出力装置） Hardware: Communication devices such as smartphones and home phones, microphones (audio input devices), speakers (audio output devices)

ソフトウェア： Software:

SpeechRecognition：音声をテキストに変換するためのPythonライブラリ SpeechRecognition: A Python library for converting speech to text

requests：HTTPリクエストを送信するためのPythonライブラリ requests: A Python library for sending HTTP requests

生成系AI：外部APIを使用してテキストデータを解析する人工知能 Generative AI: Artificial intelligence that analyzes text data using external APIs

システムの動作 System Operation

1. 入電検知：通信端末が電話の入電を検知すると、会話の録音を開始する。 1. Incoming call detection: When the communication terminal detects an incoming call, it begins recording the conversation.

2. 会話の文字化：録音された音声データは、SpeechRecognitionライブラリを用いてリアルタイムで文字化される。 2. Conversation transcription: Recorded audio data is transcribed in real time using the SpeechRecognition library.

3. 生成系AIによる解析：文字化された会話データは、HTTPリクエストを通じて生成系AIに送信される。生成系AIは、詐欺の可能性を示す特定のフレーズまたはパターンを検出する。 3. Generative AI Analysis: The transcribed conversation data is sent to a generative AI via an HTTP request. The generative AI detects specific phrases or patterns that indicate potential fraud.

4. アラート発信：生成系AIが詐欺の可能性を検出した場合、通信端末のスピーカーから音声アラートが発せられる。この音声アラートは、ユーザーが詐欺の可能性に気づくことを促す内容である。 4. Alert generation: If the generative AI detects the possibility of fraud, an audio alert will be emitted from the communication device's speaker. This audio alert will alert the user to the possibility of fraud.

具体例 Specific examples

例えば、ユーザーが電話を受け、相手が「あなたの銀行口座が不正利用されています。すぐに口座情報を教えてください」と言った場合、システムはこの会話をリアルタイムで文字化し、生成系AIに送信する。生成系AIは、このフレーズが詐欺の可能性を示す特定のパターンに一致することを検出し、即座に音声アラートを発する。 For example, if a user receives a phone call and the caller says, "Your bank account has been fraudulently accessed. Please provide your account details immediately," the system will transcribe this conversation in real time and send it to the generative AI. The generative AI will detect that this phrase matches a specific pattern that indicates potential fraud and immediately issue an audio alert.

プロンプト文の例 Example prompt

生成系AIモデルへのプロンプト文の例は以下の通りである： Examples of prompts for generative AI models include:

"以下のテキストが詐欺の可能性があるかどうかを判断してください：'あなたの銀行口座が不正利用されています。すぐに口座情報を教えてください。'" "Determine whether the following text is potentially fraudulent: 'Your bank account has been compromised. Please provide your account information immediately.'"

このようにして、ユーザーは詐欺の可能性に即座に気づき、被害を未然に防ぐことが可能となる。 In this way, users can immediately become aware of potential fraud and prevent damage before it occurs.

応用例１における特定処理の流れについて図１２を用いて説明する。 The flow of the specific processing in Application Example 1 will be explained using Figure 12.

ステップ１：入電検知 Step 1: Incoming call detection

通信端末が電話の入電を検知する。入力は電話の着信信号であり、出力は録音開始のトリガーである。具体的には、通信端末の電話アプリケーションが着信を検知し、録音モジュールを起動する。 The communication terminal detects an incoming call. The input is the incoming call signal, and the output is a trigger to start recording. Specifically, the communication terminal's phone application detects the incoming call and activates the recording module.

ステップ２：会話の録音 Step 2: Record the conversation

通信端末の音声入力装置（マイクロフォン）が会話を録音する。入力は通話中の音声データであり、出力は録音された音声ファイルである。具体的には、録音モジュールが通話中の音声をキャプチャし、音声ファイルとして保存する。 The communication terminal's audio input device (microphone) records the conversation. The input is the audio data during the call, and the output is the recorded audio file. Specifically, the recording module captures the audio during the call and saves it as an audio file.

ステップ３：会話の文字化 Step 3: Transcribe the conversation

録音された音声データをSpeechRecognitionライブラリを用いて文字化する。入力は音声ファイルであり、出力はテキストデータである。具体的には、音声認識エンジンが音声ファイルを解析し、対応するテキストを生成する。 Recorded audio data is converted into text using the SpeechRecognition library. The input is an audio file, and the output is text data. Specifically, the speech recognition engine analyzes the audio file and generates the corresponding text.

ステップ４：生成系AIによる解析 Step 4: Analysis using generative AI

文字化されたテキストデータをHTTPリクエストを通じて生成系AIに送信する。入力はテキストデータであり、出力は詐欺の可能性を示すフラグである。具体的には、テキストデータを含むリクエストを生成系AIのAPIエンドポイントに送信し、解析結果を受け取る。 The transcribed text data is sent to the generative AI via an HTTP request. The input is text data, and the output is a flag indicating the possibility of fraud. Specifically, a request containing the text data is sent to the generative AI's API endpoint, and the analysis results are received.

ステップ５：詐欺の可能性の判断 Step 5: Determine the possibility of fraud

生成系AIがテキストデータを解析し、詐欺の可能性を示す特定のフレーズまたはパターンを検出する。入力はテキストデータであり、出力は詐欺の可能性を示すフラグである。具体的には、生成系AIがテキストデータを解析し、詐欺の可能性が高い場合にフラグを立てる。 Generative AI analyzes text data to detect specific phrases or patterns that indicate potential fraud. The input is text data, and the output is a flag indicating potential fraud. Specifically, generative AI analyzes text data and flags cases where there is a high probability of fraud.

ステップ６：アラート発信 Step 6: Send an alert

詐欺の可能性が検出された場合、通信端末の音声出力装置（スピーカー）から音声アラートを発する。入力は詐欺の可能性を示すフラグであり、出力は音声アラートである。具体的には、音声アラートモジュールが詐欺の可能性を示すフラグを受け取り、ユーザーに警告する音声メッセージを再生する。 If possible fraud is detected, an audio alert is emitted from the communication terminal's audio output device (speaker). The input is a flag indicating possible fraud, and the output is an audio alert. Specifically, the audio alert module receives a flag indicating possible fraud and plays an audio message to warn the user.

（実施例２） (Example 2)

次に、形態例２の実施例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

近年、詐欺電話の手口が巧妙化しており、多くの人々が被害に遭っている。特に高齢者を狙った特殊詐欺が増加しており、これに対する効果的な対策が求められている。従来の対策では、詐欺電話をリアルタイムで検知し、ユーザーに警告を発するシステムが不足しているため、被害を未然に防ぐことが難しいという課題がある In recent years, fraudulent phone scams have become more sophisticated, resulting in many people falling victim to them. Special frauds targeting the elderly in particular are on the rise, creating a need for effective countermeasures. Conventional countermeasures lack systems that can detect fraudulent phone calls in real time and issue warnings to users, making it difficult to prevent fraud before it occurs.

実施例２におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信装置に設置した音声入力装置から会話を収集する手段と、会話データを文字データに変換する手段と、文字データを生成AIモデルへ送信する手段と、生成AIモデルが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、を含む。これにより、詐欺電話をリアルタイムで検知し、ユーザーに即座に警告を発することが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for collecting conversation from a voice input device installed in the communication device, means for converting the conversation data into text data, means for transmitting the text data to the generative AI model, and means for issuing a voice alert from the voice input device if the generative AI model determines that fraud is suspected based on the conversation data. This makes it possible to detect fraudulent calls in real time and issue an immediate warning to the user.

「詐欺電話の入電を検知する手段」とは、通信装置に対して詐欺の可能性がある電話の着信を識別し、検知するための機能である。 "Means for detecting incoming fraudulent calls" refers to a function that identifies and detects incoming calls that may be fraudulent to a communications device.

「音声入力装置」とは、通信装置に設置され、ユーザーの会話を収集するためのマイクロフォンやその他の音声キャプチャデバイスである。 An "audio input device" is a microphone or other audio capture device installed on a communications device to capture a user's speech.

「会話データを文字データに変換する手段」とは、収集された音声データを解析し、テキスト形式に変換するための音声認識ソフトウェアやアルゴリズムである。 "Means for converting conversational data into text data" refers to speech recognition software or algorithms used to analyze collected voice data and convert it into text format.

「生成AIモデル」とは、入力されたテキストデータを解析し、特定のパターンやフレーズを基に詐欺の可能性を判断するための人工知能モデルである。 A "generative AI model" is an artificial intelligence model that analyzes input text data and determines the possibility of fraud based on specific patterns and phrases.

「音声アラートを発する手段」とは、詐欺が疑われる場合に、ユーザーに警告を発するための音声メッセージを再生する機能である。 "Means for issuing audio alerts" refers to a function that plays an audio message to warn the user if fraud is suspected.

この発明は、詐欺電話の入電をリアルタイムで検知し、ユーザーに警告を発するシステムである。以下に、このシステムの具体的な実施形態を説明する。 This invention is a system that detects incoming fraudulent calls in real time and issues a warning to users. A specific embodiment of this system is described below.

1. システムの構成 1. System Configuration

このシステムは、以下の主要なコンポーネントから構成される。 The system consists of the following main components:

詐欺電話の入電を検知する手段 Methods for detecting incoming fraudulent calls

音声入力装置 Voice input device

会話データを文字データに変換する手段 Method of converting conversation data into text data

生成AIモデル Generative AI model

音声アラートを発する手段 Method for issuing audio alerts

2. ハードウェアおよびソフトウェアの使用 2. Hardware and Software Use

音声入力装置としては、家庭用電話に設置されたマイクロフォンを使用する。このマイクロフォンは、ユーザーの会話をリアルタイムで収集する。 The voice input device uses a microphone installed in a home telephone. This microphone collects the user's speech in real time.

会話データを文字データに変換する手段としては、音声認識ソフトウェア（例: Google Speech-to-Text API）を使用する。このソフトウェアは、収集された音声データを解析し、テキストデータに変換する。 To convert conversation data into text data, we use voice recognition software (e.g., Google Speech-to-Text API). This software analyzes the collected voice data and converts it into text data.

生成AIモデルとしては、OpenAI GPT-4などの高度な人工知能モデルを使用する。このモデルは、入力されたテキストデータを解析し、詐欺の可能性を判断する。 The generative AI model uses advanced artificial intelligence models such as OpenAI GPT-4. This model analyzes the input text data and determines the likelihood of fraud.

音声アラートを発する手段としては、家庭用電話のスピーカーを使用する。このスピーカーは、詐欺が疑われる場合に音声アラートを発する。 The audio alert is generated using the speaker on a home phone, which will emit an audio alert if fraud is suspected.

3. システムの動作 3. System Operation

ユーザーが家庭用電話で会話を始めると、音声入力装置が会話を収集する。収集された音声データは、音声認識ソフトウェアによってテキストデータに変換される。このテキストデータがサーバに送信され、生成AIモデルに入力される。生成AIモデルが会話データを解析し、詐欺が疑われると判断した場合、サーバが家庭用電話にアラート信号を送信する。家庭用電話のスピーカーが音声アラートを発し、ユーザーに警告を行う。 When a user starts a conversation on their home phone, a voice input device collects the conversation. The collected voice data is converted into text data using voice recognition software. This text data is sent to a server and input into a generative AI model. The generative AI model analyzes the conversation data and, if it determines that fraud is suspected, the server sends an alert signal to the home phone. The home phone's speaker emits an audio alert to warn the user.

4. 具体例 4. Specific Examples

例えば、ユーザーが家庭用電話で「こんにちは、あなたの銀行口座が危険にさらされています。すぐに対応が必要です。」という会話をしているとする。音声入力装置がこの会話を収集し、音声認識ソフトウェアがテキストデータに変換する。このテキストデータがサーバに送信され、生成AIモデルが解析を行う。生成AIモデルが詐欺の可能性が高いと判断すると、サーバが家庭用電話にアラート信号を送信する。家庭用電話のスピーカーが「詐欺の可能性があります。注意してください。」という音声アラートを発する。 For example, suppose a user is speaking on their home phone, "Hello, your bank account has been compromised. Immediate action is required." A voice input device collects this speech, and speech recognition software converts it into text data. This text data is sent to a server, where it is analyzed by a generative AI model. If the generative AI model determines that there is a high possibility of fraud, the server sends an alert signal to the home phone. The speaker on the home phone will emit an audio alert saying, "Possible fraud. Please be careful."

5. プロンプト文の例 5. Prompt Sentence Examples

「以下の会話データを解析し、詐欺が疑われるかどうかを判断してください。詐欺が疑われる場合は、その理由も説明してください。 "Analyze the conversation data below and determine whether or not you suspect fraud. If you suspect fraud, please explain why.

会話データ: 'こんにちは、あなたの銀行口座が危険にさらされています。すぐに対応が必要です。'」 Conversation data: 'Hello, your bank account has been compromised. We need your immediate attention.'

このようにして、ユーザーは詐欺のリスクをリアルタイムで認識することができる。 In this way, users can be aware of their fraud risk in real time.

実施例２における特定処理の流れについて図１３を用いて説明する。 The flow of the identification process in Example 2 will be explained using Figure 13.

ステップ１： Step 1:

音声の収集 Audio collection

端末: 家庭用電話に設置された音声入力装置（マイク）が、ユーザーの会話をリアルタイムで収集する。 Device: A voice input device (microphone) installed on a home phone collects the user's conversation in real time.

入力: ユーザーの音声会話 Input: User's voice conversation

出力: デジタル音声データ Output: Digital audio data

具体的な動作: ユーザーが電話で会話を始めると、マイクが自動的に音声をキャプチャし、デジタル音声データとして保存する。 What it does: When a user starts a phone conversation, the microphone automatically captures the audio and stores it as digital audio data.

ステップ２： Step 2:

音声データの文字化 Transcription of audio data

端末: 家庭用電話に内蔵された音声認識ソフトウェア（例: Google Speech-to-Text API）が、収集された音声データをテキストデータに変換する。 Device: Built-in voice recognition software (e.g., Google Speech-to-Text API) in home phones converts collected voice data into text data.

入力: デジタル音声データ Input: Digital audio data

出力: テキストデータ Output: Text data

具体的な動作: 音声認識ソフトウェアが音声データを解析し、「こんにちは、あなたの銀行口座が危険にさらされています。すぐに対応が必要です。」といったテキストデータを生成する。 What it does: Speech recognition software analyzes the audio data and generates text data such as, "Hello, your bank account has been compromised. Immediate action is required."

ステップ３： Step 3:

テキストデータの送信 Send text data

端末: 家庭用電話が、文字化された会話データをサーバに送信する。 Device: The home phone sends the transcribed conversation data to the server.

入力: テキストデータ Input: Text data

出力: サーバへのデータ送信 Output: Send data to the server

具体的な動作: 家庭用電話がインターネットを通じてサーバにデータを送信し、サーバがそのデータを生成AIモデルに渡す。 How it works: A home phone sends data over the internet to a server, which then passes the data to a generative AI model.

ステップ４： Step 4:

詐欺の検出 Fraud detection

サーバ: 生成AIモデル（例: OpenAI GPT-4）が、受信した会話データを解析し、詐欺が疑われるかどうかを判断する。 Server: A generative AI model (e.g., OpenAI GPT-4) analyzes the received conversation data and determines whether fraud is suspected.

入力: テキストデータ Input: Text data

出力: 詐欺の可能性に関する判断結果 Output: Judgment result regarding the likelihood of fraud

具体的な動作: 生成AIモデルが「こんにちは、あなたの銀行口座が危険にさらされています。すぐに対応が必要です。」というテキストを解析し、詐欺の可能性が高いと判断する。 Specific behavior: The generative AI model analyzes the text "Hello, your bank account has been compromised. Immediate action is required" and determines that it is likely to be fraudulent.

ステップ５： Step 5:

音声アラートの発生 Audio alert occurs

サーバ: 詐欺が疑われると判断された場合、サーバが家庭用電話にアラート信号を送信する。 Server: If fraud is suspected, the server sends an alert signal to the home phone.

端末: 家庭用電話のスピーカーが、サーバからのアラート信号を受け取り、音声アラートを発する。 Device: The speaker on your home phone receives the alert signal from the server and plays an audio alert.

入力: アラート信号 Input: Alert signal

出力: 音声アラート Output: Audio alert

具体的な動作: 家庭用電話のスピーカーが「詐欺の可能性があります。注意してください。」という音声アラートを再生する。 Specific behavior: The speaker on your home phone will play a voice alert saying, "This may be a scam. Be careful."

（応用例２） (Application Example 2)

次に、形態例２の応用例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Application Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

近年、詐欺電話による被害が増加しており、特に高齢者を狙った特殊詐欺が社会問題となっている。従来の詐欺防止システムは、詐欺電話の検知が不十分であり、ユーザーが詐欺の可能性に気づくことが難しいという課題があった。さらに、家庭用電話やスマートフォンにおいて、リアルタイムで詐欺を検知し、ユーザーに警告を発するシステムが求められている。 In recent years, the number of victims of fraudulent phone calls has been increasing, and special frauds targeting the elderly in particular have become a social problem. Conventional fraud prevention systems have been inadequate in detecting fraudulent calls, making it difficult for users to recognize the possibility of fraud. Furthermore, there is a demand for systems that can detect fraud in real time and issue warnings to users on home phones and smartphones.

応用例２におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

この発明では、サーバは、詐欺電話の入電を検知する手段と、音声認識装置から会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、音声出力装置から音声アラートを発する手段と、スマートフォンにインストールされるアプリケーションを含む手段と、を含む。これにより、詐欺電話のリアルタイム検知とユーザーへの迅速な警告が可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations using a voice recognition device, means for converting the conversation data into text and sending it to the generative AI, means for issuing an audio alert from a voice output device if the generative AI determines that fraud is suspected based on the conversation data, and means including an application to be installed on a smartphone. This enables real-time detection of fraudulent calls and prompt warnings to users.

「音声認識装置」とは、音声をテキストデータに変換する装置である。 A "voice recognition device" is a device that converts voice into text data.

「会話データ」とは、音声認識装置によってテキスト化された会話の内容を指す。 "Conversation data" refers to the content of a conversation converted into text by a voice recognition device.

「生成系ＡＩ」とは、入力されたデータを基に新たな情報を生成する人工知能である。 "Generative AI" is artificial intelligence that generates new information based on input data.

「音声出力装置」とは、音声を出力するための装置である。 An "audio output device" is a device for outputting audio.

「音声アラート」とは、音声によって警告を発する機能である。 "Voice alert" is a function that issues a warning via voice.

「スマートフォンにインストールされるアプリケーション」とは、スマートフォン上で動作するソフトウェアプログラムである。 "Applications installed on smartphones" are software programs that run on smartphones.

この発明を実施するためには、以下のハードウェアおよびソフトウェアを用いる。ハードウェアとしては、スマートフォン、音声認識装置、音声出力装置が必要である。ソフトウェアとしては、音声認識ライブラリ（例：speech_recognition）、生成系AIライブラリ（例：openai）、テキスト読み上げライブラリ（例：pyttsx3）を使用する。 The following hardware and software are required to implement this invention. Hardware requires a smartphone, a voice recognition device, and a voice output device. Software requires a voice recognition library (e.g., speech_recognition), a generative AI library (e.g., openai), and a text-to-speech library (e.g., pyttsx3).

まず、スマートフォンのマイクを用いて通話内容をリアルタイムで録音する。録音された音声データは、音声認識装置を通じてテキストデータに変換される。この音声認識には、speech_recognitionライブラリを使用する。 First, the phone call is recorded in real time using the smartphone's microphone. The recorded voice data is converted into text data using a voice recognition device. The speech_recognition library is used for this voice recognition.

次に、変換されたテキストデータは生成系AIに送信される。生成系AIは、入力されたテキストデータを解析し、詐欺の可能性を判断する。この解析には、openaiライブラリを使用する。具体的には、以下のようなプロンプト文を生成系AIに送信する： The converted text data is then sent to the generative AI. The generative AI analyzes the input text data and determines whether it is fraudulent. This analysis is performed using the OpenAI library. Specifically, the following prompt is sent to the generative AI:

以下の会話が詐欺かどうか判断してください: Please determine if the following conversation is a scam:

「こんにちは、こちらは銀行のセキュリティ部門です。あなたの口座に不正アクセスがありましたので、口座番号と暗証番号を教えてください。」 "Hello, this is the bank's security department. Your account has been compromised. Please provide your account number and PIN."

詐欺の可能性がある場合は'詐欺'と返答してください。 If you suspect it may be a scam, please reply 'scam'.

生成系AIが詐欺の可能性があると判断した場合、スマートフォンの音声出力装置を用いて音声アラートを発する。この音声アラートには、pyttsx3ライブラリを使用する。音声アラートの内容は、ユーザーが詐欺の可能性に気づくことを促すものである。 If the generative AI determines that there is a possibility of fraud, it will issue an audio alert using the smartphone's audio output device. This audio alert uses the pyttsx3 library. The content of the audio alert is intended to alert the user to the possibility of fraud.

具体例として、ユーザーがスマートフォンで通話中に、詐欺の可能性がある会話が行われた場合、アプリケーションが自動的にその会話をテキスト化し、生成系AIに送信して詐欺の可能性を判断する。詐欺の可能性があると判断された場合、スマートフォンのスピーカーから「詐欺の可能性があります！」という音声アラートが発せられる。 For example, if a user is on a call on their smartphone and a conversation that could be fraudulent takes place, the application automatically converts the conversation into text and sends it to a generative AI to determine whether it is fraudulent. If it determines that there is a possibility of fraud, an audio alert will sound from the smartphone speaker saying, "Possible fraud!"

このようにして、この発明は詐欺電話のリアルタイム検知とユーザーへの迅速な警告を実現するものである。 In this way, the invention enables real-time detection of fraudulent calls and prompt warning to users.

応用例２における特定処理の流れについて図１４を用いて説明する。 The flow of the specific processing in Application Example 2 will be explained using Figure 14.

ステップ１： Step 1:

ユーザがスマートフォンで通話を開始する。スマートフォンのマイクが通話内容をリアルタイムで録音する。 The user initiates a call on their smartphone. The smartphone's microphone records the call in real time.

入力：通話音声 Input: Call audio

出力：録音された音声データ Output: Recorded audio data

具体的な動作：スマートフォンのマイクが通話音声をキャプチャし、音声データとして保存する。 Specific operation: The smartphone's microphone captures the call audio and saves it as audio data.

ステップ２： Step 2:

端末が録音された音声データを音声認識装置に送信する。音声認識装置が音声データをテキストデータに変換する。 The device sends the recorded voice data to a voice recognition device, which converts the voice data into text data.

入力：録音された音声データ Input: Recorded audio data

出力：テキストデータ Output: Text data

具体的な動作：音声認識ライブラリ（例：speech_recognition）が音声データを解析し、対応するテキストデータを生成する。 Specific operation: The speech recognition library (e.g., speech_recognition) analyzes the audio data and generates corresponding text data.

ステップ３： Step 3:

端末が生成されたテキストデータを生成系AIに送信する。生成系AIがテキストデータを解析し、詐欺の可能性を判断する。 The device sends the generated text data to the generative AI, which analyzes the text data and determines whether it is fraudulent.

入力：テキストデータ Input: Text data

出力：詐欺の可能性に関する判断結果 Output: Judgment result regarding the possibility of fraud

具体的な動作：生成系AIライブラリ（例：openai）がプロンプト文を基にテキストデータを解析し、詐欺の可能性があるかどうかを判断する。 Specific operation: A generative AI library (e.g., openai) analyzes text data based on the prompt and determines whether it is likely to be fraudulent.

ステップ４： Step 4:

サーバが生成系AIからの判断結果を受け取り、詐欺の可能性がある場合、音声アラートを生成する。 The server receives the judgment results from the generative AI and generates an audio alert if there is a possibility of fraud.

入力：詐欺の可能性に関する判断結果 Input: Judgment result regarding the possibility of fraud

出力：音声アラートの内容 Output: Audio alert content

具体的な動作：生成系AIが「詐欺の可能性があります！」という警告メッセージを生成する。 Specific behavior: The generative AI generates a warning message saying, "Possible fraud!"

ステップ５： Step 5:

端末が音声アラートの内容を音声出力装置に送信し、音声アラートを発する。 The device sends the contents of the audio alert to the audio output device, which then issues the audio alert.

入力：音声アラートの内容 Input: Audio alert content

出力：音声アラート Output: Audio alert

具体的な動作：テキスト読み上げライブラリ（例：pyttsx3）が音声アラートの内容を音声に変換し、スマートフォンのスピーカーから警告音声を出力する。 Specific operation: A text-to-speech library (e.g., pyttsx3) converts the contents of the audio alert into speech, and the warning sound is output from the smartphone speaker.

（実施例３） (Example 3)

次に、形態例３の実施例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

従来の詐欺検出システムは、詐欺電話の入電を検知することに限定されており、ユーザが受け取るテキストメッセージやメールに対する詐欺検出が不十分であった。また、詐欺の可能性をユーザに通知する手段が限定的であり、ユーザが詐欺の危険性に気づくのが遅れることがあった。これにより、詐欺被害を未然に防ぐことが難しいという課題があった Conventional fraud detection systems were limited to detecting incoming fraudulent phone calls and were inadequate at detecting fraud in text messages and emails received by users. Furthermore, there were limited ways to notify users of potential fraud, which could delay users' realization of the risk of fraud. This made it difficult to prevent fraud from occurring.

実施例３におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 3 is realized by the following means.

この発明では、サーバは、ユーザが入力したテキストデータをサーバに送信する手段と、サーバが生成系人工知能モデルを用いてテキストデータを解析し、詐欺の可能性を判断する手段と、サーバが解析結果を通信端末に送信する手段と、を含む。これにより、ユーザが受け取るテキストメッセージやメールに対しても詐欺の可能性を迅速に検出し、ユーザに通知することが可能となる。 In this invention, the server includes means for transmitting text data entered by the user to the server, means for the server to analyze the text data using a generative artificial intelligence model and determine the possibility of fraud, and means for the server to transmit the analysis results to the communication terminal. This makes it possible to quickly detect the possibility of fraud even in text messages or emails received by the user and notify the user.

「詐欺電話の入電を検知する手段」とは、通信回線を通じてかかってくる電話の中から詐欺の可能性がある電話を識別し、検出するための装置またはソフトウェアである。 "Means for detecting incoming fraudulent calls" refers to equipment or software that identifies and detects potentially fraudulent calls from among calls received over a communication line.

「通信端末に設置した音声入力装置」とは、スマートフォンや家庭用電話などの通信機器に取り付けられた、音声を収集するためのマイクロフォンや関連するハードウェアである。 "A voice input device installed on a communication terminal" refers to a microphone or related hardware for collecting voice that is attached to communication devices such as smartphones and home telephones.

「会話データを文字化し生成系人工知能へ送信する手段」とは、音声データをテキストデータに変換し、そのテキストデータを生成系人工知能に送信するためのソフトウェアまたはハードウェアである。 "Means for converting conversation data into text and sending it to the generative AI" refers to software or hardware for converting voice data into text data and sending that text data to the generative AI.

「生成系人工知能」とは、自然言語処理技術を用いてテキストデータを解析し、特定のパターンやフレーズを検出することができる人工知能モデルである。 "Generative AI" is an AI model that uses natural language processing technology to analyze text data and detect specific patterns and phrases.

「音声アラートを発する手段」とは、詐欺の可能性があると判断された場合に、音声で警告を発するための装置またはソフトウェアである。 "A means for issuing an audio alert" is a device or software that issues an audio warning if it determines that there is a possibility of fraud.

「ユーザが入力したテキストデータをサーバに送信する手段」とは、ユーザが通信端末に入力したテキストデータをインターネットなどの通信回線を通じてサーバに送信するためのソフトウェアまたはハードウェアである。 "Means for transmitting text data entered by a user to a server" refers to software or hardware for transmitting text data entered by a user into a communication terminal to a server via a communication line such as the Internet.

「サーバが生成系人工知能モデルを用いてテキストデータを解析し、詐欺の可能性を判断する手段」とは、サーバが受信したテキストデータを生成系人工知能モデルに入力し、その解析結果から詐欺の可能性を評価するためのソフトウェアまたはハードウェアである。 "Means for the server to analyze text data using a generative artificial intelligence model and determine the possibility of fraud" refers to software or hardware that inputs the text data received by the server into a generative artificial intelligence model and evaluates the possibility of fraud from the analysis results.

「サーバが解析結果を通信端末に送信する手段」とは、生成系人工知能モデルによる解析結果を通信端末に送信するためのソフトウェアまたはハードウェアである。 "Means by which the server transmits the analysis results to the communications terminal" refers to software or hardware for transmitting the analysis results from the generative artificial intelligence model to the communications terminal.

「通信端末が解析結果をユーザに表示する手段」とは、サーバから受信した解析結果をユーザに視覚的または音声的に通知するためのソフトウェアまたはハードウェアである。 "Means by which the communications terminal displays the analysis results to the user" refers to software or hardware that visually or audibly notifies the user of the analysis results received from the server.

この発明は、生成系人工知能モデルを用いて特殊詐欺の可能性を検出するシステムに関するものである。以下に、このシステムの具体的な実施形態を説明する。 This invention relates to a system that uses a generative artificial intelligence model to detect the possibility of special fraud. A specific embodiment of this system is described below.

システムの構成 System Configuration

主語：サーバ Subject: Server

サーバは、生成系人工知能モデルを用いてテキストデータを解析し、詐欺の可能性を判断する役割を担う。具体的には、サーバは以下のハードウェアおよびソフトウェアを使用する： The server uses a generative artificial intelligence model to analyze text data and determine the likelihood of fraud. Specifically, the server uses the following hardware and software:

ハードウェア：高性能なプロセッサ、メモリ、ストレージデバイス Hardware: High-performance processor, memory, and storage devices

ソフトウェア：生成系人工知能モデル（例えば、自然言語処理技術を用いたAIモデル） Software: Generative AI models (e.g., AI models using natural language processing technology)

サーバは、ユーザから送信されたテキストデータを受信し、生成系人工知能モデルに入力する。モデルは、詐欺の可能性を示す特定のフレーズやパターンを検出し、その結果をサーバに返す。サーバは解析結果を通信端末に送信する。 The server receives text data sent by the user and inputs it into a generative artificial intelligence model. The model detects specific phrases and patterns that indicate potential fraud and returns the results to the server. The server then sends the analysis results to the communication device.

主語：端末 Subject: Terminal

端末は、ユーザから入力されたテキストデータをサーバに送信し、サーバからの解析結果をユーザに表示する役割を担う。具体的には、端末は以下のハードウェアおよびソフトウェアを使用する： The device is responsible for sending text data entered by the user to the server and displaying the analysis results from the server to the user. Specifically, the device uses the following hardware and software:

ハードウェア：スマートフォン、タブレット、パソコンなどの通信機器 Hardware: Communication devices such as smartphones, tablets, and PCs

ソフトウェア：専用アプリケーション Software: Dedicated application

端末は、ユーザが入力したテキストデータをサーバに送信し、サーバから受信した解析結果をユーザに表示する。解析結果が詐欺の可能性を示す場合、端末は警告メッセージを表示する。 The device sends the text data entered by the user to the server and displays the analysis results received from the server to the user. If the analysis results indicate the possibility of fraud, the device displays a warning message.

主語：ユーザ Subject: User

ユーザは、端末を通じてテキストデータを入力し、解析結果を確認する役割を担う。具体的には、ユーザは以下の操作を行う： The user is responsible for entering text data through the device and checking the analysis results. Specifically, the user performs the following operations:

メッセージやメールの内容を端末に入力する Enter message or email content into your device

解析結果を確認し、必要に応じて適切な対応を取る Check the analysis results and take appropriate action as necessary.

具体例 Specific examples

具体例1 Example 1

ユーザが「お母さん、今すぐお金を振り込んでください」というメッセージを受け取った場合、以下のようにシステムが動作する： When a user receives a message saying, "Mom, please transfer the money now," the system behaves as follows:

1. ユーザがメッセージを端末の専用アプリケーションに入力する。 1. The user enters a message into a dedicated application on the device.

2. 端末がこのメッセージをサーバに送信する。 2. The device sends this message to the server.

3. サーバがメッセージを受信し、生成系人工知能モデルにプロンプト文として入力する。 3. The server receives the message and inputs it as a prompt into the generative artificial intelligence model.

4. 生成系人工知能モデルがメッセージを解析し、「詐欺の可能性があります」という結果を返す。 4. The generative artificial intelligence model analyzes the message and returns the result "Possible fraud."

5. サーバが解析結果を端末に送信する。 5. The server sends the analysis results to the device.

6. 端末が解析結果をユーザに表示し、ユーザが警告メッセージを確認する。 6. The device displays the analysis results to the user, and the user confirms the warning message.

具体例2 Example 2

ユーザが「あなたの銀行口座が不正利用されています。今すぐこちらに連絡してください」というメールを受け取った場合、以下のようにシステムが動作する： When a user receives an email saying, "Your bank account has been compromised. Contact us immediately," the system works as follows:

1. ユーザがメールの内容を端末の専用アプリケーションに入力する。 1. The user enters the email content into a dedicated application on their device.

2. 端末がこのメールの内容をサーバに送信する。 2. The device sends the contents of this email to the server.

3. サーバがメールの内容を受信し、生成系人工知能モデルにプロンプト文として入力する。 3. The server receives the email content and inputs it as a prompt into the generative artificial intelligence model.

4. 生成系人工知能モデルがメールの内容を解析し、「詐欺の可能性があります」という結果を返す。 4. The generative artificial intelligence model analyzes the contents of the email and returns the result "Possible fraud."

プロンプト文の例 Example prompt

以下は、生成系人工知能モデルに入力するプロンプト文の例である： Below is an example of a prompt to input to a generative artificial intelligence model:

「以下のメッセージが詐欺の可能性があるかどうかを判断してください：『お母さん、今すぐお金を振り込んでください』」 "Please determine if the following message is a potential scam: 'Mom, please transfer the money now.'"

「以下のメールが詐欺の可能性があるかどうかを判断してください：『あなたの銀行口座が不正利用されています。今すぐこちらに連絡してください』」 "Determine whether the following email is a potential scam: 'Your bank account has been compromised. Contact us immediately.'"

このようにして、生成系人工知能モデルを用いた特殊詐欺検出システムが具体的に動作する。実施例３における特定処理の流れについて図１５を用いて説明する。 This is how the specialized fraud detection system using a generative artificial intelligence model operates. The flow of the identification process in Example 3 will be explained using Figure 15.

ステップ１： Step 1:

ユーザがテキストデータを入力する。 The user enters text data.

ユーザは、受け取ったメッセージやメールの内容を端末の専用アプリケーションに入力する。例えば、「お母さん、今すぐお金を振り込んでください」というメッセージをコピーしてアプリケーションに貼り付ける。入力データはテキスト形式で端末に保存される。 The user enters the contents of the message or email they receive into a dedicated application on their device. For example, they might copy and paste the message "Mom, please transfer the money now" into the application. The input data is saved in text format on the device.

ステップ２： Step 2:

端末がテキストデータをサーバに送信する。 The device sends the text data to the server.

端末は、ユーザが入力したテキストデータをサーバに送信する。具体的には、端末のアプリケーションがHTTPリクエストを生成し、テキストデータを含むペイロードをサーバに送信する。この際、データは暗号化されて送信される。入力はユーザが入力したテキストデータであり、出力はサーバに送信されたテキストデータである。 The terminal sends the text data entered by the user to the server. Specifically, the terminal application generates an HTTP request and sends a payload containing the text data to the server. At this time, the data is encrypted before being sent. The input is the text data entered by the user, and the output is the text data sent to the server.

ステップ３： Step 3:

サーバがテキストデータを受信し、生成系人工知能モデルに入力する。 The server receives the text data and inputs it into the generative artificial intelligence model.

サーバは、端末から送信されたテキストデータを受信する。受信したデータは、まずデータベースに保存される。その後、サーバは生成系人工知能モデルにプロンプト文としてテキストデータを入力する。プロンプト文の例としては、「以下のメッセージが詐欺の可能性があるかどうかを判断してください：『お母さん、今すぐお金を振り込んでください』」がある。入力は端末から受信したテキストデータであり、出力は生成系人工知能モデルに入力されたプロンプト文である。 The server receives text data sent from the terminal. The received data is first stored in a database. The server then inputs the text data as a prompt sentence into the generative AI model. An example of a prompt sentence is "Please determine whether the following message is likely to be fraudulent: 'Mom, please transfer the money now.'" The input is the text data received from the terminal, and the output is the prompt sentence input into the generative AI model.

ステップ４： Step 4:

生成系人工知能モデルがテキストデータを解析し、結果を返す。 A generative artificial intelligence model analyzes the text data and returns the results.

生成系人工知能モデルは、入力されたプロンプト文を解析し、詐欺の可能性があるかどうかを判断する。具体的には、モデルは「振り込め詐欺」や「オレオレ詐欺」の典型的なフレーズやパターンと照らし合わせて解析を行う。解析結果は、詐欺の可能性が高い場合には「詐欺の可能性があります」という警告メッセージとして返される。入力はプロンプト文であり、出力は解析結果である。 The generative artificial intelligence model analyzes the input prompt text and determines whether it is likely to be fraud. Specifically, the model analyzes it by comparing it with typical phrases and patterns of "bank transfer fraud" and "it's my son" fraud. If the analysis result indicates a high possibility of fraud, it returns a warning message stating "Possible fraud." The input is the prompt text, and the output is the analysis result.

ステップ５： Step 5:

サーバが解析結果を端末に送信する。 The server sends the analysis results to the device.

サーバは、生成系人工知能モデルから得られた解析結果を端末に送信する。具体的には、サーバはHTTPレスポンスを生成し、解析結果を含むペイロードを端末に送信する。この際、データは暗号化されて送信される。入力は生成系人工知能モデルからの解析結果であり、出力は端末に送信された解析結果である。 The server sends the analysis results obtained from the generative AI model to the terminal. Specifically, the server generates an HTTP response and sends a payload containing the analysis results to the terminal. At this time, the data is encrypted before being sent. The input is the analysis results from the generative AI model, and the output is the analysis results sent to the terminal.

ステップ６： Step 6:

端末が解析結果をユーザに表示する。 The device displays the analysis results to the user.

端末は、サーバから受信した解析結果をユーザに表示する。例えば、解析結果が「詐欺の可能性があります」という警告メッセージであれば、端末のアプリケーションはこのメッセージをポップアップウィンドウや通知としてユーザに表示する。入力はサーバから受信した解析結果であり、出力はユーザに表示された警告メッセージである。ユーザは、この警告メッセージを確認し、適切な対応を取ることができる。 The device displays the analysis results received from the server to the user. For example, if the analysis result is a warning message stating "Possible fraud," the device application displays this message to the user as a pop-up window or notification. The input is the analysis result received from the server, and the output is the warning message displayed to the user. The user can check this warning message and take appropriate action.

（応用例３） (Application Example 3)

次に、形態例３の応用例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマートデバイス１４を「端末」と称する。 Next, we will explain Application Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 will be referred to as the "terminal."

近年、特殊詐欺の手口が巧妙化し、多くの人々が被害に遭っている。特に高齢者を狙った「振り込め詐欺」や「オレオレ詐欺」などが社会問題となっている。これらの詐欺を未然に防ぐためには、詐欺の可能性を早期に検出し、ユーザーに警告を発するシステムが必要である。しかし、現行のシステムでは、詐欺の検出精度が低く、ユーザーが詐欺の可能性に気づくことが難しいという課題がある。 In recent years, special fraud methods have become more sophisticated, with many people falling victim to them. In particular, "bank transfer fraud" and "it's me" frauds targeting the elderly have become a social problem. In order to prevent these frauds, a system is needed that can detect possible fraud early and issue a warning to users. However, current systems have low fraud detection accuracy, making it difficult for users to recognize the possibility of fraud.

応用例３におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 3 is realized by the following means.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段と、生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合に、ユーザーに警告を発する手段と、を含む。これにより、詐欺の可能性を高精度で検出し、ユーザーに迅速に警告を発することが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication terminal, means for converting the conversation data into text and sending it to the generative AI, means for issuing a voice alert from the voice input device if the generative AI determines that fraud is suspected from the conversation data, means for the generative AI to detect specific phrases or patterns that indicate the possibility of fraud, and means for issuing a warning to the user if the generative AI detects specific phrases or patterns that indicate the possibility of fraud. This makes it possible to detect the possibility of fraud with high accuracy and issue a warning to the user quickly.

「詐欺電話の入電を検知する手段」とは、通信端末に対して詐欺の可能性がある電話の着信を識別し、通知するための機能である。 "Means for detecting incoming fraudulent calls" refers to a function that identifies and notifies a communication terminal of incoming calls that may be fraudulent.

「通信端末に設置した音声入力装置」とは、スマートフォンや家庭用電話などの通信機器に取り付けられた、音声を収集するためのデバイスである。 "A voice input device installed on a communication terminal" is a device for collecting voice that is attached to communication devices such as smartphones and home telephones.

「会話データを文字化し生成系ＡＩへ送信する手段」とは、音声入力装置で収集した音声データをテキストデータに変換し、そのテキストデータを生成系AIに送信するための機能である。 "Means for converting conversation data into text and sending it to generative AI" refers to a function for converting voice data collected by a voice input device into text data and sending that text data to generative AI.

「生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段」とは、生成系AIが解析した会話データに基づいて詐欺の可能性があると判断した場合に、音声入力装置を通じて警告音を発するための機能である。 "Means for issuing a voice alert from the voice input device when the generative AI determines that fraud is suspected based on the conversation data" refers to a function that issues a warning sound through the voice input device when the generative AI determines that there is a possibility of fraud based on the conversation data analyzed.

「生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段」とは、生成系AIが事前に学習した詐欺の典型的なフレーズやパターンを会話データから検出するための機能である。 "Means for the generative AI to detect specific phrases or patterns that indicate the possibility of fraud" refers to a function that enables the generative AI to detect typical fraudulent phrases and patterns that it has previously learned from conversation data.

「生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合に、ユーザーに警告を発する手段」とは、生成系AIが詐欺の可能性を示すフレーズやパターンを検出した際に、ユーザーに対して警告を発するための機能である。 "Means for issuing a warning to the user when the generative AI detects a specific phrase or pattern that indicates the possibility of fraud" refers to a function that issues a warning to the user when the generative AI detects a phrase or pattern that indicates the possibility of fraud.

この発明を実施するための形態として、詐欺検出アシスタントシステムをスマートフォンにインストールする方法を説明する。 As an embodiment of this invention, we will explain how to install a fraud detection assistant system on a smartphone.

まず、システムは詐欺電話の入電を検知する手段を備える。この手段は、通信端末に対して詐欺の可能性がある電話の着信を識別し、通知する機能を持つ。具体的には、通信端末に設置された音声入力装置を用いて、通話内容をリアルタイムで収集する。 First, the system is equipped with a means for detecting incoming fraudulent calls. This means has the function of identifying and notifying the communication terminal of incoming calls that may be fraudulent. Specifically, the content of the call is collected in real time using a voice input device installed on the communication terminal.

次に、収集された音声データは、会話データを文字化し生成系AIへ送信する手段を通じてテキストデータに変換される。この変換には、音声認識ソフトウェア（例：Googleの音声認識API）を使用する。変換されたテキストデータは、生成系AIモデルに送信される。 The collected voice data is then converted into text data through a means of transcribing the conversation data and sending it to the generative AI. This conversion is done using voice recognition software (e.g., Google's voice recognition API). The converted text data is then sent to the generative AI model.

生成系AIモデルは、詐欺の可能性を示す特定のフレーズやパターンを検出するために事前に学習されている。このモデルは、例えばHugging Faceのtransformersライブラリを使用して構築される。生成系AIが会話データから詐欺が疑われると判断した場合、音声入力装置から音声アラートを発する手段を通じてユーザーに警告を発する。 Generative AI models are pre-trained to detect specific phrases and patterns that indicate potential fraud. These models are built using, for example, the Hugging Face transformers library. If the generative AI determines from the conversation data that fraud is suspected, it will warn the user by issuing an audio alert via a voice input device.

さらに、生成系AIが詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合、ユーザーに警告を発する手段が動作する。この手段は、ユーザーに対して視覚的または聴覚的な警告を発することで、詐欺の可能性に気づかせる役割を果たす。 Furthermore, if the generative AI detects certain phrases or patterns that indicate potential fraud, a user alert mechanism will be activated. This mechanism will alert the user to the potential fraud by providing a visual or audio warning.

具体例として、以下のような音声ファイルが入力される場合を考える： As a concrete example, consider the following audio file:

音声内容：「お母さん、今すぐお金を振り込んでください。急いでいます。」 Voice: "Mom, please transfer the money right now. It's urgent."

この音声ファイルを解析するためのプロンプト文の例： Example prompt for parsing this audio file:

「お母さん、今すぐお金を振り込んでください。急いでいます。」というフレーズが詐欺の可能性があるかどうかを判断してください。 Judge whether the phrase "Mom, please transfer the money now. It's urgent." could be a scam.

このプロンプト文を生成系AIモデルに入力することで、詐欺の可能性を解析し、ユーザーに警告を発することができる。 By feeding this prompt into a generative AI model, it can analyze the possibility of fraud and issue a warning to the user.

応用例３における特定処理の流れについて図１６を用いて説明する。 The flow of the specific processing in Application Example 3 will be explained using Figure 16.

ステップ１： Step 1:

詐欺電話の入電を検知する Detect incoming fraudulent calls

サーバは、通信端末に対して詐欺の可能性がある電話の着信を識別し、通知する。具体的には、通信端末の通話ログを監視し、特定の番号やパターンに基づいて詐欺の可能性を検出する。入力は通話ログデータであり、出力は詐欺の可能性がある電話の通知である。 The server identifies and notifies the communication terminal of incoming calls that may be fraudulent. Specifically, it monitors the communication terminal's call log and detects possible fraud based on specific numbers or patterns. The input is call log data, and the output is a notification of a potentially fraudulent call.

ステップ２： Step 2:

音声入力装置から会話を聞き取る Listen to conversations from a voice input device

端末は、通信端末に設置された音声入力装置を用いて、通話内容をリアルタイムで収集する。具体的には、マイクを通じて音声データを取得し、音声ファイルとして保存する。入力は通話中の音声であり、出力は音声ファイルである。 The terminal collects call content in real time using an audio input device installed on the communication terminal. Specifically, audio data is acquired through a microphone and saved as an audio file. The input is the audio during the call, and the output is an audio file.

ステップ３： Step 3:

会話データを文字化し生成系AIへ送信する Conversation data is transcribed and sent to generative AI.

端末は、収集された音声データをテキストデータに変換し、生成系AIへ送信する。具体的には、音声認識ソフトウェア（例：Googleの音声認識API）を使用して音声データを文字化する。入力は音声ファイルであり、出力はテキストデータである。 The device converts the collected voice data into text data and sends it to the generative AI. Specifically, it uses voice recognition software (e.g., Google's voice recognition API) to transcribe the voice data. The input is an audio file, and the output is text data.

ステップ４： Step 4:

生成系AIが会話データを解析する Generative AI analyzes conversation data

サーバは、生成系AIモデルを用いて、テキストデータから詐欺の可能性を示す特定のフレーズやパターンを検出する。具体的には、生成系AIモデル（例：Hugging Faceのtransformersライブラリ）を使用してテキストデータを解析する。入力はテキストデータであり、出力は詐欺の可能性の判定結果である。 The server uses a generative AI model to detect specific phrases and patterns in text data that indicate potential fraud. Specifically, it analyzes the text data using a generative AI model (e.g., Hugging Face's transformers library). The input is the text data, and the output is a determination of the likelihood of fraud.

ステップ５： Step 5:

詐欺が疑われる場合に音声アラートを発する Audible alerts when fraud is suspected

端末は、生成系AIが詐欺の可能性があると判断した場合、音声入力装置から音声アラートを発する。具体的には、警告音やメッセージを再生する。入力は詐欺の可能性の判定結果であり、出力は音声アラートである。 If the generative AI determines that there is a possibility of fraud, the device will issue an audio alert from the voice input device. Specifically, it will play a warning sound or message. The input is the result of the determination of the possibility of fraud, and the output is an audio alert.

ステップ６： Step 6:

ユーザーに警告を発する Warn the user

端末は、生成系AIが詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合、ユーザーに警告を発する。具体的には、画面上に警告メッセージを表示したり、通知を送信する。入力は詐欺の可能性の判定結果であり、出力はユーザーへの警告である。 If the generative AI detects certain phrases or patterns that indicate potential fraud, the device will alert the user by displaying a warning message on the screen or sending a notification. The input is the result of the determination of potential fraud, and the output is a warning to the user.

更に、ユーザの感情を推定する感情エンジンを組み合わせてもよい。すなわち、特定処理部２９０は、感情特定モデル５９を用いてユーザの感情を推定し、ユーザの感情を用いた特定処理を行うようにしてもよい。 Furthermore, an emotion engine that estimates the user's emotion may be combined. That is, the identification processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform identification processing using the user's emotion.

「形態例１」 "Example 1"

本発明の一実施形態として、生成系ＡＩに感情エンジンを組み合わせたシステムがある。このシステムは、ユーザーの声のトーン、音量、話速などから感情を分析する。具体的には、ユーザーが詐欺電話を受けている最中に、その声のトーンや音量が急に上がった場合、ユーザーがパニックになっている可能性があると感じ取る。この情報を基に、生成系ＡＩは詐欺の可能性が更に高まったと判断し、音声アラートを発する。 One embodiment of the present invention is a system that combines a generative AI with an emotion engine. This system analyzes emotions from the user's voice tone, volume, speaking rate, etc. Specifically, if the user's voice tone or volume suddenly increases while they are receiving a scam call, the system senses that the user may be panicking. Based on this information, the generative AI determines that the possibility of a scam has increased further and issues an audio alert.

「形態例２」 "Example 2"

また、別の実施形態として、感情エンジンが特定の閾値を超えた場合に音声アラートを発するシステムも考えられる。例えば、ユーザーの声のトーンが一定の閾値を超えた場合、または話速が一定の速度を超えた場合などに、感情エンジンはユーザーが強い感情を感じていると判断する。この判断を基に、生成系ＡＩは音声アラートを発し、ユーザーに詐欺の可能性を警告する。 In another embodiment, the system could issue an audio alert if the emotion engine exceeds a certain threshold. For example, if the user's tone of voice exceeds a certain threshold, or if the user's speaking speed exceeds a certain rate, the emotion engine could determine that the user is experiencing strong emotion. Based on this determination, the generative AI could issue an audio alert, warning the user of possible fraud.

「形態例３」 "Example 3"

さらに、別の実施形態として、感情エンジンと生成系ＡＩが連携して動作するシステムも考えられる。このシステムでは、感情エンジンがユーザーの感情を分析し、その結果を生成系ＡＩに送信する。生成系ＡＩは、感情エンジンからの情報と自身が分析した会話データを組み合わせて詐欺の可能性を判断する。このようにして、より精度の高い詐欺検出が可能となる。 Furthermore, as another embodiment, a system in which the emotion engine and generative AI work in conjunction with each other is also possible. In this system, the emotion engine analyzes the user's emotions and sends the results to the generative AI. The generative AI combines information from the emotion engine with conversation data it has analyzed to determine the possibility of fraud. In this way, more accurate fraud detection is possible.

以下に、各形態例の処理の流れについて説明する。 The processing flow for each example form is explained below.

「形態例１」 "Example 1"

ステップ１：ユーザーが詐欺電話を受ける。 Step 1: The user receives a scam call.

ステップ２：スマートフォンのアプリや家庭用電話に設置されたマイクロフォンが会話を聞き取る。 Step 2: A smartphone app or a microphone installed on a home phone listens to the conversation.

ステップ３：会話データをテキストに変換し、生成系ＡＩに送信する。 Step 3: Convert the conversation data into text and send it to the generative AI.

ステップ４：生成系ＡＩがテキストデータを分析し、詐欺の可能性を判断する。 Step 4: Generative AI analyzes the text data and determines the likelihood of fraud.

ステップ５：感情エンジンがユーザーの声のトーン、音量、話速などから感情を分析する。 Step 5: The emotion engine analyzes the user's emotions based on their tone of voice, volume, speaking rate, etc.

ステップ６：生成系ＡＩが感情エンジンからの情報を基に、詐欺の可能性が更に高まったと判断し、音声アラートを発する。 Step 6: Based on the information from the emotion engine, the generative AI determines that the likelihood of fraud has increased and issues an audio alert.

「形態例２」 "Example 2"

ステップ６：感情エンジンが特定の閾値を超えた場合、生成系ＡＩに通知する。 Step 6: If the emotion engine exceeds a certain threshold, it notifies the generative AI.

ステップ７：生成系ＡＩが感情エンジンからの通知を基に、詐欺の可能性が更に高まったと判断し、音声アラートを発する。 Step 7: Based on the notification from the emotion engine, the generative AI determines that the likelihood of fraud has increased and issues an audio alert.

「形態例３」 "Example 3"

ステップ５：感情エンジンがユーザーの声のトーン、音量、話速などから感情を分析し、その結果を生成系ＡＩに送信する。 Step 5: The emotion engine analyzes the user's emotions based on their tone of voice, volume, speaking rate, etc., and sends the results to the generative AI.

ステップ６：生成系ＡＩが感情エンジンからの情報と自身が分析した会話データを組み合わせて詐欺の可能性を判断する。 Step 6: The generative AI combines information from the emotion engine with the conversation data it has analyzed to determine the likelihood of fraud.

ステップ７：詐欺の可能性が高いと判断された場合、生成系ＡＩが音声アラートを発する。 Step 7: If it determines there is a high possibility of fraud, the generative AI will issue an audio alert.

（実施例１） (Example 1)

従来の詐欺電話検知システムでは、詐欺の可能性を判断する際に会話内容のみを解析するため、ユーザーの感情状態を考慮することができず、詐欺の検知精度が低いという問題があった。また、ユーザーがパニック状態に陥った場合の対応が不十分であり、詐欺被害を未然に防ぐことが難しいという課題があった Conventional fraud call detection systems analyze only the content of conversations to determine the possibility of fraud, which means they are unable to take the user's emotional state into account, resulting in low fraud detection accuracy. Furthermore, they are unable to respond adequately when the user panics, making it difficult to prevent fraud from occurring.

実施例１におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。 The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系人工知能へ送信する手段と、生成系人工知能が会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、ユーザーの声のトーン、音量、話速などのパラメータを収集し、感情を分析する手段と、感情分析の結果を基に詐欺の可能性を再評価する手段を含む。これにより、ユーザーの感情状態を考慮した詐欺検知が可能となり、詐欺被害を未然に防ぐことができる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication terminal, means for converting the conversation data into text and transmitting it to the generative artificial intelligence, means for issuing a voice alert from the voice input device if the generative artificial intelligence determines that fraud is suspected from the conversation data, means for collecting parameters such as the user's voice tone, volume, and speaking rate and analyzing their emotions, and means for reassessing the possibility of fraud based on the results of the emotion analysis. This makes it possible to detect fraud while taking the user's emotional state into account, thereby preventing fraud damage before it occurs.

「詐欺電話の入電を検知する手段」とは、通信端末に対してかかってくる電話が詐欺の可能性があるかどうかを検出するための装置またはソフトウェアである。 "Means for detecting incoming fraudulent calls" refers to equipment or software that detects whether an incoming call to a communications terminal is likely to be fraudulent.

「通信端末に設置した音声入力装置」とは、スマートフォンや家庭用電話などの通信機器に取り付けられたマイクロフォンなどの音声を入力するための装置である。 "A voice input device installed on a communication terminal" refers to a device for inputting voice, such as a microphone attached to a communication device such as a smartphone or home telephone.

「会話データを文字化し生成系人工知能へ送信する手段」とは、音声データをテキストデータに変換し、そのテキストデータを生成系人工知能に送信するための装置またはソフトウェアである。 "Means for converting conversation data into text and sending it to generative AI" refers to a device or software that converts voice data into text data and sends that text data to generative AI.

「生成系人工知能」とは、受け取ったデータを解析し、特定のパターンやフレーズを検出することで詐欺の可能性を判断するための人工知能システムである。 "Generative AI" is an artificial intelligence system that analyzes received data and determines the likelihood of fraud by detecting specific patterns and phrases.

「音声アラートを発する手段」とは、詐欺の可能性が高いと判断された場合に、ユーザーに警告を発するための音声メッセージを再生する装置またはソフトウェアである。 "A means for issuing an audio alert" is a device or software that plays an audio message to warn the user if it determines that there is a high possibility of fraud.

「ユーザーの声のトーン、音量、話速などのパラメータを収集し、感情を分析する手段」とは、ユーザーの音声データからトーン、音量、話速などの情報を収集し、それを基にユーザーの感情状態を解析するための装置またはソフトウェアである。 "Means for collecting parameters such as the user's voice tone, volume, and speaking rate, and analyzing emotions" refers to a device or software that collects information such as tone, volume, and speaking rate from the user's voice data, and analyzes the user's emotional state based on that information.

「感情分析の結果を基に詐欺の可能性を再評価する手段」とは、感情分析の結果を生成系人工知能に入力し、詐欺の可能性を再度評価するための装置またはソフトウェアである。 "Means for reassessing the possibility of fraud based on the results of sentiment analysis" refers to a device or software that inputs the results of sentiment analysis into generative artificial intelligence and reassess the possibility of fraud.

発明を実施するための形態 Form for implementing the invention

本発明は、詐欺電話の検知とユーザーへの警告を行うシステムであり、通信端末にインストールされたアプリケーションとサーバを用いて実施される。このシステムは、詐欺電話の入電を検知し、会話を聞き取り、会話データを文字化して生成系人工知能に送信する。さらに、ユーザーの感情状態を分析し、詐欺の可能性を再評価することで、詐欺被害を未然に防ぐことを目的としている。 This invention is a system that detects fraudulent phone calls and warns users, and is implemented using an application and server installed on a communications terminal. This system detects incoming fraudulent calls, listens to the conversation, transcribes the conversation data, and sends it to a generative artificial intelligence. Furthermore, it aims to prevent fraud by analyzing the user's emotional state and reassessing the possibility of fraud.

1. 通信端末 1. Communication terminal

スマートフォンや家庭用電話などの通信機器。 Communication devices such as smartphones and home phones.

音声入力装置としてのマイクロフォン。 A microphone as an audio input device.

2. サーバ 2. Server

音声データの文字化には、音声認識ソフトウェア（例：Google Cloud Speech-to-Text API）を使用する。 Speech recognition software (e.g., Google Cloud Speech-to-Text API) is used to transcribe audio data.

感情分析には、感情エンジン（例：IBM Watson（登録商標） Tone Analyzer）を使用する。 Sentiment analysis uses an emotion engine (e.g., IBM Watson (registered trademark) Tone Analyzer).

生成系人工知能には、自然言語処理技術を持つAIモデル（例：OpenAIのGPT-4）を使用する。 For generative artificial intelligence, we use AI models with natural language processing technology (e.g., OpenAI's GPT-4).

システムの具体的な動作 Specific system operation

1. 電話の入電検知 1. Incoming phone call detection

サーバは、通信端末にインストールされたアプリケーションを通じて、電話の入電を検知する。アプリケーションは、通信端末のネイティブAPIを使用して電話の入電イベントをキャッチする。 The server detects incoming phone calls through an application installed on the communication device. The application uses the communication device's native API to catch incoming phone call events.

2. 会話の聞き取り 2. Listening to Conversations

端末は、電話中の会話をマイクロフォンを用いて聞き取る。アプリケーションは、マイクの音声入力をリアルタイムでキャプチャし、音声データとして保存する。 The device uses a microphone to listen to phone conversations. The application captures the microphone's audio input in real time and saves it as audio data.

3. 音声データの文字化 3. Transcription of audio data

サーバは、聞き取った音声データを文字データに変換する。この処理には、Google Cloud Speech-to-Text APIを呼び出し、音声データをテキストに変換する。 The server converts the audio data it hears into text data. This process involves calling the Google Cloud Speech-to-Text API to convert the audio data into text.

4. 生成系人工知能への送信 4. Sending to generative AI

サーバは、文字化された会話データを生成系人工知能に送信する。サーバは、HTTPリクエストを使用して、生成系人工知能にテキストデータを送信する。 The server sends the transcribed conversation data to the generative AI. The server sends the text data to the generative AI using an HTTP request.

5. 詐欺の検知 5. Fraud Detection

生成系人工知能は、受け取った会話データを解析し、詐欺が疑われるかどうかを判断する。生成系人工知能は、自然言語処理技術を用いて、会話の内容を解析し、詐欺のパターンに一致するかどうかを評価する。 The generative AI analyzes the received conversation data and determines whether fraud is suspected. Using natural language processing technology, the generative AI analyzes the content of the conversation and evaluates whether it matches any fraudulent patterns.

6. 感情の分析 6. Sentiment Analysis

サーバは、ユーザーの声のトーン、音量、話速などのパラメータを収集し、感情エンジンを用いて分析する。サーバは、音声解析ソフトウェアを使用して、音声データから感情パラメータを抽出する。 The server collects parameters such as the user's voice tone, volume, and speaking rate and analyzes them using an emotion engine. The server uses voice analysis software to extract emotion parameters from the voice data.

7. パニック状態の検知 7. Panic Detection

サーバは、ユーザーの声のトーンや音量が急に上がった場合、ユーザーがパニックになっている可能性があると判断する。サーバは、感情エンジンからのデータを解析し、急激な変化があるかどうかを評価する。 If the user's tone or volume of voice suddenly increases, the server determines that the user may be panicking. The server analyzes data from the emotion engine and evaluates whether there is a sudden change.

8. 詐欺の可能性の再評価 8. Reassessing the possibility of fraud

生成系人工知能は、感情エンジンからの情報を基に、詐欺の可能性が更に高まったと判断する。生成系人工知能は、感情データを追加の入力として受け取り、詐欺の可能性を再評価する。 Based on information from the emotion engine, the generative AI determines that the likelihood of fraud has increased. The generative AI receives the emotion data as additional input and reassess the likelihood of fraud.

9. 音声アラートの発信 9. Audio alerts

サーバは、詐欺の可能性が高いと判断された場合、通信端末のスピーカーから音声アラートを発するように指示する。サーバは、通信端末のアプリケーションに対して、音声アラートを再生するコマンドを送信する。 If the server determines that there is a high possibility of fraud, it instructs the communication device to play an audio alert from its speaker. The server then sends a command to the communication device's application to play the audio alert.

具体例とプロンプト文 Examples and prompts

具体例として、ユーザーがスマートフォンで電話を受けているとする。電話の相手が「あなたの銀行口座が不正利用されています。すぐに口座番号とパスワードを教えてください」と言った場合、ユーザーの声のトーンが急に上がり、音量も大きくなる。この情報を基に、生成系人工知能は詐欺の可能性が高いと判断し、通信端末のスピーカーから「詐欺の可能性があります。注意してください」という音声アラートを発する。 As a concrete example, imagine a user is receiving a phone call on their smartphone. If the person on the other end of the line says, "Your bank account has been fraudulently used. Please tell us your account number and password immediately," the user's voice tone will suddenly rise and the volume will also increase. Based on this information, the generative artificial intelligence will determine that there is a high possibility of fraud, and will emit an audio alert from the communication device's speaker saying, "This may be a fraud. Please be careful."

プロンプト文の例: Example prompt:

「以下の会話データを解析し、詐欺の可能性があるかどうかを判断してください。ユーザーの声のトーンや音量の変化も考慮してください。 "Analyze the conversation data below to determine if it may be a scam. Also consider changes in the user's tone and volume of voice.

会話データ: 'あなたの銀行口座が不正利用されています。すぐに口座番号とパスワードを教えてください。' Conversation data: 'Your bank account has been compromised. Please provide your account number and password immediately.'

ユーザーの声のトーン: 急に上がる User's voice tone: Sudden rise

ユーザーの音量: 大きくなる」 User volume: Increased"

実施例１における特定処理の流れについて図１７を用いて説明する。 The flow of the identification process in Example 1 will be explained using Figure 17.

ステップ１： Step 1:

電話の入電検知 Incoming phone call detection

サーバは、通信端末にインストールされたアプリケーションを通じて、電話の入電を検知する。 The server detects incoming phone calls through an application installed on the communication terminal.

入力: 通信端末の入電イベント。 Input: Incoming call event from communication terminal.

データ加工/演算: 通信端末のネイティブAPIを使用して入電イベントをキャッチする。 Data processing/calculation: Capture incoming call events using the communication device's native API.

出力: 入電が検知されたというイベント情報。 Output: Event information indicating that an incoming call was detected.

具体的な動作: アプリケーションは、ＡＮＤＲＯＩＤ（登録商標）やiOSのネイティブAPIを使用して電話の入電イベントをキャッチし、サーバに通知する。 Specific operation: The application uses ANDROID (registered trademark) or iOS native APIs to catch incoming phone calls and notify the server.

ステップ２： Step 2:

会話の聞き取り Listening to conversations

端末は、電話中の会話をマイクロフォンを用いて聞き取る。 The device uses a microphone to listen to the conversation during the call.

入力: 通話中の音声データ。 Input: Audio data during the call.

データ加工/演算: マイクの音声入力をリアルタイムでキャプチャし、音声データとして保存する。 Data processing/calculation: Captures microphone audio input in real time and saves it as audio data.

出力: 音声データ。 Output: Audio data.

具体的な動作: アプリケーションは、マイクの音声入力をリアルタイムでキャプチャし、音声データとして保存する。 Specific behavior: The application captures audio input from the microphone in real time and saves it as audio data.

ステップ３： Step 3:

音声データの文字化 Transcription of audio data

サーバは、聞き取った音声データを文字データに変換する。 The server converts the audio data it hears into text data.

入力: 音声データ。 Input: Audio data.

データ加工/演算: Google Cloud Speech-to-Text APIを呼び出し、音声データをテキストに変換する。 Data processing/calculation: Call the Google Cloud Speech-to-Text API to convert voice data into text.

出力: 文字データ。 Output: Character data.

具体的な動作: サーバは、Google Cloud Speech-to-Text APIを使用して音声データをテキストに変換する。 Specific operation: The server converts the audio data into text using the Google Cloud Speech-to-Text API.

ステップ４： Step 4:

生成系人工知能への送信 Send to generative AI

サーバは、文字化された会話データを生成系人工知能に送信する。 The server sends the transcribed conversation data to the generative artificial intelligence.

入力: 文字データ。 Input: Character data.

データ加工/演算: HTTPリクエストを使用して、生成系人工知能にテキストデータを送信する。 Data processing/calculation: Send text data to generative artificial intelligence using an HTTP request.

出力: 生成系人工知能へのリクエスト送信結果。 Output: Result of request sent to generative artificial intelligence.

具体的な動作: サーバは、HTTPリクエストを使用して、生成系人工知能にテキストデータを送信する。 Specific operation: The server sends text data to the generative artificial intelligence using an HTTP request.

ステップ５： Step 5:

詐欺の検知 Fraud detection

生成系人工知能は、受け取った会話データを解析し、詐欺が疑われるかどうかを判断する。 The generative artificial intelligence analyzes the received conversation data and determines whether fraud is suspected.

入力: 文字データ。 Input: Character data.

データ加工/演算: 自然言語処理技術を用いて、会話の内容を解析し、詐欺のパターンに一致するかどうかを評価する。 Data processing/calculation: Using natural language processing technology, the content of the conversation is analyzed and evaluated to see if it matches a fraud pattern.

出力: 詐欺の可能性に関する評価結果。 Output: Assessment result regarding likelihood of fraud.

具体的な動作: 生成系人工知能は、自然言語処理技術を用いて、会話の内容を解析し、詐欺のパターンに一致するかどうかを評価する。 Specific operation: The generative artificial intelligence uses natural language processing technology to analyze the content of the conversation and evaluate whether it matches any fraud patterns.

ステップ６： Step 6:

感情の分析 Emotion analysis

サーバは、ユーザーの声のトーン、音量、話速などのパラメータを収集し、感情エンジンを用いて分析する。 The server collects parameters such as the user's voice tone, volume, and speaking speed, and analyzes them using an emotion engine.

入力: 音声データ。 Input: Audio data.

データ加工/演算: 音声解析ソフトウェアを使用して、音声データから感情パラメータを抽出する。 Data processing/calculation: Use voice analysis software to extract emotional parameters from voice data.

出力: 感情パラメータ。 Output: Emotion parameters.

具体的な動作: サーバは、音声解析ソフトウェア（例：IBM Watson Tone Analyzer）を使用して、音声データから感情パラメータを抽出する。 Specific operation: The server uses voice analysis software (e.g., IBM Watson Tone Analyzer) to extract emotion parameters from the voice data.

ステップ７： Step 7:

パニック状態の検知 Detecting panic states

サーバは、ユーザーの声のトーンや音量が急に上がった場合、ユーザーがパニックになっている可能性があると判断する。 If the user's voice tone or volume suddenly increases, the server will determine that the user may be panicking.

入力: 感情パラメータ。 Input: Emotion parameters.

データ加工/演算: 感情エンジンからのデータを解析し、急激な変化があるかどうかを評価する。 Data processing/calculation: Analyze data from the emotion engine and evaluate whether there are any sudden changes.

出力: パニック状態の評価結果。 Output: Panic state assessment results.

具体的な動作: サーバは、感情エンジンからのデータを解析し、急激な変化があるかどうかを評価する。 Specific behavior: The server analyzes data from the emotion engine and evaluates whether there are any sudden changes.

ステップ８： Step 8:

詐欺の可能性の再評価 Reassessing the possibility of fraud

生成系人工知能は、感情エンジンからの情報を基に、詐欺の可能性が更に高まったと判断する。 Based on information from the emotion engine, the generative artificial intelligence determines that the likelihood of fraud has increased.

入力: 感情パラメータ。 Input: Emotion parameters.

データ加工/演算: 感情データを追加の入力として受け取り、詐欺の可能性を再評価する。 Data processing/calculation: Emotional data is taken as additional input and the likelihood of fraud is reassessed.

出力: 再評価された詐欺の可能性。 Output: Reassessed fraud probability.

具体的な動作: 生成系人工知能は、感情データを追加の入力として受け取り、詐欺の可能性を再評価する。 Specific behavior: The generative AI receives emotional data as additional input and reassesss the likelihood of fraud.

ステップ９： Step 9:

音声アラートの発信 Send audio alerts

サーバは、詐欺の可能性が高いと判断された場合、通信端末のスピーカーから音声アラートを発するように指示する。 If the server determines that there is a high possibility of fraud, it will instruct the communication device to emit an audio alert from its speaker.

入力: 再評価された詐欺の可能性。 Input: Reassessed potential fraud.

データ加工/演算: 通信端末のアプリケーションに対して、音声アラートを再生するコマンドを送信する。 Data processing/calculation: Sends a command to the communication terminal application to play an audio alert.

出力: 音声アラートの発信。 Output: Plays an audio alert.

具体的な動作: サーバは、通信端末のアプリケーションに対して、音声アラートを再生するコマンドを送信し、通信端末のスピーカーから音声アラートを発する。 Specific operation: The server sends a command to the application on the communication device to play an audio alert, which is then played from the communication device's speaker.

（応用例１） (Application Example 1)

近年、詐欺電話の手口が巧妙化しており、多くの人々が被害に遭っている。特に高齢者や技術に不慣れな人々は、詐欺電話に対する警戒心が低く、被害に遭いやすい。また、詐欺電話を受けた際に、ユーザーがパニックに陥ることが多く、冷静な判断が難しくなる。このような状況を改善し、詐欺被害を未然に防ぐための効果的な手段が求められている。 In recent years, fraudulent phone scams have become more sophisticated, with many people falling victim to them. Elderly people and those unfamiliar with technology are particularly prone to falling victim to these scams, as they are less wary of them. Furthermore, when users receive a fraudulent call, they often panic, making it difficult for them to make a calm judgment. There is a need for effective measures to improve this situation and prevent fraudulent incidents.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、ユーザーの声のトーン、音量、話速などから感情を分析する感情エンジンを含む手段と、感情エンジンの分析結果を基に詐欺の可能性をさらに高める手段と、を含む。これにより、詐欺電話の検知とユーザーの感情分析を組み合わせることで、詐欺被害を未然に防ぐことが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication terminal, means for converting the conversation data into text and sending it to the generative AI, means for issuing a voice alert from the voice input device if the generative AI determines that fraud is suspected from the conversation data, means including an emotion engine for analyzing emotions from the user's voice tone, volume, speaking speed, etc., and means for further increasing the possibility of fraud based on the analysis results of the emotion engine. In this way, by combining fraudulent call detection with user emotion analysis, it is possible to prevent fraud victims before they occur.

「通信端末」とは、音声やデータの送受信を行うための電子機器である。 A "communications terminal" is an electronic device used to send and receive voice and data.

「音声入力装置」とは、音声をデジタルデータに変換するための装置である。 An "audio input device" is a device for converting audio into digital data.

「生成系ＡＩ」とは、入力されたデータを基に特定の判断や生成を行う人工知能である。 "Generative AI" is artificial intelligence that makes specific decisions and generates results based on input data.

「音声アラート」とは、音声によって警告や通知を行う手段である。 "Audio alert" is a means of warning or notifying someone by voice.

「感情エンジン」とは、音声データから感情を分析するためのソフトウェアまたはシステムである。 An "emotion engine" is software or a system for analyzing emotions from voice data.

「トーン」とは、音声の高さや質感を指す。 "Tone" refers to the pitch and texture of the voice.

「音量」とは、音の大きさを指す。 "Volume" refers to the loudness of the sound.

「話速」とは、話す速度を指す。 "Speech rate" refers to the speed at which you speak.

この発明を実施するための形態として、以下のシステム構成を用いる。 The following system configuration is used to implement this invention.

システム構成 System Configuration

1. ハードウェア 1. Hardware

通信端末：スマートフォンや家庭用電話などの音声通信が可能なデバイス。 Communication terminal: A device capable of voice communication, such as a smartphone or home telephone.

音声入力装置：通信端末に内蔵されたマイク。 Audio input device: A microphone built into the communication terminal.

音声出力装置：通信端末に内蔵されたスピーカー。 Audio output device: Speaker built into the communication terminal.

2. ソフトウェア 2. Software

音声認識ライブラリ：speech_recognitionライブラリを使用して、音声をテキストに変換する。 Speech Recognition Library: Use the speech_recognition library to convert speech to text.

生成系AIモデル：transformersライブラリを使用して、詐欺の可能性を判断するためのAIモデル。 Generative AI model: An AI model that uses the transformers library to determine the likelihood of fraud.

感情エンジン：transformersライブラリを使用して、ユーザーの感情を分析するためのモデル。 Emotion Engine: A model for analyzing user emotions using the transformers library.

音声アラートライブラリ：pyttsx3ライブラリを使用して、音声アラートを発する。 Audio Alert Library: Uses the pyttsx3 library to generate audio alerts.

処理の流れ Processing flow

1. 入電検知 1. Incoming call detection

サーバは、通信端末に入電があったことを検知する。 The server detects that a call has been received by the communication terminal.

通信端末の音声入力装置が会話を聞き取り、音声データを取得する。 The communication terminal's voice input device listens to the conversation and acquires voice data.

2. 音声認識 2. Voice Recognition

サーバは、音声認識ライブラリを用いて、取得した音声データをテキストデータに変換する。 The server uses a voice recognition library to convert the acquired voice data into text data.

3. 生成系AIによる詐欺検出 3. Fraud Detection Using Generative AI

サーバは、生成系AIモデルにテキストデータを入力し、詐欺の可能性を判断する。 The server inputs the text data into a generative AI model to determine the likelihood of fraud.

特定のフレーズやパターンが検出された場合、詐欺の可能性が高いと判断する。 If certain phrases or patterns are detected, it is determined that there is a high possibility of fraud.

4. 感情分析 4. Sentiment analysis

サーバは、感情エンジンを用いて、ユーザーの声のトーン、音量、話速などから感情を分析する。 The server uses an emotion engine to analyze the user's emotions based on their tone of voice, volume, speaking speed, etc.

ユーザーがパニック状態にあると判断された場合、詐欺の可能性がさらに高まる。 If a user is perceived to be in a panic, the likelihood of fraud increases even more.

5. 音声アラート 5. Audio alerts

サーバは、詐欺の可能性が高いと判断した場合、音声アラートライブラリを用いて、通信端末の音声出力装置から音声アラートを発する。 If the server determines that there is a high possibility of fraud, it uses the audio alert library to issue an audio alert from the communication terminal's audio output device.

アラートの内容は、ユーザーが詐欺の可能性に気づくことを促すものである。 The alert is intended to alert users to potential fraud.

具体例 Specific examples

例えば、ユーザーが電話を受けた際に、詐欺の可能性があると判断された場合、通信端末のスピーカーから「警告！詐欺の可能性があります。」という音声アラートが発せられる。このアラートにより、ユーザーは詐欺の可能性に気づき、冷静な対応を取ることができる。 For example, if a user receives a phone call and it is determined that the call may be fraudulent, an audio alert will sound from the communication device's speaker saying, "Warning! Possible fraud." This alert will make the user aware of the possibility of fraud and allow them to respond calmly.

プロンプト文の例 Example prompt

「この電話は詐欺の可能性がありますか？」 "Could this call be a scam?"

このようにして、この発明は詐欺電話の検知とユーザーの感情分析を組み合わせることで、詐欺被害を未然に防ぐことが可能となる。 In this way, this invention combines the detection of fraudulent calls with user sentiment analysis, making it possible to prevent fraudulent incidents.

応用例１における特定処理の流れについて図１８を用いて説明する。 The flow of the specific processing in Application Example 1 is explained using Figure 18.

ステップ１： Step 1:

入力：通信端末からの入電信号。 Input: Incoming signal from communication terminal.

データ加工：入電信号を受信し、通話開始をトリガーとして処理を開始する。 Data processing: Receives an incoming call signal and begins processing when the call starts.

出力：通話開始のフラグ。 Output: Call start flag.

具体的な動作：通信端末が電話の着信を受けた際に、サーバはその信号を検知し、通話が開始されたことを認識する。 Specific operation: When a communication terminal receives an incoming call, the server detects the signal and recognizes that a call has started.

ステップ２： Step 2:

端末の音声入力装置が会話を聞き取り、音声データを取得する。 The device's voice input device listens to the conversation and captures the voice data.

入力：通話中の音声。 Input: Audio during a call.

データ加工：音声入力装置が音声データをリアルタイムで収集する。 Data processing: The voice input device collects voice data in real time.

出力：取得された音声データ。 Output: Captured audio data.

具体的な動作：通話中の音声がマイクを通じて収集され、音声データとして保存される。 Specific operation: Audio during a call is collected through the microphone and saved as audio data.

ステップ３： Step 3:

入力：取得された音声データ。 Input: Acquired audio data.

データ加工：音声認識ライブラリ（speech_recognition）を使用して音声データをテキストデータに変換する。 Data processing: Use the speech recognition library (speech_recognition) to convert audio data into text data.

出力：変換されたテキストデータ。 Output: Converted text data.

具体的な動作：音声データが音声認識ライブラリに入力され、テキストデータとして出力される。 Specific operation: Voice data is input into the voice recognition library and output as text data.

ステップ４： Step 4:

入力：変換されたテキストデータ。 Input: Converted text data.

データ加工：生成系AIモデル（transformers）にテキストデータを入力し、詐欺の可能性を評価する。 Data processing: Text data is fed into generative AI models (transformers) to assess the likelihood of fraud.

出力：詐欺の可能性に関する評価結果。 Output: Assessment results regarding the likelihood of fraud.

具体的な動作：テキストデータが生成系AIモデルに入力され、詐欺の可能性が高いかどうかの評価結果が出力される。 Specific operation: Text data is input into a generative AI model, which outputs an evaluation result indicating whether the likelihood of fraud is high.

ステップ５： Step 5:

入力：変換されたテキストデータおよび音声データ。 Input: Converted text and audio data.

データ加工：感情エンジン（transformers）を使用して、ユーザーの感情を分析する。 Data processing: Analyze user emotions using emotion engines (transformers).

出力：ユーザーの感情に関する分析結果。 Output: Analysis results on user sentiment.

具体的な動作：音声データとテキストデータが感情エンジンに入力され、ユーザーの感情状態が分析される。 Specific operation: Voice and text data are input into the emotion engine, and the user's emotional state is analyzed.

ステップ６： Step 6:

入力：詐欺の可能性に関する評価結果およびユーザーの感情に関する分析結果。 Input: Fraud likelihood assessment results and user sentiment analysis results.

データ加工：詐欺の可能性が高いと判断された場合、音声アラートライブラリ（pyttsx3）を使用して音声アラートを生成する。 Data processing: If a high possibility of fraud is determined, an audio alert is generated using the audio alert library (pyttsx3).

出力：音声アラート。 Output: Audio alert.

具体的な動作：詐欺の可能性が高いと判断された場合、通信端末のスピーカーから「警告！詐欺の可能性があります。」という音声アラートが発せられる。 Specific operation: If it is determined that there is a high possibility of fraud, an audio alert will be emitted from the communication device's speaker saying, "Warning! Possible fraud."

（実施例２） (Example 2)

従来の詐欺電話対策システムでは、詐欺の可能性を検出する精度が低く、ユーザーが詐欺に遭うリスクを完全に排除することができなかった。また、ユーザーの感情状態を考慮せずに警告を発するため、ユーザーが適切に対応できない場合があった。これにより、詐欺の被害を未然に防ぐことが困難であった Conventional fraud prevention systems had low accuracy in detecting potential fraud, making it impossible to completely eliminate the risk of users falling victim to fraud. Furthermore, warnings were issued without taking the user's emotional state into consideration, which sometimes led to users being unable to respond appropriately. This made it difficult to prevent fraud from occurring.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信装置に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成AIモデルへ送信する手段と、生成AIモデルが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、感情解析エンジンがユーザーの声のトーンや話速を解析する手段と、感情解析エンジンが特定の閾値を超えた場合に生成AIモデルが音声アラートを発する手段と、を含む。これにより、詐欺の可能性を高精度で検出し、ユーザーの感情状態に応じた適切な警告を発することが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication device, means for converting the conversation data into text and sending it to the generative AI model, means for issuing an audio alert from the voice input device if the generative AI model determines that fraud is suspected from the conversation data, means for an emotion analysis engine to analyze the user's tone of voice and speaking rate, and means for the generative AI model to issue an audio alert if the emotion analysis engine exceeds a specific threshold. This makes it possible to detect the possibility of fraud with high accuracy and issue an appropriate warning according to the user's emotional state.

「詐欺電話の入電を検知する手段」とは、通信装置に対して詐欺の可能性がある電話の着信を識別し、通知するための機能である。 "Means for detecting incoming fraudulent calls" refers to a function that identifies and notifies a communication device of incoming calls that may be fraudulent.

「音声入力装置」とは、ユーザーと相手の会話をリアルタイムで録音し、デジタル形式で保存するための装置である。 A "voice input device" is a device that records conversations between a user and another person in real time and stores them in digital format.

「会話データを文字化し生成AIモデルへ送信する手段」とは、音声データをテキストデータに変換し、そのテキストデータを生成AIモデルに送信するための機能である。 "Means for converting conversation data into text and sending it to the generative AI model" refers to a function for converting voice data into text data and sending that text data to the generative AI model.

「生成AIモデル」とは、入力されたデータを解析し、特定のパターンやフレーズを検出して詐欺の可能性を判断する人工知能システムである。 A "generative AI model" is an artificial intelligence system that analyzes input data, detects specific patterns and phrases, and determines the likelihood of fraud.

「音声アラートを発する手段」とは、詐欺の可能性があると判断された場合に、ユーザーに警告を発するための音声通知機能である。 "Means for issuing audio alerts" refers to an audio notification function that warns users if a potential fraud is detected.

「感情解析エンジン」とは、ユーザーの声のトーンや話速を解析し、ユーザーの感情状態を判断するためのエンジンである。 The "emotion analysis engine" is an engine that analyzes the user's tone of voice and speaking speed to determine their emotional state.

「特定の閾値を超えた場合」とは、感情解析エンジンがユーザーの感情状態を解析し、予め設定された基準値を超えた場合を指す。 "When a specific threshold is exceeded" refers to when the emotion analysis engine analyzes the user's emotional state and it exceeds a pre-set reference value.

「通信装置」とは、家庭用電話やスマートフォンなど、音声通話を行うためのデバイスである。 "Communication equipment" refers to devices used for making voice calls, such as home telephones and smartphones.

発明を実施するための形態 Form for implementing the invention

この発明は、詐欺電話の入電を検知し、ユーザーに警告を発するシステムである。以下に、このシステムの具体的な実施形態を説明する。 This invention is a system that detects incoming fraudulent calls and issues a warning to the user. A specific embodiment of this system is described below.

通信装置: 家庭用電話やスマートフォンなど、音声通話を行うためのデバイス Communication devices: devices used for voice calls, such as home phones and smartphones.

音声入力装置: 通信装置に設置されたマイク Audio input device: A microphone installed in the communication device

サーバ: 音声データを処理し、生成AIモデルおよび感情解析エンジンを実行するためのコンピュータシステム Server: A computer system for processing voice data and running generative AI models and sentiment analysis engines.

生成AIモデル: 会話データを解析し、詐欺の可能性を判断する人工知能システム Generative AI model: An artificial intelligence system that analyzes conversation data and determines the likelihood of fraud.

感情解析エンジン: ユーザーの声のトーンや話速を解析し、感情状態を判断するエンジン Emotion analysis engine: An engine that analyzes the user's tone of voice and speaking rate to determine their emotional state.

音声認識ソフトウェア: 音声データをテキストデータに変換するためのソフトウェア（例: Google Speech-to-Text API） Speech recognition software: Software for converting voice data into text data (e.g., Google Speech-to-Text API)

システムの動作 System Operation

1. ユーザが通信装置で会話を始めると、端末の音声入力装置が会話をリアルタイムで録音する。録音された音声データはデジタル形式で保存される。 1. When a user starts a conversation on a communication device, the device's voice input device records the conversation in real time. The recorded voice data is saved in digital format.

2. サーバが音声認識ソフトウェアを使用して、収集された音声データを文字データに変換する。例えば、「こんにちは、銀行の担当者です」という音声が「こんにちは、銀行の担当者です」というテキストに変換される。 2. The server uses speech recognition software to convert the collected voice data into text. For example, the speech "Hello, this is a bank representative" is converted into text "Hello, this is a bank representative."

3. サーバが変換された文字データを生成AIモデルに送信する。生成AIモデルはこのデータを受け取り、解析の準備を行う。 3. The server sends the converted text data to the generative AI model. The generative AI model receives this data and prepares it for analysis.

4. 生成AIモデルが会話データを解析し、特定のキーワードやフレーズ（例: 「銀行口座」「パスワード」）を検出する。これにより、詐欺の可能性があるかどうかを判断する。 4. The generative AI model analyzes the conversation data and detects specific keywords and phrases (e.g., "bank account" or "password"), which determines whether there is potential for fraud.

5. 生成AIモデルが詐欺の可能性が高いと判断した場合、サーバが端末に指示を送り、通信装置のスピーカーから「詐欺の可能性があります。注意してください」という音声アラートを発する。 5. If the generative AI model determines that there is a high possibility of fraud, the server will send instructions to the terminal and an audio alert will be issued from the communication device's speaker saying, "This is likely fraud. Please be careful."

6. 感情解析エンジンがユーザーの声のトーンや話速をリアルタイムで解析する。例えば、ユーザーの声が急に高くなったり、話速が速くなった場合、感情解析エンジンはこれを検出する。 6. The emotion analysis engine analyzes the user's tone of voice and speaking rate in real time. For example, if the user's voice suddenly gets higher in pitch or their speaking rate increases, the emotion analysis engine will detect this.

7. 感情解析エンジンがユーザーの感情が特定の閾値を超えたと判断した場合、生成AIモデルが「詐欺の可能性があります。冷静になってください」という音声アラートを発する。 7. If the sentiment analysis engine determines that the user's emotions exceed a certain threshold, the generative AI model will issue an audio alert saying, "This may be a scam. Please remain calm."

具体例 Specific examples

具体例1: ユーザが電話で「銀行口座の情報を教えてください」と言われた場合、生成AIモデルは「銀行口座」というキーワードを検出し、詐欺の可能性があると判断する。通信装置のスピーカーから「詐欺の可能性があります。注意してください」という音声アラートが発せられる。 Example 1: When a user says over the phone, "Please tell me your bank account information," the generative AI model detects the keyword "bank account" and determines that there is a possibility of fraud. The speaker on the communication device issues a voice alert saying, "There is a possibility of fraud. Please be careful."

具体例2: ユーザが電話で「急いでパスワードを教えてください」と言われ、ユーザーの声のトーンが高くなり、話速が速くなった場合、感情解析エンジンはユーザーが強い感情を感じていると判断する。生成AIモデルは音声アラートを発し、「詐欺の可能性があります。冷静になってください」と警告する。 Example 2: If a user says "Please tell me my password urgently" over the phone and their voice tone gets higher and faster, the emotion analysis engine determines that the user is experiencing strong emotions. The generative AI model issues a voice alert, warning, "This may be a scam. Please remain calm."

プロンプト文の例 Example prompt

プロンプト文1: 「電話で『銀行口座の情報を教えてください』と言われた場合、詐欺の可能性があるかどうかを判断してください。」 Prompt 1: "If someone calls and asks for your bank account information, determine whether it's a scam."

プロンプト文2: 「ユーザーの声のトーンが高くなり、話速が速くなった場合、感情エンジンがどのように判断するか説明してください。」 Prompt 2: "Please explain how the emotion engine determines if the user's voice tone gets higher and their speaking rate gets faster."

このシステムは、ユーザーが電話で詐欺に遭うリスクを低減するために設計されている。生成AIモデルと感情解析エンジンを組み合わせることで、より高い精度で詐欺の可能性を検出し、ユーザーに警告を発することができる。 The system is designed to reduce the risk of users falling victim to telephone fraud. By combining a generative AI model with a sentiment analysis engine, it can detect potential fraud with greater accuracy and alert users.

実施例２における特定処理の流れについて図１９を用いて説明する。 The flow of the identification process in Example 2 will be explained using Figure 19.

ステップ１： Step 1:

ユーザが通信装置で会話を始める。 The user begins a conversation on a communication device.

具体的な動作として、ユーザが家庭用電話やスマートフォンを使用して通話を開始する。入力はユーザの音声であり、出力は音声入力装置によって収集される音声データである。 Specific operations involve a user initiating a call using a home phone or smartphone. The input is the user's voice, and the output is voice data collected by a voice input device.

ステップ２： Step 2:

端末の音声入力装置が会話を聞き取り、音声データを収集する。 The device's voice input device listens to the conversation and collects voice data.

具体的な動作として、端末のマイクがユーザと相手の会話をリアルタイムで録音する。入力はユーザと相手の音声であり、出力はデジタル形式で保存された音声データである。 Specifically, the device's microphone records the conversation between the user and the other party in real time. The input is the user's and the other party's voice, and the output is audio data stored in digital format.

ステップ３： Step 3:

サーバが音声データを文字データに変換する。 The server converts the audio data into text data.

具体的な動作として、サーバが音声認識ソフトウェア（例: Google Speech-to-Text API）を使用して、収集された音声データを文字データに変換する。入力はデジタル形式の音声データであり、出力はテキスト形式の会話データである。 Specifically, the server uses speech recognition software (e.g., Google Speech-to-Text API) to convert the collected voice data into text data. The input is digital voice data, and the output is text conversation data.

ステップ４： Step 4:

サーバが文字化された会話データを生成AIモデルに送信する。 The server sends the transcribed conversation data to the generative AI model.

具体的な動作として、サーバが変換された文字データを生成AIモデルに送信する。入力はテキスト形式の会話データであり、出力は生成AIモデルに送信されたデータである。 Specific operations involve the server sending the converted text data to the generative AI model. The input is text-format conversation data, and the output is the data sent to the generative AI model.

ステップ５： Step 5:

生成AIモデルが会話データを解析し、詐欺の可能性を判断する。 A generative AI model analyzes conversation data to determine the likelihood of fraud.

具体的な動作として、生成AIモデルが会話データを解析し、特定のキーワードやフレーズ（例: 「銀行口座」「パスワード」）を検出する。入力はテキスト形式の会話データであり、出力は詐欺の可能性に関する判断結果である。 Specifically, the generative AI model analyzes conversation data and detects specific keywords and phrases (e.g., "bank account" or "password"). The input is text-based conversation data, and the output is a judgment about the likelihood of fraud.

ステップ６： Step 6:

詐欺の可能性がある場合、端末の音声入力装置から音声アラートを発する。 If there is a possibility of fraud, an audio alert will be issued via the device's voice input device.

具体的な動作として、生成AIモデルが詐欺の可能性が高いと判断した場合、サーバが端末に指示を送り、通信装置のスピーカーから「詐欺の可能性があります。注意してください」という音声アラートを発する。入力は生成AIモデルの判断結果であり、出力は音声アラートである。 Specifically, if the generative AI model determines that there is a high possibility of fraud, the server sends instructions to the terminal and an audio alert is emitted from the communication device's speaker saying, "There is a possibility of fraud. Please be careful." The input is the judgment result of the generative AI model, and the output is an audio alert.

ステップ７： Step 7:

感情解析エンジンがユーザーの声のトーンや話速を解析する。 The emotion analysis engine analyzes the user's tone of voice and speaking speed.

具体的な動作として、感情解析エンジンがユーザーの声のトーンや話速をリアルタイムで解析する。入力はユーザーの音声データであり、出力は感情状態に関する解析結果である。 Specifically, the emotion analysis engine analyzes the user's tone of voice and speaking rate in real time. The input is the user's voice data, and the output is the analysis results regarding their emotional state.

ステップ８： Step 8:

感情解析エンジンが特定の閾値を超えた場合、生成AIモデルが音声アラートを発する。 If the sentiment analysis engine exceeds a certain threshold, the generative AI model will issue an audio alert.

具体的な動作として、感情解析エンジンがユーザーの感情が特定の閾値を超えたと判断した場合、生成AIモデルが「詐欺の可能性があります。冷静になってください」という音声アラートを発する。入力は感情解析エンジンの解析結果であり、出力は音声アラートである。 Specifically, if the emotion analysis engine determines that the user's emotion exceeds a certain threshold, the generative AI model issues an audio alert saying, "This may be a scam. Please remain calm." The input is the analysis result of the emotion analysis engine, and the output is an audio alert.

（応用例２） (Application Example 2)

近年、詐欺電話の手口が巧妙化しており、多くの人々が被害に遭っている。特に高齢者や技術に不慣れな人々は、詐欺電話に対する警戒心が低く、被害に遭いやすい。また、詐欺電話の検知システムが存在するものの、ユーザーの感情状態を考慮した警告システムは少ない。これにより、ユーザーが詐欺の可能性に気づかず、被害を受けるリスクが高まっている。したがって、詐欺電話の検知とユーザーの感情状態を考慮した警告システムを提供することが求められている In recent years, fraudulent phone scams have become increasingly sophisticated, leaving many people vulnerable. Elderly people and those unfamiliar with technology are particularly vulnerable to these scams, as they are less vigilant against them. While systems for detecting fraudulent calls exist, few warning systems take the user's emotional state into account. This increases the risk of users becoming victims of fraudulent calls without realizing they are being scammed. Therefore, there is a need for a system that detects fraudulent calls and takes the user's emotional state into account.

この発明では、サーバは、詐欺電話の入電を検知する手段と、スマートデバイスに設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系AIへ送信する手段と、生成系AIが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、感情エンジンを用いてユーザーの感情を分析し、特定の閾値を超えた場合に警告を発する手段と、を含む。これにより、詐欺電話の検知とユーザーの感情状態を考慮した警告が可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the smart device, means for converting the conversation data into text and sending it to the generative AI, means for issuing a voice alert from the voice input device if the generative AI determines that fraud is suspected from the conversation data, and means for analyzing the user's emotions using an emotion engine and issuing a warning if a specific threshold is exceeded. This makes it possible to detect fraudulent calls and issue warnings that take into account the user's emotional state.

「スマートデバイス」とは、スマートフォンやタブレットなどのインターネット接続機能を持つ電子機器である。 A "smart device" is an electronic device with internet connectivity, such as a smartphone or tablet.

「音声入力装置」とは、音声をデジタルデータに変換するためのマイクロフォンなどの装置である。 An "audio input device" is a device such as a microphone that converts audio into digital data.

「会話データ」とは、音声入力装置を通じて取得された音声を文字化したデータである。 "Conversation data" refers to data that has been converted into text from speech acquired through a voice input device.

「生成系AI」とは、人工知能を用いてデータを解析し、特定の判断や生成を行うシステムである。 "Generative AI" is a system that uses artificial intelligence to analyze data and make specific decisions or generate results.

「音声アラート」とは、音声を用いてユーザーに警告や通知を行う手段である。 "Audio alert" is a means of warning or notifying the user using audio.

「感情エンジン」とは、音声やテキストデータからユーザーの感情状態を分析するためのアルゴリズムやシステムである。 An "emotion engine" is an algorithm or system that analyzes a user's emotional state from voice and text data.

「特定の閾値」とは、感情エンジンがユーザーの感情状態を評価する際に基準となる数値や条件である。 A "specific threshold" is a numerical value or condition that serves as a benchmark when the emotion engine evaluates the user's emotional state.

「警告」とは、ユーザーに対して注意を促すための通知やアラートである。 A "warning" is a notification or alert that alerts the user.

この発明を実施するためのシステムは、詐欺電話の入電を検知し、ユーザーに警告を発するための一連の手段を含む。以下に、システムの具体的な実施形態を説明する。 A system for implementing this invention includes a series of means for detecting incoming fraudulent calls and issuing a warning to the user. A specific embodiment of the system is described below.

まず、スマートデバイス（例えば、スマートフォンやタブレット）に音声入力装置（マイクロフォン）が設置されている。この音声入力装置は、ユーザーの通話内容をリアルタイムで取得する。取得された音声データは、音声認識ソフトウェア（例えば、speech_recognitionライブラリ）を用いて文字データに変換される。 First, a voice input device (microphone) is installed on a smart device (e.g., a smartphone or tablet). This voice input device captures the user's call content in real time. The captured voice data is converted into text data using voice recognition software (e.g., the speech_recognition library).

次に、文字化された会話データは、生成系AIモデル（例えば、transformersライブラリを用いたモデル）に送信される。この生成系AIモデルは、会話データを解析し、詐欺の可能性があるかどうかを判断する。詐欺の可能性があると判断された場合、音声アラートを発するためのソフトウェア（例えば、pyttsx3ライブラリ）が起動し、ユーザーに対して警告を発する。 The transcribed conversation data is then sent to a generative AI model (e.g., a model using the transformers library). This generative AI model analyzes the conversation data and determines whether there is a possibility of fraud. If a fraudulent activity is detected, software for issuing audio alerts (e.g., the pyttsx3 library) is activated to warn the user.

さらに、感情エンジン（例えば、transformersライブラリを用いた感情分析モデル）を用いて、ユーザーの感情状態を分析する。感情エンジンは、ユーザーの声のトーンや話速などを評価し、特定の閾値を超えた場合に警告を発する。この警告も音声アラートとしてユーザーに通知される。 Furthermore, an emotion engine (e.g., an emotion analysis model using the transformers library) is used to analyze the user's emotional state. The emotion engine evaluates the user's tone of voice, speaking rate, etc., and issues a warning if certain thresholds are exceeded. This warning is also notified to the user as an audio alert.

具体例として、ユーザーが電話で「銀行の口座情報を教えてください」と言われた場合、生成系AIモデルが詐欺の可能性を検知し、音声アラートを発する。また、ユーザーが怒りや不安を感じている場合、感情エンジンがそれを検知し、音声アラートを発する。 For example, if a user calls and says, "Please tell me your bank account information," the generative AI model will detect the possibility of fraud and issue a voice alert. Also, if the user is feeling angry or anxious, the emotion engine will detect this and issue a voice alert.

プロンプト文の例としては、以下のようなものがある。 Examples of prompt statements include:

ユーザーの通話内容: "銀行の口座情報を教えてください。" User's call: "Please tell me my bank account information."

生成AIモデルへのプロンプト: "この通話内容は詐欺の可能性がありますか？" Prompt for generative AI model: "Is this call potentially fraudulent?"

このようにして、詐欺電話の検知とユーザーの感情状態を考慮した警告が可能となるシステムを実現することができる。 In this way, it is possible to create a system that can detect fraudulent calls and issue warnings that take into account the user's emotional state.

応用例２における特定処理の流れについて図２０を用いて説明する。 The flow of the specific processing in Application Example 2 is explained using Figure 20.

ステップ１： Step 1:

ユーザがスマートデバイスで通話を開始する。スマートデバイスの音声入力装置（マイクロフォン）が通話内容をリアルタイムで取得する。入力はユーザの音声データであり、出力は音声データのストリームである。 A user initiates a call on a smart device. The smart device's audio input device (microphone) captures the call content in real time. The input is the user's voice data, and the output is a stream of voice data.

ステップ２： Step 2:

端末が音声認識ソフトウェア（speech_recognitionライブラリ）を用いて、取得した音声データを文字データに変換する。入力は音声データのストリームであり、出力は文字化された会話データである。 The device uses speech recognition software (speech_recognition library) to convert the acquired voice data into text data. The input is a stream of voice data, and the output is transcribed conversation data.

ステップ３： Step 3:

端末が文字化された会話データを生成系AIモデル（transformersライブラリを用いたモデル）に送信する。生成系AIモデルが会話データを解析し、詐欺の可能性があるかどうかを判断する。入力は文字化された会話データであり、出力は詐欺の可能性に関する判断結果である。 The device sends the transcribed conversation data to a generative AI model (a model using the transformers library). The generative AI model analyzes the conversation data and determines whether there is a possibility of fraud. The input is the transcribed conversation data, and the output is a judgment result regarding the possibility of fraud.

ステップ４： Step 4:

サーバが生成系AIモデルの判断結果を受け取り、詐欺の可能性があると判断された場合、音声アラートを発するためのソフトウェア（pyttsx3ライブラリ）を起動する。入力は詐欺の可能性に関する判断結果であり、出力は音声アラートである。 The server receives the judgment results of the generative AI model, and if it determines that there is a possibility of fraud, it launches software (pyttsx3 library) to issue an audio alert. The input is the judgment result regarding the possibility of fraud, and the output is an audio alert.

ステップ５： Step 5:

端末が感情エンジン（transformersライブラリを用いた感情分析モデル）を用いて、ユーザの感情状態を分析する。感情エンジンは、ユーザの声のトーンや話速などを評価し、特定の閾値を超えた場合に警告を発する。入力は文字化された会話データであり、出力は感情状態に関する評価結果である。 The device uses an emotion engine (an emotion analysis model using the Transformers library) to analyze the user's emotional state. The emotion engine evaluates the user's tone of voice, speaking rate, etc., and issues a warning if certain thresholds are exceeded. The input is transcribed conversation data, and the output is an evaluation result regarding the user's emotional state.

ステップ６： Step 6:

サーバが感情エンジンの評価結果を受け取り、特定の閾値を超えた場合、音声アラートを発するためのソフトウェア（pyttsx3ライブラリ）を起動する。入力は感情状態に関する評価結果であり、出力は音声アラートである。 The server receives the emotion engine's evaluation results, and if a certain threshold is exceeded, it launches software (pyttsx3 library) to issue an audio alert. The input is the evaluation result regarding the emotional state, and the output is an audio alert.

ステップ７： Step 7:

ユーザが音声アラートを受け取り、詐欺の可能性や感情状態に関する警告を認識する。入力は音声アラートであり、出力はユーザの認識である。 The user receives an audio alert and acknowledges the warning about potential fraud or emotional state. The input is the audio alert and the output is the user's acknowledgment.

（実施例３） (Example 3)

従来の詐欺検出システムでは、詐欺の可能性を判断する際に、会話データの解析のみを行うため、精度が低いという問題があった。また、ユーザの感情状態を考慮しないため、詐欺の可能性を見逃すリスクが高かった。さらに、詐欺の可能性をユーザに通知する手段が限定的であり、ユーザが詐欺の危険性に気づかない場合があった Conventional fraud detection systems have the problem of low accuracy because they only analyze conversation data to determine the possibility of fraud. Furthermore, because they do not take the user's emotional state into account, there is a high risk of overlooking possible fraud. Furthermore, there are limited ways to notify users of possible fraud, which can lead to users not realizing the risk of fraud.

この発明では、サーバは、詐欺電話の入電を検知する手段と、通信端末に設置した音声入力装置から会話を聞き取る手段と、会話データを文字化し生成系人工知能へ送信する手段と、生成系人工知能が会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、感情解析エンジンがユーザの感情を分析する手段と、生成系人工知能が感情解析エンジンからの情報と会話データを統合して詐欺の可能性を判断する手段と、を含む。これにより、詐欺の可能性を高精度に検出し、ユーザに迅速に通知することが可能となる。 In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a voice input device installed in the communication terminal, means for converting the conversation data into text and transmitting it to the generative artificial intelligence, means for issuing a voice alert from the voice input device if the generative artificial intelligence determines that fraud is suspected from the conversation data, means for an emotion analysis engine to analyze the user's emotions, and means for the generative artificial intelligence to integrate information from the emotion analysis engine and the conversation data to determine the possibility of fraud. This makes it possible to detect the possibility of fraud with high accuracy and notify the user promptly.

「詐欺電話の入電を検知する手段」とは、通信回線を監視し、詐欺の可能性がある電話の着信を識別するための装置またはソフトウェアである。 "Means for detecting incoming fraudulent calls" refers to devices or software that monitor communication lines and identify incoming calls that may be fraudulent.

「通信端末に設置した音声入力装置」とは、スマートフォンや家庭用電話などの通信機器に取り付けられた、音声を収集するためのマイクロフォンである。 "A voice input device installed on a communication terminal" refers to a microphone attached to a communication device such as a smartphone or home telephone for collecting voice.

「会話データを文字化し生成系人工知能へ送信する手段」とは、音声データをテキストデータに変換し、そのテキストデータを生成系人工知能に送信するためのソフトウェアまたは装置である。 "Means for converting conversation data into text and sending it to generative AI" refers to software or a device for converting voice data into text data and sending that text data to generative AI.

「生成系人工知能」とは、自然言語処理技術を用いてテキストデータを解析し、特定のパターンやフレーズを検出するための人工知能モデルである。 "Generative AI" is an artificial intelligence model that uses natural language processing technology to analyze text data and detect specific patterns and phrases.

「音声アラートを発する手段」とは、生成系人工知能が詐欺の可能性を検出した場合に、ユーザに警告を発するための音声出力装置である。 "Means for issuing audio alerts" refers to an audio output device that issues a warning to the user if the generative artificial intelligence detects the possibility of fraud.

「感情解析エンジン」とは、ユーザの音声やテキストデータを解析し、感情状態を特定するためのソフトウェアまたは装置である。 An "emotion analysis engine" is software or a device that analyzes a user's voice and text data to identify their emotional state.

「詐欺の可能性を判断する手段」とは、生成系人工知能が会話データと感情解析エンジンからの情報を統合し、詐欺の可能性を評価するためのアルゴリズムまたはソフトウェアである。 "Means for determining the likelihood of fraud" refers to an algorithm or software that uses generative artificial intelligence to integrate conversational data and information from a sentiment analysis engine to assess the likelihood of fraud.

この発明は、詐欺電話の入電を検知し、ユーザに詐欺の可能性を通知するシステムである。以下に、このシステムの具体的な実施形態を説明する。 This invention is a system that detects incoming fraudulent calls and notifies users of the possibility of fraud. A specific embodiment of this system is described below.

まず、ユーザが通信端末（スマートフォンや家庭用電話など）を使用している際に、詐欺電話の入電を検知する手段が動作する。この手段は、通信回線を監視し、詐欺の可能性がある電話の着信を識別するための装置またはソフトウェアである。 First, while a user is using a communication terminal (such as a smartphone or home phone), a means for detecting incoming fraudulent calls is activated. This means is a device or software that monitors communication lines and identifies incoming calls that may be fraudulent.

次に、通信端末に設置された音声入力装置（マイクロフォン）が、ユーザと詐欺電話の発信者との会話を聞き取る。この音声データは、会話データを文字化し生成系人工知能へ送信する手段によってテキストデータに変換される。この変換には、音声認識技術を用いる。具体的には、Google Speech-to-Text APIやIBM Watson Speech to Textなどの音声認識サービスを利用することができる。 Next, a voice input device (microphone) installed on the communication terminal listens to the conversation between the user and the caller making the fraudulent call. This voice data is converted into text data by transcribing the conversation data and sending it to generative artificial intelligence. This conversion uses voice recognition technology. Specifically, voice recognition services such as Google Speech-to-Text API and IBM Watson Speech to Text can be used.

サーバは、生成系人工知能（例えば、BERTやGPT-3（登録商標）などの自然言語処理モデル）を用いて、テキストデータを解析する。生成系人工知能は、詐欺の可能性がある特定のフレーズやパターンを検出し、その結果を評価する。例えば、「お金を振り込んでください」というフレーズが含まれている場合、詐欺の可能性が高いと判断する。 The server analyzes the text data using generative artificial intelligence (e.g., natural language processing models such as BERT or GPT-3 (registered trademark)). The generative artificial intelligence detects specific phrases and patterns that may be fraudulent and evaluates the results. For example, if the phrase "Please transfer money" is included, it will determine that there is a high possibility of fraud.

さらに、サーバは感情解析エンジン（例えば、IBM WatsonやMicrosoft（登録商標） Azure（登録商標）の感情分析API）を使用して、ユーザの感情を分析する。ユーザの音声やテキストデータを解析し、不安や緊張などの感情状態を特定する。感情解析エンジンが得た感情データは、生成系人工知能に送信される。 Furthermore, the server uses an emotion analysis engine (for example, IBM Watson or Microsoft® Azure® emotion analysis API) to analyze the user's emotions. It analyzes the user's voice and text data to identify emotional states such as anxiety or tension. The emotion data obtained by the emotion analysis engine is sent to the generative artificial intelligence.

生成系人工知能は、感情解析エンジンからの情報と会話データを統合し、詐欺の可能性をより高精度に判断する。例えば、メッセージが「お金を振り込んでください」であり、ユーザの感情が不安である場合、詐欺の可能性が非常に高いと結論付ける。 Generative AI combines information from the sentiment analysis engine with conversational data to more accurately assess the likelihood of fraud. For example, if the message is "Please transfer money" and the user's emotions are anxious, it will conclude that there is a very high possibility of fraud.

最終的に、サーバは詐欺の可能性が高いと判断した場合、その結果をユーザに通知する。通知は、音声アラートを発する手段によって行われる。具体的には、通信端末のスピーカーから「このメッセージは詐欺の可能性があります。注意してください。」という音声アラートが発せられる。 Finally, if the server determines that the message is likely to be fraudulent, it will notify the user of this result. The notification is made by issuing an audio alert. Specifically, an audio alert will be issued from the communication device's speaker saying, "This message may be fraudulent. Please be careful."

具体例として、ユーザが「お金を振り込んでください」というメッセージを受け取った場合を考える。このメッセージは、通信端末の音声入力装置によって聞き取られ、テキストデータに変換される。サーバは生成系人工知能を用いてこのテキストデータを解析し、詐欺の可能性が高いと判断する。同時に、感情解析エンジンがユーザの不安を検出し、その情報を生成系人工知能に送信する。サーバはこれらの情報を統合し、詐欺の可能性が非常に高いと結論付ける。最終的に、サーバは「このメッセージは詐欺の可能性があります。注意してください。」という音声アラートをユーザの通信端末に送信する。 As a concrete example, consider the case where a user receives a message saying, "Please transfer money." This message is picked up by the voice input device of the communication device and converted into text data. The server analyzes this text data using generative artificial intelligence and determines that there is a high possibility of fraud. At the same time, the emotion analysis engine detects the user's anxiety and sends this information to the generative artificial intelligence. The server combines this information and concludes that there is a very high possibility of fraud. Finally, the server sends an audio alert to the user's communication device saying, "This message may be fraudulent. Please be careful."

プロンプト文の例： Example prompt:

「以下のメッセージが詐欺の可能性があるかどうかを判断してください。メッセージ：『お金を振り込んでください。』」 "Please determine whether the following message is likely a scam. Message: 'Please transfer money.'"

このようにして、サーバ、端末、ユーザが連携して、詐欺の可能性を高精度に検出し、ユーザに迅速に通知することが可能となる。実施例３における特定処理の流れについて図２１を用いて説明する。 In this way, the server, terminal, and user work together to detect the possibility of fraud with high accuracy and quickly notify the user. The flow of the identification process in Example 3 will be explained using Figure 21.

ステップ１： Step 1:

ユーザがメッセージを入力する。 The user enters a message.

ユーザは、通信端末（スマートフォンや家庭用電話など）を使用して、受信したメッセージを入力する。例えば、「お金を振り込んでください」というメッセージを入力する。入力されたメッセージは、音声データとして端末に保存される。 The user uses a communication device (such as a smartphone or home phone) to input the received message. For example, they might input a message such as "Please transfer money." The input message is saved on the device as voice data.

ステップ２： Step 2:

端末がメッセージをサーバに送信する。 The device sends a message to the server.

端末は、ユーザが入力した音声データをサーバに送信する。この際、端末はインターネットを介して音声データをサーバに送信する。入力は音声データであり、出力はサーバへの音声データの転送である。 The terminal sends the voice data entered by the user to the server. At this time, the terminal sends the voice data to the server via the Internet. The input is voice data, and the output is the transfer of voice data to the server.

ステップ３： Step 3:

サーバが音声データをテキストデータに変換する。 The server converts the audio data into text data.

サーバは、受信した音声データを音声認識技術を用いてテキストデータに変換する。具体的には、Google Speech-to-Text APIやIBM Watson Speech to Textなどの音声認識サービスを利用する。入力は音声データであり、出力はテキストデータである。 The server converts the received voice data into text data using voice recognition technology. Specifically, it uses voice recognition services such as Google Speech-to-Text API and IBM Watson Speech to Text. The input is voice data, and the output is text data.

ステップ４： Step 4:

サーバが生成系人工知能を用いてテキストデータを解析する。 The server analyzes the text data using generative artificial intelligence.

サーバは、生成系人工知能（例えば、BERTやGPT-3などの自然言語処理モデル）を用いて、テキストデータを解析する。生成系人工知能は、詐欺の可能性がある特定のフレーズやパターンを検出し、その結果を評価する。入力はテキストデータであり、出力は詐欺の可能性に関する評価結果である。 The server analyzes the text data using generative artificial intelligence (e.g., natural language processing models such as BERT or GPT-3). The generative artificial intelligence detects specific phrases and patterns that may be fraudulent and evaluates the results. The input is the text data, and the output is an evaluation result regarding the likelihood of fraud.

ステップ５： Step 5:

サーバが感情解析エンジンを用いてユーザの感情を分析する。 The server analyzes the user's emotions using an emotion analysis engine.

サーバは、感情解析エンジン（例えば、IBM WatsonやMicrosoft Azureの感情分析API）を使用して、ユーザの感情を分析する。ユーザの音声やテキストデータを解析し、不安や緊張などの感情状態を特定する。入力は音声データまたはテキストデータであり、出力は感情データである。 The server uses an emotion analysis engine (for example, IBM Watson or Microsoft Azure's emotion analysis API) to analyze the user's emotions. It analyzes the user's voice and text data to identify emotional states such as anxiety or tension. The input is voice or text data, and the output is emotion data.

ステップ６： Step 6:

サーバが生成系人工知能と感情解析エンジンの結果を統合し、詐欺の可能性を判断する。 The server combines the results of the generative artificial intelligence and the sentiment analysis engine to determine the likelihood of fraud.

サーバは、生成系人工知能からの解析結果と感情解析エンジンからの感情データを統合する。これにより、詐欺の可能性をより高精度に判断する。例えば、メッセージが「お金を振り込んでください」であり、ユーザの感情が不安である場合、詐欺の可能性が非常に高いと結論付ける。入力は解析結果と感情データであり、出力は詐欺の可能性に関する最終評価である。 The server integrates the analysis results from the generative AI with the emotional data from the emotion analysis engine. This allows for a more accurate assessment of the likelihood of fraud. For example, if the message is "Please transfer money" and the user's emotions are anxious, it will conclude that there is a very high likelihood of fraud. The input is the analysis results and emotional data, and the output is a final assessment of the likelihood of fraud.

ステップ７： Step 7:

サーバが詐欺の可能性をユーザに通知する。 The server will notify the user of possible fraud.

サーバは、詐欺の可能性が高いと判断した場合、その結果をユーザに通知する。通知は、音声アラートを発する手段によって行われる。具体的には、通信端末のスピーカーから「このメッセージは詐欺の可能性があります。注意してください。」という音声アラートが発せられる。入力は詐欺の可能性に関する最終評価であり、出力は音声アラートである。 If the server determines that there is a high possibility of fraud, it notifies the user of the result. The notification is done by issuing an audio alert. Specifically, an audio alert saying "This message may be fraudulent. Please be careful" is issued from the speaker of the communication terminal. The input is the final assessment of the possibility of fraud, and the output is the audio alert.

（応用例３） (Application Example 3)

従来の詐欺検出システムでは、特定のフレーズやパターンを検出するだけであり、ユーザーの感情状態を考慮しないため、詐欺の可能性を正確に判断することが難しいという課題があった。また、詐欺の可能性が高い場合にユーザーに適切な警告を提供する手段が不足していた Traditional fraud detection systems only detect specific phrases and patterns and do not take the user's emotional state into account, making it difficult to accurately determine the likelihood of fraud. Furthermore, there was a lack of means to provide appropriate warnings to users when fraud was likely.

応用例３におけるデータ処理装置１２の特定処理部２９０による特定処理を、以下の各手段により実現する。この発明では、サーバは、詐欺電話の入電を検知する手段と、スマートフォンアプリケーションまたは家庭用電話に設置したマイクから会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記マイクから音声アラートを発する手段と、感情エンジンを用いてユーザーの感情を分析する手段と、感情エンジンからの情報と生成系ＡＩが分析した会話データを組み合わせて詐欺の可能性を判断する手段と、詐欺の可能性が高い場合にユーザーに警告を表示する手段を含む。これにより、詐欺の可能性をより高精度に判断し、ユーザーに適切な警告を提供することが可能となる。 The identification processing by the identification processing unit 290 of the data processing device 12 in Application Example 3 is realized by the following means. In this invention, the server includes means for detecting incoming fraudulent calls, means for listening to conversations from a smartphone application or a microphone installed in a home phone, means for converting the conversation data into text and sending it to the generative AI, means for issuing an audio alert from the microphone if the generative AI determines that fraud is suspected based on the conversation data, means for analyzing the user's emotions using an emotion engine, means for determining the possibility of fraud by combining information from the emotion engine with the conversation data analyzed by the generative AI, and means for displaying a warning to the user if there is a high possibility of fraud. This makes it possible to more accurately determine the possibility of fraud and provide the user with an appropriate warning.

「詐欺電話の入電を検知する手段」とは、電話回線や通信ネットワークを監視し、詐欺の可能性がある電話の着信を検出する装置やソフトウェアである。 "Means for detecting incoming fraudulent calls" refers to devices or software that monitor telephone lines or communications networks and detect incoming calls that may be fraudulent.

「スマートフォンアプリケーションまたは家庭用電話に設置したマイク」とは、音声を収集するためにスマートフォンや家庭用電話に取り付けられた音声入力装置である。 "A microphone installed in a smartphone application or home phone" is a voice input device attached to a smartphone or home phone to collect audio.

「会話を聞き取る手段」とは、マイクを通じて収集された音声データを解析し、会話内容を認識する装置やソフトウェアである。 "Means for listening to conversations" refers to devices or software that analyze audio data collected through a microphone and recognize the content of the conversation.

「会話データを文字化し生成系ＡＩへ送信する手段」とは、音声データをテキストデータに変換し、そのテキストデータを生成系ＡＩに送信する装置やソフトウェアである。 "Means for converting conversation data into text and sending it to the generative AI" refers to a device or software that converts voice data into text data and sends that text data to the generative AI.

「生成系ＡＩ」とは、特定のタスクを実行するために訓練された人工知能モデルであり、ここでは詐欺の可能性を判断するために使用される。 "Generative AI" is an artificial intelligence model trained to perform a specific task, in this case to determine the likelihood of fraud.

「詐欺が疑われると判断した場合に、前記マイクから音声アラートを発する手段」とは、生成系ＡＩが詐欺の可能性を検出した際に、マイクを通じて警告音やメッセージを発する装置やソフトウェアである。 "Means for issuing an audio alert from the microphone when it is determined that fraud is suspected" refers to a device or software that issues an alarm or message through the microphone when the generative AI detects the possibility of fraud.

「感情エンジン」とは、ユーザーの音声やテキストデータから感情状態を解析するためのソフトウェアまたは装置である。 An "emotion engine" is software or a device that analyzes a user's emotional state from their voice or text data.

「ユーザーの感情を分析する手段」とは、感情エンジンを用いてユーザーの感情状態を解析する装置やソフトウェアである。 "Means for analyzing user emotions" refers to devices or software that use an emotion engine to analyze the user's emotional state.

「感情エンジンからの情報と生成系ＡＩが分析した会話データを組み合わせて詐欺の可能性を判断する手段」とは、感情エンジンから得られた感情情報と生成系ＡＩによる会話データの解析結果を統合し、詐欺の可能性を総合的に判断する装置やソフトウェアである。 "Means for determining the possibility of fraud by combining information from the emotion engine with conversation data analyzed by generative AI" refers to devices or software that integrate emotional information obtained from the emotion engine with the results of analysis of conversation data by generative AI, and make a comprehensive determination of the possibility of fraud.

「詐欺の可能性が高い場合にユーザーに警告を表示する手段」とは、詐欺の可能性が高いと判断された場合に、ユーザーに対して視覚的または聴覚的に警告を提供する装置やソフトウェアである。 "Means for displaying a warning to the user in the event of a high likelihood of fraud" refers to devices or software that provide a visual or audible warning to the user in the event that a high likelihood of fraud is determined.

この発明を実施するためのシステムは、詐欺電話の入電を検知し、ユーザーに警告を提供するための一連の手段を含む。以下に、具体的な実施形態について説明する。 A system for implementing this invention includes a series of means for detecting incoming fraudulent calls and providing a warning to the user. Specific embodiments are described below.

まず、サーバは詐欺電話の入電を検知する手段を備えている。この手段は、電話回線や通信ネットワークを監視し、詐欺の可能性がある電話の着信を検出する装置やソフトウェアである。例えば、特定の電話番号や発信元をブラックリストに登録し、それに基づいて詐欺電話を検知することができる。 First, the server is equipped with a means for detecting incoming fraudulent calls. This means is a device or software that monitors telephone lines and communication networks to detect potentially fraudulent calls. For example, specific phone numbers or callers can be registered on a blacklist, and fraudulent calls can be detected based on this.

次に、端末（スマートフォンアプリケーションまたは家庭用電話）には、会話を聞き取るためのマイクが設置されている。このマイクは、通話中の音声を収集し、会話データとしてサーバに送信する。会話データは、音声認識技術を用いて文字化され、生成系ＡＩに送信される。 Next, the device (smartphone application or home phone) is equipped with a microphone to pick up on the conversation. This microphone collects the voice during the call and sends it to the server as conversation data. The conversation data is then converted into text using voice recognition technology and sent to the generative AI.

生成系ＡＩは、会話データを解析し、詐欺の可能性を判断する。この際、生成系ＡＩは特定のフレーズやパターンを検出するために訓練されたモデルを使用する。例えば、「振り込め詐欺」や「オレオレ詐欺」の典型的なフレーズを検出することができる。 Generative AI analyzes conversation data to determine the likelihood of fraud. In doing so, it uses models trained to detect specific phrases and patterns. For example, it can detect typical phrases such as "bank transfer fraud" and "it's me" fraud.

さらに、感情エンジンがユーザーの感情を分析する。感情エンジンは、音声やテキストデータからユーザーの感情状態を解析するためのソフトウェアである。感情エンジンから得られた感情情報と生成系ＡＩによる会話データの解析結果を統合し、詐欺の可能性を総合的に判断する。 Furthermore, an emotion engine analyzes the user's emotions. The emotion engine is software that analyzes the user's emotional state from voice and text data. The emotional information obtained from the emotion engine is integrated with the results of analysis of conversation data by generative AI to make a comprehensive judgment on the possibility of fraud.

詐欺の可能性が高いと判断された場合、端末はユーザーに対して警告を表示する。この警告は、視覚的または聴覚的に提供される。例えば、スマートフォンの画面に警告メッセージを表示したり、音声アラートを発することができる。 If it determines that there is a high possibility of fraud, the device will display a warning to the user. This warning can be provided visually or audibly. For example, a warning message can be displayed on the smartphone screen or an audio alert can be issued.

具体例として、以下のようなユーザー入力が考えられる： For example, consider the following user input:

「あなたの口座が凍結されました。急いで振り込んでください。」 "Your account has been frozen. Please transfer the funds immediately."

「オレオレ、今すぐお金が必要なんだ。」 "Hey, I need money right now."

これらのプロンプト文を生成系ＡＩモデルに入力することで、詐欺の可能性を解析し、感情エンジンと連携してユーザーに警告を表示することができる。 By inputting these prompts into a generative AI model, it can analyze the possibility of fraud and, in conjunction with an emotion engine, display a warning to the user.

応用例３における特定処理の流れについて図２２を用いて説明する。 The flow of the specific processing in Application Example 3 will be explained using Figure 22.

ステップ１： Step 1:

サーバは、電話回線や通信ネットワークを監視し、詐欺の可能性がある電話の着信を検出する。入力は電話回線や通信ネットワークからの信号であり、出力は詐欺電話の入電を示すフラグである。具体的には、特定の電話番号や発信元をブラックリストに登録し、それに基づいて詐欺電話を検知する。 The server monitors telephone lines and communication networks to detect incoming calls that may be fraudulent. The input is a signal from the telephone line or communication network, and the output is a flag indicating an incoming fraudulent call. Specifically, it registers specific phone numbers and callers on a blacklist and detects fraudulent calls based on that.

ステップ２： Step 2:

端末（スマートフォンアプリケーションまたは家庭用電話）は、マイクを通じて通話中の音声を収集する。入力は通話中の音声信号であり、出力は音声データである。具体的には、マイクが音声をデジタル信号に変換し、そのデータをサーバに送信する。 The device (smartphone application or home phone) collects audio during a call through a microphone. The input is the audio signal during the call, and the output is audio data. Specifically, the microphone converts the audio into a digital signal and sends that data to a server.

ステップ３： Step 3:

サーバは、音声認識技術を用いて音声データを文字化する。入力は音声データであり、出力はテキストデータである。具体的には、音声認識ソフトウェアが音声データを解析し、対応するテキストに変換する。 The server uses voice recognition technology to convert the voice data into text. The input is voice data and the output is text data. Specifically, the voice recognition software analyzes the voice data and converts it into corresponding text.

ステップ４： Step 4:

サーバは、生成系ＡＩにテキストデータを送信し、詐欺の可能性を解析する。入力はテキストデータであり、出力は詐欺の可能性を示すスコアやフラグである。具体的には、生成系ＡＩモデルが特定のフレーズやパターンを検出し、詐欺の可能性を判断する。 The server sends text data to the generative AI, which analyzes it for the possibility of fraud. The input is text data, and the output is a score or flag indicating the possibility of fraud. Specifically, the generative AI model detects specific phrases and patterns and determines the possibility of fraud.

ステップ５： Step 5:

サーバは、感情エンジンを用いてユーザーの感情を分析する。入力はテキストデータであり、出力は感情状態を示すデータである。具体的には、感情エンジンがテキストデータを解析し、ユーザーの感情状態を特定する。 The server uses an emotion engine to analyze the user's emotions. The input is text data, and the output is data indicating the user's emotional state. Specifically, the emotion engine analyzes the text data and identifies the user's emotional state.

ステップ６： Step 6:

サーバは、感情エンジンからの情報と生成系ＡＩが分析した会話データを組み合わせて詐欺の可能性を総合的に判断する。入力は感情状態データと詐欺の可能性を示すスコアやフラグであり、出力は最終的な詐欺の可能性を示すフラグである。具体的には、感情情報と生成系ＡＩの解析結果を統合し、詐欺の可能性を高精度に判断する。 The server combines information from the emotion engine with conversation data analyzed by the generative AI to make a comprehensive judgment on the possibility of fraud. The input is emotional state data and a score or flag indicating the possibility of fraud, and the output is a final flag indicating the possibility of fraud. Specifically, it integrates emotional information with the analysis results of the generative AI to make a highly accurate judgment on the possibility of fraud.

ステップ７： Step 7:

詐欺の可能性が高いと判断された場合、端末はユーザーに対して警告を表示する。入力は最終的な詐欺の可能性を示すフラグであり、出力は警告メッセージや音声アラートである。具体的には、スマートフォンの画面に警告メッセージを表示したり、音声アラートを発する。 If it determines that there is a high possibility of fraud, the device will display a warning to the user. The input is a flag indicating the final possibility of fraud, and the output is a warning message or audio alert. Specifically, a warning message is displayed on the smartphone screen or an audio alert is issued.

特定処理部２９０は、特定処理の結果をスマートデバイス１４に送信する。スマートデバイス１４では、制御部４６Ａが、出力装置４０に対して特定処理の結果を出力させる。マイクロフォン３８Ｂは、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン３８Ｂによって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the results of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the results of the specific processing. The microphone 38B acquires audio indicating the user input regarding the results of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

データ生成モデル５８は、いわゆる生成ＡＩ（Artificial Intelligence）である。データ生成モデル５８の一例としては、ChatGPT（登録商標）（インターネット検索＜URL: https://openai.com/blog/chatgpt＞）等の生成ＡＩが挙げられる。データ生成モデル５８は、ニューラルネットワークに対して深層学習を行わせることによって得られる。データ生成モデル５８には、指示を含むプロンプトが入力され、かつ、音声を示す音声データ、テキストを示すテキストデータ、及び画像を示す画像データ等の推論用データが入力される。データ生成モデル５８は、入力された推論用データをプロンプトにより示される指示に従って推論し、推論結果を音声データ及びテキストデータ等のデータ形式で出力する。ここで、推論とは、例えば、分析、分類、予測、及び／又は要約等を指す。 Data generation model 58 is what is known as generative AI (artificial intelligence). An example of data generation model 58 is generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>). Data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing an instruction is input to data generation model 58, and inference data such as voice data indicating voice, text data indicating text, and image data indicating an image is also input. Data generation model 58 performs inference on the input inference data in accordance with the instructions indicated by the prompt, and outputs the inference results in the form of data such as voice data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

生成AIの他の例としては、Ｇｅｍｉｎｉ（登録商標）（インターネット検索＜URL: https://gemini.google.com/?hl=ja＞）が挙げられる。 Another example of generative AI is Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>).

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、スマートデバイス１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing was performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the smart device 14.

［第２実施形態］ [Second embodiment]

図３には、第２実施形態に係るデータ処理システム２１０の構成の一例が示されている。 Figure 3 shows an example of the configuration of a data processing system 210 according to the second embodiment.

図３に示すように、データ処理システム２１０は、データ処理装置１２及びスマート眼鏡２１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

スマート眼鏡２１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、及び通信Ｉ／Ｆ４４を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、及びカメラ４２も、バス５２に接続されている。 The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication I/F 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

マイクロフォン２３８は、ユーザ２０が発する音声を受け付けることで、ユーザ２０から指示等を受け付ける。マイクロフォン２３８は、ユーザ２０が発する音声を捕捉し、捕捉した音声を音声データに変換してプロセッサ４６に出力する。スピーカ２４０は、プロセッサ４６からの指示に従って音声を出力する。 The microphone 238 receives instructions and the like from the user 20 by receiving voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio in accordance with instructions from the processor 46.

カメラ４２は、レンズ、絞り、及びシャッタ等の光学系と、ＣＭＯＳ（Complementary Metal-Oxide-Semiconductor）イメージセンサ又はＣＣＤ（Charge Coupled Device）イメージセンサ等の撮像素子とが搭載された小型デジタルカメラであり、ユーザ２０の周囲（例えば、一般的な健常者の視界の広さに相当する画角で規定された撮像範囲）を撮像する。 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an imaging element such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the user 20's surroundings (e.g., an imaging range defined by an angle of view equivalent to the field of vision of a typical healthy person).

通信Ｉ／Ｆ４４は、ネットワーク５４に接続されている。通信Ｉ／Ｆ４４及び２６は、ネットワーク５４を介してプロセッサ４６とプロセッサ２８との間の各種情報の授受を司る。通信Ｉ／Ｆ４４及び２６を用いたプロセッサ４６とプロセッサ２８との間の各種情報の授受はセキュアな状態で行われる。 The communication I/F 44 is connected to the network 54. The communication I/Fs 44 and 26 are responsible for the exchange of various information between the processor 46 and the processor 28 via the network 54. The exchange of various information between the processor 46 and the processor 28 using the communication I/Fs 44 and 26 is carried out in a secure manner.

図４には、データ処理装置１２及びスマート眼鏡２１４の要部機能の一例が示されている。図４に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 4 shows an example of the main functions of the data processing device 12 and smart glasses 214. As shown in Figure 4, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32.

特定処理プログラム５６は、本開示の技術に係る「プログラム」の一例である。プロセッサ２８は、ストレージ３２から特定処理プログラム５６を読み出し、読み出した特定処理プログラム５６をＲＡＭ３０上で実行する。特定処理は、プロセッサ２８がＲＡＭ３０上で実行する特定処理プログラム５６に従って、特定処理部２９０として動作することによって実現される。 The specific processing program 56 is an example of a "program" according to the technology of the present disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as the specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

スマート眼鏡２１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the smart glasses 214, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

ステップ３：聞き取った会話データは文字化され、生成系ＡＩへ送信される。 Step 3: The conversation data heard is transcribed and sent to the generative AI.

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

次に、形態例１の実施例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, Example 1 of Form Example 1 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

ユーザが電話に出ると、端末はマイクを通じて会話をリアルタイムで聞き取る。この際、アプリケーションはバックグラウンドで動作し、会話内容をキャプチャする。聞き取った会話データは、Google Speech-to-Text APIを使用して文字化される。音声データをAPIに送信し、テキストデータとして受け取る。 When a user answers a call, the device listens to the conversation in real time through the microphone. At this time, the application runs in the background and captures the conversation. The listened conversation data is transcribed using the Google Speech-to-Text API. The audio data is sent to the API and received as text data.

文字化された会話データは、インターネットを通じてサーバに送信される。送信にはHTTPSプロトコルを使用し、データのセキュリティを確保する。サーバは、受信した会話データを生成AIモデル（例えば、OpenAIのGPT-4）に入力する。生成AIモデルは、プロンプト文を用いて詐欺の可能性を解析する。具体的なプロンプト文の例は以下の通りである： The transcribed conversation data is sent to a server via the internet. The HTTPS protocol is used for transmission to ensure data security. The server inputs the received conversation data into a generative AI model (e.g., OpenAI's GPT-4). The generative AI model uses prompt text to analyze the possibility of fraud. Specific examples of prompt text are as follows:

ステップ１：入電の検知 Step 1: Detect incoming calls

ステップ２：会話の聞き取り Step 2: Listen to the conversation

ステップ４：文字化データの送信 Step 4: Send the transcribed data

ステップ５：詐欺判定 Step 5: Fraud Detection

ステップ６：判定結果の受信 Step 6: Receive the results

ステップ７：音声アラートの発生 Step 7: Receive an audio alert

（応用例１） (Application Example 1)

次に、形態例１の応用例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, we will explain Application Example 1 of Form Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

ソフトウェア： Software:

システムの動作 System Operation

具体例 Specific examples

プロンプト文の例 Example prompt

応用例１における特定処理の流れについて図１２を用いて説明する。 The flow of the identification process in Application Example 1 will be explained using Figure 12.

ステップ１：入電検知 Step 1: Incoming call detection

ステップ２：会話の録音 Step 2: Record the conversation

ステップ３：会話の文字化 Step 3: Transcribe the conversation

ステップ６：アラート発信 Step 6: Send an alert

（実施例２） (Example 2)

次に、形態例２の実施例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, Example 2 of Form Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

「会話データを文字データに変換する手段」とは、収集された音声データを解析し、テキスト形式に変換するための音声認識ソフトウェアやアルゴリズムである。 "Means for converting speech data into text data" refers to speech recognition software or algorithms used to analyze collected voice data and convert it into text format.

1. システムの構成 1. System Configuration

音声入力装置 Voice input device

生成AIモデル Generative AI model

音声アラートを発する手段 Method for issuing audio alerts

3. システムの動作 3. System Operation

4. 具体例 4. Specific Examples

5. プロンプト文の例 5. Prompt Sentence Examples

ステップ１： Step 1:

音声の収集 Audio collection

入力: ユーザーの音声会話 Input: User's voice conversation

出力: デジタル音声データ Output: Digital audio data

ステップ２： Step 2:

音声データの文字化 Transcription of audio data

入力: デジタル音声データ Input: Digital audio data

出力: テキストデータ Output: Text data

ステップ３： Step 3:

テキストデータの送信 Send text data

入力: テキストデータ Input: Text data

出力: サーバへのデータ送信 Output: Send data to the server

ステップ４： Step 4:

詐欺の検出 Fraud detection

入力: テキストデータ Input: Text data

ステップ５： Step 5:

音声アラートの発生 Audio alert occurs

入力: アラート信号 Input: Alert signal

出力: 音声アラート Output: Audio alert

具体的な動作: 家庭用電話のスピーカーが「詐欺の可能性があります。注意してください。」という音声アラートを再生する。 Specific action: The speaker on your home phone will play a voice alert saying, "This may be a scam. Be careful."

（応用例２） (Application Example 2)

次に、形態例２の応用例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, we will explain Application Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

ステップ１： Step 1:

入力：通話音声 Input: Call audio

出力：録音された音声データ Output: Recorded audio data

ステップ２： Step 2:

入力：録音された音声データ Input: Recorded audio data

出力：テキストデータ Output: Text data

ステップ３： Step 3:

入力：テキストデータ Input: Text data

ステップ４： Step 4:

出力：音声アラートの内容 Output: Audio alert content

ステップ５： Step 5:

入力：音声アラートの内容 Input: Audio alert content

出力：音声アラート Output: Audio alert

（実施例３） (Example 3)

次に、形態例３の実施例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, we will explain Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

システムの構成 System Configuration

主語：サーバ Subject: Server

主語：端末 Subject: Terminal

主語：ユーザ Subject: User

具体例 Specific examples

具体例1 Example 1

ユーザが「お母さん、今すぐお金を振り込んでください」というメッセージを受け取った場合、以下のようにシステムが動作する： When a user receives a message saying, "Mom, please transfer the money now," the system works as follows:

具体例2 Example 2

プロンプト文の例 Example prompt

ステップ１： Step 1:

ユーザがテキストデータを入力する。 The user enters text data.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

（応用例３） (Application Example 3)

次に、形態例３の応用例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, we will explain Application Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

ステップ１： Step 1:

詐欺電話の入電を検知する Detect incoming fraudulent calls

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ユーザーに警告を発する Warn the user

なお、更に、ユーザの感情を推定する感情エンジンを組み合わせてもよい。すなわち、特定処理部２９０は、感情特定モデル５９を用いてユーザの感情を推定し、ユーザの感情を用いた特定処理を行うようにしてもよい。 It is also possible to further combine an emotion engine that estimates the user's emotion. That is, the identification processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform identification processing using the user's emotion.

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

ステップ６：生成系ＡＩが感情エンジンからの情報を基に、詐欺の可能性が更に高まった Step 6: The generative AI uses information from the emotion engine to further increase the likelihood of fraud.

と判断し、音声アラートを発する。 and issues an audio alert.

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

発明を実施するための形態 Form for implementing the invention

1. 通信端末 1. Communication terminal

2. サーバ 2. Server

感情分析には、感情エンジン（例：IBM Watson Tone Analyzer）を使用する。 For sentiment analysis, we use an emotion engine (e.g., IBM Watson Tone Analyzer).

システムの具体的な動作 Specific system operation

1. 電話の入電検知 1. Incoming phone call detection

2. 会話の聞き取り 2. Listening to Conversations

3. 音声データの文字化 3. Transcription of audio data

4. 生成系人工知能への送信 4. Sending to generative AI

5. 詐欺の検知 5. Fraud Detection

6. 感情の分析 6. Sentiment Analysis

7. パニック状態の検知 7. Panic Detection

8. 詐欺の可能性の再評価 8. Reassessing the possibility of fraud

9. 音声アラートの発信 9. Audio alerts

具体例とプロンプト文 Examples and prompts

プロンプト文の例: Example prompt:

ユーザーの声のトーン: 急に上がる User's voice tone: Sudden rise

ユーザーの音量: 大きくなる」 User volume: Increased"

ステップ１： Step 1:

電話の入電検知 Incoming phone call detection

具体的な動作: アプリケーションは、AndroidやiOSのネイティブAPIを使用して電話の入電イベントをキャッチし、サーバに通知する。 Specific behavior: The application uses native Android and iOS APIs to catch incoming phone calls and notify the server.

ステップ２： Step 2:

会話の聞き取り Listening to conversations

入力: 通話中の音声データ。 Input: Audio data during the call.

出力: 音声データ。 Output: Audio data.

ステップ３： Step 3:

音声データの文字化 Transcription of audio data

入力: 音声データ。 Input: Audio data.

出力: 文字データ。 Output: Character data.

ステップ４： Step 4:

生成系人工知能への送信 Send to generative AI

入力: 文字データ。 Input: Character data.

ステップ５： Step 5:

詐欺の検知 Fraud detection

入力: 文字データ。 Input: Character data.

ステップ６： Step 6:

感情の分析 Emotion analysis

入力: 音声データ。 Input: Audio data.

出力: 感情パラメータ。 Output: Emotion parameters.

ステップ７： Step 7:

パニック状態の検知 Detecting panic states

入力: 感情パラメータ。 Input: Emotion parameters.

ステップ８： Step 8:

詐欺の可能性の再評価 Reassessing the possibility of fraud

入力: 感情パラメータ。 Input: Emotion parameters.

ステップ９： Step 9:

音声アラートの発信 Send audio alerts

出力: 音声アラートの発信。 Output: Plays an audio alert.

（応用例１） (Application Example 1)

システム構成 System Configuration

1. ハードウェア 1. Hardware

2. ソフトウェア 2. Software

音声アラートライブラリ：pyttsx3ライブラリを使用して、音声アラートを発する。 Audio Alert Library: Uses the pyttsx3 library to emit audio alerts.

処理の流れ Processing flow

1. 入電検知 1. Incoming call detection

2. 音声認識 2. Voice Recognition

3. 生成系AIによる詐欺検出 3. Fraud Detection Using Generative AI

4. 感情分析 4. Sentiment analysis

5. 音声アラート 5. Audio alerts

具体例 Specific examples

例えば、ユーザーが電話を受けた際に、詐欺の可能性があると判断された場合、通信端末のスピーカーから「警告！詐欺の可能性があります。」という音声アラートが発せられる。このアラートにより、ユーザーは詐欺の可能性に気づき、冷静な対応を取ることができる。 For example, if a user receives a phone call and it is determined that the call may be a scam, an audio alert will sound from the communication device's speaker saying, "Warning! Possible scam." This alert will make the user aware of the possibility of fraud and allow them to respond calmly.

プロンプト文の例 Example prompt

ステップ１： Step 1:

データ加工：入電信号を受信し、通話開始をトリガーとして処理を開始する。 Data processing: Receives incoming call signals and begins processing when the call starts.

出力：通話開始のフラグ。 Output: Call start flag.

ステップ２： Step 2:

入力：通話中の音声。 Input: Audio during a call.

出力：取得された音声データ。 Output: Captured audio data.

ステップ３： Step 3:

入力：取得された音声データ。 Input: Acquired audio data.

出力：変換されたテキストデータ。 Output: Converted text data.

ステップ４： Step 4:

入力：変換されたテキストデータ。 Input: Converted text data.

ステップ５： Step 5:

ステップ６： Step 6:

出力：音声アラート。 Output: Audio alert.

（実施例２） (Example 2)

次に、形態例２の実施例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、スマート眼鏡２１４を「端末」と称する。 Next, we will explain Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

発明を実施するための形態 Form for implementing the invention

システムの動作 System Operation

5. 生成AIモデルが詐欺の可能性が高いと判断した場合、サーバが端末に指示を送り、通信装置のスピーカーから「詐欺の可能性があります。注意してください」という音声アラートを発する。 5. If the generative AI model determines that there is a high possibility of fraud, the server will send instructions to the terminal and an audio alert will be issued from the communication device's speaker saying, "There is a possibility of fraud. Please be careful."

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

ステップ８： Step 8:

（応用例２） (Application Example 2)

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（実施例３） (Example 3)

サーバは、生成系人工知能（例えば、BERTやGPT-3などの自然言語処理モデル）を用いて、テキストデータを解析する。生成系人工知能は、詐欺の可能性がある特定のフレーズやパターンを検出し、その結果を評価する。例えば、「お金を振り込んでください」というフレーズが含まれている場合、詐欺の可能性が高いと判断する。 The server analyzes the text data using generative artificial intelligence (for example, natural language processing models such as BERT or GPT-3). The generative artificial intelligence detects specific phrases and patterns that may be fraudulent and evaluates the results. For example, if the phrase "Please transfer money" is included, it will determine that there is a high possibility of fraud.

さらに、サーバは感情解析エンジン（例えば、IBM WatsonやMicrosoft Azureの感情分析API）を使用して、ユーザの感情を分析する。ユーザの音声やテキストデータを解析し、不安や緊張などの感情状態を特定する。感情解析エンジンが得た感情データは、生成系人工知能に送信される。 Furthermore, the server uses an emotion analysis engine (for example, IBM Watson or Microsoft Azure's emotion analysis API) to analyze the user's emotions. It analyzes the user's voice and text data to identify emotional states such as anxiety or tension. The emotion data obtained by the emotion analysis engine is sent to the generative artificial intelligence.

プロンプト文の例： Example prompt:

ステップ１： Step 1:

ユーザがメッセージを入力する。 The user enters a message.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（応用例３） (Application Example 3)

次に、端末（スマートフォンアプリケーションまたは家庭用電話）には、会話を聞き取るためのマイクが設置されている。このマイクは、通話中の音声を収集し、会話データとしてサーバに送信する。会話データは、音声認識技術を用いて文字化され、生成系ＡＩに送信される。 Next, the device (smartphone application or home phone) is equipped with a microphone for listening to the conversation. This microphone collects the voice during the call and sends it to the server as conversation data. The conversation data is converted into text using voice recognition technology and sent to the generative AI.

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

サーバは、音声認識技術を用いて音声データを文字化する。入力は音声データであり、出力はテキストデータである。具体的には、音声認識ソフトウェアが音声データを解析し、対応するテキストに変換する。 The server uses voice recognition technology to convert the voice data into text. The input is voice data and the output is text data. Specifically, the voice recognition software analyzes the voice data and converts it into the corresponding text.

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

特定処理部２９０は、特定処理の結果をスマート眼鏡２１４に送信する。スマート眼鏡２１４では、制御部４６Ａが、スピーカ２４０に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the results of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the results of the specific processing. The microphone 238 acquires audio indicating the user input regarding the results of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

データ生成モデル５８は、いわゆる生成ＡＩ（Artificial Intelligence）である。データ生成モデル５８の一例としては、ＣｈａｔＧＰＴ（インターネット検索＜URL: https://openai.com/blog/chatgpt＞）等の生成ＡＩが挙げられる。データ生成モデル５８は、ニューラルネットワークに対して深層学習を行わせることによって得られる。データ生成モデル５８には、指示を含むプロンプトが入力され、かつ、音声を示す音声データ、テキストを示すテキストデータ、及び画像を示す画像データ等の推論用データが入力される。データ生成モデル５８は、入力された推論用データをプロンプトにより示される指示に従って推論し、推論結果を音声データ及びテキストデータ等のデータ形式で出力する。ここで、推論とは、例えば、分析、分類、予測、及び／又は要約等を指す。 Data generation model 58 is what is known as generative AI (artificial intelligence). An example of data generation model 58 is generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>). Data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing an instruction is input to data generation model 58, and inference data such as voice data indicating voice, text data indicating text, and image data indicating an image is also input. Data generation model 58 performs inference on the input inference data in accordance with the instructions indicated by the prompt, and outputs the inference results in the form of data such as voice data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

生成AIの他の例としては、Gemini（インターネット検索＜URL: https://gemini.google.com/?hl=ja＞）が挙げられる。 Another example of generative AI is Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>).

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、スマート眼鏡２１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing was performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the smart glasses 214.

［第３実施形態］ [Third embodiment]

図５には、第３実施形態に係るデータ処理システム３１０の構成の一例が示されている。 Figure 5 shows an example of the configuration of a data processing system 310 according to the third embodiment.

図５に示すように、データ処理システム３１０は、データ処理装置１２及びヘッドセット型端末３１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

ヘッドセット型端末３１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、通信Ｉ／Ｆ４４、及びディスプレイ３４３を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、カメラ４２、及びディスプレイ３４３も、バス５２に接続されている。 The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication I/F 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, the speaker 240, the camera 42, and the display 343 are also connected to the bus 52.

図６には、データ処理装置１２及びヘッドセット型端末３１４の要部機能の一例が示されている。図６に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 6 shows an example of the main functions of the data processing device 12 and headset terminal 314. As shown in Figure 6, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32.

ヘッドセット型端末３１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the headset terminal 314, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

次に、形態例１の実施例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, Example 1 of Form Example 1 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

「詐欺が疑われると判断した場合に、前記通信端末のスピーカーから音声アラートを発する手段」とは、生成AIモデルが詐欺の可能性を検出した際に、通信端末のスピーカーを通じて警告音やメッセージを再生するための機能や装置である。 "Means for issuing an audio alert from the speaker of the communication device when it is determined that fraud is suspected" refers to a function or device that plays an alert sound or message through the speaker of the communication device when the generative AI model detects the possibility of fraud.

ステップ１：入電の検知 Step 1: Detect incoming calls

ステップ２：会話の聞き取り Step 2: Listen to the conversation

ステップ４：文字化データの送信 Step 4: Send the transcribed data

ステップ５：詐欺判定 Step 5: Fraud Detection

ステップ６：判定結果の受信 Step 6: Receive the results

ステップ７：音声アラートの発生 Step 7: Receive an audio alert

（応用例１） (Application Example 1)

次に、形態例１の応用例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, we will explain Application Example 1 of Form Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

ソフトウェア： Software:

システムの動作 System Operation

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１：入電検知 Step 1: Incoming call detection

ステップ２：会話の録音 Step 2: Record the conversation

ステップ３：会話の文字化 Step 3: Transcribe the conversation

ステップ６：アラート発信 Step 6: Send an alert

（実施例２） (Example 2)

次に、形態例２の実施例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, Example 2 of Form Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

1. システムの構成 1. System Configuration

音声入力装置 Voice input device

生成AIモデル Generative AI model

音声アラートを発する手段 Method for issuing audio alerts

3. システムの動作 3. System Operation

4. 具体例 4. Specific Examples

5. プロンプト文の例 5. Prompt Sentence Examples

ステップ１： Step 1:

音声の収集 Audio collection

入力: ユーザーの音声会話 Input: User's voice conversation

出力: デジタル音声データ Output: Digital audio data

ステップ２： Step 2:

音声データの文字化 Transcription of audio data

入力: デジタル音声データ Input: Digital audio data

出力: テキストデータ Output: Text data

ステップ３： Step 3:

テキストデータの送信 Send text data

入力: テキストデータ Input: Text data

出力: サーバへのデータ送信 Output: Send data to the server

ステップ４： Step 4:

詐欺の検出 Fraud detection

入力: テキストデータ Input: Text data

ステップ５： Step 5:

音声アラートの発生 Audio alert occurs

入力: アラート信号 Input: Alert signal

出力: 音声アラート Output: Audio alert

（応用例２） (Application Example 2)

次に、形態例２の応用例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, we will explain Application Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

ステップ１： Step 1:

入力：通話音声 Input: Call audio

出力：録音された音声データ Output: Recorded audio data

ステップ２： Step 2:

入力：録音された音声データ Input: Recorded audio data

出力：テキストデータ Output: Text data

ステップ３： Step 3:

入力：テキストデータ Input: Text data

ステップ４： Step 4:

出力：音声アラートの内容 Output: Audio alert content

ステップ５： Step 5:

入力：音声アラートの内容 Input: Audio alert content

出力：音声アラート Output: Audio alert

（実施例３） (Example 3)

次に、形態例３の実施例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, we will explain Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

システムの構成 System Configuration

主語：サーバ Subject: Server

主語：端末 Subject: Terminal

主語：ユーザ Subject: User

具体例 Specific examples

具体例1 Example 1

具体例2 Example 2

ユーザが「あなたの銀行口座が不正利用されています。今すぐこちらに連絡してください」というメールを受け取った場合、以下のようにシステムが動作する： When a user receives an email saying, "Your bank account has been compromised. Contact us immediately," the system behaves as follows:

プロンプト文の例 Example prompt

ステップ１： Step 1:

ユーザがテキストデータを入力する。 The user enters text data.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

（応用例３） (Application Example 3)

次に、形態例３の応用例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, we will explain Application Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

「生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段」とは、生成系AIが事前に学習した詐欺の典型的なフレーズやパターンを会話データから検出するための機能である。 "Means for the generative AI to detect specific phrases or patterns that indicate the possibility of fraud" refers to a function that allows the generative AI to detect typical fraudulent phrases and patterns that it has previously learned from conversation data.

ステップ１： Step 1:

詐欺電話の入電を検知する Detect incoming fraudulent calls

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ユーザーに警告を発する Warn the user

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

発明を実施するための形態 Form for implementing the invention

1. 通信端末 1. Communication terminal

2. サーバ 2. Server

システムの具体的な動作 Specific system operation

1. 電話の入電検知 1. Incoming phone call detection

2. 会話の聞き取り 2. Listening to Conversations

3. 音声データの文字化 3. Transcription of audio data

4. 生成系人工知能への送信 4. Sending to generative AI

5. 詐欺の検知 5. Fraud Detection

6. 感情の分析 6. Sentiment Analysis

7. パニック状態の検知 7. Panic Detection

8. 詐欺の可能性の再評価 8. Reassessing the possibility of fraud

9. 音声アラートの発信 9. Audio alerts

具体例とプロンプト文 Examples and prompts

プロンプト文の例: Example prompt:

ユーザーの声のトーン: 急に上がる User's voice tone: Sudden rise

ユーザーの音量: 大きくなる」 User volume: Increased"

ステップ１： Step 1:

電話の入電検知 Incoming phone call detection

ステップ２： Step 2:

会話の聞き取り Listening to conversations

入力: 通話中の音声データ。 Input: Audio data during the call.

出力: 音声データ。 Output: Audio data.

ステップ３： Step 3:

音声データの文字化 Transcription of audio data

入力: 音声データ。 Input: Audio data.

出力: 文字データ。 Output: Character data.

ステップ４： Step 4:

生成系人工知能への送信 Send to generative AI

入力: 文字データ。 Input: Character data.

ステップ５： Step 5:

詐欺の検知 Fraud detection

入力: 文字データ。 Input: Character data.

ステップ６： Step 6:

感情の分析 Emotion analysis

入力: 音声データ。 Input: Audio data.

出力: 感情パラメータ。 Output: Emotion parameters.

ステップ７： Step 7:

パニック状態の検知 Detecting panic states

入力: 感情パラメータ。 Input: Emotion parameters.

ステップ８： Step 8:

詐欺の可能性の再評価 Reassessing the possibility of fraud

入力: 感情パラメータ。 Input: Emotion parameters.

ステップ９： Step 9:

音声アラートの発信 Send audio alerts

出力: 音声アラートの発信。 Output: Plays an audio alert.

（応用例１） (Application Example 1)

システム構成 System Configuration

1. ハードウェア 1. Hardware

2. ソフトウェア 2. Software

処理の流れ Processing flow

1. 入電検知 1. Incoming call detection

2. 音声認識 2. Voice Recognition

3. 生成系AIによる詐欺検出 3. Fraud Detection Using Generative AI

4. 感情分析 4. Sentiment analysis

5. 音声アラート 5. Audio alerts

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１： Step 1:

出力：通話開始のフラグ。 Output: Call start flag.

ステップ２： Step 2:

入力：通話中の音声。 Input: Audio during a call.

出力：取得された音声データ。 Output: Captured audio data.

ステップ３： Step 3:

入力：取得された音声データ。 Input: Acquired audio data.

出力：変換されたテキストデータ。 Output: Converted text data.

ステップ４： Step 4:

入力：変換されたテキストデータ。 Input: Converted text data.

ステップ５： Step 5:

ステップ６： Step 6:

出力：音声アラート。 Output: Audio alert.

（実施例２） (Example 2)

発明を実施するための形態 Form for implementing the invention

システムの動作 System Operation

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

ステップ８： Step 8:

（応用例２） (Application Example 2)

具体例として、ユーザーが電話で「銀行の口座情報を教えてください」と言われた場合、生成系AIモデルが詐欺の可能性を検知し、音声アラートを発する。また、ユーザーが怒りや不安を感じている場合、感情エンジンがそれを検知し、音声アラートを発する。 For example, if a user calls and says, "Please tell me your bank account information," the generative AI model will detect the possibility of fraud and issue a voice alert. Also, if the user is feeling anger or anxiety, the emotion engine will detect this and issue a voice alert.

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

端末が文字化された会話データを生成系AIモデル（transformersライブラリを用いたモデル）に送信する。生成系AIモデルが会話データを解析し、詐欺の可能性があるかどうかを判断する。入力は文字化された会話データであり、出力は詐欺の可能性に関する判断結果である。 The device sends the transcribed conversation data to a generative AI model (a model using the transformers library). The generative AI model analyzes the conversation data and determines whether there is a possibility of fraud. The input is the transcribed conversation data, and the output is a judgment regarding the possibility of fraud.

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（実施例３） (Example 3)

プロンプト文の例： Example prompt:

ステップ１： Step 1:

ユーザがメッセージを入力する。 The user enters a message.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（応用例３） (Application Example 3)

次に、形態例３の応用例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ヘッドセット型端末３１４を「端末」と称する。 Next, we will explain Application Example 3 of Configuration Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the headset-type terminal 314 will be referred to as the "terminal."

「詐欺が疑われると判断した場合に、前記マイクから音声アラートを発する手段」とは、生成系ＡＩが詐欺の可能性を検出した際に、マイクを通じて警告音やメッセージを発する装置やソフトウェアである。 "Means for issuing an audio alert from the microphone when fraud is suspected" refers to a device or software that issues an alarm or message through the microphone when the generative AI detects the possibility of fraud.

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

特定処理部２９０は、特定処理の結果をヘッドセット型端末３１４に送信する。ヘッドセット型端末３１４では、制御部４６Ａが、スピーカ２４０及びディスプレイ３４３に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the results of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the results of the specific processing. The microphone 238 acquires audio indicating the user input regarding the results of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、ヘッドセット型端末３１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing was performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the headset-type terminal 314.

［第４実施形態］ [Fourth embodiment]

図７には、第４実施形態に係るデータ処理システム４１０の構成の一例が示されている。 Figure 7 shows an example of the configuration of a data processing system 410 according to the fourth embodiment.

図７に示すように、データ処理システム４１０は、データ処理装置１２及びロボット４１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

ロボット４１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、通信Ｉ／Ｆ４４、及び制御対象４４３を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、カメラ４２、及び制御対象４４３も、バス５２に接続されている。 The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication I/F 44, and a control target 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and control target 443 are also connected to the bus 52.

制御対象４４３は、表示装置、目部のＬＥＤ、並びに、腕、手及び足等を駆動するモータ等を含む。ロボット４１４の姿勢や仕草は、腕、手及び足等のモータを制御することにより制御される。ロボット４１４の感情の一部は、これらのモータを制御することにより表現できる。また、ロボット４１４の目部のＬＥＤの発光状態を制御することによっても、ロボット４１４の表情を表現できる。 The control object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the emotions of the robot 414 can be expressed by controlling these motors. In addition, the facial expressions of the robot 414 can also be expressed by controlling the light emission state of the LEDs in the eyes of the robot 414.

図８には、データ処理装置１２及びロボット４１４の要部機能の一例が示されている。図８に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32.

ロボット４１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the robot 414, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

次に、形態例１の実施例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, Example 1 of Form Example 1 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

ステップ１：入電の検知 Step 1: Detect incoming calls

ステップ２：会話の聞き取り Step 2: Listen to the conversation

ステップ４：文字化データの送信 Step 4: Send the transcribed data

ステップ５：詐欺判定 Step 5: Fraud Detection

ステップ６：判定結果の受信 Step 6: Receive the results

ステップ７：音声アラートの発生 Step 7: Receive an audio alert

（応用例１） (Application Example 1)

次に、形態例１の応用例１について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, we will explain Application Example 1 of Form Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

ソフトウェア： Software:

システムの動作 System Operation

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１：入電検知 Step 1: Incoming call detection

ステップ２：会話の録音 Step 2: Record the conversation

ステップ３：会話の文字化 Step 3: Transcribe the conversation

ステップ６：アラート発信 Step 6: Send an alert

（実施例２） (Example 2)

次に、形態例２の実施例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, Example 2 of Form Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

1. システムの構成 1. System Configuration

音声入力装置 Voice input device

生成AIモデル Generative AI model

音声アラートを発する手段 Method for issuing audio alerts

3. システムの動作 3. System Operation

4. 具体例 4. Specific Examples

5. プロンプト文の例 5. Prompt Sentence Examples

ステップ１： Step 1:

音声の収集 Audio collection

入力: ユーザーの音声会話 Input: User's voice conversation

出力: デジタル音声データ Output: Digital audio data

ステップ２： Step 2:

音声データの文字化 Transcription of audio data

入力: デジタル音声データ Input: Digital audio data

出力: テキストデータ Output: Text data

ステップ３： Step 3:

テキストデータの送信 Send text data

入力: テキストデータ Input: Text data

出力: サーバへのデータ送信 Output: Send data to the server

ステップ４： Step 4:

詐欺の検出 Fraud detection

入力: テキストデータ Input: Text data

ステップ５： Step 5:

音声アラートの発生 Audio alert occurs

入力: アラート信号 Input: Alert signal

出力: 音声アラート Output: Audio alert

（応用例２） (Application Example 2)

次に、形態例２の応用例２について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, we will explain Application Example 2 of Form Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

ステップ１： Step 1:

入力：通話音声 Input: Call audio

出力：録音された音声データ Output: Recorded audio data

ステップ２： Step 2:

入力：録音された音声データ Input: Recorded audio data

出力：テキストデータ Output: Text data

ステップ３： Step 3:

入力：テキストデータ Input: Text data

ステップ４： Step 4:

出力：音声アラートの内容 Output: Audio alert content

ステップ５： Step 5:

入力：音声アラートの内容 Input: Audio alert content

出力：音声アラート Output: Audio alert

（実施例３） (Example 3)

次に、形態例３の実施例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, Example 3 of Form Example 3 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

システムの構成 System Configuration

主語：サーバ Subject: Server

主語：端末 Subject: Terminal

主語：ユーザ Subject: User

具体例 Specific examples

具体例1 Example 1

具体例2 Example 2

プロンプト文の例 Example prompt

ステップ１： Step 1:

ユーザがテキストデータを入力する。 The user enters text data.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

（応用例３） (Application Example 3)

次に、形態例３の応用例３について説明する。以下の説明では、データ処理装置１２を「サーバ」と称し、ロボット４１４を「端末」と称する。 Next, we will explain Application Example 3 of Form Example 3. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 will be referred to as the "terminal."

ステップ１： Step 1:

詐欺電話の入電を検知する Detect incoming fraudulent calls

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ユーザーに警告を発する Warn the user

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

「形態例１」 "Example 1"

「形態例２」 "Example 2"

「形態例３」 "Example 3"

（実施例１） (Example 1)

発明を実施するための形態 Form for implementing the invention

1. 通信端末 1. Communication terminal

2. サーバ 2. Server

システムの具体的な動作 Specific system operation

1. 電話の入電検知 1. Incoming phone call detection

2. 会話の聞き取り 2. Listening to Conversations

3. 音声データの文字化 3. Transcription of audio data

4. 生成系人工知能への送信 4. Sending to generative AI

5. 詐欺の検知 5. Fraud Detection

6. 感情の分析 6. Sentiment Analysis

7. パニック状態の検知 7. Panic Detection

8. 詐欺の可能性の再評価 8. Reassessing the possibility of fraud

9. 音声アラートの発信 9. Audio alerts

具体例とプロンプト文 Examples and prompts

プロンプト文の例: Example prompt:

ユーザーの声のトーン: 急に上がる User's voice tone: Sudden rise

ユーザーの音量: 大きくなる」 User volume: Increased"

ステップ１： Step 1:

電話の入電検知 Incoming phone call detection

ステップ２： Step 2:

会話の聞き取り Listening to conversations

入力: 通話中の音声データ。 Input: Audio data during the call.

出力: 音声データ。 Output: Audio data.

ステップ３： Step 3:

音声データの文字化 Transcription of audio data

入力: 音声データ。 Input: Audio data.

出力: 文字データ。 Output: Character data.

ステップ４： Step 4:

生成系人工知能への送信 Send to generative AI

入力: 文字データ。 Input: Character data.

ステップ５： Step 5:

詐欺の検知 Fraud detection

入力: 文字データ。 Input: Character data.

ステップ６： Step 6:

感情の分析 Emotion analysis

入力: 音声データ。 Input: Audio data.

出力: 感情パラメータ。 Output: Emotion parameters.

ステップ７： Step 7:

パニック状態の検知 Detecting panic states

入力: 感情パラメータ。 Input: Emotion parameters.

ステップ８： Step 8:

詐欺の可能性の再評価 Reassessing the possibility of fraud

入力: 感情パラメータ。 Input: Emotion parameters.

具体的な動作: 生成系人工知能は、感情データを追加の入力として受け取り、詐欺の可能性を再評価する。 Specific behavior: The generative AI receives emotional data as additional input and reassess the likelihood of fraud.

ステップ９： Step 9:

音声アラートの発信 Send audio alerts

出力: 音声アラートの発信。 Output: Plays an audio alert.

（応用例１） (Application Example 1)

「会話データ」とは、音声入力装置によって取得された音声を文字化したデータである。 "Conversation data" refers to data that has been converted from speech captured by a voice input device.

システム構成 System Configuration

1. ハードウェア 1. Hardware

音声入力装置：通信端末に内蔵されたマイク。 Audio input device: A microphone built into the communication device.

2. ソフトウェア 2. Software

処理の流れ Processing flow

1. 入電検知 1. Incoming call detection

2. 音声認識 2. Voice Recognition

3. 生成系AIによる詐欺検出 3. Fraud Detection Using Generative AI

4. 感情分析 4. Sentiment analysis

5. 音声アラート 5. Audio alerts

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１： Step 1:

出力：通話開始のフラグ。 Output: Call start flag.

ステップ２： Step 2:

入力：通話中の音声。 Input: Audio during a call.

出力：取得された音声データ。 Output: Captured audio data.

ステップ３： Step 3:

入力：取得された音声データ。 Input: Acquired audio data.

出力：変換されたテキストデータ。 Output: Converted text data.

ステップ４： Step 4:

入力：変換されたテキストデータ。 Input: Converted text data.

ステップ５： Step 5:

ステップ６： Step 6:

出力：音声アラート。 Output: Audio alert.

（実施例２） (Example 2)

発明を実施するための形態 Form for implementing the invention

システムの動作 System Operation

具体例 Specific examples

プロンプト文の例 Example prompt

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

ステップ８： Step 8:

（応用例２） (Application Example 2)

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（実施例３） (Example 3)

プロンプト文の例： Example prompt:

ステップ１： Step 1:

ユーザがメッセージを入力する。 The user enters a message.

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

（応用例３） (Application Example 3)

さらに、感情エンジンがユーザーの感情を分析する。感情エンジンは、音声やテキストデータからユーザーの感情状態を解析するためのソフトウェアである。感情エンジンから得られた感情情報と生成系ＡＩによる会話データの解析結果を統合し、詐欺の可能性を総合的に判断する。 Furthermore, an emotion engine analyzes the user's emotions. The emotion engine is software that analyzes the user's emotional state from voice and text data. The emotional information obtained from the emotion engine is integrated with the results of the analysis of conversation data by generative AI to make a comprehensive judgment on the possibility of fraud.

ステップ１： Step 1:

ステップ２： Step 2:

ステップ３： Step 3:

ステップ４： Step 4:

ステップ５： Step 5:

ステップ６： Step 6:

ステップ７： Step 7:

特定処理部２９０は、特定処理の結果をロボット４１４に送信する。ロボット４１４では、制御部４６Ａが、スピーカ２４０及び制御対象４４３に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the results of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the control target 443 to output the results of the specific processing. The microphone 238 acquires audio indicating the user input regarding the results of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、ロボット４１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing was performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the robot 414.

なお、感情エンジンとしての感情特定モデル５９は、特定のマッピングに従い、ユーザの感情を決定してよい。具体的には、感情特定モデル５９は、特定のマッピングである感情マップ（図９参照）に従い、ユーザの感情を決定してよい。また、感情特定モデル５９は、同様に、ロボットの感情を決定し、特定処理部２９０は、ロボットの感情を用いた特定処理を行うようにしてもよい。 The emotion identification model 59, which serves as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is a specific mapping. Similarly, the emotion identification model 59 may determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

図９は、複数の感情がマッピングされる感情マップ４００を示す図である。感情マップ４００において、感情は、中心から放射状に同心円に配置されている。同心円の中心に近いほど、原始的状態の感情が配置されている。同心円のより外側には、心境から生まれる状態や行動を表す感情が配置されている。感情とは、情動や心的状態も含む概念である。同心円の左側には、概して脳内で起きる反応から生成される感情が配置されている。同心円の右側には概して、状況判断で誘導される感情が配置されている。同心円の上方向及び下方向には、概して脳内で起きる反応から生成され、かつ、状況判断で誘導される感情が配置されている。また、同心円の上側には、「快」の感情が配置され、下側には、「不快」の感情が配置されている。このように、感情マップ４００では、感情が生まれる構造に基づいて複数の感情がマッピングされており、同時に生じやすい感情が、近くにマッピングされている。 Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In emotion map 400, emotions are arranged in concentric circles radiating from the center. Emotions closer to the center of the concentric circles are more primitive. Emotions representing states and actions arising from a state of mind are arranged on the outer edges of the concentric circles. The concept of emotion includes both emotions and mental states. Emotions that are generally generated from reactions that occur in the brain are arranged on the left side of the concentric circles. Emotions that are generally induced by situational judgment are arranged on the right side of the concentric circles. Emotions that are generally generated from reactions that occur in the brain and are induced by situational judgment are arranged above and below the concentric circles. Furthermore, the emotion of "pleasure" is arranged on the top side of the concentric circles, and the emotion of "discomfort" is arranged on the bottom side. In this way, emotion map 400 maps multiple emotions based on the structure by which emotions are generated, with emotions that tend to occur simultaneously being mapped close together.

これらの感情は、感情マップ４００の３時の方向に分布しており、普段は安心と不安のあたりを行き来する。感情マップ４００の右半分では、内部的な感覚よりも状況認識の方が優位に立つため、落ち着いた印象になる。 These emotions are distributed in the 3 o'clock direction on emotion map 400, and usually fluctuate between relief and anxiety. In the right half of emotion map 400, situational awareness takes precedence over internal sensations, resulting in a sense of calm.

感情マップ４００の内側は心の中、感情マップ４００の外側は行動を表すため、感情マップ４００の外側に行くほど、感情が目に見える（行動に表れる）ようになる。 The inside of emotion map 400 represents what is going on in the mind, and the outside of emotion map 400 represents behavior, so the further out you go on emotion map 400, the more visible (expressed in behavior) the emotion becomes.

ここで、人の感情は、姿勢や血糖値のような様々なバランスを基礎としており、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示す。ロボットや自動車やバイク等においても、姿勢やバッテリー残量のような様々なバランスを基礎として、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示すように感情を作ることができる。感情マップは、例えば、光吉博士の感情地図（音声感情認識及び情動の脳生理信号分析システムに関する研究、徳島大学、博士論文：https://ci.nii.ac.jp/naid/500000375379）に基づいて生成されてよい。感情地図の左半分には、感覚が優位にたつ「反応」と呼ばれる領域に属する感情が並ぶ。また、感情地図の右半分には、状況認識が優位にたつ「状況」と呼ばれる領域に属する感情が並ぶ。 Here, human emotions are based on various balances such as posture and blood sugar levels, and when these balances deviate from the ideal, it indicates discomfort, and when they approach the ideal, it indicates pleasure. Emotions can also be created for robots, cars, motorcycles, etc., based on various balances such as posture and remaining battery life, so that when these balances deviate from the ideal, it indicates discomfort, and when they approach the ideal, it indicates pleasure. Emotion maps may be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on speech emotion recognition and emotional brain physiological signal analysis systems, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map is lined with emotions belonging to an area called "reaction," where sensation is dominant. The right half of the emotion map is lined with emotions belonging to an area called "situation," where situational awareness is dominant.

感情マップでは学習を促す感情が２つ定義される。１つは、状況側にあるネガティブな「懺悔」や「反省」の真ん中周辺の感情である。つまり、「もう２度とこんな想いはしたくない」「もう叱られたくない」というネガティブな感情がロボットに生じたときである。もう１つは、反応側にあるポジティブな「欲」のあたりの感情である。つまり、「もっと欲しい」「もっと知りたい」というポジティブな気持ちのときである。 The emotion map defines two emotions that encourage learning. One is the negative emotion around the middle of "repentance" or "reflection" on the situation side. In other words, this is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion around "desire" on the response side. In other words, this is when the robot experiences positive feelings such as "I want more" or "I want to know more."

感情特定モデル５９は、ユーザ入力を、予め学習されたニューラルネットワークに入力し、感情マップ４００に示す各感情を示す感情値を取得し、ユーザの感情を決定する。このニューラルネットワークは、ユーザ入力と、感情マップ４００に示す各感情を示す感情値との組み合わせである複数の学習データに基づいて予め学習されたものである。また、このニューラルネットワークは、図１０に示す感情マップ９００のように、近くに配置されている感情同士は、近い値を持つように学習される。図１０では、「安心」、「安穏」、「心強い」という複数の感情が、近い感情値となる例を示している。 The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values indicating each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple pieces of training data that are combinations of user input and emotion values indicating each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions that are close to each other have similar values, as in the emotion map 900 shown in Figure 10. Figure 10 shows an example in which multiple emotions, such as "relieved," "calm," and "reassuring," have similar emotion values.

上記実施形態では、１台のコンピュータ２２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、コンピュータ２２を含めた複数のコンピュータによる特定処理に対する分散処理が行われるようにしてもよい。 In the above embodiment, an example was given in which a specific process is performed by a single computer 22, but the technology disclosed herein is not limited to this, and distributed processing of the specific process may also be performed by multiple computers, including computer 22.

上記実施形態では、ストレージ３２に特定処理プログラム５６が格納されている形態例を挙げて説明したが、本開示の技術はこれに限定されない。例えば、特定処理プログラム５６がＵＳＢ（Universal Serial Bus）メモリなどの可搬型のコンピュータ読み取り可能な非一時的格納媒体に格納されていてもよい。非一時的格納媒体に格納されている特定処理プログラム５６は、データ処理装置１２のコンピュータ２２にインストールされる。プロセッサ２８は、特定処理プログラム５６に従って特定処理を実行する。 In the above embodiment, an example was described in which the specific processing program 56 is stored in the storage 32, but the technology of the present disclosure is not limited to this. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing in accordance with the specific processing program 56.

また、ネットワーク５４を介してデータ処理装置１２に接続されるサーバ等の格納装置に特定処理プログラム５６を格納させておき、データ処理装置１２の要求に応じて特定処理プログラム５６がダウンロードされ、コンピュータ２２にインストールされるようにしてもよい。 Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

なお、ネットワーク５４を介してデータ処理装置１２に接続されるサーバ等の格納装置に特定処理プログラム５６の全てを格納させておいたり、ストレージ３２に特定処理プログラム５６の全てを記憶させたりしておく必要はなく、特定処理プログラム５６の一部を格納させておいてもよい。 It is not necessary to store the entire specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entire specific processing program 56 in the storage 32; only a portion of the specific processing program 56 may be stored.

特定処理を実行するハードウェア資源としては、次に示す各種のプロセッサを用いることができる。プロセッサとしては、例えば、ソフトウェア、すなわち、プログラムを実行することで、特定処理を実行するハードウェア資源として機能する汎用的なプロセッサであるＣＰＵが挙げられる。また、プロセッサとしては、例えば、ＦＰＧＡ（Field-Programmable Gate Array）、ＰＬＤ（Programmable Logic Device）、又はＡＳＩＣ（Application Specific Integrated Circuit）などの特定の処理を実行させるために専用に設計さ The hardware resources that execute specific processes can include the following types of processors. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource that executes specific processes by executing software, i.e., a program. Processors also include FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), and ASICs (Application Specific Integrated Circuits), which are specially designed to execute specific processes.

れた回路構成を有するプロセッサである専用電気回路が挙げられる。何れのプロセッサにもメモリが内蔵又は接続されており、何れのプロセッサもメモリを使用することで特定処理を実行する。 Specialized electrical circuits are processors with specialized circuit configurations. All processors have built-in or connected memory, and all processors use memory to perform specific processes.

特定処理を実行するハードウェア資源は、これらの各種のプロセッサのうちの１つで構成されてもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡの組み合わせ、又はＣＰＵとＦＰＧＡとの組み合わせ）で構成されてもよい。また、特定処理を実行するハードウェア資源は１つのプロセッサであってもよい。 The hardware resource that executes the specific processing may be composed of one of these various processors, or may be composed of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). The hardware resource that executes the specific processing may also be a single processor.

１つのプロセッサで構成する例としては、第１に、１つ以上のＣＰＵとソフトウェアの組み合わせで１つのプロセッサを構成し、このプロセッサが、特定処理を実行するハードウェア資源として機能する形態がある。第２に、ＳｏＣ（System-on-a-chip）などに代表されるように、特定処理を実行する複数のハードウェア資源を含むシステム全体の機能を１つのＩＣチップで実現するプロセッサを使用する形態がある。このように、特定処理は、ハードウェア資源として、上記各種のプロセッサの１つ以上を用いて実現される。 As an example of a configuration using a single processor, first, there is a configuration in which one processor is configured using a combination of one or more CPUs and software, and this processor functions as a hardware resource that executes specific processing. Second, there is a configuration in which a processor is used to realize the functions of an entire system, including multiple hardware resources that execute specific processing, on a single IC chip, as typified by SoC (System-on-a-chip). In this way, specific processing is realized using one or more of the various processors listed above as hardware resources.

更に、これらの各種のプロセッサのハードウェア的な構造としては、より具体的には、半導体素子などの回路素子を組み合わせた電気回路を用いることができる。また、上記の特定処理はあくまでも一例である。従って、主旨を逸脱しない範囲内において不要なステップを削除したり、新たなステップを追加したり、処理順序を入れ替えたりしてもよいことは言うまでもない。 More specifically, the hardware structure of these various processors can be an electrical circuit that combines circuit elements such as semiconductor devices. Furthermore, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps can be added, or the processing order can be rearranged, all within the scope of the spirit of the invention.

以上に示した記載内容及び図示内容は、本開示の技術に係る部分についての詳細な説明であり、本開示の技術の一例に過ぎない。例えば、上記の構成、機能、作用、及び効果に関する説明は、本開示の技術に係る部分の構成、機能、作用、及び効果の一例に関する説明である。よって、本開示の技術の主旨を逸脱しない範囲内において、以上に示した記載内容及び図示内容に対して、不要な部分を削除したり、新たな要素を追加したり、置き換えたりしてもよいことは言うまでもない。また、錯綜を回避し、本開示の技術に係る部分の理解を容易にするために、以上に示した記載内容及び図示内容では、本開示の技術の実施を可能にする上で特に説明を要しない技術常識等に関する説明は省略されている。 The above-described written content and illustrations are a detailed explanation of the parts related to the technology of the present disclosure and are merely an example of the technology of the present disclosure. For example, the above description of the configuration, functions, actions, and effects is an explanation of an example of the configuration, functions, actions, and effects of the parts related to the technology of the present disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or substitutions may be made to the above-described written content and illustrations, as long as they do not deviate from the spirit of the technology of the present disclosure. Furthermore, to avoid confusion and facilitate understanding of the parts related to the technology of the present disclosure, the above-described written content and illustrations omit explanations of common technical knowledge that do not require particular explanation to enable the implementation of the technology of the present disclosure.

本明細書に記載された全ての文献、特許出願及び技術規格は、個々の文献、特許出願及び技術規格が参照により取り込まれることが具体的かつ個々に記された場合と同程度に、本明細書中に参照により取り込まれる。 All publications, patent applications, and technical standards mentioned in this specification are incorporated by reference herein to the same extent as if each individual publication, patent application, and technical standard was specifically and individually indicated to be incorporated by reference.

以上の実施形態に関し、更に以下を開示する。 The following is further disclosed regarding the above embodiments.

（請求項１） (Claim 1)

詐欺電話の入電を検知する手段と、スマートフォンアプリケーションまたは家庭用電話に設置したマイクから会話を聞き取る手段と、会話データを文字化し生成系ＡＩへ送信する手段と、生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記マイクから音声アラートを発する手段を含むシステム。 A system comprising: a means for detecting incoming fraudulent calls; a means for listening to conversations from a smartphone application or a microphone installed on a home phone; a means for converting the conversation data into text and sending it to a generative AI; and a means for issuing an audio alert from the microphone if the generative AI determines that fraud is suspected based on the conversation data.

（請求項２） (Claim 2)

前記生成系ＡＩが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 The system of claim 1, wherein the generative AI determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

（請求項３） (Claim 3)

前記音声アラートが、ユーザーが詐欺の可能性に気づくことを促す内容である、請求項１記載のシステム。 The system of claim 1, wherein the audio alert alerts users to potential fraud.

（請求項４） (Claim 4)

前記生成系ＡＩが、ユーザーの感情を認識する感情エンジンを含む、請求項１記載のシステム。 The system of claim 1, wherein the generative AI includes an emotion engine that recognizes the user's emotions.

（請求項５） (Claim 5)

前記感情エンジンが、ユーザーの声のトーン、音量、話速などから感情を分析する、請求項４記載のシステム。 The system described in claim 4, wherein the emotion engine analyzes emotions from the user's tone of voice, volume, speaking rate, etc.

（請求項６） (Claim 6)

前記感情エンジンが、ユーザーの感情が特定の閾値を超えた場合に、音声アラートを発する、請求項４記載のシステム。 The system of claim 4, wherein the emotion engine issues an audio alert when the user's emotion exceeds a certain threshold.

「実施例１」 "Example 1"

（請求項１）
電話の入電について所定のリストを用いて詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話データを収集して取得する手段と、
前記会話データを文字化し、文字化した当該会話データの解析の指示を生成AIモデルへ入力する手段と、
前記生成AIモデルが前記会話データを解析し、詐欺が疑われると判断した場合に、前記通信端末の前記音声入出力装置から音声アラートを発する手段と、を含むシステム。 (Claim 1)
means for detecting incoming fraudulent calls using a predetermined list of incoming calls;
means for collecting and acquiring conversation data from a voice input device installed in the communication terminal;
A means for transcribing the conversation data and inputting instructions for analyzing the transcribed conversation data into a generative AI model;
and means for issuing an audio alert from the audio input/output device of the communication terminal if the generative AI model analyzes the conversation data and determines that fraud is suspected.

（請求項２）
前記生成AIモデルが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合に前記詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
The system of claim 1 , wherein the generative AI model determines that fraud is suspected if it detects specific phrases or patterns that indicate the possibility of specialized fraud.

（請求項３）
前記音声出力装置から発する前記音声アラートは、ユーザーが詐欺の可能性に気づくことを促す内容である、請求項１記載のシステム。 (Claim 3)
The system of claim 1 , wherein the audio alert emitted from the audio output device alerts a user to potential fraud.

「応用例１」 "Application Example 1"

（請求項１）
詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系ＡＩへ送信する手段と、
生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声出力装置から音声アラートを発する手段と、
前記生成系ＡＩが、詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段と、
前記音声アラートが、ユーザーが詐欺の可能性に気づくことを促す内容である手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in a communication terminal;
A means for converting conversation data into text and transmitting it to a generative AI;
means for issuing a voice alert from the voice output device when the generative AI determines that fraud is suspected from the conversation data;
means for the generative AI to detect specific phrases or patterns indicative of potential fraud;
said audio alert alerting the user to potential fraud;
A system including:

（請求項２）
前記生成系ＡＩが、詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
10. The system of claim 1, wherein the generative AI determines that fraud is suspected by detecting specific phrases or patterns indicative of possible fraud.

（請求項３）
前記音声アラートが、ユーザーが詐欺の可能性に気づくことを促す内容である、請求項１記載のシステム。 (Claim 3)
The system of claim 1 , wherein the audio alert alerts the user to potential fraud.

「実施例２」 "Example 2"

（請求項１）
詐欺電話の入電を検知する手段と、
通信装置に設置した音声入力装置から会話を収集する手段と、
会話データを文字データに変換する手段と、
文字データを生成AIモデルへ送信する手段と、
生成AIモデルが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for collecting conversation from a voice input device installed in the communication device;
A means for converting conversation data into text data;
a means for transmitting the character data to the generative AI model;
The system includes a means for issuing an audio alert from the audio input device if the generative AI model determines that fraud is suspected from the conversation data.

（請求項２）
生成AIモデルが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
The system of claim 1 , wherein the generative AI model determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

「応用例２」 "Application Example 2"

（請求項１）
詐欺電話の入電を検知する手段と、
音声認識装置から会話を聞き取る手段と、
会話データを文字化し生成系ＡＩへ送信する手段と、
生成系ＡＩが会話データから詐欺が疑われると判断した場合に、音声出力装置から音声アラートを発する手段と、
スマートフォンにインストールされるアプリケーションを含む手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a speech recognition device;
A means for converting conversation data into text and transmitting it to a generative AI;
means for issuing a voice alert from a voice output device when the generative AI determines that fraud is suspected from the conversation data;
means including an application installed on a smartphone;
A system including:

（請求項２）
前記生成系ＡＩが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
The system of claim 1 , wherein the generative AI determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

「実施例３」 "Example 3"

（請求項１）
詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系人工知能へ送信する手段と、
生成系人工知能が会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
ユーザが入力したテキストデータをサーバに送信する手段と、
サーバが生成系人工知能モデルを用いてテキストデータを解析し、詐欺の可能性を判断する手段と、
サーバが解析結果を通信端末に送信する手段と、
通信端末が解析結果をユーザに表示する手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in a communication terminal;
A means for transcribing conversation data and transmitting it to a generative artificial intelligence;
means for issuing a voice alert from the voice input device when the generative artificial intelligence determines that fraud is suspected from the conversation data;
means for transmitting text data entered by a user to a server;
A means for the server to analyze the text data using a generative artificial intelligence model to determine the likelihood of fraud;
A means for the server to transmit the analysis result to the communication terminal;
a means for the communication terminal to display the analysis result to a user;
A system including:

（請求項２） (Claim 2)

生成系人工知能が、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 The system described in claim 1, wherein the generative artificial intelligence determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

（請求項３）
前記音声アラートが、ユーザが詐欺の可能性に気づくことを促す内容である、請求項１記載のシステム。 (Claim 3)
The system of claim 1 , wherein the audio alert alerts a user to potential fraud.

「応用例３」 "Application Example 3"

（請求項１）
詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系ＡＩへ送信する手段と、
生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出する手段と、
生成系ＡＩが詐欺の可能性を示す特定のフレーズまたはパターンを検出した場合に、ユーザーに警告を発する手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in a communication terminal;
A means for converting conversation data into text and transmitting it to a generative AI;
means for issuing a voice alert from the voice input device when the generative AI determines that fraud is suspected from the conversation data;
means for the generative AI to detect specific phrases or patterns indicative of potential fraud;
means for alerting a user if the generative AI detects certain phrases or patterns that indicate potential fraud; and
A system including:

「感情エンジンを組み合わせた場合の実施例１」 "Example 1: Combining Emotion Engines"

（請求項１）
電話の入電について所定のリストを用いて詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話データを収集して取得する手段と、
前記会話データを文字化し、文字化した当該会話データの解析の指示を生成系ＡＩへ入力する手段と、
前記生成系ＡＩが前記会話データを解析し、詐欺が疑われると判断した場合に、前記通信端末の音声出力装置から音声アラートを発する手段と、
前記会話データのユーザーの音声に関するパラメータを収集し、感情分析を行う手段と、
前記感情分析の結果を基に詐欺の可能性を再評価する手段と、を含むシステム。 (Claim 1)
means for detecting incoming fraudulent calls using a predetermined list of incoming calls;
means for collecting and acquiring conversation data from a voice input device installed in the communication terminal;
A means for transcribing the conversation data and inputting instructions for analyzing the transcribed conversation data to a generative AI;
means for issuing a voice alert from a voice output device of the communication terminal when the generative AI analyzes the conversation data and determines that fraud is suspected;
means for collecting parameters relating to the user's voice in the conversation data and performing sentiment analysis;
and means for reassessing the likelihood of fraud based on the results of the sentiment analysis.

（請求項２）
前記生成系人工知能が、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
The system of claim 1 , wherein the generative artificial intelligence determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

「感情エンジンを組み合わせた場合の応用例１」 "Application example 1: Combining emotion engines"

（請求項１）
詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系ＡＩへ送信する手段と、
生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
ユーザーの声のトーン、音量、話速などから感情を分析する感情エンジンを含む手段と、
感情エンジンの分析結果を基に詐欺の可能性をさらに高める手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in a communication terminal;
A means for converting conversation data into text and transmitting it to a generative AI;
means for issuing a voice alert from the voice input device when the generative AI determines that fraud is suspected from the conversation data;
a means including an emotion engine for analyzing emotions based on the tone, volume, and speaking rate of a user's voice;
and measures to further increase the likelihood of fraud based on the results of the emotion engine analysis.
A system including:

（請求項２）
生成系ＡＩが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
2. The system of claim 1, wherein the generative AI determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

「感情エンジンを組み合わせた場合の実施例２」 "Example 2: Combining Emotion Engines"

（請求項１）
詐欺電話の入電を検知する手段と、
通信装置に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成AIモデルへ送信する手段と、
生成AIモデルが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
感情解析エンジンがユーザーの声のトーンや話速を解析する手段と、
感情解析エンジンが特定の閾値を超えた場合に生成AIモデルが音声アラートを発する手段を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in the communication device;
A means of transcribing the conversation data and sending it to the generative AI model;
means for issuing a voice alert from the voice input device when the generative AI model determines that fraud is suspected from the conversation data;
The emotion analysis engine analyzes the tone and speed of the user's voice,
A system including a means for the generative AI model to issue an audio alert when the sentiment analysis engine exceeds a certain threshold.

（請求項２）
前記生成AIモデルが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
10. The system of claim 1, wherein the generative AI model determines that fraud is suspected by detecting specific phrases or patterns indicative of the possibility of specialized fraud.

「感情エンジンを組み合わせた場合の応用例２」 "Application Example 2: Combining Emotion Engines"

（請求項１）
詐欺電話の入電を検知する手段と、
スマートデバイスに設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系AIへ送信する手段と、
生成系AIが会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
感情エンジンを用いてユーザーの感情を分析し、特定の閾値を超えた場合に警告を発する手段と、
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
A means for listening to a conversation from a voice input device installed on a smart device;
A means of transcribing conversation data and sending it to generative AI,
means for issuing a voice alert from the voice input device when the generative AI determines that fraud is suspected from the conversation data;
A means for analyzing user emotions using an emotion engine and issuing a warning when certain thresholds are exceeded;
A system including:

（請求項２）
前記生成系AIが、特殊詐欺の可能性を示す特定のフレーズまたはパターンを検出することにより詐欺が疑われると判断する、請求項１記載のシステム。 (Claim 2)
The system of claim 1, wherein the generative AI determines that fraud is suspected by detecting specific phrases or patterns that indicate the possibility of specialized fraud.

「感情エンジンを組み合わせた場合の実施例３」 "Example 3: Combining Emotion Engines"

（請求項１）
詐欺電話の入電を検知する手段と、
通信端末に設置した音声入力装置から会話を聞き取る手段と、
会話データを文字化し生成系人工知能へ送信する手段と、
生成系人工知能が会話データから詐欺が疑われると判断した場合に、前記音声入力装置から音声アラートを発する手段と、
感情解析エンジンがユーザの感情を分析する手段と、
生成系人工知能が感情解析エンジンからの情報と会話データを統合して詐欺の可能性を判断する手段を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
means for listening to a conversation from a voice input device installed in a communication terminal;
A means for transcribing conversation data and transmitting it to a generative artificial intelligence;
means for issuing a voice alert from the voice input device when the generative artificial intelligence determines that fraud is suspected from the conversation data;
A means for the sentiment analysis engine to analyze the sentiment of the user;
A system that includes a means for a generative artificial intelligence to integrate information from a sentiment analysis engine with conversational data to determine likelihood of fraud.

（請求項２） (Claim 2)

（請求項３） (Claim 3)

音声アラートが、ユーザが詐欺の可能性に気づくことを促す内容である、請求項１記載のシステム。 The system of claim 1, wherein the audio alert alerts the user to potential fraud.

「感情エンジンを組み合わせた場合の応用例３」 "Application Example 3: Combining Emotion Engines"

（請求項１）
詐欺電話の入電を検知する手段と、
スマートフォンアプリケーションまたは家庭用電話に設置したマイクから会話を聞き取る手段と、
会話データを文字化し生成系ＡＩへ送信する手段と、
生成系ＡＩが会話データから詐欺が疑われると判断した場合に、前記マイクから音声アラートを発する手段と、
感情エンジンを用いてユーザーの感情を分析する手段と、
感情エンジンからの情報と生成系ＡＩが分析した会話データを組み合わせて詐欺の可能性を判断する手段と、
詐欺の可能性が高い場合にユーザーに警告を表示する手段
を含むシステム。 (Claim 1)
A means for detecting incoming fraudulent calls;
a means for listening to the conversation through a smartphone application or a microphone installed on a home phone;
A means for converting conversation data into text and transmitting it to a generative AI;
means for issuing an audio alert from the microphone when the generative AI determines that fraud is suspected from the conversation data;
A means for analyzing user emotions using an emotion engine;
A means to determine the possibility of fraud by combining information from the emotion engine with conversation data analyzed by generative AI; and
The system includes a means to warn users when fraud is likely.

１０、２１０、３１０、４１０データ処理システム
１２データ処理装置
１４スマートデバイス
２１４スマート眼鏡
３１４ヘッドセット型端末
４１４ロボット 10, 210, 310, 410 Data processing system 12 Data processing device 14 Smart device 214 Smart glasses 314 Headset type terminal 414 Robot

Claims

means for detecting an incoming fraudulent call that may be fraudulent using a predetermined list of incoming calls , and notifying a communication terminal of the incoming fraudulent call;
means for collecting and acquiring conversation data of the fraudulent call notified from a voice input device installed in the communication terminal;
A means for transcribing the conversation data and inputting instructions for analyzing the transcribed conversation data to a generative AI;
means for instructing the communication terminal to issue an audio alert by sending a command to play an audio alert to an audio output device of the communication terminal when the generative AI analyzes the conversation data and determines that fraud is suspected;
means for collecting parameters relating to the user's voice in the conversation data and performing sentiment analysis;
means for reassessing the likelihood of fraud based on the results of said sentiment analysis;
A system including:

The system of claim 1, wherein the generative AI determines that fraud is suspected when it detects a specific phrase or pattern that indicates the possibility of specialized fraud.

The system of claim 1, wherein the audio alert emitted from the audio output device alerts the user to possible fraud.