JP7791014B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7791014B2
Application number: JP2022034738A
Authority: JP
Inventors: 純平三宅
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2025-12-23
Anticipated expiration: 2042-03-07
Also published as: JP2023130204A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Techniques are known that convert a user's utterance into text and provide the resulting text information. For example, Patent Document 1 discloses a voice input device that includes a voice recognition unit that recognizes the user's voice, a character conversion unit that converts the recognized voice into a character string, and a character string display unit that displays the converted character string.

Japanese Patent Application Laid-Open No. 2021-168020

However, the conventional technology described above leaves room for improvement in user convenience. For example, the technology described in Patent Document 1 converts the entire content of the user's utterance into text and displays it. If the utterance includes content that the user wants to keep confidential, for example for privacy reasons, the user must edit the displayed character string.

The present application has been made in view of the above, and aims to provide an information processing device, an information processing method, and an information processing program that can improve user convenience.

The information processing device according to the present application includes a reception unit, a generation unit, and a provision unit. The reception unit receives a user's utterance. The generation unit generates utterance text information, which is text information obtained by converting the user's utterance into text with the portions of the received utterance that satisfy confidentiality conditions masked. The provision unit provides the utterance text information generated by the generation unit.

According to one aspect of the embodiment, user convenience can be improved.

FIG. 1 is a diagram showing an example of information processing according to the embodiment.
FIG. 2 is a diagram showing an example of the configuration of the information processing device according to the embodiment.
FIG. 3 is a diagram showing an example of utterance text information displayed on the display unit of the information processing device according to the embodiment.
FIG. 4 is a diagram showing an example in which the masking of a selected portion of the utterance text information displayed on the display unit of the information processing device according to the embodiment is released.
FIG. 5 is a flowchart showing an example of information processing by the processing unit of the information processing device according to the embodiment.
FIG. 6 is a flowchart showing an example of information processing by the processing unit of the information processing device according to the embodiment.
FIG. 7 is a diagram showing another example of the configuration of the information processing device according to the embodiment.
FIG. 8 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing device according to the embodiment.

Modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") are described in detail below with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. The embodiments can be combined as appropriate as long as the processing content does not conflict. In the following embodiments, identical parts are given the same reference numerals, and redundant explanations are omitted.

[1. An example of information processing]
FIG. 1 is a diagram showing an example of information processing according to the embodiment. In this embodiment, the information processing method is executed by the information processing device 1.

The information processing device 1 is a device that can use an AI (Artificial Intelligence) assistant function that supports interactive voice operation. By talking with the information processing device 1, the user U can control peripheral devices and obtain various information. The peripheral devices are, for example, lighting equipment, refrigerators, washing machines, air conditioners, television receivers, dishwashers, dish dryers, induction cookers, or microwave ovens.

When the user U speaks to the information processing device 1 to obtain various information, the information processing device 1 transmits input information indicating the user U's instruction to the information providing device 2 (see FIG. 2). The information processing device 1 obtains content (for example, news, traffic information, weather, and music) that the information providing device 2 provides via the network N (see FIG. 2) in response to the input information, and can display the obtained content on the display unit or output it from the speaker.

The information processing device 1 also has a text conversion function that converts the utterance of the user U into text and outputs the resulting text information. With this function, the user U can compose, by speaking to the information processing device 1, text for emails or SNS (Social Networking Service) messages, or text to be posted to electronic bulletin boards, review sites, and the like. The text conversion function is mainly described below with reference to FIG. 1.

As shown in FIG. 1, when using the text conversion function of the information processing device 1, the user U speaks to the information processing device 1 (step S1). In the example shown in FIG. 1, the utterance of the user U is: "I recently moved into an apartment at A-ku 1-2-3, Tokyo. It is very comfortable to live in, and Tokkyo Taro and others were a great help when I moved. I'm having a party soon, so please come and visit. My contact number is 090-0190-xxxx." Here, "090-0190-xxxx" is a telephone number, and "xxxx" stands for a combination of four digits.

The information processing device 1 receives the utterance of the user U (step S2). The information processing device 1 then generates utterance text information, which is text information obtained by converting the utterance of the user U into text with the portions of the utterance received in step S2 that satisfy the confidentiality conditions masked (step S3). The confidentiality conditions are, for example, words indicating privacy-related content such as an address, a name, or a telephone number, and can be set or changed by the user U operating the information processing device 1.

In the example shown in FIG. 1, "A-ku 1-2-3", "Tokkyo Taro", and "090-0190-xxxx" are the portions (words, word groups, or the like) that satisfy the confidentiality conditions, and the information processing device 1 masks these portions. "A-ku 1-2-3" is information indicating an address, and "Tokkyo Taro" is information indicating a name.

The information processing device 1 masks the portions that satisfy the confidentiality conditions while converting the utterance of the user U into text, but it can also mask those portions after the conversion into text is complete.

The information processing device 1 has, for example, a trained model that takes as input voice information or text information corresponding to the utterance of the user U received in step S2, and outputs a confidentiality score indicating the degree of confidentiality of each of the multiple element parts that make up the utterance. The information processing device 1 inputs the voice information or text information corresponding to the utterance received in step S2 into the trained model and obtains the confidentiality score of each element part from the trained model.

The information processing device 1 generates the utterance text information by treating, among the multiple element parts, each element part whose confidentiality score is equal to or greater than a threshold as a portion that satisfies the confidentiality conditions. If the trained model takes text information as input, the information processing device 1 converts the voice information corresponding to the utterance of the user U received in step S2 into text information and inputs the converted text information into the trained model.
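The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the implementation in the embodiment: the `Element` type, the score range, the default threshold of 0.5, and the one-mask-character-per-character rule are all assumptions, and the scores are hard-coded stand-ins for the output of a trained model.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str     # one element part of the utterance (a word or word group)
    score: float  # hypothetical confidentiality score, assumed to lie in [0, 1]


def mask_by_score(elements, threshold=0.5, mask_char="X"):
    """Mask every element part whose score is equal to or greater than the threshold."""
    parts = []
    for el in elements:
        if el.score >= threshold:
            parts.append(mask_char * len(el.text))
        else:
            parts.append(el.text)
    return "".join(parts)


# Hard-coded scores standing in for trained-model output.
elements = [
    Element("My name is ", 0.02),
    Element("Tokkyo Taro", 0.97),
    Element(".", 0.01),
]
masked = mask_by_score(elements, threshold=0.5)
```

Raising the threshold masks fewer element parts, so a single trained model could in principle serve stricter and looser settings.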

The information processing device 1 may instead have a trained model that takes as input the voice information or text information corresponding to the utterance of the user U received in step S2, and outputs utterance text information in which the portions of the utterance that satisfy the confidentiality conditions are masked. In this case, the information processing device 1 inputs the voice information or text information corresponding to the utterance received in step S2 into the trained model and obtains the text information output from the trained model as the utterance text information.

Instead of or in addition to the trained model, the information processing device 1 may have a confidential information table containing information indicating multiple words to be concealed. In this case, the information processing device 1 can generate the utterance text information by treating the text information or voice information corresponding to the words in the confidential information table as the portions that satisfy the confidentiality conditions.

The character strings in the confidential information table may be expressed as regular expressions. In this case, the information processing device 1 generates the utterance text information by treating the text information or voice information identified by the regular expressions in the confidential information table as the portions that satisfy the confidentiality conditions.
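A table-driven variant can be sketched with Python's standard `re` module. The table contents, the phone-number pattern, and the choice to leave hyphens unmasked are illustrative assumptions rather than details taken from the embodiment.

```python
import re

# Hypothetical confidential information table: one literal word and one regular expression.
CONFIDENTIAL_TABLE = [
    re.compile(re.escape("Tokkyo Taro")),   # literal entry (a name)
    re.compile(r"\d{2,4}-\d{2,4}-\d{4}"),   # assumed phone-number-like pattern
]


def mask_with_table(text, table=CONFIDENTIAL_TABLE, mask_char="X"):
    """Mask every match of every table entry, leaving hyphens in place."""
    def _mask(match):
        return "".join(c if c == "-" else mask_char for c in match.group(0))

    for pattern in table:
        text = pattern.sub(_mask, text)
    return text


result = mask_with_table("Call Tokkyo Taro at 090-0190-1234.")
```

Literal words are passed through `re.escape` so that literal and regular-expression entries can share one code path.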

Masking is performed by replacing the portions of the utterance of the user U that satisfy the confidentiality conditions with, for example, at least one of a specific character, symbol, or pattern. In the example shown in FIG. 1, utterance text information is generated in which the portions that satisfy the confidentiality conditions are replaced with the character "X".

Specifically, in the example shown in FIG. 1, the utterance text information is: "I recently moved into an apartment at XXXXX, Tokyo. It is very comfortable to live in, and XXXX and others were a great help when I moved. I'm having a party soon, so please come and visit. My contact number is XXX-XXXX-XXXX."

The information processing device 1 provides the utterance text information generated in step S3 (step S4). For example, the information processing device 1 can provide the utterance text information to the user U by displaying it on the display unit. The information processing device 1 can also provide the utterance text information to the user U by outputting it as sound from the speaker.

When providing the utterance text information as sound, the information processing device 1 can make the masked portions silent or replace them with a preset sound (for example, a beep). The information processing device 1 can also provide the utterance text information to another person by transmitting it to that person's terminal device. In this way, even if the utterance of the user U contains a portion that satisfies the confidentiality conditions, the content of that portion can be kept from becoming known to others.
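The silence-or-beep handling of masked portions in audio output might look like the sketch below. It assumes that the audio is a list of float samples and that the masked portions are already known as (start, end) sample index pairs; the sample rate, beep frequency, and amplitude are arbitrary illustrative values.

```python
import math


def mask_audio(samples, spans, sample_rate=16000, beep_hz=None):
    """Return a copy of samples in which each (start, end) span is silenced or beeped."""
    out = list(samples)
    for start, end in spans:
        for i in range(max(start, 0), min(end, len(out))):
            if beep_hz is None:
                out[i] = 0.0  # silence the masked portion
            else:
                # Overwrite the masked portion with a sine tone at half amplitude.
                out[i] = 0.5 * math.sin(2 * math.pi * beep_hz * i / sample_rate)
    return out


audio = [0.1] * 10
silenced = mask_audio(audio, [(2, 5)])
beeped = mask_audio(audio, [(2, 5)], beep_hz=1000.0)
```

The function returns a new list, so the unmasked samples remain available, for example if the masking is later released.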

When the user U selects a masked portion of the utterance text information displayed on the display unit, the information processing device 1 can release the masking of that portion. This allows the user U to easily check masked content by operating the information processing device 1. When the user U deselects the portion, the information processing device 1 can mask it again.
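One way to support releasing and reapplying the masking is to keep the unmasked text together with the masked spans and apply the mask only at display time. The sketch below assumes this design; the `MaskedText` class and its method names are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class MaskedText:
    original: str                                # unmasked text of the utterance
    spans: list                                  # (start, end) index pairs of masked portions
    revealed: set = field(default_factory=set)   # indices of spans currently selected

    def select(self, span_index):
        """The user selected a masked portion: release its masking."""
        self.revealed.add(span_index)

    def deselect(self, span_index):
        """The selection was cancelled: mask the portion again."""
        self.revealed.discard(span_index)

    def render(self, mask_char="X"):
        """Build the text to display, masking every span that is not selected."""
        chars = list(self.original)
        for idx, (start, end) in enumerate(self.spans):
            if idx not in self.revealed:
                chars[start:end] = mask_char * (end - start)
        return "".join(chars)


doc = MaskedText("Thanks to Tokkyo Taro.", spans=[(10, 21)])
```

Calling `doc.render()` before selection gives the masked text, and calling it again after `doc.select(0)` gives the original wording for that portion.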

In this way, the information processing device 1 according to the embodiment generates utterance text information, which is text information obtained by converting the utterance of the user U into text with the portions that satisfy the confidentiality conditions masked, and provides the generated utterance text information to the user U. The information processing device 1 can thereby improve convenience for the user U.

The configuration of an information processing system that includes the information processing device 1 performing this processing is described in detail below.

[2. Configuration of the information processing system]
Next, the configuration of an information processing system including the information processing device 1 according to the embodiment is described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the configuration of the information processing device 1 according to the embodiment. As shown in FIG. 2, the information processing system 100 according to the embodiment includes the information processing device 1 and the information providing device 2. The information processing device 1 and the information providing device 2 are communicably connected, by wire or wirelessly, via the network N. The information processing system 100 shown in FIG. 2 may include multiple information processing devices 1 and multiple information providing devices 2.

The information processing device 1 is, for example, a smart speaker, a desktop PC (Personal Computer), a notebook PC, a tablet terminal, a mobile phone, or a PDA (Personal Digital Assistant). The information processing device 1 is not limited to these examples and may be, for example, a smartwatch or another wearable device.

The information providing device 2 provides online services to the user U. The services provided by the information providing device 2 are, for example, online services such as search services, information providing services, e-commerce services, auction services, music distribution services, and video distribution services, but are not limited to these examples. Information providing services include various services such as search services provided by search sites, news distribution services provided by news sites, traffic information services provided by traffic information sites, and weather information services provided by weather information sites.

The information providing device 2 is an information processing device that can communicate with various devices via a predetermined network N such as the Internet, and is realized by, for example, a server device or a cloud system. For example, the information providing device 2 is communicably connected to various other devices via the network N.

[3. Information processing device 1]
As shown in FIG. 2, the information processing device 1 according to the embodiment includes a communication unit 10, a display unit 11, an operation unit 12, a storage unit 13, a voice input unit 14, a voice output unit 15, a position detection unit 16, and a processing unit 17.

[3.1. Communication unit 10]
The communication unit 10 is realized by, for example, a NIC (Network Interface Card). The communication unit 10 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the information providing device 2 via the network N.

[3.2. Display unit 11]
The display unit 11 is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display.

[3.3. Operation unit 12]
The operation unit 12 includes, for example, a keyboard with keys for entering letters, numbers, and spaces as well as an enter key and arrow keys, a mouse, and a power button. If the display unit 11 is a touch panel display, the operation unit 12 may include the touch panel.

[3.4. Storage unit 13]
The storage unit 13 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disk.

The storage unit 13 stores various types of information. For example, the storage unit 13 stores information transmitted from the information providing device 2 and acquired by the processing unit 17 via the network N and the communication unit 10. The storage unit 13 also stores voice information and text information corresponding to the utterances of the user U.

[3.5. Voice input unit 14]
The voice input unit 14 converts the voice signal of the voice uttered by the user U into a digital signal and outputs the resulting digital voice signal to the processing unit 17 as voice information. The voice input unit 14 includes, for example, a microphone and an AD (Analog to Digital) converter that converts the voice signal, which is an electrical analog signal output from the microphone, into a digital signal.

[3.6. Voice output unit 15]
The voice output unit 15 includes, for example, a DA (Digital to Analog) converter that converts the digital voice signal, which is the voice information output from the processing unit 17, into an analog voice signal, and a speaker that converts the analog voice signal output from the DA converter into sound and outputs it.

[3.7. Position detection unit 16]
The position detection unit 16 detects, for example, the position of the information processing device 1 and outputs position data representing the detected position to the processing unit 17. The position detection unit 16 receives multiple positioning signals transmitted from multiple positioning satellites of a GNSS (Global Navigation Satellite System) and detects the position of the information processing device 1 based on the received positioning signals.

[3.8. Processing unit 17]
The processing unit 17 is a controller and is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information processing device 1, using the RAM as a work area.

The processing unit 17 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The processing unit 17 includes a reception unit 20, a generation unit 21, a provision unit 22, and a learning unit 23.

[3.8.1. Reception unit 20]
The reception unit 20 receives the utterance of the user U based on the digital voice information output from the voice input unit 14. For example, when the user U performs a specific operation using the operation unit 12, the reception unit 20 receives the utterance that the user U makes afterwards.

The specific operation differs, for example, for each of the information acquisition function, the information transmission function, the device control function, and the text conversion function. The information acquisition function acquires specific information from the information providing device 2 in response to the utterance of the user U, the information transmission function transmits information in response to the utterance of the user U, and the device control function controls peripheral devices in response to the utterance of the user U. As described above, the text conversion function converts the utterance of the user U into text and outputs the resulting text information.

The reception unit 20 can also receive the utterance that the user U makes after uttering a specific keyword. Whether the user U has uttered a specific keyword is determined by voice recognition on the voice information output from the voice input unit 14. The specific keyword differs, for example, for each of the information acquisition function, the information transmission function, the device control function, and the text conversion function.
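Keyword-based routing to the four functions can be illustrated as below. The keywords themselves are invented for this sketch, since the embodiment does not name them, and the input is assumed to be text that voice recognition has already produced.

```python
# Hypothetical keywords, one per function; the embodiment does not specify them.
KEYWORDS = {
    "look up": "information_acquisition",
    "send": "information_transmission",
    "control": "device_control",
    "take a note": "text_conversion",
}


def route_utterance(recognized_text):
    """Return (function, remaining utterance), or (None, text) if no keyword leads the text."""
    lowered = recognized_text.lower()
    for keyword, function in KEYWORDS.items():
        if lowered.startswith(keyword):
            rest = recognized_text[len(keyword):].lstrip(" ,")
            return function, rest
    return None, recognized_text


function, body = route_utterance("Take a note, I moved recently")
```

Only the part of the utterance that follows the keyword is handed on, matching the description that the reception unit receives the utterance made after the keyword.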

The reception unit 20 has a voice recognition function that converts the voice information output from the voice input unit 14 in response to the utterance of the user U into text information. The reception unit 20 may also have a function that analyzes the meaning of the text information produced by the voice recognition function.

The reception unit 20 outputs voice information or text information corresponding to the user's utterance to the generation unit 21. The voice information corresponding to the user's utterance is the voice information output from the voice input unit 14, and the text information corresponding to the user's utterance is the information obtained by converting that voice information into text with the voice recognition function. The reception unit 20, for example, associates the voice information and text information with each utterance of the user U and stores them in the storage unit 13.

The reception unit 20 also accepts the setting or changing of the confidentiality conditions, which define whether masking is required. For example, the reception unit 20 accepts the setting or changing of the confidentiality conditions when confidentiality condition information is entered or changed through an operation by the user U on the operation unit 12.

[3.8.2. Generation unit 21]
In the text conversion function, the generation unit 21 generates utterance text information, which is text information obtained by converting the utterance of the user U into text with the portions of the utterance received by the reception unit 20 that satisfy the confidentiality conditions masked.

The confidentiality conditions are, for example, words indicating privacy-related content such as an address, a name, or a telephone number, or words indicating content that is contrary to public order and morals. Words indicating content contrary to public order and morals are, for example, discriminatory or derogatory words, obscene words, and words that affirm or encourage crime. As described above, the confidentiality conditions can be set or changed by the user U.

The generation unit 21 has, for example, a trained model that takes as input voice information or text information corresponding to the utterance of the user U, and outputs a confidentiality score indicating the degree of confidentiality of each of the multiple element parts that make up the utterance.

The generation unit 21 inputs the voice information or text information corresponding to the utterance of the user U received by the reception unit 20 into the trained model and obtains the confidentiality score of each element part of the utterance from the trained model. The generation unit 21 generates the utterance text information by treating, among the multiple element parts, each element part whose confidentiality score is equal to or greater than a threshold as a portion that satisfies the confidentiality conditions.

また、生成部２１は、例えば、利用者Ｕの発話に対応する音声情報またはテキスト情報を入力とし、利用者Ｕの発話のうち秘匿条件を満たす部分をマスキングした発話テキスト情報を発話テキスト情報として出力とする学習済みモデルを有していてもよい。 The generation unit 21 may also have a trained model that receives, for example, audio information or text information corresponding to user U's speech as input, and outputs, as speech text information, speech text information in which parts of user U's speech that satisfy confidentiality conditions are masked.

この場合、生成部２１は、受付部２０によって受け付けられた利用者Ｕの発話に対応する音声情報またはテキスト情報を学習済みモデルに入力し、学習済みモデルから出力されるテキスト情報を発話テキスト情報として得ることができる。 In this case, the generation unit 21 inputs the voice information or text information corresponding to the user U's utterance received by the reception unit 20 into the trained model, and can obtain the text information output from the trained model as utterance text information.

生成部２１は、例えば、発話テキスト情報の用途に応じた複数の学習済みモデルを有してもよい。発話テキスト情報の用途は、例えば、メール用、ＳＮＳ用、口コミ投稿用、または電子掲示板用などである。生成部２１は、例えば、利用者Ｕの操作部１２への操作、利用者Ｕの発話による指定、または表示部１１に表示されているアプリケーションの種類などに応じて、発話テキスト情報の用途を判定することができる。 The generation unit 21 may have multiple trained models according to the intended use of the spoken text information, for example. The intended use of the spoken text information may be for email, SNS, word-of-mouth posting, or electronic bulletin boards, for example. The generation unit 21 can determine the intended use of the spoken text information according to, for example, an operation by the user U on the operation unit 12, a specification made by the user U through speech, or the type of application displayed on the display unit 11.

生成部２１は、例えば、利用者Ｕのコンテキストに応じた複数の学習済みモデルを有してもよい。コンテキストには、利用者Ｕの現在位置または利用者Ｕの運動状態などが含まれる。利用者Ｕの現在位置や運動状態は、例えば、位置検出部１６によって検出された位置に基づいて判定される。 The generation unit 21 may have, for example, multiple trained models according to the context of the user U. The context includes the current location of the user U or the movement state of the user U. The current location or movement state of the user U is determined based on, for example, the location detected by the location detection unit 16.

学習済みモデルは、例えば、畳み込みニューラルネットワークまたは回帰型ニューラルネットワークなどのニューラルネットワークによる機械学習によって生成されるが、かかる例に限定されない。例えば、学習済みモデルは、ニューラルネットワークに代えて、線形回帰、重回帰、またはロジスティック回帰といった回帰手法の学習アルゴリズムなどのように他の学習アルゴリズムによる機械学習を用いて生成されてもよい。 The trained model is generated by machine learning using a neural network, such as a convolutional neural network or a recurrent neural network, but is not limited to such examples. For example, instead of a neural network, the trained model may be generated using machine learning using other learning algorithms, such as learning algorithms for regression methods such as linear regression, multiple regression, or logistic regression.

また、生成部２１は、学習済みモデルに代えてまたは加えて、秘匿対象となる複数の言葉を示す情報を含む秘匿情報テーブルを有してもよい。この場合、情報処理装置１は、秘匿情報テーブルに含まれる言葉に対応するテキスト情報または音声情報を、秘匿条件を満たす部分として、発話テキスト情報を生成することができる。 Furthermore, instead of or in addition to the trained model, the generation unit 21 may have a confidential information table containing information indicating multiple words to be concealed. In this case, the information processing device 1 can generate spoken text information by treating text information or audio information corresponding to words included in the confidential information table as the part that satisfies the confidentiality conditions.

秘匿情報テーブルに含まれる文字列は、正規表現で示される文字列であってもよく、この場合、秘匿情報テーブルで示される正規表現で特定されるテキスト情報または音声情報を、秘匿条件を満たす部分として、発話テキスト情報を生成する。正規表現で示される文字列は、例えば、電話番号の場合、「[０-９]｛３｝-[０-９]｛４｝-[０-９]｛４｝」などである。 The character strings included in the confidential information table may be character strings expressed using regular expressions. In this case, the text information or audio information specified by the regular expressions in the confidential information table is used as the portion that satisfies the confidentiality conditions to generate the spoken text information. For example, in the case of a phone number, a character string expressed using a regular expression might be "[0-9]{3}-[0-9]{4}-[0-9]{4}".
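A table of literal words and regular expressions could be applied to already transcribed text roughly as follows. The table contents, the name `mask_with_table`, and the choice to mask with one "X" per matched character are assumptions for illustration; only the phone number pattern is taken from the text.

```python
import re
from typing import Sequence

# Hypothetical confidential information table: each entry is a regular expression.
CONFIDENTIAL_PATTERNS = [
    r"[0-9]{3}-[0-9]{4}-[0-9]{4}",  # phone number pattern given in the text
    re.escape("Tokkyo Hanako"),      # a literal word, escaped so it matches as-is
]


def mask_with_table(
    text: str,
    patterns: Sequence[str] = CONFIDENTIAL_PATTERNS,
    mask_char: str = "X",
) -> str:
    """Replace every match of any table entry with mask characters."""
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return combined.sub(lambda m: mask_char * len(m.group(0)), text)
```

Combining the entries into one alternation keeps the masking to a single pass over the text.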

また、生成部２１は、学習済みモデルに代えてまたは加えて、互いに異なる言葉毎の秘匿に関する情報を含む秘匿情報テーブルを有してもよい。秘匿に関する情報は、秘匿レベルを示す情報であり、例えば、２段階以上に設定される。生成部２１は、例えば、利用者Ｕの発話に秘匿レベル設定された言葉が含まれる割合が多いほど、マスキング対象とする秘匿レベルを下げることができる。 Furthermore, instead of or in addition to the trained model, the generation unit 21 may have a confidential information table containing confidentiality-related information for each of a number of distinct words. The confidentiality-related information indicates a confidentiality level, which is set in, for example, two or more steps. For example, the generation unit 21 can lower the minimum confidentiality level subject to masking as the proportion of words with a set confidentiality level in user U's utterance increases.

例えば、秘匿レベルがレベル１～３までの３段階であるとし、レベル１、レベル２、レベル３の順に秘匿レベルが高いとする。すなわち、秘匿レベルが最も低いレベルがレベル１であり、秘匿レベルが次に低いレベルがレベル２であり、秘匿レベルが最も高いレベルがレベル３である。 For example, suppose there are three confidentiality levels, from level 1 to level 3, with the confidentiality levels increasing from level 1 to level 2 and level 3. In other words, the lowest confidentiality level is level 1, the next lowest confidentiality level is level 2, and the highest confidentiality level is level 3.

この場合、生成部２１は、利用者Ｕの発話に秘匿レベル設定された言葉が含まれる割合が第１閾値未満であれば、秘匿レベルがレベル３である言葉をマスキング対象とする。また、生成部２１は、利用者Ｕの発話に秘匿レベル設定された言葉が含まれる割合が第１閾値以上第２閾値未満であれば、秘匿レベルがレベル２以上である言葉をマスキング対象とする。また、生成部２１は、利用者Ｕの発話に秘匿レベル設定された言葉が含まれる割合が第２閾値以上であれば、秘匿レベルがレベル１以上である言葉をマスキング対象とする。 In this case, if the proportion of user U's utterance made up of words that have a confidentiality level set is less than a first threshold, the generation unit 21 masks only words whose confidentiality level is level 3. If that proportion is equal to or greater than the first threshold and less than a second threshold, the generation unit 21 masks words whose confidentiality level is level 2 or higher. If that proportion is equal to or greater than the second threshold, the generation unit 21 masks words whose confidentiality level is level 1 or higher.
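The three-level rule can be sketched as below. The threshold values 0.1 and 0.3, the word-level tokenization, and the names `min_level_to_mask` and `mask_by_level` are assumptions for illustration; the text does not specify concrete values.

```python
from typing import Dict, List, Sequence


def min_level_to_mask(
    ratio: float,
    first_threshold: float = 0.1,   # hypothetical first threshold
    second_threshold: float = 0.3,  # hypothetical second threshold
) -> int:
    """Return the lowest confidentiality level that should be masked."""
    if ratio < first_threshold:
        return 3  # mask level 3 only
    if ratio < second_threshold:
        return 2  # mask level 2 or higher
    return 1      # mask level 1 or higher


def mask_by_level(
    words: Sequence[str],
    levels: Dict[str, int],  # hypothetical table: word -> confidentiality level
    mask_char: str = "X",
) -> List[str]:
    """Mask words whose level is at or above the level chosen from the ratio."""
    if not words:
        return []
    leveled = sum(1 for w in words if w in levels)
    floor = min_level_to_mask(leveled / len(words))
    return [mask_char * len(w) if levels.get(w, 0) >= floor else w for w in words]
```

With these assumed thresholds, an utterance dense in leveled words is masked more aggressively than one that contains only an occasional low-level word.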

また、生成部２１は、利用者Ｕの発話に秘匿レベルに設定された言葉が含まれる割合に代えてまたは加えて、発話テキスト情報の用途や利用者Ｕのコンテキストに応じてマスキング対象とする秘匿レベルを決定することもできる。また、利用者Ｕは、操作部１２への操作または情報処理装置１に対する発話によってマスキング対象とする秘匿レベルを指定することもできる。例えば、生成部２１は、受付部２０によって指定されたレベル以上の秘匿レベルである言葉をマスキング対象とすることもできる。 In addition to or instead of the proportion of words set to a confidentiality level included in user U's speech, the generation unit 21 can also determine the confidentiality level to be masked depending on the purpose of the spoken text information and the context of user U. User U can also specify the confidentiality level to be masked by operating the operation unit 12 or by speaking to the information processing device 1. For example, the generation unit 21 can also mask words with a confidentiality level equal to or higher than the level specified by the reception unit 20.

マスキングは、利用者Ｕの発話のうち秘匿条件を満たす部分を、例えば、特定の文字、記号、および模様のうちの少なくとも１つに置き換えることによって行われる。特定の文字は、例えば、「Ｘ」、「・」、「－」、「？」などであり、記号は、例えば、「□」、「◆」、「※」、「〇」などである。また、模様は、例えば、市松模様、幾何学模様などである。また、マスキングは、利用者Ｕの発話のうち秘匿条件を満たす部分を、例えば、スペースに置き換えることによって行ってもよい。 Masking is performed by, for example, replacing portions of user U's speech that satisfy confidentiality conditions with at least one of specific characters, symbols, and patterns. Specific characters include, for example, "X," "・," "-," and "?", and symbols include, for example, "□," "◆," "※," and "◯." Patterns include, for example, checkerboard patterns and geometric patterns. Masking may also be performed by, for example, replacing portions of user U's speech that satisfy confidentiality conditions with spaces.

なお、生成部２１は、学習済みモデルと秘匿情報テーブルとを併用して、発話テキスト情報を生成することもできる。また、生成部２１は、利用者Ｕの発話に対応する音声情報およびテキスト情報との組み合わせを利用者Ｕの発話毎に記憶部１３に記憶させる。 The generation unit 21 can also generate utterance text information by using both the trained model and the confidential information table. The generation unit 21 also stores, in the storage unit 13, a combination of the audio information and the text information corresponding to each utterance of user U.

また、生成部２１は、秘匿条件を満たす部分の文字数と同じ文字数の文字または記号などでのマスキングに代えて、例えば、秘匿レベルが閾値以上である場合、秘匿条件を満たす部分の文字数と異なる文字数の文字または記号などでのマスキングを行うこともできる。 In addition, instead of masking with characters or symbols with the same number of characters as the part that satisfies the confidentiality conditions, the generation unit 21 can also mask with characters or symbols with a different number of characters than the part that satisfies the confidentiality conditions, for example, when the confidentiality level is equal to or higher than a threshold value.
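One way to read the length rule is that a sufficiently confidential word should not reveal its own length through the mask. The sketch below assumes a level threshold of 3 and a fixed mask length of 4, and bumps the length by one when it happens to equal the original length; all of these choices and the name `mask_token` are hypothetical.

```python
def mask_token(
    token: str,
    level: int,
    level_threshold: int = 3,  # hypothetical threshold on the confidentiality level
    fixed_length: int = 4,     # hypothetical mask length for highly confidential words
    mask_char: str = "X",
) -> str:
    """Mask a token, hiding its length when its level reaches the threshold."""
    if level >= level_threshold:
        # Make sure the mask length differs from the real length.
        length = fixed_length if fixed_length != len(token) else fixed_length + 1
        return mask_char * length
    # Otherwise keep the same number of characters as the original.
    return mask_char * len(token)
```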

〔３．８．３．提供部２２〕 [3.8.3. Providing Unit 22]
提供部２２は、生成部２１によって生成された発話テキスト情報を提供する。例えば、提供部２２は、発話テキスト情報を表示部１１に表示させることで発話テキスト情報を利用者Ｕに提供する。 The providing unit 22 provides the utterance text information generated by the generation unit 21. For example, the providing unit 22 provides the utterance text information to user U by displaying it on the display unit 11.

図３は、実施形態に係る情報処理装置１の表示部１１に表示された発話テキスト情報の一例を示す図である。図３に示す例では、利用者Ｕが「うちの近くのＡＡＡスーパーに買い物に行ったときに、特許花子さんと出会って、話をしたのですが、次郎という息子さんが今年、あのＢＢＢ商事に入社されたそうです。」と発話した場合に、情報処理装置１の表示部１１に表示される発話テキスト情報の一例である。 Figure 3 is a diagram showing an example of spoken text information displayed on the display unit 11 of the information processing device 1 according to the embodiment. The example shown in Figure 3 is an example of spoken text information displayed on the display unit 11 of the information processing device 1 when user U utters, "When I went shopping at the AAA supermarket near my house, I met and talked with Tokkyo Hanako, and she told me that her son, Jiro, joined BBB Shoji this year."

図３に示す発話テキスト情報は、「うちの近くのＸＸＸＸＸＸＸに買い物に行ったときに、ＸＸＸＸさんと出会って、話をしたのですが、ＸＸという息子さんが今年、あのＸＸ高いＸＸＸＸＸに入社されたそうです」である。そして、図３に示す発話テキスト情報では、秘匿条件を満たす部分として「ＡＡＡスーパー」、「特許花子」、および「ＢＢＢ商事」が「Ｘ」の文字列でマスキングされている。 The utterance text information shown in Figure 3 is, "When I went shopping at XXXXXXX near my house, I met and talked with XXXX-san, and I heard that her son, XX, joined that XX-high XXXXX this year." In the utterance text information shown in Figure 3, "AAA supermarket," "Tokkyo Hanako," and "BBB Shoji" are masked with strings of the character "X" as the parts that satisfy the confidentiality conditions.

マスキングは、利用者Ｕの発話のうち秘匿条件を満たす部分を、例えば、特定の文字、記号、および模様のうちの少なくとも１つに置き換えることによって行われる。図３に示す例では、利用者Ｕの発話のうち秘匿条件を満たす部分が「Ｘ」の文字に置き換えられた発話テキスト情報が生成される。 Masking is performed by replacing parts of user U's utterance that satisfy the confidentiality conditions with, for example, at least one of specific characters, symbols, and patterns. In the example shown in Figure 3, spoken text information is generated in which parts of user U's utterance that satisfy the confidentiality conditions are replaced with the letter "X."

また、提供部２２は、発話テキスト情報を音声合成により音声情報に変換し、変換した音声情報を音声出力部１５に出力することもできる。これにより、提供部２２は、発話テキスト情報を音声で利用者Ｕに提供することができる。なお、提供部２２は、発話テキスト情報を音声合成して音声情報に変換する場合に、発話テキスト情報をマスキングされた部分を特定の音（例えば、「ピー」という音など）に変換することができ、また、発話テキスト情報をマスキングされた部分を無音にすることもできる。 The providing unit 22 can also convert the spoken text information into audio information using voice synthesis and output the converted audio information to the audio output unit 15. This allows the providing unit 22 to provide the spoken text information to the user U by voice. When converting the spoken text information into audio information using voice synthesis, the providing unit 22 can convert masked parts of the spoken text information into a specific sound (for example, a "beep" sound), and can also silence the masked parts of the spoken text information.

また、提供部２２は、利用者Ｕとの対話に基づいて、利用者Ｕの発話に応じた情報を情報提供装置２に送信したり、利用者Ｕの発話に応じた情報を情報提供装置２から取得したり、利用者Ｕの発話に応じた周辺の機器を制御したりすることもできる。 In addition, based on the dialogue with user U, the providing unit 22 can transmit information corresponding to user U's utterances to the information providing device 2, obtain information corresponding to user U's utterances from the information providing device 2, and control peripheral devices corresponding to user U's utterances.

また、提供部２２は、表示部１１に表示されている発話テキスト情報のうちマスキングされた部分が選択された場合、マスキングを解除する。利用者Ｕは、例えば、操作部１２を操作することによってマスキングされた部分を選択することができる。 Furthermore, when a masked portion of the utterance text information displayed on the display unit 11 is selected, the providing unit 22 removes the masking of that portion. User U can select the masked portion by, for example, operating the operation unit 12.

図４は、実施形態に係る情報処理装置１の表示部１１に表示された発話テキスト情報のうち選択された部分のマスキングが解除される例を示す図である。図４に示す例では、「ＸＸＸＸＸ」が利用者Ｕによって選択され、利用者Ｕによって選択された「ＸＸＸＸＸ」のマスキングが解除されて「ＢＢＢ商事」が表示されている。 Figure 4 is a diagram showing an example in which the masking of a selected portion of the utterance text information displayed on the display unit 11 of the information processing device 1 according to the embodiment is removed. In the example shown in Figure 4, "XXXXX" is selected by user U, the masking of the selected "XXXXX" is removed, and "BBB Shoji" is displayed.

なお、提供部２２は、例えば、利用者Ｕの発話をマスキングなしにテキスト化したテキスト情報と、利用者Ｕの発話のうち秘匿条件を満たす部分の位置を示す情報とを含む情報を発話情報として生成することができる。この場合、提供部２２は、生成した発話情報に基づいて、発話テキスト情報を利用者Ｕに提供したり、通信部１０を介して外部装置へ送信したりすることができる。 The providing unit 22 can generate, as utterance information, information including, for example, text information obtained by converting user U's utterance into text without masking, and information indicating the positions of the parts of user U's utterance that satisfy the confidentiality conditions. In this case, based on the generated utterance information, the providing unit 22 can provide the utterance text information to user U or transmit it to an external device via the communication unit 10.
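Keeping the unmasked text together with the positions of the confidential parts makes it straightforward to remove masking from a selected portion later. The following is a minimal sketch under assumptions: the class name `UtteranceInfo`, the half-open `(start, end)` character spans, and the `reveal_at` selection method are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class UtteranceInfo:
    text: str                     # unmasked transcription
    spans: List[Tuple[int, int]]  # half-open (start, end) positions to mask
    revealed: Set[int] = field(default_factory=set)  # indexes of unmasked spans

    def render(self, mask_char: str = "X") -> str:
        """Return the text with every non-revealed span masked."""
        chars = list(self.text)
        for index, (start, end) in enumerate(self.spans):
            if index in self.revealed:
                continue
            for position in range(start, end):
                chars[position] = mask_char
        return "".join(chars)

    def reveal_at(self, position: int) -> bool:
        """Remove masking from the span containing position, if there is one."""
        for index, (start, end) in enumerate(self.spans):
            if start <= position < end:
                self.revealed.add(index)
                return True
        return False
```

In this design the masked string is derived on demand, so the same stored information can serve both display and transmission to an external device.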

〔３．８．４．学習部２３〕 [3.8.4. Learning Unit 23]
学習部２３は、学習済みモデルを生成したり、更新したりすることができる。例えば、学習部２３は、秘匿条件を満たす言葉であって利用者Ｕの発話に対応する音声情報とテキスト情報とを含む学習用データを用いて、学習済みモデルを生成したり、更新したりする。 The learning unit 23 can generate and update the trained model. For example, the learning unit 23 generates or updates the trained model using training data that includes audio information and text information corresponding to user U's utterances of words that satisfy the confidentiality conditions.

学習用データは、例えば、利用者Ｕの過去の発話履歴に基づいて生成される。例えば、学習部２３は、秘匿条件を満たす言葉として受付部２０によって受け付けられた利用者Ｕの発話に基づいて学習用データを生成する。 The training data is generated, for example, based on user U's past utterance history. For example, the learning unit 23 generates the training data based on utterances of user U that the reception unit 20 has received as words that satisfy the confidentiality conditions.

また、学習部２３は、利用者Ｕの過去の発話履歴を表示部１１に表示させ、秘匿条件を満たす言葉を操作部１２への操作などによって利用者Ｕに選択させることによって、学習用データを生成することもできる。 The learning unit 23 can also generate learning data by displaying the user U's past speech history on the display unit 11 and having the user U select words that satisfy the confidentiality conditions by operating the operation unit 12, etc.

例えば、学習部２３は、利用者Ｕの過去の発話履歴を記憶部１３から取得する。利用者Ｕの過去の発話履歴には、利用者Ｕの過去の発話に対応する音声情報とテキスト情報とが含まれており、学習部２３は、利用者Ｕに選択された秘匿条件を満たす言葉に対応する音声情報とテキスト情報との組み合わせを学習用データとして生成する。 For example, the learning unit 23 acquires user U's past speech history from the storage unit 13. User U's past speech history includes voice information and text information corresponding to user U's past utterances, and the learning unit 23 generates, as learning data, a combination of voice information and text information corresponding to words that satisfy the confidentiality conditions selected by user U.

また、学習部２３は、利用者Ｕに選択された秘匿条件を満たす言葉の位置を示す情報と利用者Ｕの過去の発話に対応する音声情報またはテキスト情報との組み合わせを学習用データとして生成することもできる。かかる学習用データによって生成または更新された学習済みモデルは、利用者Ｕの発話に対応する音声情報またはテキスト情報に含まれる語の位置（先頭の語からの位置）毎に秘匿度合いを示す秘匿スコアを出力とするモデルである。 The learning unit 23 can also generate, as training data, a combination of information indicating the position of words that satisfy the confidentiality conditions selected by user U and audio information or text information corresponding to user U's past utterances. The trained model generated or updated using such training data is a model that outputs a confidentiality score indicating the degree of confidentiality for each position (position from the first word) of a word included in the audio information or text information corresponding to user U's utterance.

この場合、生成部２１は、利用者Ｕの発話に対応する音声情報またはテキスト情報を学習済みモデルに入力し、学習済みモデルから出力される秘匿スコアが閾値以上である位置をマスキング対象としてマスキングすることで発話テキスト情報を生成する。なお、生成部２１は、閾値は、例えば、上述した秘匿レベルが高いほど高くすることができる。 In this case, the generation unit 21 inputs audio information or text information corresponding to the user U's utterance into the trained model, and generates the utterance text information by masking positions where the confidentiality score output from the trained model is equal to or greater than a threshold. Note that the generation unit 21 can, for example, set the threshold higher the higher the confidentiality level described above.
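Position-based training data of the kind described above could be assembled as in the sketch below, where each word position gets a label of 1 if user U selected it as confidential and 0 otherwise. The dictionary layout and the name `build_training_example` are assumptions for illustration; the patent does not specify a data format, and the model training itself is not shown.

```python
from typing import Dict, Iterable, List, Sequence, Union


def build_training_example(
    words: Sequence[str],
    selected_positions: Iterable[int],
) -> Dict[str, Union[List[str], List[int]]]:
    """Pair the words of a past utterance with per-position confidentiality labels."""
    selected = set(selected_positions)
    for position in selected:
        if not 0 <= position < len(words):
            raise IndexError(f"position {position} is outside the utterance")
    # Position is counted from the first word, starting at 0 in this sketch.
    labels = [1 if index in selected else 0 for index in range(len(words))]
    return {"words": list(words), "labels": labels}
```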

学習部２３は、発話テキスト情報の用途に応じた複数の学習済みモデルを生成することができ、また、利用者Ｕのコンテキストに応じた複数の学習済みモデルを生成することもできる。なお、学習部２３は、上述した学習済みモデルを生成することができればよく、学習部２３による学習処理は上述の処理に限定されない。 The learning unit 23 can generate multiple trained models according to the purpose of the spoken text information, and can also generate multiple trained models according to the context of the user U. Note that the learning unit 23 is only required to be able to generate the trained models described above, and the learning process performed by the learning unit 23 is not limited to the process described above.

〔４．処理手順〕 [4. Processing Procedure]
次に、実施形態に係る情報処理装置１の処理部１７による情報処理の手順について説明する。図５および図６は、実施形態に係る情報処理装置１の処理部１７による情報処理の一例を示すフローチャートである。 Next, the procedure of information processing by the processing unit 17 of the information processing device 1 according to the embodiment will be described. Figures 5 and 6 are flowcharts showing an example of information processing by the processing unit 17 of the information processing device 1 according to the embodiment.

まず、図５について説明する。図５は、情報処理装置１が行う発話テキスト情報生成処理の一例を示す。情報処理装置１の処理部１７は、利用者Ｕの発話を受け付ける（ステップＳ１０）。 First, we will explain Figure 5. Figure 5 shows an example of the utterance text information generation process performed by the information processing device 1. The processing unit 17 of the information processing device 1 accepts an utterance from the user U (step S10).

次に、処理部１７は、ステップＳ１０で受け付けた利用者Ｕの発話のうち秘匿条件を満たす部分をマスキングして利用者Ｕの発話をテキスト化したテキスト情報である発話テキスト情報を生成する（ステップＳ１１）。ステップＳ１１の処理において、処理部１７は、例えば、利用者Ｕの発話に対応する音声情報または文字情報を学習済みモデルに入力し、利用者Ｕの発話を構成する複数の要素部分の各々に対する秘匿度合いを示す秘匿スコアを学習済みモデルから取得する。処理部１７は、複数の要素部分のうち秘匿スコアが閾値以上である要素部分を、秘匿条件を満たす部分として、発話テキスト情報を生成する。 Next, the processing unit 17 generates utterance text information, which is text information obtained by converting user U's utterance received in step S10 into text by masking parts that satisfy the confidentiality conditions (step S11). In the processing of step S11, the processing unit 17 inputs, for example, audio information or character information corresponding to user U's utterance into the trained model, and obtains from the trained model a confidentiality score indicating the degree of confidentiality for each of the multiple element parts that make up user U's utterance. The processing unit 17 generates utterance text information by treating element parts of the multiple element parts whose confidentiality scores are equal to or greater than a threshold as parts that satisfy the confidentiality conditions.

そして、処理部１７は、ステップＳ１１で生成した発話テキスト情報を提供し（ステップＳ１２）、図５に示す処理を終了する。例えば、ステップＳ１２の処理において、処理部１７は、例えば、発話テキスト情報を表示部１１に表示したり、発話テキスト情報を音声として音声出力部１５から出力させたりすることができる。 Then, the processing unit 17 provides the spoken text information generated in step S11 (step S12), and ends the processing shown in FIG. 5. For example, in the processing of step S12, the processing unit 17 can, for example, display the spoken text information on the display unit 11 or output the spoken text information as audio from the audio output unit 15.
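Steps S10 to S12 can be summarized as a small pipeline. In the sketch below the three stages are passed in as callables so that any detector (trained model, table, or both) can be plugged in; the names `run_generation_flow` and `mask_spans`, and the use of character spans, are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Span = Tuple[int, int]  # half-open (start, end) character positions


def mask_spans(text: str, spans: Iterable[Span], mask_char: str = "X") -> str:
    """Replace the characters inside each span with mask characters."""
    chars = list(text)
    for start, end in spans:
        for position in range(start, end):
            chars[position] = mask_char
    return "".join(chars)


def run_generation_flow(
    receive: Callable[[], str],            # S10: accept the utterance (already as text here)
    detect: Callable[[str], List[Span]],   # S11: find parts satisfying the conditions
    provide: Callable[[str], None],        # S12: display or otherwise output the result
) -> str:
    utterance = receive()                               # step S10
    masked = mask_spans(utterance, detect(utterance))   # step S11
    provide(masked)                                     # step S12
    return masked
```

For example, `provide` could append to a list in a test, or write to a display in a real device.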

次に、図６について説明する。図６は、情報処理装置１が行う学習処理の一例を示す。情報処理装置１の処理部１７は、学習用データを生成する（ステップＳ２０）。ステップＳ２０の処理において、例えば、処理部１７は、秘匿条件を満たす言葉として受付部２０によって受け付けられた利用者Ｕの発話に基づいて学習用データを生成する。 Next, Figure 6 will be described. Figure 6 shows an example of the learning process performed by the information processing device 1. The processing unit 17 of the information processing device 1 generates training data (step S20). In the process of step S20, for example, the processing unit 17 generates the training data based on utterances of user U that the reception unit 20 has received as words that satisfy the confidentiality conditions.

次に、ステップＳ２０で生成した学習用データを用いて学習済みモデルを生成または更新して（ステップＳ２１）、図６に示す処理を終了する。 Next, a trained model is generated or updated using the training data generated in step S20 (step S21), and the process shown in Figure 6 is terminated.

〔５．変形例〕 [5. Modifications]
上述した情報処理装置１は、利用者Ｕに操作される端末装置などの機器であるものとして説明したが、情報処理装置１は、サーバ装置などであってもよい。図７は、実施形態に係る情報処理装置１の構成の他の例を示す図である。 The information processing device 1 has been described above as a device such as a terminal device operated by user U, but the information processing device 1 may also be a server device or the like. Figure 7 is a diagram showing another example of the configuration of the information processing device 1 according to the embodiment.

図７に示す端末装置３は、利用者Ｕの発話を音声情報またはテキスト情報に変換し、変換した音声情報またはテキスト情報を含む利用者発話情報を情報処理装置１にネットワークＮを介して送信する。図７に示す情報処理装置１の受付部２０は、端末装置３から送信される利用者発話情報をネットワークＮおよび通信部１０を介して取得することで、利用者Ｕの発話を受け付ける。 The terminal device 3 shown in FIG. 7 converts the speech of the user U into voice information or text information, and transmits the user speech information including the converted voice information or text information to the information processing device 1 via the network N. The reception unit 20 of the information processing device 1 shown in FIG. 7 receives the user speech information transmitted from the terminal device 3 via the network N and the communication unit 10, thereby receiving the speech of the user U.

情報処理装置１の生成部２１は、利用者発話情報に含まれる音声情報またはテキスト情報に基づいて、発話テキスト情報を生成する。提供部２２は、生成部２１によって生成された発話テキスト情報を通信部１０およびネットワークＮを介して端末装置３に送信することで、発話テキスト情報を提供する。端末装置３は、情報処理装置１から送信される発話テキスト情報を受信し、受信した発話テキスト情報を表示したり音声として出力したりする。 The generation unit 21 of the information processing device 1 generates utterance text information based on the audio information or text information included in the user utterance information. The providing unit 22 provides the utterance text information by transmitting the utterance text information generated by the generation unit 21 to the terminal device 3 via the communication unit 10 and the network N. The terminal device 3 receives the utterance text information transmitted from the information processing device 1 and displays it or outputs it as audio.

なお、図２に示す情報処理装置１における処理部１７の機能の一部は、図７に示す端末装置３によって実現されてもよい。また、図２に示す情報処理装置１における処理部１７の機能の一部は、情報提供装置２によって実現されてもよい。 Note that some of the functions of the processing unit 17 in the information processing device 1 shown in FIG. 2 may be implemented by the terminal device 3 shown in FIG. 7. Also, some of the functions of the processing unit 17 in the information processing device 1 shown in FIG. 2 may be implemented by the information providing device 2.

また、図２に示す情報処理装置１は、学習済みモデルや秘匿情報テーブルを利用者Ｕ毎に有しており、この場合の学習済みモデルは、オンデバイスモデルということもできる。また、図７に示す情報処理装置１は、学習済みモデルや秘匿情報テーブルを全利用者Ｕに共通に有してもよく、学習済みモデルや秘匿情報テーブルを利用者Ｕ毎に有していてもよい。 Furthermore, the information processing device 1 shown in FIG. 2 has a trained model and a confidential information table for each user U, and in this case the trained model can also be called an on-device model. Furthermore, the information processing device 1 shown in FIG. 7 may have a trained model and a confidential information table that are common to all users U, or may have a trained model and a confidential information table for each user U.

また、上述した例では、図２に示す情報処理装置１は、対話型の音声操作に対応するＡＩアシスタント機能を有するものとして説明したが、情報処理装置１は、ＡＩアシスタント機能を有しない装置であってもよい。例えば、情報処理装置１は、ボイスレコーダなどであってもよい。情報処理装置１のテキスト化機能は、例えば、会議の議事録をテキスト化する際などにも用いることができる。 In the example described above, the information processing device 1 shown in FIG. 2 has been described as having an AI assistant function that supports interactive voice operations, but the information processing device 1 may also be a device that does not have an AI assistant function. For example, the information processing device 1 may be a voice recorder, etc. The text conversion function of the information processing device 1 can also be used, for example, when converting meeting minutes into text.

〔６．ハードウェア構成〕 [6. Hardware Configuration]
上述してきた実施形態に係る情報処理装置１は、例えば図８に示すような構成のコンピュータ８０によって実現される。図８は、実施形態に係る情報処理装置１の機能を実現するコンピュータ８０の一例を示すハードウェア構成図である。コンピュータ８０は、ＣＰＵ８１、ＲＡＭ８２、ＲＯＭ（Read Only Memory）８３、ＨＤＤ（Hard Disk Drive）８４、通信インターフェイス（Ｉ／Ｆ）８５、入出力インターフェイス（Ｉ／Ｆ）８６、およびメディアインターフェイス（Ｉ／Ｆ）８７を有する。 The information processing device 1 according to the embodiment described above is realized by, for example, a computer 80 configured as shown in Figure 8. Figure 8 is a hardware configuration diagram showing an example of the computer 80 that realizes the functions of the information processing device 1 according to the embodiment. The computer 80 has a CPU 81, a RAM 82, a ROM (Read Only Memory) 83, an HDD (Hard Disk Drive) 84, a communication interface (I/F) 85, an input/output interface (I/F) 86, and a media interface (I/F) 87.

ＣＰＵ８１は、ＲＯＭ８３またはＨＤＤ８４に記憶されたプログラムに基づいて動作し、各部の制御を行う。ＲＯＭ８３は、コンピュータ８０の起動時にＣＰＵ８１によって実行されるブートプログラムや、コンピュータ８０のハードウェアに依存するプログラムなどを記憶する。 The CPU 81 operates based on programs stored in the ROM 83 or HDD 84 and controls each component. The ROM 83 stores a boot program executed by the CPU 81 when the computer 80 starts up, as well as programs that depend on the computer 80's hardware.

ＨＤＤ８４は、ＣＰＵ８１によって実行されるプログラム、および、かかるプログラムによって使用されるデータなどを記憶する。通信インターフェイス８５は、ネットワークＮ（図２参照）を介して他の機器からデータを受信してＣＰＵ８１へ送り、ＣＰＵ８１が生成したデータを、ネットワークＮを介して他の機器に送信する。 The HDD 84 stores programs executed by the CPU 81 and data used by such programs. The communication interface 85 receives data from other devices via the network N (see Figure 2) and sends it to the CPU 81, and transmits data generated by the CPU 81 to other devices via the network N.

ＣＰＵ８１は、入出力インターフェイス８６を介して、ディスプレイやプリンタなどの出力装置、および、キーボードまたはマウスなどの入力装置を制御する。ＣＰＵ８１は、入出力インターフェイス８６を介して、入力装置からデータを取得する。また、ＣＰＵ８１は、入出力インターフェイス８６を介して生成したデータを出力装置に出力する。 The CPU 81 controls output devices such as displays and printers, and input devices such as keyboards and mice, via the input/output interface 86. The CPU 81 acquires data from the input devices via the input/output interface 86. The CPU 81 also outputs the data it generates to the output devices via the input/output interface 86.

メディアインターフェイス８７は、記録媒体８８に記憶されたプログラムまたはデータを読み取り、ＲＡＭ８２を介してＣＰＵ８１に提供する。ＣＰＵ８１は、かかるプログラムを、メディアインターフェイス８７を介して記録媒体８８からＲＡＭ８２上にロードし、ロードしたプログラムを実行する。記録媒体８８は、例えばＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）などの光学記録媒体、ＭＯ（Magneto-Optical disk）などの光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリなどである。 The media interface 87 reads programs or data stored on the recording medium 88 and provides them to the CPU 81 via the RAM 82. The CPU 81 loads the programs from the recording medium 88 onto the RAM 82 via the media interface 87 and executes the loaded programs. The recording medium 88 may be, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase Change Rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

例えば、コンピュータ８０が実施形態に係る情報処理装置１として機能する場合、コンピュータ８０のＣＰＵ８１は、ＲＡＭ８２上にロードされたプログラムを実行することにより、処理部１７の機能を実現する。また、ＨＤＤ８４には、記憶部１３内のデータが記憶される。コンピュータ８０のＣＰＵ８１は、これらのプログラムを記録媒体８８から読み取って実行するが、他の例として、他の装置からネットワークＮを介してこれらのプログラムを取得してもよい。 For example, when the computer 80 functions as the information processing device 1 according to the embodiment, the CPU 81 of the computer 80 executes programs loaded onto the RAM 82 to realize the functions of the processing unit 17. In addition, the HDD 84 stores data in the storage unit 13. The CPU 81 of the computer 80 reads and executes these programs from the recording medium 88, but as another example, these programs may be obtained from another device via the network N.

〔７．その他〕 [7. Other]
また、上記実施形態および変形例において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Furthermore, among the processes described in the above embodiments and modifications, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically using known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed as desired unless otherwise specified. For example, the various information shown in each drawing is not limited to the information illustrated.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Furthermore, the components of each device shown in the figure are functional concepts and do not necessarily have to be physically configured as shown. In other words, the specific form of distribution and integration of each device is not limited to that shown, and all or part of them can be functionally or physically distributed and integrated in any unit depending on various loads, usage conditions, etc.

また、上述してきた実施形態および変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Furthermore, the above-described embodiments and variations can be combined as appropriate to the extent that the processing content is not contradictory.

〔８．効果〕 [8. Effects]
上述してきたように、実施形態に係る情報処理装置１は、受付部２０と、生成部２１と、提供部２２とを備える。受付部２０は、利用者Ｕの発話を受け付ける。生成部２１は、受付部２０によって受け付けられた発話のうち秘匿条件を満たす部分をマスキングして利用者Ｕの発話をテキスト化したテキスト情報である発話テキスト情報を生成する。提供部２２は、生成部２１によって生成された発話テキスト情報を提供する。これにより、情報処理装置１は、利用者Ｕの利便性の向上を図ることができる。 As described above, the information processing device 1 according to the embodiment includes a reception unit 20, a generation unit 21, and a providing unit 22. The reception unit 20 receives an utterance from user U. The generation unit 21 generates utterance text information, which is text information obtained by converting the utterance of user U into text with the portions of the utterance received by the reception unit 20 that satisfy the confidentiality conditions masked. The providing unit 22 provides the utterance text information generated by the generation unit 21. This enables the information processing device 1 to improve convenience for user U.

また、生成部２１は、受付部２０によって受け付けられた利用者Ｕの発話に対応する音声情報またはテキスト情報を入力とし、利用者Ｕの発話を構成する複数の要素部分の各々に対する秘匿度合いを示す秘匿スコアを出力とする学習済みモデルを有する。生成部２１は、学習済みモデルを用いて、複数の要素部分のうち秘匿スコアが閾値以上である要素部分を、秘匿条件を満たす部分として、発話テキスト情報を生成する。これにより、情報処理装置１は、秘匿条件を満たす部分を適切に検出することができる。 The generation unit 21 also has a trained model that receives as input audio information or text information corresponding to the user U's utterance received by the reception unit 20, and outputs a confidentiality score indicating the degree of confidentiality for each of multiple element parts that make up the user U's utterance. Using the trained model, the generation unit 21 generates utterance text information by treating element parts among the multiple element parts whose confidentiality scores are equal to or greater than a threshold as parts that satisfy the confidentiality conditions. This allows the information processing device 1 to appropriately detect parts that satisfy the confidentiality conditions.

また、生成部２１は、受付部２０によって受け付けられた利用者Ｕの発話に対応する音声情報またはテキスト情報を入力とし、利用者Ｕの発話のうち秘匿条件を満たす部分をマスキングしたテキスト情報を発話テキスト情報として出力とする学習済みモデルを有し、学習済みモデルを用いて、発話テキスト情報を生成する。これにより、情報処理装置１は、秘匿条件を満たす部分を適切に検出することができる。 The generation unit 21 also has a trained model that receives as input audio information or text information corresponding to the user U's utterance received by the reception unit 20, and outputs text information obtained by masking portions of the user U's utterance that satisfy the confidentiality conditions as utterance text information, and generates utterance text information using the trained model. This allows the information processing device 1 to appropriately detect portions that satisfy the confidentiality conditions.

The generation unit 21 also generates the utterance text information using a confidential information table that includes information indicating a plurality of words to be concealed. This allows the information processing device 1 to appropriately detect portions that satisfy the confidentiality condition.
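As a rough illustration of table-based masking, the sketch below treats the confidential information table as a plain collection of words and masks every occurrence in the converted text. The function name `mask_with_table`, the case-insensitive substring matching, and the `*` mask character are assumptions made for this example, not details taken from the embodiment.

```python
import re
from typing import Iterable


def mask_with_table(text: str,
                    confidential_words: Iterable[str],
                    mask_char: str = "*") -> str:
    """Replace every occurrence of a table word with mask characters."""
    # Longest words first so that longer entries win over their prefixes.
    words = sorted({w for w in confidential_words if w}, key=len, reverse=True)
    if not words:
        return text
    # re.escape keeps table entries from being read as regex syntax.
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    # Simplification: this matches substrings, not whole words.
    return pattern.sub(lambda m: mask_char * len(m.group(0)), text)


if __name__ == "__main__":
    table = ["password", "hunter2"]
    print(mask_with_table("My password is hunter2", table))
```

Keeping the mask the same length as the hidden word is a design choice for this sketch; a fixed-length mask would avoid leaking the word length.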

The information processing device 1 also includes a learning unit 23 that updates the trained model based on the past utterance history of the user U. This allows the information processing device 1 to improve the accuracy with which the trained model detects portions that satisfy the confidentiality condition.

The generation unit 21 also masks the portion of the utterance of the user U that satisfies the confidentiality condition by converting that portion into at least one of a specific character, a symbol, and a pattern. This allows the information processing device 1 to appropriately present the masked portions to the user U.

In addition, the providing unit 22 provides the utterance text information to the user U by displaying it on the display unit 11. This allows the information processing device 1 to improve convenience for the user U.

Furthermore, when a masked portion of the utterance text information displayed on the display unit 11 is selected, the providing unit 22 removes the masking of that portion. This allows the information processing device 1 to improve convenience for the user U.
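One way to support removing the masking on selection is to keep the original text alongside a per-segment masked flag and apply the mask only when rendering. The sketch below assumes this design; the class names `Segment` and `UtteranceText`, the `select` method, and the index-based selection are hypothetical and stand in for whatever selection event the display unit actually reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str             # original (unmasked) text of this segment
    masked: bool = False  # True while the segment is hidden

    def render(self, mask_char: str = "*") -> str:
        # The mask is applied at display time; the original text is kept.
        return mask_char * len(self.text) if self.masked else self.text


@dataclass
class UtteranceText:
    segments: List[Segment]

    def render(self) -> str:
        """Return the text as it would be shown on the display."""
        return "".join(seg.render() for seg in self.segments)

    def select(self, index: int) -> None:
        """Handle selection of a segment: unmask it if it was masked."""
        segment = self.segments[index]
        if segment.masked:
            segment.masked = False


if __name__ == "__main__":
    utterance = UtteranceText([Segment("PIN is "), Segment("1234", masked=True)])
    print(utterance.render())  # masked view
    utterance.select(1)        # user selects the masked portion
    print(utterance.render())  # masking removed
```

Selecting a segment that is not masked leaves the text unchanged, so the handler can be attached to every segment without extra checks.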

The embodiments of the present application have been described in detail above with reference to the drawings, but they are merely examples. The present invention can be implemented in other forms that incorporate various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the Disclosure of the Invention section.

Furthermore, the term "unit" (section, module, unit) used above can be read as "means" or "circuit." For example, an acquisition unit can be read as an acquisition means or an acquisition circuit.

REFERENCE SIGNS LIST
1 Information processing device
2 Information providing device
3 Terminal device
10 Communication unit
11 Display unit
12 Operation unit
13 Storage unit
14 Voice input unit
15 Voice output unit
16 Position detection unit
17 Processing unit
20 Reception unit
21 Generation unit
22 Providing unit
23 Learning unit
100 Information processing system

Claims

An information processing device comprising:
a reception unit that receives an utterance from a user;
a generation unit that generates utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received by the reception unit that satisfies a confidentiality condition; and
a providing unit that provides the utterance text information generated by the generation unit,
wherein the generation unit generates the utterance text information using a confidential information table that includes information indicating a plurality of words to be concealed and information indicating a confidentiality level for each of the words, such that the greater the proportion of words having a set confidentiality level that are included in the utterance, the lower the confidentiality level that is targeted for the masking.
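For illustration only, the relationship recited in this claim (a larger proportion of table words in the utterance lowers the confidentiality level targeted for masking) might be sketched as below. Every concrete detail is an assumption made for the example: the three levels 1 to 3, the proportion cut-offs 0.2 and 0.5, whitespace tokenization, lowercase table keys, and the names `level_threshold` and `mask_adaptive`.

```python
from typing import Dict, List


def level_threshold(proportion: float) -> int:
    """Map the proportion of table words to the lowest level that is masked.

    Hypothetical cut-offs: the higher the proportion, the lower the
    threshold, so more words become masking targets.
    """
    if proportion >= 0.5:
        return 1
    if proportion >= 0.2:
        return 2
    return 3


def mask_adaptive(tokens: List[str],
                  table: Dict[str, int],
                  mask_char: str = "*") -> str:
    """Mask table words whose level is at or above the adaptive threshold.

    `table` maps lowercase words to confidentiality levels (1 to 3).
    """
    if not tokens:
        return ""
    hits = sum(1 for token in tokens if token.lower() in table)
    threshold = level_threshold(hits / len(tokens))
    out = []
    for token in tokens:
        level = table.get(token.lower())
        if level is not None and level >= threshold:
            out.append(mask_char * len(token))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    table = {"salary": 1, "diagnosis": 2, "password": 3}
    print(mask_adaptive("salary password diagnosis today".split(), table))
```

With this table, an utterance containing few table words masks only level 3 words, while an utterance dominated by table words masks levels 1 and above.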

An information processing device comprising:
a reception unit that receives an utterance from a user;
a generation unit that generates utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received by the reception unit that satisfies a confidentiality condition; and
a providing unit that provides the utterance text information to the user by displaying the utterance text information generated by the generation unit on a display unit,
wherein, when a masked portion of the utterance text information displayed on the display unit is selected, the providing unit removes the masking of that portion.

The information processing device according to claim 1, wherein the generation unit has a trained model that receives as input audio information or text information corresponding to the utterance received by the reception unit and outputs a confidentiality score indicating the degree of confidentiality of each of a plurality of element parts constituting the user's utterance, and uses the trained model to generate the utterance text information by treating each element part whose confidentiality score is equal to or greater than a threshold as a portion that satisfies the confidentiality condition.

The information processing device according to claim 1, wherein the generation unit has a trained model that receives as input audio information or text information corresponding to the utterance received by the reception unit and outputs, as the utterance text information, text information in which a portion of the user's utterance that satisfies the confidentiality condition has been masked, and generates the utterance text information using the trained model.

The information processing device according to claim 2, wherein the generation unit generates the utterance text information using a confidential information table including information indicating a plurality of words to be concealed.

The information processing device according to claim 3, further comprising a learning unit that updates the trained model based on a past utterance history of the user.

The information processing device according to any one of claims 1 to 5, wherein the generation unit masks the portion of the user's utterance that satisfies the confidentiality condition by converting that portion into at least one of a specific character, a symbol, and a pattern.

A computer-implemented information processing method comprising:
a reception step of receiving an utterance from a user;
a generation step of generating utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received in the reception step that satisfies a confidentiality condition; and
a providing step of providing the utterance text information generated in the generation step,
wherein the generation step generates the utterance text information using a confidential information table that includes information indicating a plurality of words to be concealed and information indicating a confidentiality level for each of the words, such that the greater the proportion of words having a set confidentiality level that are included in the utterance, the lower the confidentiality level that is targeted for the masking.

A computer-implemented information processing method comprising:
a reception step of receiving an utterance from a user;
a generation step of generating utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received in the reception step that satisfies a confidentiality condition; and
a providing step of providing the utterance text information to the user by displaying the utterance text information generated in the generation step on a display unit,
wherein, in the providing step, when a masked portion of the utterance text information displayed on the display unit is selected, the masking of that portion is removed.

An information processing program that causes a computer to execute:
a reception procedure of receiving an utterance from a user;
a generation procedure of generating utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received in the reception procedure that satisfies a confidentiality condition; and
a providing procedure of providing the utterance text information generated in the generation procedure,
wherein the generation procedure generates the utterance text information using a confidential information table that includes information indicating a plurality of words to be concealed and information indicating a confidentiality level for each of the words, such that the greater the proportion of words having a set confidentiality level that are included in the utterance, the lower the confidentiality level that is targeted for the masking.

An information processing program that causes a computer to execute:
a reception procedure of receiving an utterance from a user;
a generation procedure of generating utterance text information, which is text information obtained by converting the user's utterance into text while masking a portion of the utterance received in the reception procedure that satisfies a confidentiality condition; and
a providing procedure of providing the utterance text information to the user by displaying the utterance text information generated in the generation procedure on a display unit,
wherein, in the providing procedure, when a masked portion of the utterance text information displayed on the display unit is selected, the masking of that portion is removed.