JP7112949B2 - Call control system

Info

Publication number: JP7112949B2
Application number: JP2018225618A
Authority: JP
Inventors: 和愛三上; 勇真五十嵐; 篤佐藤
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2022-08-04
Anticipated expiration: 2038-11-30
Also published as: JP2020088818A

Description

One aspect of the present disclosure relates to a call control system.

A technique is known for converting the content of a call transmitted between terminals into text and displaying that text on at least one of the terminals. For example, Patent Document 1 describes a telephone system that performs speech recognition on a speech signal input from a first terminal, generates reading information from the recognition result, and displays at least the reading information on a second terminal, which is the call partner of the first terminal.

Patent Document 1: JP 2008-66866 A

The telephone system described above converts one speaker's utterances into text and transmits that text to the other speaker's telephone, so the mechanism is a one-way text conversion. One conceivable way to let both speakers see each speaker's utterances is to install a speech recognition server on both the calling side and the called side. However, if the connection to the speech recognition engine differs between the calling side and the called side, the recognition results may differ, and consequently the text representing a single utterance may differ between the two sides. It is therefore desirable to make the text of the call content match between the calling side and the called side.

A call control system according to one aspect of the present disclosure can execute a speech-to-text service that converts a call transmitted between a calling terminal and a called terminal into text. The call control system includes a control unit that, when both the caller using the calling terminal and the called party using the called terminal are users of the speech-to-text service, causes one of the originating media processing device corresponding to the calling terminal and the terminating media processing device corresponding to the called terminal to function as a common media processing device. The common media processing device connects to a speech recognition engine that converts the voice of the caller or the called party into text. The common media processing device obtains originating-side text by inputting the caller's voice (originating-side voice) transmitted from the calling terminal into the speech recognition engine, and transmits the originating-side text toward both the calling terminal and the called terminal. Likewise, the common media processing device obtains terminating-side text by inputting the called party's voice (terminating-side voice) transmitted from the called terminal into the speech recognition engine, and transmits the terminating-side text toward both the calling terminal and the called terminal.

In this aspect, when both the caller and the called party are users of the speech recognition service, the voices of both parties are converted into text through the common media processing device, and that text is transmitted to both the calling terminal and the called terminal. Because the same media processing device is used for both the calling side and the called side, the text of the call content can be made to match between the two sides.

According to one aspect of the present disclosure, the text of the call content can be made to match between the calling side and the called side.

FIG. 1 is a diagram showing an example of the overall configuration of the call control system according to the embodiment.
FIG. 2 is a diagram showing an example of the functional configuration of some of the communication control devices according to the embodiment.
FIG. 3 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 4 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 5 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 6 is a sequence diagram showing an example of the operation of the call control system according to the embodiment.
FIG. 7 is a diagram showing an example of the hardware configuration of a computer used for the communication control devices according to the embodiment.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and duplicate descriptions are omitted.

A call control system is a computer system that controls calls and conversations between a calling terminal and a called terminal. A call is a communication path temporarily occupied between the calling terminal and the called terminal. The calling terminal is the communication terminal that first requests a call connection, and the called terminal is the communication terminal that responds to that request. Once a call is established between the two terminals, the caller (the user of the calling terminal) and the called party (the user of the called terminal) can talk. A conversation means the voice transmitted and received between the calling terminal and the called terminal, and also the act of transmitting and receiving that voice.

In this embodiment, the call control system executes a speech-to-text service (also called a speech recognition service) that converts a conversation between the calling terminal and the called terminal into text and displays the converted text on at least one of the two terminals. In this disclosure, the converted text is also referred to as speech text.

FIG. 1 is a diagram showing the overall configuration of a call control system 1 according to the embodiment. The call control system 1 includes an originating network 21 in which a calling terminal 31 is located, a terminating network 22 in which a called terminal 32 is located, and a core network 10 that connects the originating network 21 and the terminating network 22. In the call control system 1, a call (communication path) is established by transmitting control signals among a plurality of devices and terminals, and a conversation becomes possible when data signals representing voice are transmitted over that call.

The calling terminal 31 and the called terminal 32 are both communication terminals with a call function. Each may be a fixed terminal or a mobile terminal. Examples include mobile phones, smartphones, tablet terminals, wearable terminals, and personal computers, but the terminal type is not limited to these. The calling terminal 31 and the called terminal 32 may be of the same type or of different types.

The originating network 21 and the terminating network 22 are both access networks to which terminals connect directly. The configuration of the access network is not limited; for example, it may be any wireless or wired network. The type (protocol) of access network may be the same or different between the originating network 21 and the terminating network 22.

The core network 10 is the network that forms the core of the call control system 1 and includes various communication control devices. In this embodiment, the core network 10 is assumed to be an IMS network. An IMS network uses SIP as its communication protocol and can provide multimedia services that realize not only data communication but also real-time voice or video communication. In an IMS network, calls are processed by a plurality of communication control devices such as a Call Session Control Function (CSCF), an Application Server (AS), a gateway, and a Home Subscriber Server (HSS, the subscriber management function). The CSCF is a call control device that sets up calls or sessions and activates predetermined services. The application server is a device that executes a predetermined supplementary service (for example, the speech-to-text service) and determines whether that supplementary service may be executed. The gateway is a device that connects an access network and the core network. The HSS is a device (database) that stores user profiles (subscriber information).

In this embodiment, the core network 10 further includes two types of communication control devices: an MCE (Media Composition Enabler) and an SMS-GW (SMS gateway). The MCE is a media processing device that provides additional functions for conversations. The SMS-GW is a type of gateway that connects the core network to other networks and provides the Short Message Service (SMS).

FIG. 1 shows the communication control devices that are particularly relevant to controlling calls with supplementary services: an originating CSCF 11, a terminating CSCF 12, an originating AS 13, a terminating AS 14, an originating MCE 15, a terminating MCE 16, an originating SMS-GW 17, and a terminating SMS-GW 18.

The originating CSCF 11 and the terminating CSCF 12 both execute call control for connecting the calling terminal 31 and the called terminal 32. The calling side and the called side are connected to each other by exchanging control signals and data signals (for example, voice data) between the originating CSCF 11 and the terminating CSCF 12. The originating AS 13 is the application server on the calling side, and the terminating AS 14 is the application server on the called side. The originating MCE 15 is the media processing device on the calling side, and the terminating MCE 16 is the media processing device on the called side. The originating SMS-GW 17 is the SMS gateway on the calling side, and the terminating SMS-GW 18 is the SMS gateway on the called side.

FIG. 1 further shows an originating Web server 41, a terminating Web server 42, and a speech recognition engine 43. The originating Web server 41 and the speech recognition engine 43 constitute an originating-side service platform that provides the speech-to-text service to the calling terminal 31. The terminating Web server 42 and the speech recognition engine 43 constitute a terminating-side service platform that provides the speech-to-text service to the called terminal 32. The speech recognition engine 43 is a common computer used by both the calling side and the called side, and it converts voice into text using speech recognition. Both service platforms are provided in a communication network separate from the core network 10. The originating Web server 41 can perform data communication with each of the calling terminal 31, the originating AS 13, and the originating MCE 15. The terminating Web server 42 can perform data communication with each of the called terminal 32, the terminating AS 14, and the terminating MCE 16. The speech recognition engine 43 can perform data communication with each of the originating MCE 15 and the terminating MCE 16. By connecting to the originating Web server 41, the calling terminal 31 can provide the speech-to-text service to the caller. By connecting to the terminating Web server 42, the called terminal 32 can provide the speech-to-text service to the called party.

In this embodiment, the core network 10 further includes a session database (session DB) 19. The session database 19 is a device (storage unit) that stores session information about calls (sessions) that involve the speech-to-text service, and it is a common database used by both the calling side and the called side. The session database 19 can be accessed by the originating AS 13 and the terminating AS 14.

For example, the session information for one call may include the following data items: a session ID, an originating auxiliary session ID, a terminating auxiliary session ID, the subscriber number of the calling terminal 31, the subscriber number of the called terminal 32, an originating endpoint, a terminating endpoint, and a recognition direction. The session ID is an identifier that uniquely identifies the call (session). An auxiliary session ID is an identifier prepared so that a Web server located outside the core network 10 can also uniquely identify the call. The originating auxiliary session ID is used for the originating Web server 41, and the terminating auxiliary session ID is used for the terminating Web server 42. An endpoint is an identifier that uniquely identifies a Web server. The originating endpoint uniquely identifies the originating Web server 41, and the terminating endpoint uniquely identifies the terminating Web server 42. The recognition direction is information indicating to which communication terminal the speech text is to be transmitted.

The data structure of the session information is not limited and may be designed according to any policy. For example, the session information may be represented by associating an originating-side record and a terminating-side record with each other. Alternatively, it may be represented by integrating the data items of both sides into a single record.
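As an illustration of the single-record option, the data items listed above could be held in one structure. The following is a minimal sketch only; the class name, field names, and sample values are assumptions made for illustration and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionInfo:
    """Hypothetical single-record layout for one call's session information."""
    session_id: str                                    # uniquely identifies the call (session)
    calling_subscriber_number: str                     # subscriber number of calling terminal 31
    called_subscriber_number: str                      # subscriber number of called terminal 32
    originating_aux_session_id: Optional[str] = None   # used for originating Web server 41
    terminating_aux_session_id: Optional[str] = None   # used for terminating Web server 42
    originating_endpoint: Optional[str] = None         # identifies originating Web server 41
    terminating_endpoint: Optional[str] = None         # identifies terminating Web server 42
    recognition_direction: Optional[str] = None        # which terminal(s) receive the speech text


# The originating side creates the record with the items it knows.
record = SessionInfo(
    session_id="sess-0001",
    calling_subscriber_number="09000000001",
    called_subscriber_number="09000000002",
    originating_aux_session_id="aux-orig-0001",
)

# The terminating side later fills in its own item on the same record.
record.terminating_aux_session_id = "aux-term-0001"
```

Optional fields reflect that the two sides write to the shared record at different times; the alternative of two linked records would split these fields by side and join them on the session ID.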

Each device shown in FIG. 1 is configured using at least one computer. When a plurality of computers are used, they are interconnected via a communication network so that, logically, a single device is constructed.

One feature of the call control system 1 is that, when both the caller and the called party use the speech-to-text service, only one of the calling side and the called side converts the voices of both parties into text. Even if the speech recognition engine 43 is common to both sides as shown in FIG. 1, the recognition results may differ if the connection to the speech recognition engine 43 differs between the calling side and the called side. For example, the speech text may differ between the case where an utterance is input to the speech recognition engine 43 from the originating MCE 15 and the case where the same utterance is input from the terminating MCE 16. To make the text of the call content match between the two sides, only one of the originating MCE 15 and the terminating MCE 16 functions as the common media processing device in the call control system 1. This common media processing device transmits the voices of both the caller and the called party to the speech recognition engine 43 and transmits the speech text to both the originating Web server 41 and the terminating Web server 42. FIG. 1 also shows connections 51 and 52 related to this mechanism. The connection 51 is used when the originating MCE 15 functions as the common media processing device for a call (session), and the connection 52 is used when the terminating MCE 16 functions as the common media processing device for a call (session).
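The effect of routing both voices through one device can be sketched as follows. This is a toy model under stated assumptions: `CommonMediaProcessor`, `fake_engine`, and the list-based "Web server" inboxes are hypothetical stand-ins, and the recognizer simply upper-cases the input instead of performing real speech recognition.

```python
from typing import Callable, Dict, List, Tuple

Recognizer = Callable[[bytes], str]
Inbox = List[Tuple[str, str]]  # (speaker, speech text) pairs


class CommonMediaProcessor:
    """Hypothetical model of the one MCE that handles both parties' voices."""

    def __init__(self, recognize: Recognizer, web_servers: Dict[str, Inbox]) -> None:
        self._recognize = recognize      # single connection to the recognition engine
        self._web_servers = web_servers  # originating and terminating Web servers

    def handle_voice(self, speaker: str, audio: bytes) -> str:
        # Recognize once, then deliver the same text to every Web server.
        text = self._recognize(audio)
        for inbox in self._web_servers.values():
            inbox.append((speaker, text))
        return text


def fake_engine(audio: bytes) -> str:
    """Stand-in for speech recognition engine 43 (assumption for illustration)."""
    return audio.decode("utf-8").upper()


inboxes: Dict[str, Inbox] = {
    "originating_web_server_41": [],
    "terminating_web_server_42": [],
}
mce = CommonMediaProcessor(fake_engine, inboxes)
mce.handle_voice("caller", b"hello")
mce.handle_voice("called_party", b"hi there")
```

Because each utterance is recognized exactly once and the result is fanned out, both inboxes necessarily hold identical text, which is the property the embodiment aims for.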

FIG. 2 is a diagram showing an example of the functional configuration of the application servers. The originating AS 13 includes a service control unit 131, a session control unit 132, and a service scenario unit 133 as functional elements. The service control unit 131 transmits and receives data to and from the originating CSCF 11. The session control unit 132 transmits and receives data to and from the originating MCE 15. The service scenario unit 133 transmits and receives data to and from each of the originating SMS-GW 17 and the originating Web server 41. When the originating MCE 15 processes the voices of both the calling side and the called side, the service scenario unit 133 may also transmit and receive data to and from the terminating Web server 42; connection 61 in FIG. 2 represents that communication.

The terminating AS 14 includes a service control unit 141, a session control unit 142, and a service scenario unit 143 as functional elements. The service control unit 141 transmits and receives data to and from the terminating CSCF 12. The session control unit 142 transmits and receives data to and from the terminating MCE 16. The service scenario unit 143 transmits and receives data to and from each of the terminating SMS-GW 18 and the terminating Web server 42. When the terminating MCE 16 processes the voices of both the calling side and the called side, the service scenario unit 143 may also transmit and receive data to and from the originating Web server 41; connection 62 in FIG. 2 represents that communication.

The originating AS 13 and the terminating AS 14 each include a control unit that, when both the caller and the called party use the speech-to-text service, causes one of the originating MCE 15 and the terminating MCE 16 to function as the common media processing device. In the originating AS 13, at least one of the service control unit 131, the session control unit 132, and the service scenario unit 133 corresponds to that control unit. In the terminating AS 14, at least one of the service control unit 141, the session control unit 142, and the service scenario unit 143 corresponds to that control unit.

This embodiment describes an example in which the originating MCE 15 processes the voices of both parties. Accordingly, the connection 51 shown in FIG. 1 and the connection 61 shown in FIG. 2 are used. The present disclosure is not limited to this example, however, and the terminating MCE 16 may process both voices instead.

An example of the operation of the call control system 1 according to this embodiment is described with reference to FIGS. 3 to 6, each of which is a sequence diagram showing an example of that operation. FIG. 3 shows an example of the process of establishing a call. FIGS. 4 and 5 show an example of the process of activating the speech-to-text service. FIG. 6 shows an example of the process of displaying speech text on a communication terminal. For ease of understanding, FIGS. 3 to 6 show only the components, processes, and data signals that are particularly relevant to controlling the conversation and the speech-to-text service.

First, an example of the process of establishing a call is described as processing flow S1 with reference to FIG. 3.

In step S101, the calling terminal 31 transmits an INVITE message in response to the caller's calling operation, and the originating AS 13 receives that INVITE message. The INVITE message is a control signal (call establishment request signal) transmitted to establish a call (session) between the calling terminal 31 and the called terminal 32. The INVITE message enters the core network 10 via the originating network 21. Within the core network 10, the originating CSCF 11 forwards the INVITE message to the originating AS 13.

In step S102, the service control unit 131 activates the speech-to-text service for the calling terminal 31 (the caller) in response to the INVITE message. The service control unit 131 accesses the subscriber management function, refers to the caller's subscriber information, and determines whether the caller subscribes to the speech-to-text service. If the caller subscribes to the service, the service control unit 131 activates it. This embodiment assumes that the caller is a subscriber of the speech-to-text service. In connection with activating the service, the service control unit 131, the session control unit 132, and the service scenario unit 133 cooperate to store in the session database 19 session information that includes the session ID of the call about to be established, the originating auxiliary session ID, the subscriber number of the calling terminal 31, and the subscriber number of the called terminal 32.
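The logic of step S102 (check the subscription, then store the initial session information) might be sketched as below. This is an assumption-laden illustration: the function name, the dictionary-based subscriber store and session database, the key names, and the use of random UUIDs as identifiers are all hypothetical and are not specified by the patent.

```python
import uuid
from typing import Dict, Optional


def activate_service(invite: Dict[str, str],
                     subscribers: Dict[str, Dict[str, bool]],
                     session_db: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return a new session ID if the caller subscribes to the service, else None."""
    caller = invite["from"]
    # Look up the caller's subscriber information (stand-in for the HSS query).
    if not subscribers.get(caller, {}).get("speech_to_text", False):
        return None

    # Store the items the originating side knows at this point.
    session_id = uuid.uuid4().hex
    session_db[session_id] = {
        "session_id": session_id,
        "originating_aux_session_id": uuid.uuid4().hex,
        "calling_subscriber_number": caller,
        "called_subscriber_number": invite["to"],
    }
    return session_id


subscribers = {"09000000001": {"speech_to_text": True}}
session_db: Dict[str, Dict[str, str]] = {}
sid = activate_service({"from": "09000000001", "to": "09000000002"},
                       subscribers, session_db)
```

A caller without a subscription yields `None` and leaves the session database untouched, so later steps can skip the speech-to-text handling for that call.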

In step S103, the service scenario unit 133 transmits a push notification to the originating SMS-GW 17, and in step S104, the originating SMS-GW 17 transmits a push request to the calling terminal 31 in response to that push notification. In response to an instruction from the service control unit 131, the service scenario unit 133 accesses the user profile, refers to the caller's user information, and determines the contract status of the speech-to-text service. If the speech-to-text service can be provided to the caller, the service scenario unit 133 transmits the push notification. This embodiment assumes that the caller is entitled to receive the speech-to-text service. The push request includes the information the calling terminal 31 needs in order to receive the speech-to-text service from the originating Web server 41 (for example, the device token of the calling terminal 31 and the originating auxiliary session ID), and the push notification includes at least part of the information that makes up the push request.

In step S105, the session control unit 132 transmits an INVITE message to the originating MCE 15 in order to connect to it. The originating MCE 15 executes processing for the speech-to-text service in response to that INVITE message and then transmits a 200_OK message in step S106. The 200_OK message is a response signal indicating that the processing corresponding to the INVITE message was executed normally; in other words, it is the success response to the INVITE message.

In step S107, the service control unit 131 transmits an INVITE message toward the terminating AS 14. The service control unit 131 adds two items to the header information of the INVITE message: an originating media device ID, which is an identifier that uniquely identifies the originating MCE 15, and originating service information, which indicates that the speech-to-text service is executed on the calling side. The service control unit 131 then transmits the INVITE message containing the originating media device ID and the originating service information. This INVITE message reaches the terminating AS 14 via the originating CSCF 11 and the terminating CSCF 12.
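The header enrichment in step S107 can be illustrated with a plain dictionary standing in for the SIP header set. The patent does not name the header fields, so the two header names below, the value `"speech-to-text"`, and the sample device ID are hypothetical placeholders chosen only for this sketch.

```python
from typing import Dict

# Hypothetical header names; the patent does not specify them.
MEDIA_DEVICE_HEADER = "X-Originating-Media-Device-Id"
SERVICE_INFO_HEADER = "X-Originating-Service"


def add_originating_service_headers(headers: Dict[str, str],
                                    media_device_id: str,
                                    service: str = "speech-to-text") -> Dict[str, str]:
    """Return a copy of the INVITE headers with the two originating-side items added."""
    enriched = dict(headers)                       # leave the caller's dict unchanged
    enriched[MEDIA_DEVICE_HEADER] = media_device_id
    enriched[SERVICE_INFO_HEADER] = service
    return enriched


original = {"From": "sip:caller@example.invalid", "To": "sip:called@example.invalid"}
outgoing = add_originating_service_headers(original, media_device_id="mce-15")
```

Returning a copy rather than mutating the input keeps the received INVITE intact, which is convenient if the application server needs the original headers for logging or retries.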

In step S108, the service control unit 141 activates the speech-to-text service for the called terminal 32 (the called party) in response to the INVITE message from the originating AS 13. The service control unit 141 accesses the subscriber management function, refers to the called party's subscriber information, and determines whether the called party subscribes to the speech-to-text service. If the called party subscribes to the service, the service control unit 141 activates it. This embodiment assumes that the called party is a subscriber of the speech-to-text service. In connection with activating the service, the service control unit 141, the session control unit 142, and the service scenario unit 143 cooperate to write the terminating auxiliary session ID of the call about to be established into the corresponding session information in the session database 19.

In step S109, the service scenario unit 143 transmits a push notification to the terminating SMS-GW 18, and in step S110, the terminating SMS-GW 18 transmits a push request to the called terminal 32 in response to that push notification. In response to an instruction from the service control unit 141, the service scenario unit 143 accesses the user profile, refers to the called party's user information, and determines the contract status of the speech-to-text service. If the speech-to-text service can be provided to the called party, the service scenario unit 143 transmits the push notification. This embodiment assumes that the called party is entitled to receive the speech-to-text service. The push request includes the information the called terminal 32 needs in order to receive the speech-to-text service from the terminating Web server 42 (for example, the device token of the called terminal 32 and the terminating auxiliary session ID), and the push notification includes at least part of the information that makes up the push request.

In step S111, the session control unit 142 transmits an INVITE message to the terminating MCE 16 in order to connect to it. The terminating MCE 16 executes processing for the speech-to-text service in response to the INVITE message. By referring to the originating media device ID and the originating service information in the INVITE message, the terminating MCE 16 recognizes that the speech-to-text service is executed on the originating side and that the originating MCE 15 executes that service. Based on this recognition, the terminating MCE 16 does not provide voice data to the speech recognition engine 43. However, the connection between the terminating MCE 16 and the terminating AS 14 is maintained until the call is disconnected. In step S112, the terminating MCE 16 transmits a 200_OK message to the terminating AS 14.
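The decision made by the terminating MCE 16 in step S111 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the disclosure: the `InviteHeaders` class, its field names, and the fallback behavior when the header information is absent are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InviteHeaders:
    """Hypothetical view of the extra header information carried by the INVITE."""
    originating_media_device_id: Optional[str] = None  # identifies the originating MCE
    originating_service_active: bool = False           # originating service information


def terminating_mce_feeds_engine(headers: InviteHeaders) -> bool:
    """Return True if the terminating MCE should send voice data to the engine.

    When the INVITE shows that the originating side runs the speech-to-text
    service on an identified media device, the terminating MCE stays out of
    the recognition path. Feeding the engine itself in all other cases is an
    assumption made for this sketch.
    """
    handled_by_originating_side = (
        headers.originating_media_device_id is not None
        and headers.originating_service_active
    )
    return not handled_by_originating_side
```

With both header fields present the function returns `False`, which corresponds to the terminating MCE 16 withholding voice data from the speech recognition engine 43 while keeping its connection to the terminating AS 14.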

In step S113, the service control unit 141 transmits the INVITE message toward the called terminal 32. The INVITE message is sent from the terminating AS 14 to the terminating CSCF 12, and from the terminating CSCF 12 to the called terminal 32 via the terminating network 22. When the called terminal 32 receives the INVITE message, the calling process for the called terminal 32 is complete.

In step S114, the called terminal 32 transmits a 200_OK message in response to the called party answering the call, and this 200_OK message reaches the terminating AS 14 via the terminating network 22 and the terminating CSCF 12.

In step S115, the service control unit 141, the session control unit 142, and the service scenario unit 143 of the terminating AS 14 each process the message, and finally the service control unit 141 transmits the 200_OK message toward the originating AS 13. The service control unit 141 adds two items to the header information of the 200_OK message: a terminating media device ID, which is an identifier that uniquely identifies the terminating MCE 16, and terminating service information, which indicates that the speech-to-text service is executed on the terminating side. The service control unit 141 then transmits the 200_OK message containing the terminating media device ID and the terminating service information. This 200_OK message reaches the originating AS 13 via the terminating CSCF 12 and the originating CSCF 11.

In step S116, the session control unit 132 transmits the 200_OK message to the originating MCE 15. By referring to the terminating media device ID and the terminating service information in the 200_OK message, the originating MCE 15 recognizes that the speech-to-text service is also executed on the terminating side. Based on this recognition, the originating MCE 15 provides both the voice data from the calling terminal 31 and the voice data from the called terminal 32 to the speech recognition engine 43. In this way, the originating AS 13 causes the originating MCE 15 to function as a common media processing device. In step S117, the originating MCE 15 returns a 200_OK message to the originating AS 13, and in step S118, the originating AS 13 transmits the 200_OK message toward the calling terminal 31. The 200_OK message reaches the calling terminal 31 via the originating CSCF 11 and the originating network 21.

In step S119, when the calling terminal 31 receives the 200_OK message, a U-Plane (user plane) bus for transmitting data signals is established between the calling terminal 31 and the called terminal 32. That is, a call is established between the calling terminal 31 and the called terminal 32. As a result, the calling terminal 31 and the called terminal 32 can talk to each other.

Next, an example of processing for activating the speech-to-text service will be described as processing flow S2 with reference to FIG. 4. This example shows a case where the speech-to-text service starts at the same or almost the same time on the calling terminal 31 and the called terminal 32.

In step S201, the calling terminal 31 transmits a connection request to the originating Web server 41 in order to start the application program for the speech-to-text service. The connection request is a data signal for establishing a communication connection between the calling terminal 31 and the originating Web server 41, and it includes at least part of the information provided by the push request (for example, the device token of the calling terminal 31 and the originating auxiliary session ID).

In step S202, processing for authenticating the caller is executed between the originating Web server 41 and the service scenario unit 133 of the originating AS 13. The originating Web server 41 transmits to the originating AS 13 an authentication request that includes at least part of the information provided by the connection request (for example, the device token of the calling terminal 31). The service scenario unit 133 executes authentication processing in response to the authentication request. For example, the service scenario unit 133 checks whether the device token is valid. The service scenario unit 133 transmits the result to the originating Web server 41. This embodiment assumes that the caller is authenticated.

In step S203, the calling terminal 31 starts the application program for the speech-to-text service and transmits a start signal to the originating Web server 41. The start signal is a data signal for executing the application program.

In step S204, the originating Web server 41 transmits an event notification to the originating AS 13 in response to the start signal. This event notification includes the originating endpoint and the originating auxiliary session ID.

In step S205, the service scenario unit 133 of the originating AS 13 registers the originating endpoint in the session database 19. The service scenario unit 133 writes the originating endpoint into the session information corresponding to the originating auxiliary session ID. This registration makes it possible to transmit the voice text of the currently established call (session) to the calling terminal 31 via the originating Web server 41.

Processing similar to steps S201 to S205 is also executed on the terminating side. This processing is shown as steps S211 to S215.

In step S211, the called terminal 32 transmits a connection request to the terminating Web server 42 in order to start the application program for the speech-to-text service. The connection request includes at least part of the information provided by the push request (for example, the device token of the called terminal 32 and the terminating auxiliary session ID).

In step S212, processing for authenticating the called party is executed between the terminating Web server 42 and the service scenario unit 143 of the terminating AS 14. This embodiment assumes that the called party is also authenticated.

In step S213, the called terminal 32 starts the application program for the speech-to-text service and transmits a start signal to the terminating Web server 42.

In step S214, the terminating Web server 42 transmits an event notification to the terminating AS 14 in response to the start signal. This event notification includes the terminating endpoint and the terminating auxiliary session ID.

In step S215, the service scenario unit 143 of the terminating AS 14 registers the terminating endpoint in the session database 19. The service scenario unit 143 writes the terminating endpoint into the record corresponding to the terminating auxiliary session ID. This registration makes it possible to transmit the voice text of the currently established call (session) to the called terminal 32 via the terminating Web server 42.
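The endpoint registration in steps S205 and S215 amounts to looking up one session record by an auxiliary session ID and filling in the matching endpoint field. The following is a minimal sketch under assumptions: the in-memory `SessionDatabase` class, the `SessionInfo` field names, and the example endpoint strings are hypothetical stand-ins, since the description does not specify how the session database 19 is organized.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SessionInfo:
    """One record per call; all field names are hypothetical."""
    originating_aux_id: str
    terminating_aux_id: str
    originating_endpoint: Optional[str] = None
    terminating_endpoint: Optional[str] = None


class SessionDatabase:
    """In-memory stand-in for the session database 19."""

    def __init__(self) -> None:
        self._by_aux_id: Dict[str, SessionInfo] = {}

    def add(self, info: SessionInfo) -> None:
        # Index the same record under both auxiliary session IDs so that
        # either side can find it.
        self._by_aux_id[info.originating_aux_id] = info
        self._by_aux_id[info.terminating_aux_id] = info

    def register_endpoint(self, aux_id: str, endpoint: str) -> SessionInfo:
        """Write the endpoint reported in an event notification (S205 / S215)."""
        info = self._by_aux_id[aux_id]  # raises KeyError for an unknown ID
        if aux_id == info.originating_aux_id:
            info.originating_endpoint = endpoint
        else:
            info.terminating_endpoint = endpoint
        return info


db = SessionDatabase()
db.add(SessionInfo(originating_aux_id="aux-orig", terminating_aux_id="aux-term"))
db.register_endpoint("aux-orig", "web41/endpoint")   # step S205
db.register_endpoint("aux-term", "web42/endpoint")   # step S215
```

Indexing one record under both IDs is a design choice of this sketch; it reflects the fact that both auxiliary session IDs refer to the same established call.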

On the originating side, steps S206 and S207 are executed after step S205. In step S206, the calling terminal 31 transmits to the originating Web server 41 a consent signal indicating that the caller agrees to use the speech-to-text service. In step S207, the originating Web server 41 transmits an event notification to the originating AS 13 in response to the consent signal. This event notification indicates the caller's consent. Both the consent signal and the event notification include the originating auxiliary session ID.

On the terminating side, steps S216 and S217 are executed after step S215. In step S216, the called terminal 32 transmits to the terminating Web server 42 a consent signal indicating that the called party agrees to use the speech-to-text service. In step S217, the terminating Web server 42 transmits an event notification toward the originating AS 13 in response to the consent signal. This event notification indicates the called party's consent. Both the consent signal and the event notification include the terminating auxiliary session ID.

In step S208, the service scenario unit 133 sets the recognition direction of the session information corresponding to the established call to "bidirectional" based on the two event notifications of steps S207 and S217. Specifically, the service scenario unit 133 accesses the session database 19, identifies the session information corresponding to the originating or terminating auxiliary session ID, and sets the recognition direction of that session information to "bidirectional". In this way, the service scenario unit 133 sets the recognition direction to "bidirectional" in response to consent signals having been transmitted from both the calling terminal 31 and the called terminal 32. As a result, as shown in step S220, the speech-to-text service is executed on both the originating and terminating sides.

Next, another example of processing for activating the speech-to-text service will be described as processing flow S2A with reference to FIG. 5. This example shows a case where the speech-to-text service starts at different times on the calling terminal 31 and the called terminal 32, and more specifically, a case where the called terminal 32 starts the speech-to-text service later than the calling terminal 31.

In processing flow S2A, as in processing flow S2, steps S201 to S207 are executed on the originating side. If the timing of the processing that starts the application program of the speech-to-text service differs considerably between the originating side and the terminating side, step S208A is executed on the originating side after step S207. In step S208A, the service scenario unit 133 sets the recognition direction of the session information corresponding to the established call (the session information corresponding to the originating auxiliary session ID) to "originating" based on the event notification of step S207. As a result, as shown in step S221, the speech-to-text service is executed only on the calling terminal 31.

When steps S211 to S217 are executed on the terminating side after step S221, step S208B is executed on the originating side. In step S208B, the service scenario unit 133 updates the recognition direction of the session information corresponding to the established call (the session information corresponding to the terminating auxiliary session ID) from "originating" to "bidirectional" based on the event notification of step S217. In this way, the service scenario unit 133 sets the recognition direction to "bidirectional" in response to consent signals having been transmitted from both the calling terminal 31 and the called terminal 32. As a result, as shown in step S222, the speech-to-text service becomes executable on both the originating and terminating sides. Step S222 is the same as step S220 in processing flow S2.
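Processing flows S2 and S2A both reduce to deriving the recognition direction from the set of sides that have consented so far. The sketch below illustrates that rule under assumptions: the function name, the string values, and in particular the "terminating" result for a call where only the called party has consented are hypothetical, because the description here covers only the "originating" and "bidirectional" cases.

```python
from typing import Optional, Set


def recognition_direction(consents: Set[str]) -> Optional[str]:
    """Derive the recognition direction from the sides that have consented.

    `consents` holds "originating" and/or "terminating", one entry per
    event notification that reported a consent signal (S207 / S217).
    """
    if {"originating", "terminating"} <= consents:
        return "bidirectional"      # S208, or the update in S208B
    if "originating" in consents:
        return "originating"        # S208A
    if "terminating" in consents:
        return "terminating"        # assumption: not described in this section
    return None                     # nobody has consented yet


# Flow S2A: the caller consents first, the called party later.
consents: Set[str] = set()
consents.add("originating")
after_s208a = recognition_direction(consents)   # "originating"
consents.add("terminating")
after_s208b = recognition_direction(consents)   # "bidirectional"
```

Recomputing the direction from the full consent set, rather than toggling it per event, gives the same outcome whether the two notifications arrive almost together (flow S2) or far apart (flow S2A).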

Next, an example of processing for displaying voice text on the communication terminals will be described as processing flow S3 with reference to FIG. 6. Processing flow S3 assumes that the speech-to-text service has become executable on both the originating and terminating sides (that is, step S220 or S222).

Steps S301 to S309 show processing that converts the called party's voice (terminating voice) into text and displays that voice text on both the calling terminal 31 and the called terminal 32.

In step S301, voice data (terminating voice) transmitted from the called terminal 32 is sent to the core network 10 via the terminating network 22 and is transmitted to the originating MCE 15 via communication control devices such as the terminating CSCF 12, the originating CSCF 11, and the originating AS 13. In step S302, the originating MCE 15 transmits the voice data to the speech recognition engine 43. In step S303, the speech recognition engine 43 converts the terminating voice into text by executing speech recognition on the voice data and transmits the voice text to the originating MCE 15. This voice text corresponds to the terminating text.

In step S304, the originating MCE 15 transmits to the originating Web server 41 a recognition result that includes the voice text and an utterance type indicating who the speaker is. Because the voice text represents the terminating voice, the utterance type in the recognition result transmitted in this step indicates the terminating side. In step S305, the originating MCE 15 also transmits the recognition result to the terminating Web server 42. The originating MCE 15 obtains the session information corresponding to the current call from the session database 19 via the originating AS 13. In response to the recognition direction of the session information being "bidirectional", the originating MCE 15 obtains the originating endpoint and the terminating endpoint from the session information. From these endpoints, the originating MCE 15 can obtain the destinations of the recognition result (that is, the originating Web server 41 and the terminating Web server 42). In this way, the originating MCE 15 transmits the terminating text to both the originating Web server 41 and the terminating Web server 42 in response to the recognition direction being "bidirectional".
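The routing performed by the originating MCE 15 in steps S304 and S305 can be sketched as a lookup of destinations from the session information. This is an illustration under assumptions: the class and field names and the example endpoint strings are hypothetical, and the handling of directions other than "bidirectional" is inferred from step S221 rather than stated for this flow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SessionInfo:
    """Hypothetical subset of the session information for one call."""
    recognition_direction: str
    originating_endpoint: Optional[str]
    terminating_endpoint: Optional[str]


@dataclass(frozen=True)
class RecognitionResult:
    text: str            # voice text returned by the recognition engine
    utterance_type: str  # "originating" or "terminating": who spoke


def destinations(session: SessionInfo) -> List[str]:
    """Pick the endpoints that should receive a recognition result."""
    if session.recognition_direction == "bidirectional":
        candidates = [session.originating_endpoint, session.terminating_endpoint]
    elif session.recognition_direction == "originating":
        # Assumption: only the originating side is served (compare step S221).
        candidates = [session.originating_endpoint]
    else:
        candidates = []
    # Skip endpoints that have not been registered yet.
    return [endpoint for endpoint in candidates if endpoint is not None]


def dispatch(session: SessionInfo,
             result: RecognitionResult) -> List[Tuple[str, RecognitionResult]]:
    """Pair the same recognition result with every destination endpoint."""
    return [(endpoint, result) for endpoint in destinations(session)]


session = SessionInfo("bidirectional", "web41/endpoint", "web42/endpoint")
result = RecognitionResult(text="hello", utterance_type="terminating")
deliveries = dispatch(session, result)
```

Because a single recognition result is paired with both endpoints, the calling and called sides receive identical text, which is the point of having one common media processing device.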

In step S306, the originating Web server 41 transmits the recognition result to the calling terminal 31. Because the utterance type included in the recognition result indicates the terminating side, the originating Web server 41 generates data containing the voice text such that the voice text is displayed on the calling terminal 31 as that of the other party.

In step S307, the calling terminal 31 displays the voice text on its screen as that of the called party (the other party) based on the data. This allows the caller to see what the other party said.

In step S308, the terminating Web server 42 transmits the recognition result to the called terminal 32. Because the utterance type included in the recognition result indicates the terminating side, the terminating Web server 42 generates data containing the voice text such that the voice text is displayed on the called terminal 32 as that of the called party himself or herself.

In step S309, the called terminal 32 displays the voice text on its screen as that of the called party himself or herself based on the data. This allows the called party to see his or her own utterance.

Steps S310 to S318 show processing that converts the caller's voice (originating voice) into text and displays that voice text on both the calling terminal 31 and the called terminal 32.

In step S310, voice data (originating voice) transmitted from the calling terminal 31 is sent to the core network 10 via the originating network 21 and is transmitted to the originating MCE 15 via the originating CSCF 11 and the originating AS 13. In step S311, the originating MCE 15 transmits the voice data to the speech recognition engine 43. In step S312, the speech recognition engine 43 converts the originating voice into text by executing speech recognition on the voice data and transmits the voice text to the originating MCE 15. This voice text corresponds to the originating text.

In step S313, the originating MCE 15 transmits to the originating Web server 41 a recognition result that includes the voice text and an utterance type indicating who the speaker is. Because the voice text represents the originating voice, the utterance type in the recognition result transmitted in this step indicates the originating side. In step S314, the originating MCE 15 also transmits the recognition result to the terminating Web server 42. The originating MCE 15 obtains the session information corresponding to the current call from the session database 19 via the originating AS 13. In response to the recognition direction of the session information being "bidirectional", the originating MCE 15 obtains the originating endpoint and the terminating endpoint from the session information and can thereby identify the originating Web server 41 and the terminating Web server 42. In this way, the originating MCE 15 transmits the originating text to both the originating Web server 41 and the terminating Web server 42 in response to the recognition direction being "bidirectional".

In step S315, the originating Web server 41 transmits the recognition result to the calling terminal 31. Because the utterance type included in the recognition result indicates the originating side, the originating Web server 41 generates data containing the voice text such that the voice text is displayed on the calling terminal 31 as that of the caller himself or herself.

In step S316, the calling terminal 31 displays the voice text on its screen as that of the caller himself or herself based on the data. This allows the caller to see his or her own utterance.

In step S317, the terminating Web server 42 transmits the recognition result to the called terminal 32. Because the utterance type included in the recognition result indicates the originating side, the terminating Web server 42 generates data containing the voice text such that the voice text is displayed on the called terminal 32 as that of the other party.

In step S318, the called terminal 32 displays the voice text on its screen as that of the caller (the other party) based on the data. This allows the called party to see what the other party said.

In this way, both Web servers set the display mode of the voice text based on the utterance type. The method of displaying the voice text as that of the speaker or of the other party is not limited in any way, and any method may be adopted. A Web server may change the display position of the voice text (for example, the display position of the speech balloon containing the voice text) according to the utterance type. For example, a Web server may control the display mode so that the speaker's own voice text is displayed on the right side (an example of one side) and the other party's voice text is displayed on the left side (an example of the other side). Alternatively, a Web server may change the font of the voice text, or the shape or background color of the speech balloon, according to the utterance type.

The display mode of the voice text may instead be set based on the utterance type by the calling terminal 31 and the called terminal 32. Specifically, each of the originating Web server 41 and the terminating Web server 42 may transmit the utterance type together with the voice text to the corresponding communication terminal, thereby causing that communication terminal to set the display mode of the voice text based on the utterance type. With this mechanism as well, each of the calling terminal 31 and the called terminal 32 can set the display mode, such as the display position, the font, and the shape or background color of the speech balloon.
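Whether it runs on a Web server or on a terminal, the display rule compares the utterance type with the side of the terminal that will show the text. The sketch below illustrates the right/left example given above; the `BubbleStyle` class, the function name, and the label strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

SIDES = ("originating", "terminating")


@dataclass(frozen=True)
class BubbleStyle:
    side: str   # where the speech balloon is drawn: "right" or "left"
    label: str  # whose utterance it is, from the viewer's point of view


def bubble_style(utterance_type: str, viewer_side: str) -> BubbleStyle:
    """Choose the display mode for one voice text on one terminal.

    utterance_type: which side spoke ("originating" or "terminating").
    viewer_side:    which side's terminal will display the text.
    """
    if utterance_type not in SIDES or viewer_side not in SIDES:
        raise ValueError("side must be 'originating' or 'terminating'")
    if utterance_type == viewer_side:
        return BubbleStyle(side="right", label="self")       # the viewer's own speech
    return BubbleStyle(side="left", label="other party")     # the other party's speech


# Steps S306 to S309: the called party spoke.
on_calling_terminal = bubble_style("terminating", viewer_side="originating")
on_called_terminal = bubble_style("terminating", viewer_side="terminating")
```

The same function could carry other attributes, such as a font or a background color, since the description leaves the display method open.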

In this embodiment the core network 10 is an IMS network, but the call control system according to the present disclosure may be applied to any type of core network. Accordingly, the call control system according to the present disclosure may use a communication protocol other than SIP.

At least some of the functional elements implemented in the originating AS 13 may be implemented in a communication control device other than the originating AS 13. Similarly, at least some of the functional elements implemented in the terminating AS 14 may be implemented in a communication control device other than the terminating AS 14.

The block diagrams used in the description of the above embodiment show blocks in units of functions. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one device that is physically or logically coupled, or may be realized using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.

Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In every case, as described above, the method of realization is not particularly limited.

For example, the communication control device according to an embodiment of the present disclosure may function as a computer that performs the processing of the present disclosure. FIG. 7 is a diagram showing an example of the hardware configuration of a computer 100 that functions as the communication control device. Physically, the computer 100 may include a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

In the following description, the term "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the communication control device may include one or more of each of the devices shown in the figure, or may omit some of them.

Each function of the communication control device is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.

The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control unit, an arithmetic unit, registers, and the like.

また、プロセッサ１００１は、プログラム（プログラムコード）、ソフトウェアモジュール、データなどを、ストレージ１００３及び通信装置１００４の少なくとも一方からメモリ１００２に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態において説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、通信制御装置の各機能要素は、メモリ１００２に格納され、プロセッサ１００１において動作する制御プログラムによって実現されてもよい。上述の各種処理は、１つのプロセッサ１００１によって実行される旨を説明してきたが、２以上のプロセッサ１００１により同時又は逐次に実行されてもよい。プロセッサ１００１は、１以上のチップによって実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されてもよい。 The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes according to them. The program used is one that causes a computer to execute at least part of the operations described in the above embodiments. For example, each functional element of the communication control device may be implemented by a control program that is stored in the memory 1002 and runs on the processor 1001. Although the various processes described above have been explained as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. Note that the program may be transmitted from a network via a telecommunication line.

メモリ１００２は、コンピュータ読み取り可能な記録媒体であり、例えば、ＲＯＭ（Read Only Memory）、ＥＰＲＯＭ（Erasable Programmable ＲＯＭ）、ＥＥＰＲＯＭ（Electrically Erasable Programmable ＲＯＭ）、ＲＡＭ（Random Access Memory）などの少なくとも１つによって構成されてもよい。メモリ１００２は、レジスタ、キャッシュ、メインメモリ（主記憶装置）などと呼ばれてもよい。メモリ１００２は、本開示の一実施の形態に係る方法を実施するために実行可能なプログラム（プログラムコード）、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and RAM (Random Access Memory). The memory 1002 may also be called a register, a cache, main memory (main storage device), or the like. The memory 1002 can store programs (program code), software modules, and the like that are executable to carry out a method according to an embodiment of the present disclosure.

ストレージ１００３は、コンピュータ読み取り可能な記録媒体であり、例えば、ＣＤ－ＲＯＭ（Compact Disc ＲＯＭ）などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Ｂｌｕ－ｒａｙ（登録商標）ディスク)、スマートカード、フラッシュメモリ(例えば、カード、スティック、キードライブ)、フロッピー（登録商標）ディスク、磁気ストリップなどの少なくとも１つによって構成されてもよい。ストレージ１００３は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ１００２及びストレージ１００３の少なくとも一方を含むデータベース、サーバその他の適切な媒体であってもよい。 The storage 1003 is a computer-readable recording medium, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disc, a magneto-optical disc (for example, a compact disc, a digital versatile disc, a Blu-ray disk), smart card, flash memory (eg, card, stick, key drive), floppy disk, magnetic strip, and/or the like. Storage 1003 may also be called an auxiliary storage device. The storage medium described above may be, for example, a database, server, or other suitable medium including at least one of memory 1002 and storage 1003 .

通信装置１００４は、有線ネットワーク及び無線ネットワークの少なくとも一方を介してコンピュータ間の通信を行うためのハードウェア（送受信デバイス）であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。通信装置１００４は、例えば周波数分割複信（ＦＤＤ：Frequency Division Duplex）及び時分割複信（ＴＤＤ：Time Division Duplex）の少なくとも一方を実現するために、高周波スイッチ、デュプレクサ、フィルタ、周波数シンセサイザなどを含んで構成されてもよい。 The communication device 1004 is hardware (a transmitting/receiving device) for communication between computers via at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to implement at least one of, for example, frequency division duplex (FDD) and time division duplex (TDD).

入力装置１００５は、外部からの入力を受け付ける入力デバイス（例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど）である。出力装置１００６は、外部への出力を実施する出力デバイス（例えば、ディスプレイ、スピーカー、LEDランプなど）である。なお、入力装置１００５及び出力装置１００６は、一体となった構成（例えば、タッチパネル）であってもよい。 The input device 1005 is an input device (for example, keyboard, mouse, microphone, switch, button, sensor, etc.) that receives input from the outside. The output device 1006 is an output device (eg, display, speaker, LED lamp, etc.) that outputs to the outside. Note that the input device 1005 and the output device 1006 may be integrated (for example, a touch panel).

また、プロセッサ１００１、メモリ１００２などの各装置は、情報を通信するためのバス１００７によって接続される。バス１００７は、単一のバスを用いて構成されてもよいし、装置間ごとに異なるバスを用いて構成されてもよい。 Devices such as the processor 1001 and the memory 1002 are connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using different buses between devices.

また、コンピュータ１００は、マイクロプロセッサ、デジタル信号プロセッサ（ＤＳＰ：Digital Signal Processor）、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）、ＦＰＧＡ（Field Programmable Gate Array）などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部又は全てが実現されてもよい。例えば、プロセッサ１００１は、これらのハードウェアの少なくとも１つを用いて実装されてもよい。 In addition, the computer 100 includes hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array). A part or all of each functional block may be implemented by the hardware. For example, processor 1001 may be implemented using at least one of these pieces of hardware.

以上説明したように、本開示の一側面に係る呼制御システムは、発信端末と着信端末との間で伝送される通話をテキストに変換する音声テキスト化サービスを実行可能である。呼制御システムは、発信端末を利用する発信者と着信端末を利用する着信者との双方が音声テキスト化サービスの利用者である場合に、発信端末に対応する発側メディア処理装置と着信端末に対応する着側メディア処理装置とのうちの一方を共通のメディア処理装置として機能させる制御部を備える。共通のメディア処理装置は、発信者または着信者の音声をテキストに変換する音声認識エンジンと接続する。共通のメディア処理装置は、発信端末から送信された発信者の発側音声を音声認識エンジンに入力することで発側テキストを取得し、発側テキストを発信端末および着信端末の双方に向けて送信する。共通のメディア処理装置は、着信端末から送信された着信者の着側音声を音声認識エンジンに入力することで着側テキストを取得し、着側テキストを発信端末および着信端末の双方に向けて送信する。 As described above, a call control system according to one aspect of the present disclosure can execute a speech-to-text service that converts a call transmitted between a calling terminal and a called terminal into text. The call control system includes a control unit that, when both the caller using the calling terminal and the called party using the called terminal are users of the speech-to-text service, causes one of the calling-side media processing device corresponding to the calling terminal and the called-side media processing device corresponding to the called terminal to function as a common media processing device. The common media processing device connects to a speech recognition engine that converts the speech of the caller or the called party into text. The common media processing device obtains calling-side text by inputting the caller's calling-side speech, transmitted from the calling terminal, into the speech recognition engine, and transmits the calling-side text to both the calling terminal and the called terminal. The common media processing device obtains called-side text by inputting the called party's called-side speech, transmitted from the called terminal, into the speech recognition engine, and transmits the called-side text to both the calling terminal and the called terminal.
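
The fan-out described above can be illustrated with a minimal Python sketch. It is not the patented implementation: `CommonMediaProcessor`, `handle_speech`, `toy_engine`, and the `"calling"`/`"called"` labels are hypothetical names introduced only for illustration, and the recognizer is a toy that treats the audio bytes as UTF-8 text. The point shown is that each utterance is recognized once and the identical text is delivered to both terminals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

TERMINALS = ("calling", "called")  # hypothetical labels for the two terminals


@dataclass
class CommonMediaProcessor:
    """Hypothetical model of the media processing device chosen as the common one."""

    recognize: Callable[[bytes], str]  # stand-in for the speech recognition engine
    delivered: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: {name: [] for name in TERMINALS}
    )

    def handle_speech(self, speaker: str, audio: bytes) -> str:
        """Recognize one utterance once and deliver the same text to both terminals."""
        text = self.recognize(audio)
        for terminal in TERMINALS:
            self.delivered[terminal].append((speaker, text))
        return text


def toy_engine(audio: bytes) -> str:
    """Toy recognizer (assumption): treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


processor = CommonMediaProcessor(recognize=toy_engine)
processor.handle_speech("calling", b"hello")
processor.handle_speech("called", b"hi there")
```

Because a single engine connection produces the text, the transcript lists held for the calling and called terminals cannot diverge in this sketch.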

また、発側メディア処理装置と着側メディア処理装置の双方を用いるのではなく、そのうちの一方が用いられるので、音声テキスト化サービスを実行するために用いられるハードウェア資源および利用ライセンス数の少なくとも一方を節約することができる。また、音声テキスト化サービスに関連するメッセージ（例えばガイダンス）を、共通のメディア処理装置から発信端末および着信端末の双方に送信することも可能になる。 In addition, because only one of the calling-side and called-side media processing devices is used rather than both, at least one of the hardware resources and the number of usage licenses needed to run the speech-to-text service can be saved. It also becomes possible to send messages related to the speech-to-text service (for example, guidance) from the common media processing device to both the calling terminal and the called terminal.

他の側面に係る呼制御システムでは、制御部が発側メディア処理装置を共通のメディア処理装置として機能させてもよい。或る同一種類の処理が実行されるタイミングは着側よりも発側の方が早い。したがって、発側メディア処理装置を共通のメディア処理装置として用いることで、音声テキスト化サービスに関連する処理を早く開始することができ、その分、音声テキスト化サービスをより早くユーザに提供することが可能になる。 In a call control system according to another aspect, the control unit may cause the calling-side media processing device to function as the common media processing device. A given type of processing is executed earlier on the calling side than on the called side. Therefore, using the calling-side media processing device as the common media processing device allows processing related to the speech-to-text service to start earlier, so the service can be provided to the user correspondingly sooner.

他の側面に係る呼制御システムでは、制御部が、発側メディア処理装置を一意に特定する発側メディア装置ＩＤを着側メディア処理装置に向けて送信し、発側メディア装置ＩＤを受信した着側メディア処理装置から、着側メディア処理装置を一意に特定する着側メディア装置ＩＤを受信し、着側メディア装置ＩＤの受信に応答して、発側メディア処理装置を共通のメディア処理装置として機能させてもよい。発側および着側の双方のメディア処理装置の識別子を取得することで共通のメディア処理装置を確実に機能させることができる。 In a call control system according to another aspect, the control unit may transmit a calling-side media device ID that uniquely identifies the calling-side media processing device toward the called-side media processing device; receive, from the called-side media processing device that received the calling-side media device ID, a called-side media device ID that uniquely identifies the called-side media processing device; and, in response to receiving the called-side media device ID, cause the calling-side media processing device to function as the common media processing device. Obtaining the identifiers of both the calling-side and called-side media processing devices allows the common media processing device to function reliably.
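
The ID exchange in this aspect can be sketched as follows. This is only an illustration under stated assumptions: the message shape (plain dictionaries), the key names, the class `ControlUnit`, and the function `called_side_reply` are all hypothetical, since the text does not specify the signaling format. The sketch shows the control unit designating the calling-side device as the common one only after the called-side ID has come back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlUnit:
    """Hypothetical control unit on the calling side."""

    calling_media_id: str
    called_media_id: Optional[str] = None
    common_media_id: Optional[str] = None

    def build_offer(self) -> dict:
        """Message carrying the calling-side media device ID toward the called side."""
        return {"calling_media_id": self.calling_media_id}

    def on_answer(self, answer: dict) -> None:
        """Designate the calling-side device as common once the called-side ID arrives."""
        called_id = answer.get("called_media_id")
        if called_id is None:
            return  # no ID received, so no common device is designated
        self.called_media_id = called_id
        self.common_media_id = self.calling_media_id


def called_side_reply(offer: dict, own_media_id: str) -> dict:
    """Called side answers with its own ID only if it received the calling-side ID."""
    if "calling_media_id" not in offer:
        return {}
    return {"called_media_id": own_media_id}


unit = ControlUnit(calling_media_id="mp-calling-01")
unit.on_answer(called_side_reply(unit.build_offer(), "mp-called-07"))

silent_unit = ControlUnit(calling_media_id="mp-calling-02")
silent_unit.on_answer({})  # called side never replied with an ID
```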

他の側面に係る呼制御システムでは、発側メディア処理装置が、発側テキストまたは着側テキストを発信端末に送信する発側Ｗｅｂサーバと接続し、着側メディア処理装置が、発側テキストまたは着側テキストを着信端末に送信する着側Ｗｅｂサーバと接続してもよい。呼制御システムは、発側Ｗｅｂサーバを一意に特定する発側エンドポイントと、着側Ｗｅｂサーバを一意に特定する着側エンドポイントとを含むセッション情報を記憶するデータベースをさらに備えてもよい。共通のメディア処理装置は、セッション情報の発側エンドポイントおよび着側エンドポイントを取得し、発側エンドポイントに基づいて、発側テキストまたは着側テキストを発側Ｗｅｂサーバに送信することで、発側テキストまたは着側テキストを発信端末に向けて送信し、着側エンドポイントに基づいて、発側テキストまたは着側テキストを着側Ｗｅｂサーバに送信することで、発側テキストまたは着側テキストを着信端末に向けて送信してもよい。そのエンドポイントを参照することで、テキストを送信すべきＷｅｂサーバを特定することができる。 In a call control system according to another aspect, the calling-side media processing device may connect to a calling-side Web server that transmits the calling-side text or the called-side text to the calling terminal, and the called-side media processing device may connect to a called-side Web server that transmits the calling-side text or the called-side text to the called terminal. The call control system may further include a database that stores session information including a calling-side endpoint that uniquely identifies the calling-side Web server and a called-side endpoint that uniquely identifies the called-side Web server. The common media processing device may obtain the calling-side endpoint and the called-side endpoint from the session information; transmit the calling-side text or the called-side text to the calling-side Web server based on the calling-side endpoint, thereby sending that text toward the calling terminal; and transmit the calling-side text or the called-side text to the called-side Web server based on the called-side endpoint, thereby sending that text toward the called terminal. Referring to these endpoints makes it possible to identify the Web server to which the text should be sent.
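
As a rough illustration of the endpoint lookup, the sketch below models the database as an in-memory dictionary. The session ID `"call-001"`, the example URLs, and the names `SessionInfo`, `SESSION_DB`, and `route_text` are assumptions made for this sketch; the text does not say how endpoints are encoded or how the database is keyed. The function only computes where each text would be sent and performs no network I/O.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SessionInfo:
    """Session information holding one endpoint per Web server."""

    calling_endpoint: str  # uniquely identifies the calling-side Web server
    called_endpoint: str   # uniquely identifies the called-side Web server


# Hypothetical in-memory stand-in for the session database.
SESSION_DB: Dict[str, SessionInfo] = {
    "call-001": SessionInfo(
        calling_endpoint="https://web-calling.example/push",
        called_endpoint="https://web-called.example/push",
    )
}


def route_text(session_id: str, text: str) -> List[Tuple[str, str]]:
    """Return (endpoint, text) pairs: one for each Web server of the session."""
    session = SESSION_DB[session_id]
    return [
        (session.calling_endpoint, text),
        (session.called_endpoint, text),
    ]


deliveries = route_text("call-001", "hello")
```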

他の側面に係る呼制御システムでは、制御部が、ユーザが音声テキスト化サービスの利用に同意することを示す同意信号が発信端末および着信端末の双方から送信されたことに応答して、音声テキストをどの通信端末に送信するかを示す認識方向を双方向に設定し、共通のメディア処理装置が、認識方向が双方向であることに応答して、発側テキストまたは着側テキストを発側Ｗｅｂサーバおよび着側Ｗｅｂサーバの双方に向けて送信してもよい。ユーザの同意に応じて認識方向を設定することで、発信者および着信者の双方が音声テキスト化サービスを希望する場合にのみその双方にテキストを送信することが可能になる。 In a call control system according to another aspect, the control unit may, in response to consent signals indicating that the user agrees to use the speech-to-text service being transmitted from both the calling terminal and the called terminal, set the recognition direction, which indicates to which communication terminal the speech text is to be sent, to bidirectional; and the common media processing device may, in response to the recognition direction being bidirectional, transmit the calling-side text or the called-side text toward both the calling-side Web server and the called-side Web server. Setting the recognition direction according to user consent makes it possible to send text to both the caller and the called party only when both of them want the speech-to-text service.
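
The consent rule can be sketched as a small decision function. Only the bidirectional case is taken from the text; the one-way and `NONE` values, along with the names `RecognitionDirection`, `decide_direction`, and `target_web_servers`, are assumptions added so the example is complete.

```python
from enum import Enum
from typing import List


class RecognitionDirection(Enum):
    NONE = "none"                    # assumption: nobody consented
    TO_CALLING = "to_calling"        # assumption: only the caller consented
    TO_CALLED = "to_called"          # assumption: only the called party consented
    BIDIRECTIONAL = "bidirectional"  # from the text: both terminals sent consent


def decide_direction(calling_consented: bool, called_consented: bool) -> RecognitionDirection:
    """Set the direction to bidirectional only when both consent signals arrived."""
    if calling_consented and called_consented:
        return RecognitionDirection.BIDIRECTIONAL
    if calling_consented:
        return RecognitionDirection.TO_CALLING
    if called_consented:
        return RecognitionDirection.TO_CALLED
    return RecognitionDirection.NONE


def target_web_servers(direction: RecognitionDirection) -> List[str]:
    """Web servers the common media processing device would send text to."""
    targets = {
        RecognitionDirection.BIDIRECTIONAL: ["calling_web_server", "called_web_server"],
        RecognitionDirection.TO_CALLING: ["calling_web_server"],
        RecognitionDirection.TO_CALLED: ["called_web_server"],
        RecognitionDirection.NONE: [],
    }
    return targets[direction]
```

For example, `target_web_servers(decide_direction(True, True))` names both Web servers, while a missing consent signal from either side never yields the bidirectional setting.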

他の側面に係る呼制御システムでは、共通のメディア処理装置が、発側テキストおよび着側テキストのそれぞれについて、発話者が発信者および着信者のどちらであるかを示す発話種別をさらに発側Ｗｅｂサーバおよび着側Ｗｅｂサーバの双方に送信してもよい。この発話種別がＷｅｂサーバに提供されることで、Ｗｅｂサーバは発話者の種類に応じてテキストを処理することができる。 In a call control system according to another aspect, the common media processing device may further transmit, for each of the calling-side text and the called-side text, an utterance type indicating whether the speaker is the caller or the called party to both the calling-side Web server and the called-side Web server. Providing this utterance type to the Web servers allows each Web server to process the text according to the kind of speaker.

他の側面に係る呼制御システムでは、発側Ｗｅｂサーバは、発話種別が発信者を示す場合には、発信端末上で発側テキストが発話者自身の音声テキストとして表示されるように発側テキストの表示態様を設定し、発話種別が着信者を示す場合には、発信端末上で着側テキストが通話相手の音声テキストとして表示されるように着側テキストの表示態様を設定してもよい。着側Ｗｅｂサーバは、発話種別が発信者を示す場合には、着信端末上で発側テキストが通話相手の音声テキストとして表示されるように発側テキストの表示態様を設定し、発話種別が着信者を示す場合には、着信端末上で着側テキストが発話者自身の音声テキストとして表示されるように着側テキストの表示態様を設定してもよい。 In a call control system according to another aspect, the calling-side Web server may, when the utterance type indicates the caller, set the display mode of the calling-side text so that it is displayed on the calling terminal as the speaker's own speech text, and, when the utterance type indicates the called party, set the display mode of the called-side text so that it is displayed on the calling terminal as the other party's speech text. The called-side Web server may, when the utterance type indicates the caller, set the display mode of the calling-side text so that it is displayed on the called terminal as the other party's speech text, and, when the utterance type indicates the called party, set the display mode of the called-side text so that it is displayed on the called terminal as the speaker's own speech text.

発側および着側のそれぞれで、発話種別に応じて上記のようにテキストの表示態様を設定することで、通信端末の利用者と発話者との関係に応じてテキストを表示することができる。通信端末は自機のユーザの音声テキストと通話相手の音声テキストとを互いに異なる表示態様で表示し、このことは、音声テキスト化サービスのユーザインタフェースの改善に寄与し得る。 By setting the text display mode as described above according to the speech type on each of the caller and callee, the text can be displayed according to the relationship between the user of the communication terminal and the speaker. The communication terminal displays the speech text of the user of the own device and the speech text of the other party in different display modes, which can contribute to improving the user interface of the speech-to-text service.
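
The display-mode rule reduces to comparing the Web server's side with the utterance type. The sketch below is a hedged illustration: the string values `"calling"`, `"called"`, `"caller"`, `"called_party"`, `"own"`, and `"partner"`, and the function name `display_mode`, are hypothetical and not taken from the disclosure.

```python
# Hypothetical mapping from a Web server's side to the speaker that side serves.
OWNER_OF_SIDE = {"calling": "caller", "called": "called_party"}


def display_mode(server_side: str, utterance_type: str) -> str:
    """Return "own" if the text was spoken by the terminal's user, else "partner"."""
    if server_side not in OWNER_OF_SIDE:
        raise ValueError(f"unknown server side: {server_side!r}")
    if utterance_type not in OWNER_OF_SIDE.values():
        raise ValueError(f"unknown utterance type: {utterance_type!r}")
    return "own" if OWNER_OF_SIDE[server_side] == utterance_type else "partner"
```

With this rule, the same calling-side text is marked `"own"` by the calling-side Web server and `"partner"` by the called-side Web server, which is what lets each terminal render the two speakers differently.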

他の側面に係る呼制御システムでは、発側Ｗｅｂサーバは、発話種別が発信者を示す場合には、発信端末上で発側テキストが発話者自身の音声テキストとして表示されるように発側テキストを発信端末上の第１の側に表示させ、発話種別が着信者を示す場合には、発信端末上で着側テキストが通話相手の音声テキストとして表示されるように着側テキストを発信端末上の第２の側に表示させてもよい。着側Ｗｅｂサーバは、発話種別が発信者を示す場合には、着信端末上で発側テキストが通話相手の音声テキストとして表示されるように発側テキストを着信端末上の第１の側に表示させ、発話種別が着信者を示す場合には、着信端末上で着側テキストが発話者自身の音声テキストとして表示されるように着側テキストを着信端末上の第２の側に表示させてもよい。 In a call control system according to another aspect, the calling-side Web server may, when the utterance type indicates the caller, cause the calling-side text to be displayed on a first side of the calling terminal so that it appears there as the speaker's own speech text, and, when the utterance type indicates the called party, cause the called-side text to be displayed on a second side of the calling terminal so that it appears there as the other party's speech text. The called-side Web server may, when the utterance type indicates the caller, cause the calling-side text to be displayed on the first side of the called terminal so that it appears there as the other party's speech text, and, when the utterance type indicates the called party, cause the called-side text to be displayed on the second side of the called terminal so that it appears there as the speaker's own speech text.

発側および着側のそれぞれで、発話種別に応じて上記のようにテキストの表示位置を設定することで、通信端末の利用者と発話者との関係に応じてテキストを表示することができる。通信端末は自機のユーザの音声テキストと通話相手の音声テキストとを互いに異なる側に表示するので、発信者および着信者のそれぞれに、自分の発話と相手の発話とを分かり易く示すことができる。 By setting the display position of the text according to the utterance type as described above on each of the calling side and the called side, the text can be displayed according to the relationship between the user of the communication terminal and the speaker. Because the communication terminal displays its own user's speech text and the other party's speech text on different sides, both the caller and the called party can easily tell their own utterances from the other party's.
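
A minimal sketch of the display-position rule follows. As described, the side depends only on the utterance type (caller text on the first side, called-party text on the second side, on both terminals), while the own/partner label depends on which terminal is rendering. The names `place_text`, `SIDE_BY_UTTERANCE_TYPE`, and the string values are hypothetical.

```python
# Side assignment as described: caller text on the first side, called-party text
# on the second side, regardless of which terminal renders it.
SIDE_BY_UTTERANCE_TYPE = {"caller": "first", "called_party": "second"}

# Hypothetical mapping from terminal role to the speaker who uses that terminal.
USER_OF_TERMINAL = {"calling": "caller", "called": "called_party"}


def place_text(terminal_role: str, utterance_type: str, text: str) -> dict:
    """Return the side, the own/partner label, and the text for one terminal."""
    side = SIDE_BY_UTTERANCE_TYPE[utterance_type]
    label = "own" if USER_OF_TERMINAL[terminal_role] == utterance_type else "partner"
    return {"side": side, "label": label, "text": text}


on_calling = place_text("calling", "caller", "hello")
on_called = place_text("called", "caller", "hello")
```

Here the caller's utterance lands on the first side of both screens, labeled `"own"` on the calling terminal and `"partner"` on the called terminal.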

他の側面に係る呼制御システムでは、発側Ｗｅｂサーバが、発話種別を発側テキストまたは着側テキストと共に発信端末に送信することで、発信端末に発話種別に基づいて発側テキストまたは着側テキストの表示態様を設定させ、着側Ｗｅｂサーバが、発話種別を発側テキストまたは着側テキストと共に着信端末に送信することで、着信端末に発話種別に基づいて発側テキストまたは着側テキストの表示態様を設定させてもよい。 In a call control system according to another aspect, the calling-side Web server may transmit the utterance type together with the calling-side text or the called-side text to the calling terminal, thereby causing the calling terminal to set the display mode of that text based on the utterance type, and the called-side Web server may transmit the utterance type together with the calling-side text or the called-side text to the called terminal, thereby causing the called terminal to set the display mode of that text based on the utterance type.

以上、本開示について詳細に説明したが、当業者にとっては、本開示が本開示中に説明した実施形態に限定されるものではないということは明らかである。本開示は、請求の範囲の記載により定まる本開示の趣旨及び範囲を逸脱することなく修正及び変更態様として実施することができる。したがって、本開示の記載は、例示説明を目的とするものであり、本開示に対して何ら制限的な意味を有するものではない。 Although the present disclosure has been described in detail above, it should be apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in this disclosure. The present disclosure can be practiced with modifications and variations without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description of the present disclosure is for illustrative purposes and is not meant to be limiting in any way.

情報の通知は、本開示において説明した態様／実施形態に限られず、他の方法を用いて行われてもよい。例えば、情報の通知は、物理レイヤシグナリング（例えば、ＤＣＩ（Downlink Control Information）、ＵＣＩ（Uplink Control Information））、上位レイヤシグナリング（例えば、ＲＲＣ（Radio Resource Control）シグナリング、ＭＡＣ（Medium Access Control）シグナリング、報知情報（ＭＩＢ（Master Information Block）、ＳＩＢ（System Information Block）））、その他の信号又はこれらの組み合わせによって実施されてもよい。また、ＲＲＣシグナリングは、ＲＲＣメッセージと呼ばれてもよく、例えば、ＲＲＣ接続セットアップ（RRC Connection Setup）メッセージ、ＲＲＣ接続再構成（RRC Connection Reconfiguration）メッセージなどであってもよい。 Notification of information is not limited to the aspects/embodiments described in this disclosure, and may be performed using other methods. For example, notification of information includes physical layer signaling (e.g., DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (e.g., RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, It may be implemented by broadcast information (MIB (Master Information Block), SIB (System Information Block)), other signals, or a combination thereof. RRC signaling may also be called an RRC message, and may be, for example, an RRC connection setup message, an RRC connection reconfiguration message, or the like.

本開示において説明した各態様／実施形態は、ＬＴＥ（Long Term Evolution）、ＬＴＥ－Ａ（LTE-Advanced）、ＳＵＰＥＲ３Ｇ、ＩＭＴ－Ａｄｖａｎｃｅｄ、４Ｇ（4th generation mobile communication system）、５Ｇ（5th generation mobile communication system）、ＦＲＡ（Future Radio Access）、ＮＲ（new Radio）、Ｗ－ＣＤＭＡ（登録商標）、ＧＳＭ（登録商標）、ＣＤＭＡ２０００、ＵＭＢ（Ultra Mobile Broadband）、ＩＥＥＥ８０２．１１（Ｗｉ－Ｆｉ（登録商標））、ＩＥＥＥ８０２．１６（ＷｉＭＡＸ（登録商標））、ＩＥＥＥ８０２．２０、ＵＷＢ（Ultra-WideBand）、Ｂｌｕｅｔｏｏｔｈ（登録商標）、その他の適切なシステムを利用するシステム及びこれらに基づいて拡張された次世代システムの少なくとも一つに適用されてもよい。また、複数のシステムが組み合わされて（例えば、ＬＴＥ及びＬＴＥ－Ａの少なくとも一方と５Ｇとの組み合わせ等）適用されてもよい。 Each aspect/embodiment described in the present disclosure includes LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (new Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark) )), IEEE 802.16 (WiMAX®), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth®, and other suitable systems and extended It may be applied to at least one of the next generation systems. Also, a plurality of systems may be applied in combination (for example, a combination of at least one of LTE and LTE-A and 5G, etc.).

本開示において説明した各態様／実施形態の処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本開示において説明した方法については、例示的な順序を用いて様々なステップの要素を提示しており、提示した特定の順序に限定されない。 The processing procedures, sequences, flowcharts, etc. of each aspect/embodiment described in this disclosure may be rearranged as long as there is no contradiction. For example, the methods described in this disclosure present elements of the various steps using a sample order, and are not limited to the specific order presented.

本開示において基地局によって行われるとした特定動作は、場合によってはその上位ノード（upper node）によって行われることもある。基地局を有する１つ又は複数のネットワークノード（network nodes）からなるネットワークにおいて、端末との通信のために行われる様々な動作は、基地局及び基地局以外の他のネットワークノード（例えば、ＭＭＥ又はＳ－ＧＷなどが考えられるが、これらに限られない）の少なくとも１つによって行われ得ることは明らかである。上記において基地局以外の他のネットワークノードが１つである場合を例示したが、複数の他のネットワークノードの組み合わせ（例えば、ＭＭＥ及びＳ－ＧＷ）であってもよい。 A specific operation described in this disclosure as being performed by a base station may in some cases be performed by its upper node. It is clear that, in a network consisting of one or more network nodes including a base station, the various operations performed for communication with a terminal may be performed by at least one of the base station and network nodes other than the base station (for example, an MME or S-GW, but not limited to these). Although the case of a single network node other than the base station is illustrated above, a combination of multiple other network nodes (for example, an MME and an S-GW) may be used.

情報等は、上位レイヤ（又は下位レイヤ）から下位レイヤ（又は上位レイヤ）へ出力され得る。複数のネットワークノードを介して入出力されてもよい。 Information, etc., may be output from a higher layer (or lower layer) to a lower layer (or higher layer). It may be input and output via multiple network nodes.

入出力された情報等は特定の場所（例えば、メモリ）に保存されてもよいし、管理テーブルを用いて管理してもよい。入出力される情報等は、上書き、更新、又は追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 Input/output information and the like may be stored in a specific location (for example, memory), or may be managed using a management table. Input/output information and the like can be overwritten, updated, or appended. The output information and the like may be deleted. The entered information and the like may be transmitted to another device.

判定は、１ビットで表される値（０か１か）によって行われてもよいし、真偽値（Boolean：true又はfalse）によって行われてもよいし、数値の比較（例えば、所定の値との比較）によって行われてもよい。 The determination may be made by a value represented by one bit (0 or 1), by a true/false value (Boolean: true or false), or by numerical comparison (for example, a predetermined value).
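
The three styles of determination listed above can be written side by side as a small sketch. The function names are hypothetical, and the use of `>=` for the numeric comparison is an assumption, since the text only says that a value is compared with a predetermined value.

```python
def decide_by_bit(flag: int) -> bool:
    """Determination from a value represented by one bit (0 or 1)."""
    return (flag & 1) == 1


def decide_by_boolean(flag: bool) -> bool:
    """Determination from a Boolean value (true or false)."""
    return flag


def decide_by_comparison(value: float, predetermined: float) -> bool:
    """Determination by comparison with a predetermined value (>= is an assumption)."""
    return value >= predetermined
```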

本開示において説明した各態様／実施形態は単独で用いてもよいし、組み合わせて用いてもよいし、実行に伴って切り替えて用いてもよい。また、所定の情報の通知（例えば、「Ｘであること」の通知）は、明示的に行うものに限られず、暗黙的（例えば、当該所定の情報の通知を行わない）ことによって行われてもよい。 Each aspect/embodiment described in the present disclosure may be used alone, in combination, or switched during execution. In addition, notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).

ソフトウェアは、ソフトウェア、ファームウェア、ミドルウェア、マイクロコード、ハードウェア記述言語と呼ばれるか、他の名称で呼ばれるかを問わず、命令、命令セット、コード、コードセグメント、プログラムコード、プログラム、サブプログラム、ソフトウェアモジュール、アプリケーション、ソフトウェアアプリケーション、ソフトウェアパッケージ、ルーチン、サブルーチン、オブジェクト、実行可能ファイル、実行スレッド、手順、機能などを意味するよう広く解釈されるべきである。 Software, whether referred to as software, firmware, middleware, microcode, hardware description language or otherwise, includes instructions, instruction sets, code, code segments, program code, programs, subprograms, and software modules. , applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like.

また、ソフトウェア、命令、情報などは、伝送媒体を介して送受信されてもよい。例えば、ソフトウェアが、有線技術（同軸ケーブル、光ファイバケーブル、ツイストペア、デジタル加入者回線（ＤＳＬ：Digital Subscriber Line）など）及び無線技術（赤外線、マイクロ波など）の少なくとも一方を使用してウェブサイト、サーバ、又は他のリモートソースから送信される場合、これらの有線技術及び無線技術の少なくとも一方は、伝送媒体の定義内に含まれる。 Software, instructions, information, etc. may also be sent and received over a transmission medium. For example, the software uses wired technology (coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), etc.) and/or wireless technology (infrared, microwave, etc.) to create websites, Wired and/or wireless technologies are included within the definition of transmission medium when sent from a server or other remote source.

本開示において説明した情報、信号などは、様々な異なる技術のいずれかを使用して表されてもよい。例えば、上記の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。 Information, signals, and the like described in this disclosure may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.

なお、本開示において説明した用語及び本開示の理解に必要な用語については、同一の又は類似する意味を有する用語と置き換えてもよい。例えば、チャネル及びシンボルの少なくとも一方は信号（シグナリング）であってもよい。また、信号はメッセージであってもよい。また、コンポーネントキャリア（ＣＣ：Component Carrier）は、キャリア周波数、セル、周波数キャリアなどと呼ばれてもよい。 The terms explained in this disclosure and the terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings. For example, the channel and/or symbols may be signaling. A signal may also be a message. A component carrier (CC) may also be called a carrier frequency, a cell, a frequency carrier, or the like.

本開示において使用する「システム」及び「ネットワーク」という用語は、互換的に使用される。 As used in this disclosure, the terms "system" and "network" are used interchangeably.

また、本開示において説明した情報、パラメータなどは、絶対値を用いて表されてもよいし、所定の値からの相対値を用いて表されてもよいし、対応する別の情報を用いて表されてもよい。例えば、無線リソースはインデックスによって指示されるものであってもよい。 In addition, the information, parameters, etc. described in the present disclosure may be expressed using absolute values, may be expressed using relative values from a predetermined value, or may be expressed using other corresponding information. may be represented. For example, radio resources may be indexed.

上述したパラメータに使用する名称はいかなる点においても限定的な名称ではない。さらに、これらのパラメータを使用する数式等は、本開示で明示的に開示したものと異なる場合もある。様々なチャネル（例えば、ＰＵＣＣＨ、ＰＤＣＣＨなど）及び情報要素は、あらゆる好適な名称によって識別できるので、これらの様々なチャネル及び情報要素に割り当てている様々な名称は、いかなる点においても限定的な名称ではない。 The names used for the parameters described above are not limiting in any respect. Further, formulas and the like that use these parameters may differ from those explicitly disclosed in this disclosure. Because the various channels (for example, PUCCH and PDCCH) and information elements can be identified by any suitable names, the various names assigned to these channels and information elements are not limiting in any respect.

本開示においては、「基地局（ＢＳ：Base Station）」、「無線基地局」、「固定局（fixed station）」、「ＮｏｄｅＢ」、「ｅＮｏｄｅＢ（ｅＮＢ）」、「ｇＮｏｄｅＢ（ｇＮＢ）」、「アクセスポイント（access point）」、「送信ポイント（transmission point）」、「受信ポイント（reception point）」、「送受信ポイント（transmission/reception point）」、「セル」、「セクタ」、「セルグループ」、「キャリア」、「コンポーネントキャリア」などの用語は、互換的に使用され得る。基地局は、マクロセル、スモールセル、フェムトセル、ピコセルなどの用語で呼ばれる場合もある。 In the present disclosure, terms such as "base station (BS)", "radio base station", "fixed station", "NodeB", "eNodeB (eNB)", "gNodeB (gNB)", "access point", "transmission point", "reception point", "transmission/reception point", "cell", "sector", "cell group", "carrier", and "component carrier" may be used interchangeably. A base station may also be referred to by terms such as macrocell, small cell, femtocell, and picocell.

基地局は、１つ又は複数（例えば、３つ）のセルを収容することができる。基地局が複数のセルを収容する場合、基地局のカバレッジエリア全体は複数のより小さいエリアに区分でき、各々のより小さいエリアは、基地局サブシステム（例えば、屋内用の小型基地局（ＲＲＨ：ＲｅｍｏｔｅＲａｄｉｏＨｅａｄ））によって通信サービスを提供することもできる。「セル」又は「セクタ」という用語は、このカバレッジにおいて通信サービスを行う基地局及び基地局サブシステムの少なくとも一方のカバレッジエリアの一部又は全体を指す。 A base station can accommodate one or more (for example, three) cells. When a base station accommodates multiple cells, the overall coverage area of the base station can be divided into multiple smaller areas, and each smaller area can also be provided with communication service by a base station subsystem (for example, a small indoor base station (RRH: Remote Radio Head)). The term "cell" or "sector" refers to part or all of the coverage area of at least one of the base station and the base station subsystem that provides communication service in this coverage.

本開示においては、「移動局（ＭＳ：Mobile Station）」、「ユーザ端末（user terminal）」、「ユーザ装置（ＵＥ：User Equipment）」、「端末」などの用語は、互換的に使用され得る。 In this disclosure, terms such as “Mobile Station (MS),” “user terminal,” “User Equipment (UE),” “terminal,” etc. may be used interchangeably. .

移動局は、当業者によって、加入者局、モバイルユニット、加入者ユニット、ワイヤレスユニット、リモートユニット、モバイルデバイス、ワイヤレスデバイス、ワイヤレス通信デバイス、リモートデバイス、モバイル加入者局、アクセス端末、モバイル端末、ワイヤレス端末、リモート端末、ハンドセット、ユーザエージェント、モバイルクライアント、クライアント、又はいくつかの他の適切な用語で呼ばれる場合もある。 A mobile station is defined by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless It may also be called a terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.

基地局及び移動局の少なくとも一方は、送信装置、受信装置、通信装置などと呼ばれてもよい。なお、基地局及び移動局の少なくとも一方は、移動体に搭載されたデバイス、移動体自体などであってもよい。当該移動体は、乗り物（例えば、車、飛行機など）であってもよいし、無人で動く移動体（例えば、ドローン、自動運転車など）であってもよいし、ロボット（有人型又は無人型）であってもよい。なお、基地局及び移動局の少なくとも一方は、必ずしも通信動作時に移動しない装置も含む。例えば、基地局及び移動局の少なくとも一方は、センサなどのＩｏＴ（Internet of Things）機器であってもよい。 At least one of a base station and a mobile station may be called a transmitter, a receiver, a communication device, and the like. At least one of the base station and the mobile station may be a device mounted on a mobile object, the mobile object itself, or the like. The mobile object may be a vehicle (e.g., car, airplane, etc.), an unmanned mobile object (e.g., drone, self-driving car, etc.), or a robot (manned or unmanned ). Note that at least one of the base station and the mobile station includes devices that do not necessarily move during communication operations. For example, at least one of the base station and the mobile station may be an IoT (Internet of Things) device such as a sensor.

また、本開示における基地局は、ユーザ端末で読み替えてもよい。例えば、基地局及びユーザ端末間の通信を、複数のユーザ端末間の通信（例えば、Ｄ２Ｄ（Device-to-Device）、Ｖ２Ｘ（Vehicle-to-Everything）などと呼ばれてもよい）に置き換えた構成について、本開示の各態様／実施形態を適用してもよい。この場合、基地局が有する機能をユーザ端末が有する構成としてもよい。また、「上り」及び「下り」などの文言は、端末間通信に対応する文言（例えば、「サイド（side）」）で読み替えられてもよい。例えば、上りチャネル、下りチャネルなどは、サイドチャネルで読み替えられてもよい。 Also, the base station in the present disclosure may be read as a user terminal. For example, communication between a base station and a user terminal is replaced with communication between multiple user terminals (for example, D2D (Device-to-Device), V2X (Vehicle-to-Everything), etc.) Regarding the configuration, each aspect/embodiment of the present disclosure may be applied. In this case, the user terminal may have the functions that the base station has. Also, words such as "up" and "down" may be replaced with words corresponding to inter-terminal communication (for example, "side"). For example, uplink channels, downlink channels, etc. may be read as side channels.

同様に、本開示におけるユーザ端末は、基地局で読み替えてもよい。この場合、ユーザ端末が有する機能を基地局が有する構成としてもよい。 Similarly, user terminals in the present disclosure may be read as base stations. In this case, the base station may have the functions that the user terminal has.

本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up、search、inquiry)（例えば、テーブル、データベース又は別のデータ構造での探索）、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)（例えば、情報を受信すること）、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)（例えば、メモリ中のデータにアクセスすること）した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。また、「判断（決定）」は、「想定する（assuming）」、「期待する（expecting）」、「みなす（considering）」などで読み替えられてもよい。 As used in this disclosure, the terms "determining" and "determining" may encompass a wide variety of actions. "Judgement", "determining" are, for example, judging, calculating, computing, processing, deriving, investigating, looking up, searching, inquiring (eg, lookup in a table, database, or other data structure), ascertaining as "judged" or "determined", and the like. Also, "judgment" and "determination" are used for receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, access (accessing) (for example, accessing data in memory) may include deeming that a "judgement" or "decision" has been made. In addition, "judgment" and "decision" are considered to be "judgment" and "decision" by resolving, selecting, choosing, establishing, comparing, etc. can contain. In other words, "judgment" and "decision" can include considering that some action is "judgment" and "decision". Also, "judgment (decision)" may be read as "assuming", "expecting", "considering", or the like.

The terms "connected" and "coupled", or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access". As used in this disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.

As used in this disclosure, the phrase "based on" does not mean "based only on" unless expressly specified otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on."

Any reference to elements using designations such as "first" and "second" in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient way of distinguishing between two or more elements. Thus, a reference to first and second elements does not mean that only two elements can be employed or that the first element must precede the second element in some way.

Where "include," "including," and variations thereof are used in this disclosure, these terms are intended to be inclusive, in the same manner as the term "comprising." Furthermore, the term "or" as used in this disclosure is intended not to be an exclusive OR.

In this disclosure, where articles have been added by translation, such as a, an, and the in English, the disclosure may include the case where the nouns following these articles are plural.

In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other." The phrase may also mean "A and B are each different from C." Terms such as "separated" and "coupled" may be interpreted in the same manner as "different."

DESCRIPTION OF SYMBOLS: 1... call control system, 10... core network, 11... calling-side CSCF, 12... called-side CSCF, 13... calling-side AS, 14... called-side AS, 15... calling-side MCE (calling-side media processing device), 16... called-side MCE (called-side media processing device), 17... calling-side SMS-GW, 18... called-side SMS-GW, 19... session database, 21... calling-side network, 22... called-side network, 31... calling terminal, 32... called terminal, 41... calling-side Web server, 42... called-side Web server, 43... speech recognition engine, 131, 141... service control unit, 132, 142... session control unit, 133, 143... service scenario unit.

Claims

1. A call control system capable of executing a speech-to-text conversion service that converts a call transmitted between a calling terminal and a called terminal into text, the call control system comprising:
a control unit that, when both a caller using the calling terminal and a callee using the called terminal are users of the speech-to-text conversion service, causes one of a calling-side media processing device corresponding to the calling terminal and a called-side media processing device corresponding to the called terminal to function as a common media processing device,
wherein the common media processing device interfaces with a speech recognition engine that converts speech of the caller or the callee into text, and
wherein the common media processing device:
acquires calling-side text by inputting calling-side speech of the caller, transmitted from the calling terminal, into the speech recognition engine;
transmits the calling-side text to both the calling terminal and the called terminal;
acquires called-side text by inputting called-side speech of the callee, transmitted from the called terminal, into the speech recognition engine; and
transmits the called-side text to both the calling terminal and the called terminal.
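
As an illustration only, the behavior recited for the common media processing device can be sketched in Python as follows. The names `Terminal`, `CommonMediaProcessor`, and `fake_recognizer` are hypothetical and do not appear in the disclosure, and the recognizer is a placeholder that merely decodes bytes. The sketch shows the point of the arrangement: every utterance passes through one recognizer and the resulting text is sent to both terminals, so both sides see identical text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Terminal:
    """Hypothetical stand-in for a calling or called terminal."""
    name: str
    received: List[Tuple[str, str]] = field(default_factory=list)

    def show(self, side: str, text: str) -> None:
        # Record which side spoke and the text that was delivered.
        self.received.append((side, text))


class CommonMediaProcessor:
    """Illustrative common media processing device with one shared recognizer."""

    def __init__(self, recognize: Callable[[bytes], str],
                 calling: Terminal, called: Terminal) -> None:
        self.recognize = recognize
        self.calling = calling
        self.called = called

    def on_speech(self, side: str, audio: bytes) -> str:
        # A single recognition result is produced per utterance ...
        text = self.recognize(audio)
        # ... and the same text is sent to both terminals.
        for terminal in (self.calling, self.called):
            terminal.show(side, text)
        return text


def fake_recognizer(audio: bytes) -> str:
    """Placeholder for the speech recognition engine (assumption: UTF-8 bytes)."""
    return audio.decode("utf-8")


calling = Terminal("calling terminal")
called = Terminal("called terminal")
mce = CommonMediaProcessor(fake_recognizer, calling, called)
mce.on_speech("calling", b"hello")
mce.on_speech("called", b"hi there")
```

Because only one device calls the recognizer, the two terminals cannot receive diverging transcriptions of the same utterance.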

2. The call control system according to claim 1, wherein the control unit causes the calling-side media processing device to function as the common media processing device.

3. The call control system according to claim 2, wherein the control unit:
transmits, to the called-side media processing device, a calling-side media device ID that uniquely identifies the calling-side media processing device;
receives a called-side media device ID that uniquely identifies the called-side media processing device from the called-side media processing device that has received the calling-side media device ID; and
causes the calling-side media processing device to function as the common media processing device in response to receiving the called-side media device ID.
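
The ID exchange can be pictured with a minimal Python sketch. The class names, method names, and ID strings below are hypothetical, and the actual signaling between the devices is not modeled; the exchange is reduced to a direct method call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalledSideMCE:
    """Hypothetical called-side media processing device."""
    device_id: str
    peer_id: Optional[str] = None

    def on_calling_id(self, calling_id: str) -> str:
        # Remember the calling-side device ID and answer with our own ID.
        self.peer_id = calling_id
        return self.device_id


@dataclass
class ControlUnit:
    """Hypothetical control unit on the calling side."""
    calling_mce_id: str
    common_mce_id: Optional[str] = None

    def negotiate(self, called_mce: CalledSideMCE) -> Optional[str]:
        # Step 1: send the calling-side media device ID.
        reply = called_mce.on_calling_id(self.calling_mce_id)
        # Steps 2 and 3: on receiving the called-side ID, select the
        # calling-side device as the common media processing device.
        if reply:
            self.common_mce_id = self.calling_mce_id
        return self.common_mce_id


called_mce = CalledSideMCE(device_id="mce-called-01")
control = ControlUnit(calling_mce_id="mce-calling-01")
selected = control.negotiate(called_mce)
```

In this sketch the reply from the called side acts as the trigger for the selection, mirroring the "in response to receiving" wording.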

4. The call control system according to claim 2 or 3, wherein:
the calling-side media processing device connects to a calling-side Web server that transmits the calling-side text or the called-side text to the calling terminal;
the called-side media processing device connects to a called-side Web server that transmits the calling-side text or the called-side text to the called terminal;
the call control system further comprises a database that stores session information including a calling-side endpoint that uniquely identifies the calling-side Web server and a called-side endpoint that uniquely identifies the called-side Web server; and
the common media processing device:
acquires the calling-side endpoint and the called-side endpoint from the session information;
transmits the calling-side text or the called-side text to the calling-side Web server based on the calling-side endpoint, thereby transmitting the calling-side text or the called-side text to the calling terminal; and
transmits the calling-side text or the called-side text to the called-side Web server based on the called-side endpoint, thereby transmitting the calling-side text or the called-side text to the called terminal.
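
A rough Python sketch of the endpoint lookup and delivery follows. The in-memory dictionary standing in for the database, the session key, the example URLs, and the `send` callback are all assumptions made for illustration; the disclosure does not specify the storage format or the transport.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SessionInfo:
    """Session information holding one endpoint per Web server."""
    calling_endpoint: str
    called_endpoint: str


# Hypothetical in-memory stand-in for the session database.
SESSION_DB: Dict[str, SessionInfo] = {
    "session-1": SessionInfo(
        calling_endpoint="https://calling-web.example/push",
        called_endpoint="https://called-web.example/push",
    ),
}


def deliver_text(session_id: str, text: str,
                 send: Callable[[str, str], None]) -> List[str]:
    """Look up both endpoints for the session and send the text to each."""
    session = SESSION_DB[session_id]
    targets = [session.calling_endpoint, session.called_endpoint]
    for endpoint in targets:
        send(endpoint, text)  # each Web server then forwards to its terminal
    return targets


sent: List[tuple] = []
deliver_text("session-1", "hello", lambda endpoint, text: sent.append((endpoint, text)))
```

Keeping both endpoints in one session record lets a single device reach both Web servers without asking the other side's media processing device.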

5. The call control system according to claim 4, wherein:
the control unit, in response to a consent signal indicating consent to use of the speech-to-text conversion service being transmitted from both the calling terminal and the called terminal, sets a recognition direction indicating to which communication terminal the text of the speech is to be transmitted; and
the common media processing device transmits the calling-side text or the called-side text to both the calling-side Web server and the called-side Web server in response to the recognition direction being bidirectional.

6. The call control system according to claim 4 or 5, wherein the common media processing device further transmits, for each of the calling-side text and the called-side text, an utterance type indicating whether the speaker is the caller or the callee to both the calling-side Web server and the called-side Web server.
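
The recognition direction and the utterance type can be illustrated together with a short Python sketch. The enum values, the message fields, and the target labels are hypothetical. In particular, the `UNSET` value is an assumption covering every case other than consent from both terminals, which the claims do not describe.

```python
from enum import Enum
from typing import Dict, List


class RecognitionDirection(Enum):
    BIDIRECTIONAL = "bidirectional"
    UNSET = "unset"  # assumption: placeholder for all non-bidirectional cases


def set_direction(calling_consented: bool,
                  called_consented: bool) -> RecognitionDirection:
    """Set the direction to bidirectional only when both terminals consent."""
    if calling_consented and called_consented:
        return RecognitionDirection.BIDIRECTIONAL
    return RecognitionDirection.UNSET


def build_messages(direction: RecognitionDirection,
                   utterance_type: str, text: str) -> List[Dict[str, str]]:
    """Build one message per Web server, tagged with who spoke."""
    if direction is not RecognitionDirection.BIDIRECTIONAL:
        return []
    payload = {"utterance_type": utterance_type, "text": text}
    return [dict(payload, target=target)
            for target in ("calling_web_server", "called_web_server")]


direction = set_direction(calling_consented=True, called_consented=True)
messages = build_messages(direction, "caller", "hello")
```

Tagging each message with the utterance type lets each terminal label the text by speaker even though both receive the same stream.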