JP7845690B2 - Question creation device, question creation system, and question creation method

Info

Publication number: JP7845690B2
Application number: JP2023142339A
Authority: JP
Inventors: 道生小林
Original assignee: PARONYM INC.
Current assignee: PARONYM INC.
Priority date: 2023-09-01
Filing date: 2023-09-01
Publication date: 2026-04-14
Anticipated expiration: 2043-09-01
Also published as: JP2025035353A

Description

The present invention relates to a question creation device, a question creation system, and a question creation method.

In recent years, videos have commonly been viewed on user terminals (for example, tablet terminals, smartphones, and personal computers). For example, Patent Documents 1 to 3 describe viewing broadcast programs such as dramas on a viewer's terminal.

Japanese Unexamined Patent Application Publication No. 2012-119833
Japanese Unexamined Patent Application Publication No. 2007-306399
Japanese Unexamined Patent Application Publication No. 2004-23425

When a viewer becomes interested in an object shown in a video (hereinafter sometimes called an "item"), the viewer may try to collect information related to that item (hereinafter sometimes called "item-related information"). For example, a viewer watching a drama may want to buy the bag the main character is carrying and may search online for its brand name or product name.

In doing so, the viewer may use, for example, an online chatbot to collect item-related information. In recent years, many chatbots have been built on large language models (LLMs). The viewer can obtain item-related information through text or voice dialogue with the chatbot.

However, even when a viewer tries to obtain item-related information through text or voice dialogue, finding the desired information is difficult if the viewer does not know the search keywords for the item. In other words, a viewer who does not know the search keywords cannot easily think of an appropriate question to ask the chatbot.

One object of the present invention is to make it easy to create a question for obtaining information related to an object shown in a video. Other objects of the present invention will become apparent from the description herein.

One aspect of the present invention is a question creation device comprising: an element output unit that, when an action is performed on a displayed video, extracts a video element from the video on which the action was performed and outputs the video element to an analysis device; a text acquisition unit that acquires a text element created by the analysis device based on the video element; and a question creation unit that creates, from the text element, a question candidate related to the video element.

One aspect of the present invention is a question creation system comprising: an element output unit that, when an action is performed on a displayed video, extracts a video element from the video on which the action was performed and outputs the video element to an analysis device; a text acquisition unit that acquires a text element created by the analysis device based on the video element; a question creation unit that creates, from the text element, a question candidate related to the video element; and a display unit that displays the question candidate.

One aspect of the present invention is a question creation method comprising: when an action is performed on a displayed video, extracting a video element from the video on which the action was performed and outputting the video element to an analysis device; acquiring a text element created by the analysis device based on the video element; and creating, from the text element, a question candidate related to the video element.

According to the above aspects of the present invention, a question for obtaining information related to an object shown in a video can be created easily.

Figure 1 is an explanatory diagram showing an example of an action performed on a video in this embodiment.
Figure 2 is a diagram showing the question creation system 100 having the question creation device 10 of this embodiment.
Figure 3 is a flowchart of the question creation process performed by the question creation device 10 of this embodiment.
Figure 4 is a diagram showing an example of a question template.
Figure 5 is a diagram showing how text elements are entered into a question template when there are multiple text elements.
Figure 6 is a diagram showing an example of a screen in the first example of the question creation process.
Figure 7 is a diagram showing an example of a screen when a video element is acquired and question candidates are created in the first example of the question creation process.
Figure 8 is a diagram showing an example of a screen on which question candidates are displayed in the first example of the question creation process.
Figure 9 is a diagram showing an example of a screen in the second example of the question creation process.
Figure 10 is a diagram showing an example of a screen when a video element is acquired in the second example of the question creation process.
Figure 11 is a diagram showing an example of a screen when a text element is acquired and question candidates are created in the second example of the question creation process.

At least the following matters will become clear from the specification and drawings described below.

A question creation device will become clear that comprises: an element output unit that, when an action is performed on a displayed video, extracts a video element from the video on which the action was performed and outputs the video element to an analysis device; a text acquisition unit that acquires a text element created by the analysis device based on the video element; and a question creation unit that creates, from the text element, a question candidate related to the video element. Such a question creation device makes it easy to create a question for obtaining information related to an object shown in a video.
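The flow through the three units can be sketched as follows. This is only a minimal illustration under stated assumptions: the class names, the `Analyzer` callable, the stand-in `fake_analyzer`, and the fixed "What is ...?" phrasing are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VideoElement:
    """What the element output unit extracts where the action occurred (hypothetical shape)."""
    kind: str      # "image", "audio", or "text"
    payload: str   # stand-in for the raw image, audio, or text data


@dataclass(frozen=True)
class TextElement:
    """Text that the analysis device produced from a video element."""
    text: str


# The analysis device sits outside the question creation device, so it is
# modeled here as any callable from video elements to text elements.
Analyzer = Callable[[List[VideoElement]], List[TextElement]]


def create_question_candidates(elements: List[VideoElement], analyze: Analyzer) -> List[str]:
    # Element output unit and text acquisition unit: send elements, receive text.
    text_elements = analyze(elements)
    # Question creation unit: turn each text element into a question candidate.
    return [f"What is {t.text}?" for t in text_elements]


def fake_analyzer(elements: List[VideoElement]) -> List[TextElement]:
    # Stand-in for a real analysis device; it simply labels the payload.
    return [TextElement(text=f"the {e.payload}") for e in elements]


candidates = create_question_candidates(
    [VideoElement(kind="image", payload="jacket")], fake_analyzer
)
print(candidates)
```

Passing the analysis device in as a parameter mirrors the description, in which the device only outputs elements to, and acquires text from, a separate analysis device.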

The video element preferably includes at least one of image information, audio information, and text information relating to the video. This makes it easy to create a question for obtaining information related to an object shown in the video.

Preferably, the video element includes image information that contains type information relating to the video element, the text acquisition unit acquires the type information in association with the text element, and the question creation unit creates the question candidate using a question template corresponding to the type information. Using the type information contained in the image information in this way allows accurate question candidates to be created based on that type information.

Preferably, the video element is at least one of image information, audio information, and text information that does not contain type information relating to the video element, the text acquisition unit acquires the text element associated with type information, and the question creation unit creates the question candidate using a question template corresponding to the type information. As a result, a question template corresponding to type information can be used even when the video element itself has no type information.

Preferably, the text acquisition unit acquires a first text element associated with first type information and a second text element associated with second type information; the question template has a first input element related to the first type information and a second input element related to the second type information; and the question creation unit creates a question candidate in which the first text element is entered in the first input element and the second text element is entered in the second input element. This keeps the question creation unit from creating question candidates that read unnaturally as questions.
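A small sketch of matching text elements to template slots by type information may help here. The template wording and the type names `item` and `person` are hypothetical examples chosen for illustration; the specification's own templates are shown in its Figures 4 and 5.

```python
from typing import Dict, List, Tuple

# Hypothetical template: each placeholder is named after a kind of type information.
TEMPLATE = "Where can I buy the {item} worn by {person}?"


def fill_template(template: str, typed_texts: List[Tuple[str, str]]) -> str:
    """Fill each input element with the text element whose type information matches it."""
    by_type: Dict[str, str] = {type_info: text for type_info, text in typed_texts}
    return template.format(**by_type)


# The order of arrival does not matter, because slots are matched by type.
typed_texts = [("person", "the lead actor"), ("item", "blue jacket")]
question = fill_template(TEMPLATE, typed_texts)
print(question)
```

Matching by type rather than by position is what avoids an unnatural result such as "the lead actor worn by blue jacket".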

The question creation device preferably further comprises a question output unit that outputs the question candidate to a display device. This allows the viewer to see the question candidate on the display device.

Preferably, the question creation unit creates a plurality of question candidates, and the question output unit outputs the plurality of question candidates to the display device in a form from which the viewer of the video can select. This allows the viewer to select, from among the plurality of question candidates, the one that matches the viewer's intent.

Preferably, the question output unit adds priority information to the plurality of question candidates and outputs them to the display device. This allows the viewer to quickly select, from among the plurality of question candidates, the one that matches the viewer's intent.
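One way to picture priority information is as a rank attached to each candidate before display. The sketch below assumes a hypothetical convention in which a smaller number is shown first; the specification does not state how priority is encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuestionCandidate:
    text: str
    priority: int  # hypothetical convention: smaller number is shown first


def order_for_display(candidates: List[QuestionCandidate]) -> List[str]:
    """Return numbered, selectable lines ordered by priority."""
    ranked = sorted(candidates, key=lambda c: c.priority)
    return [f"{i}. {c.text}" for i, c in enumerate(ranked, start=1)]


lines = order_for_display([
    QuestionCandidate("How much does this jacket cost?", priority=2),
    QuestionCandidate("What brand is this jacket?", priority=1),
])
print("\n".join(lines))
```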

Preferably, the question output unit outputs the question candidate confirmed by the viewer of the video to an answer creation device. This allows an answer to the question confirmed by the viewer to be created automatically.

The answer creation device is preferably a large language model. This allows an answer to the question confirmed by the viewer to be created automatically.
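The hand-off from a confirmed candidate to the answer creation device might look like the following sketch. The `AnswerDevice` callable and the `echo_device` stand-in are assumptions made for illustration; a real system would call a large language model at that point.

```python
from typing import Callable, List

# Hypothetical interface: the answer creation device maps a question to an answer.
AnswerDevice = Callable[[str], str]


def confirm_and_ask(candidates: List[str], selected_index: int, answer_device: AnswerDevice) -> str:
    """Send only the candidate the viewer confirmed to the answer creation device."""
    if not 0 <= selected_index < len(candidates):
        raise IndexError("selected_index does not refer to a displayed candidate")
    return answer_device(candidates[selected_index])


def echo_device(question: str) -> str:
    # Stand-in for a large language model; it only echoes the question.
    return f"(answer to: {question})"


answer = confirm_and_ask(
    ["What brand is this jacket?", "How much does this jacket cost?"], 1, echo_device
)
print(answer)
```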

The action is preferably performed by any one of a click, a touch, voice input, and keyboard input, or by a combination of these operations. This makes it easy to create a question for obtaining information related to an object shown in the video.

A question creation system will become clear that comprises: an element output unit that, when an action is performed on a displayed video, extracts a video element from the video on which the action was performed and outputs the video element to an analysis device; a text acquisition unit that acquires a text element created by the analysis device based on the video element; a question creation unit that creates, from the text element, a question candidate related to the video element; and a display unit that displays the question candidate. Such a question creation system makes it easy to create a question for obtaining information related to an object shown in a video.

A question creation method will become clear that comprises: when an action is performed on a displayed video, extracting a video element from the video on which the action was performed and outputting the video element to an analysis device; acquiring a text element created by the analysis device based on the video element; and creating, from the text element, a question candidate related to the video element. Such a question creation method makes it easy to create a question for obtaining information related to an object shown in a video.

Preferred embodiments of the present invention are described below with reference to the drawings. Identical or equivalent components, members, and the like shown in the drawings are given the same reference numerals, and redundant descriptions are omitted as appropriate.

===This Embodiment===
<Overview of the Video>
Before describing the question creation device, question creation system, and question creation method of this embodiment, an example of a video used in this embodiment and the actions performed on the video are first described with reference to Figure 1.

Figure 1 is an explanatory diagram showing an example of an action performed on a video in this embodiment. Figure 1A shows an example of a screen of the video in this embodiment, Figure 1B shows an example of an operation for stocking item area information in the video (described later), and Figure 1C shows an example of a screen in a state where item area information has been stored by stocking.

As shown in Figure 1A, a video is being played on the screen (here, the touch panel 6) of the user terminal 5 (for example, the viewer's smartphone or tablet terminal). A "video" is a moving image delivered as data from a distributor (here, the video distribution server 1 described later); examples include variety, drama, news, sports, music, and animation programs. As described later, the delivery format of the video is not limited to streaming and may be download, progressive download, or live delivery.

The touch panel 6 shown in Figure 1A displays a person wearing sportswear (specifically, a jacket) on the upper body. In this embodiment, item areas are set in advance in the frames (still images) of the video, so the video has information on the items for which item areas are set (hereinafter sometimes called "item area information").

Here, an "item area" is the on-screen area of an item shown in the video. In this embodiment, an item area is set for an item shown in the video and the viewer is guided to the corresponding item area information, so that item area information in the video can be provided to the viewer in a simple way. For example, when a viewer becomes interested in an item in the video, simply selecting the item area displays the item area information associated with it. In this embodiment, an item area is set in advance on the area of the jacket on the screen. In Figure 1B, the item area is represented by the rectangular dashed frame 41.

The frame 41 representing the item area is not actually displayed on the touch panel 6, which keeps it from getting in the way of viewing the video; Figure 1B shows the frame 41 only for convenience of explanation. The frame 41 may, however, be displayed on the actual touch panel 6 during playback rather than hidden.

During playback, frames (still images) are displayed one after another, so the area the jacket occupies on the screen (in the frame) changes from moment to moment. In this embodiment, the item area is set so that it also changes from moment to moment to follow the video, and the frame 41 representing the item area deforms accordingly.

As shown in Figure 1B, when the user terminal 5 detects that an item area set in advance in the video has been tapped, it stores the item area information associated with that item area as stock information. Here, a "tap operation" is an operation of touching the screen with a finger, as if striking it, and then releasing it, and is one kind of "touch operation." "Touch operation" is a general term for operations performed by touching the screen with a finger. In addition to the tap operation, touch operations include swipe, double tap, flick, scroll, drag, long press, and pinch-in or pinch-out. The user terminal 5 may also store the item area information associated with the item area as stock information when it detects a touch operation other than a tap.
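The tap handling described above amounts to a hit test against the item areas of the frame being shown, followed by storing the hit as stock information. The following is a minimal sketch under assumptions: rectangular areas, the `AREAS_BY_FRAME` table, and all coordinates are hypothetical illustrations, not the format of the actual metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ItemArea:
    item_id: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.width and self.y <= py <= self.y + self.height


# Hypothetical metadata: frame index -> item areas set for that frame.
# The rectangle moves between frames, following the jacket.
AREAS_BY_FRAME: Dict[int, List[ItemArea]] = {
    0: [ItemArea("jacket", 100, 50, 80, 120)],
    1: [ItemArea("jacket", 110, 55, 80, 120)],
}


def hit_test(frame: int, px: float, py: float) -> Optional[str]:
    """Return the item whose area contains the tap on this frame, if any."""
    for area in AREAS_BY_FRAME.get(frame, []):
        if area.contains(px, py):
            return area.item_id
    return None


stock: List[str] = []


def on_tap(frame: int, px: float, py: float) -> None:
    item_id = hit_test(frame, px, py)
    if item_id is not None and item_id not in stock:
        stock.append(item_id)  # store as stock information


on_tap(0, 105, 60)  # inside the jacket area on frame 0
on_tap(1, 105, 60)  # outside on frame 1, because the area has moved
print(stock)
```

Looking the areas up per frame reflects the point that the item area changes from moment to moment along with the video.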

As shown in Figure 1C, when given item area information is stored in the user terminal 5 as stock information, an item image 44 (for example, a thumbnail image) corresponding to the stock information (the stocked item area information) is displayed in the stock information display unit 43 on the touch panel 6. That is, a viewer who becomes interested in the jacket in the video can tap the jacket on the touch panel 6 to stock the item area information of that jacket. The viewer can also confirm in the stock information display unit 43 that the item area information of the jacket has been stored as stock information.

Although not shown in Figures 1B and 1C, when the viewer stocks item area information, the item image (for example, a thumbnail image) associated with that item area may be displayed. For example, when the user terminal 5 detects that an item area set in advance in the video has been tapped, it displays the item image associated with that item area. Even if the frame 41 indicating the item area (the rectangular dashed line in Figure 1B) is hidden, tapping the jacket on the touch panel 6 displays the item image of the jacket, so the viewer can recognize that related information on the jacket is available. The user terminal 5 may display the item image associated with the item area when it detects another touch operation, such as a double tap, instead of a tap. The viewer may stock multiple pieces of item area information.

When the user terminal 5 detects that the area of an item image 44 displayed in the stock information display unit 43 has been tapped, it performs processing according to the event information associated with that item image 44 (item area information), for example, displaying the purchase page (payment page) for the jacket. The user terminal 5 may also perform processing according to the event information associated with the item image 44 (item area information) when it detects a touch operation other than a tap.

Here, the address of the purchase page (payment page) for the jacket is associated with the jacket as its item area information, so when the viewer taps the item image 44 of the jacket, the purchase page (payment page) for that jacket is displayed on the touch panel 6. Note that the display screen of the jacket purchase page (payment page) is item area information on the jacket and is also the item image 44 of the jacket. The web page may be displayed together with the video being played in a multi-screen layout, or it may be displayed on its own.
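The link between a stocked item image and its event information can be sketched as a simple lookup followed by a page-opening callback. The `EVENT_INFO` table, the `example.com` address, and the `open_page` callback are placeholders assumed for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical event information: item id -> address of its purchase page.
EVENT_INFO: Dict[str, str] = {"jacket": "https://example.com/purchase/jacket"}


def on_item_image_tap(item_id: str, open_page: Callable[[str], None]) -> bool:
    """Run the event associated with a stocked item image; report whether one existed."""
    url = EVENT_INFO.get(item_id)
    if url is None:
        return False
    open_page(url)
    return True


opened: List[str] = []
handled = on_item_image_tap("jacket", opened.append)
print(handled, opened)
```

Here `opened.append` merely records the address; on a real terminal the callback would hand the address to the browser function.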

In the following description, a viewer's operation of a terminal prompted by the video being viewed is sometimes called an "action." Actions are not limited to the touch operations described above (such as tap and swipe operations). The operation may be, for example, voice input, or, when the user terminal 5 is a personal computer, a mouse click or keyboard input. An action is performed by any one of a click, a touch, voice input, and keyboard input, or by a combination of these operations.

In the example described above, the terminal (device) on which the action is performed is the same as the terminal (device) on which the video is displayed. That is, the user terminal 5 on which the video is displayed both stores stock information in response to a tap and displays the purchase page (payment page) of the item in response to a tap.

However, the terminal (device) on which the action is performed may differ from the terminal on which the video is displayed. The action may be performed on a terminal other than the user terminal 5 on which the video is displayed, for example, on another smartphone.

In addition, item areas need not be set in the frames (still images) of the video. That is, the video need not have item area information.

<Question Creation System 100>
Next, the configuration of the question creation system 100 having the question creation device 10 of this embodiment is described with reference to Figure 2.

Figure 2 is a diagram showing the question creation system 100 having the question creation device 10 of this embodiment.

The question creation system 100 is a system for creating questions for obtaining item-related information. Item-related information includes information directly related to the item (for example, the item's name, size, color, and price, and the item area information of Figure 1 described above) as well as indirect information about the item (for example, recommendations related to the item).

In this embodiment, when a viewer becomes interested in an object (item) shown in a video, a question for obtaining item-related information can be created easily just by performing an action on the displayed video. If, for example, the viewer had to type text on the user terminal to compose a question for a chatbot, the typing would interrupt viewing of the video. In contrast, this embodiment creates the question automatically once an action is performed on the displayed video, which limits the interruption to viewing. In the question creation system 100, the question creation process is executed by multiple devices connected via the communication network 9 described later.

As shown in Figure 2, the question creation system 100 has a video distribution server 1, a metadata distribution server 3, a user terminal 5, a question creation device 10, an analysis device 20, and an answer creation device 30. These are connected via a communication network 9 so that they can communicate with one another. The communication network 9 is, for example, the Internet, a telephone network, a wireless communication network, a LAN, or a VAN; the Internet is assumed here.

The video distribution server 1 is a server for distributing videos. In this embodiment, the video distribution server 1 delivers video data to the user terminal 5 in streaming format. However, the video data may instead be delivered in download format or progressive download format. With streaming delivery, the user terminal 5 stores the video data temporarily; with download delivery, the user terminal 5 stores and retains the downloaded video data. The video distribution server 1 may also deliver video in live format.

The metadata distribution server 3 is a server for distributing metadata that includes the item area information described above. In this embodiment, part of the metadata is delivered in preload format before video playback, and the rest is delivered in progressive download format. However, the method of delivering metadata is not limited to this and may be, for example, download format or streaming format.

In this embodiment, metadata and video data are described separately for convenience of explanation, but the metadata may be stored in the video data (video file). When the video distribution server 1 delivers video data in which the metadata is stored, the question creation system 100 need not include the metadata distribution server 3.

The metadata distributed by the metadata distribution server 3 may be created by the metadata distribution server 3 or by a metadata creation terminal (not shown).

The user terminal 5 is an information terminal capable of playing videos (a video playback device). Here, the user terminal 5 is a smartphone. However, the user terminal 5 is not limited to a smartphone and may be, for example, a tablet-type mobile terminal or a personal computer. The user terminal 5 has hardware such as a CPU, memory, a storage device, and a communication module (none of which are shown), as well as the touch panel 6 (corresponding to the display unit 7A and the input unit 7B). A video playback program is installed on the user terminal 5, and the various operations described above are realized when the user terminal 5 executes the video playback program. The video playback program can be downloaded to the user terminal 5 from a program distribution server (not shown).

The user terminal 5 includes a display unit 7A, an input unit 7B, a control unit 8A, and a communication unit 8B.

The display unit 7A is a function for displaying various screens. In this embodiment, the display unit 7A is realized by the display of the touch panel 6, a controller that controls what the display shows, and the like. The input unit 7B is a function for inputting and detecting instructions from the user. In this embodiment, the input unit 7B is realized by the touch sensor of the touch panel 6 and the like. When the user terminal 5 is a smartphone or a tablet-type mobile terminal, the display unit 7A and the input unit 7B are realized mainly by the touch panel 6, but they may be composed of separate components. For example, when the user terminal 5 is a personal computer, the display unit 7A is composed of, for example, a liquid crystal display, and the input unit 7B is composed of a mouse, a keyboard, and the like.

The control unit 8A is a function for controlling the user terminal 5. The control unit 8A has, among others, a function for processing video data to play (display) the video and a function for processing metadata. The control unit 8A also has a browser function for acquiring web page information and displaying the web page. In this embodiment, the control unit 8A is realized by a CPU (not shown), memory and a storage device storing the video playback program, and the like.

通信部８Ｂは、通信ネットワーク９に接続するための機能である。通信部８Ｂは、動画配信サーバー１から動画データを受信したり、メタデータ配信サーバー３からメタデータを受信したり、動画配信サーバー１やメタデータ配信サーバー３にデータを要求したりする。 The communication unit 8B is responsible for connecting to the communication network 9. The communication unit 8B receives video data from the video distribution server 1, receives metadata from the metadata distribution server 3, and requests data from the video distribution server 1 and the metadata distribution server 3.

ユーザー端末５は、不図示であるが、上述した構成のほか、動画データを記憶する機能を有する動画データ記憶部や、メタデータを記憶する機能を有するメタデータ記憶部や、ストックされたアイテム領域情報を動画データに対応付けて記憶するストック情報記憶部を有していても良い。 Although not shown, the user terminal 5 may also have, in addition to the configuration described above, a video data storage unit with the function of storing video data, a metadata storage unit with the function of storing metadata, and a stock information storage unit that stores stocked item area information in association with video data.

質問作成装置１０は、アイテム関連情報を得るための質問を作成するための装置である。質問作成装置１０は、後述する質問作成処理を実行する。本実施形態では、質問作成装置１０の不図示の各種ハードウェアと各種ソフトウェアとの協働により、質問作成装置１０における質問作成処理を含む各種処理の実行が可能になる。 The question creation device 10 is a device for creating questions to obtain item-related information. The question creation device 10 executes the question creation process described later. In this embodiment, the execution of various processes, including the question creation process, becomes possible through the cooperation of various hardware and software (not shown) of the question creation device 10.

質問作成装置１０は、例えば、サーバー等のコンピュータであり、演算装置（ＣＰＵなど）、メモリ、記憶装置、通信装置などで構成されている。記憶装置には、質問作成プログラムを含む各種のプログラムや各種のデータが記憶されている。演算装置が記憶装置に記憶されている質問作成プログラムをメモリに読み出して実行することにより、質問作成処理を含む各種処理、すなわち、後述の各部位（要素出力部１１、テキスト取得部１２、質問作成部１３及び質問出力部１４）の機能が実現される。要素出力部１１、テキスト取得部１２、質問作成部１３及び質問出力部１４の機能の詳細については、後述する。 The question creation device 10 is, for example, a computer such as a server, and is composed of an arithmetic unit (CPU, etc.), memory, a storage device, a communication device, etc. The storage device stores various programs, including the question creation program, and various data. The arithmetic unit reads the question creation program stored in the storage device into memory and executes it, thereby realizing various processes, including the question creation process, i.e., the functions of each part described later (element output unit 11, text acquisition unit 12, question creation unit 13, and question output unit 14). Details of the functions of the element output unit 11, text acquisition unit 12, question creation unit 13, and question output unit 14 will be described later.

本実施形態の質問作成装置１０は、複数のコンピュータを有していても良い。そして、ネットワークを介した当該複数のコンピュータの協働により、質問作成処理を含む各種処理が実行されても良い。図２に示される質問作成システム１００では、質問作成装置１０は、一つのユーザー端末５と通信ネットワーク９を介して接続しているが、質問作成装置１０は、多数のユーザー端末５と通信ネットワーク９を介して接続していても良い。 The question creation device 10 in this embodiment may have multiple computers. Furthermore, various processes, including the question creation process, may be executed through the cooperation of these multiple computers via a network. In the question creation system 100 shown in Figure 2, the question creation device 10 is connected to one user terminal 5 via a communication network 9, but the question creation device 10 may be connected to multiple user terminals 5 via the communication network 9.

解析装置２０は、動画における動画要素を解析し、テキスト化要素を作成する装置である。 The analysis device 20 is a device that analyzes video elements in a video and creates text elements.

ここで、動画要素とは、動画においてデータとして抽出可能な要素であり、動画要素は、例えば、動画に関する、画像情報、音声情報、文字情報の少なくとも１つ以上を有する。なお、画像情報は、例えば、動画中の人物や物の画像データであり、アイテム領域情報が含まれた（アイテム領域が設定された）画像データに限られず、アイテム領域情報が含まれない（アイテム領域が設定されていない）画像データも含まれる。音声情報は、動画中の音、セリフ、ＢＧＭ等の音声データである。文字情報には、字幕やテロップ等の動画内の文字情報に限られず、視聴者コメント等の動画外の文字情報も含まれる。 Here, a video element is an element that can be extracted as data from a video. A video element, for example, has at least one of the following elements related to the video: image information, audio information, and text information. Note that image information includes, for example, image data of people or objects in the video, and is not limited to image data that includes item area information (i.e., image data for which an item area has been set), but also includes image data that does not include item area information (i.e., image data for which an item area has not been set). Audio information is audio data such as sounds, dialogue, and background music in the video. Text information is not limited to text information within the video, such as subtitles and captions, but also includes text information outside the video, such as viewer comments.

また、テキスト化要素とは、上述した動画要素を表す文字情報である。例えば、動画要素が上述した図１に示されるジャケットの画像データである場合、「服」、「スポーツウェア」、「ジャケット」等の文字情報であり、ジャケットの色が赤色である場合、「赤」等の文字情報である。なお、動画要素とテキスト化要素とは一対一に対応する必要はなく、一つの動画要素に、複数の「テキスト化要素」が対応していても良い。 Furthermore, a text element is text information that represents the video element described above. For example, if the video element is the image data of the jacket shown in Figure 1, the text element is text information such as "clothing," "sportswear," or "jacket," and if the jacket is red, it is text information such as "red." Note that video elements and text elements need not correspond one-to-one; a single video element may be associated with multiple text elements.
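
The one-to-many relationship between a video element and its text elements can be pictured with a small data model. This is only an illustrative sketch: the class names `VideoElement` and `AnalysisResult`, their fields, and the placeholder payload are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class VideoElement:
    """Hypothetical container for data extracted from a video."""
    kind: str        # "image", "audio", or "text"
    payload: bytes   # placeholder for the raw data


@dataclass
class AnalysisResult:
    """Hypothetical pairing of one video element with its text elements."""
    video_element: VideoElement
    text_elements: list = field(default_factory=list)


# One video element (the jacket image) mapped to several text elements.
jacket_image = VideoElement(kind="image", payload=b"<jacket image data>")
result = AnalysisResult(
    video_element=jacket_image,
    text_elements=["clothing", "sportswear", "jacket", "red"],
)
```

Holding the text elements in a list keeps the model open to any number of descriptions per video element, which matches the statement that the correspondence need not be one-to-one.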

本実施形態では、解析装置２０の不図示の各種ハードウェアと各種ソフトウェアとの協働により、解析装置２０における解析処理を含む各種処理の実行が可能になる。 In this embodiment, the analysis device 20 can perform various processes, including analysis processing, through the cooperation of various hardware (not shown) and various software.

解析装置２０は、例えば、サーバー等のコンピュータであり、演算装置（ＣＰＵなど）、メモリ、記憶装置、通信装置などで構成されている。記憶装置には、解析プログラムを含む各種のプログラムや各種のデータが記憶されている。演算装置が記憶装置に記憶されている解析プログラムをメモリに読み出して実行することにより、解析処理を含む各種処理が実現される。本実施形態の解析装置２０は、複数のコンピュータを有していても良い。そして、ネットワークを介した当該複数のコンピュータの協働により、解析処理を含む各種処理が実行されても良い。 The analysis device 20 is, for example, a computer such as a server, and is composed of a processing unit (CPU, etc.), memory, storage device, communication device, etc. The storage device stores various programs, including the analysis program, and various data. The processing unit reads the analysis program stored in the storage device into memory and executes it, thereby realizing various processes, including the analysis process. The analysis device 20 in this embodiment may have multiple computers. Furthermore, various processes, including the analysis process, may be executed through the cooperation of these multiple computers via a network.

解析装置２０は、具体的には、大規模言語モデル、テキスト検索エンジン、画像検索・画像生成エンジン、音源検索・音源生成エンジン、映像検索・映像生成エンジン、レコメンドエンジン等である。解析装置２０は、ＡＩ（ＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ）を用いた装置に限られず、動画における動画要素を解析し、テキスト化要素を作成する装置であれば、ＡＩを用いない装置であっても良い。 The analysis device 20 specifically includes large-scale language models, text search engines, image search/generation engines, sound source search/generation engines, video search/generation engines, recommendation engines, etc. The analysis device 20 is not limited to devices using AI (Artificial Intelligence); any device that analyzes video elements in a video and creates text elements, even if it does not use AI, is acceptable.

解析装置２０では、例えば、テキスト検索エンジンと画像検索エンジンとが別々のコンピュータで構成されても良い。また、テキスト検索エンジンを構成するコンピュータが複数のコンピュータで構成されても良い。 In the analysis device 20, for example, the text search engine and the image search engine may be configured on separate computers. Furthermore, the computer constituting the text search engine may be composed of multiple computers.

なお、本実施形態では、解析装置２０は、質問作成装置１０と別の装置であるとして説明した。すなわち、解析装置２０は、質問作成装置１０の外部の装置であるとして説明した。但し、質問作成装置１０が解析装置２０の機能を備えていても良い。また、質問作成システム１００には、解析装置２０が含まれていなくても良い。 In this embodiment, the analysis device 20 was described as a separate device from the question creation device 10. That is, the analysis device 20 was described as an external device to the question creation device 10. However, the question creation device 10 may also possess the functions of the analysis device 20. Furthermore, the question creation system 100 does not necessarily include the analysis device 20.

回答作成装置３０は、上述したテキスト化要素から作成された動画要素に関する質問候補に対する回答を作成する装置である。本実施形態では、回答作成装置３０の不図示の各種ハードウェアと各種ソフトウェアとの協働により、回答作成装置３０における回答作成処理を含む各種処理の実行が可能になる。 The answer generation device 30 is a device that generates answers to candidate questions related to video elements created from the text elements described above. In this embodiment, the cooperation of various hardware and software (not shown) of the answer generation device 30 enables the execution of various processes, including the answer generation process.

回答作成装置３０は、例えば、サーバー等のコンピュータであり、演算装置（ＣＰＵなど）、メモリ、記憶装置、通信装置などで構成されている。記憶装置には、回答作成プログラムを含む各種のプログラムや各種のデータが記憶されている。演算装置が記憶装置に記憶されている回答作成プログラムをメモリに読み出して実行することにより、回答作成処理を含む各種処理が実現される。本実施形態の回答作成装置３０は、複数のコンピュータを有していても良い。そして、ネットワークを介した当該複数のコンピュータの協働により、回答作成処理を含む各種処理が実行されても良い。 The answer generation device 30 is, for example, a computer such as a server, and is composed of an arithmetic unit (CPU, etc.), memory, storage device, communication device, etc. The storage device stores various programs, including the answer generation program, and various data. The arithmetic unit reads the answer generation program stored in the storage device into memory and executes it, thereby realizing various processes, including the answer generation process. The answer generation device 30 in this embodiment may have multiple computers. Furthermore, various processes, including the answer generation process, may be executed through the cooperation of these multiple computers via a network.

回答作成装置３０は、具体的には、大規模言語モデル、テキスト検索エンジン、画像検索・画像生成エンジン、音源検索・音源生成エンジン、映像検索・映像生成エンジン、レコメンドエンジン等である。本実施形態では、具体的には、回答作成装置３０は、ＣＨＡＴＧＰＴ（登録商標）等の、大規模な言語モデルに基づいて構築された生成ＡＩである。回答作成装置３０は、ＡＩを用いた装置に限られず、質問候補に対する回答を作成する装置であれば、ＡＩを用いない装置であっても良い。 The answer generation device 30 is, specifically, a large language model, a text search engine, an image search/image generation engine, an audio search/audio generation engine, a video search/video generation engine, a recommendation engine, or the like. In this embodiment, specifically, the answer generation device 30 is a generative AI built on a large language model, such as ChatGPT (registered trademark). The answer generation device 30 is not limited to a device that uses AI; as long as it creates answers to question candidates, it may be a device that does not use AI.

なお、本実施形態では、回答作成装置３０は、質問作成装置１０と別の装置であるとして説明した。すなわち、回答作成装置３０は、質問作成装置１０の外部の装置であるとして説明した。但し、質問作成装置１０が回答作成装置３０の機能を備えていても良い。また、質問作成システム１００には、回答作成装置３０が含まれていなくても良い。 In this embodiment, the answer generation device 30 has been described as a device separate from the question creation device 10. That is, the answer generation device 30 has been described as a device external to the question creation device 10. However, the question creation device 10 may itself have the functions of the answer generation device 30. Furthermore, the question creation system 100 need not include the answer generation device 30.

＜質問作成装置１０＞ <Question creation device 10>
質問作成装置１０は、図２に示されるように、要素出力部１１と、テキスト取得部１２と、質問作成部１３と、質問出力部１４と、記録部１６とを有する。 As shown in Figure 2, the question creation device 10 includes an element output unit 11, a text acquisition unit 12, a question creation unit 13, a question output unit 14, and a recording unit 16.

要素出力部１１は、表示された動画に対してアクションが実行されたときに、動画における動画要素（例えば、ジャケットの画像情報）を抽出し、解析装置２０に出力する部位である。上述したように、解析装置２０は、要素出力部１１から出力された動画要素（例えば、ジャケットの画像情報）を解析し、テキスト化要素（例えば、「ジャケット」という文字情報）を作成する。 The element output unit 11 is the part that extracts video elements (e.g., image information of the jacket) from the displayed video when an action is performed on the video, and outputs them to the analysis device 20. As described above, the analysis device 20 analyzes the video elements (e.g., image information of the jacket) output from the element output unit 11 and creates text elements (e.g., the text information "jacket").

また、解析装置２０は、一つの動画要素（例えば、ジャケットの画像情報）に対して、複数のテキスト化要素（例えば、「ジャケット」、「服」、「スポーツウェア」という文字情報）を作成しても良い。解析装置２０は、要素出力部１１から出力された動画要素（例えば、ジャケットの画像情報）に基づいてテキスト化要素（例えば、「ジャケット」という文字情報）を作成し、テキスト取得部１２に出力する。 Furthermore, the analysis device 20 may create multiple text elements (for example, text information such as "jacket," "clothing," and "sportswear") for a single video element (for example, image information of a jacket). The analysis device 20 creates text elements (for example, text information such as "jacket") based on the video element (for example, image information of a jacket) output from the element output unit 11, and outputs them to the text acquisition unit 12.

テキスト取得部１２は、解析装置２０により動画要素に基づいて作成された少なくとも１つのテキスト化要素を取得する部位である。 The text acquisition unit 12 is the part that acquires at least one text element created based on the video elements by the analysis device 20.

質問作成部１３は、テキスト取得部１２が取得したテキスト化要素（例えば、「ジャケット」という文字情報）から、動画要素に関する質問候補（例えば、「ジャケットの値段はいくら？」という文字情報）を作成する部位である。質問作成部１３は、一つのテキスト化要素（例えば、「ジャケット」という文字情報）から、複数の質問候補（例えば、「ジャケットの値段はいくら？」、「ジャケットはどこで売っている？」という文字情報）を作成しても良い。 The question creation unit 13 is the part that creates question candidates related to a video element (for example, the text information "How much does the jacket cost?") from the text elements (for example, the text information "jacket") acquired by the text acquisition unit 12. The question creation unit 13 may also create multiple question candidates (for example, the text information "How much does the jacket cost?" and "Where is the jacket sold?") from a single text element (for example, the text information "jacket").

質問出力部１４は、質問作成部１３が作成した質問候補を表示装置（例えば、ユーザー端末５の表示部７Ａ）に出力する部位である。これにより、視聴者が表示装置を介して質問候補を目視することができる。 The question output unit 14 is the part that outputs the question candidates created by the question creation unit 13 to a display device (for example, the display unit 7A of the user terminal 5). This allows viewers to visually view the question candidates via the display device.

記録部１６は、上述した動画要素、テキスト化要素及び質問候補のデータが記録される部位である。但し、記録部１６は、動画要素、テキスト化要素及び質問候補のデータ以外のデータが記録されても良い。また、質問作成装置１０が記録部１６を有していなくても良く、ユーザー端末５が記録部１６を有していても良い。具体的には、ユーザー端末５にメモリ領域としての記録部１６が確保されていても良い。これによって視聴者のプライバシーを守りつつ、特定の動画要素、テキスト化要素及び質問候補をユーザー端末５（すなわち、視聴者が有する端末）で記録することができる。さらに、質問作成装置１０とユーザー端末５との両方が記録部１６を有していても良い。 The recording unit 16 is the part where the video elements, text elements, and question candidate data described above are recorded. However, the recording unit 16 may also record data other than the video elements, text elements, and question candidate data. Furthermore, the question creation device 10 does not necessarily have a recording unit 16; the user terminal 5 may have a recording unit 16. Specifically, the user terminal 5 may have a recording unit 16 as a memory area. This allows specific video elements, text elements, and question candidates to be recorded on the user terminal 5 (i.e., the terminal owned by the viewer) while protecting the viewer's privacy. Moreover, both the question creation device 10 and the user terminal 5 may have a recording unit 16.

＜質問作成処理の概要＞ <Overview of the question creation process>
次に、図３を参照しつつ、本実施形態の質問作成方法の一例となる質問作成処理について説明する。 Next, with reference to Figure 3, the question creation process, which is an example of the question creation method of this embodiment, will be described.

図３は、本実施形態の質問作成装置１０が行う質問作成処理のフロー図である。 Figure 3 is a flowchart of the question creation process performed by the question creation device 10 of this embodiment.

まず、要素出力部１１は、表示された動画に対してアクションが実行されたときに、アクションが実行された動画における動画要素（例えば、ジャケットの画像情報）を抽出し、解析装置２０に出力する（Ｓ００１）。ここで、アクションは一つに限られず、複数のアクションが実行され、要素出力部１１は、複数のアクション間の少なくとも１つの動画要素を抽出しても良い。例えば、要素出力部１１は、視聴者が所定時間内に複数回タッチ操作した間の動画要素を抽出しても良い。これにより、要素出力部１１は、アクションが実行された時点の動画要素だけでなく、ある程度の時間幅（所定時間内の幅）における動画要素を抽出することができる。また、要素出力部１１は、視聴者が所定時間内に長押し操作した間の動画要素を抽出しても良い。 First, when an action is performed on the displayed video, the element output unit 11 extracts video elements (e.g., image information of the jacket) from the video in which the action was performed and outputs them to the analysis device 20 (S001). Here, the action is not limited to one; multiple actions may be performed, and the element output unit 11 may extract at least one video element between multiple actions. For example, the element output unit 11 may extract video elements during the period when the viewer performs multiple touch operations within a predetermined time. This allows the element output unit 11 to extract not only video elements at the time the action was performed, but also video elements within a certain time range (a predetermined time range). Furthermore, the element output unit 11 may extract video elements during the period when the viewer performs a long press operation within a predetermined time.
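
The idea of extracting video elements over the span between several actions, rather than only at a single instant, can be sketched as follows. This is a hedged illustration only: the function name `extraction_window`, the use of playback timestamps in seconds, and the rule that taps older than the predetermined time are ignored are all assumptions introduced here, not details given in the specification.

```python
def extraction_window(tap_times, window_seconds):
    """Return the (start, end) playback range to extract video elements from.

    tap_times: playback timestamps (seconds) at which the viewer tapped.
    window_seconds: hypothetical "predetermined time" that groups taps.
    Taps further than window_seconds before the latest tap are ignored.
    """
    if not tap_times:
        raise ValueError("at least one action is required")
    latest = max(tap_times)
    recent = [t for t in tap_times if latest - t <= window_seconds]
    return (min(recent), latest)


# A single tap gives a zero-length range at that instant.
single = extraction_window([10.0], 3.0)
# Several taps within the predetermined time give the span between them.
several = extraction_window([10.0, 11.5, 12.0], 3.0)
```

Under these assumptions, the element output unit would then extract image, audio, or text data whose timestamps fall inside the returned range.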

解析装置２０は、要素出力部１１から出力された動画要素（例えば、ジャケットの画像情報）に基づいてテキスト化要素（例えば、「ジャケット」という文字情報）を作成し、テキスト取得部１２に出力する。 The analysis device 20 creates text elements (for example, the text information "jacket") based on the video elements (for example, image information of the jacket) output from the element output unit 11, and outputs them to the text acquisition unit 12.

次に、テキスト取得部１２は、テキスト化要素（例えば、「ジャケット」という文字情報）を解析装置２０から取得する（Ｓ００２）。次に、質問作成部１３は、質問候補（例えば、「ジャケットの値段はいくら？」という文字情報）を作成する（Ｓ００３）。 Next, the text acquisition unit 12 acquires text elements (for example, the text information "jacket") from the analysis device 20 (S002). Then, the question creation unit 13 creates a question candidate (for example, the text information "How much does the jacket cost?") (S003).

本実施形態では、質問作成部１３は、種別情報に対応した質問テンプレートを使用して質問候補を作成する。ここで、「種別情報」とは、動画要素の属性に関する文字情報であり、当該動画要素のテキスト化要素に紐づくことができるデータである。例えば、動画要素がジャケットの画像データである場合、当該動画要素のテキスト化要素は「ジャケット」という文字情報であり、種別情報は、「固有名・商品名」という文字情報である。 In this embodiment, the question creation unit 13 creates question candidates using a question template corresponding to the type information. Here, "type information" refers to textual information related to the attributes of a video element, and is data that can be linked to the text element of that video element. For example, if the video element is image data of a jacket, the text element of that video element is the textual information "jacket," and the type information is the textual information "proper name/product name."

動画要素が、アイテム領域情報を含む（アイテム領域が設定された）画像情報である場合（図１に示される動画の画像データである場合）、種別情報は、当該動画要素のアイテム領域情報に含まれていても良い。すなわち、動画のアイテム領域情報として、種別情報が含まれていても良い。このとき、テキスト取得部１２は、種別情報（例えば、「固有名・商品名」という文字情報）を、テキスト化要素（例えば、「ジャケット」という文字情報）に紐づけて取得する。画像情報が有する種別情報を利用することで、種別情報に基づく精度の高い質問候補を作成することができる。 If a video element is image information that includes item area information (i.e., image data of a video as shown in Figure 1), then the type information may be included in the item area information of that video element. That is, the type information may be included as part of the video's item area information. In this case, the text acquisition unit 12 acquires the type information (for example, the text information "proper name/product name") by associating it with a text element (for example, the text information "jacket"). By utilizing the type information contained in the image information, it is possible to create highly accurate question candidates based on the type information.

動画要素が、アイテム領域情報を含まない画像情報、音声情報、文字情報の少なくとも１つ以上である場合、解析装置２０が、動画要素を解析し、種別情報（例えば、「固有名・商品名」という文字情報）に紐づくテキスト化要素（例えば、「ジャケット」という文字情報）を作成しても良い。テキスト取得部１２は、種別情報に紐づくテキスト化要素を取得する。これにより、動画要素が種別情報を有していない場合であっても、種別情報に対応した質問テンプレートを使用することができる。 If a video element consists of at least one of the following: image information, audio information, or text information, without item area information, the analysis device 20 may analyze the video element and create a text element (e.g., text information such as "jacket") associated with type information (e.g., text information such as "proper name/product name"). The text acquisition unit 12 acquires the text element associated with the type information. This allows the use of a question template corresponding to type information even if the video element does not have type information.

質問テンプレートは、記録部１６に予め記録されており、質問作成部１３は、記録部１６から質問テンプレートのデータを読み出して質問候補を作成する。 The question templates are pre-recorded in the recording unit 16, and the question creation unit 13 reads the question template data from the recording unit 16 to create candidate questions.

図４は、質問テンプレートの一例を示す図である。 Figure 4 shows an example of a question template.

図４に示されるように、質問テンプレートとして、例えば、「○○○○○をもっと詳しく知りたい」、「○○○○○の値段はいくら？」、「○○○○○はどこで売っている？」、「○○○○○の色違いを見たい」等の質問文のテンプレートが用意されている。ここで、質問テンプレートの「○○○○○」の部分は、テキスト化要素（例えば、「ジャケット」という文字情報）が入力される部分（以下、「入力要素」と呼ぶことがある）である。また、上述した４つの質問テンプレートは、「固有名・商品名」である種別情報に対応している。 As shown in Figure 4, question-sentence templates are provided as question templates, such as "I want to know more about ○○○○○," "How much does ○○○○○ cost?", "Where is ○○○○○ sold?", and "I want to see different colors of ○○○○○." Here, the "○○○○○" part of a question template is the part into which a text element (for example, the text information "jacket") is entered (hereinafter sometimes referred to as an "input element"). Furthermore, the four question templates mentioned above correspond to the type information "proper name/product name."

種別情報（例えば、「固有名・商品名」という文字情報）に紐づくテキスト化要素（例えば、「ジャケット」という文字情報）の場合、質問作成部１３は、複数ある質問テンプレートのうち、種別情報「固有名・商品名」に対応する質問テンプレートを使用する。質問作成部１３が質問テンプレートを任意に（種別情報と無関係に）選択して使用するのではなく、種別情報に対応した質問テンプレートを使用することにより、より視聴者の意図に沿った正確な質問候補を作成することができる。 In the case of text elements (e.g., "jacket") associated with type information (e.g., text information such as "proper name/product name"), the question creation unit 13 uses the question template corresponding to the type information "proper name/product name" from among multiple question templates. By using a question template corresponding to the type information, rather than arbitrarily selecting and using a question template (unrelated to the type information), the question creation unit 13 can create more accurate question candidates that better align with the viewer's intent.
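
Selecting templates by type information and filling the input element can be sketched as below. This is a minimal sketch under stated assumptions: the dictionary `QUESTION_TEMPLATES`, the `{item}` placeholder standing in for "○○○○○", and the function name `create_question_candidates` are hypothetical; only the four template sentences and the type information "proper name/product name" come from the description of Figure 4.

```python
# Hypothetical template store keyed by type information.
QUESTION_TEMPLATES = {
    "proper name/product name": [
        "I want to know more about {item}",
        "How much does {item} cost?",
        "Where is {item} sold?",
        "I want to see different colors of {item}",
    ],
}


def create_question_candidates(text_element, type_info):
    """Fill every template registered for type_info with text_element.

    Returns an empty list when no template matches the type information.
    """
    templates = QUESTION_TEMPLATES.get(type_info, [])
    return [template.format(item=text_element) for template in templates]


candidates = create_question_candidates("the jacket", "proper name/product name")
```

Keying the templates by type information means a text element is only ever combined with templates written for its kind, which is the behaviour the paragraph above contrasts with arbitrary template selection.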

図５は、複数のテキスト化要素があった場合の質問テンプレートへのテキスト化要素への入力方法を示す図である。 Figure 5 shows how to input text elements into a question template when there are multiple text elements.

質問テンプレートは、一つの入力要素を有している場合に限られず、図５Ａに示されるように、複数の入力要素（ここでは、第１の入力要素及び第２の入力要素）を有していても良い。さらに、各々の入力要素は、所定の種別情報に紐づけられていても良い。すなわち、第１の入力要素（○○○）には、第１の種別情報（例えば、「人名」）に紐づけられ、第２の入力要素（△△△）には、第２の種別情報（例えば、「固有名・商品名」）に紐づけられていても良い。 The question template is not limited to having only one input element; as shown in Figure 5A, it may have multiple input elements (here, a first input element and a second input element). Furthermore, each input element may be linked to predetermined type information. That is, the first input element (○○○) may be linked to first type information (e.g., "person's name"), and the second input element (△△△) may be linked to second type information (e.g., "proper name/product name").

ここで、第１の入力要素に入力されるテキスト化要素と、第２の入力要素に入力されるテキスト化要素とが任意（種別情報と無関係）であるとすると、例えば、図５Ｂに示されるような質問候補が作成されてしまう場合がある。つまり、「ジャケットが着用していたＭｒ．ＡＢＣを詳しく知りたい」のように、質問として不自然な表現となる質問候補が作成されてしまう場合がある。 Here, if the text element entered into the first input element and the text element entered into the second input element were chosen arbitrarily (without regard to type information), a question candidate like the one shown in Figure 5B might be created. In other words, a question candidate that reads unnaturally as a question might be created, such as "I want to know more about the Mr. ABC that the jacket was wearing," in which the two text elements have been swapped.

本実施形態では、各々の入力要素は、所定の種別情報に紐づけられているため、質問作成部１３は、図５Ｃに示されるように、第１の入力要素（○○○）に第１のテキスト化要素（Ｍｒ．ＡＢＣ）が入力され、第２の入力要素（△△△）に第２のテキスト化要素（ジャケット）が入力された質問候補を作成する。これにより、「Ｍｒ．ＡＢＣが着用していたジャケットを詳しく知りたい」のような、質問として自然な表現となる質問候補を作成することができる。言い換えると、図５Ｂに示されるような質問として不自然な表現となる質問候補を作成することを抑制することができる。 In this embodiment, since each input element is linked to predetermined type information, the question creation unit 13 creates a candidate question as shown in Figure 5C, where the first text element (Mr. ABC) is input to the first input element (○○○) and the second text element (Jacket) is input to the second input element (△△△). This makes it possible to create a candidate question that is expressed naturally, such as "I would like to know more about the jacket Mr. ABC was wearing." In other words, it is possible to suppress the creation of a candidate question that is expressed unnaturally, as shown in Figure 5B.
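
Filling a template that has two input elements, each tied to its own type information, can be sketched like this. The names `TEMPLATE`, `SLOT_TYPES`, and `fill_by_type`, the English wording of the template, and the behaviour of returning `None` when a slot cannot be filled are assumptions for illustration; the specification only states that each input element is linked to predetermined type information.

```python
# Hypothetical two-slot template modelled on Figure 5A.
TEMPLATE = "I want to know more about the {product} that {person} was wearing"

# Each input element (slot) is tied to the type information it accepts.
SLOT_TYPES = {
    "person": "person's name",
    "product": "proper name/product name",
}


def fill_by_type(template, slot_types, typed_elements):
    """Fill each slot with the first text element of the matching type.

    typed_elements: list of (text_element, type_info) pairs in any order.
    Returns None when some slot has no text element of the required type.
    """
    values = {}
    for slot, wanted_type in slot_types.items():
        match = next(
            (text for text, type_info in typed_elements if type_info == wanted_type),
            None,
        )
        if match is None:
            return None
        values[slot] = match
    return template.format(**values)


# The order of the text elements does not matter; the type decides the slot.
question = fill_by_type(
    TEMPLATE,
    SLOT_TYPES,
    [("jacket", "proper name/product name"), ("Mr. ABC", "person's name")],
)
```

Because the slot is chosen by type rather than by position, the swapped form illustrated in Figure 5B cannot be produced from these inputs.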

次に、質問出力部１４は、質問候補を表示装置に出力する（Ｓ００３）。これにより、視聴者が表示装置を介して質問候補を目視することができる。さらに、本実施形態では、動画の再生中に（動画の再生と並行して）質問候補が表示されることで、視聴者は、動画の視聴が妨げられずに、質問候補を確認することができる。 Next, the question output unit 14 outputs the question candidates to the display device (S003). This allows viewers to visually view the question candidates via the display device. Furthermore, in this embodiment, since the question candidates are displayed during video playback (in parallel with video playback), viewers can review the question candidates without interrupting their video viewing.

質問作成部１３が複数の質問候補を作成する場合、質問出力部１４は、動画の視聴者が選択可能な態様で複数の質問候補を表示装置に出力しても良い。これにより、視聴者が、複数の質問候補の中から、意図にあった質問候補を選択することができる。 When the question creation unit 13 creates multiple question candidates, the question output unit 14 may output the multiple question candidates to the display device in a manner that allows the viewer of the video to select among them. This allows the viewer to select, from among the multiple question candidates, the one that matches their intent.

また、質問出力部１４は、複数の質問候補に優先度情報を付加して表示装置に出力しても良い。例えば、複数の質問候補が画面の上から下に優先度順に表示されても良い。また、複数の質問候補が優先度順に画面遷移して表示されても良い。さらに、複数の質問候補の各々に、関連度の情報（関連度○％）や、優先度順位の情報（１位、２位、３位…）が付加されて表示されても良い。これにより、視聴者が、複数の質問候補の中から、意図にあった質問候補を迅速に選択することができる。 Furthermore, the question output unit 14 may output multiple question candidates to the display device with priority information added. For example, multiple question candidates may be displayed from top to bottom of the screen in order of priority. Alternatively, multiple question candidates may be displayed in order of priority through screen transitions. Additionally, each of the multiple question candidates may be displayed with relevance information (relevance percentage) and priority ranking information (1st, 2nd, 3rd, etc.). This allows viewers to quickly select the question candidate that best suits their intent from among multiple candidates.
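
Attaching priority information to the candidates before display can be sketched as follows. The specification does not say how relevance is computed, so the relevance scores here are simply supplied as inputs; the function name `rank_candidates` and the display string format are likewise assumptions made for this example.

```python
def rank_candidates(scored_candidates):
    """Order candidates by relevance and label them for display.

    scored_candidates: list of (question, relevance) pairs, where relevance
    is a hypothetical score between 0 and 1 supplied by the caller.
    Returns display strings carrying both the rank and the relevance.
    """
    ordered = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    return [
        f"{rank}. {question} (relevance {round(relevance * 100)}%)"
        for rank, (question, relevance) in enumerate(ordered, start=1)
    ]


lines = rank_candidates(
    [
        ("Where is the jacket sold?", 0.6),
        ("How much does the jacket cost?", 0.9),
    ]
)
```

The returned list is already in top-to-bottom display order, so a display layer could render it directly or page through it one candidate at a time.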

最後に、質問出力部１４は、視聴者が選択した質問候補（以下、単に「質問」と呼ぶことがある）を回答作成装置３０に出力する（Ｓ００４）。回答作成装置３０では、質問に対する回答を作成し、質問作成装置１０に出力する。質問作成装置１０では、質問に対する回答を表示装置（例えば、ユーザー端末５の表示部７Ａ）に出力する。これにより、視聴者が表示装置を介して質問に対する回答を目視することができる。 Finally, the question output unit 14 outputs the question candidates selected by the viewer (hereinafter sometimes simply referred to as "questions") to the answer creation device 30 (S004). The answer creation device 30 creates an answer to the question and outputs it to the question creation device 10. The question creation device 10 outputs the answer to the question to a display device (for example, the display unit 7A of the user terminal 5). This allows the viewer to visually view the answer to the question via the display device.
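
The overall flow, from an extracted video element to a displayed answer, can be summarized in a short sketch. Everything here is a stand-in: `run_question_flow` and the four stub functions are hypothetical, and the stubs return fixed strings where a real analysis device 20, viewer selection, and answer generation device 30 would be involved.

```python
def run_question_flow(video_element, analyze, make_candidates, choose, answer):
    """Chain the stages: analyze, create candidates, select, answer.

    Returns (question, answer_text), or None if no candidate was created.
    """
    text_elements = analyze(video_element)
    candidates = [q for text in text_elements for q in make_candidates(text)]
    if not candidates:
        return None
    question = choose(candidates)
    return question, answer(question)


# Stub stages standing in for the external devices and the viewer.
def analyze_stub(video_element):
    return ["the jacket"]


def make_candidates_stub(text_element):
    return [f"How much does {text_element} cost?", f"Where is {text_element} sold?"]


def choose_first(candidates):
    return candidates[0]


def answer_stub(question):
    return f"(answer to: {question})"


question, reply = run_question_flow(
    b"<image data>", analyze_stub, make_candidates_stub, choose_first, answer_stub
)
```

Passing the stages in as functions mirrors the point made elsewhere in the description that the analysis and answer functions may sit in external devices or inside the question creation device itself.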

質問作成装置１０では、回答作成装置３０から取得した質問に対する回答が、記録部１６に記録しても良い。これにより、質問に対する回答をさらに解析し、配信する動画に反映する等役立てることができる。また、解析した結果から視聴者の傾向を抽出し、抽出した傾向からレコメンドを作成することもできる。 In the question creation device 10, the answer to the question obtained from the answer generation device 30 may be recorded in the recording unit 16. This allows the answers to be analyzed further and put to use, for example by reflecting them in the videos to be distributed. Furthermore, viewer tendencies can be extracted from the analysis results, and recommendations can be created from the extracted tendencies.

＜質問作成処理の具体例＞ <Specific examples of the question creation process>
次に、図６～図１１を参照しつつ、質問作成処理の具体例について説明する。 Next, specific examples of the question creation process will be described with reference to Figures 6 to 11.
・第１例 - First example
図６は、質問作成処理の第１例における画面の一例を示す図である。図７は、質問作成処理の第１例において、動画要素を取得し、質問候補を作成する際の画面の一例を示す図である。図８は、質問作成処理の第１例において、質問候補が表示された画面の一例を示す図である。 Figure 6 shows an example of a screen in the first example of the question creation process. Figure 7 shows an example of a screen when a video element is acquired and question candidates are created in the first example of the question creation process. Figure 8 shows an example of a screen on which question candidates are displayed in the first example of the question creation process.

図６～図８に示される質問作成処理の第１例は、視聴者が、アイテム領域が設定されているアイテム（すなわち、ジャケット）に関心を持ったとき、当該アイテムのアイテム関連情報（ジャケットに関する情報）を得るための質問が作成される例である。 The first example of the question creation process, shown in Figures 6 to 8, is an example in which, when a viewer becomes interested in an item for which an item area is set (i.e., the jacket), a question is created to obtain the item-related information of that item (information about the jacket).

図６に示されるように、ユーザー端末５のタッチパネル６の右下には、質問ボタン５０が配置されている。これにより、視聴者は、質問候補が作成される機能の存在を認識することができる。ユーザー端末５は、図７に示されるように、予め動画に設定されているアイテム領域がタップ操作されたことを検出すると、質問作成装置１０が有する要素出力部１１、テキスト取得部１２及び質問作成部１３の各部位（各機能）により、質問候補を作成する。 As shown in Figure 6, a question button 50 is located in the lower right corner of the touch panel 6 of the user terminal 5. This allows viewers to recognize the existence of a function that generates question candidates. As shown in Figure 7, when the user terminal 5 detects that an item area pre-set in the video has been tapped, the question creation device 10 generates question candidates using its respective components (functions): the element output unit 11, the text acquisition unit 12, and the question creation unit 13.

本実施形態では、表示された動画に対してアクション（ここでは、タップ操作）を実行するだけで、ジャケットのアイテム関連情報を得るための質問を自動的に作成することができ、動画の視聴の妨げとなることを抑制することができる。なお、上述したように、ユーザー端末５は、予め動画に設定されているアイテム領域がタップ操作されたことを検出すると、そのアイテム領域に対応付けられているアイテム領域情報をストック情報として記憶する。このため、表示された動画に対する１つのアクション（ここでは、タップ操作）により、アイテム関連情報を得るための質問の作成と、ストック情報としての記憶とを、同時に行うこともできる。 In this embodiment, simply performing an action (in this case, a tap) on the displayed video automatically generates questions to obtain item-related information about the jacket, thus preventing interruptions to video viewing. As described above, when the user terminal 5 detects that an item area pre-set in the video has been tapped, it stores the item area information associated with that item area as stock information. Therefore, a single action (in this case, a tap) on the displayed video can simultaneously generate questions to obtain item-related information and store it as stock information.

ここで、図７では、質問候補が作成されたものの、質問候補の具体的内容がタッチパネル６上に表示されていない。但し、タッチパネル６の右下に位置する質問ボタン５０の色が変更されることで、視聴者は、質問候補が作成されたことを認識することができる。 In Figure 7, although question candidates have been created, their specific content is not displayed on the touch panel 6. However, because the color of the question button 50 located in the lower right of the touch panel 6 changes, the viewer can recognize that question candidates have been created.

As shown in Figure 8, when the user terminal 5 detects that the question button 50 has been tapped, the question output unit 14 of the question creation device 10 displays the question candidates 52. Because the viewer can display the question candidates 52 at a timing of the viewer's own choosing (the timing of tapping the question button 50), video viewing is less likely to be disturbed by an unexpected display of the question candidates 52. In Figure 8, the question output unit 14 outputs a plurality of (here, three) question candidates to the touch panel 6 in a manner selectable by the viewer. This allows the viewer to select, from among the plurality of question candidates, the one that matches the viewer's intent.
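The deferred display described for Figures 7 and 8 can be sketched as a small state holder for the question button. This is only an illustration: the class names `ButtonState` and `QuestionButton` and the two-state model are assumptions, and the color change is represented simply as a state value.

```python
from enum import Enum


class ButtonState(Enum):
    IDLE = "idle"    # no question candidates have been created yet
    READY = "ready"  # candidates exist; the button color would change


class QuestionButton:
    """Hypothetical model of the question button 50."""

    def __init__(self):
        self.state = ButtonState.IDLE
        self._candidates = []

    def set_candidates(self, candidates):
        # Store the candidates without displaying them. Only the
        # button state (i.e. its color) changes at this point.
        self._candidates = list(candidates)
        self.state = ButtonState.READY if self._candidates else ButtonState.IDLE

    def on_tap(self):
        # Return the candidates to display only when the viewer asks.
        if self.state is ButtonState.READY:
            return list(self._candidates)
        return []
```

Keeping creation and display separate in this way means nothing is drawn over the video until the viewer taps the button.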

・Second Example
Figure 9 shows an example of a screen in the second example of the question creation process. Figure 10 shows an example of a screen when a video element is acquired in the second example of the question creation process. Figure 11 shows an example of a screen when a text element is acquired and question candidates are created in the second example of the question creation process.

The second example of the question creation process, shown in Figures 9 to 11, is an example in which, when the viewer becomes interested in an item other than an item for which an item area is set (namely, a person), a question for obtaining item-related information on that item (information about the jacket) is created.

As shown in Figure 9, when the user terminal 5 detects that an area other than the item areas set in advance in the video has been tapped, the element output unit 11 of the question creation device 10 outputs, to the analysis device 20, image information of an area of a predetermined size that includes the tapped area. As shown in Figure 10, when the viewer taps the image information of the area of the predetermined size, an item image within that area may be displayed. Even for an area other than the item areas set in advance in the video, when the viewer becomes interested in a person in the video and taps that person on the touch panel 6, an item image of the person is displayed, so the viewer can recognize that item-related information about the person can be obtained.
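One way to obtain "an area of a predetermined size that includes the tapped area" is to center a fixed-size rectangle on the tap point and shift it so that it stays inside the frame. The sketch below assumes this approach; the function name `region_around_tap`, the default size of 200 x 200 pixels, and the pixel coordinate convention are hypothetical and are not taken from the embodiment.

```python
def region_around_tap(tap_x, tap_y, frame_w, frame_h,
                      region_w=200, region_h=200):
    """Return (left, top, right, bottom) of a fixed-size region.

    The region is centered on the tap point where possible and is
    shifted so that it stays inside the frame. The default size is
    an assumed value.
    """
    # The region can never be larger than the frame itself.
    w = min(region_w, frame_w)
    h = min(region_h, frame_h)
    # Center on the tap, then clamp to the frame boundaries.
    left = min(max(tap_x - w // 2, 0), frame_w - w)
    top = min(max(tap_y - h // 2, 0), frame_h - h)
    return left, top, left + w, top + h
```

For a 1280 x 720 frame, a tap near a corner yields a region pushed against that corner rather than one that extends outside the frame.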

The analysis device 20 analyzes the image information and creates a text element. As shown in Figure 10, the text element ("ABC") is displayed on the touch panel 6, so the viewer can recognize that a text element has been created based on the image information of the tapped area. Then, as shown in Figure 11, the question creation unit 13 of the question creation device 10 creates question candidates 52, and the question output unit 14 displays the question candidates 52.
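The flow from image information to question candidates can be sketched as follows. This is a rough illustration only: `analyze_image` is a stand-in that always returns the text element "ABC" from Figure 10 instead of performing real image analysis, and the templates in `QUESTION_TEMPLATES` and the function `create_candidates` are hypothetical.

```python
def analyze_image(image_info: bytes) -> str:
    """Stand-in for the analysis device 20.

    A real analysis device would analyze the image information. This
    stub always returns "ABC", the text element shown in Figure 10.
    """
    return "ABC"


# Hypothetical question templates; the wording is an assumption.
QUESTION_TEMPLATES = (
    "What is {text}?",
    "Tell me more about {text}.",
    "Where can I find information on {text}?",
)


def create_candidates(text_element: str, templates=QUESTION_TEMPLATES) -> list:
    # Insert the text element into each template to form one
    # question candidate per template.
    return [template.format(text=text_element) for template in templates]


text_element = analyze_image(b"")          # text element from the analysis step
candidates = create_candidates(text_element)
```

With three templates, this yields three question candidates, matching the number of candidates shown in the first example.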

・Other Examples
The question creation process is not limited to the first and second examples described above. For example, in the first example described above, the question button 50 for displaying the question candidates 52 need not be provided. In this case, when the viewer becomes interested in an item for which an item area is set (namely, the jacket), the question candidates 52 may be displayed as shown in Figure 11 of the second example, for example when the viewer long-presses the item. Furthermore, in the second example described above, the question button 50 may be provided, and the question candidates 52 may be displayed when the viewer taps the question button 50.

===Other===
The embodiments described above are provided to facilitate understanding of the present invention and are not intended to limit its interpretation. The present invention can be modified and improved without departing from its spirit, and it goes without saying that the present invention includes equivalents thereof.

1 Video distribution server
3 Metadata distribution server
5 User terminal
6 Touch panel
7A Display unit
7B Input unit
8A Control unit
8B Communication unit
9 Communication network
10 Question creation device
11 Element output unit
12 Text acquisition unit
13 Question creation unit
14 Question output unit
16 Recording unit
20 Analysis device
30 Answer creation device
41 Frame
43 Stock information display unit
44 Item image
50 Question button
51 Text element
52 Question candidate
100 Question creation system

Claims

A question creation device comprising:
an element output unit that, when an action is performed on a displayed video, extracts a video element of the video on which the action was performed and outputs the video element to an analysis device;
a text acquisition unit that acquires a text element created by the analysis device based on the video element; and
a question creation unit that creates, from the text element, a question candidate related to the video element.

The question creation device according to claim 1, wherein
the video element includes at least one of image information, audio information, and text information related to the video.

The question creation device according to claim 2, wherein
the video element includes the image information, the image information including type information relating to the video element,
the text acquisition unit acquires the type information associated with the text element, and
the question creation unit creates the question candidate using a question template corresponding to the type information.

The question creation device according to claim 2, wherein
the video element is at least one of the image information, the audio information, and the text information, and does not include type information relating to the video element,
the text acquisition unit acquires the text element associated with the type information, and
the question creation unit creates the question candidate using a question template corresponding to the type information.

The question creation device according to claim 3 or 4, wherein
the text acquisition unit acquires a first text element associated with first type information and a second text element associated with second type information,
the question template includes a first input element related to the first type information and a second input element related to the second type information, and
the question creation unit creates the question candidate in which the first text element is input into the first input element and the second text element is input into the second input element.

The question creation device according to claim 1, further comprising
a question output unit that outputs the question candidate to a display device.

The question creation device according to claim 6, wherein
the question creation unit creates a plurality of question candidates, and
the question output unit outputs the plurality of question candidates to the display device in a manner selectable by a viewer of the video.

The question creation device according to claim 7, wherein
the question output unit adds priority information to the plurality of question candidates and outputs them to the display device.

The question creation device according to any one of claims 6 to 8, wherein
the question output unit outputs, to an answer creation device, the question candidate decided on by the viewer of the video.

The question creation device according to claim 9, wherein
the answer creation device is a large language model.

The question creation device according to claim 1, wherein
the action is performed by any one of a click, a touch, a voice input, and a keyboard input, or by a combination of any of these operations.

A question creation system comprising:
an element output unit that, when an action is performed on a displayed video, extracts a video element of the video on which the action was performed and outputs the video element to an analysis device;
a text acquisition unit that acquires a text element created by the analysis device based on the video element;
a question creation unit that creates, from the text element, a question candidate related to the video element; and
a display unit that displays the question candidate.

A question creation method comprising:
when an action is performed on a displayed video, extracting a video element of the video on which the action was performed and outputting the video element to an analysis device;
acquiring a text element created by the analysis device based on the video element; and
creating, from the text element, a question candidate related to the video element.