JP7687765B2

JP7687765B2 - Caption display method and related device

Info

Publication number: JP7687765B2
Application number: JP2023580652A
Authority: JP
Inventors: ルオ，ションリ
Original assignee: Petal Cloud Technology Co Ltd
Current assignee: Petal Cloud Technology Co Ltd
Priority date: 2021-06-30
Filing date: 2022-05-26
Publication date: 2025-06-03
Anticipated expiration: 2042-05-26
Also published as: CN115550714A; CN119233003A; JP2024526253A; WO2023273729A1; CN115550714B

Description

[Technical field]
The present application relates to the field of terminal technologies, and in particular, to a caption display method and a related device.

With the rapid development of electronic products, electronic devices such as mobile phones, tablet computers, and smart TVs have become ubiquitous in daily life, and video playback has become an important application function of these devices. Application scenarios in which captions related to the played video are displayed in the video playback window are also widespread. For example, captions synchronized with audio, or captions entered by users, are displayed in the video playback window to enhance video interaction.

However, in the foregoing scenarios, where captions are displayed while the video plays, a caption may be hard to recognize if the color and brightness of the video are close to the color of the caption, or if the color and brightness of the caption closely match the color and brightness of the video at the caption's display position. Examples include bright-colored captions displayed over a high-brightness scene and white captions displayed over a snow scene. The user cannot see such captions clearly, which degrades the user experience.

An embodiment of the present application provides a caption display method and a related device, which solve the problem of low caption recognizability while a user watches a video and thereby improve the user experience.

According to a first aspect, an embodiment of the present application provides a caption display method. The method includes the following. An electronic device plays a first video. When the electronic device displays a first interface, the first interface includes a first picture and a first caption. The first caption is displayed in a floating manner over a first region of the first picture, with a first mask as its background. The first region is the region in the first picture that corresponds to the display position of the first caption. The difference value between the color value of the first caption and the color value of the first region is a first value. When the electronic device displays a second interface, the second interface includes a second picture and the first caption. No mask is displayed for the first caption, and the first caption is displayed in a floating manner over a second region of the second picture. The second region is the region in the second picture that corresponds to the display position of the first caption. The difference value between the color value of the first caption and the color value of the second region is a second value, and the second value is greater than the first value. The first picture is one picture in the first video, and the second picture is another picture in the first video.

In this embodiment of the present application, by implementing the foregoing caption display method, the electronic device can set a mask for a caption whose recognizability is low, which raises caption recognizability without changing the caption color.

In a possible implementation, before the electronic device displays the first picture, the method further includes the following. The electronic device obtains a first video file and a first caption file, where the first video file and the first caption file carry the same time information. The electronic device generates a first video frame based on the first video file, where the first video frame is used to generate the first picture. The electronic device generates a first caption frame based on the first caption file and obtains the color value and the display position of the first caption from the first caption frame, where the time information carried in the first caption frame is the same as the time information carried in the first video frame. The electronic device determines the first region based on the display position of the first caption. The electronic device generates the first mask based on the color value of the first caption or the color value of the first region. The electronic device overlays the first caption on the first mask in the first caption frame to generate a second caption frame, and combines the second caption frame with the first video frame. In this way, the electronic device may obtain a video file to be played and a caption file to be displayed, decode the video file to obtain a video frame, and decode the caption file to obtain a caption frame. The electronic device may then extract caption color gamut information, caption position information, and the like from the caption frame; extract, based on the caption position information, the color gamut information at the corresponding caption display position in the video frame; and calculate the caption recognizability from the caption color gamut information and the color gamut information at that display position. The electronic device may further calculate, based on the caption recognizability, the color value of a mask for the caption to generate a caption frame with a mask, combine the video frame with the masked caption frame, and render the combined video frame.

In a possible implementation, before the electronic device generates the first mask based on the color value of the first caption or the color value of the first region, the method further includes: the electronic device determines that the first value is less than a first threshold. In this way, by determining that the first value is less than the first threshold, the electronic device can determine that the caption recognizability is low.
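The steps above (locating the first region from the caption's display position, comparing color values, and testing the result against the first threshold) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the frame layout (a list of rows of RGB tuples), the helper names, the Euclidean RGB distance used as the "difference value", and the threshold of 100 are all hypothetical choices made for illustration.

```python
import math

def region_mean_color(video_frame, box):
    """Average RGB color of the video frame inside box = (x, y, width, height)."""
    x, y, w, h = box
    pixels = [video_frame[row][col]
              for row in range(y, y + h)
              for col in range(x, x + w)]
    return tuple(sum(p[k] for p in pixels) / len(pixels) for k in range(3))

def caption_needs_mask(video_frame, caption_color, caption_box, first_threshold=100.0):
    """Return (needs_mask, difference) for one caption on one video frame."""
    region_color = region_mean_color(video_frame, caption_box)
    # Assumed metric: Euclidean distance in RGB space (the text does not fix one).
    difference = math.dist(caption_color, region_color)
    return difference < first_threshold, difference

# Hypothetical 4x4 frame: three bright "snow" rows and one dark bottom row.
snow = [[(250, 250, 250)] * 4 for _ in range(3)] + [[(0, 0, 0)] * 4]
white = (255, 255, 255)
top = caption_needs_mask(snow, white, (0, 0, 4, 1))      # white caption on snow
bottom = caption_needs_mask(snow, white, (0, 3, 4, 1))   # white caption on black
```

In this example the white caption over the snow rows falls below the assumed threshold and would receive a mask, while the same caption over the dark row would not.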

In a possible implementation, determining, by the electronic device, that the first value is less than the first threshold specifically includes the following. The electronic device divides the first region into N first sub-regions, where N is a positive integer. The electronic device determines, based on the color value of the first caption and the color values of the N first sub-regions, that the first value is less than the first threshold.
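As an illustration of dividing the first region into N first sub-regions, the sketch below splits a region into N vertical strips and averages each strip. The strip-based split, the helper names, and the choice to take the smallest per-strip difference as the first value (so that the least readable part of the caption decides) are assumptions for illustration; the text does not specify how the N color values are aggregated. The sketch also assumes N does not exceed the region width in pixels.

```python
import math

def split_columns(region, n):
    """Split a region (list of rows of RGB tuples) into n vertical strips."""
    width = len(region[0])
    bounds = [i * width // n for i in range(n + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in region] for i in range(n)]

def mean_color(pixels_2d):
    """Average RGB color over all pixels of a strip."""
    pixels = [p for row in pixels_2d for p in row]
    return tuple(sum(p[k] for p in pixels) / len(pixels) for k in range(3))

def first_value(caption_rgb, region, n):
    """Assumed aggregation: the smallest caption-to-strip color difference."""
    strips = split_columns(region, n)
    return min(math.dist(caption_rgb, mean_color(s)) for s in strips)

# Hypothetical one-row region: left half white, right half black.
region = [[(255, 255, 255), (255, 255, 255), (0, 0, 0), (0, 0, 0)]]
value = first_value((255, 255, 255), region, n=2)
```

Averaging the whole region here would give mid-gray and hide the fact that the left half of a white caption is unreadable; evaluating sub-regions separately exposes it.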

In a possible implementation, generating, by the electronic device, the first mask based on the color value of the first caption or the color value of the first region specifically includes the following. The electronic device determines the color value of the first mask based on the color value of the first caption or the color values of the N first sub-regions. The electronic device then generates the first mask for the first caption based on the color value of the first mask.

In a possible implementation, determining, by the electronic device, that the first value is less than the first threshold specifically includes the following. The electronic device divides the first region into N first sub-regions, where N is a positive integer. The electronic device determines, based on the difference value between the color values of adjacent first sub-regions, whether to combine the adjacent first sub-regions into a second sub-region. If the difference value between the color values of adjacent first sub-regions is less than a second threshold, the electronic device combines the adjacent first sub-regions into a second sub-region. The electronic device determines, based on the color value of the first caption and the color value of the second sub-region, that the first value is less than the first threshold. In this way, the electronic device may combine first sub-regions with similar color values into a second sub-region before comparing against the color value of the first caption.

In a possible implementation, the first region includes M second sub-regions, where M is a positive integer less than or equal to N. Each second sub-region includes one or more first sub-regions, and different second sub-regions may include the same or different numbers of first sub-regions. In this way, the electronic device may divide the first region into M second sub-regions.
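The merging of adjacent first sub-regions into M second sub-regions can be sketched as a single left-to-right pass. This is only an illustrative sketch: the function name, the Euclidean RGB distance, comparing each sub-region with its immediate left neighbor, and the second threshold of 30 are assumptions, not details taken from the text.

```python
import math

def merge_adjacent(sub_colors, second_threshold):
    """Group indices of adjacent sub-regions whose colors differ by less
    than second_threshold. Returns a list of index groups (length M <= N)."""
    if not sub_colors:
        return []
    groups = [[0]]
    for i in range(1, len(sub_colors)):
        if math.dist(sub_colors[i - 1], sub_colors[i]) < second_threshold:
            groups[-1].append(i)   # close to its left neighbor: same second sub-region
        else:
            groups.append([i])     # color jump: start a new second sub-region
    return groups

# Hypothetical mean colors of N = 5 first sub-regions, left to right.
colors = [(250, 250, 250), (255, 255, 255), (0, 0, 0), (10, 10, 10), (200, 0, 0)]
groups = merge_adjacent(colors, second_threshold=30.0)
```

Here N = 5 first sub-regions collapse into M = 3 second sub-regions of sizes 2, 2, and 1, which matches the statement that second sub-regions may contain different numbers of first sub-regions.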

In a possible implementation, generating, by the electronic device, the first mask based on the color value of the first caption or the color value of the first region specifically includes the following. The electronic device sequentially calculates the color values of M first sub-masks based on the color value of the first caption or the color values of the M second sub-regions. The electronic device generates the M first sub-masks based on the color values of the M first sub-masks, where the M first sub-masks are combined into the first sub-mask. In this way, the electronic device may generate M first sub-masks for the first caption.

In a possible implementation, the method further includes the following. When the electronic device displays a third interface, the third interface includes a third picture and the first caption. The first caption includes at least a first portion and a second portion. A second sub-mask is displayed for the first portion, and for the second portion either a third sub-mask is displayed or no third sub-mask is displayed. The color value of the second sub-mask is different from the color value of the third sub-mask. In this way, the electronic device may display a caption that corresponds to a plurality of sub-masks.
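One way to picture per-portion sub-masks is sketched below: each second sub-region gets its own sub-mask color, or none if the caption is already readable there. The rule used for the sub-mask color (pushing the sub-region color toward black under a light caption, or toward white under a dark one), the Rec. 601 luma weights used to judge lightness, the `strength` parameter, and the threshold are all assumptions chosen for illustration; the text only states that sub-mask colors are computed from the caption color or the sub-region colors.

```python
import math

def luminance(rgb):
    """Approximate brightness using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def sub_mask_color(caption_rgb, sub_region_rgb, strength=0.7):
    """Assumed rule: darken under a light caption, lighten under a dark one."""
    target = 0 if luminance(caption_rgb) >= 128 else 255
    return tuple(round(c + (target - c) * strength) for c in sub_region_rgb)

def sub_masks(caption_rgb, second_sub_region_colors, first_threshold=100.0):
    """One sub-mask color per second sub-region, or None where no mask is needed."""
    masks = []
    for color in second_sub_region_colors:
        if math.dist(caption_rgb, color) < first_threshold:
            masks.append(sub_mask_color(caption_rgb, color))
        else:
            masks.append(None)
    return masks

# White caption over three hypothetical second sub-regions.
masks = sub_masks((255, 255, 255), [(250, 250, 250), (230, 200, 200), (20, 20, 20)])
```

In this example the first two portions receive sub-masks with different color values and the third portion, over a dark background, receives none.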

In a possible implementation, the display position of the first mask is determined based on the display position of the first caption. In this way, the display position of the first mask may overlap the display position of the first caption.

In a possible implementation, the difference value between the color value of the first mask and the color value of the first caption is greater than the first value. In this way, caption recognizability can be increased.
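The condition above can be checked with a small sketch. The mask-selection rule used here (choose black or white, whichever is farther from the caption color) and the Euclidean RGB distance are assumptions for illustration; the text only requires that the mask differ from the caption by more than the first value.

```python
import math

def pick_mask_color(caption_rgb):
    """Assumed rule: black or white, whichever is farther from the caption color."""
    candidates = [(0, 0, 0), (255, 255, 255)]
    return max(candidates, key=lambda m: math.dist(m, caption_rgb))

caption = (255, 255, 255)   # white caption
region = (245, 245, 245)    # hypothetical near-white first region
first_value = math.dist(caption, region)     # caption vs. video region
mask = pick_mask_color(caption)
mask_value = math.dist(caption, mask)        # caption vs. mask
```

Under this rule the caption-to-mask difference is always at least half the black-to-white distance (about 220 in this metric), so it exceeds the first value whenever the first threshold is set below that.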

In a possible implementation, in the first picture and the second picture, the display position of the first caption relative to the display screen of the electronic device is either fixed or not fixed, and the first caption is a continuously displayed segment of characters or symbols. In this way, the first caption may be a bullet comment (also called a barrage comment) or a caption synchronized with audio, and the first caption is a single caption rather than all of the captions displayed on the display screen.

In a possible implementation, before the electronic device displays the first interface, the method includes: the electronic device sets the transparency of the first mask to less than 100%. This ensures that the part of the video frame under the first mask remains visible to some extent.

In a possible implementation, before the electronic device displays the second interface, the method includes the following. The electronic device generates a second mask based on the color value of the first caption or the color value of the second region and overlays the first caption on the second mask, where the color value of the second mask is a preset color value and the transparency of the second mask is 100%. Alternatively, the electronic device skips generating the second mask. In this way, for a caption with high recognizability, the electronic device may either set a mask with a transparency of 100% for the caption or set no mask at all.
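The effect of mask transparency can be illustrated with ordinary alpha blending. The blending formula below is standard compositing and is an assumption about how transparency would be applied; the function name and the sample values are hypothetical.

```python
def blend_mask(video_rgb, mask_rgb, transparency):
    """Composite a mask pixel over a video pixel.

    transparency is in [0.0, 1.0]; 1.0 (100%) means the mask is invisible.
    """
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0.0 and 1.0")
    opacity = 1.0 - transparency
    return tuple(round(m * opacity + v * transparency)
                 for v, m in zip(video_rgb, mask_rgb))

video_pixel = (200, 200, 200)
black_mask = (0, 0, 0)
half = blend_mask(video_pixel, black_mask, 0.5)    # mask and video both visible
clear = blend_mask(video_pixel, black_mask, 1.0)   # fully transparent mask
```

With a transparency below 100% the video still shows through the mask, and at exactly 100% the output equals the original video pixel, which is why a fully transparent second mask is visually equivalent to generating no mask.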

According to a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are configured to store computer program code, and the computer program code includes computer instructions. When the one or more processors execute the computer instructions, the electronic device is enabled to perform the method in any possible implementation of the first aspect.

According to a third aspect, an embodiment of the present application provides a computer storage medium. The computer storage medium stores a computer program, and the computer program includes program instructions. When the program instructions are executed on an electronic device, the electronic device is enabled to perform the method in any possible implementation of the first aspect.

According to a fourth aspect, an embodiment of the present application provides a computer program product. When the computer program product is executed on a computer, the computer is enabled to perform the method in any possible implementation of the first aspect.

A schematic flowchart of a caption display method according to an embodiment of the present application.
A schematic flowchart of a caption display method according to an embodiment of the present application.
A schematic diagram of a group of user interfaces according to an embodiment of the present application.
A schematic diagram of a group of user interfaces according to an embodiment of the present application.
A schematic diagram of a group of user interfaces according to an embodiment of the present application.
A schematic flowchart of another caption display method according to an embodiment of the present application.
A schematic flowchart of another caption display method according to an embodiment of the present application.
A schematic flowchart of another caption display method according to an embodiment of the present application.
A schematic flowchart of another caption display method according to an embodiment of the present application.
A schematic diagram of a caption frame according to an embodiment of the present application.
A schematic diagram of the principle of generating a mask corresponding to a caption according to an embodiment of the present application.
A schematic diagram of a caption frame with a mask according to an embodiment of the present application.
A schematic diagram of a group of user interfaces for displaying captions according to an embodiment of the present application.
A schematic diagram of a group of user interfaces for displaying captions according to an embodiment of the present application.
A schematic flowchart of a method for generating a mask corresponding to a caption according to an embodiment of the present application.
A schematic diagram of another principle of generating a mask corresponding to a caption according to an embodiment of the present application.
A schematic diagram of another caption frame with a mask according to an embodiment of the present application.
A schematic diagram of a group of user interfaces for displaying captions according to an embodiment of the present application.
A schematic diagram of a group of user interfaces for displaying captions according to an embodiment of the present application.
A schematic diagram of a structure of an electronic device according to an embodiment of the present application.
A schematic diagram of a software structure of an electronic device according to an embodiment of the present application.
A schematic diagram of a structure of another electronic device according to an embodiment of the present application.
A schematic diagram of a structure of another electronic device according to an embodiment of the present application.

The following clearly describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. In the description of the embodiments, unless otherwise specified, "/" indicates an "or" relationship. For example, A/B may represent A or B. In this specification, "and/or" describes only an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent the following three cases: only A exists, both A and B exist, and only B exists. In addition, in the description of the embodiments, "a plurality of" means "two or more".

It should be understood that, in the specification, claims, and accompanying drawings of the present application, terms such as "first" and "second" are intended to distinguish between different objects and do not indicate a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.

An "embodiment" mentioned in the present application means that a particular feature, structure, or characteristic described with reference to the embodiment may be included in at least one embodiment of the present application. The appearance of this expression in various places in the specification does not necessarily refer to the same embodiment, nor does it refer to a separate or alternative embodiment that is mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described in the present application may be combined with other embodiments.

For ease of understanding, the following first describes some related concepts in the embodiments of the present application.

1. Video decoding:

The process in which the binary data of a video file is read and interpreted according to the compression algorithm of the video file to obtain the data of the image frames (also called video frames) used for video playback.

2. Caption:

Text and symbol information that is displayed in the video playback window during video playback and that is independent of the video file.

3. Video playback:

The process of displaying a group of images and the corresponding audio information in a video playback window in time order after operations such as video decoding and video rendering have been performed on a video file.

4. Barrage comment:

A caption that is entered by a user on a video playback client (also called a video application) and that may be displayed in the video playback window of that user, or in the video playback window of another user on the video playback client, based on the position, among the image frames for video playback, that corresponds to the time at which the user entered the caption.

With the rapid development of electronic products, electronic devices such as mobile phones, tablet computers, and smart TVs have become ubiquitous in daily life, and video playback has become an important application function of these devices. Application scenarios in which captions related to the played video are displayed in the video playback window are also widespread. For example, captions synchronized with audio, or captions entered by users (that is, barrage comments), are displayed in the video playback window to enhance video interaction.

In an application scenario in which captions synchronized with audio are displayed in the video playback window, usually in its lower part, the timestamps of the captions are matched against the timestamps of the image frames played in the video, and each caption is composited with the corresponding image frame. Specifically, the caption is overlaid on the corresponding video frame, and the position at which the caption overlaps the video frame is fixed.

In an application scenario in which captions entered by users (that is, barrage comments) are displayed in the video playback window, a plurality of captions fly across the video playback window from left to right or from right to left during playback, and the position at which a caption overlaps the video frame is not fixed.
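Because a barrage comment moves, the region of the video frame under it changes from frame to frame. The sketch below computes the caption box of a right-to-left comment at a given time, assuming constant-speed motion; the function name, the parameters, and the speed of 120 pixels per second are hypothetical.

```python
def barrage_box(start_ms, now_ms, screen_width, text_width, y, height,
                speed_px_per_s=120.0):
    """Caption box (x, y, width, height) of a right-to-left comment at now_ms.

    The comment is assumed to enter at the right edge at start_ms and to
    move left at a constant speed.
    """
    elapsed_s = (now_ms - start_ms) / 1000.0
    x = screen_width - speed_px_per_s * elapsed_s
    return (round(x), y, text_width, height)

# Two seconds after entering a 1080-pixel-wide window.
box = barrage_box(start_ms=0, now_ms=2000, screen_width=1080,
                  text_width=300, y=50, height=40)
```

A box like this would need to be recomputed for every video frame before the caption's color is compared with the video underneath it, whereas for an audio-synchronized caption the box stays the same.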

In some practical application scenarios, to make video playback more enjoyable, a video playback platform usually lets users choose the caption color. In the scenario in which captions synchronized with audio are displayed in the video playback window, the caption color is usually the system default, and the user may select a preferred caption color when playing the video. In this case, the electronic device displays the caption in the video playback window in the color selected by the user. In the scenario in which barrage comments are displayed in the video playback window, the user who sends a barrage comment may select its color, and the color seen by other users matches the color selected by the sender. Therefore, when a user views barrage comments, the comments displayed in the same video frame may have different colors.

To implement the foregoing two application scenarios, an embodiment of the present application provides a caption display method. The electronic device may first obtain a video file to be played and a caption file to be displayed in the video playback window, and then separately perform video decoding on the video file to obtain video frames and caption decoding on the caption file to obtain caption frames. The electronic device may then align and match the video frames and the caption frames in time order, composite them into the final video frames to be displayed, and store those frames in a video frame queue. After that, the electronic device may read and render the to-be-displayed video frames in time order, and finally display the rendered video frames in the video playback window.
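The alignment and queueing steps above can be sketched as follows. This is a simplified sketch: the dictionary-based frame representation and the function name are hypothetical, and it assumes that a caption frame carries exactly the same timestamp as its video frame, as the text describes, rather than covering a display interval.

```python
from collections import deque

def compose_frames(video_frames, caption_frames):
    """Pair each video frame with the caption frame carrying the same timestamp
    and return the composited frames as a queue in time order."""
    captions_by_time = {c["timestamp_ms"]: c for c in caption_frames}
    queue = deque()
    for frame in sorted(video_frames, key=lambda f: f["timestamp_ms"]):
        caption = captions_by_time.get(frame["timestamp_ms"])
        queue.append({
            "timestamp_ms": frame["timestamp_ms"],
            "picture": frame["picture"],
            # None means no caption is overlaid on this frame.
            "caption": caption["text"] if caption else None,
        })
    return queue

# Hypothetical decoded frames, deliberately out of order.
video = [{"timestamp_ms": 80, "picture": "f2"},
         {"timestamp_ms": 0, "picture": "f0"},
         {"timestamp_ms": 40, "picture": "f1"}]
captions = [{"timestamp_ms": 40, "text": "hello"}]
queue = compose_frames(video, captions)
```

A renderer would then take frames from the left end of the queue (for example with `queue.popleft()`) and draw them in time order.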

以下では、前述のキャプション表示方法の方法手順について詳細に説明する。 The following describes in detail the steps of the above-mentioned caption display method.

図１Ａ及び図１Ｂは、本出願の一実施形態によるキャプション表示方法の方法手順の一例を示す。 Figures 1A and 1B show an example of a method procedure for displaying captions according to one embodiment of the present application.

図１Ａ及び図１Ｂに示すように、本方法は、ビデオ再生機能を有する電子デバイス１００に適用され得る。以下では、方法の特定のステップについて詳細に説明する。 As shown in Figures 1A and 1B, the method may be applied to an electronic device 100 having video playback capabilities. Certain steps of the method are described in detail below.

フェーズ１：ビデオ情報ストリーム及びキャプション情報ストリームの取得 Phase 1: Obtaining video and caption information streams

Ｓ１０１及びＳ１０２：電子デバイス１００は、ユーザによるビデオアプリケーション上でビデオを再生する操作を検出し、この操作に応答して、電子デバイス１００は、ビデオ情報ストリーム及びキャプション情報ストリームを取得し得る。 S101 and S102: The electronic device 100 detects a user's operation to play a video on a video application, and in response to this operation, the electronic device 100 may obtain a video information stream and a caption information stream.

具体的には、ビデオアプリケーションは、電子デバイス１００にインストールされ得る。ユーザによるビデオアプリケーション上でビデオを再生する操作を検出した後、この操作に応答して、電子デバイス１００は、ユーザが再生したいビデオに対応するビデオ情報ストリーム（又はビデオファイルと呼ばれる）及びキャプション情報ストリーム（又はキャプションファイルと呼ばれる）を取得し得る。 Specifically, a video application may be installed on electronic device 100. After detecting a user's operation to play a video on the video application, in response to the operation, electronic device 100 may obtain a video information stream (or referred to as a video file) and a caption information stream (or referred to as a caption file) corresponding to the video that the user wants to play.

例えば、図２Ａは、電子デバイス１００にインストールされたアプリケーションを表示するために電子デバイス１００によって提供されるユーザインターフェース（user interface、ＵＩ）を示す。電子デバイス１００は、ユーザインターフェース２１０内の「ビデオ」アプリケーションオプション２１１上でユーザによって実行された操作（例えば、タップ操作）を検出し得る。この操作に応答して、電子デバイス１００は、図２Ｂに示される例示的なユーザインターフェース２２０を表示し得る。ユーザインターフェース２２０は、「ビデオ」アプリケーションのメインインターフェースであり得る。電子デバイス１００は、ユーザインターフェース２２０内のビデオ再生オプション２２１上でユーザによって実行された操作（例えば、タップ操作）を検出し、この操作に応答して、電子デバイス１００は、ビデオ情報ストリームと、ビデオに対応するキャプション情報ストリームとを取得し得る。 For example, FIG. 2A illustrates a user interface (UI) provided by electronic device 100 to display applications installed on electronic device 100. Electronic device 100 may detect an operation (e.g., a tap operation) performed by a user on a "Video" application option 211 in user interface 210. In response to this operation, electronic device 100 may display the exemplary user interface 220 shown in FIG. 2B. User interface 220 may be a main interface of the "Video" application. Electronic device 100 may detect an operation (e.g., a tap operation) performed by a user on a video playback option 221 in user interface 220, and in response to this operation, electronic device 100 may obtain a video information stream and a caption information stream corresponding to the video.

ビデオ情報ストリーム及びキャプション情報ストリームは、電子デバイス１００がビデオアプリケーションのサーバーからダウンロードしたファイルであってもよいし、電子デバイス１００から取得したファイルであってもよい。ビデオファイルとキャプションファイルの両方が時間情報を搬送する。 The video information stream and the caption information stream may be files that the electronic device 100 downloads from a video application server or may be files obtained from the electronic device 100. Both the video file and the caption file carry time information.

図２Ａ及び図２Ｂは、電子デバイス１００上の例示的なユーザインターフェースを示すにすぎず、本出願のこの実施形態に対する限定を構成するものではないことが理解され得る。 It can be appreciated that Figures 2A and 2B are merely illustrative of an exemplary user interface on electronic device 100 and do not constitute a limitation on this embodiment of the present application.

フェーズ２：ビデオの復号 Phase 2: Video decoding

Ｓ１０３：電子デバイス１００上のビデオアプリケーションは、ビデオ情報ストリームを電子デバイス１００上のビデオ復号モジュールに送信する。 S103: A video application on the electronic device 100 sends a video information stream to a video decoding module on the electronic device 100.

具体的には、ビデオ情報ストリームを取得した後、ビデオアプリケーションは、ビデオ情報ストリームをビデオ復号モジュールに送信し得る。 Specifically, after obtaining the video information stream, the video application may send the video information stream to a video decoding module.

Ｓ１０４及びＳ１０５：電子デバイス１００上のビデオ復号モジュールは、ビデオ情報ストリームを復号してビデオフレームを生成し、ビデオフレームを電子デバイス１００上のビデオフレーム合成モジュールに送信する。 S104 and S105: The video decoding module on the electronic device 100 decodes the video information stream to generate video frames and transmits the video frames to a video frame synthesis module on the electronic device 100.

具体的には、ビデオアプリケーションによって送信されたビデオ情報ストリームを受信した後、ビデオ復号モジュールは、ビデオ情報ストリームを復号してビデオフレームを生成し得る。ビデオフレームは、ビデオ再生プロセスにおける全てのビデオフレームであり得る。ビデオフレームは、画像フレームと呼ばれることもあり、各ビデオフレームは、ビデオフレームの時間情報（すなわち、タイムスタンプ）を搬送し得る。次いで、ビデオ復号モジュールは、復号を通じて生成されたビデオフレームをビデオフレーム合成モジュールに送信し、その後、表示されるべきビデオフレームを生成し得る。 Specifically, after receiving the video information stream sent by the video application, the video decoding module may decode the video information stream to generate video frames. The video frames may be all video frames in the video playback process. The video frames may also be referred to as image frames, and each video frame may carry time information (i.e., a timestamp) of the video frame. The video decoding module may then send the video frames generated through decoding to a video frame synthesis module, which may then generate the video frames to be displayed.

ビデオ復号モジュールは、従来技術のビデオ復号方法を使用することによってビデオ情報ストリームを復号し得る。これは、本出願のこの実施形態では限定されない。ビデオ復号方法の具体的な実装については、ビデオ復号に関する技術文書を参照されたい。詳細はここでは説明しない。 The video decoding module may decode the video information stream by using a video decoding method in the prior art, which is not limited in this embodiment of the present application. For a specific implementation of the video decoding method, please refer to technical documents related to video decoding. Details will not be described here.

フェーズ３：キャプションの復号 Phase 3: Decoding captions

Ｓ１０６：電子デバイス１００上のビデオアプリケーションは、キャプション情報ストリームを電子デバイス１００上のキャプション復号モジュールに送信する。 S106: The video application on the electronic device 100 sends the caption information stream to a caption decoding module on the electronic device 100.

具体的には、キャプション情報ストリームを取得した後、ビデオアプリケーションは、キャプション情報ストリームをキャプション復号モジュールに送信し得る。 Specifically, after obtaining the caption information stream, the video application may send the caption information stream to a caption decoding module.

Ｓ１０７及びＳ１０８：電子デバイス１００上のキャプション復号モジュールは、キャプション情報ストリームを復号してキャプションフレームを生成し、キャプションフレームを電子デバイス１００上のビデオフレーム合成モジュールに送信する。 S107 and S108: The caption decoding module on the electronic device 100 decodes the caption information stream to generate caption frames and sends the caption frames to the video frame synthesis module on the electronic device 100.

具体的には、ビデオアプリケーションによって送信されたキャプション情報ストリームを受信した後、キャプション復号モジュールは、キャプション情報ストリームを復号してキャプションフレームを生成し得る。キャプションフレームは、ビデオ再生プロセスにおける全てのキャプションフレームであり得る。各キャプションフレームは、キャプションテキスト、キャプションテキストの表示位置、キャプションテキストのフォント色、キャプションテキストのフォントフォーマットなどを含み得、キャプションフレームの時間情報（すなわち、タイムスタンプ）を更に搬送し得る。次いで、キャプション復号モジュールは、復号を通じて生成されたキャプションフレームをビデオフレーム合成モジュールに送信し、その後、表示されるべきビデオフレームを生成し得る。 Specifically, after receiving the caption information stream sent by the video application, the caption decoding module may decode the caption information stream to generate caption frames. The caption frames may be all caption frames in the video playback process. Each caption frame may include caption text, a display position of the caption text, a font color of the caption text, a font format of the caption text, etc., and may further carry time information (i.e., a timestamp) of the caption frame. The caption decoding module may then send the caption frames generated through decoding to a video frame synthesis module, which may then generate the video frames to be displayed.

キャプション復号モジュールは、従来技術のキャプション復号方法を使用することによってキャプション情報ストリームを復号し得る。これは、本出願のこの実施形態では限定されない。キャプション復号方法の具体的な実装については、キャプション復号に関連する技術文書を参照されたい。詳細はここでは説明しない。 The caption decoding module may decode the caption information stream by using a conventional caption decoding method, which is not limited in this embodiment of the present application. For a specific implementation of the caption decoding method, please refer to technical documents related to caption decoding. Details will not be described here.

フェーズ２におけるビデオ復号のステップが最初に実行され、次いでフェーズ３におけるキャプション復号のステップが実行される例は、単に、本出願のこの実施形態において使用されることに留意されたい。いくつかの実施形態では、フェーズ３におけるキャプション復号のステップが最初に実行されてから、フェーズ２におけるビデオ復号のステップが実行されてもよいし、フェーズ２におけるビデオ復号のステップ及びフェーズ３におけるキャプション復号のステップが同時に実行されてもよい。これは、本出願のこの実施形態では限定されない。 It should be noted that performing the video decoding step in phase 2 first and the caption decoding step in phase 3 second is merely the example used in this embodiment of the present application. In some embodiments, the caption decoding step in phase 3 may be performed before the video decoding step in phase 2, or the two steps may be performed simultaneously. This is not limited in this embodiment of the present application.

フェーズ４：ビデオフレームの合成、レンダリング、及び表示 Phase 4: Compositing, rendering, and displaying video frames

Ｓ１０９及びＳ１１０：電子デバイス１００上のビデオフレーム合成モジュールは、受信したビデオフレームとキャプションフレームとを重ね合わせて結合して、表示されるべきビデオフレームを生成し、表示されるべきビデオフレームを電子デバイス１００上のビデオフレームキューに送信する。 S109 and S110: The video frame synthesis module on the electronic device 100 overlays and combines the received video frame and caption frame to generate the video frame to be displayed, and sends the video frame to be displayed to a video frame queue on the electronic device 100.

具体的には、ビデオフレーム合成モジュールは、ビデオフレームに対応する時間情報とキャプションフレームに対応する時間情報とに基づいてマッチングを実行し、マッチングが完了した後にキャプションフレームを対応するビデオフレームに重ね合わせ、それらを結合して、表示されるべきビデオフレームを生成し得る。その後、ビデオフレーム合成モジュールは、表示されるべきビデオフレームをビデオフレームキューに送信し得る。 Specifically, the video frame synthesis module may perform matching based on time information corresponding to the video frames and time information corresponding to the caption frames, and after the matching is completed, may overlay the caption frames onto the corresponding video frames and combine them to generate the video frames to be displayed. Then, the video frame synthesis module may send the video frames to be displayed to a video frame queue.
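The timestamp matching and overlay performed by the video frame synthesis module can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class names, the millisecond timestamps, the exact-equality matching rule, and the string placeholder for decoded image data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoFrame:
    timestamp_ms: int   # time information carried by the frame (assumed unit: ms)
    pixels: str         # placeholder standing in for decoded image data


@dataclass
class CaptionFrame:
    timestamp_ms: int
    captions: List[str]


@dataclass
class FrameToDisplay:
    timestamp_ms: int
    pixels: str
    captions: List[str] = field(default_factory=list)


def synthesize(video_frames: List[VideoFrame],
               caption_frames: List[CaptionFrame]) -> List[FrameToDisplay]:
    """Pair each video frame with the caption frame that has the same timestamp."""
    by_time = {c.timestamp_ms: c for c in caption_frames}
    result = []
    for v in video_frames:
        match = by_time.get(v.timestamp_ms)
        # A video frame without a matching caption frame is passed through as-is.
        captions = list(match.captions) if match else []
        result.append(FrameToDisplay(v.timestamp_ms, v.pixels, captions))
    return result


frames = synthesize(
    [VideoFrame(0, "frame-0"), VideoFrame(40, "frame-1")],
    [CaptionFrame(40, ["caption synchronized with audio"])],
)
```

Here only the frame at 40 ms receives a caption; a real player would more likely match on a display interval than on exact equality.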

Ｓ１１１からＳ１１３：ビデオレンダリングモジュールは、表示されるべきビデオフレームを時系列に基づいてビデオフレームキューから読み出し、表示されるべきビデオフレームを時系列に基づいてレンダリングして、レンダリングされたビデオフレームを生成し得る。 S111 to S113: The video rendering module may read the video frames to be displayed from the video frame queue in chronological order and render the video frames to be displayed in chronological order to generate rendered video frames.

具体的には、ビデオレンダリングモジュールは、表示されるべきビデオフレームをビデオフレームキューからリアルタイムで（又はある時間期間の間隔で）取得し得る。ビデオフレーム合成モジュールが、表示されるべきビデオフレームをビデオフレームキューに送信した後、ビデオレンダリングモジュールは、表示されるべきビデオフレームを時系列に基づいてビデオフレームキューから読み出し、表示されるべきビデオフレームをレンダリングして、レンダリングされたビデオフレームを生成し得る。次いで、ビデオレンダリングモジュールは、レンダリングされたビデオフレームをビデオアプリケーションに送信し得る。 Specifically, the video rendering module may obtain the video frames to be displayed from the video frame queue in real time (or at intervals of a period of time). After the video frame composition module sends the video frames to be displayed to the video frame queue, the video rendering module may read the video frames to be displayed from the video frame queue based on a time sequence and render the video frames to be displayed to generate rendered video frames. The video rendering module may then send the rendered video frames to the video application.
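One way to picture the video frame queue is as a priority queue keyed on the timestamp, so that frames are always read in chronological order regardless of the order in which they were enqueued. The sketch below uses Python's standard `heapq` module; the `VideoFrameQueue` class, the `render_all` helper, and the string "rendering" are hypothetical stand-ins, since the application does not specify the queue's data structure.

```python
import heapq
from typing import List, Tuple


class VideoFrameQueue:
    """Hypothetical queue that hands out frames in timestamp order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []

    def put(self, timestamp_ms: int, frame: str) -> None:
        heapq.heappush(self._heap, (timestamp_ms, frame))

    def get(self) -> Tuple[int, str]:
        # Pops the entry with the smallest timestamp.
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def render_all(queue: VideoFrameQueue) -> List[str]:
    """Read every queued frame in chronological order and 'render' it."""
    rendered = []
    while len(queue) > 0:
        timestamp_ms, frame = queue.get()
        rendered.append(f"rendered:{frame}@{timestamp_ms}")
    return rendered


q = VideoFrameQueue()
q.put(80, "frame-2")
q.put(0, "frame-0")
q.put(40, "frame-1")
output = render_all(q)
```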

ビデオレンダリングモジュールは、従来技術のビデオレンダリング方法を使用することによって、表示されるべきビデオフレームをレンダリングし得る。これは、本出願のこの実施形態では限定されない。ビデオレンダリング方法の具体的な実装については、ビデオレンダリングに関する技術文書を参照されたい。詳細はここでは説明しない。 The video rendering module may render the video frames to be displayed by using a conventional video rendering method, which is not limited in this embodiment of the present application. For a specific implementation of the video rendering method, please refer to technical documents on video rendering. Details will not be described here.

Ｓ１１４：電子デバイス１００は、レンダリングされたビデオフレームを表示する。 S114: The electronic device 100 displays the rendered video frame.

具体的には、ビデオレンダリングモジュールによって送信されたレンダリングされたビデオフレームを受信した後、電子デバイス１００上のビデオアプリケーションは、レンダリングされたビデオフレームを、電子デバイス１００の表示画面（すなわち、ビデオ再生ウィンドウ）に表示し得る。 Specifically, after receiving the rendered video frames transmitted by the video rendering module, a video application on the electronic device 100 may display the rendered video frames on a display screen (i.e., a video playback window) of the electronic device 100.

例えば、図２Ｃは、電子デバイス１００が図１Ａ及び図１Ｂに示されるキャプション表示方法を実行した後に表示される、レンダリングされたビデオフレーム内のフレームのピクチャであり得る。キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、キャプション「認識度の高いキャプション」、及びキャプション「不明瞭な色のキャプション」は、全て弾幕コメントであり、弾幕コメントの表示位置は、電子デバイス１００の表示画面に対して固定されていない。キャプション「音声と同期されたキャプション」の表示位置は、電子デバイス１００の表示画面に対して固定されている。図２Ｃから容易に分かるように、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」の両側とビデオの色との間の色の差は小さいので、キャプション認識度が低くなり、その結果、ユーザはこのキャプションをはっきりと見ることができない。キャプション「認識度の高いキャプション」及び「音声と同期されたキャプション」とビデオの色との色の差は大きく、キャプションの認識度が高いので、ユーザはこれらのキャプションをはっきりと見ることができる。キャプション「不明瞭な色のキャプション」とビデオの色との間の色の差は小さくないが、ビデオの輝度が高いので、この場合もキャプション認識度が低くなり、ユーザはこのキャプションをはっきりと見ることができない。 For example, FIG. 2C may be a picture of one of the rendered video frames displayed after the electronic device 100 executes the caption display method shown in FIG. 1A and FIG. 1B. The caption "WSYTKLDGSYDZM", the caption "highly recognizable caption", and the caption "unclear color caption" are all barrage comments, and the display positions of the barrage comments are not fixed with respect to the display screen of the electronic device 100. The display position of the caption "caption synchronized with audio" is fixed with respect to the display screen of the electronic device 100. As can be easily seen from FIG. 2C, the color difference between both sides of the caption "WSYTKLDGSYDZM" and the color of the video is small, so the caption recognizability is low, and as a result, the user cannot see the caption clearly. The color difference between the captions "highly recognizable caption" and "caption synchronized with audio" and the color of the video is large, so the caption recognizability is high and the user can see these captions clearly. The color difference between the caption "unclear color caption" and the color of the video is not small, but the brightness of the video is high, so the caption recognizability is again low and the user cannot see this caption clearly.

図２Ｃから分かるように、図１Ａ及び図１Ｂに示されるキャプション表示方法を使用した場合、ビデオを再生しながらキャプションを表示するアプリケーションシナリオにおいて、キャプションの色がキャプション表示位置でのビデオの色及び輝度と大きく重なると、キャプション認識度が低くなり、ユーザがキャプションをはっきりと見ることが困難になるので、ユーザエクスペリエンスが低下する。 As can be seen from FIG. 2C, when using the caption display method shown in FIG. 1A and FIG. 1B, in an application scenario in which captions are displayed while a video is being played, if the color of the caption overlaps significantly with the color and brightness of the video at the caption display position, the caption recognition will be low and it will be difficult for the user to clearly see the caption, resulting in a poor user experience.

前述の問題を解決するために、本出願の一実施形態は、別のキャプション表示方法を提供する。電子デバイスは、最初に、再生されるべきビデオファイルと、ビデオ再生ウィンドウに表示されるべきキャプションファイルとを取得し、次いで、ビデオファイルに対してビデオ復号を実行してビデオフレームを取得し、キャプションファイルに対してキャプション復号を実行してキャプションフレームを取得し得る。次いで、電子デバイスは、キャプションフレームからキャプション色域情報、キャプション位置情報などを抽出し、キャプション位置情報に基づいて、ビデオフレーム内にあり、キャプションに対応するキャプション表示位置における色域情報を抽出し、次いで、キャプション色域情報と、ビデオフレーム内にあり、キャプションに対応するキャプション表示位置における色域情報とに基づいて、キャプション認識度を計算し得る。キャプション認識度が低い場合、電子デバイスは、キャプションに対してマスクを追加し、キャプション認識度に基づいてマスクの色値及び透明度を計算して、マスク付きのキャプションフレームを生成し、時系列に基づいてビデオフレームとマスクを有するキャプションフレームとの間の位置合わせ及びマッチングを実行して、最終的な表示されるべきビデオフレームを合成し得る。電子デバイスは、表示されるべきビデオフレームをビデオキューにバッファし、次いで、表示されるべきビデオフレームを時系列に基づいて読み出してレンダリングし、最後に、レンダリングされたビデオフレームをビデオ再生ウィンドウに表示する。このようにして、ユーザによって選択されたキャプションの色を変更することなく、キャプションマスクの色及び透明度を調整することによって、キャプション認識度が低いという問題を解決することができる。加えて、これにより、キャプションによるビデオコンテンツの遮蔽が低減され、ビデオコンテンツの特定の可視性を確保することができ、それにより、ユーザエクスペリエンスを向上させることができる。 In order to solve the above-mentioned problem, an embodiment of the present application provides another caption display method. The electronic device may first obtain a video file to be played and a caption file to be displayed in a video playback window, then perform video decoding on the video file to obtain a video frame, and perform caption decoding on the caption file to obtain a caption frame. The electronic device may then extract caption color gamut information, caption position information, etc. from the caption frame, and extract color gamut information at a caption display position in a video frame corresponding to the caption based on the caption position information, and then calculate a caption recognizability based on the caption color gamut information and the color gamut information at a caption display position in a video frame corresponding to the caption. 
If the caption recognizability is low, the electronic device may add a mask to the caption, calculate the color value and transparency of the mask based on the caption recognizability to generate a caption frame with the mask, and perform alignment and matching between the video frame and the caption frame with the mask based on a time sequence to synthesize a final video frame to be displayed. The electronic device buffers the video frames to be displayed in a video queue, then reads and renders the video frames to be displayed based on a time sequence, and finally displays the rendered video frames in a video playback window. In this way, the problem of low caption recognizability can be solved by adjusting the color and transparency of the caption mask without changing the caption color selected by the user. In addition, this can reduce the occlusion of the video content by the caption and ensure a certain visibility of the video content, thereby improving the user experience.
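The decision of whether a caption needs a mask can be illustrated with a simple color-difference check. The text above does not give the recognizability formula at this point, so everything in the sketch is an assumption chosen for illustration: the Euclidean distance in RGB space as the difference measure, the mean color of the region behind the caption as the video color, and the threshold of 100.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def mean_color(pixels: List[RGB]) -> RGB:
    """Average color of the video pixels behind the caption."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def color_difference(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB space (an assumed stand-in for recognizability)."""
    return math.dist(a, b)


def needs_mask(caption_color: RGB, region_pixels: List[RGB],
               threshold: float = 100.0) -> bool:
    """Return True when the caption is too close in color to the video behind it."""
    return color_difference(caption_color, mean_color(region_pixels)) < threshold


white = (255.0, 255.0, 255.0)
snow_region = [(250.0, 250.0, 250.0), (240.0, 245.0, 250.0)]
night_region = [(10.0, 10.0, 10.0), (12.0, 8.0, 10.0)]

mask_on_snow = needs_mask(white, snow_region)
mask_on_night = needs_mask(white, night_region)
```

A white caption over a snow scene falls below the threshold and would get a mask, while the same caption over a dark scene would not. Plain RGB distance ignores brightness effects such as the high-luminance case described for FIG. 2C, which is one reason a practical measure would need more than this.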

以下では、本出願の一実施形態において提供される別のキャプション表示方法について説明する。 The following describes another method of displaying captions provided in one embodiment of the present application.

図３Ａ、図３Ｂ、図３Ｃ、及び図３Ｄは、本出願の一実施形態による別のキャプション表示方法の方法手順の一例を示す。 Figures 3A, 3B, 3C, and 3D show an example of a method procedure for another caption display method according to one embodiment of the present application.

図３Ａ、図３Ｂ、図３Ｃ、及び図３Ｄに示すように、本方法は、ビデオ再生機能を有する電子デバイス１００に適用され得る。以下では、方法の特定のステップについて詳細に説明する。 As shown in Figures 3A, 3B, 3C, and 3D, the method may be applied to an electronic device 100 having video playback capabilities. Certain steps of the method are described in detail below.

Ｓ３０１及びＳ３０２：電子デバイス１００は、ユーザによるビデオアプリケーション上でビデオを再生する操作を検出し、この操作に応答して、電子デバイス１００は、ビデオ情報ストリーム及びキャプション情報ストリームを取得し得る。 S301 and S302: The electronic device 100 detects a user's operation to play a video on a video application, and in response to this operation, the electronic device 100 may obtain a video information stream and a caption information stream.

ステップＳ３０１及びＳ３０２の具体的な実行プロセスについては、図１Ａ及び図１Ｂに示される実施形態におけるステップＳ１０１及びＳ１０２の関連する内容を参照されたい。詳細はここでは改めて説明しない。 For the specific execution process of steps S301 and S302, please refer to the relevant contents of steps S101 and S102 in the embodiment shown in Figures 1A and 1B. Details will not be described again here.

フェーズ２：ビデオの復号 Phase 2: Video decoding

Ｓ３０３：電子デバイス１００上のビデオアプリケーションは、ビデオ情報ストリームを電子デバイス１００上のビデオ復号モジュールに送信する。 S303: A video application on the electronic device 100 sends a video information stream to a video decoding module on the electronic device 100.

Ｓ３０４及びＳ３０５：電子デバイス１００上のビデオ復号モジュールは、ビデオ情報ストリームを復号してビデオフレームを生成し、ビデオフレームを電子デバイス１００上のビデオフレーム合成モジュールに送信する。 S304 and S305: The video decoding module on the electronic device 100 decodes the video information stream to generate video frames and transmits the video frames to a video frame synthesis module on the electronic device 100.

ステップＳ３０３からステップＳ３０５の具体的な実行プロセスについては、図１Ａ及び図１Ｂに示される実施形態におけるステップＳ１０３からステップＳ１０５の関連する内容を参照されたい。詳細はここでは改めて説明しない。 For the specific execution process of steps S303 to S305, please refer to the relevant contents of steps S103 to S105 in the embodiment shown in Figures 1A and 1B. Details will not be described again here.

フェーズ３：キャプションの復号 Phase 3: Decoding captions

Ｓ３０６：電子デバイス１００上のビデオアプリケーションは、キャプション情報ストリームを電子デバイス１００上のキャプション復号モジュールに送信する。 S306: The video application on the electronic device 100 sends the caption information stream to a caption decoding module on the electronic device 100.

Ｓ３０７：電子デバイス１００上のキャプション復号モジュールは、キャプション情報ストリームを復号してキャプションフレームを生成する。 S307: The caption decoding module on the electronic device 100 decodes the caption information stream to generate caption frames.

ステップＳ３０６及びＳ３０７の具体的な実行プロセスについては、図１Ａ及び図１Ｂに示される実施形態におけるステップＳ１０６及びＳ１０７の関連する内容を参照されたい。詳細はここでは改めて説明しない。 For the specific execution process of steps S306 and S307, please refer to the relevant contents of steps S106 and S107 in the embodiment shown in Figures 1A and 1B. Details will not be described again here.

図４は、キャプション復号モジュールによってキャプション情報ストリームを復号することによって生成されたキャプションフレームの一例を示す。 Figure 4 shows an example of a caption frame generated by decoding the caption information stream by the caption decoding module.

図４に示すように、矩形の実線ボックスの内側の領域は、キャプションフレーム表示領域（又は、ビデオ再生ウィンドウ領域と呼ばれる）を表し得、ビデオフレーム表示領域と一致し得る。この領域には、例えば、「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、「認識度の高いキャプション」、「不明瞭な色のキャプション」、「音声と同期されたキャプション」など、１つ又は複数のキャプションが表示され得る。「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、「認識度の高いキャプション」などはそれぞれ、１つのキャプションと呼ばれることがあり、この領域に表示される全てのキャプションはキャプショングループと呼ばれることがあり、例えば、「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、「認識度の高いキャプション」、「不明瞭な色のキャプション」、「音声と同期されたキャプション」などのキャプションのグループはキャプショングループと呼ばれることがある。 As shown in FIG. 4, the area inside the rectangular solid-line box may represent the caption frame display area (also called the video playback window area) and may coincide with the video frame display area. One or more captions may be displayed in this area, for example, "WSYTKLDGSYDZM", "highly recognizable caption", "unclear color caption", and "caption synchronized with audio". Each of these may be referred to as one caption, and all the captions displayed in this area may together be referred to as a caption group; for example, the captions "WSYTKLDGSYDZM", "highly recognizable caption", "unclear color caption", and "caption synchronized with audio" form a caption group.

図４に示される各キャプションを囲む矩形の破線ボックスは、各キャプションの位置を識別するために使用される単なる補助要素であり、ビデオ再生プロセスにおいて表示されなくてもよい。 The rectangular dashed line boxes surrounding each caption shown in FIG. 4 are merely auxiliary elements used to identify the location of each caption and may not be displayed in the video playback process.

前述の説明とキャプション及びキャプショングループの説明に基づいて、図２Ｃに示すように、図２Ｃに示されるピクチャには４つのキャプションが表示され、それぞれ「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、「認識度の高いキャプション」、「不明瞭な色のキャプション」、「音声と同期されたキャプション」であることが容易に理解できる。これらの４つのキャプションはキャプショングループを形成する。 Based on the above description of captions and caption groups, it can be easily understood that the picture shown in FIG. 2C displays four captions: "WSYTKLDGSYDZM", "highly recognizable caption", "unclear color caption", and "caption synchronized with audio". These four captions form a caption group.

Ｓ３０８：電子デバイス１００上のキャプション復号モジュールは、キャプションフレームから各キャプションのキャプション位置情報、キャプション色域情報などを抽出して、キャプショングループ情報を生成する。 S308: The caption decoding module on the electronic device 100 extracts caption position information, caption color gamut information, etc. of each caption from the caption frame and generates caption group information.

具体的には、キャプションフレームを生成した後、キャプション復号モジュールは、キャプションフレームから各キャプションのキャプション位置情報、キャプション色域情報などを抽出して、キャプショングループ情報を生成し得る。キャプション位置情報は、キャプションフレーム表示領域における各キャプションの表示位置であり、キャプション色域情報は、各キャプションの色値を含み得る。キャプショングループ情報は、キャプションフレーム内の全てのキャプションのキャプション位置情報及びキャプション色域情報を含み得る。 Specifically, after generating the caption frame, the caption decoding module may extract caption position information, caption color gamut information, etc. of each caption from the caption frame to generate caption group information. The caption position information is the display position of each caption in the caption frame display area, and the caption color gamut information may include the color value of each caption. The caption group information may include the caption position information and caption color gamut information of all captions in the caption frame.
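The caption group information described above can be pictured as one record per caption, holding its position and its color value. The following sketch assumes a decoded caption frame is available as a list of dictionaries with hypothetical keys `text`, `box`, and `font_color`; neither this layout nor the names `CaptionInfo` and `build_caption_group_info` come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[int, int]
RGB = Tuple[int, int, int]


@dataclass(frozen=True)
class CaptionInfo:
    position: Tuple[Point, Point]  # two diagonal vertices of the caption's box
    color: RGB                     # color value of the caption text


def build_caption_group_info(caption_frame: List[Dict]) -> List[CaptionInfo]:
    """Collect position and color information for every caption in the frame."""
    return [
        CaptionInfo(position=tuple(c["box"]), color=tuple(c["font_color"]))
        for c in caption_frame
    ]


caption_frame = [
    {"text": "WSYTKLDGSYDZM", "box": ((10, 300), (210, 330)),
     "font_color": (255, 255, 255)},
    {"text": "caption synchronized with audio", "box": ((100, 20), (500, 50)),
     "font_color": (255, 0, 0)},
]
group_info = build_caption_group_info(caption_frame)
```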

任意選択で、キャプション色域情報は、キャプションの輝度などの情報も含み得る。 Optionally, the caption color gamut information may also include information such as the luminance of the caption.

以下では、キャプション位置情報及びキャプション色域情報を抽出するプロセスについて詳細に説明する。 The process of extracting caption position information and caption color gamut information is described in detail below.

１．キャプション位置情報を抽出するプロセス： 1. The process of extracting caption location information:

キャプションの表示位置領域は、図４に示される、キャプションを正確に覆うことができる矩形の破線ボックスの内部領域、又はキャプションを覆うことができる任意の形状の別の内部領域であり得る。これは、本出願のこの実施形態では限定されない。 The display location area of the caption can be the inner area of the rectangular dashed box shown in FIG. 4 that can exactly cover the caption, or another inner area of any shape that can cover the caption. This is not limited in this embodiment of the present application.

本出願のこの実施形態では、キャプション位置情報を抽出するプロセスを説明するために、矩形の破線ボックスの内部領域がキャプションの表示位置領域である例が使用される。 In this embodiment of the present application, an example is used in which the inner area of a rectangular dashed box is the display position area of the caption to explain the process of extracting caption position information.

図４に示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」のキャプション位置情報の抽出を一例として使用する。キャプション復号モジュールは、最初に、キャプションフレーム表示領域内にＸ－Ｏ－Ｙ平面矩形座標系を確立し、次に、キャプションフレーム表示領域内の点（例えば、矩形の実線ボックスの左下隅の頂点）を基準座標点Ｏとして選択し得る。基準座標点Ｏの座標は、（０，０）に設定され得る。数学的知識から分かるように、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」の外側の矩形の破線ボックスの４つの頂点の座標（ｘ１，ｙ１）、（ｘ２，ｙ２）、（ｘ３，ｙ３）、及び（ｘ４，ｙ４）を計算することができる。この場合、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」の位置情報は、矩形の破線ボックスの４つの頂点の座標を含み得る。代替的に、矩形は規則的な形状であるので、矩形の位置領域は、矩形の破線ボックスの対角線上の２つの頂点の座標を決定するだけで決定することができる。従って、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」の位置情報は、矩形の破線ボックスの対角線上の２つの頂点の座標のみを含み得る。 Take the extraction of the caption position information of the caption "WSYTKLDGSYDZM" shown in FIG. 4 as an example. The caption decoding module may first establish an X-O-Y plane rectangular coordinate system within the caption frame display area, and then select a point within the caption frame display area (e.g., the lower left vertex of the rectangular solid-line box) as the reference coordinate point O. The coordinates of the reference coordinate point O may be set to (0, 0). The coordinates (x1, y1), (x2, y2), (x3, y3), and (x4, y4) of the four vertices of the rectangular dashed-line box surrounding the caption "WSYTKLDGSYDZM" can then be calculated. In this case, the position information of the caption "WSYTKLDGSYDZM" may include the coordinates of the four vertices of the rectangular dashed-line box. Alternatively, since a rectangle is a regular shape, its position region can be determined from the coordinates of two diagonally opposite vertices alone. Thus, the position information of the caption "WSYTKLDGSYDZM" may include only the coordinates of two diagonally opposite vertices of the rectangular dashed-line box.
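The remark that two diagonally opposite vertices are enough can be checked with a short sketch that recovers all four vertices from them. It assumes the dashed-line box is axis-aligned in the X-O-Y coordinate system with the origin at the lower left corner; the function name `box_from_diagonal` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def box_from_diagonal(p: Point, q: Point) -> List[Point]:
    """Return the four vertices of an axis-aligned rectangle, counterclockwise
    from the lower left corner, given any two diagonally opposite vertices."""
    (xa, ya), (xb, yb) = p, q
    left, right = min(xa, xb), max(xa, xb)
    bottom, top = min(ya, yb), max(ya, yb)
    return [(left, bottom), (right, bottom), (right, top), (left, top)]


# Either diagonal of the same rectangle yields the same four vertices.
vertices = box_from_diagonal((10, 20), (110, 50))
same_vertices = box_from_diagonal((110, 20), (10, 50))
```

Storing two vertices instead of four halves the position data per caption without losing information, provided the box is not rotated.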

同様に、図４に示される他のキャプションのキャプション位置情報も、前述のキャプション位置抽出方法を使用することによって抽出され得る。詳細はここでは改めて説明しない。 Similarly, the caption position information of the other captions shown in FIG. 4 can be extracted by using the caption position extraction method described above. Details will not be described again here.

キャプション復号モジュールがキャプションフレーム内の全てのキャプションの位置情報を決定した後、キャプション復号モジュールがキャプション位置情報の抽出を完了したことを示す。 Once the caption decoding module has determined the position information of all captions in the caption frame, the extraction of caption position information is complete.

前述のキャプション位置情報抽出プロセスは、キャプション位置情報を抽出する可能な実装形態にすぎないことに留意されたい。キャプション位置情報を抽出する実装は、代替的に、従来技術における別の実装形態であってもよい。これは、本出願のこの実施形態では限定されない。 Please note that the above-described caption location information extraction process is only a possible implementation of extracting caption location information. The implementation of extracting caption location information may alternatively be another implementation in the prior art. This is not limited in this embodiment of the present application.

２．キャプション色域情報を抽出するプロセス： 2. The process of extracting caption gamut information:

最初に、キャプション色域情報を抽出するプロセスに関連する概念について以下で説明する。 First, we discuss the concepts related to the process of extracting caption gamut information below.

色値： Color value:

色値は、カラーモードにおける色に対応する色値のグループである。ＲＧＢカラーモードが一例として使用される。ＲＧＢカラーモードでは、色は、赤、緑、及び青を混合することによって形成され、各色の色値は、（ｒ，ｇ，ｂ）で表され得、ここで、ｒ、ｇ、及びｂは、それぞれ、赤、緑、及び青という３つの原色の値を表し、値範囲は［０，２５５］である。例えば、赤の色値は、（２５５，０，０）で表され、緑の色値は、（０，２５５，０）で表され得、青の色値は、（０，０，２５５）で表され得、黒の色値は、（０，０，０）で表され得、白の色値は、（２５５，２５５，２５５）で表され得る。 A color value is a group of values that represents a color in a given color mode. The RGB color mode is used as an example. In the RGB color mode, colors are formed by mixing red, green, and blue, and the color value of each color may be represented by (r, g, b), where r, g, and b are the values of the three primary colors red, green, and blue, respectively, each in the range [0, 255]. For example, the color value of red may be represented by (255, 0, 0), green by (0, 255, 0), blue by (0, 0, 255), black by (0, 0, 0), and white by (255, 255, 255).

色域： Color gamut:

色域は、色値のセット、すなわち、特定のカラーモードで生成することができる色のセットである。ＲＧＢカラーモードでは、最大２５６×２５６×２５６＝１６７７７２１６個の異なる色、すなわち、２^２４個の異なる色を生成することができ、色域は［０，２^２４－１］であることが容易に理解できる。２^２４個の異なる色と各色に対応する色値とが色値テーブルを形成し得、各色に対応する色値は、色値テーブル内で見つけられ得る。 A color gamut is a set of color values, i.e., the set of colors that can be produced in a particular color mode. It is easy to see that in the RGB color mode, a maximum of 256×256×256 = 16777216 different colors, i.e., 2^24 different colors, can be produced, and the color gamut is [0, 2^24−1]. The 2^24 different colors and the color value corresponding to each color may form a color value table, in which the color value corresponding to each color can be found.
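The range [0, 2^24−1] follows directly if each (r, g, b) triple is packed into a single 24-bit integer. The sketch below shows one common packing, with red in the most significant byte. The application does not say how its color value table is indexed, so the byte order and the function names are assumptions.

```python
from typing import Tuple


def rgb_to_index(r: int, g: int, b: int) -> int:
    """Pack an (r, g, b) color value into one integer in [0, 2**24 - 1]."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in [0, 255]")
    return (r << 16) | (g << 8) | b


def index_to_rgb(index: int) -> Tuple[int, int, int]:
    """Inverse of rgb_to_index: recover the (r, g, b) color value."""
    if not 0 <= index <= 2**24 - 1:
        raise ValueError("index must be in [0, 2**24 - 1]")
    return ((index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF)


black_index = rgb_to_index(0, 0, 0)          # smallest index
white_index = rgb_to_index(255, 255, 255)    # largest index
red_index = rgb_to_index(255, 0, 0)
```

Black maps to 0 and white to 2^24−1 = 16777215, the two ends of the range given in the text, and every index maps back to exactly one color value.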

キャプション位置情報を抽出した後、キャプション復号モジュールは、キャプションの色値を決定するために、キャプション位置におけるキャプションのフォント色に基づいて、そのフォント色に対応する色値を求めて色値テーブルを探索し得る。 After extracting the caption position information, the caption decoding module may search a color value table for a color value corresponding to the font color based on the font color of the caption at the caption position to determine a color value of the caption.

キャプション復号モジュールがキャプションフレーム内の全てのキャプションの色値を決定した後、キャプション復号モジュールがキャプション色域情報の抽出を完了したことを示す。 Once the caption decoding module has determined the color values of all captions in the caption frame, the extraction of caption color gamut information is complete.

Ｓ３０９：電子デバイス１００上のキャプション復号モジュールは、キャプショングループのマスクパラメータを取得する命令を電子デバイス１００上のビデオフレーム色域解釈モジュールに送信し、この命令は、キャプションフレームの時間情報、キャプショングループ情報などを搬送する。 S309: The caption decoding module on the electronic device 100 sends an instruction to the video frame gamut interpretation module on the electronic device 100 to obtain mask parameters of the caption group, which instruction carries time information of the caption frame, caption group information, etc.

具体的には、キャプショングループ情報を生成した後、キャプション復号モジュールは、キャプショングループのマスクパラメータを取得する命令をビデオフレーム色域解釈モジュールに送信し得る。命令は、キャプショングループに対応するマスクパラメータ（マスクの色値及び透明度を含む）をキャプション復号モジュールに送信するようにビデオフレーム色域解釈モジュールに命令するために使用される。色値及び透明度は、マスクパラメータのグループと呼ばれることがある。命令は、キャプションフレームの時間情報、キャプショングループ情報などを搬送し得る。キャプションフレームの時間情報は、後続のステップにおいてキャプショングループに対応するビデオフレームを取得するために使用され得、キャプショングループ情報は、後続のステップにおいてキャプション認識度を分析するために使用され得る。 Specifically, after generating the caption group information, the caption decoding module may send an instruction to the video frame gamut interpretation module to obtain mask parameters of the caption group. The instruction is used to instruct the video frame gamut interpretation module to send mask parameters (including the color value and transparency of the mask) corresponding to the caption group to the caption decoding module. The color value and transparency may be referred to as a group of mask parameters. The instruction may carry time information of the caption frame, caption group information, etc. The time information of the caption frame may be used to obtain a video frame corresponding to the caption group in a subsequent step, and the caption group information may be used to analyze the caption recognition degree in a subsequent step.

Ｓ３１０：電子デバイス１００上のビデオフレーム色域解釈モジュールは、キャプショングループに対応するビデオフレームを取得する命令を電子デバイス１００上のビデオ復号モジュールに送信し、この命令は、キャプションフレームの時間情報などを搬送する。 S310: The video frame gamut interpretation module on the electronic device 100 sends an instruction to the video decoding module on the electronic device 100 to obtain a video frame corresponding to the caption group, the instruction carrying time information of the caption frame, etc.

具体的には、キャプション復号モジュールによって送信されたキャプショングループのマスクパラメータを取得する命令を受信した後に、ビデオフレーム色域解釈モジュールは、キャプショングループに対応するビデオフレームを取得する命令をビデオ復号モジュールに送信し得、命令は、キャプショングループに対応するビデオフレームをビデオフレーム色域解釈モジュールに送信するようにビデオ復号モジュールに命令するために使用される。命令は、キャプションフレームの時間情報を搬送し得、キャプションフレームの時間情報は、キャプショングループに対応するビデオフレームを見つけるためにビデオ復号モジュールによって使用され得る。 Specifically, after receiving the instruction to obtain the mask parameters of the caption group sent by the caption decoding module, the video frame gamut interpretation module may send an instruction to obtain the video frame corresponding to the caption group to the video decoding module, and the instruction is used to instruct the video decoding module to send the video frame corresponding to the caption group to the video frame gamut interpretation module. The instruction may carry time information of the caption frame, and the time information of the caption frame may be used by the video decoding module to find the video frame corresponding to the caption group.

Ｓ３１１及びＳ３１２：電子デバイス１００上のビデオ復号モジュールは、キャプショングループに対応するビデオフレームを探索し、キャプショングループに対応するビデオフレームを電子デバイス１００上のビデオフレーム色域解釈モジュールに送信する。 S311 and S312: The video decoding module on the electronic device 100 searches for a video frame corresponding to the caption group and sends the video frame corresponding to the caption group to the video frame gamut interpretation module on the electronic device 100.

具体的には、ビデオ復号モジュールが、ビデオフレーム色域解釈モジュールによって送信された、キャプショングループに対応するビデオフレームを取得する命令を受信した後、ビデオ復号モジュールは、命令で搬送されたキャプションフレームの時間情報に基づいて、キャプショングループに対応するビデオフレームを見つけ得る。ビデオ復号モジュールは、ビデオ復号フェーズにおける復号を通じて全てのビデオフレームの時間情報を取得しているので、ビデオ復号モジュールは、全てのビデオフレームの時間情報をキャプションフレームの時間情報とマッチングさせることができる。マッチングが成功した場合（すなわち、ビデオフレームの時間情報がキャプションフレームの時間情報と一致した場合）、ビデオフレームは、キャプショングループに対応するビデオフレームである。その後、ビデオ復号モジュールは、キャプショングループに対応するビデオフレームをビデオフレーム色域解釈モジュールに送信し得る。 Specifically, after the video decoding module receives an instruction to obtain a video frame corresponding to a caption group sent by the video frame gamut interpretation module, the video decoding module may find a video frame corresponding to the caption group based on the time information of the caption frame carried in the instruction. Since the video decoding module has obtained the time information of all video frames through decoding in the video decoding phase, the video decoding module can match the time information of all video frames with the time information of the caption frame. If the matching is successful (i.e., if the time information of the video frame matches the time information of the caption frame), the video frame is a video frame corresponding to the caption group. Then, the video decoding module may send the video frame corresponding to the caption group to the video frame gamut interpretation module.
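The matching step described above can be illustrated with a minimal sketch. It assumes that each decoded video frame carries a presentation timestamp in milliseconds and that a match means exact equality of the time information; the names `VideoFrame`, `pts_ms`, `frame_id`, and `find_frame_for_caption` are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VideoFrame:
    pts_ms: int    # time information of the decoded frame (assumed unit: milliseconds)
    frame_id: int  # hypothetical identifier standing in for the pixel data


def find_frame_for_caption(frames: Sequence[VideoFrame],
                           caption_pts_ms: int) -> Optional[VideoFrame]:
    """Return the frame whose time information equals that of the caption frame."""
    for frame in frames:
        if frame.pts_ms == caption_pts_ms:
            return frame
    return None  # no decoded frame matches the caption frame


frames = [VideoFrame(0, 0), VideoFrame(40, 1), VideoFrame(80, 2)]
match = find_frame_for_caption(frames, 40)
```

A real decoder might need a tolerance window rather than strict equality; that choice is outside what the description specifies.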

Ｓ３１３：電子デバイス１００上のビデオフレーム色域解釈モジュールは、キャプショングループ情報内のキャプション位置情報に基づいて、キャプショングループに対応するビデオフレーム内の各キャプションの位置における色域情報を取得する。 S313: The video frame gamut interpretation module on the electronic device 100 obtains gamut information at the position of each caption in the video frame corresponding to the caption group based on the caption position information in the caption group information.

具体的には、キャプショングループに対応するビデオフレームを取得した後、ビデオフレーム色域解釈モジュールは、キャプショングループ情報内の各キャプション位置情報に基づいて、各キャプションの位置に対応するビデオフレーム領域を決定し得る。更に、ビデオフレーム色域解釈モジュールは、各キャプションの位置に対応するビデオフレーム領域の色域情報を算出し得る。 Specifically, after obtaining a video frame corresponding to a caption group, the video frame gamut interpretation module may determine a video frame region corresponding to the position of each caption based on each caption position information in the caption group information. Furthermore, the video frame gamut interpretation module may calculate gamut information of the video frame region corresponding to the position of each caption.

以下では、ビデオフレーム色域解釈モジュールが、各キャプションの位置に対応するビデオフレーム領域の色域情報を計算するプロセスについて詳細に説明する。 The following describes in detail the process by which the video frame gamut interpretation module calculates the gamut information of the video frame region corresponding to each caption position.

Assume that the caption "WSYTKLDGSYDZM" in the picture shown in FIG. 2C is caption 1. The following uses, as an example, the process in which the video frame gamut interpretation module calculates the gamut information of the video frame region corresponding to caption 1.

図５に示すように、キャプション１の位置に対応するビデオフレーム領域は、図５の上部の矩形の実線ボックスの内部領域であり得る。異なる色域のピクセル領域が１つのビデオフレーム領域内に存在し得るので、１つのビデオフレーム領域は、複数のサブ領域に分割され得、各サブ領域は、ビデオフレーム色域抽出単位と呼ばれることがある。サブ領域の分割は、予め設定された幅に基づいて実行されてもよいし、キャプション内の各文字の幅に基づいて分割されてもよい。例えば、キャプション１は、全部で１３文字を有する。この場合、図５では、キャプション１の位置に対応するビデオフレーム領域は、キャプション１内の各文字の幅に基づいて、１３個のサブ領域、すなわち、１３個のビデオフレーム色域抽出単位に分割される。 As shown in FIG. 5, the video frame region corresponding to the position of caption 1 may be the inner region of the rectangular solid box at the top of FIG. 5. Because pixel regions of different color gamuts may exist within one video frame region, one video frame region may be divided into multiple sub-regions, and each sub-region may be referred to as a video frame gamut extraction unit. The division of the sub-regions may be performed based on a preset width, or may be divided based on the width of each character in the caption. For example, caption 1 has a total of 13 characters. In this case, in FIG. 5, the video frame region corresponding to the position of caption 1 is divided into 13 sub-regions, i.e., 13 video frame gamut extraction units, based on the width of each character in caption 1.
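Both division rules mentioned above (per-character width and preset width) can be sketched as follows. This is an illustration only: the `(left, top, width, height)` box layout, the function names, and the equal 20-pixel character widths are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels, assumed layout


def split_by_char_widths(region: Box, char_widths: List[int]) -> List[Box]:
    """Split a caption region into one gamut extraction unit per character."""
    left, top, _, height = region
    units = []
    x = left
    for width in char_widths:
        units.append((x, top, width, height))
        x += width
    return units


def split_by_preset_width(region: Box, unit_width: int) -> List[Box]:
    """Split a caption region into units of a preset width; the last unit may be narrower."""
    left, top, width, height = region
    units = []
    for offset in range(0, width, unit_width):
        units.append((left + offset, top, min(unit_width, width - offset), height))
    return units


# Caption 1 has 13 characters; the region and character widths are made-up values.
caption_region = (100, 50, 260, 24)
units = split_by_char_widths(caption_region, [20] * 13)
```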

Furthermore, the video frame gamut interpretation module may calculate the gamut information of all sub-regions sequentially from left to right (or from right to left). Taking one sub-region of the video frame region as an example, the module may obtain the color values of all pixels in the sub-region, sum them, and divide the sum by the number of pixels to obtain the average color value of the sub-region. This average value is the color value of the sub-region, and the color value of the sub-region is its gamut information.

For example, assuming that a sub-region is $m$ pixels wide and $n$ pixels high, the sub-region has $m \times n$ pixels in total, and the color value $x$ of each pixel may be expressed as $(r, g, b)$. In this case, the average value $\bar{x}_i$ of the color values of all pixels in the $i$-th sub-region is as follows:

$$\bar{x}_i = (r_i,\ g_i,\ b_i) = \left( \frac{1}{mn}\sum_{j=1}^{mn} r_{i,j},\ \ \frac{1}{mn}\sum_{j=1}^{mn} g_{i,j},\ \ \frac{1}{mn}\sum_{j=1}^{mn} b_{i,j} \right)$$

$r_i$ is the average red value of all pixels in the sub-region, $g_i$ is the average green value of all pixels in the sub-region, and $b_i$ is the average blue value of all pixels in the sub-region. $r_{i,j}$ is the red value of the $j$-th pixel, $g_{i,j}$ is the green value of the $j$-th pixel, and $b_{i,j}$ is the blue value of the $j$-th pixel.
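The per-channel averaging described above can be sketched in a few lines. The function name `average_color` and the sample pixel values are hypothetical; pixels are assumed to be plain `(r, g, b)` tuples.

```python
from typing import List, Tuple


def average_color(pixels: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Sum each channel over all pixels of a sub-region and divide by the pixel count."""
    count = len(pixels)
    if count == 0:
        raise ValueError("sub-region has no pixels")
    total_r = sum(p[0] for p in pixels)
    total_g = sum(p[1] for p in pixels)
    total_b = sum(p[2] for p in pixels)
    return (total_r / count, total_g / count, total_b / count)


# Four made-up pixels of a bright sub-region (m = 2, n = 2).
sub_region = [(255, 255, 255), (245, 250, 255), (250, 250, 250), (250, 245, 240)]
avg = average_color(sub_region)  # each channel sums to 1000, so the average is 250.0
```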

同様に、ビデオフレーム色域解釈モジュールは、各キャプションの位置に対応するビデオフレーム領域の全てのサブ領域の色域情報、すなわち、キャプショングループに対応するビデオフレーム内のキャプション位置の色域情報を計算し得る。 Similarly, the video frame gamut interpretation module may calculate gamut information for all sub-regions of the video frame region corresponding to each caption position, i.e., gamut information for the caption positions in the video frame corresponding to the caption group.

キャプションに対応するビデオフレーム領域が分割される複数のサブ領域の数は、予め設定された分割規則に従って決定されてもよいことを理解されたい。これは、本出願のこの実施形態では限定されない。 It should be understood that the number of sub-regions into which the video frame region corresponding to the caption is divided may be determined according to a preset division rule, which is not limited in this embodiment of the present application.

任意選択で、ビデオフレーム領域の色域情報は、ビデオフレーム領域の輝度などの情報も含み得る。 Optionally, the color gamut information for a video frame region may also include information such as the luminance of the video frame region.

各キャプションの位置に対応するビデオフレーム領域の色域情報を計算する前述のプロセスは、可能な実装形態にすぎず、別の実装形態が使用されてもよいことに留意されたい。これは、本出願のこの実施形態では限定されない。 It should be noted that the above process of calculating color gamut information for the video frame region corresponding to each caption position is only a possible implementation and other implementations may be used. This is not a limitation in this embodiment of the present application.

Ｓ３１４：電子デバイス１００上のビデオフレーム色域解釈モジュールは、キャプショングループ情報内の各キャプション色域情報と、キャプショングループに対応するビデオフレーム内の各キャプション位置における色域情報とに基づいて、重畳キャプション認識度分析結果（superimposed caption recognition analysis result）を生成する。 S314: The video frame gamut interpretation module on the electronic device 100 generates a superimposed caption recognition analysis result based on each caption gamut information in the caption group information and the gamut information at each caption position in the video frame corresponding to the caption group.

具体的には、キャプショングループに対応するビデオフレーム内のキャプション位置における色域情報を計算した後、ビデオフレーム色域解釈モジュールは、キャプショングループ情報内のキャプション色域情報と、キャプショングループに対応するビデオフレーム内のキャプション位置における色域情報とに基づいて、重畳キャプション認識度分析を実行し得る。更に、重畳キャプション認識度分析結果は、重畳キャプション認識度分析を通じて生成され得、その結果は、キャプショングループ内の各キャプションの認識度の大きさ（識別の大きさとも呼ばれる）を示すために使用される。 Specifically, after calculating the color gamut information at the caption position in the video frame corresponding to the caption group, the video frame color gamut interpretation module may perform an overlay caption recognizability analysis based on the caption color gamut information in the caption group information and the color gamut information at the caption position in the video frame corresponding to the caption group. Furthermore, an overlay caption recognizability analysis result may be generated through the overlay caption recognizability analysis, and the result is used to indicate the magnitude of recognizability (also referred to as the discrimination magnitude) of each caption in the caption group.

言い換えると、ビデオフレーム色域解釈モジュールは、キャプショングループがキャプショングループに対応するビデオフレーム内のキャプション位置に重ね合わされた後、キャプションの色とキャプションに対応するビデオフレーム領域の色との間の差を決定し得る。差が小さい場合、キャプションの認識度が低く、ユーザに容易に認識されないことを示す。 In other words, the video frame color gamut interpretation module may determine the difference between the color of the caption and the color of the video frame region corresponding to the caption after the caption group is superimposed on the caption position in the video frame corresponding to the caption group. If the difference is small, it indicates that the caption has low recognizability and is not easily recognized by the user.

以下では、ビデオフレーム色域解釈モジュールが重畳キャプション認識度分析を実行するプロセスについて詳細に説明する。 The following describes in detail the process by which the video frame gamut interpretation module performs overlay caption recognition analysis.

ビデオフレーム色域解釈モジュールは、キャプションの色とキャプションに対応するビデオフレーム領域の色との間の色差値を決定し得、色差値は、キャプションの色とキャプションに対応するビデオフレーム領域の色との間の差を示すために使用される。色差値は、従来技術における関連アルゴリズムを使用することによって決定され得る。 The video frame color gamut interpretation module may determine a color difference value between the color of the caption and the color of the video frame region corresponding to the caption, the color difference value being used to indicate the difference between the color of the caption and the color of the video frame region corresponding to the caption. The color difference value may be determined by using a related algorithm in the prior art.

In a possible implementation, the color difference value Diff may be calculated using the following formula:

Here, $k$ is the number of sub-regions of the video frame region corresponding to the caption; $r_i$, $g_i$, and $b_i$ are the average red, green, and blue values of all pixels in the $i$-th sub-region; and $r_0$, $g_0$, and $b_0$ are the red, green, and blue values of the caption.
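The formula itself is not reproduced in this text, so its exact form cannot be confirmed. The following sketch shows one plausible form that uses the same quantities: the Euclidean RGB distance between the caption color and each sub-region color, averaged over the k sub-regions. This choice, and the name `color_difference`, are assumptions and may differ from the formula of the embodiment.

```python
import math
from typing import List, Tuple

Color = Tuple[float, float, float]


def color_difference(caption: Color, sub_region_colors: List[Color]) -> float:
    """Mean Euclidean RGB distance between the caption color (r0, g0, b0) and
    the average color (r_i, g_i, b_i) of each sub-region. Assumed form only."""
    if not sub_region_colors:
        raise ValueError("at least one sub-region is required")
    r0, g0, b0 = caption
    total = 0.0
    for r_i, g_i, b_i in sub_region_colors:
        total += math.sqrt((r_i - r0) ** 2 + (g_i - g0) ** 2 + (b_i - b0) ** 2)
    return total / len(sub_region_colors)


white_caption = (255.0, 255.0, 255.0)
snow = [(255.0, 255.0, 255.0), (252.0, 251.0, 255.0)]  # made-up sub-region colors
diff_on_snow = color_difference(white_caption, snow)
```

Under this assumed form, a white caption over a snowy region yields a very small value, which is the low-recognizability case the description is concerned with.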

更に、計算を通じて色差値を取得した後、ビデオフレーム色域解釈モジュールは、色差値が、予め設定された色差閾値未満であるかどうかを決定することによって、キャプション認識度の大きさを決定し得る。 Furthermore, after obtaining the color difference value through calculation, the video frame color gamut interpretation module may determine the magnitude of the caption recognizability by determining whether the color difference value is less than a pre-set color difference threshold.

色差値が予め設定された色差閾値（第１の閾値とも呼ばれることもある）未満である場合、それは、キャプション認識度が低いことを示す。 If the color difference value is less than a pre-set color difference threshold (sometimes called a first threshold), it indicates low caption recognition.

いくつかの実施形態では、キャプション認識度は、キャプションに対応するビデオフレーム領域の輝度を参照して更に決定され得る。 In some embodiments, the degree of caption recognition may be further determined with reference to the luminance of the area of the video frame that corresponds to the caption.

例えば、図２Ｃに示されるキャプション「不明瞭な色のキャプション」のキャプションの色とキャプションに対応するビデオフレーム領域の色との間の色差値はそれ程小さくないが、キャプションに対応するビデオフレーム領域の輝度が高すぎるので、キャプション認識度が低いという問題が依然として存在する。従って、この場合、キャプション認識度は、キャプションに対応するビデオフレーム領域の輝度を参照して更に決定され得る。キャプションに対応するビデオフレーム領域の輝度が、予め設定された輝度閾値よりも高い場合、キャプション認識度が低いことを示す。 For example, although the color difference value between the caption color of the caption "Caption with unclear color" shown in FIG. 2C and the color of the video frame area corresponding to the caption is not very small, the problem of low caption recognizability still exists because the luminance of the video frame area corresponding to the caption is too high. Therefore, in this case, the caption recognizability can be further determined with reference to the luminance of the video frame area corresponding to the caption. If the luminance of the video frame area corresponding to the caption is higher than the preset luminance threshold, it indicates low caption recognizability.
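The two checks above (color difference below the first threshold, or region luminance above the luminance threshold) can be combined as in the following sketch. The threshold values and the name `is_low_recognizability` are hypothetical; the description only states that both thresholds are preset.

```python
def is_low_recognizability(diff: float,
                           region_luminance: float,
                           diff_threshold: float = 60.0,
                           luminance_threshold: float = 230.0) -> bool:
    """Return True if the caption is considered hard to recognize.

    Both default thresholds are placeholder values chosen for illustration.
    """
    if diff < diff_threshold:
        return True  # caption color is too close to the video color
    # The colors differ enough, but a very bright region can still wash out the caption.
    return region_luminance > luminance_threshold
```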

純色キャプションの場合、抽出されたキャプション色域情報は、１つのパラメータ、すなわち、キャプションに対応する１つの色値のみを含み得る。非純色キャプションの場合、抽出されたキャプション色域情報は、複数のパラメータを含み得る。例えば、グラデーションカラーキャプションの場合、抽出されたキャプション色域情報は、開始点色値、終了点色値、グラデーション方向のような複数のパラメータを含み得る。この場合、可能な実装形態では、キャプションの開始点色値及び終了点色値の平均値が最初に計算され得、次いで、平均値をキャプションに対応する色値として使用して重畳キャプション認識度分析を実行する。 For a solid color caption, the extracted caption color gamut information may include only one parameter, i.e., one color value corresponding to the caption. For a non-solid color caption, the extracted caption color gamut information may include multiple parameters. For example, for a gradient color caption, the extracted caption color gamut information may include multiple parameters such as a start point color value, an end point color value, and a gradient direction. In this case, in a possible implementation, an average value of the start point color value and the end point color value of the caption may be calculated first, and then the average value is used as the color value corresponding to the caption to perform the overlay caption recognition analysis.
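For the gradient case, the averaging of the start point and end point color values might look like the following sketch. The function name and the sample colors are hypothetical.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def gradient_caption_color(start: Color, end: Color) -> Color:
    """Average the start point and end point colors of a gradient caption per channel."""
    return (
        (start[0] + end[0]) / 2,
        (start[1] + end[1]) / 2,
        (start[2] + end[2]) / 2,
    )


# A made-up gradient from pure red to pure blue.
representative = gradient_caption_color((255, 0, 0), (0, 0, 255))
```

The resulting single color can then be used in place of the caption color in the recognizability analysis.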

ビデオフレーム色域解釈モジュールが重畳キャプション認識度分析を実行する前述のプロセスは、可能な実装形態にすぎず、別の実装形態が使用されてもよいことに留意されたい。これは、本出願のこの実施形態では限定されない。 It should be noted that the above process in which the video frame gamut interpretation module performs the overlay caption recognition analysis is only a possible implementation and other implementations may be used. This is not a limitation in this embodiment of the present application.

Ｓ３１５：電子デバイス１００上のビデオフレーム色域解釈モジュールは、重畳キャプション認識度分析結果に基づいて、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度を計算する。 S315: The video frame color gamut interpretation module on the electronic device 100 calculates the color value and transparency of a mask corresponding to each caption in the caption group based on the overlay caption recognition analysis result.

具体的には、重畳キャプション認識度分析結果を生成した後、ビデオフレーム色域解釈モジュールは、その結果に基づいて、キャプションフレーム内の各キャプションに対応するマスクの色値及び透明度を計算し得る。 Specifically, after generating the overlay caption recognition analysis result, the video frame color gamut interpretation module may calculate the color value and transparency of a mask corresponding to each caption in the caption frame based on the result.

認識度の高いキャプション（例えば、図２Ｃのキャプション「認識度の高いキャプション」又はキャプション「音声と同期されたキャプション」）の場合、そのキャプションに対応するマスクの色値は、予め設定された固定値であり得、透明度は１００％に設定され得る。 For a highly recognizable caption (e.g., the caption "Highly Recognizable Caption" or the caption "Caption Synchronized with Audio" in FIG. 2C), the color value of the mask corresponding to that caption may be a preset fixed value and the transparency may be set to 100%.

For a caption with low recognizability (e.g., the caption "WSYTKLDGSYDZM" or the caption "Caption with unclear color" in FIG. 2C), the color value and transparency of the mask corresponding to the caption need to be further determined based on the color gamut information of the caption or the color gamut information of the video frame region corresponding to the position of the caption.

キャプションに対応するマスクの色値及び透明度を決定する具体的な方法は多く存在し得る。これは、本出願のこの実施形態では限定されず、当業者は、要件に従って方式を選択してもよい。 There may be many specific methods for determining the color value and transparency of the mask corresponding to the caption. This is not limited in this embodiment of the present application, and those skilled in the art may select the method according to the requirements.

可能な実装形態では、キャプションの色値又はキャプションに対応するビデオフレーム領域の色値から最大の色差値を有する色に対応する色値が、キャプションに対応するマスクの色値として決定され得る。このようにして、ユーザはキャプションをよりはっきりと見ることができる。代替的に、キャプションの色値又はキャプションに対応するビデオフレーム領域の色値から中間の色差値を有する色に対応する色値が、キャプションに対応するマスクの色値として決定されてもよく、これにより、大きな色差によって引き起こされる目の不快感などを回避しながら、ユーザがキャプションをはっきりと見ることができることを確実にする。 In a possible implementation, a color value corresponding to a color having the largest color difference value from the color value of the caption or the color value of the video frame region corresponding to the caption may be determined as the color value of the mask corresponding to the caption. In this way, the user can see the caption more clearly. Alternatively, a color value corresponding to a color having an intermediate color difference value from the color value of the caption or the color value of the video frame region corresponding to the caption may be determined as the color value of the mask corresponding to the caption, thereby ensuring that the user can see the caption clearly while avoiding eye discomfort caused by large color differences, etc.

For example, the electronic device 100 may calculate a color difference value Diff between the color value corresponding to each color in a color value table and the color value of the caption, and then select the color value corresponding to the color with the maximum or intermediate color difference value Diff as the color value of the mask. In a possible implementation, the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the caption may be calculated by using the following formula:

Assume that the color value corresponding to a particular color in the color value table is $(R_0, G_0, B_0)$, where $R_0$, $G_0$, and $B_0$ are the red, green, and blue values corresponding to that color. $r_0$, $g_0$, and $b_0$ are the red, green, and blue values of the caption.

As another example, the electronic device 100 may calculate a color difference value Diff between the color value corresponding to each color in the color value table and the color value of the video frame region corresponding to the caption, and then select the color value corresponding to the color with the maximum or intermediate color difference value Diff as the color value of the mask. In a possible implementation, the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the video frame region corresponding to the caption may be calculated by using the following formula:

Assume that the color value corresponding to a particular color in the color value table is $(R_0, G_0, B_0)$, where $R_0$, $G_0$, and $B_0$ are the red, green, and blue values corresponding to that color. $k$ is the number of sub-regions of the video frame region corresponding to the caption, and $r_i$, $g_i$, and $b_i$ are the average red, green, and blue values of all pixels in the $i$-th sub-region.
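The selection of a mask color from a color value table can be sketched as follows. The table contents, the names `COLOR_TABLE` and `pick_mask_color`, and the use of Euclidean RGB distance as a stand-in for Diff are all assumptions made for illustration. The reference color may be either the caption color or the color of the video frame region.

```python
import math
from typing import Dict, Tuple

Color = Tuple[int, int, int]

# Hypothetical color value table; a real table would be defined by the implementer.
COLOR_TABLE: Dict[str, Color] = {
    "black": (0, 0, 0),
    "gray": (128, 128, 128),
    "white": (255, 255, 255),
    "navy": (0, 0, 128),
    "yellow": (255, 255, 0),
}


def pick_mask_color(reference: Color, table: Dict[str, Color],
                    strategy: str = "max") -> Tuple[str, Color]:
    """Rank table colors by distance from the reference color and pick one.

    "max" picks the farthest color; "middle" picks the median-ranked color.
    math.dist requires Python 3.8 or later.
    """
    ranked = sorted(table.items(), key=lambda item: math.dist(item[1], reference))
    if strategy == "max":
        return ranked[-1]
    if strategy == "middle":
        return ranked[len(ranked) // 2]
    raise ValueError(f"unknown strategy: {strategy}")


white = (255, 255, 255)
strongest = pick_mask_color(white, COLOR_TABLE, "max")
moderate = pick_mask_color(white, COLOR_TABLE, "middle")
```

The "middle" strategy corresponds to the alternative that trades some contrast for viewing comfort.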

可能な実装形態では、キャプションに対応するマスクの透明度が、キャプションに対応するマスクの色値に基づいて更に決定され得る。例えば、キャプションに対応するマスクの色値がキャプションの色値と大きく異なる場合、キャプションに対応するマスクの透明度に対して大きい値（例えば、５０％より大きい値）が選択され得、キャプション重畳領域によるビデオピクチャの遮蔽を低減しながら、ユーザがキャプションをはっきりと見ることを確実にする。 In a possible implementation, the transparency of the mask corresponding to the caption may be further determined based on the color value of the mask corresponding to the caption. For example, if the color value of the mask corresponding to the caption is significantly different from the color value of the caption, a large value (e.g., a value larger than 50%) may be selected for the transparency of the mask corresponding to the caption, ensuring that the user can clearly see the caption while reducing the occlusion of the video picture by the caption overlapping region.
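One way to express this rule is the sketch below. Only the idea that a large mask-to-caption color difference permits a transparency above 50% comes from the description; the threshold, the two transparency levels, and the name `choose_mask_transparency` are hypothetical.

```python
def choose_mask_transparency(mask_caption_diff: float,
                             large_diff_threshold: float = 200.0,
                             high_transparency: float = 0.7,
                             low_transparency: float = 0.4) -> float:
    """Return the mask transparency in [0, 1], where 1.0 means fully transparent.

    All default values are placeholders chosen for illustration.
    """
    if mask_caption_diff >= large_diff_threshold:
        # Strong contrast between mask and caption: the mask can be more see-through.
        return high_transparency
    return low_transparency
```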

Ｓ３１６：電子デバイス１００上のビデオフレーム色域解釈モジュールは、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度を電子デバイス１００上のキャプション復号モジュールに送信する。 S316: The video frame color gamut interpretation module on the electronic device 100 sends the color values and transparency of the masks corresponding to each caption in the caption group to the caption decoding module on the electronic device 100.

具体的には、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度を計算した後、ビデオフレーム色域解釈モジュールは、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度をキャプション復号モジュールに送信し得、マスクに対応するキャプションのキャプション位置情報も搬送され得、それにより、キャプション復号モジュールは、キャプションをマスクに１対１でマッピングし得る。 Specifically, after calculating the color values and transparency of the mask corresponding to each caption in the caption group, the video frame color gamut interpretation module may send the color values and transparency of the mask corresponding to each caption in the caption group to the caption decoding module, and the caption position information of the caption corresponding to the mask may also be conveyed, so that the caption decoding module may map the caption to the mask one-to-one.
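The one-to-one mapping by caption position information can be sketched as follows. The data classes, field names, and box layout are hypothetical; the sketch assumes that each caption in a group has a distinct position.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


@dataclass(frozen=True)
class Caption:
    text: str
    position: Box


@dataclass(frozen=True)
class MaskParams:
    position: Box                # position of the caption that the mask belongs to
    color: Tuple[int, int, int]  # color value of the mask
    transparency: float          # 1.0 means fully transparent


def pair_captions_with_masks(captions: Sequence[Caption],
                             masks: Sequence[MaskParams]) -> List[Tuple[Caption, MaskParams]]:
    """Match each caption to the mask parameters that carry the same position."""
    by_position = {mask.position: mask for mask in masks}
    return [(caption, by_position[caption.position]) for caption in captions]


captions = [
    Caption("WSYTKLDGSYDZM", (100, 50, 260, 24)),
    Caption("Highly Recognizable Caption", (80, 120, 300, 24)),
]
masks = [
    MaskParams((80, 120, 300, 24), (0, 0, 0), 1.0),
    MaskParams((100, 50, 260, 24), (0, 0, 0), 0.5),
]
pairs = pair_captions_with_masks(captions, masks)
```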

Ｓ３１７：電子デバイス１００上のキャプション復号モジュールは、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度に基づいて、対応するマスクを生成し、キャプショングループ内の各キャプションと対応するマスクとを重ね合わせて、マスクを有するキャプションフレームを生成する。 S317: The caption decoding module on the electronic device 100 generates a corresponding mask based on the color value and transparency of the mask corresponding to each caption in the caption group, and superimposes each caption in the caption group with the corresponding mask to generate a caption frame with a mask.

具体的には、ビデオフレーム色域解釈モジュールによって送信されたキャプショングループ内の各キャプションに対応するマスクの色値及び透明度を受信した後、キャプション復号モジュールは、キャプションに対応するマスクの色値及び透明度と、キャプションのキャプション位置情報とに基づいて、キャプションに対応するマスク（例えば、図５に示されるキャプション１に対応するマスク）を生成し得る。マスクの形状は、矩形であってよいし、キャプションを覆うことができる任意の他の形状であってもよい。これは、本出願のこの実施形態では限定されない。 Specifically, after receiving the color value and transparency of the mask corresponding to each caption in the caption group sent by the video frame color gamut interpretation module, the caption decoding module may generate a mask corresponding to the caption (e.g., a mask corresponding to caption 1 shown in FIG. 5) based on the color value and transparency of the mask corresponding to the caption and the caption position information of the caption. The shape of the mask may be rectangular or any other shape that can cover the caption. This is not limited in this embodiment of the present application.

同様に、キャプション復号モジュールは、キャプショングループ内の各キャプションについて、キャプションに対応するマスクを生成し得る。 Similarly, for each caption in the caption group, the caption decoding module may generate a mask corresponding to the caption.

例えば、図２Ｃに示すように、ピクチャ内に４つのキャプションがあることが容易に分かる。従って、キャプション復号モジュールは、４つのマスクを生成し得、１つのキャプションは１つのマスクに対応する。 For example, as shown in FIG. 2C, it is easy to see that there are four captions in the picture. Therefore, the caption decoding module may generate four masks, one caption corresponding to one mask.

Furthermore, the caption decoding module may superimpose the caption on the layer above its corresponding mask to generate a caption with a mask (e.g., caption 1 with a mask, as shown in FIG. 5).

同様に、キャプション復号モジュールは、キャプショングループ内の各キャプションを対応するマスク上に重ね合わせて、マスクを有するキャプションフレームを生成し得る。 Similarly, the caption decoding module may overlay each caption in a caption group onto a corresponding mask to generate a caption frame with a mask.

FIG. 6A shows an example of a caption frame with masks. A mask is superimposed on each caption. The masks corresponding to the captions with high recognizability (e.g., "Highly Recognizable Caption" or "Caption Synchronized with Audio") have a transparency of 100%, whereas the masks corresponding to the captions with low recognizability (e.g., "WSYTKLDGSYDZM" or "Caption with unclear color") have a transparency of less than 100% and a specific color value.
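The effect of the transparency value on the displayed picture can be illustrated with ordinary alpha blending of one mask pixel over one video pixel. This is a sketch under the assumption that transparency is a fraction in [0, 1] with 1.0 meaning fully transparent; the embodiment does not prescribe a blending formula, and the function name is hypothetical.

```python
from typing import Tuple

Color = Tuple[int, int, int]


def blend_mask_over_video(video_pixel: Color, mask_color: Color,
                          transparency: float) -> Color:
    """Blend a mask pixel over a video pixel; the caption is drawn on top afterwards."""
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0.0 and 1.0")
    opacity = 1.0 - transparency
    r, g, b = (
        round(opacity * m + transparency * v)
        for m, v in zip(mask_color, video_pixel)
    )
    return (r, g, b)


video_pixel = (200, 220, 240)
unchanged = blend_mask_over_video(video_pixel, (0, 0, 0), 1.0)  # 100% transparent mask
dimmed = blend_mask_over_video(video_pixel, (0, 0, 0), 0.5)     # half-transparent black mask
```

With a transparency of 100% the video pixel is left as it is, which matches the behavior described for captions with high recognizability.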

Ｓ３１８：電子デバイス１００上のキャプション復号モジュールは、マスクを有するキャプションフレームを電子デバイス１００上のビデオフレーム合成モジュールに送信する。 S318: The caption decoding module on the electronic device 100 sends the caption frame with the mask to the video frame synthesis module on the electronic device 100.

具体的には、マスクを有するキャプションフレームを生成した後、キャプション復号モジュールは、マスクを有するキャプションフレームをビデオフレーム合成モジュールに送信して、その後、表示されるべきビデオフレームを生成し得る。 Specifically, after generating the caption frame with the mask, the caption decoding module may send the caption frame with the mask to a video frame synthesis module, which then generates the video frame to be displayed.

Ｓ３１９及びＳ３２０：電子デバイス１００上のビデオフレーム合成モジュールは、受信したビデオフレームとマスクを有するキャプションフレームとを重ね合わせて結合して、表示されるべきビデオフレームを生成し、表示されるべきビデオフレームを電子デバイス１００上のビデオフレームキューに送信する。 S319 and S320: The video frame synthesis module on the electronic device 100 overlays and combines the received video frame with the caption frame having the mask to generate the video frame to be displayed, and sends the video frame to be displayed to a video frame queue on the electronic device 100.

Ｓ３２１からＳ３２３：ビデオレンダリングモジュールは、表示されるべきビデオフレームを時系列に基づいてビデオフレームキューから読み出し、表示されるべきビデオフレームを時系列に基づいてレンダリングして、レンダリングされたビデオフレームを生成し得る。 S321 to S323: The video rendering module may read the video frames to be displayed from the video frame queue in chronological order and render the video frames to be displayed in chronological order to generate rendered video frames.

Ｓ３２４：電子デバイス１００は、レンダリングされたビデオフレームを表示する。 S324: The electronic device 100 displays the rendered video frame.

ステップＳ３１９からステップＳ３２４の具体的な実行プロセスについては、図１Ａ及び図１Ｂに示される実施形態におけるステップＳ１０９からステップＳ１１４の関連する内容を参照されたい。詳細はここでは改めて説明しない。 For the specific execution process of steps S319 to S324, please refer to the relevant contents of steps S109 to S114 in the embodiment shown in Figures 1A and 1B. Details will not be described again here.

Note that, in some embodiments, the video decoding module, the caption decoding module, the video frame gamut interpretation module, the video frame synthesis module, the video frame queue, and the video rendering module may alternatively be integrated into a video application to perform the caption display method provided in this embodiment of the present application. This is not limited in this embodiment of the present application.

For example, FIG. 6B may be a picture of one of the rendered video frames displayed after the electronic device 100 performs the caption display method shown in FIG. 3A, FIG. 3B, FIG. 3C, and FIG. 3D (one caption may correspond to one mask). Compared with the picture shown in FIG. 2C, the recognizability of the caption "WSYTKLDGSYDZM" and of the caption "Caption with unclear color" is greatly improved after the corresponding masks are added to the caption group. In addition, because the mask corresponding to a caption has a certain transparency, the caption overlay region does not completely occlude the video picture. In this way, the effects of both video display and caption display are taken into account: the user can see the captions clearly, a certain visibility of the video picture is preserved, and the caption color selected by the user is not changed, which improves user experience.

In addition, the position of a caption, the color of the video background, and the like may change throughout video playback. Therefore, the foregoing caption display method may be performed continuously so that the user can clearly see the captions throughout video playback. For example, FIG. 6B may be a schematic diagram of a first user interface in which the video playback progress is at 8:00, and FIG. 6C may be a schematic diagram of a second user interface in which the video playback progress is at 8:02. The video frame included in the first user interface differs from the video frame included in the second user interface. As shown in FIG. 6C, compared with FIG. 6B, the caption "WSYTKLDGSYDZM", the caption "Highly Recognizable Caption", and the caption "Caption with unclear color" have all moved toward the left side of the display screen. The electronic device 100 recalculates the color value and transparency of the mask corresponding to each caption based on the color value of the caption and the color value of the current video frame region corresponding to the caption, and regenerates the mask. In the second user interface, the video background color of the current video frame region corresponding to the caption "WSYTKLDGSYDZM" has changed, and the recognizability of the caption has increased. Therefore, compared with FIG. 6B, the mask corresponding to the caption "WSYTKLDGSYDZM" has also changed: no mask is displayed for the caption. Specifically, the transparency of the mask corresponding to the caption is changed to 100%, or no mask is generated for the caption.

図６Ｂ及び図６Ｃに示されるビデオ再生ピクチャは、全画面モードで表示されてもよいし、部分画面モードで表示されてもよい。これは、本出願のこの実施形態では限定されない。 The video playback pictures shown in Figures 6B and 6C may be displayed in full screen mode or in partial screen mode. This is not a limitation in this embodiment of the present application.

図６Ｂに示されるキャプションに対応するマスクは、キャプション全体が位置する領域にまたがるマスクであり、すなわち、１つのキャプションは１つのマスクのみに対応する。いくつかの実際のアプリケーションシナリオでは、１つのキャプションが、大きな色域差を有する複数の領域にまたがることがある。その結果、キャプションの一部の認識度が高くなり、キャプションの他の部分の認識度が低くなる。この場合、１つのキャプションに対して複数の対応マスクが生成され得る。例えば、図２Ｃに示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」では、キャプションの領域の先頭部分のキャプション認識度は低く（すなわち、４文字「ＷＳＹＴ」はユーザに認識されにくい）、キャプションの領域の末尾部分のキャプション認識度も低い（すなわち、４文字「ＹＤＺＭ」はユーザに認識されにくい）。キャプションの領域の中間部分のキャプション認識度は高い（すなわち、５文字「ＫＬＤＧＳ」はユーザに認識されやすい）。従って、この場合、キャプションの領域の先頭部分、中間部分、及び末尾部分のそれぞれに対して１つの対応するマスクが生成され得、すなわち、キャプションは、３つの対応するマスクを有し得る。 The mask corresponding to the caption shown in FIG. 6B is a mask that spans the region where the entire caption is located, i.e., one caption corresponds to only one mask. In some practical application scenarios, one caption may span multiple regions with large color gamut differences. As a result, some parts of the caption are highly recognizable and other parts of the caption are less recognizable. In this case, multiple corresponding masks may be generated for one caption. For example, in the caption "W S Y T K L D G S Y D Z M" shown in FIG. 2C, the caption recognizability is low at the beginning of the caption region (i.e., the four characters "W S Y T" are difficult for users to recognize), and it is also low at the end of the caption region (i.e., the four characters "Y D Z M" are difficult for users to recognize). The middle part of the caption region has high caption recognizability (i.e., the five characters "K L D G S" are easy for users to recognize). Therefore, in this case, one corresponding mask may be generated for each of the beginning, middle, and end parts of the caption region, i.e., the caption may have three corresponding masks.

１つのキャプションが複数のマスクに対応する前述のアプリケーションシナリオの場合、本出願のこの実施形態では、１つのキャプションが複数のマスクに対応するように、図３Ａ、図３Ｂ、図３Ｃ、及び図３Ｄに示される方法に基づいて、ステップＳ３１３からステップＳ３１７に対していくつかの対応する改良が行われ得る。他のステップは変更する必要がない。 For the aforementioned application scenario where one caption corresponds to multiple masks, in this embodiment of the present application, some corresponding improvements may be made to steps S313 to S317 based on the methods shown in Figures 3A, 3B, 3C, and 3D so that one caption corresponds to multiple masks. Other steps do not need to be changed.

以下では、１つのキャプションが複数のマスクに対応するプロセスについて詳細に説明する。 Below we explain in detail the process by which one caption corresponds to multiple masks.

キャプショングループに対応するビデオフレーム内のキャプション位置における色域情報を生成するプロセスにおいて、ビデオフレーム色域解釈モジュールは、全てのサブ領域の色値を左から右へ（又は右から左へ）順に順次計算し得る。１つのキャプションが複数のマスクに対応する必要がある前述のアプリケーションシナリオ、すなわち、１つのキャプションが大きい色域差を有する複数の領域にまたがるアプリケーションシナリオでは、ビデオフレーム色域解釈モジュールは、隣接するサブ領域の色値を比較し得、隣接するサブ領域の色値が近い場合、隣接するサブ領域は１つの領域に結合され、結合された領域は１つのマスクに対応する。隣接するサブ領域の色値が大きく異なる場合、隣接するサブ領域は結合されず、２つの結合されていない領域はそれぞれのマスクに対応する。従って、１つのキャプションが複数のマスクに対応し得る。 In the process of generating color gamut information at the caption position in the video frame corresponding to the caption group, the video frame gamut interpretation module may sequentially calculate the color values of all sub-regions in order from left to right (or from right to left). In the aforementioned application scenario where one caption needs to correspond to multiple masks, i.e., one caption spans multiple regions with large gamut differences, the video frame gamut interpretation module may compare the color values of adjacent sub-regions, and if the color values of the adjacent sub-regions are close, the adjacent sub-regions are combined into one region, and the combined region corresponds to one mask. If the color values of the adjacent sub-regions are significantly different, the adjacent sub-regions are not combined, and the two uncombined regions correspond to respective masks. Thus, one caption may correspond to multiple masks.

図７Ａに示すように、１つのキャプションが複数のマスクに対応し得る場合、ステップＳ３１３からステップＳ３１７は、具体的には、以下のステップに基づいて実行され得る。以下では、図７Ｂに示されるキャプション１が、図２Ｃに示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」である場合を説明で使用する。 As shown in FIG. 7A, when one caption can correspond to multiple masks, steps S313 to S317 can be specifically executed based on the following steps. In the following explanation, the caption 1 shown in FIG. 7B is the caption "W S Y T K L D G S Y D Z M" shown in FIG. 2C.

Ｓ７０１：ビデオフレーム色域解釈モジュールが、キャプションの位置に対応するビデオフレーム領域の全てのサブ領域の色値を順次計算し、近い色値を有するサブ領域を結合して、Ｍ個の第２のサブ領域を取得する。 S701: The video frame color gamut interpretation module sequentially calculates the color values of all sub-regions of the video frame region corresponding to the position of the caption, and combines the sub-regions with similar color values to obtain M second sub-regions.

具体的には、ステップＳ３１３に基づいて、全てのサブ領域の色値を左から右に（又は右から左に）順次計算した後、ビデオフレーム色域解釈モジュールは、更に、隣接するサブ領域の色値を比較し、近い色値を有するサブ領域を結合して、Ｍ個の第２のサブ領域を取得する必要があり、Ｍは正の整数である。図７Ｂに示すように、隣接するサブ領域の色値を比較し、近い色値を有するサブ領域を結合した後、ビデオフレーム色域解釈モジュールは、キャプションの位置に対応するビデオフレーム領域を、領域Ａ、領域Ｂ、及び領域Ｃという３つの領域（すなわち、３つの第２のサブ領域）に分割する。領域Ａはａ個のサブ領域を結合することによって形成され、領域Ｂはｂ個のサブ領域を結合することによって形成され、領域Ｃはｃ個のサブ領域を結合することによって形成されると仮定する。 Specifically, after calculating the color values of all sub-regions from left to right (or from right to left) sequentially according to step S313, the video frame gamut interpretation module further needs to compare the color values of adjacent sub-regions and combine the sub-regions with similar color values to obtain M second sub-regions, where M is a positive integer. As shown in Fig. 7B, after comparing the color values of adjacent sub-regions and combining the sub-regions with similar color values, the video frame gamut interpretation module divides the video frame region corresponding to the position of the caption into three regions (i.e., three second sub-regions), namely, region A, region B, and region C. Assume that region A is formed by combining a sub-regions, region B is formed by combining b sub-regions, and region C is formed by combining c sub-regions.

色値が近いということは、２つのサブ領域の色値間の差分値が第２の閾値未満であることを意味し得、第２の閾値は予め設定される。 Color values being close may mean that the difference value between the color values of the two subregions is less than a second threshold, which is preset.
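The merging in step S701 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, each sub-region is represented by its average RGB value, and the "difference value" between two sub-regions is assumed to be the Euclidean distance in RGB space, which the text does not specify.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def color_distance(c1: RGB, c2: RGB) -> float:
    """Euclidean distance in RGB space (an assumed metric; the text only says 'difference value')."""
    return sum((x - y) ** 2 for x, y in zip(c1, c2)) ** 0.5


def merge_adjacent(sub_colors: List[RGB], second_threshold: float) -> List[List[int]]:
    """Group indices of adjacent sub-regions whose colors are close.

    Sub-regions are visited in order (e.g., left to right). Each one is compared
    with its immediate predecessor; if the difference is below the preset second
    threshold, it joins the current group, otherwise it starts a new group.
    Each returned group is one "second sub-region".
    """
    if not sub_colors:
        return []
    groups = [[0]]
    for i in range(1, len(sub_colors)):
        if color_distance(sub_colors[i - 1], sub_colors[i]) < second_threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

With five sub-regions that are bright, bright, dark, dark, bright, the sketch yields three groups, which matches the region A / region B / region C split described for FIG. 7B. Comparing each sub-region with its immediate neighbor is one design choice; comparing with the running average of the current group would also fit the description.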

Ｓ７０２：ビデオフレーム色域解釈モジュールが、Ｍ個の第２のサブ領域に対して重畳キャプション認識度分析を別々に実行して、Ｍ個の第２のサブ領域の重畳キャプション認識度分析結果を生成する。 S702: The video frame gamut interpretation module performs overlay caption recognizability analysis separately for the M second sub-regions to generate overlay caption recognizability analysis results for the M second sub-regions.

具体的には、ビデオフレーム色域解釈モジュールは、ビデオフレーム領域全体に対して重畳キャプション認識度分析を直接実行するのではなく、領域Ａ、領域Ｂ、及び領域Ｃに対して重畳キャプション認識度分析を別々に実行する必要がある。同様に、ビデオフレーム色域解釈モジュールは、ステップＳ３１４における色差値を使用して、領域Ａ、領域Ｂ、及び領域Ｃに対して重畳キャプション認識度分析を別々に実行し得る。プロセスは以下の通りである。 Specifically, instead of directly performing the overlay caption recognizability analysis on the entire video frame region, the video frame gamut interpretation module needs to perform the overlay caption recognizability analysis separately on region A, region B, and region C. Similarly, the video frame gamut interpretation module may use the color difference values in step S314 to perform the overlay caption recognizability analysis separately on region A, region B, and region C. The process is as follows:

領域Ａの色差値Ｄｉｆｆ１：

Color difference value Diff1 of area A:

ａは、領域Ａに含まれるサブ領域の数であり、ｒ_ｉは、領域Ａ内のサブ領域内の全てのピクセルの平均赤色値であり、ｇ_ｉは、領域Ａ内のサブ領域内の全てのピクセルの平均緑色値であり、ｂ_ｉは、領域Ａ内のサブ領域内の全てのピクセルの平均青色値である。ｒ_０は、領域Ａにおけるキャプションの赤色値であり、ｇ_０は、領域Ａにおけるキャプションの緑色値であり、ｂ_０は、領域Ａにおけるキャプションの青色値である。 a is the number of sub-regions contained in region A; r_i, g_i, and b_i are the average red, green, and blue values of all pixels in the i-th sub-region of region A; and r_0, g_0, and b_0 are the red, green, and blue values of the caption in region A.

領域Ｂの色差値Ｄｉｆｆ２：

Color difference value Diff2 of area B:

ｂは、領域Ｂに含まれるサブ領域の数であり、ｒ_ｉは、領域Ｂ内のサブ領域内の全てのピクセルの平均赤色値であり、ｇ_ｉは、領域Ｂ内のサブ領域内の全てのピクセルの平均緑色値であり、ｂ_ｉは、領域Ｂ内のサブ領域内の全てのピクセルの平均青色値である。ｒ_０は、領域Ｂにおけるキャプションの赤色値であり、ｇ_０は、領域Ｂにおけるキャプションの緑色値であり、ｂ_０は、領域Ｂにおけるキャプションの青色値である。 b is the number of sub-regions contained in region B; r_i, g_i, and b_i are the average red, green, and blue values of all pixels in the i-th sub-region of region B; and r_0, g_0, and b_0 are the red, green, and blue values of the caption in region B.

領域Ｃの色差値Ｄｉｆｆ３：

Color difference value Diff3 of area C:

ｃは、領域Ｃに含まれるサブ領域の数であり、ｒ_ｉは、領域Ｃ内のサブ領域内の全てのピクセルの平均赤色値であり、ｇ_ｉは、領域Ｃ内のサブ領域内の全てのピクセルの平均緑色値であり、ｂ_ｉは、領域Ｃ内のサブ領域内の全てのピクセルの平均青色値である。ｒ_０は、領域Ｃにおけるキャプションの赤色値であり、ｇ_０は、領域Ｃにおけるキャプションの緑色値であり、ｂ_０は、領域Ｃにおけるキャプションの青色値である。 c is the number of sub-regions contained in region C; r_i, g_i, and b_i are the average red, green, and blue values of all pixels in the i-th sub-region of region C; and r_0, g_0, and b_0 are the red, green, and blue values of the caption in region C.
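The formula bodies for Diff1, Diff2, and Diff3 are not reproduced in this text. One form that is consistent with the symbol definitions given for each region, stated here as an assumption rather than as the formula of the application, is the mean Euclidean RGB distance between each sub-region's average color and the caption color. For region A, with the sub-regions indexed i = 1, ..., a:

```latex
\mathrm{Diff1} = \frac{1}{a} \sum_{i=1}^{a}
  \sqrt{(r_i - r_0)^2 + (g_i - g_0)^2 + (b_i - b_0)^2}
```

Under the same assumption, Diff2 and Diff3 would have the same form, with the sum running over the b sub-regions of region B and the c sub-regions of region C respectively, and with r_i, g_i, b_i, r_0, g_0, and b_0 taken from the corresponding region.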

計算を通じて領域Ａ、領域Ｂ、及び領域Ｃの色差値を別々に取得した後、ビデオフレーム色域解釈モジュールは、３つの領域の色差値が、予め設定された色差閾値未満であるかどうかを決定し得る。領域の色差値が予め設定された色差閾値未満である場合、領域のキャプション認識度が低いことを示す。 After separately obtaining the color difference values of area A, area B, and area C through calculation, the video frame color gamut interpretation module may determine whether the color difference values of the three areas are less than the preset color difference threshold. If the color difference value of the area is less than the preset color difference threshold, it indicates that the caption recognition degree of the area is low.
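The per-region threshold decision can be sketched as follows. The function names and the sample threshold are hypothetical, and the color difference is assumed to be the mean Euclidean RGB distance between each sub-region's average color and the caption color; the application's own formula is not reproduced in this text.

```python
from math import sqrt
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def region_color_difference(sub_region_means: Sequence[RGB], caption_rgb: RGB) -> float:
    """Mean Euclidean RGB distance between sub-region averages and the caption color.

    sub_region_means holds (r_i, g_i, b_i) for each sub-region of one region;
    caption_rgb is (r_0, g_0, b_0). The metric itself is an assumption.
    """
    r0, g0, b0 = caption_rgb
    total = sum(
        sqrt((r - r0) ** 2 + (g - g0) ** 2 + (b - b0) ** 2)
        for r, g, b in sub_region_means
    )
    return total / len(sub_region_means)


def has_low_recognizability(diff: float, color_diff_threshold: float) -> bool:
    """A region whose color difference is below the preset threshold is hard to read."""
    return diff < color_diff_threshold
```

For a white caption over a near-white region the difference is small, so the region is flagged as having low recognizability; over a black region the difference is large and no flag is raised.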

Ｓ７０３：ビデオフレーム色域解釈モジュールが、Ｍ個の第２のサブ領域及びキャプション色域情報の重畳キャプション認識度分析結果に基づいて、Ｍ個の第２のサブ領域に対応するマスクの色値及び透明度をそれぞれ決定する。 S703: The video frame color gamut interpretation module determines color values and transparency of masks corresponding to the M second sub-regions, respectively, based on the superimposed caption recognition analysis results of the M second sub-regions and the caption color gamut information.

具体的には、ビデオフレーム色域解釈モジュールは、領域Ａ、領域Ｂ、及び領域Ｃのキャプション色域情報及び重畳キャプション認識度分析結果に基づいて、領域Ａに対応するマスクの色値及び透明度、領域Ｂに対応するマスクの色値及び透明度、並びに領域Ｃに対応するマスクの色値及び透明度をそれぞれ決定する必要がある。各第２のサブ領域に対応するマスクの色値及び透明度を決定する具体的なプロセスは、ステップＳ３１５におけるキャプションの位置に対応するビデオフレーム領域全体に対応するマスクの色値及び透明度を決定するプロセスと同様である。前述の関連する内容を参照されたい。詳細はここでは改めて説明しない。 Specifically, the video frame color gamut interpretation module needs to determine the color value and transparency of the mask corresponding to region A, the color value and transparency of the mask corresponding to region B, and the color value and transparency of the mask corresponding to region C based on the caption color gamut information of region A, region B, and region C and the result of the superimposed caption recognition analysis. The specific process of determining the color value and transparency of the mask corresponding to each second sub-region is similar to the process of determining the color value and transparency of the mask corresponding to the entire video frame region corresponding to the position of the caption in step S315. Please refer to the related contents mentioned above. Details will not be described again here.

Ｓ７０４：ビデオフレーム色域解釈モジュールが、Ｍ個の第２のサブ領域に対応するマスクの色値、透明度、及び位置情報をキャプション復号モジュールに送信する。 S704: The video frame gamut interpretation module sends the color values, transparency, and position information of the masks corresponding to the M second sub-regions to the caption decoding module.

具体的には、１つのキャプションが複数のマスクに対応し得るので、ビデオフレーム色域解釈モジュールは、キャプショングループ内の各キャプションに対応するマスクの色値及び透明度をキャプション復号モジュールに送信する必要があるだけでなく、各マスクの位置情報（又は、対応するキャプションに対する各マスクの位置情報）をキャプション復号モジュールに送信する必要もある。各マスクの位置情報は、キャプション位置情報に基づいて取得され得る。具体的には、１つのキャプションが複数のマスクに対応する場合、キャプション位置情報は既知であるので、キャプション位置におけるビデオフレーム領域内の全てのサブ領域の位置情報が推定され得る。更に、各第２のサブ領域に対応するマスクの位置情報が推定され得る。 Specifically, since one caption may correspond to multiple masks, the video frame gamut interpretation module not only needs to send the color value and transparency of the mask corresponding to each caption in the caption group to the caption decoding module, but also needs to send the position information of each mask (or the position information of each mask relative to the corresponding caption) to the caption decoding module. The position information of each mask can be obtained based on the caption position information. Specifically, when one caption corresponds to multiple masks, since the caption position information is known, the position information of all sub-regions in the video frame region at the caption position can be estimated. Furthermore, the position information of the mask corresponding to each second sub-region can be estimated.
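Deriving each mask's position from the caption position can be sketched as follows. The `Rect` type and `mask_rects` function are hypothetical, and the sketch assumes that the caption occupies a horizontal rectangle divided into equal-width sub-regions, which the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


def mask_rects(caption: Rect, sub_region_counts: List[int]) -> List[Rect]:
    """Return one rectangle per second sub-region.

    sub_region_counts lists how many equal-width sub-regions were merged into
    each second sub-region, in left-to-right order (e.g., [a, b, c] for
    regions A, B, and C).
    """
    total = sum(sub_region_counts)
    unit = caption.width / total  # width of one sub-region (assumed equal)
    rects = []
    offset = 0
    for count in sub_region_counts:
        rects.append(
            Rect(caption.x + offset * unit, caption.y, count * unit, caption.height)
        )
        offset += count
    return rects
```

For a 13-character caption split into 4, 5, and 4 sub-regions, the three rectangles tile the caption rectangle without gaps or overlap.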

Ｓ７０５：キャプション復号モジュールが、Ｍ個の第２のサブ領域に対応するマスクの色値、透明度、及び位置情報に基づいて、キャプションに対応するマスクを生成し、このマスクにキャプションを重ね合わせて、マスクを有するキャプションを生成する。 S705: The caption decoding module generates a mask corresponding to the caption based on the color values, transparency, and position information of the mask corresponding to the M second sub-regions, and superimposes the caption on the mask to generate a caption with the mask.

具体的には、複数のマスクに対応する１つのキャプションの場合、キャプション復号モジュールは、キャプションに対応する各第２のサブ領域のマスクの色値、透明度、及びマスク位置情報に基づいて、キャプションに対応する３つのマスク（例えば、図７Ｂに示されるキャプション１に対応するマスク）を生成し得る。そして、キャプション復号モジュールは、キャプションを、キャプションに対応するマスクの上位レイヤに重ね合わせて、マスクを有するキャプション（例えば、図７Ｂに示されるマスク付きキャプション１）を生成し得る。 Specifically, in the case of one caption corresponding to multiple masks, the caption decoding module may generate three masks corresponding to the caption (e.g., a mask corresponding to caption 1 shown in FIG. 7B) based on the color value, transparency, and mask position information of the mask of each second sub-region corresponding to the caption. Then, the caption decoding module may overlay the caption on a layer above the mask corresponding to the caption to generate a caption with a mask (e.g., caption 1 with mask shown in FIG. 7B).
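The effect of a mask's transparency on the video picture behind it can be sketched with ordinary alpha blending. This is an assumption about how the layers are composited, not a statement of the application's rendering path; the function name is hypothetical, and transparency is expressed as a fraction where 1.0 corresponds to 100% (fully transparent).

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend_mask_over_video(mask_rgb: RGB, transparency: float, video_rgb: RGB) -> RGB:
    """Blend one mask pixel over one video pixel.

    transparency = 1.0 leaves the video pixel unchanged (same visual result as
    having no mask); transparency = 0.0 shows only the mask color.
    """
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0.0 and 1.0")
    opacity = 1.0 - transparency
    return tuple(opacity * m + transparency * v for m, v in zip(mask_rgb, video_rgb))
```

The caption itself is then drawn on the layer above the mask, so its own color is not altered by the blend.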

図２Ｃに示すように、３つのキャプション「認識度の高いキャプション」、「不明瞭な色のキャプション」、及び「音声と同期されたキャプション」は、大きな色域差を有する複数の領域にまたがらないので、３つのキャプションの各々は、依然として１つのマスクに対応する。 As shown in FIG. 2C, the three captions "Highly recognizable caption", "Obscure color caption", and "Caption synchronized with audio" do not span multiple regions with large color gamut differences, so each of the three captions still corresponds to one mask.

キャプション復号モジュールは、キャプショングループ内の各キャプションを対応するマスク上に重ね合わせて、マスクを有するキャプションフレームを生成し得る。 The caption decoding module may overlay each caption in the caption group onto a corresponding mask to generate a caption frame with the mask.

図８Ａは、マスクを有するキャプションフレームの一例を示す。３つのマスクがキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」に重ね合わされていることが分かる。「ＷＳＹＴ」の認識度及び「ＹＤＺＭ」の認識度は低いので、対応するマスクの透明度は１００％未満であり、特定の色値が存在する。「ＫＬＤＧＳ」の認識度は高いので、対応するマスクの透明度は１００％である。１つのマスクは、他の３つのキャプションの各々に重ね合わされる。キャプション「認識度の高いキャプション」の認識度とキャプション「音声と同期されたキャプション」の認識度は高いので、対応するマスクの透明度は１００％である。キャプション「不明瞭な色のキャプション」の認識度は低いので、対応するマスクの透明度は１００％未満であり、特定の色値が存在する。 FIG. 8A shows an example of a caption frame with masks. It can be seen that three masks are superimposed on the caption "W S Y T K L D G S Y D Z M". Because the recognizability of "W S Y T" and of "Y D Z M" is low, the transparency of the corresponding masks is less than 100% and the masks have specific color values. Because the recognizability of "K L D G S" is high, the transparency of the corresponding mask is 100%. One mask is superimposed on each of the other three captions. Because the recognizability of the caption "Highly recognizable caption" and of the caption "Caption synchronized with audio" is high, the transparency of the corresponding masks is 100%. Because the recognizability of the caption "Obscure color caption" is low, the transparency of the corresponding mask is less than 100% and the mask has a specific color value.

例えば、図８Ｂは、電子デバイス１００が図３Ａ、図３Ｂ、図３Ｃ及び図３Ｄに示される改善されたキャプション表示方法を実行した後に表示されるレンダリングされたビデオフレームにおけるフレームのピクチャであり得る（大きな色域差を有する複数の領域にまたがるキャプションは、複数のマスクに対応し得る）。図６Ｂに示されるピクチャと比較すると、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」は、大きな色域差を有する複数の領域にまたがっているため、キャプションに対応するマスクが変化している。キャプションの領域の中間部分（すなわち、部分「ＫＬＤＧＳ」）は高いキャプション認識度を有するので、その部分に対応するマスクの透明度は１００％（すなわち、完全に透明）に設定されるか、又はマスクが設定されなくてもよいことが容易に分かる。キャプションの領域の先頭部分（すなわち、部分「ＷＳＹＴ」）及び末尾部分（すなわち、部分「ＹＤＺＭ」）は、低いキャプション認識度を有するので、これら２つの部分に対応するマスクの色値及び透明度は、それぞれ、キャプション色域情報と２つの部分が位置する領域の色域情報とに基づいて計算される。このように、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」の領域の中間部分に対応するマスクの透明度が１００％であるか、又はマスクが設定されなくてもよいので、図６Ｂに示される有益な効果を達成することに基づいて、マスクによるビデオピクチャの遮蔽が更に低減され、ユーザエクスペリエンスが更に向上する。 For example, Fig. 8B may be a picture of a frame in the rendered video frame displayed after the electronic device 100 executes the improved caption display method shown in Figs. 3A, 3B, 3C, and 3D (a caption spanning multiple regions with large color gamut differences may correspond to multiple masks). Compared to the picture shown in Fig. 6B, the caption "W S Y T K L D G S Y D Z M" spans multiple regions with large color gamut differences, so the mask corresponding to the caption has changed. It can be easily seen that the middle part of the caption region (i.e., the portion "K L D G S") has a high degree of caption recognition, so the transparency of the mask corresponding to that part may be set to 100% (i.e., completely transparent) or no mask may be set. Since the leading portion (i.e., portion "W S Y T") and trailing portion (i.e., portion "Y D Z M") of the caption region have low caption recognition degree, the color value and transparency of the mask corresponding to these two portions are calculated based on the caption color gamut information and the color gamut information of the region where the two portions are located, respectively. 
In this way, the transparency of the mask corresponding to the middle portion of the caption "W S Y T K L D G S Y D Z M" region can be 100% or no mask can be set, so that the mask further reduces the occlusion of the video picture, and the user experience is further improved, based on which the beneficial effect shown in FIG. 6B is achieved.

更に、ビデオ再生プロセス全体において、キャプションの位置、ビデオ背景の色などが変化し得る。従って、前述のキャプション表示方法は、ユーザがビデオ再生プロセス全体においてキャプションをはっきりと見ることができるように、常に実行され得る。例えば、図８Ｂは、ビデオ再生進行が瞬間８：００にあり、第１のビデオフレームが含まれるユーザインターフェースの概略図であり得る。図８Ｃは、ビデオ再生進行が瞬間８：０１にあり、第２のビデオフレームが含まれるユーザインターフェースの概略図であり得る。第１のビデオフレームは、第２のビデオフレームと同じである。図８Ｃに示すように、図８Ｂと比べて、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」、キャプション「認識度の高いキャプション」、キャプション「不明瞭な色のキャプション」は、いずれも表示画面の左側に移動していることが分かる。電子デバイス１００は、キャプションの色値とキャプションに対応する現在のビデオフレーム領域の色値とに基づいて、キャプションに対応するマスクの色値及び透明度を再計算して、キャプションに対応するマスクを生成する。図８Ｂと比較すると、図８Ｃのキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」に対応するマスクが大きく変化していることが容易に分かる。図８Ｂにおいて、キャプション認識度が低い部分は、「ＷＳＹＴ」及び「ＹＤＺＭ」である。従って、これら２つの部分に対応するマスクは特定の色値を有し、対応するマスクの透明度は１００％未満である。キャプション認識度が高い部分は「ＫＬＤＧＳ」であり、従って、この部分にはマスクは表示されない。具体的には、キャプションに対応するマスクの透明度が１００％に設定されてもよいし、マスクが設定されなくてもよい。しかしながら、図８Ｃでは、キャプション認識度が低い部分は「ＷＳＹＴＫ」と「ＤＺＭ」に変化している。従って、電子デバイス１００は、キャプションの色値とキャプションに対応する現在のビデオフレーム領域の色値とに基づいて、これら２つの部分に対応するマスクの色値及び透明度を再計算する。２つの部分の認識度は低く、従って、２つの部分に対応するマスクは特定の色値を有する。加えて、対応するマスクの透明度は１００％未満である。キャプション認識度が高い部分は「ＬＤＧＳＹ」に変化している。従って、この部分にはマスクは表示されない。具体的には、キャプションに対応するマスクの透明度が１００％に設定されてもよいし、マスクが設定されなくてもよい。図８Ｃにおけるキャプションに対応するマスクを生成するプロセスは、図８Ｂにおけるキャプションに対応するマスクを生成するプロセスと同様である。詳細はここでは改めて説明しない。 In addition, the position of the caption, the color of the video background, etc. may change during the entire video playback process. Therefore, the above-mentioned caption display method may be always performed so that the user can clearly see the caption during the entire video playback process. For example, FIG. 8B may be a schematic diagram of a user interface where the video playback progress is at the moment 8:00 and a first video frame is included. FIG. 8C may be a schematic diagram of a user interface where the video playback progress is at the moment 8:01 and a second video frame is included. The first video frame is the same as the second video frame. As shown in FIG. 8C, compared with FIG. 8B, it can be seen that the caption "W S Y T K L D G S Y D Z M", the caption "Highly recognizable caption", and the caption "Obscure color caption" have all moved to the left side of the display screen. The electronic device 100 recalculates the color value and transparency of the mask corresponding to the caption based on the color value of the caption and the color value of the current video frame area corresponding to the caption to generate a mask corresponding to the caption. Compared with FIG. 8B, it is easy to see that the mask corresponding to the caption "W S Y T K L D G S Y D Z M" in FIG. 8C has changed significantly. In FIG. 8B, the parts with low caption recognition are "W S Y T" and "Y D Z M". Therefore, the masks corresponding to these two parts have certain color values, and the transparency of the corresponding masks is less than 100%. The part with high caption recognition is "K L D G S", so no mask is displayed in this part. Specifically, the transparency of the mask corresponding to the caption may be set to 100%, or no mask may be set. However, in FIG. 8C, the part with low caption recognition has changed to "W S Y T K" and "D Z M". Therefore, the electronic device 100 recalculates the color values and transparency of the masks corresponding to these two parts based on the color values of the caption and the color values of the current video frame area corresponding to the caption. The recognition of the two parts is low, so the masks corresponding to the two parts have certain color values. In addition, the transparency of the corresponding masks is less than 100%. The part with high caption recognition has changed to "L D G S Y". Therefore, no mask is displayed in this part. Specifically, the transparency of the mask corresponding to the caption may be set to 100%, or no mask may be set. The process of generating a mask corresponding to the caption in FIG. 8C is similar to the process of generating a mask corresponding to the caption in FIG. 8B. Details will not be described again here.

図８Ｂ及び図８Ｃに示されるビデオ再生ピクチャは、全画面モードで表示されてもよいし、部分画面モードで表示されてもよい。これは、本出願のこの実施形態では限定されない。 The video playback pictures shown in Figures 8B and 8C may be displayed in full screen mode or in partial screen mode. This is not a limitation in this embodiment of the present application.

本出願のこの実施形態では、認識度の高いキャプションの場合、電子デバイス１００は、キャプションに対するマスクを生成し、ここで、マスクの色値は、予め設定された色値であり得、マスクの透明度は１００％である。いくつかの実施形態では、認識度の高いキャプションの場合、電子デバイス１００は、代替的に、キャプションに対するマスクを生成しなくてもよい。具体的には、キャプションが高い認識度を有すると電子デバイス１００が決定した場合、電子デバイス１００は、キャプションに対して更なる処理を実行しなくてもよく、従って、キャプションは、対応するマスクを有さず、すなわち、キャプションに対してマスクは設定されない。 In this embodiment of the application, for a highly recognizable caption, the electronic device 100 generates a mask for the caption, where the color value of the mask may be a preset color value and the transparency of the mask is 100%. In some embodiments, for a highly recognizable caption, the electronic device 100 may alternatively not generate a mask for the caption. Specifically, if the electronic device 100 determines that the caption has a high degree of recognition, the electronic device 100 may not perform further processing on the caption, and thus the caption does not have a corresponding mask, i.e., no mask is set for the caption.

本出願のこの実施形態では、１つのキャプションが１つのマスクに対応する（すなわち、１つのキャプションがマスクパラメータの１つのグループに対応する）ということは、１つのキャプションが１つの色値及び透明度を含む１つのマスクに対応することを意味し得る。１つのキャプションが複数のマスクに対応する（すなわち、１つのキャプションがマスクパラメータの複数のグループに対応する）ということは、１つのキャプションが、異なる色値及び異なる透明度を有する複数のマスクに対応し、１つのキャプションが、異なる色値及び異なる透明度を有する１つのマスクに対応する（すなわち、異なる色値及び異なる透明度を有する複数のマスクが、異なる色値及び異なる透明度を有する１つのマスクに結合される）ことを意味し得る。 In this embodiment of the present application, "one caption corresponds to one mask" (i.e., one caption corresponds to one group of mask parameters) may mean that one caption corresponds to one mask that has one color value and one transparency. "One caption corresponds to multiple masks" (i.e., one caption corresponds to multiple groups of mask parameters) may mean that one caption corresponds to multiple masks with different color values and different transparency, and may also mean that one caption corresponds to one mask with different color values and different transparency (i.e., multiple masks with different color values and different transparency are combined into one mask with different color values and different transparency).

本出願のこの実施形態では、電子デバイス１００の一例として携帯電話（mobile phone）を使用する。代替的に、電子デバイス１００は、タブレット型コンピュータ（Ｐａｄ）、パーソナルデジタルアシスタント（Personal Digital Assistant、ＰＤＡ）、又はラップトップコンピュータ（Laptop）などのポータブル電子デバイスであってもよい。電子デバイス１００のタイプ、物理的形態、及びサイズは、本出願のこの実施形態では限定されない。 In this embodiment of the present application, a mobile phone is used as an example of the electronic device 100. Alternatively, the electronic device 100 may be a portable electronic device such as a tablet computer (Pad), a personal digital assistant (PDA), or a laptop computer (Laptop). The type, physical form, and size of the electronic device 100 are not limited in this embodiment of the present application.

本出願の実施形態では、第１のビデオは、ユーザが図２Ｂに示されるビデオ再生オプション２２１をタップした後に電子デバイス１００によって再生されるビデオであり得る。第１のインターフェースは、図６Ｂに示されるユーザインターフェースであり得る。第１のピクチャは、図６Ｂに示されるビデオフレームのピクチャであり得る。第１のキャプションは、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」であり得る。第１の領域は、第１のピクチャ内にあり、第１のキャプションの表示位置に対応する領域である。第１の値は、第１のキャプションの色と、第１のキャプションの表示位置に対応する第１のピクチャ領域の色との間の色差値であり得る。第２のインターフェースは、図６Ｃに示されるユーザインターフェースであり得る。第２のピクチャは、図６Ｃに示されるビデオフレームのピクチャであり得る。第２の領域は、第２のピクチャ内にあり、第１のキャプションの表示位置に対応する領域である。第２の値は、第１のキャプションの色と、第１のキャプションの表示位置に対応する第２のピクチャ領域の色との間の色差値であり得る。第１のビデオファイルは、第１のビデオに対応するビデオファイルであり得、第１のキャプションファイルは、第１のビデオに対応するキャプションファイルであり得る。第１のビデオフレームは、第１のピクチャを生成するために使用されるビデオフレームである。第１のキャプションフレームは、第１のキャプションを含み、第１のビデオフレームと同じ時間情報を搬送するキャプションフレームであり、第２のキャプションフレームは、第１のキャプションが第１のマスク上に重ね合わされた後に生成されるキャプションフレーム（すなわち、マスクを有するキャプションフレーム）である。第１のサブ領域は、ビデオフレーム色域抽出単位であり得る。第２のサブ領域は、近い色値を有する隣接する第１のサブ領域が結合された後に取得された領域（例えば、領域Ａ、領域Ｂ、及び領域Ｃ）であり得る。第１のサブマスクは、各第２のサブ領域に対応するマスクであり得る。第１のマスクは、図６Ｂに示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」に対応するマスクであってもよいし、図８Ｂに示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」に対応するマスクであってもよい。第３のインターフェースは、図８Ｂに示されるユーザインターフェースであり得、第３のピクチャは、図８Ｂに示されるビデオフレームのピクチャであり得る。第１の部分は、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」中の「ＷＳＹＴ」であり得、第２の部分は、キャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」中の「ＫＬＤＧＳ」であり得る。第２のサブマスクは、「ＷＳＹＴ」に対応するマスク（すなわち、図７Ｂに示される領域Ａのマスク）であり得る。第３のサブマスクは、「ＫＬＤＧＳ」に対応するマスク（すなわち、図７Ｂに示される領域Ｂのマスク）であり得る。第２のマスクは、図６Ｃに示されるキャプション「ＷＳＹＴＫＬＤＧＳＹＤＺＭ」に対応するマスクであり得る。 In an embodiment of the present application, the first video may be a video played by the electronic device 100 after the user taps the video playback option 221 shown in FIG. 2B. The first interface may be the user interface shown in FIG. 6B. The first picture may be a picture of the video frame shown in FIG. 6B. The first caption may be the caption "W S Y T K L D G S Y D Z M". The first region is a region in the first picture that corresponds to the display position of the first caption. The first value may be a color difference value between the color of the first caption and the color of the first picture region that corresponds to the display position of the first caption. 
The second interface may be the user interface shown in FIG. 6C. The second picture may be a picture of the video frame shown in FIG. 6C. The second region is a region in the second picture that corresponds to the display position of the first caption. The second value may be a color difference value between the color of the first caption and the color of the second picture region corresponding to the display position of the first caption. The first video file may be a video file corresponding to the first video, and the first caption file may be a caption file corresponding to the first video. The first video frame is a video frame used to generate the first picture. The first caption frame is a caption frame that includes the first caption and carries the same time information as the first video frame, and the second caption frame is a caption frame generated after the first caption is superimposed on the first mask (i.e., a caption frame with a mask). The first sub-region may be a video frame gamut extraction unit. The second sub-region may be a region (e.g., region A, region B, and region C) obtained after adjacent first sub-regions with close color values are combined. The first sub-mask may be a mask corresponding to each second sub-region. The first mask may be a mask corresponding to the caption "W S Y T K L D G S Y D Z M" shown in Figure 6B or may be a mask corresponding to the caption "W S Y T K L D G S Y D Z M" shown in Figure 8B. The third interface may be the user interface shown in Figure 8B and the third picture may be a picture of the video frame shown in Figure 8B. The first portion may be "W S Y T" in the caption "W S Y T K L D G S Y D Z M" and the second portion may be "K L D G S" in the caption "W S Y T K L D G S Y D Z M". The second submask may be a mask corresponding to "W S Y T" (i.e., the mask of area A shown in FIG. 7B). The third submask may be a mask corresponding to "K L D G S" (i.e., the mask of area B shown in FIG. 7B). 
The second mask may be a mask corresponding to the caption "W S Y T K L D G S Y D Z M" shown in FIG. 6C.

以下では、本出願の一実施形態による、電子デバイス１００の構造について説明する。 The structure of the electronic device 100 according to one embodiment of the present application is described below.

図９は、本出願の一実施形態による、電子デバイス１００の構成の一例を示す。 Figure 9 shows an example of the configuration of an electronic device 100 according to one embodiment of the present application.

図９に示すように、電子デバイス１００は、プロセッサ１１０、外部メモリインターフェース１２０、内部メモリ１２１、ユニバーサルシリアルバス（universal serial bus、ＵＳＢ）インターフェース１３０、充電管理モジュール１４０、電力管理モジュール１４１、バッテリ１４２、アンテナ１、アンテナ２、モバイル通信モジュール１５０、ワイヤレス通信モジュール１６０、オーディオモジュール１７０、ラウドスピーカ１７０Ａ、レシーバ１７０Ｂ、マイクロフォン１７０Ｃ、ヘッドセットジャック１７０Ｄ、センサモジュール１８０、ボタン１９０、モータ１９１、インジケータ１９２、カメラ１９３、ディスプレイ１９４、及び加入者識別モジュール（subscriber identity module、ＳＩＭ）カードインターフェース１９５などを含み得る。センサモジュール１８０は、圧力センサ１８０Ａ、ジャイロセンサ１８０Ｂ、気圧センサ１８０Ｃ、磁気センサ１８０Ｄ、加速度センサ１８０Ｅ、距離センサ１８０Ｆ、光近接センサ１８０Ｇ、指紋センサ１８０Ｈ、温度センサ１８０Ｊ、タッチセンサ１８０Ｋ、環境光センサ１８０Ｌ、骨伝導センサ１８０Ｍなどを含み得る。 As shown in FIG. 9, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a loudspeaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, an optical proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

本出願のこの実施形態において示される構造は、電子デバイス１００に対する特定の限定を構成しないことは理解され得る。本出願のいくつかの他の実施形態では、電子デバイス１００は、図に示されるものよりも多い又は少ない構成要素を含んでもよいし、いくつかの構成要素が組み合わされてもよいし、いくつかの構成要素が分割されてもよいし、異なる構成要素配置が使用されてもよい。図に示される構成要素は、ハードウェア、ソフトウェア、又はソフトウェアとハードウェアの組合せで実装され得る。 It may be understood that the structure shown in this embodiment of the present application does not constitute a specific limitation on the electronic device 100. In some other embodiments of the present application, the electronic device 100 may include more or fewer components than those shown in the figures, some components may be combined, some components may be divided, or different component arrangements may be used. The components shown in the figures may be implemented in hardware, software, or a combination of software and hardware.

プロセッサ１１０は、１つ又は複数の処理ユニットを含み得る。例えば、プロセッサ１１０は、アプリケーションプロセッサ（application processor、ＡＰ）、モデムプロセッサ、グラフィックス処理ユニット（graphics processing unit、ＧＰＵ）、画像信号プロセッサ（image signal processor、ＩＳＰ）、コントローラ、メモリ、ビデオコーデック、デジタルシグナルプロセッサ（digital signal processor、ＤＳＰ）、ベースバンドプロセッサ、及び／又はニューラルネットワーク処理ユニット（neural-network processing unit、ＮＰＵ）などを含み得る。異なる処理ユニットは、別個のデバイスであってもよいし、１つ又は複数のプロセッサに統合されてもよい。 The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.

コントローラは、電子デバイス１００の神経中枢及び指令中枢であってもよい。コントローラは、命令フェッチ及び実行に関する制御を実行するために、命令オペレーションコード及びタイミング信号に基づいてオペレーション制御信号を生成し得る。 The controller may be the nerve center and command center of the electronic device 100. The controller may generate operation control signals based on instruction operation codes and timing signals to exercise control over instruction fetching and execution.

メモリが、プロセッサ１１０に更に配置され得、命令及びデータを記憶するように構成される。いくつかの実施形態では、プロセッサ１１０内のメモリはキャッシュである。メモリは、プロセッサ１１０によって使用されたばかりの、又は周期的に使用される命令又はデータを記憶し得る。プロセッサ１１０が命令又はデータを再び使用する必要がある場合、プロセッサ１１０は、メモリから命令又はデータを直接呼び出し得る。これにより、繰り返しのアクセスが回避され、プロセッサ１１０の待ち時間が短縮され、システム効率を向上させる。 Memory may further be disposed in the processor 110 and configured to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory may store instructions or data that have just been used or that are used periodically by the processor 110. When the processor 110 needs to use the instructions or data again, the processor 110 may retrieve the instructions or data directly from the memory. This avoids repeated accesses, reducing the latency of the processor 110 and improving system efficiency.

いくつかの実施形態では、プロセッサ１１０は、１つ又は複数のインターフェースを含み得る。インターフェースは、集積回路間（inter-integrated circuit、Ｉ２Ｃ）インターフェース、集積回路間サウンド（inter-integrated circuit sound、Ｉ２Ｓ）インターフェース、パルス符号変調（pulse code modulation、ＰＣＭ）インターフェース、汎用非同期送受信回路（universal asynchronous receiver/transmitter、ＵＡＲＴ）インターフェース、モバイルインダストリープロセッサインターフェース（mobile industry processor interface、ＭＩＰＩ）、汎用入出力（general-purpose input/output、ＧＰＩＯ）インターフェース、加入者識別モジュール（subscriber identity module、ＳＩＭ）インターフェース、ユニバーサルシリアルバス（universal serial bus、ＵＳＢ）ポート、及び／又は同様のものを含み得る。 In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, a universal serial bus (USB) port, and/or the like.

Ｉ２Ｃインターフェースは、シリアルデータライン（serial data line、ＳＤＡ）及びシリアルクロックライン（serial clock line、ＳＣＬ）を含む双方向同期シリアルバスである。いくつかの実施形態では、プロセッサ１１０は、複数のＩ２Ｃバスを含み得る。プロセッサ１１０は、異なるＩ２Ｃバスインターフェースを介して、タッチセンサ１８０Ｋ、充電器、カメラフラッシュ、カメラ１９３などにそれぞれ結合され得る。例えば、プロセッサ１１０は、Ｉ２Ｃインターフェースを介してタッチセンサ１８０Ｋに結合されてもよく、プロセッサ１１０及びタッチセンサ１８０Ｋが、Ｉ２Ｃバスインターフェースを介して互いに通信し、それによって、電子デバイス１００のタッチ機能を実装する。 The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the camera flash, the camera 193, etc., via different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K via an I2C interface, and the processor 110 and the touch sensor 180K communicate with each other via the I2C bus interface, thereby implementing the touch function of the electronic device 100.

Ｉ２Ｓインターフェースは、オーディオ通信に使用され得る。いくつかの実施形態では、プロセッサ１１０は、複数のＩ２Ｓバスを含み得る。プロセッサ１１０は、プロセッサ１１０とオーディオモジュール１７０との間の通信を実現するために、Ｉ２Ｓバスを介してオーディオモジュール１７０に結合され得る。いくつかの実施形態では、オーディオモジュール１７０は、Ｂｌｕｅｔｏｏｔｈ（登録商標）ヘッドセットを介して呼に応答する機能を実装するために、Ｉ２Ｓインターフェースを介してオーディオ信号をワイヤレス通信モジュール１６０に送信し得る。 The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple I2S buses. The processor 110 may be coupled to the audio module 170 via the I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 via the I2S interface to implement a function for answering a call via a Bluetooth headset.

ＰＣＭインターフェースもまた、オーディオ通信に使用され得、アナログ信号に対してサンプリング、量子化、及び符号化を実行し得る。いくつかの実施形態では、オーディオモジュール１７０及びワイヤレス通信モジュール１６０は、ＰＣＭインターフェースを介して結合され得る。いくつかの実施形態では、オーディオモジュール１７０はまた、Ｂｌｕｅｔｏｏｔｈ（登録商標）ヘッドセットを介して呼に応答する機能を実装するために、ＰＣＭインターフェースを介してオーディオ信号をワイヤレス通信モジュール１６０に送信し得る。Ｉ２ＳインターフェースとＰＣＭインターフェースの両方が、オーディオ通信に使用され得る。 The PCM interface may also be used for audio communication and may perform sampling, quantization, and encoding on an analog signal. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled via a PCM interface. In some embodiments, the audio module 170 may also send audio signals to the wireless communication module 160 via the PCM interface to implement the function of answering a call via a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
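
The three PCM steps named above (sampling, quantization, encoding) can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `pcm_encode`, the 8-bit depth, the 8 kHz sample rate, and the linear (uniform) quantizer are all assumptions chosen only for illustration.

```python
import math

def pcm_encode(signal, sample_rate_hz, duration_s, bits=8):
    """Sample a continuous signal, quantize each sample uniformly,
    and encode it as an unsigned integer code (hypothetical helper)."""
    levels = 2 ** bits
    sample_count = round(sample_rate_hz * duration_s)
    codes = []
    for i in range(sample_count):
        t = i / sample_rate_hz                      # sampling instant
        x = max(-1.0, min(1.0, signal(t)))          # clamp to [-1, 1]
        code = round((x + 1.0) / 2.0 * (levels - 1))  # quantization
        codes.append(code)                          # encoding as an integer
    return codes

# A 1 kHz tone sampled at 8 kHz for 1 ms yields 8 codes.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
codes = pcm_encode(tone, sample_rate_hz=8000, duration_s=0.001)
```

Real telephony PCM typically uses companded (non-uniform) quantization; the uniform quantizer here only keeps the sketch short.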

ＵＡＲＴインターフェースは、ユニバーサルシリアルデータバスであり、非同期通信に使用される。バスは、双方向通信バスであり得る。これは、シリアル通信とパラレル通信との間で、送信されるべきデータを変換する。いくつかの実施形態では、ＵＡＲＴインターフェースは、通常、プロセッサ１１０とワイヤレス通信モジュール１６０とを接続するように構成される。例えば、プロセッサ１１０は、Ｂｌｕｅｔｏｏｔｈ（登録商標）機能を実装するために、ＵＡＲＴインターフェースを介してワイヤレス通信モジュール１６０中のＢｌｕｅｔｏｏｔｈ（登録商標）モジュールと通信する。いくつかの実施形態では、オーディオモジュール１７０は、Ｂｌｕｅｔｏｏｔｈ（登録商標）ヘッドセットを介して音楽を再生する機能を実装するために、ＵＡＲＴインターフェースを介してオーディオ信号をワイヤレス通信モジュール１６０に送信し得る。 The UART interface is a universal serial data bus and is used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is typically configured to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through the UART interface to implement a Bluetooth function. In some embodiments, the audio module 170 may send audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
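
The conversion between parallel and serial data described above can be sketched as follows. The application does not specify a frame format; this example assumes the common 8N1 framing (one start bit, eight data bits sent least significant bit first, no parity, one stop bit), and the names `uart_frame` and `uart_unframe` are hypothetical.

```python
def uart_frame(byte):
    """Parallel-to-serial: turn one byte into an assumed 8N1 bit sequence."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]                     # start bit, data, stop bit

def uart_unframe(bits):
    """Serial-to-parallel: recover the byte from a 10-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

frame = uart_frame(0x41)        # ASCII 'A'
restored = uart_unframe(frame)  # back to 0x41
```

Because the link is asynchronous, no clock line is shared; the receiver relies on the start bit and an agreed baud rate to locate each data bit.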

ＭＩＰＩインターフェースは、プロセッサ１１０と、ディスプレイ１９４及びカメラ１９３のような周辺デバイスとを接続するように構成され得る。ＭＩＰＩインターフェースは、カメラシリアルインターフェース（camera serial interface、ＣＳＩ）、ディスプレイシリアルインターフェース（display serial interface、ＤＳＩ）などを含む。いくつかの実施形態では、プロセッサ１１０は、電子デバイス１００の撮影機能を実装するために、ＣＳＩインターフェースを介してカメラ１９３と通信する。プロセッサ１１０は、電子デバイス１００の表示機能を実装するために、ＤＳＩインターフェースを介してディスプレイ１９４と通信する。 The MIPI interface may be configured to connect the processor 110 to peripheral devices such as the display 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 via the CSI interface to implement the image capture function of the electronic device 100. The processor 110 communicates with the display 194 via the DSI interface to implement the display function of the electronic device 100.

ＧＰＩＯインターフェースは、ソフトウェアによって構成され得る。ＧＰＩＯインターフェースは、制御信号又はデータ信号として構成され得る。いくつかの実施形態では、ＧＰＩＯインターフェースは、プロセッサ１１０を、カメラ１９３、ディスプレイ１９４、ワイヤレス通信モジュール１６０、オーディオモジュール１７０、センサモジュール１８０などに接続するように構成され得る。ＧＰＩＯインターフェースは、Ｉ２Ｃインターフェース、Ｉ２Ｓインターフェース、ＵＡＲＴインターフェース、ＭＩＰＩインターフェースなどとして更に構成され得る。 The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or a data signal. In some embodiments, the GPIO interface may be configured to connect the processor 110 to the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, etc. The GPIO interface may be further configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.

ＵＳＢインターフェース１３０は、ＵＳＢ標準仕様に準拠したインターフェースであり、具体的には、ミニＵＳＢインターフェース、マイクロＵＳＢインターフェース、ＵＳＢＴｙｐｅ－Ｃインターフェースなどであり得る。ＵＳＢインターフェース１３０は、充電器に接続して電子デバイス１００を充電するように構成され得、また、電子デバイス１００と周辺デバイスとの間でデータを送信するように構成され得る。それはまた、ヘッドセットを介してオーディオを再生するために、ヘッドセットに接続するように構成され得る。インターフェースはまた、ＡＲデバイスなどの別の端末デバイスに接続するように構成され得る。 The USB interface 130 is an interface that complies with the USB standard specifications, and specifically may be a mini USB interface, a micro USB interface, a USB Type-C interface, etc. The USB interface 130 may be configured to connect to a charger to charge the electronic device 100, and may also be configured to transmit data between the electronic device 100 and a peripheral device. It may also be configured to connect to a headset to play audio through the headset. The interface may also be configured to connect to another terminal device, such as an AR device.

本出願のこの実施形態において示されるモジュール間のインターフェース接続関係は、例示的な説明にすぎず、電子デバイス１００の構造に対する限定を構成しないことが理解され得る。本出願のいくつかの他の実施形態では、電子デバイス１００は、代替的に、前述の実施形態とは異なるインターフェース接続方式を使用してもよいし、複数のインターフェース接続方式の組合せを使用してもよい。 It may be understood that the interface connection relationships between modules shown in this embodiment of the present application are merely exemplary illustrations and do not constitute limitations on the structure of the electronic device 100. In some other embodiments of the present application, the electronic device 100 may alternatively use a different interface connection scheme than the aforementioned embodiment or may use a combination of multiple interface connection schemes.

充電管理モジュール１４０は、充電器から充電入力を受信するように構成される。充電器は、ワイヤレス充電器又はワイヤード充電器であり得る。いくつかのワイヤード充電実施形態では、充電管理モジュール１４０は、ＵＳＢインターフェース１３０を介して有線充電器から充電入力を受信し得る。いくつかのワイヤレス充電実施形態では、充電管理モジュール１４０は、電子デバイス１００のワイヤレス充電コイルを介してワイヤレス充電入力を受信し得る。充電管理モジュール１４０は、更に、バッテリ１４２を充電しながら、電力管理モジュール１４１を介して電子デバイス１００に電力を供給し得る。 The charging management module 140 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive a charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input via a wireless charging coil of the electronic device 100. The charging management module 140 may further provide power to the electronic device 100 via the power management module 141 while charging the battery 142.

電力管理モジュール１４１は、バッテリ１４２、充電管理モジュール１４０、及びプロセッサ１１０に接続するように構成される。電力管理モジュール１４１は、バッテリ１４２及び／又は充電管理モジュール１４０の入力を受信し、プロセッサ１１０、内部メモリ１２１、外部メモリ、ディスプレイ１９４、カメラ１９３、ワイヤレス通信モジュール１６０などに電力を供給する。電力管理モジュール１４１は、バッテリ容量、バッテリサイクル量、及びバッテリヘルスステータス（漏電及びインピーダンス）などのパラメータを監視するように更に構成され得る。いくつかの他の実施形態では、電力管理モジュール１４１はまた、プロセッサ１１０内に設けられ得る。いくつかの他の実施形態では、電力管理モジュール１４１及び充電管理モジュール１４０は、同じデバイス内に設けられ得る。 The power management module 141 is configured to connect to the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives inputs from the battery 142 and/or the charging management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may be further configured to monitor parameters such as battery capacity, battery cycle amount, and battery health status (leakage and impedance). In some other embodiments, the power management module 141 may also be provided within the processor 110. In some other embodiments, the power management module 141 and the charging management module 140 may be provided within the same device.

電子デバイス１００のワイヤレス通信機能は、アンテナ１、アンテナ２、モバイル通信モジュール１５０、ワイヤレス通信モジュール１６０、モデムプロセッサ、ベースバンドプロセッサなどを介して具現され得る。 The wireless communication function of the electronic device 100 may be embodied via antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, a modem processor, a baseband processor, etc.

アンテナ１及びアンテナ２は、電磁波信号を送受信するように構成される。電子デバイス１００内の各アンテナは、１つ又は複数の通信周波数帯域をカバーするように構成され得る。アンテナ利用率を向上させるために、異なるアンテナが更に多重化され得る。例えば、アンテナ１は、ワイヤレスローカルエリアネットワークにおけるダイバーシティアンテナとして多重化され得る。いくつかの他の実施形態では、アンテナは、同調スイッチと併用され得る。 Antenna 1 and Antenna 2 are configured to transmit and receive electromagnetic signals. Each antenna in electronic device 100 may be configured to cover one or more communication frequency bands. Different antennas may be further multiplexed to improve antenna utilization. For example, Antenna 1 may be multiplexed as a diversity antenna in a wireless local area network. In some other embodiments, the antennas may be used in conjunction with a tuning switch.

モバイル通信モジュール１５０は、電子デバイス１００に適用されるソリューションを２Ｇ、３Ｇ、４Ｇ、５Ｇなどを含むワイヤレス通信に提供し得る。モバイル通信モジュール１５０は、少なくとも１つのフィルタ、スイッチ、電力増幅器、低雑音増幅器（low noise amplifier、ＬＮＡ）などを含み得る。モバイル通信モジュール１５０は、アンテナ１を介して電磁波を受信し、受信された電磁波に対してフィルタリング及び増幅などの処理を実行し、復調のために処理された電磁波をモデムプロセッサに送信し得る。モバイル通信モジュール１５０は、モデムプロセッサによって変調された信号を更に増幅し、放射にアンテナ１を使用することによってその信号を電磁波に変換し得る。いくつかの実施形態では、モバイル通信モジュール１５０の少なくともいくつかの機能モジュールは、プロセッサ１１０に配置され得る。いくつかの実施形態では、モバイル通信モジュール１５０の少なくともいくつかの機能モジュールは、プロセッサ１１０の少なくともいくつかのモジュールと同じデバイス内に配置され得る。 The mobile communication module 150 may provide a solution for wireless communication including 2G, 3G, 4G, 5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 may receive electromagnetic waves via the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and send the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 may further amplify the signal modulated by the modem processor and convert the signal to electromagnetic waves by using the antenna 1 for radiation. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be located in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be located in the same device as at least some of the modules of the processor 110.

モデムプロセッサは、変調器と復調器とを含み得る。変調器は、送信されるべき低周波数ベースバンド信号を中間周波数信号及び高周波数信号に変調するように構成される。復調器は、受信した電磁波信号を低周波数ベースバンド信号に復調するように構成される。次いで、復調器は、復調された低周波数ベースバンド信号を処理のためにベースバンドプロセッサに送信する。低周波数ベースバンド信号は、ベースバンドプロセッサによって処理された後、アプリケーションプロセッサに渡される。アプリケーションプロセッサは、オーディオデバイス（ラウドスピーカ１７０Ａ、レシーバ１７０Ｂなどに限定されない）を介して音声信号を出力したり、ディスプレイ１９４を介して画像又はビデオを表示したりする。いくつかの実施形態では、モデムプロセッサは別個のデバイスであり得る。いくつかの他の実施形態では、モデムプロセッサは、プロセッサ１１０とは無関係であり、モバイル通信モジュール１５０又は別の機能モジュールと同じデバイス内に設けられ得る。 The modem processor may include a modulator and a demodulator. The modulator is configured to modulate a low-frequency baseband signal to be transmitted into an intermediate frequency signal and a high-frequency signal. The demodulator is configured to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs an audio signal via an audio device (such as, but not limited to, loudspeaker 170A, receiver 170B, etc.) or displays an image or video via display 194. In some embodiments, the modem processor may be a separate device. In some other embodiments, the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or another functional module.

ワイヤレス通信モジュール１６０は、電子デバイス１００に適用されるワイヤレス通信ソリューションであって、ワイヤレスローカルエリアネットワーク（wireless local area networks、ＷＬＡＮ）（例えば、ワイヤレスフィデリティ（wireless fidelity、Ｗｉ－Ｆｉ）ネットワーク）、ブルートゥース（登録商標）（Bluetooth（登録商標）、ＢＴ）、全地球的航法衛星システム（global navigation satellite system、ＧＮＳＳ）、周波数変調（frequency modulation、ＦＭ）、近距離無線通信（near field communication、ＮＦＣ）技術、赤外線（infrared、ＩＲ）技術などを含むワイヤレス通信ソリューションを提供し得る。ワイヤレス通信モジュール１６０は、少なくとも１つの通信処理モジュールを統合する１つ又は複数の構成要素であり得る。ワイヤレス通信モジュール１６０は、アンテナ２を介して電磁波を受信し、電磁波信号に対して周波数変調及びフィルタリング処理を実行し、処理された信号をプロセッサ１１０に送信する。ワイヤレス通信モジュール１６０は更に、送信されるべき信号をプロセッサ１１０から受信し、信号に対して周波数変調及び増幅を実行し、アンテナ２を介して放射するために信号を電磁波に変換し得る。 The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC) technology, infrared (IR) technology, etc. The wireless communication module 160 may be one or more components integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on the electromagnetic wave signal, and transmits the processed signal to the processor 110. The wireless communication module 160 may further receive signals to be transmitted from the processor 110, perform frequency modulation and amplification on the signals, and convert the signals into electromagnetic waves for emission via the antenna 2.

いくつかの実施形態では、電子デバイス１００において、アンテナ１とモバイル通信モジュール１５０とが結合され、アンテナ２とワイヤレス通信モジュール１６０とが結合され、それにより、電子デバイス１００は、ワイヤレス通信技術を使用することによってネットワーク及び別のデバイスと通信することができる。ワイヤレス通信技術は、汎欧州デジタル移動電話方式（global system for mobile communications、ＧＳＭ）、汎用パケット無線サービス（general packet radio service、ＧＰＲＳ）、符号分割多元接続（code division multiple access、ＣＤＭＡ）、広帯域符号分割多重接続（wideband code division multiple access、ＷＣＤＭＡ（登録商標））、時分割符号分割多元接続（time-division code division multiple access、ＴＤ－ＳＣＤＭＡ）、ロングタームエボリューション（long term evolution、ＬＴＥ）、ブルートゥース（登録商標）（Bluetooth（登録商標）、ＢＴ）、ＧＮＳＳ、ＷＬＡＮ、ＮＦＣ、ＦＭ、ＩＲ技術、及び／又は同様のものを含み得る。ＧＮＳＳは、全地球位置決めシステム（global positioning system、ＧＰＳ）、全地球的航法衛星システム（global navigation satellite system、ＧＬＯＮＡＳＳ）、北斗衛星導航系統（BeiDou navigation satellite system、ＢＤＳ）、準天頂衛星システム（quasi-zenith satellite system、ＱＺＳＳ）、及び／又は静止衛星型補強システム（satellite based augmentation systems、ＳＢＡＳ）を含み得る。 In some embodiments, antenna 1 and mobile communication module 150 are coupled in electronic device 100, and antenna 2 and wireless communication module 160 are coupled in electronic device 100, such that electronic device 100 can communicate with a network and another device by using wireless communication technologies. Wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), Bluetooth (BT), GNSS, WLAN, NFC, FM, IR technologies, and/or the like. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

電子デバイス１００は、ＧＰＵ、ディスプレイ１９４、アプリケーションプロセッサなどを使用することによって表示機能を実装する。ＧＰＵは、画像処理用のマイクロプロセッサであり、ディスプレイ１９４及びアプリケーションプロセッサに接続される。ＧＰＵは、数学的及び幾何学的計算を実行し、画像をレンダリングするように構成される。プロセッサ１１０は、表示情報を生成又は変更するためのプログラム命令を実行する１つ又は複数のＧＰＵを含み得る。 The electronic device 100 implements display functionality by using a GPU, a display 194, an application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display 194 and the application processor. The GPU is configured to perform mathematical and geometric calculations and render images. The processor 110 may include one or more GPUs that execute program instructions to generate or modify display information.

ディスプレイ１９４は、画像、ビデオなどを表示するように構成される。ディスプレイ１９４は、表示パネルを含む。表示パネルは、液晶表示装置（liquid crystal display、ＬＣＤ）、有機発光ダイオード（organic light-emitting diode、ＯＬＥＤ）、アクティブマトリクス有機発光ダイオード（active-matrix organic light emitting diode、ＡＭＯＬＥＤ）、フレキシブル発光ダイオード（flexible light-emitting diode、ＦＬＥＤ）、ミニＬＥＤ、マイクロＬＥＤ、マイクロＯＬＥＤ、量子ドット発光ダイオード（quantum dot light emitting diodes、ＱＬＥＤ）などであり得る。いくつかの実施形態では、電子デバイス１００は、１つ又はＮ個のディスプレイ１９４を含み得、Ｎは、１よりも大きい正の整数である。 The display 194 is configured to display images, videos, and the like. The display 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, a quantum dot light-emitting diodes (QLED), and the like. In some embodiments, the electronic device 100 can include one or N displays 194, where N is a positive integer greater than 1.

電子デバイス１００は、ＩＳＰ、カメラ１９３、ビデオコーデック、ＧＰＵ、ディスプレイ１９４、アプリケーションプロセッサなどを通じて撮影機能を実装し得る。 The electronic device 100 may implement the imaging function through an ISP, a camera 193, a video codec, a GPU, a display 194, an application processor, etc.

ＩＳＰは、カメラ１９３によってフィードバックされたデータを処理するように構成される。例えば、撮影時には、シャッターが開かれ、光線がレンズを介してカメラの感光素子に送信される。光信号は電気信号に変換される。カメラの感光素子は、電気信号を処理のためにＩＳＰに送信し、電気信号を可視画像に変換する。ＩＳＰは、画像ノイズ、輝度、及び肌の色合いに対してアルゴリズム最適化を更に実行し得る。ＩＳＰは、撮影シナリオにおける露出及び色温度などのパラメータを更に最適化することができる。いくつかの実施形態では、ＩＳＰは、カメラ１９３内に配置され得る。 The ISP is configured to process data fed back by the camera 193. For example, when taking a picture, a shutter is opened and a light beam is sent through a lens to the camera's photosensitive elements. The light signal is converted into an electrical signal. The camera's photosensitive elements send the electrical signal to the ISP for processing, which converts the electrical signal into a visible image. The ISP may further perform algorithmic optimization for image noise, brightness, and skin tone. The ISP may further optimize parameters such as exposure and color temperature in the picture taking scenario. In some embodiments, the ISP may be located within the camera 193.

カメラ１９３は、静止画像又はビデオをキャプチャするように構成される。物体の光学像は、レンズを通して生成され、感光素子上に投影される。感光素子は、電荷結合素子（charge-coupled device、ＣＣＤ）又は相補型金属酸化膜半導体（complementary metal-oxide-semiconductor、ＣＭＯＳ）フォトトランジスタであり得る。感光素子は、光信号を電気信号に変換し、その後、電気信号をＩＳＰに送信してデジタル画像信号に変換する。ＩＳＰは、デジタル画像信号を処理のためにＤＳＰに出力する。ＤＳＰは、デジタル画像信号を標準的なＲＧＢ又はＹＵＶなどのフォーマットの画像信号に変換する。いくつかの実施形態では、電子デバイス１００は、１つ又はＮ個のカメラ１９３を含み得、Ｎは、１よりも大きい正の整数である。 The camera 193 is configured to capture still images or video. An optical image of an object is generated through a lens and projected onto a photosensitive element. The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then sent to the ISP for conversion into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a format such as standard RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.

デジタルシグナルプロセッサは、デジタル信号を処理するように構成され、デジタル画像信号に加えて別のデジタル信号を処理することもある。例えば、電子デバイス１００が周波数を選択する場合、デジタルシグナルプロセッサは、周波数エネルギーに対してフーリエ変換などを実行するように構成される。 The digital signal processor is configured to process digital signals, and may process other digital signals in addition to the digital image signal. For example, when the electronic device 100 selects a frequency, the digital signal processor is configured to perform a Fourier transform or the like on the frequency energy.
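
As a rough illustration of using a Fourier transform to examine frequency energy, the following Python sketch computes a plain discrete Fourier transform and picks the bin with the most energy. The application does not describe the DSP's actual algorithm; the function names `dft_energy` and `dominant_bin` and the direct O(n²) DFT are assumptions made for this example only.

```python
import cmath
import math

def dft_energy(samples):
    """Return the energy |X[k]|^2 of every DFT bin of a real sample list."""
    n = len(samples)
    energies = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energies.append(abs(coeff) ** 2)
    return energies

def dominant_bin(samples):
    """Return the index (1..n/2) of the non-DC bin carrying the most energy."""
    energies = dft_energy(samples)
    half = energies[1 : len(samples) // 2 + 1]
    return 1 + max(range(len(half)), key=half.__getitem__)

# A tone completing 3 cycles over 32 samples concentrates its energy in bin 3.
samples = [math.sin(2 * math.pi * 3 * i / 32) for i in range(32)]
strongest = dominant_bin(samples)
```

A production DSP would use a fast Fourier transform instead of this direct summation; the result is the same, only the cost differs.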

ビデオコーデックは、デジタルビデオを圧縮又は解凍するように構成される。電子デバイス１００は、１つ又は複数のビデオコーデックをサポートし得る。このようにして、電子デバイス１００は、複数の符号化フォーマット、例えば、ＭＰＥＧ（moving picture experts group）－１、ＭＰＥＧ－２、ＭＰＥＧ－３、及びＭＰＥＧ－４でビデオを再生又は記録することができる。 A video codec is configured to compress or decompress digital video. Electronic device 100 may support one or more video codecs. In this manner, electronic device 100 can play or record video in multiple encoding formats, e.g., moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.

ＮＰＵは、ニューラルネットワーク（neural-network、ＮＮ）コンピューティングプロセッサであり、人間の脳内のニューロン間の伝達モードなどの生物学的ニューラルネットワーク構造をシミュレートして、入力情報に対して高速処理を実行し、連続的な自己学習を実行することができる。ＮＰＵは、電子デバイス１００の知的認識、例えば、画像認識、顔認識、音声認識、及びテキスト理解などのアプリケーションを実装し得る。 The NPU is a neural-network (NN) computing processor that can simulate a biological neural network structure, such as the communication mode between neurons in the human brain, to perform high-speed processing on input information and perform continuous self-learning. The NPU can implement applications such as intelligent recognition of the electronic device 100, such as image recognition, face recognition, speech recognition, and text understanding.

外部メモリインターフェース１２０は、電子デバイス１００の記憶能力を拡張するために、外部メモリカード、例えば、マイクロＳＤカードに接続するように構成され得る。外部メモリカードは、データ記憶機能を実装するために、外部メモリインターフェース１２０を使用することによってプロセッサ１１０と通信する。外部記憶カードには、例えば音楽及びビデオなどのファイルが記憶される。 The external memory interface 120 may be configured to connect to an external memory card, e.g., a microSD card, to expand the storage capabilities of the electronic device 100. The external memory card communicates with the processor 110 by using the external memory interface 120 to implement data storage functions. The external memory card stores files such as music and videos.

内部メモリ１２１は、コンピュータ実行可能プログラムコードを記憶するように構成され得、実行可能プログラムコードは命令を含む。プロセッサ１１０は、内部メモリ１２１に記憶された命令を実行することによって、電子デバイス１００の様々な機能アプリケーション及びデータ処理を実行する。内部メモリ１２１は、プログラム記憶エリア及びデータ記憶エリアを含み得る。プログラム記憶エリアは、オペレーティングシステム、少なくとも１つの機能（音声再生機能及び画像再生機能など）に必要なアプリケーションなどを記憶し得る。データ記憶エリアは、電子デバイス１００の使用に基づいて作成されたデータ（音声データ及び電話帳など）などを記憶し得る。加えて、内部メモリ１２１は、高速ランダムアクセスメモリを含み得、不揮発性メモリ、例えば、少なくとも１つの磁気記憶デバイス、フラッシュメモリデバイス、及びユニバーサルフラッシュストレージ（universal flash storage、ＵＦＳ）を更に含み得る。 The internal memory 121 may be configured to store computer executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, applications necessary for at least one function (such as a voice playback function and an image playback function), and the like. The data storage area may store data (such as voice data and a phone book) created based on the use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic storage device, a flash memory device, and a universal flash storage (UFS).

電子デバイス１００は、オーディオモジュール１７０、ラウドスピーカ１７０Ａ、レシーバ１７０Ｂ、マイクロフォン１７０Ｃ、ヘッドセットジャック１７０Ｄ、アプリケーションプロセッサなどを介して、音楽再生及び録音などのオーディオ機能を実装し得る。 The electronic device 100 may implement audio functions such as music playback and recording via an audio module 170, a loudspeaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, an application processor, etc.

オーディオモジュール１７０は、デジタルオーディオ情報をアナログオーディオ信号出力に変換するように構成され、アナログオーディオ入力をデジタルオーディオ信号に変換するようにも構成される。オーディオモジュール１７０は、オーディオ信号を符号化及び復号するように更に構成され得る。いくつかの実施形態では、オーディオモジュール１７０は、プロセッサ１１０内に配置されてもよく、又はオーディオモジュール１７０内のいくつかの機能モジュールは、プロセッサ１１０内に配置される。 Audio module 170 is configured to convert digital audio information to an analog audio signal output, and is also configured to convert an analog audio input to a digital audio signal. Audio module 170 may be further configured to encode and decode audio signals. In some embodiments, audio module 170 may be located within processor 110, or some functional modules within audio module 170 are located within processor 110.

ラウドスピーカ１７０Ａは、「ラウドスピーカ」とも呼ばれ、オーディオ電気信号を音声信号に変換するように構成される。電子デバイス１００は、ラウドスピーカ１７０Ａを通じてハンズフリーモードで音楽を聴いたり呼に応答したりし得る。 The loudspeaker 170A, also referred to as a "loudspeaker," is configured to convert audio electrical signals into voice signals. The electronic device 100 may listen to music or answer calls in a hands-free mode through the loudspeaker 170A.

レシーバ１７０Ｂは、「イヤピース」とも呼ばれ、オーディオ電気信号を音声信号に変換するように構成される。電子デバイス１００を介して呼に応答したり音声メッセージを受けたりする場合、レシーバ１７０Ｂを人の耳に近づけて音声を聞くことができる。 Receiver 170B, also known as an "earpiece," is configured to convert audio electrical signals into speech signals. When answering a call or receiving a voice message via electronic device 100, receiver 170B can be held close to a person's ear to hear the audio.

マイクロフォン１７０Ｃは、「マイク（mike）」又は「マイク（mic）」とも呼ばれ、音声信号を電気信号に変換するように構成される。ユーザは、電話をかけたり音声メッセージを送信したりする際に、人の口をマイクロフォン１７０Ｃに近づけて音を発して、その音声信号をマイクロフォン１７０Ｃに入力し得る。少なくとも１つのマイクロフォン１７０Ｃが電子デバイス１００内に配置され得る。いくつかの他の実施形態では、音声信号を収集することに加えて、ノイズリダクション機能を実装するために、２つのマイクロフォン１７０Ｃが電子デバイス１００内に配置され得る。いくつかの他の実施形態では、代替的に、３つ、４つ、又はそれ以上のマイクロフォン１７０Ｃを電子デバイス１００内に配置して、音声信号を収集し、ノイズリダクションを実装し、音源を識別し、それにより、指向性録音機能などを実装してもよい。 The microphone 170C is also called a "mike" or "mic" and is configured to convert a voice signal into an electrical signal. When making a phone call or sending a voice message, a user may put a person's mouth close to the microphone 170C and make a sound to input the voice signal into the microphone 170C. At least one microphone 170C may be disposed in the electronic device 100. In some other embodiments, in addition to collecting voice signals, two microphones 170C may be disposed in the electronic device 100 to implement noise reduction functions. In some other embodiments, three, four, or more microphones 170C may alternatively be disposed in the electronic device 100 to collect voice signals, implement noise reduction, identify sound sources, and thereby implement directional recording functions, etc.

ヘッドセットジャック１７０Ｄは、ワイヤードヘッドセットに接続するように構成される。ヘッドセットジャック１７０Ｄは、ＵＳＢインターフェース１３０であってもよいし、３．５ｍｍのオープンモバイル端末プラットフォーム（open mobile terminal platform、ＯＭＴＰ）標準インターフェース若しくは米国セルラー電気通信産業協会（Cellular Telecommunications Industry Association of the USA、ＣＴＩＡ）標準インターフェースであってもよい。 The headset jack 170D is configured to connect to a wired headset. The headset jack 170D may be a USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

圧力センサ１８０Ａは、圧力信号を感知するように構成され、圧力信号を電気信号に変換することができる。いくつかの実施形態では、圧力センサ１８０Ａは、ディスプレイ１９４内に配置され得る。圧力センサ１８０Ａには、抵抗型圧力センサ、誘導型圧力センサ、及び容量性圧力センサなど、多くのタイプがある。容量性圧力センサは、導電性材料を有する少なくとも２つの平行なプレートを含み得る。圧力センサ１８０Ａに力が加わると、電極間の静電容量が変化する。電子デバイス１００は、静電容量の変化に基づいて圧力の強さを決定する。ディスプレイ１９４に対してタッチ操作が行われると、電子デバイス１００は、圧力センサ１８０Ａを使用することによってタッチ操作の強度を検出する。電子デバイス１００はまた、圧力センサ１８０Ａの検出信号に基づいてタッチ位置を計算し得る。いくつかの実施形態では、同じタッチ位置に対して実行されるが、異なるタッチ操作強度を有するタッチ操作は、異なる操作命令に対応し得る。例えば、第１の圧力閾値未満のタッチ操作強度を有するタッチ操作がショートメッセージアプリケーションアイコンに対して実行されると、ショートメッセージを閲覧する命令が実行される。第１の圧力閾値以上のタッチ操作強度を有するタッチ操作がショートメッセージアプリケーションアイコンに対して実行されると、新しいショートメッセージを作成する命令が実行される。 The pressure sensor 180A is configured to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed within the display 194. There are many types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. The capacitive pressure sensor may include at least two parallel plates having a conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure based on the change in capacitance. When a touch operation is performed on the display 194, the electronic device 100 detects the strength of the touch operation by using the pressure sensor 180A. The electronic device 100 may also calculate a touch position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations performed on the same touch position but with different touch operation strengths may correspond to different operation instructions. For example, when a touch operation with a touch operation strength less than a first pressure threshold is performed on a short message application icon, an instruction to view a short message is executed. When a touch operation having a touch operation strength equal to or greater than the first pressure threshold is performed on the short message application icon, a command to create a new short message is executed.
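
By way of a non-limiting illustration, the threshold behavior described above may be sketched as follows. The threshold value, the function name `instruction_for_touch`, and the instruction strings are hypothetical and are chosen only for this example; the embodiment does not specify them.

```python
# Hypothetical threshold; the description does not give a concrete value.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_touch(target: str, strength: float) -> str:
    """Return the instruction for a touch on `target` with the given strength."""
    if target != "short_message_icon":
        return "no_op"
    # Strength below the first pressure threshold: view the short message.
    if strength < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"
    # Strength equal to or above the threshold: create a new short message.
    return "create_short_message"
```

In this sketch the same touch position yields different instructions depending only on the measured touch operation strength.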

ジャイロセンサ１８０Ｂは、電子デバイス１００の動き姿勢を決定するように構成され得る。いくつかの実施形態では、３つの軸（ｘ、ｙ、及びｚ軸）の周りの電子デバイス１００の角速度は、ジャイロセンサ１８０Ｂを使用することによって決定され得る。ジャイロセンサ１８０Ｂは、手振れ補正のために使用され得る。例えば、シャッターが押されると、ジャイロセンサ１８０Ｂは、電子デバイス１００がジッタする角度を検出し、この角度に基づいて、レンズモジュールが補償する必要がある距離を計算し、レンズが逆の動きを通じて電子デバイス１００の揺らぎを打ち消すことを可能にして、画像安定化を実現する。ジャイロセンサ１８０Ｂは、ナビゲーションシナリオ及び動き感知ゲームシナリオで更に使用され得る。 The gyro sensor 180B may be configured to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (x, y, and z) may be determined by using the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization. For example, when a shutter is pressed, the gyro sensor 180B detects the angle at which the electronic device 100 jitters, and based on this angle, calculates the distance that the lens module needs to compensate, allowing the lens to counteract the jitter of the electronic device 100 through a reverse motion to achieve image stabilization. The gyro sensor 180B may further be used in navigation scenarios and motion-sensing gaming scenarios.

気圧センサ１８０Ｃは、気圧を測定するように構成される。いくつかの実施形態では、電子デバイス１００は、位置決め及びナビゲーションを支援するために、気圧センサ１８０Ｃによって測定された気圧値を使用することによって高度を計算する。 The air pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates altitude by using the air pressure values measured by the air pressure sensor 180C to aid in positioning and navigation.
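
By way of a non-limiting illustration, altitude may be estimated from a measured air pressure value as sketched below. The embodiment does not state which formula is used; this sketch assumes the commonly used international barometric approximation for the lower atmosphere, and the function name and the default sea-level pressure of 1013.25 hPa are assumptions of this example.

```python
def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = 1013.25) -> float:
    """Approximate altitude in metres from air pressure in hPa.

    Uses the international barometric approximation (an assumption of
    this sketch, not a formula given in the description).
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

A lower measured pressure yields a higher estimated altitude, which may then be used to aid positioning and navigation.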

磁気センサ１８０Ｄは、ホールセンサを含む。電子デバイス１００は、磁気センサ１８０Ｄを使用することによってフリップホルスタの開閉を検出し得る。いくつかの実施形態では、電子デバイス１００がフリップデバイスであるとき、電子デバイス１００は、磁気センサ１８０Ｄを使用することによってフリップの開閉を検出し得る。更に、検出されたレザーケースの開閉状態又はフリップカバーの開閉状態に基づいて、フリップカバーを開けたときに自動的にロック解除されるなどの機能が設定される。 The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of the flip holster by using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip device, the electronic device 100 may detect the opening and closing of the flip by using the magnetic sensor 180D. Furthermore, based on the detected opening and closing state of the leather case or the opening and closing state of the flip cover, a function such as automatically unlocking when the flip cover is opened is set.

加速度センサ１８０Ｅは、電子デバイス１００の様々な方向（通常は３軸）の加速度を検出し得る。電子デバイス１００が静止している場合、重力の大きさ及び方向が検出され得る。加速度センサ１８０Ｅは、電子デバイス１００の姿勢を識別するように更に構成されてもよく、歩数計及びランドスケープモードとポートレートモードとの間の切り替えなどの用途に適用される。 The acceleration sensor 180E can detect acceleration of the electronic device 100 in various directions (usually three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E may be further configured to identify the orientation of the electronic device 100, for applications such as a pedometer and switching between landscape and portrait modes.

距離センサ１８０Ｆは、距離を測定するように構成される。電子デバイス１００は、赤外線方式又はレーザー方式で距離を測定し得る。いくつかの実施形態では、撮影シナリオにおいて、電子デバイス１００は、距離センサ１８０Ｆを使用して距離を測定し、高速フォーカシングを達成し得る。 The distance sensor 180F is configured to measure distance. The electronic device 100 may measure distance in an infrared or laser manner. In some embodiments, in a photography scenario, the electronic device 100 may use the distance sensor 180F to measure distance and achieve fast focusing.

光近接センサ１８０Ｇは、例えば、発光ダイオード（ＬＥＤ）と、フォトダイオードなどの光検出器とを含み得る。発光ダイオードは、赤外線発光ダイオードであり得る。電子デバイス１００は、発光ダイオードを使用することによって外部に赤外光を発光する。電子デバイス１００は、フォトダイオードを使用して、周囲の物体からの赤外線反射光を検出する。豊富な反射光が検出された場合、電子デバイス１００の近傍に物体が存在すると決定し得る。反射光が十分に検出されない場合、電子デバイス１００は、電子デバイス１００の近くに物体がないと決定し得る。電子デバイス１００は、光近接センサ１８０Ｇを使用して、ユーザが通話のために電子デバイス１００を耳の近くに保持していることを検出して、ディスプレイを自動的にオフにして電力を節約し得る。光近接センサ１８０Ｇはまた、自動ロック解除及び画面ロックのために、ホルスターモード又はポケットモードで使用され得る。 The optical proximity sensor 180G may include, for example, a light emitting diode (LED) and a photodetector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light to the outside by using the light emitting diode. The electronic device 100 uses the photodiode to detect infrared reflected light from surrounding objects. If abundant reflected light is detected, it may be determined that an object is present in the vicinity of the electronic device 100. If insufficient reflected light is detected, the electronic device 100 may determine that no object is present in the vicinity of the electronic device 100. The electronic device 100 may use the optical proximity sensor 180G to detect when a user holds the electronic device 100 close to the ear for a call and automatically turn off the display to save power. The optical proximity sensor 180G may also be used in holster mode or pocket mode for automatic unlocking and screen locking.

環境光センサ１８０Ｌは、周囲光の明るさを感知するように構成される。電子デバイス１００は、感知された周囲光の明るさに基づいて、ディスプレイ１９４の輝度を適応的に調整し得る。環境光センサ１８０Ｌは、撮影中にホワイトバランスを自動的に調整するように更に構成され得る。環境光センサ１８０Ｌは、更に、光近接センサ１８０Ｇと協働して、電子デバイス１００がポケット内にあるかどうかを検出して、偶発的なタッチを防止し得る。 The ambient light sensor 180L is configured to sense the brightness of the ambient light. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the sensed brightness of the ambient light. The ambient light sensor 180L may further be configured to automatically adjust the white balance during capture. The ambient light sensor 180L may further cooperate with the optical proximity sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touch.

指紋センサ１８０Ｈは、指紋を採取するように構成される。電子デバイス１００は、収集された指紋の特徴を使用して、指紋ベースのロック解除、アプリケーションロックアクセス、指紋ベースの写真撮影、指紋ベースの着呼応答などを実施し得る。 Fingerprint sensor 180H is configured to capture a fingerprint. Electronic device 100 may use the captured fingerprint characteristics to perform fingerprint-based unlocking, application lock access, fingerprint-based photographing, fingerprint-based call answering, etc.

温度センサ１８０Ｊは、温度を検出するように構成される。いくつかの実施形態では、電子デバイス１００は、温度センサ１８０Ｊによって検出された温度に基づいて温度処理ポリシーを実行する。例えば、温度センサ１８０Ｊによって報告された温度が閾値を超えるとき、電子デバイス１００は、温度センサ１８０Ｊの近くに位置するプロセッサの性能を低減して、電力消費を低減し、熱保護を実施する。いくつかの他の実施形態では、温度が別の閾値よりも低いとき、電子デバイス１００は、バッテリ１４２を加熱して、低温によって引き起こされる電子デバイス１００の異常シャットダウンを回避する。いくつかの他の実施形態では、温度が更に別の閾値よりも低いとき、電子デバイス１００は、バッテリ１４２の出力電圧をブーストして、低温によって引き起こされる異常シャットダウンを回避する。 The temperature sensor 180J is configured to detect a temperature. In some embodiments, the electronic device 100 executes a temperature handling policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J to reduce power consumption and implement thermal protection. In some other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown of the electronic device 100 caused by a low temperature. In some other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by a low temperature.
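
By way of a non-limiting illustration, the temperature handling policy described above may be sketched as follows. The three threshold values, the class name `ThermalThresholds`, and the action strings are hypothetical; the description only states that such thresholds exist. The sketch also combines, in one function, behaviors that the description attributes to different embodiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ThermalThresholds:
    # All values are hypothetical, in degrees Celsius.
    throttle_above: float = 45.0
    heat_battery_below: float = 0.0
    boost_voltage_below: float = -10.0


def thermal_actions(temp_c: float,
                    t: ThermalThresholds = ThermalThresholds()) -> List[str]:
    """Return the protective actions for the reported temperature."""
    actions: List[str] = []
    if temp_c > t.throttle_above:
        # Reduce power consumption and implement thermal protection.
        actions.append("reduce_processor_performance")
    if temp_c < t.heat_battery_below:
        # Avoid an abnormal shutdown caused by low temperature.
        actions.append("heat_battery")
    if temp_c < t.boost_voltage_below:
        actions.append("boost_battery_output_voltage")
    return actions
```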

タッチセンサ１８０Ｋは、「タッチパネル」とも呼ばれる。タッチセンサ１８０Ｋは、ディスプレイ１９４上に配置され得、タッチセンサ１８０Ｋ及びディスプレイ１９４は、「タッチスクリーン」とも呼ばれるタッチスクリーンを形成する。タッチセンサ１８０Ｋは、タッチセンサ上又はその近傍で実行されたタッチ操作を検出するように構成される。タッチセンサは、タッチイベントのタイプを決定するために、検出されたタッチ操作をアプリケーションプロセッサに転送し得る。タッチ操作に関連する視覚的出力がディスプレイ１９４上に提供され得る。いくつかの他の実施形態では、タッチセンサ１８０Ｋは、代替的に、ディスプレイ１９４の位置とは異なる位置で電子デバイス１００の表面上に配置されてもよい。 The touch sensor 180K is also referred to as a "touch panel." The touch sensor 180K may be disposed on the display 194, and the touch sensor 180K and the display 194 together form what is also referred to as a "touch screen." The touch sensor 180K is configured to detect touch operations performed on or near the touch sensor. The touch sensor may forward a detected touch operation to the application processor to determine the type of the touch event. A visual output associated with the touch operation may be provided on the display 194. In some other embodiments, the touch sensor 180K may alternatively be disposed on the surface of the electronic device 100 at a location different from the location of the display 194.

骨伝導センサ１８０Ｍは、振動信号を取得し得る。いくつかの実施形態では、骨伝導センサ１８０Ｍは、人体の音振動骨から振動信号を取得し得る。また、骨伝導センサ１８０Ｍは、人体の脈拍に接触して血圧拍動信号を受信し得る。いくつかの実施形態では、骨伝導センサ１８０Ｍは、骨伝導ヘッドセットに統合されるように、ヘッドセットに配置され得る。オーディオモジュール１７０は、音声機能を実装するために、骨伝導センサ１８０Ｍによって取得された音振動骨の振動信号に基づいて音声信号を解析し得る。アプリケーションプロセッサは、心拍数検出機能を実装するために、骨伝導センサ１８０Ｍによって取得された血圧拍動信号に基づいて心拍数情報を解析し得る。 The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal from a sound-vibrating bone of the human body. The bone conduction sensor 180M may also receive a blood pressure pulsation signal by contacting the pulse of the human body. In some embodiments, the bone conduction sensor 180M may be disposed in a headset so as to be integrated into a bone conduction headset. The audio module 170 may analyze an audio signal based on the vibration signal of the sound-vibrating bone acquired by the bone conduction sensor 180M to implement a voice function. The application processor may analyze heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M to implement a heart rate detection function.

ボタン１９０は、電源ボタン、音量ボタンなどを含む。ボタン１９０は、機械的ボタンであってもよいし、タッチセンサ式ボタンであってもよい。電子デバイス１００は、ボタン入力を受信し、電子デバイス１００のユーザ設定及び機能制御と関連したボタン信号入力を生成し得る。 The buttons 190 include a power button, a volume button, etc. The buttons 190 may be mechanical buttons or touch-sensitive buttons. The electronic device 100 may receive button input and generate button signal input associated with user settings and function control of the electronic device 100.

モータ１９１は、振動アラートを生成し得る。モータ１９１は、着信呼のための振動アラートのために使用され得、タッチ振動フィードバックのためにも使用され得る。例えば、異なるアプリケーション（例えば、撮影及びオーディオ再生）上のタッチ操作は、異なる振動フィードバック効果に対応し得る。ディスプレイ１９４の異なる領域上のタッチ操作について、モータ１９１はまた、対応して異なる振動フィードバック効果を生成し得る。異なるアプリケーションシナリオ（時間リマインダ、情報受信、目覚し時計、及びゲームなど）はまた、異なる振動フィードバック効果に対応し得る。タッチ振動フィードバック効果は、更にカスタマイズされ得る。 The motor 191 may generate a vibration alert. The motor 191 may be used for vibration alerts for incoming calls and may also be used for touch vibration feedback. For example, touch operations on different applications (e.g., taking pictures and playing audio) may correspond to different vibration feedback effects. For touch operations on different areas of the display 194, the motor 191 may also generate correspondingly different vibration feedback effects. Different application scenarios (such as time reminders, information reception, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effects may be further customized.

インジケータ１９２は、インジケータであり得、充電ステータス及び電力変化を示すように構成され得るか、又はメッセージ、不在着信、通知などを示すように構成され得る。 The indicator 192 may be an indicator light, and may be configured to indicate charging status and power changes, or may be configured to indicate messages, missed calls, notifications, etc.

ＳＩＭカードインターフェース１９５は、ＳＩＭカードに接続するように構成される。ＳＩＭカードは、電子デバイス１００との接触又は分離を実施するために、ＳＩＭカードインターフェース１９５に挿入され得るか、又はＳＩＭカードインターフェース１９５から取り外され得る。電子デバイス１００は、１つ又はＮ個のＳＩＭカードインターフェースをサポートし得、ここで、Ｎは、１よりも大きい正の整数である。ＳＩＭカードインターフェース１９５は、ナノＳＩＭカード、マイクロＳＩＭカード、ＳＩＭカードなどをサポートし得る。複数のカードが同じＳＩＭカードインターフェース１９５に挿入され得る。複数のカードのタイプは、同じであってもよいし、異なっていてもよい。ＳＩＭカードインターフェース１９５はまた、異なるタイプのＳＩＭカードと互換性があり得る。ＳＩＭカードインターフェース１９５はまた、外部メモリカードと互換性があり得る。電子デバイス１００は、ＳＩＭカードを介してネットワークと対話して、通話及びデータ通信などの機能を実装する。いくつかの実施形態では、電子デバイス１００は、ｅＳＩＭ、すなわち、組み込み型ＳＩＭカードを使用する。ｅＳＩＭカードは、電子デバイス１００に埋め込まれ得、電子デバイス１００から分離することができない。 The SIM card interface 195 is configured to connect to a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to effect contact or separation with the electronic device 100. The electronic device 100 can support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support a nano-SIM card, a micro-SIM card, a SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195. The types of the multiple cards can be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards. The SIM card interface 195 can also be compatible with an external memory card. The electronic device 100 interacts with a network through the SIM card to implement functions such as calling and data communication. In some embodiments, electronic device 100 uses an eSIM, i.e., an embedded SIM card. The eSIM card may be embedded in electronic device 100 and cannot be separated from electronic device 100.

図９に示される電子デバイス１００は例にすぎず、電子デバイス１００は、図９に示される構成要素よりも多い又は少ない構成要素を有してもよいし、２つ以上の構成要素を組み合わせてもよいし、異なる構成要素構成を有してもよいことを理解されたい。図９に示される様々な構成要素は、１つ又は複数の信号プロセッサ及び／又は特定用途向け集積回路を含むハードウェア、ソフトウェア、又はハードウェアとソフトウェアの組合せを使用することによって実装され得る。 It should be understood that the electronic device 100 shown in FIG. 9 is only an example, and that the electronic device 100 may have more or fewer components than those shown in FIG. 9, may combine two or more components, or may have a different component configuration. The various components shown in FIG. 9 may be implemented using hardware that includes one or more signal processors and/or application-specific integrated circuits, software, or a combination of hardware and software.

以下では、本出願の一実施形態による、電子デバイス１００のソフトウェア構造について説明する。 The following describes the software structure of the electronic device 100 according to one embodiment of the present application.

図１０は、本出願の一実施形態による、電子デバイス１００のソフトウェア構造の一例を示す。 Figure 10 shows an example of the software structure of electronic device 100 according to one embodiment of the present application.

図１０に示すように、電子デバイス１００のソフトウェアシステムは、階層化アーキテクチャ、イベント駆動型アーキテクチャ、マイクロカーネルアーキテクチャ、マイクロサービスアーキテクチャ、又はクラウドアーキテクチャを使用し得る。以下では、一例を使用して、電子デバイス１００のソフトウェア構造について説明する。 As shown in FIG. 10, the software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservices architecture, or a cloud architecture. In the following, the software structure of the electronic device 100 is described using an example.

ソフトウェアは、階層化アーキテクチャを使用することによって層に分割され、各層は、明確な役割及びタスクを有する。これらの層は、ソフトウェアインターフェース接続を使用することによって互いに通信する。いくつかの実施形態では、電子デバイス１００のソフトウェア構造は、上から下へ、アプリケーション層、アプリケーションフレームワーク層、及びカーネル層という３つの層に分割される。 The software is divided into layers by using a layered architecture, with each layer having a distinct role and task. These layers communicate with each other by using software interface connections. In some embodiments, the software structure of electronic device 100 is divided into three layers from top to bottom: application layer, application framework layer, and kernel layer.

アプリケーション層は、一連のアプリケーションパッケージを含み得る。 The application layer may include a set of application packages.

図１０に示すように、アプリケーションパッケージは、カメラ、ギャラリー、カレンダー、電話、マップ、ナビゲーション、ＷＬＡＮ、Ｂｌｕｅｔｏｏｔｈ（登録商標）、音楽、ビデオ、及びメッセージなどのアプリケーションを含み得る。ビデオは、本出願の実施形態において言及されたビデオアプリケーションであり得る。 As shown in FIG. 10, the application package may include applications such as camera, gallery, calendar, phone, maps, navigation, WLAN, Bluetooth, music, video, and messaging. Video may be a video application as mentioned in the embodiments of this application.

アプリケーションフレームワーク層は、アプリケーション層におけるアプリケーションのためのアプリケーションプログラミングインターフェース（application programming interface、ＡＰＩ）及びプログラミングフレームワークを提供する。アプリケーションフレームワーク層は、いくつかの予め定義された機能を含む。 The application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functionality.

図１０に示すように、アプリケーションフレームワーク層は、ウィンドウマネジャ、コンテンツプロバイダ、ビューシステム、電話マネジャ、リソースマネジャ、通知マネジャ、ビデオ処理システムなどを含み得る。 As shown in FIG. 10, the application framework layer may include a window manager, a content provider, a view system, a telephone manager, a resource manager, a notification manager, a video processing system, etc.

ウィンドウマネジャはウィンドウプログラムを管理するように構成される。ウィンドウマネジャは、表示サイズを取得し、ステータスバーがあるかどうかを決定し、画面をロックし、スクリーンショットを撮影するなどし得る。 A window manager is configured to manage window programs. The window manager may get the display size, determine if there is a status bar, lock the screen, take screenshots, etc.

コンテンツプロバイダは、データを記憶及び取得し、アプリケーションがこれらのデータにアクセスできるようにするように構成される。データは、ビデオ、画像、音声、発呼及び着呼、閲覧履歴及び閲覧ブックマーク、アドレス帳などを含み得る。 Content providers are configured to store and retrieve data and make this data accessible to applications. Data can include video, images, audio, calls made and received, browsing history and bookmarks, address books, etc.

ビューシステムは、テキストを表示するためのコントロール、ピクチャを表示するためのコントロールなどの視覚的コントロールを含む。ビューシステムは、アプリケーションを構築するように構成され得る。ディスプレイインターフェースは、１つ又は複数のビューを含み得る。例えば、メッセージ通知アイコンを含むディスプレイインターフェースは、テキストを表示するためのビューと、ピクチャを表示するためのビューとを含み得る。 The view system includes visual controls, such as a control for displaying text and a control for displaying pictures. The view system can be configured to build applications. A display interface can include one or more views. For example, a display interface that includes a message notification icon can include a view for displaying text and a view for displaying pictures.

電話マネジャは、電子デバイス１００の通信機能、例えば、通話ステータス（応答、拒否などを含む）の管理を提供するように構成される。 The phone manager is configured to provide management of communication functions of the electronic device 100, such as call status (including answering, rejecting, etc.).

リソースマネジャは、ローカライズされた文字列、アイコン、画像、レイアウトファイル、及びビデオファイルなどのアプリケーション用のリソースを提供する。 The resource manager provides resources for an application, such as localized strings, icons, images, layout files, and video files.

通知マネジャは、アプリケーションがステータスバーに通知情報を表示することを可能にし、通知メッセージを伝達するように構成され得る。通知マネジャは、ユーザ対話を必要とすることなく、短い休止の後に自動的に消え得る。例えば、通知マネジャは、ダウンロード完了を通知すること、メッセージリマインドを提供することなどを行うように構成される。通知マネジャは、代替的に、バックグラウンドで実行されているアプリケーションについての通知、又はダイアログウィンドウの形態で画面上に現れる通知など、グラフ又はスクロールバーテキストの形態でシステムの上部のステータスバーに現れる通知であり得る。例えば、ステータスバーに文字情報が表示されたり、アナウンスが行われたり、電子デバイスが振動したり、表示灯が点滅したりする。 The notification manager allows applications to display notification information in the status bar and may be configured to convey notification messages. A notification of this type may disappear automatically after a short pause without requiring user interaction. For example, the notification manager is configured to notify of download completion, provide message reminders, etc. A notification may alternatively appear in the status bar at the top of the system in the form of a chart or scroll-bar text, for example a notification about an application running in the background, or may appear on the screen in the form of a dialog window. For example, text information may be displayed in the status bar, an alert sound may be played, the electronic device may vibrate, or an indicator light may flash.

ビデオ処理システムは、本出願の実施形態において提供されるキャプション表示方法を実行するように構成され得る。ビデオ処理システムは、キャプション復号モジュールと、ビデオフレーム色域解釈モジュールと、ビデオフレーム合成モジュールと、ビデオフレームキューと、ビデオレンダリングモジュールとを含み得る。各モジュールの具体的な機能については、前述の実施形態における関連する内容を参照されたい。詳細はここでは改めて説明しない。 The video processing system may be configured to execute the caption display method provided in the embodiments of the present application. The video processing system may include a caption decoding module, a video frame gamut interpretation module, a video frame synthesis module, a video frame queue, and a video rendering module. For specific functions of each module, please refer to the relevant contents in the above-mentioned embodiments. Details will not be described again here.

カーネル層は、ハードウェアとソフトウェアとの間の層である。カーネル層は、少なくともディスプレイドライバ、カメラドライバ、Ｂｌｕｅｔｏｏｔｈ（登録商標）ドライバ、及びセンサドライバを含む。 The kernel layer is a layer between the hardware and the software. The kernel layer includes at least a display driver, a camera driver, a Bluetooth driver, and a sensor driver.

以下では、撮影シナリオを参照して、電子デバイス１００のソフトウェア及びハードウェアの動作プロセスの一例について説明する。 Below, an example of the operating process of the software and hardware of the electronic device 100 is described with reference to a shooting scenario.

タッチセンサ１８０Ｋがタッチ操作を受信すると、対応するハードウェア割り込みがカーネル層に送信される。カーネル層は、タッチ操作を元の入力イベント（タッチ操作のタッチ座標及びタイムスタンプなどの情報を含む）に処理する。元の入力イベントはカーネル層に記憶される。アプリケーションフレームワーク層は、カーネル層から元の入力イベントを取得し、入力イベントに対応するコントロールを識別する。例えば、タッチ操作はシングルタップ操作であり、シングルタップ操作に対応するコントロールは、カメラアプリケーションアイコンのコントロールである。カメラアプリケーションは、カメラアプリケーションを開始するために、アプリケーションフレームワーク層でインターフェースを呼び出す。そして、カーネル層を呼び出してカメラドライバを起動し、カメラ１９３を使用して静止画像又はビデオを撮影する。 When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and timestamp of the touch operation). The original input event is stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer and identifies a control corresponding to the input event. For example, the touch operation is a single tap operation, and the control corresponding to the single tap operation is a control of a camera application icon. The camera application calls an interface in the application framework layer to start the camera application. Then, it calls the kernel layer to start the camera driver and use the camera 193 to take still images or videos.
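
By way of a non-limiting illustration, the flow from a hardware interrupt to the identification of a control may be sketched as follows. The class names `RawInputEvent`, `KernelLayer`, and `FrameworkLayer`, the rectangular hit test, and the control name are assumptions of this example and do not correspond to any particular operating system interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RawInputEvent:
    """Original input event: touch coordinates and a timestamp."""
    x: int
    y: int
    timestamp_ms: int


class KernelLayer:
    """Stores original input events produced from hardware interrupts."""

    def __init__(self) -> None:
        self._events: List[RawInputEvent] = []

    def on_touch_interrupt(self, x: int, y: int, timestamp_ms: int) -> None:
        self._events.append(RawInputEvent(x, y, timestamp_ms))

    def pop_event(self) -> Optional[RawInputEvent]:
        # Events are handed out in arrival order.
        return self._events.pop(0) if self._events else None


class FrameworkLayer:
    """Identifies the control whose bounds contain the touch coordinates."""

    def __init__(self, controls: Dict[str, Tuple[int, int, int, int]]) -> None:
        # name -> (left, top, right, bottom); right and bottom are exclusive.
        self._controls = controls

    def identify_control(self, event: RawInputEvent) -> Optional[str]:
        for name, (left, top, right, bottom) in self._controls.items():
            if left <= event.x < right and top <= event.y < bottom:
                return name
        return None
```

In this sketch, a single tap at coordinates inside the bounds registered for a camera application icon is identified as a touch on that control, after which the corresponding application would be started.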

以下では、本出願の一実施形態による、別の電子デバイス１００の構造について説明する。 The following describes the structure of another electronic device 100 according to one embodiment of the present application.

図１１は、本出願の一実施形態による、別の電子デバイス１００の構成の一例を示す。 Figure 11 shows an example of a configuration of another electronic device 100 according to one embodiment of the present application.

図１１に示すように、電子デバイス１００は、ビデオアプリケーション１１００とビデオ処理システム１１１０とを含み得る。 As shown in FIG. 11, the electronic device 100 may include a video application 1100 and a video processing system 1110.

ビデオアプリケーション１１００は、電子デバイス１００にインストールされたシステムアプリケーション（例えば、図２Ａに示される「ビデオ」アプリケーション）であり得るか、又は第三者によって提供され、電子デバイス１００にインストールされ、ビデオ再生機能を有するアプリケーションであり得る。ビデオアプリケーション１１００は、主にビデオを再生するように構成される。 Video application 1100 may be a system application installed on electronic device 100 (e.g., the "Video" application shown in FIG. 2A) or may be an application provided by a third party and installed on electronic device 100 and having video playback capabilities. Video application 1100 is primarily configured to play videos.

ビデオ処理システム１１１０は、ビデオ復号モジュール１１１１と、キャプション復号モジュール１１１２と、ビデオフレーム色域解釈モジュール１１１３と、ビデオフレーム合成モジュール１１１４と、ビデオフレームキュー１１１５と、ビデオレンダリングモジュール１１１６とを含み得る。 The video processing system 1110 may include a video decoding module 1111, a caption decoding module 1112, a video frame gamut interpretation module 1113, a video frame synthesis module 1114, a video frame queue 1115, and a video rendering module 1116.

ビデオ復号モジュール１１１１は、ビデオアプリケーション１１００によって送信されたビデオ情報ストリームを受信し、ビデオ情報ストリームを復号してビデオフレームを生成し得る。 The video decoding module 1111 may receive the video information stream transmitted by the video application 1100 and decode the video information stream to generate video frames.

キャプション復号モジュール１１１２は、ビデオアプリケーション１１００によって送信されたキャプション情報ストリームを受信し、キャプション情報ストリームを復号してキャプションフレームを生成し、ビデオフレーム色域解釈モジュール１１１３によって送信されたマスクパラメータに基づいて、マスクを有するキャプションフレームを生成して、キャプション認識度を高め得る。 The caption decoding module 1112 may receive a caption information stream transmitted by the video application 1100, decode the caption information stream to generate caption frames, and generate caption frames with a mask based on the mask parameters transmitted by the video frame gamut interpretation module 1113 to enhance caption recognizability.

ビデオフレーム色域解釈モジュール１１１３は、キャプション認識度を分析してキャプション認識度分析結果を生成し、キャプション認識度分析結果に基づいて、キャプションに対応するマスクパラメータ（マスクの色値及び透明度）を計算し得る。 The video frame color gamut interpretation module 1113 may analyze the caption recognizability to generate a caption recognizability analysis result, and calculate mask parameters (mask color value and transparency) corresponding to the caption based on the caption recognizability analysis result.
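
By way of a non-limiting illustration, the calculation of mask parameters from a recognizability analysis may be sketched as follows. The Euclidean RGB distance, the threshold of 100, the fixed transparency of 0.5, and the choice between a black and a white mask are all assumptions of this example; the actual analysis and calculation are those described in the foregoing embodiments.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]


def color_difference(a: RGB, b: RGB) -> float:
    """Euclidean distance between two RGB colors (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mask_parameters(caption: RGB, region: RGB,
                    threshold: float = 100.0,
                    transparency: float = 0.5) -> Optional[Tuple[RGB, float]]:
    """Return (mask color value, transparency), or None if no mask is needed."""
    if color_difference(caption, region) >= threshold:
        # The caption is already distinguishable from the region behind it.
        return None
    black, white = (0, 0, 0), (255, 255, 255)
    # Pick whichever candidate mask color differs more from the caption color.
    if color_difference(caption, black) >= color_difference(caption, white):
        return black, transparency
    return white, transparency
```

For example, a white caption over a nearly white region yields a black mask in this sketch, so that the difference between the mask color and the caption color exceeds the difference between the caption color and the region color.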

ビデオフレーム合成モジュール１１１４は、ビデオフレームとキャプションフレームとを重ね合わせて結合し、表示されるべきビデオフレームを生成し得る。 The video frame synthesis module 1114 may overlay and combine the video frames and caption frames to generate the video frames to be displayed.

ビデオフレームキュー１１１５は、ビデオフレーム合成モジュール１１１４によって送信された表示されるべきビデオフレームを記憶し得る。 The video frame queue 1115 may store the video frames to be displayed that are sent by the video frame synthesis module 1114.

ビデオレンダリングモジュール１１１６は、表示されるべきビデオフレームを時系列に基づいてレンダリングして、レンダリングされたビデオフレームを生成し、レンダリングされたビデオフレームをビデオ再生のためにビデオアプリケーション１１００に送信し得る。 The video rendering module 1116 may render the video frames to be displayed based on a time sequence to generate rendered video frames and send the rendered video frames to the video application 1100 for video playback.
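
By way of a non-limiting illustration, the queuing of combined video frames and their rendering in time order may be sketched as follows. The names `ComposedFrame`, `VideoFrameQueue`, and `render_all`, the use of a heap keyed on a millisecond timestamp, and the string output standing in for rendered frames are assumptions of this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class ComposedFrame:
    """A video frame combined with an optional (masked) caption frame."""
    timestamp_ms: int
    picture: str = field(compare=False)
    caption: Optional[str] = field(compare=False, default=None)
    masked: bool = field(compare=False, default=False)


class VideoFrameQueue:
    """Stores frames to be displayed and returns them in time order."""

    def __init__(self) -> None:
        self._heap: List[ComposedFrame] = []

    def put(self, frame: ComposedFrame) -> None:
        heapq.heappush(self._heap, frame)

    def get(self) -> ComposedFrame:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def render_all(queue: VideoFrameQueue) -> List[str]:
    """Drain the queue and 'render' each frame as a descriptive string."""
    rendered: List[str] = []
    while len(queue):
        frame = queue.get()
        mask = "[mask]" if frame.masked else ""
        rendered.append(
            f"{frame.timestamp_ms}:{frame.picture}+{mask}{frame.caption or ''}"
        )
    return rendered
```

Ordering the queue by timestamp means that frames combined out of order are still rendered in their time sequence.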

電子デバイス１００の機能及び作動原理に関する更なる詳細については、前述の実施形態における関連する内容を参照されたい。詳細はここでは改めて説明しない。 For further details regarding the function and working principle of the electronic device 100, please refer to the relevant contents in the above-mentioned embodiment. The details will not be described again here.

図１１に示される電子デバイス１００は一例にすぎず、電子デバイス１００は、図１１に示される構成要素よりも多い又は少ない構成要素を有してもよいし、２つ以上の構成要素を組み合わせてもよいし、異なる構成要素構成を有してもよいことを理解されたい。図１１に示される様々な構成要素は、ハードウェア、ソフトウェア、又はハードウェアとソフトウェアの組合せで実装され得る。 It should be understood that the electronic device 100 shown in FIG. 11 is only an example, and that the electronic device 100 may have more or fewer components than those shown in FIG. 11, may combine two or more components, or may have a different component configuration. The various components shown in FIG. 11 may be implemented in hardware, software, or a combination of hardware and software.

前述のモジュールは、機能によって分割され得る。実際の製品では、モジュールは、同じソフトウェアモジュールによって実行される異なる機能であり得る。 The aforementioned modules may be divided by functionality. In an actual product, the modules may be different functions performed by the same software module.

図１２は、本出願の一実施形態による、別の電子デバイス１００の構成の一例を示す。 Figure 12 shows an example of a configuration of another electronic device 100 according to one embodiment of the present application.

図１２に示すように、電子デバイス１００は、ビデオアプリケーション１２００を含み得る。ビデオアプリケーション１２００は、取得及び表示モジュール１２１０、ビデオ復号モジュール１２１１、キャプション復号モジュール１２１２、ビデオフレーム色域解釈モジュール１２１３、ビデオフレーム合成モジュール１２１４、ビデオフレームキュー１２１５、及びビデオレンダリングモジュール１２１６を含み得る。 As shown in FIG. 12, the electronic device 100 may include a video application 1200. The video application 1200 may include an acquisition and display module 1210, a video decoding module 1211, a caption decoding module 1212, a video frame gamut interpretation module 1213, a video frame synthesis module 1214, a video frame queue 1215, and a video rendering module 1216.

ビデオアプリケーション１２００は、電子デバイス１００にインストールされたシステムアプリケーション（例えば、図２Ａに示される「ビデオ」アプリケーション）であり得るか、又は第三者によって提供され、電子デバイス１００にインストールされ、ビデオ再生機能を有するアプリケーションであり得る。ビデオアプリケーション１２００は、主にビデオを再生するように構成される。 Video application 1200 may be a system application installed on electronic device 100 (e.g., the "Video" application shown in FIG. 2A) or may be an application provided by a third party and installed on electronic device 100 and having video playback capabilities. Video application 1200 is primarily configured to play videos.

取得及び表示モジュール１２１０は、ビデオ情報ストリーム及びキャプション情報ストリームを取得し、ビデオレンダリングモジュール１２１６などによって送信されたレンダリングされたビデオフレームを表示し得る。 The acquisition and display module 1210 may acquire the video information stream and the caption information stream and display rendered video frames transmitted by the video rendering module 1216, etc.

ビデオ復号モジュール１２１１は、取得及び表示モジュール１２１０によって送信されたビデオ情報ストリームを受信し、ビデオ情報ストリームを復号してビデオフレームを生成し得る。 The video decoding module 1211 may receive the video information stream transmitted by the acquisition and display module 1210 and decode the video information stream to generate video frames.

キャプション復号モジュール１２１２は、取得及び表示モジュール１２１０によって送信されたキャプション情報ストリームを受信し、キャプション情報ストリームを復号してキャプションフレームを生成し、ビデオフレーム色域解釈モジュール１２１３によって送信されたマスクパラメータに基づいて、マスクを有するキャプションフレームを生成して、キャプション認識度を高め得る。 The caption decoding module 1212 receives the caption information stream transmitted by the acquisition and display module 1210, decodes the caption information stream to generate caption frames, and may generate caption frames with a mask based on the mask parameters transmitted by the video frame gamut interpretation module 1213 to enhance caption recognition.

ビデオフレーム色域解釈モジュール１２１３は、キャプション認識度を分析してキャプション認識度分析結果を生成し、キャプション認識度分析結果に基づいて、キャプションに対応するマスクパラメータ（マスクの色値及び透明度）を計算し得る。 The video frame color gamut interpretation module 1213 may analyze the caption recognizability to generate a caption recognizability analysis result, and calculate mask parameters (mask color value and transparency) corresponding to the caption based on the caption recognizability analysis result.

ビデオフレーム合成モジュール１２１４は、ビデオフレームとキャプションフレームとを重ね合わせて結合し、表示されるべきビデオフレームを生成し得る。 The video frame synthesis module 1214 may overlay and combine the video frames and caption frames to generate the video frames to be displayed.

ビデオフレームキュー１２１５は、ビデオフレーム合成モジュール１２１４によって送信された表示されるべきビデオフレームを記憶し得る。 The video frame queue 1215 may store the video frames to be displayed that are sent by the video frame synthesis module 1214.

ビデオレンダリングモジュール１２１６は、時系列に基づいて表示されるべきビデオフレームをレンダリングして、レンダリングされたビデオフレームを生成し、レンダリングされたビデオフレームを、ビデオ再生のために取得及び表示モジュール１２１０に送信し得る。 The video rendering module 1216 may render the video frames to be displayed based on a time sequence to generate rendered video frames and transmit the rendered video frames to the acquisition and display module 1210 for video playback.

図１２に示される電子デバイス１００は一例にすぎず、電子デバイス１００は、図１２に示される構成要素よりも多い又は少ない構成要素を有してもよいし、２つ以上の構成要素を組み合わせてもよいし、異なる構成要素構成を有してもよいことを理解されたい。図１２に示される様々な構成要素は、ハードウェア、ソフトウェア、又はハードウェアとソフトウェアの組合せで実装され得る。 It should be understood that the electronic device 100 shown in FIG. 12 is only one example, and that the electronic device 100 may have more or fewer components than those shown in FIG. 12, may combine two or more components, or may have a different component configuration. The various components shown in FIG. 12 may be implemented in hardware, software, or a combination of hardware and software.

前述の実施形態は、本出願の技術的解決策を説明することを意図しているにすぎず、本出願を限定することを意図していない。本出願は、前述の実施形態を参照して詳細に説明されているが、当業者は、本出願の実施形態の技術的解決策の範囲から逸脱することなく、前述の実施形態において説明された技術的解決策を更に修正し、又はそのいくつかの技術的特徴に対して同等の置換を行うことができることを理解すべきである。 The above embodiments are only intended to describe the technical solutions of the present application, and are not intended to limit the present application. Although the present application has been described in detail with reference to the above embodiments, it should be understood that those skilled in the art can further modify the technical solutions described in the above embodiments, or make equivalent substitutions for some technical features thereof, without departing from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A caption display method, comprising:
playing, by an electronic device, a first video;
displaying, by the electronic device, a first interface, wherein the first interface includes a first picture and a first caption, the first caption is displayed floating over a first region of the first picture with a first mask as a background, the first region is a region in the first picture that corresponds to a display position of the first caption, and a difference value between a color value of the first caption and a color value of the first region is a first value; and
displaying, by the electronic device after displaying the first interface, a second interface different from the first interface, wherein the second interface includes a second picture and the first caption, no mask is displayed for the first caption, the first caption is displayed floating over a second region of the second picture, the second region is a region in the second picture that corresponds to the display position of the first caption, a difference value between the color value of the first caption and a color value of the second region is a second value, and the second value is greater than the first value;
wherein the first picture is one picture in the first video and the second picture is another picture in the first video; and
a difference value between a color value of the first mask and the color value of the first caption is greater than the first value.

2. The method of claim 1, wherein before the electronic device displays the first interface, the method further comprises:
obtaining, by the electronic device, a first video file and a first caption file, wherein the first video file and the first caption file carry the same time information;
generating, by the electronic device, a first video frame based on the first video file, wherein the first video frame is used to generate the first picture;
generating, by the electronic device, a first caption frame based on the first caption file, and obtaining the color value and the display position of the first caption from the first caption frame, wherein time information carried in the first caption frame is the same as time information carried in the first video frame;
determining, by the electronic device, the first region based on the display position of the first caption;
generating, by the electronic device, the first mask based on the color value of the first caption or the color value of the first region; and
overlaying, by the electronic device, the first caption on the first mask in the first caption frame to generate a second caption frame, and combining the second caption frame with the first video frame.
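The frame pairing and layering order recited in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `CaptionFrame` type, the `compose` function, the millisecond timestamp field, the rectangular region, and the fully opaque mask are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

RGB = tuple[int, int, int]


@dataclass
class CaptionFrame:
    """Hypothetical caption frame; field names are assumptions for this sketch."""
    pts_ms: int                          # time information shared with the video frame
    color: RGB                           # color value of the caption
    region: tuple[int, int, int, int]    # display position as (x, y, width, height)
    glyph_pixels: set[tuple[int, int]]   # (x, y) pixels covered by the caption text


def compose(video_pts_ms, picture, caption, mask_color):
    """Return a new picture with the mask drawn first and the caption on top.

    picture is a list of rows, each row a list of RGB tuples.
    """
    if caption.pts_ms != video_pts_ms:
        raise ValueError("caption frame and video frame carry different time information")
    out = [row[:] for row in picture]    # leave the decoded video frame untouched
    x0, y0, width, height = caption.region
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            out[y][x] = mask_color       # the mask is the background of the caption
    for x, y in caption.glyph_pixels:
        out[y][x] = caption.color        # the caption floats above the mask
    return out
```

Drawing the mask before the glyph pixels is what keeps the caption readable: the caption pixels are compared against the mask color rather than against the picture underneath.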

3. The method of claim 2, wherein before the electronic device generates the first mask based on the color value of the first caption or the color value of the first region, the method further comprises:
determining, by the electronic device, that the first value is less than a first threshold.

4. The method, wherein the determining, by the electronic device, that the first value is less than the first threshold specifically comprises:
dividing, by the electronic device, the first region into N first sub-regions, where N is a positive integer; and
determining, by the electronic device, based on the color value of the first caption and the color values of the N first sub-regions, that the first value is less than the first threshold.
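The sub-region check recited here can be sketched as follows. The claim does not fix a color-difference metric, a way of dividing the region, or a rule for combining the N sub-region results, so this example assumes Euclidean distance in RGB, equal-width vertical strips, and the mean of the per-strip differences as the first value. All function names are hypothetical.

```python
import math


def color_difference(a, b):
    """Euclidean distance between two RGB triples (an assumed metric)."""
    return math.dist(a, b)


def mean_color(pixels):
    """Average color of a non-empty list of RGB pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def split_columns(region, n):
    """Divide a region (a list of pixel rows) into n vertical strips of pixels."""
    width = len(region[0])
    if not 1 <= n <= width:
        raise ValueError("n must be between 1 and the region width")
    bounds = [width * k // n for k in range(n + 1)]
    return [[p for row in region for p in row[lo:hi]]
            for lo, hi in zip(bounds, bounds[1:])]


def first_value(caption_color, region, n):
    """Mean difference between the caption color and the n strip colors (assumed rule)."""
    diffs = [color_difference(caption_color, mean_color(strip))
             for strip in split_columns(region, n)]
    return sum(diffs) / len(diffs)


def needs_mask(caption_color, region, n, first_threshold):
    """True when the caption is too close in color to the region behind it."""
    return first_value(caption_color, region, n) < first_threshold
```

For a white caption over a uniformly near-white region, `needs_mask` returns `True` for any moderate threshold, which matches the snow-scene problem described in the background.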

5. The method, wherein the generating, by the electronic device, of the first mask based on the color value of the first caption or the color value of the first region specifically comprises:
determining, by the electronic device, a color value of the first mask based on the color value of the first caption or the color values of the N first sub-regions; and
generating, by the electronic device, the first mask based on the color value of the first mask.
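One way to determine a mask color is sketched below. The claim leaves the selection rule open, so this example assumes a hypothetical rule: pick, from a small candidate palette, the color farthest from the caption color, and accept it only if its difference from the caption exceeds the first value, as claim 1 requires of the first mask. The palette, the Euclidean RGB metric, and the function names are assumptions.

```python
import math

BLACK, WHITE = (0, 0, 0), (255, 255, 255)


def color_difference(a, b):
    """Euclidean distance between two RGB triples (an assumed metric)."""
    return math.dist(a, b)


def pick_mask_color(caption_color, sub_region_colors, candidates=(BLACK, WHITE)):
    """Choose the candidate farthest from the caption color (hypothetical rule).

    Returns None when no candidate contrasts with the caption more than the
    picture itself already does.
    """
    # First value, taken here as the mean caption-to-sub-region difference.
    first_value = sum(color_difference(caption_color, c)
                      for c in sub_region_colors) / len(sub_region_colors)
    best = max(candidates, key=lambda c: color_difference(c, caption_color))
    if color_difference(best, caption_color) <= first_value:
        return None
    return best
```

With a white caption over near-white sub-regions the sketch selects black; with a near-black caption over a black region it selects white.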

6. The method, wherein the determining, by the electronic device, that the first value is less than the first threshold specifically comprises:
dividing, by the electronic device, the first region into N first sub-regions, where N is a positive integer;
determining, by the electronic device, whether to merge adjacent first sub-regions into a second sub-region based on a difference value between color values of the adjacent first sub-regions;
merging, by the electronic device, the adjacent first sub-regions into the second sub-region when the difference value between the color values of the adjacent first sub-regions is less than a second threshold; and
determining, by the electronic device, based on the color value of the first caption and the color value of the second sub-region, that the first value is less than the first threshold.
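The merging of adjacent sub-regions can be sketched as a single left-to-right pass. This is an illustration under assumptions: the sub-regions are taken to lie in a row, each is compared only with its immediate left neighbor, the color difference is Euclidean distance in RGB, and the color of a merged sub-region is the mean of its members. The function name `merge_adjacent` is hypothetical.

```python
import math


def merge_adjacent(sub_colors, second_threshold):
    """Group consecutive sub-regions whose colors differ by less than the threshold.

    sub_colors is a list of N RGB triples, ordered left to right.
    Returns a list of M (member_indices, mean_color) pairs, with M <= N.
    """
    if not sub_colors:
        return []
    groups = [[0]]
    for i in range(1, len(sub_colors)):
        if math.dist(sub_colors[i - 1], sub_colors[i]) < second_threshold:
            groups[-1].append(i)      # similar to its neighbor: same second sub-region
        else:
            groups.append([i])        # color jump: start a new second sub-region
    merged = []
    for members in groups:
        mean = tuple(sum(sub_colors[i][ch] for i in members) / len(members)
                     for ch in range(3))
        merged.append((members, mean))
    return merged
```

Merging reduces the number of comparisons against the caption color from N to M and lets a large uniform area, such as sky or snow, be treated as one unit.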

7. The method of claim 6, wherein the first region includes M second sub-regions, M is a positive integer less than or equal to N, each second sub-region includes one or more first sub-regions, and the number of first sub-regions included in one second sub-region is the same as or different from the number of first sub-regions included in another second sub-region.

8. The method of claim 7, wherein the generating, by the electronic device, of the first mask based on the color value of the first caption or the color value of the first region specifically comprises:
sequentially calculating, by the electronic device, color values of M first sub-masks based on the color value of the first caption or the color values of the M second sub-regions, wherein the M first sub-masks are masks corresponding to the M second sub-regions; and
generating, by the electronic device, the M first sub-masks based on the color values of the M first sub-masks, wherein the M first sub-masks are combined into the first mask.
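Building the first mask from M sub-masks can be sketched as follows. The rule for deriving a sub-mask color is not given in the claim, so this example assumes a hypothetical one: take the mean color of the second sub-region and darken it for a bright caption or lighten it for a dark caption, so that each sub-mask keeps some of the hue of the picture behind it. The scaling factors, the tuple layout, and the function names are assumptions.

```python
def sub_mask_color(caption_color, region_color):
    """Push the sub-region color away from the caption color (hypothetical rule)."""
    caption_is_bright = sum(caption_color) / 3 >= 128
    if caption_is_bright:
        return tuple(round(c * 0.25) for c in region_color)           # darken
    return tuple(round(c + (255 - c) * 0.75) for c in region_color)   # lighten


def build_first_mask(caption_color, second_sub_regions):
    """Compute the sub-masks in order and collect them into one mask description.

    second_sub_regions is a list of (x_offset, width, mean_color) tuples,
    ordered left to right; the result has one sub-mask per second sub-region.
    """
    return [
        {"x": x, "width": width, "color": sub_mask_color(caption_color, color)}
        for x, width, color in second_sub_regions
    ]
```

Because each sub-mask color depends on its own sub-region, two sub-masks of the same caption can end up with different color values.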

9. The method of claim 1, further comprising:
displaying, by the electronic device, a third interface, wherein the third interface includes a third picture and the first caption, the first caption includes at least a first portion and a second portion, a second sub-mask is displayed for the first portion, a third sub-mask is either displayed or not displayed for the second portion, and a color value of the second sub-mask is different from a color value of the third sub-mask.

10. The method of claim 1, wherein the display position of the first mask is determined based on the display position of the first caption.

11. The method of claim 1, wherein, in the first picture and the second picture, the display position of the first caption relative to a display screen of the electronic device is fixed or not fixed, and the first caption is a continuously displayed segment of characters or symbols.

12. The method of claim 1, wherein before the electronic device displays the first interface, the method further comprises:
setting, by the electronic device, the transparency of the first mask to less than 100%.

13. The method of claim 1, wherein before the electronic device displays the second interface, the method further comprises:
generating, by the electronic device, a second mask based on the color value of the first caption or the color value of the second region, and overlaying the first caption on the second mask, wherein a color value of the second mask is a preset color value and the transparency of the second mask is 100%;
or
skipping, by the electronic device, generation of the second mask.
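The effect of mask transparency can be sketched with a per-pixel blend. This is an illustration, not the claimed implementation: it assumes ordinary linear alpha blending in RGB, with transparency expressed as a fraction between 0.0 and 1.0, and the function name `blend` is hypothetical.

```python
def blend(picture_pixel, mask_color, transparency):
    """Composite one mask pixel over one picture pixel.

    transparency is a fraction in [0.0, 1.0]; 1.0 (100%) makes the mask invisible.
    """
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0.0 and 1.0")
    opacity = 1.0 - transparency
    return tuple(round(opacity * m + transparency * p)
                 for m, p in zip(mask_color, picture_pixel))
```

Under this model a mask with 100% transparency leaves every picture pixel unchanged, so generating such a mask and skipping mask generation produce the same displayed result; any transparency below 100% lets the mask color contribute to the background of the caption.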

14. An electronic device, comprising one or more processors and one or more memories coupled to the one or more processors, wherein the one or more memories are configured to store computer program code comprising computer instructions, and the computer instructions, when executed by the one or more processors, enable the electronic device to perform the method according to any one of claims 1 to 13.

15. A computer program, comprising program instructions which, when executed on an electronic device, enable the electronic device to perform the method according to any one of claims 1 to 13.