JP7789607B2 - Broadcasting system, receiver, receiving method, and program

Info

Publication number: JP7789607B2
Application number: JP2022052461A
Authority: JP
Inventors: 智夫西垣; 秀樹鈴木
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2022-03-28
Filing date: 2022-03-28
Publication date: 2025-12-22
Anticipated expiration: 2042-03-28
Also published as: JP2026035674A; JP2023145144A

Description

The present invention relates to a broadcasting system, a receiver, a receiving method, and a program.

In broadcasting, the use of object-based audio signals such as MPEG-H audio is under consideration.
Patent Document 1 describes treating the audio signal of an audio object (an object-based audio signal) as a priority signal and playing it back with priority.

Patent Document 1: Japanese Patent Application Laid-Open No. 2021-124719

In Patent Document 1, an audio encoding device encodes an object-based audio signal, and an audio decoding device performs decoding processing on the resulting bitstream.
However, some receivers that receive broadcasts are not capable of processing object-based audio signals. It is therefore desirable to be able to perform appropriate audio playback according to the capabilities of the receiver.

The present invention has been made in view of the above points, and provides a broadcasting system, a receiver, a receiving method, and a program that can perform appropriate audio playback according to the capabilities of the receiver.

(1) The present invention has been made to solve the above problem. One aspect of the present invention is a broadcasting system that performs broadcasting including MPEG-H audio as an audio component, in which a receiver comprises: a separation unit that acquires, at a multiplexing layer, identification information indicating whether MPEG-H audio is present from the broadcast wave of the broadcast; a control unit that selects, based on the identification information, an audio component according to the capabilities of the receiver; and a decoding unit that decodes the audio data of the selected audio component.

(2) Another aspect of the present invention is a receiver comprising: a receiving unit that receives a broadcast including MPEG-H audio as an audio component; a separation unit that acquires, at a multiplexing layer, identification information indicating whether MPEG-H audio is present from the broadcast wave of the broadcast; a control unit that selects, based on the identification information, an audio component according to the capabilities of the receiver; and a decoding unit that decodes the audio data of the selected audio component.

(3) Another aspect of the present invention is a receiving method in a receiver, the method comprising: receiving a broadcast including MPEG-H audio as an audio component; acquiring, at a multiplexing layer, identification information indicating whether MPEG-H audio is present from the broadcast wave of the broadcast; selecting, based on the identification information, an audio component according to the capabilities of the receiver; and decoding the audio data of the selected audio component.

(4) Another aspect of the present invention is a program for causing a computer of a receiver, the receiver including a receiving unit that receives a broadcast including MPEG-H audio as an audio component and an acquisition unit that acquires, at a multiplexing layer, identification information indicating whether MPEG-H audio is present from the broadcast wave of the broadcast, to execute a selection procedure of selecting, based on the identification information, an audio component according to the capabilities of the receiver, and to decode the audio data of the selected audio component.

According to the present invention, appropriate audio playback can be performed according to the capabilities of the receiver.

FIG. 1 is a diagram showing an example of the configuration of a broadcasting system Sys according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the broadcasting system according to the present embodiment.
FIG. 3 is a diagram showing a comparative example of the broadcasting system according to the present embodiment.
FIG. 4 is a diagram showing another example of the broadcasting system Sys according to the present embodiment.
FIG. 5 is an explanatory diagram illustrating an overview of the broadcasting system Sys according to the present embodiment.
FIG. 6 is a diagram showing an example of the structure of a protocol stack according to the present embodiment.
FIG. 7 is a diagram showing the data structure of an MPT according to the present embodiment.
FIG. 8 is a schematic diagram showing the hardware configuration of a receiver according to the present embodiment.
FIG. 9 is a diagram showing an example of a list of audio modes according to the present embodiment.
FIG. 10 is a schematic diagram showing an example of the flow of signal processing in the receiver according to the present embodiment.
FIG. 11 is a diagram showing an example of an audio switching menu according to the present embodiment.
FIG. 12 is a schematic diagram showing an example of the structure of an MH-audio component descriptor according to the present embodiment.
FIG. 13 is a schematic diagram showing an example of a transmission operation rule according to the present embodiment.
FIG. 14 is a schematic diagram showing an example of a reception processing standard according to the present embodiment.
FIG. 15 is a flowchart showing a detailed example of switching according to the present embodiment.
FIG. 16 is a flowchart showing another detailed example of switching according to the present embodiment.
FIG. 17 is a schematic diagram showing an example of the structure of an MH-MPEG-H audio descriptor according to a modification.
FIG. 18 is a schematic diagram showing an example of a transmission operation rule according to the modification.
FIG. 19 is a schematic diagram showing an example of a reception processing standard according to the modification.
FIG. 20 is a schematic diagram showing an example of simulcast audio operation according to the present embodiment.
FIG. 21 is a schematic diagram showing another example of simulcast audio operation according to the present embodiment.

Embodiments of the present invention will be described in detail below with reference to the drawings.

[System configuration]
FIG. 1 is a diagram showing an example of the configuration of a broadcasting system Sys according to an embodiment of the present invention.
The broadcasting system Sys includes a broadcasting device 1 of a broadcasting station (referred to as "broadcasting station 1"), a relay station Sa, a receiver 2, a broadcasting station server 3, and a provider server 4. The broadcasting is, for example, terrestrial digital broadcasting, but may also be advanced BS (Broadcasting Satellites) digital broadcasting or advanced wideband CS (Communication Satellites) digital broadcasting. The present invention is not limited to these types of broadcasting, and the broadcasting may be broadcasting that does not use the relay station Sa. The broadcasting may also be wired broadcasting such as cable television. The relay station Sa is, for example, a digital relay station, but may also be a broadcasting satellite.

In the broadcasting system Sys, the broadcasting station 1 transmits digital broadcast signals, application control information, presentation control information, and the like via broadcast waves. A service provider provides metadata, video content, and the like related to programs from the provider server 4.
The application control information notifies receivers compatible with this system of applications linked to programs, and also carries commands and control information for starting and terminating them.
The presentation control information conveys whether an application and a broadcast program may be superimposed on the same TV screen and whether the application may be presented.
The broadcasting station operates the broadcasting station server 3 in the broadcasting system Sys. The broadcasting station server 3 provides metadata such as program titles, program IDs, program summaries, performers, and broadcast dates and times. The information that the broadcasting station provides to the service provider is provided through an API (Application Programming Interface) of the broadcasting station server 3.

The service provider is a party that provides services through the broadcasting system Sys. It produces and distributes the content and applications for those services, and operates the broadcasting station server 3 to realize each individual service. Here, the services include broadcast-communication cooperation services that link broadcasting and communication.
For "application management and distribution," the broadcasting station server 3 sends applications to the receiver 2. As a "server for each service," the broadcasting station server 3 provides server functions for realizing individual services (MPEG-H service, VOD program recommendation service, multilingual subtitle service, etc.).

MPEG-H is a suite of standards under development by the ISO/IEC Moving Picture Experts Group (MPEG), comprising a digital container standard, a video compression standard, an audio compression standard, and two conformance testing standards. MPEG-H audio enables, for example, object-based audio. An "object" in object-based audio is each individual sound element that makes up a program, such as music or a human voice. In object-based audio, an audio signal is recorded for each sound element, so the audio can be controlled element by element. Furthermore, when playing back on the receiver 2, the program can be reproduced to suit the positions of the speakers actually installed, based on the playback position information of each element.

The broadcasting station server 3 not only realizes the functional aspects of these services, but also transmits the content that makes up the services (MPEG-H audio data, VOD content, subtitle data, etc.). As a "repository," the broadcasting station server 3 registers applications of the broadcasting system Sys for distribution, and provides and searches a list of available applications in response to inquiries from the receiver 2.

The receivers 2 include those that, in addition to the function of receiving existing digital broadcasts, have functions for realizing broadcast-communication cooperation services. In addition to a broadband network connection function, the receiver 2 has the following functions:
- A function to execute an application in response to an application control signal from broadcasting
- A function to provide presentations through cooperation between broadcasting and communication
- A terminal cooperation function
Here, the terminals include, for example, user terminals such as smartphones and smart speakers. The terminal cooperation function of the receiver 2 accesses broadcast resources such as program information in response to requests from other terminals, and invokes receiver functions such as playback control.
An example of an application is a digital mixer for MPEG-H audio. Using the digital mixer received from the provider server 4, a user (also called a "recipient") can adjust the volume, effects, and the like of the audio signal of each sound element, and can adjust the balance between sound elements. These adjustments can also be made for each speaker.
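The per-element adjustment described for the digital mixer can be illustrated with a minimal sketch. It assumes each sound element is a plain list of samples and that balance is adjusted with one linear gain per element; the function name `mix_objects` and the element names are hypothetical and do not come from the patent or from any MPEG-H library.

```python
from typing import Dict, List


def mix_objects(objects: Dict[str, List[float]],
                gains: Dict[str, float]) -> List[float]:
    """Sum the sound elements sample by sample after applying per-element gains.

    Elements without an entry in `gains` are mixed at unity gain (assumption).
    """
    length = max((len(samples) for samples in objects.values()), default=0)
    mixed = [0.0] * length
    for name, samples in objects.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


# Illustrative data: keep the dialogue as is, halve the music.
elements = {"dialogue": [1.0, 0.5], "music": [0.5, 0.5]}
balanced = mix_objects(elements, {"music": 0.5})
```

A real mixer would operate on decoded MPEG-H objects with position metadata; the sketch only shows why element-wise signals make this kind of balance control possible.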

More specifically, the receiver 2 has the following functions.
As a "broadcast reception and playback" function, the receiver 2 receives broadcast radio waves, tunes to a specific broadcast service, and synchronously plays back the video, audio, subtitles, and data broadcast that make up the service.
As a "communication content reception and playback" function, the receiver 2 accesses video content stored on a server on the communication network (e.g., the provider server 4), receives it as VOD streaming, and synchronously plays back the video, audio, and subtitles that make up the content.
As an "application control" function, the receiver 2 acts on the application engine, mainly with respect to managed applications, based on application control information obtained from a server on the communication network or from a broadcast signal, and controls and manages the life cycle and events of each application.
As an "application engine" function, the receiver 2 acquires and executes applications. This function is realized by, for example, an HTML5 browser.
As a "presentation synchronization control" function, the receiver 2 controls the synchronized presentation of video, audio, and other streams received by broadcasting and video, audio, and other streams received by streaming.
As an "application launcher" function, the receiver 2 provides a navigation function that allows the user to select and launch applications, mainly managed applications outside of broadcasting.

FIG. 2 is a diagram showing an example of the broadcasting system Sys according to the present embodiment.
The receiver 2 in FIG. 2 is a receiver that supports MPEG-H audio.

The broadcasting station 1 multiplexes a video signal and an audio signal and transmits the multiplexed signal. MMT (MPEG Media Transport) and TLV (Type Length Value) are used as the multiplexing method.
The broadcasting station 1 generates and transmits an audio signal A11 (also referred to as an "advanced audio signal") that includes both MPEG-4 audio signals (channel-based audio: e.g., audio 12 to audio 14) and an MPEG-H audio signal (object-based audio: audio 11).
In this way, the broadcasting station 1 transmits the MPEG-H audio signal and the MPEG-4 audio signals in parallel.

More specifically, the broadcasting station 1 treats the MPEG-H audio as an audio component (asset) and generates the advanced audio signal A11 by multiplexing this audio component with each audio component of the MPEG-4 audio. This multiplexing is performed after the audio data stream of each audio component has been encoded. The broadcasting station 1 transmits a broadcast wave in which the advanced audio signal A11 is multiplexed.
In the table describing asset information, the broadcasting station 1 uses a descriptor to describe whether an MPEG-H audio signal (object-based audio: audio 11) is present (see FIG. 12). The descriptor indicating whether an MPEG-H audio signal is present may instead be a descriptor indicating whether the advanced audio signal A11 is present, or may indicate whether an MPEG-H audio signal is included.
In MMT, components such as video and audio are defined as assets.

One example of MPEG-H audio 11 is a level 3 signal in which the channel audio has up to 11.1 channels, the dialogue audio is in Japanese or English, and the commentary audio is in Japanese. At level 3, the receiver 2 must be capable of decoding 16 channels simultaneously, in addition to MPEG-H playback processing. Another example of MPEG-H audio 11 is a level 4 signal in which the channel audio has up to 22.2 channels, the dialogue audio is in Japanese or English, and the commentary audio is in Japanese. At level 4, the receiver 2 must be capable of decoding 28 channels simultaneously, in addition to MPEG-H playback processing.
In one example of MPEG-4 audio, audio 12 is 7.1ch Japanese, audio 13 is stereo Japanese, and audio 14 is 7.1ch English. Audio 13 is a simulcast (simultaneous parallel broadcast) of audio 12.
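The level requirements above amount to a simple capability check. The following is a minimal sketch, assuming the receiver knows how many channels it can decode simultaneously; the table and function names are hypothetical, and only the two levels named in the text (3 and 4) are covered.

```python
# Channels the receiver must decode simultaneously, per the two example
# levels in the text. Other levels are not described there and are omitted.
REQUIRED_DECODE_CHANNELS = {3: 16, 4: 28}


def can_decode_level(level: int, max_simultaneous_channels: int) -> bool:
    """Return True if the receiver's decode capacity covers the given level."""
    required = REQUIRED_DECODE_CHANNELS.get(level)
    if required is None:
        # Assumption: a level not listed above is treated as unsupported.
        return False
    return max_simultaneous_channels >= required
```

For example, a receiver that can decode 16 channels at once would pass the check for level 3 but not for level 4.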

The receiver 2 includes a tuner 211, a Demux (demultiplexer) 22, a selector 231, an audio decoder 232, a mixer 233, and a video decoder 241. The detailed configuration of the receiver 2 will be described later.

The tuner 211 receives broadcast waves via an antenna and tunes to the channel selected by a user operation. The tuned signal is demodulated and input to the Demux 22 as data.
The Demux 22 separates the input data into a video data stream, an audio data stream, a superimposed text data stream, a subtitle data stream, and so on. The separated audio data stream is output to the selector 231. The separated video data stream is output to the video decoder 241.

Here, the Demux 22 separates the audio data stream into the audio data streams of the individual audio components: MPEG-H audio 11 and MPEG-4 audio 12, 13, and 14. More specifically, the Demux 22 uses the descriptor in the table describing asset information to determine whether an MPEG-H audio signal is present. If the Demux 22 determines that an MPEG-H audio signal is present and the receiver 2 is capable of decoding MPEG-H audio, it separates audio 11, audio 12, audio 13, and audio 14 from the data of the advanced audio signal. If the Demux 22 determines that no MPEG-H audio signal is present, it separates only audio 12, audio 13, and audio 14.
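The separation rule just described can be sketched as follows. This is an illustration only: the `Asset` type, the `mpeg_h_audio_present` key, and the function name are hypothetical stand-ins for the asset table and descriptor, whose actual structure is shown in the patent's figures rather than here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Asset:
    name: str   # e.g. "audio11"
    codec: str  # "mpeg-h" or "mpeg-4"


ASSETS = [
    Asset("audio11", "mpeg-h"),
    Asset("audio12", "mpeg-4"),
    Asset("audio13", "mpeg-4"),
    Asset("audio14", "mpeg-4"),
]


def separate_audio(assets: List[Asset],
                   descriptors: Dict[str, bool],
                   receiver_decodes_mpeg_h: bool) -> List[str]:
    """Return the names of the audio components the Demux would separate."""
    # Hypothetical flag standing in for the descriptor in the asset table.
    present = bool(descriptors.get("mpeg_h_audio_present", False))
    keep_mpeg_h = present and receiver_decodes_mpeg_h
    return [a.name for a in assets
            if a.codec == "mpeg-4" or (a.codec == "mpeg-h" and keep_mpeg_h)]
```

MPEG-4 components are always separated, while the MPEG-H component is separated only when the descriptor signals it and the receiver can decode it.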

The audio data stream of each audio component output from the Demux 22 is input to the selector 231. The selector 231 selects the audio data stream of an audio component according to a user operation or the capabilities of the receiver 2. The capabilities of the receiver 2 include, for example, the number of channels that can be decoded simultaneously, and the types and capabilities of the speakers available for playback. The selector 231 outputs the selected audio data stream to the audio decoder 232.
The audio decoder 232 decodes the audio data stream of the audio component input from the selector 231.
If the audio data stream decoded by the audio decoder 232 is an MPEG-H audio data stream, the mixer 233 combines the audio of the individual sound elements and performs downmix processing. The downmixed audio data stream is converted into sound and output from the speakers. If the audio data stream decoded by the audio decoder 232 is an MPEG-4 audio data stream, it is converted into sound and output from the speakers as is. In other words, neither the combining of per-element audio nor downmix processing is performed on an MPEG-4 audio data stream.
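A minimal sketch of the selector and the two post-decoding paths follows. The selection policy (honor the user's choice if the receiver can decode it, otherwise take the first component that fits) is an assumption made for illustration; the text only says that selection depends on user operation or receiver capability. The channel counts and all names are likewise illustrative.

```python
from typing import List, Optional, Tuple

# (name, codec, channels to decode at once); the counts are illustrative.
Component = Tuple[str, str, int]

COMPONENTS: List[Component] = [
    ("audio11", "mpeg-h", 16),
    ("audio12", "mpeg-4", 8),
    ("audio13", "mpeg-4", 2),
    ("audio14", "mpeg-4", 8),
]


def select_component(components: List[Component],
                     max_channels: int,
                     user_choice: Optional[str] = None) -> Optional[Component]:
    """Pick the user's choice if decodable, else the first decodable component."""
    usable = [c for c in components if c[2] <= max_channels]
    for component in usable:
        if component[0] == user_choice:
            return component
    return usable[0] if usable else None


def output_path(component: Component) -> List[str]:
    """List the processing stages after selection for the given component."""
    if component[1] == "mpeg-h":
        # Only MPEG-H audio passes through element combining and downmixing.
        return ["decode", "combine elements", "downmix", "speaker"]
    return ["decode", "speaker"]
```

With these assumptions, a 16-channel receiver would default to audio 11, while a stereo-only receiver would fall back to audio 13 even if the user asked for audio 11.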

The video data stream output from the Demux 22 is input to the video decoder 241.
The video decoder 241 decodes the input video data stream. The decoded video data stream undergoes color space conversion as necessary and is used to display video on a display. The superimposed text data stream and the subtitle data stream separated by the Demux 22 are decoded by a superimposed text decoder and a subtitle decoder (not shown), respectively, and the decoded character strings are superimposed on the video.

As described above, in a broadcast that includes MPEG-H audio as an audio component, the receiver 2 according to the present embodiment acquires, at the multiplexing layer, information indicating that an MPEG-H audio signal is present. Because the receiver 2 selects an audio component according to its own capabilities, it can perform appropriate audio playback according to those capabilities.

FIG. 3 is a diagram showing a comparative example of the broadcasting system Sys according to the present embodiment.
This figure shows an example in which a broadcasting station C1 transmits only MPEG-H audio. In this example, the MPEG-H audio is not multiplexed with MPEG-4 audio.
In this example, the MPEG-H audio is operated as the only audio component. Therefore, the table describing asset information does not include a descriptor indicating whether an MPEG-H audio signal is present. In this case, the Demux C22 can acquire the MPEG-H audio as an audio component (audio configuration), but cannot determine whether it can be processed (its level, etc.). The audio data stream output from the Demux C22 is decoded by an audio decoder C232 and output to a mixer C233.

In contrast to the comparative example in FIG. 3, the broadcasting station 1 according to the present embodiment transmits the MPEG-H audio signal and the MPEG-4 audio signals in parallel. The receiver 2 first uses the descriptor to determine whether an MPEG-H audio signal is present, and then selects either MPEG-H or MPEG-4 audio according to its audio decoding capabilities.
This allows the receiver 2 to play back whichever of MPEG-H and MPEG-4 audio is appropriate for its own capabilities.

FIG. 4 is a diagram showing another example of the broadcasting system Sys according to the present embodiment. In this figure, the receiver 2 is a receiver that does not support MPEG-H.
The receiver 2 in this figure includes a tuner 211, a Demux 22a, a selector 231a, an audio decoder 232a, and a video decoder 241. In this figure, the same functional units as those of the receiver 2 in FIG. 3 are denoted by the same reference numerals, and their description is omitted.

The Demux 22a separates the input data into a video data stream, an audio data stream, a superimposed text data stream, a subtitle data stream, and so on. The separated audio data stream is output to the selector 231a. The separated video data stream is output to the video decoder 241.
Here, the Demux 22a separates the audio data stream into the audio data streams of the individual audio components of MPEG-4 audio 12, 13, and 14. More specifically, the Demux 22a uses the descriptor in the table describing asset information to determine whether the MPEG-H audio signal 11 is present. The Demux 22a determines that the MPEG-H audio signal 11 cannot be played back.
The Demux 22 separates audio 12, audio 13, and audio 14 from the data of the audio signal A11.
Note that this kind of signal selection (selecting an MPEG-4 audio signal, or selecting a simulcast of MPEG-4 audio signals only) may be performed by the selector 231.

FIG. 5 is an explanatory diagram illustrating an overview of the broadcasting system Sys according to the present embodiment.
In the broadcasting system Sys, the broadcasting station 1 includes an MPEG-H encoder 11, MPEG-4 encoders 111 to 113, and a Mux (multiplexer) 12. The broadcasting station 1 also has the other functional units necessary for broadcasting. Although this figure shows three MPEG-4 encoders as an example, the broadcasting station 1 may have two or fewer MPEG-4 encoders, or four or more.

The receiver 2 includes the Demux 22, the selector 231, an MPEG-H audio core decoder 232-1, an MPEG-4 decoder 232-2, an MPEG-H audio renderer 233-1, and a mixer 233-2. In this figure, the same functional units as those of the receiver 2 in FIG. 2 are denoted by the same reference numerals. The MPEG-H audio core decoder 232-1 and the MPEG-4 decoder 232-2 correspond to the audio decoder 232 in FIG. 2. The MPEG-H audio renderer 233-1 and the mixer 233-2 correspond to the mixer 233 in FIG. 2.

At the broadcasting station 1, the data of background sound (22.2ch/11.1ch), dialogue (Japanese), dialogue (English), and commentary audio (Japanese) are input to the MPEG-H encoder 11 as the sound elements of the MPEG-H audio. In addition, 7.1ch audio including Japanese dialogue, stereo audio including Japanese dialogue, and stereo data including English dialogue are input to the MPEG-4 encoders 111, 112, and 113, respectively, as the sound elements of the MPEG-4 audio.

The MPEG-H encoder 11 encodes the input audio and outputs an MPEG-H audio stream St1. This stream is also called an MPEG-H 3D audio stream.
The MPEG-4 encoders 111, 112, and 113 encode the input audio and output MPEG-4 audio streams St2, St3, and St4, respectively.

Ｍｕｘ１２には、映像ストリーム、ＳＩ（ＳｉｇｎａｌｉｎｇＩｎｆｏｒｍａｔｉｏｎ）、ＭＰＥＧ－Ｈの音声ストリームＳｔ１、ＭＰＥＧ－４の音声ストリームＳｔ２、Ｓｔ３及びＳｔ４が入力される。Ｍｕｘ１２は、これらのデータを多重化する。多重化されたデータは、変調され、変調後の信号が放送波として放送される。 Mux 12 receives the video stream, SI (Signaling Information), MPEG-H audio stream St1, and MPEG-4 audio streams St2, St3, and St4. Mux 12 multiplexes this data. The multiplexed data is modulated, and the modulated signal is broadcast as a broadcast wave.

受信機２に受信された放送波は復調され、復調後のデータはＤｅｍｕｘ２２に入力される。Ｄｅｍｕｘ２２は、入力されたデータを、映像ストリーム、ＳＩ、ＭＰＥＧ－Ｈの音声ストリームＳｔ１、ＭＰＥＧ－４の音声ストリームＳｔ２、Ｓｔ３及びＳｔ４に分離する。ＭＰＥＧ－Ｈの音声ストリームＳｔ１、ＭＰＥＧ－４の音声ストリームＳｔ２、Ｓｔ３及びＳｔ４は、セレクタ２３１に入力される。ＳＩのＭＰＴ（ＭＭＴＰａｃｋａｇｅＴａｂｌｅ）からは、ＭＰＥＧ－Ｈオーディオ信号が存在するかどうかを示す記述子が抽出される。 The broadcast waves received by receiver 2 are demodulated, and the demodulated data is input to Demux 22. Demux 22 separates the input data into a video stream, SI, MPEG-H audio stream St1, and MPEG-4 audio streams St2, St3, and St4. MPEG-H audio stream St1 and MPEG-4 audio streams St2, St3, and St4 are input to selector 231. A descriptor indicating whether an MPEG-H audio signal is present is extracted from the MMT Package Table (MPT) of SI.

セレクタ２３１は、抽出された記述子に基づいて、ＭＰＥＧ－Ｈオーディオ信号が存在するかどうかを判断する。ＭＰＥＧ－Ｈオーディオ信号が存在する場合、セレクタ２３１は、ＭＰＥＧ－Ｈの音声ストリームＳｔ１を、ＭＰＥＧ－Ｈオーディオコアデコーダー２３２－１へ出力する。
セレクタ２３１は、ＭＰＴに基づいて、ＭＰＥＧ－４の音声ストリームＳｔ２、Ｓｔ３又はＳｔ４を、ＭＰＥＧ－４デコーダー２３２－２へ出力する。 The selector 231 determines whether an MPEG-H audio signal is present based on the extracted descriptor, and if an MPEG-H audio signal is present, the selector 231 outputs the MPEG-H audio stream St1 to the MPEG-H audio core decoder 232-1.
The selector 231 outputs the MPEG-4 audio stream St2, St3 or St4 to the MPEG-4 decoder 232-2 based on the MPT.

ＭＰＥＧ－Ｈオーディオコアデコーダー２３２－１は、ＭＰＥＧ－Ｈオーディオコアデコーダー２３２－１を復号化することで、背景音（２２．２ｃｈ／１１．１ｃｈ）、セリフ（日本語）、セリフ（英語）、及び、解説音声（日本語）のデータを抽出する。
ＭＰＥＧ－Ｈオーディオレンダラー２３３－１は、ＭＰＥＧ－Ｈオーディオのオーディオレンダラーであり、ＭＰＥＧ－Ｈオーディオコアデコーダー２３２－１が抽出したデータの音声をレンダリング処理（ダウンコンバート、或いはアップコンバートを含む）し、ミキサー２３３へ出力する。
ＭＰＥＧ－４デコーダー２３２－２は、ＭＰＥＧ－４の音声ストリームＳｔ２、Ｓｔ３又はＳｔ４を復号化することで、日本語のセリフを含む７．１ｃｈの音声、日本語のセリフを含むステレオの音声、及び英語のセリフを含むステレオのデータを抽出し、ミキサー２３３へ出力する。
ミキサー２３３－２は、入力されたデータの音声を合成し、合成された音声は、各スピーカー又はヘッドホン等から出力される。 The MPEG-H audio core decoder 232-1 decodes the MPEG-H audio stream St1 to extract the background sound (22.2ch/11.1ch), dialogue (Japanese), dialogue (English), and commentary audio (Japanese) data.
The MPEG-H audio renderer 233-1 is an audio renderer for MPEG-H audio; it renders (including down-converting or up-converting) the audio data extracted by the MPEG-H audio core decoder 232-1 and outputs it to the mixer 233.
The MPEG-4 decoder 232-2 decodes the MPEG-4 audio stream St2, St3, or St4 to extract the corresponding audio (7.1ch audio including Japanese dialogue, stereo audio including Japanese dialogue, or stereo audio including English dialogue), and outputs it to the mixer 233.
The mixer 233-2 mixes the input audio data, and the mixed audio is output from speakers, headphones, or the like.
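
The routing performed by the selector 231 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the receiver's implementation: the function name `route_audio_streams`, the boolean `mpegh_present` (standing in for the descriptor extracted from the MPT), the argument `mpeg4_choice` (standing in for the choice the selector makes based on the MPT), and the dictionary keys are hypothetical names introduced here.

```python
from typing import Dict, Optional

# Stream labels follow the description: St1 is MPEG-H, St2 to St4 are MPEG-4.
MPEG4_STREAMS = ("St2", "St3", "St4")


def route_audio_streams(mpegh_present: bool,
                        mpeg4_choice: Optional[str] = None) -> Dict[str, str]:
    """Map each decoder (hypothetical key names) to the stream it is fed."""
    routes: Dict[str, str] = {}
    # St1 goes to the MPEG-H audio core decoder only when the descriptor
    # indicates that an MPEG-H audio signal is present.
    if mpegh_present:
        routes["mpegh_core_decoder_232_1"] = "St1"
    # One of St2, St3, or St4 goes to the MPEG-4 decoder.
    if mpeg4_choice is not None:
        if mpeg4_choice not in MPEG4_STREAMS:
            raise ValueError(f"unknown MPEG-4 stream: {mpeg4_choice}")
        routes["mpeg4_decoder_232_2"] = mpeg4_choice
    return routes
```

For example, `route_audio_streams(True, "St3")` feeds St1 to the MPEG-H path and St3 to the MPEG-4 path, while `route_audio_streams(False, "St2")` uses the MPEG-4 path only.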

［放送波の制御情報について］
本実施形態に係る放送波について、説明する。
放送波において、制御情報は、各放送事業者がその放送信号であるＴＬＶストリームに重畳して送出される。制御情報には、ＴＬＶ多重化方式に関わるＴＬＶ－ＳＩ（ＴＬＶ－ＳｉｇｎａｌｉｎｇＩｎｆｏｒｍａｔｉｏｎ）と、メディアトランスポート方式であるＭＭＴに関わるＭＭＴ－ＳＩ（ＭＭＴ－ＳｉｇｎａｌｉｎｇＩｎｆｏｒｍａｔｉｏｎ）がある。
以下では、（映像又は音声の）「コンポーネント」を「アセット」ともいう。 [Regarding broadcast wave control information]
The broadcast wave according to this embodiment will be described.
In broadcast waves, control information is superimposed on the TLV stream, which is the broadcast signal, by each broadcaster and transmitted. The control information includes TLV-SI (TLV-Signaling Information) related to the TLV multiplexing method and MMT-SI (MMT-Signaling Information) related to MMT, a media transport method.
In the following, a (video or audio) "component" is also referred to as an "asset."

＜ＭＭＴを用いるシステムのプロトコルスタックの構造＞
ＭＭＴを用いるシステムにおいて、制御情報が配置されるプロトコルスタックの構造の例について説明する。
図６は、本実施形態に係るプロトコルスタックの構造の一例を示す図である。
この図に示すように、放送システムに用いるプロトコルスタックは、ＴＭＣＣ（ＴｒａｎｓｍｉｓｓｉｏｎａｎｄＭｕｌｔｉｐｌｅｘｉｎｇＣｏｎｆｉｇｕｒａｔｉｏｎＣｏｎｔｒｏｌ）、時刻情報、符号化された映像データ、符号化された音声データ、符号化された字幕データ、ＭＭＴ－ＳＩ、ＨＴＭＬ５規格で記述されたアプリケーション（単にアプリともいう）、ＥＰＧ（電子番組ガイド）、コンテンツダウンロードデータ等を含んで構成される。放送番組の映像信号及び音声信号の符号はＭＦＵ（ＭｅｄｉａＦｒａｇｍｅｎｔＵｎｉｔ）／ＭＰＵである。そして、ＭＦＵ／ＭＰＵは、ＭＭＴＰペイロードに乗せて放送局１によってＭＭＴＰパケット化され、ＩＰパケットで放送局１によって伝送される。データコンテンツの伝送は、データが放送局１によってＭＭＴＰパケット化され、ＩＰパケットで放送局１によって伝送される。このように構成されたＩＰパケットは、放送伝送路を用いて放送される場合、ＴＬＶパケットの形式で放送局１によって伝送される。一つのＩＰパケットあるいは一つのヘッダー圧縮したＩＰパケットは、一つのＴＬＶパケットで放送局１によって伝送する。 <Protocol stack structure of a system using MMT>
An example of the structure of a protocol stack in which control information is arranged in a system using MMT will be described.
FIG. 6 is a diagram showing an example of the structure of a protocol stack according to this embodiment.
As shown in this diagram, the protocol stack used in the broadcasting system includes TMCC (Transmission and Multiplexing Configuration Control), time information, encoded video data, encoded audio data, encoded subtitle data, MMT-SI, an application (also simply referred to as an app) written in HTML5 standard, EPG (Electronic Program Guide), content download data, etc. The codes for the video and audio signals of the broadcast program are MFU (Media Fragment Unit)/MPU. The MFU/MPU is then placed on an MMTP payload and packetized by broadcast station 1 into MMTP packets, which are then transmitted by broadcast station 1 in IP packets. For transmission of data content, the data is packetized by broadcast station 1 into MMTP packets, which are then transmitted by broadcast station 1 in IP packets. When the IP packets configured in this way are broadcast using a broadcast transmission path, they are transmitted by broadcast station 1 in the form of TLV packets. One IP packet or one header-compressed IP packet is transmitted by the broadcasting station 1 in one TLV packet.

さらに、放送システムに用いるプロトコルスタックでは、ＭＭＴ－ＳＩ、ＴＬＶ－ＳＩの２種類の制御情報が設けられている。ＭＭＴ－ＳＩとは、放送番組の構成などを示す制御情報である。ＭＭＴ－ＳＩでは、ＭＭＴの制御メッセージの形式とし、放送局１によってＭＭＴＰペイロードに乗せられてＭＭＴＰパケット化され、放送局１によってＩＰパケットで伝送される。ＴＬＶ－ＳＩとは、ＩＰパケットの多重に関する制御情報であり、選局のための情報やＩＰアドレスとサービスの対応情報を提供する。 Furthermore, the protocol stack used in the broadcasting system provides two types of control information: MMT-SI and TLV-SI. MMT-SI is control information that indicates the structure of a broadcast program, etc. MMT-SI is formatted as an MMT control message, placed in an MMTP payload by broadcast station 1, and converted into MMTP packets, which are then transmitted by broadcast station 1 in IP packets. TLV-SI is control information related to multiplexing IP packets, and provides information for channel selection and information on the correspondence between IP addresses and services.

また、ＴＭＣＣとは、伝送路上の信号の単位（スロット）ごとに変調方式やエラー訂正方式を指定する階層変調方式において、伝送フレームに挿入して伝送するこれらの制御情報である。ＨＥＶＣ（ＨｉｇｈＥｆｆｉｃｉｅｎｃｙＶＩＤｅｏＣｏｄｉｎｇ）とは、映像信号の符号化の手法である。ＡＡＣ（ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇ）及びＡＬＳ（ＡｕｄｉｏＬｏｓｓｌｅｓｓＣｏｄｉｎｇ）とは、音声信号の符号化の手法である。ＵＤＰ／ＩＰ（ＵｓｅｒＤａｔａｇｒａｍＰｒｏｔｏｃｏｌ／ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）とは、通信に使われるプロトコルの１つである。ＴＬＶ（ＴＹＰＥＬＥＮＧＴＨＶＡＬＵＥ）とは、データの多重化手法の１つである。
ＴＬＶは、データの符号化をデータタイプ（Ｔｙｐｅ）、長さ（Ｌｅｎｇｔｈ）、値（Ｖａｌｕｅ）の３つで構成される。 TMCC is control information inserted into a transmission frame for transmission in a hierarchical modulation method that specifies a modulation method and an error correction method for each unit (slot) of a signal on a transmission path. HEVC (High Efficiency Video Coding) is a method for encoding video signals. AAC (Advanced Audio Coding) and ALS (Audio Lossless Coding) are methods for encoding audio signals. UDP/IP (User Datagram Protocol/Internet Protocol) is one of the protocols used in communication. TLV (TYPE LENGTH VALUE) is one of the data multiplexing methods.
TLV is composed of three parts for encoding data: data type (Type), length (Length), and value (Value).
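
The Type/Length/Value idea can be illustrated with a generic encoder and parser. This is a minimal sketch of the general technique only: the 1-byte type and 2-byte big-endian length used here are assumptions chosen for illustration and are not claimed to match the TLV packet header used in the broadcast signal. The function names `encode_tlv` and `iter_tlv` are hypothetical.

```python
import struct
from typing import Iterator, Tuple

# Assumed header layout for this sketch: 1-byte type, 2-byte big-endian length.
_HEADER = ">BH"
_HEADER_SIZE = struct.calcsize(_HEADER)  # 3 bytes


def encode_tlv(type_: int, value: bytes) -> bytes:
    """Serialize one record as type, length, then the value bytes."""
    return struct.pack(_HEADER, type_, len(value)) + value


def iter_tlv(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs from a buffer of concatenated records."""
    offset = 0
    while offset < len(buf):
        if offset + _HEADER_SIZE > len(buf):
            raise ValueError("truncated TLV header")
        type_, length = struct.unpack_from(_HEADER, buf, offset)
        offset += _HEADER_SIZE
        if offset + length > len(buf):
            raise ValueError("truncated TLV value")
        yield type_, buf[offset:offset + length]
        offset += length
```

Because every record states its own length, a parser can skip records whose type it does not understand, which is what makes the format convenient for multiplexing different kinds of data.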

＜メッセージの種類と識別＞
ＭＭＴ－ＳＩには、メッセージ、テーブル、記述子が含まれている。
メッセージには、ＰａｃｋａｇｅＡｃｃｅｓｓ（ＰＡ）メッセージ、Ｍ２セクションメッセージ、ＣＡメッセージ、Ｍ２短セクションメッセージ、データ伝送メッセージ、及び事業者が設定するメッセージが含まれる。
放送で使用するＭＭＴ－ＳＩのメッセージは、次の通りである。 <Message types and identification>
MMT-SI includes messages, tables, and descriptors.
The messages include a Package Access (PA) message, an M2 section message, a CA message, an M2 short section message, a data transmission message, and a message set by the operator.
The MMT-SI messages used in broadcasting are as follows:

「ＰＡメッセージ」は、サービスのエントリーポイントを示すために、ＰＬＴおよびＭＰＴを伝送する。
「Ｍ２セクションメッセージ」は、ＭＰＥＧ－２Ｓｙｓｔｅｍｓのセクション拡張形式を伝送する。
「ＣＡメッセージ」は、限定受信方式に関する情報を伝送する。
「Ｍ２短セクションメッセージ」は、ＭＰＥＧ－２Ｓｙｓｔｅｍｓのセクション短形式を伝送する。
「データ伝送メッセージ」は、データ伝送に関するテーブルを伝送する。 The "PA message" carries the PLT and MPT to indicate the entry point of the service.
The "M2 Section Message" transmits the section extension format of the MPEG-2 Systems.
The "CA message" transmits information about the conditional access method.
The "M2 Short Section Message" transmits the section short format of the MPEG-2 Systems.
The "data transmission message" transmits a table relating to data transmission.
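
The message list above amounts to a lookup from message type to payload. The following sketch records that relationship as a plain dictionary; the names `MESSAGE_CONTENTS` and `messages_carrying` are hypothetical, and the strings simply restate the descriptions given above rather than any normative syntax.

```python
from typing import Dict, List

# Message name -> what the message carries, as described in the text above.
MESSAGE_CONTENTS: Dict[str, str] = {
    "PA message": "PLT and MPT (entry point of the service)",
    "M2 section message": "MPEG-2 Systems section extension format",
    "CA message": "information about the conditional access method",
    "M2 short section message": "MPEG-2 Systems section short format",
    "data transmission message": "tables relating to data transmission",
}


def messages_carrying(keyword: str) -> List[str]:
    """Return the names of messages whose description mentions keyword."""
    return [name for name, contents in MESSAGE_CONTENTS.items()
            if keyword in contents]
```

A receiver looking for the MPT would therefore watch for PA messages: `messages_carrying("MPT")` returns `["PA message"]`.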

＜テーブルの種類と識別＞
放送で使用するＴＬＶ－ＳＩのテーブルは、次の通りである。 <Table types and identification>
The TLV-SI tables used in broadcasting are as follows:

「ＴＬＶ－ＮＩＴ（ＮｅｔｗｏｒｋＩｎｆｏｒｍａｔｉｏｎＴａｂｌｅｆｏｒＴＬＶ）」は、ＴＬＶパケットによる伝送において、変調周波数など伝送路の情報と放送番組を関連付ける情報を伝送する。
「ＡＭＴ(ＡｄｄｒｅｓｓＭａｐＴａｂｌｅ)」は、放送番組番号を識別するサービス識別子とＩＰパケットとを関連付ける情報を伝送する。
「ＭＰＴ（ＭＭＴＰａｃｋａｇｅＴａｂｌｅ）」は、アセットのリストやその位置などパッケージを構成する情報を与える。
「ＰＬＴ（ＰａｃｋａｇｅＬｉｓｔＴａｂｌｅ）」は、放送サービスとして提供されるサービスのＭＰＴを含むＰＡメッセージを伝送するパケットＩＤの一覧を示す。
「ＥＣＭ（ＥｎｔｉｔＬｅｍｅｎｔＣｏｎｔｒｏＬＭｅｓｓａｇｅ）」は、番組情報（番組に関する情報とデスクランブルのための鍵など）及び制御情報（デコーダーのスクランブル機能の強制オン／オフ指令）からなる共通情報を伝送する。 "TLV-NIT (Network Information Table for TLV)" transmits information that associates transmission path information, such as modulation frequency, with broadcast programs in transmission by TLV packets.
The "AMT (Address Map Table)" transmits information that associates a service identifier that identifies a broadcast program number with an IP packet.
"MPT (MMT Package Table)" provides information that configures a package, such as a list of assets and their locations.
"PLT (Package List Table)" indicates a list of packet IDs that transmit PA messages including MPTs of services provided as broadcast services.
An "ECM (Entitlement Control Message)" transmits common information consisting of program information (information about the program, a key for descrambling, etc.) and control information (a forced on/off command for the decoder's scrambling function).

「ＥＭＭ（ＥｎｔｉｔＬｅｍｅｎｔＭａｎａｇｅｍｅｎｔＭｅｓｓａｇｅ）は、加入者毎の契約情報及び共通情報の暗号を解くためのワーク鍵を含む個別情報を伝送する。
「ＣＡＴ（ＭＨ）（ＣｏｎｄｉｔｉｏｎａＬＡｃｃｅｓｓＴａｂｌｅ）」は、限定受信放送を構成する関連情報のうち個別情報を伝送するＭＭＴＰパケットのパケット識別子を指定する。
「ＭＨ－ＥＩＴ（ＭＨ－ＥｖｅｎｔＩｎｆｏｒｍａｔｉｏｎＴａｂｌｅ）」は、番組の名称、放送日時、内容の説明など、番組に関する情報を伝送する。
「ＭＨ－ＡＩＴ（ＭＨ－ＡｐｐＬｉｃａｔｉｏｎＩｎｆｏｒｍａｔｉｏｎＴａｂｌｅ）」は、アプリケーションに関する動的制御情報及び実行に必要な付加情報を伝送する。
「ＭＨ－ＢＩＴ（ＭＨ－ＢｒｏａｄｃａｓｔｅｒＩｎｆｏｒｍａｔｉｏｎＴａｂｌｅ）」は、ネットワーク上に存在するブロードキャスタの情報を提示するために用いる。 "EMM (Entitlement Management Message)" transmits individual information, including contract information for each subscriber and a work key for decrypting the common information.
"CAT (MH) (Conditional Access Table)" specifies the packet identifier of an MMTP packet that transmits individual information among the related information that constitutes the conditional access broadcast.
The "MH-EIT (MH-Event Information Table)" transmits information about the program, such as the program name, broadcast date and time, and a description of the content.
The "MH-AIT (MH-Application Information Table)" transmits dynamic control information related to an application and additional information required for execution.
"MH-BIT (MH-Broadcaster Information Table)" is used to present information about broadcasters present on the network.

「ＭＨ－ＳＤＴＴ（ＭＨ－ＳｏｆｔｗａｒｅＤｏｗｎＬｏａｄＴｒｉｇｇｅｒＴａｂｌｅ）」は、ダウンロードのサービスＩＤ、スケジュール情報、更新対象の受信機種別などの告知情報を伝送する。
「ＭＨ－ＳＤＴ（ＭＨ－ＳｅｒｖｉｃｅＤｅｓｃｒｉｐｔｉｏｎＴａｂｌｅ）」は、編成チャンネルの名称、放送事業者の名称など、編成チャンネルに関する情報を伝送する。
「ＭＨ－ＴＯＴ（ＭＨ－ＴｉｍｅＯｆｆｓｅｔＴａｂｌｅ）」は、現在の日付時刻の指示、及び、実際の時刻と人間系への表示時刻の差分時間を伝送する。
「ＭＨ－ＣＤＴ（ＭＨ－ＣｏｍｍｏｎＤａｔａＴａｂｌｅ）」は、事業者ロゴマークなど、受信機で共通に必要であり、不揮発性メモリに格納する事を前提としたデータを伝送する。
「ＤＤＭＴ（ＤａｔａＤｉｒｅｃｔｏｒｙＭａｎａｇｅｍｅｎｔＴａｂｌｅ）」は、アプリケーションを構成するファイルのディレクトリ構成を提供する。 "MH-SDTT (MH-Software Download Trigger Table)" transmits notification information such as the download service ID, schedule information, and the type of receiver to be updated.
The "MH-SDT (MH-Service Description Table)" transmits information about the organized channels, such as the names of the organized channels and the names of the broadcasting companies.
The "MH-TOT (MH-Time Offset Table)" transmits the current date and time, and the difference between the actual time and the time displayed to the human system.
The "MH-CDT (MH-Common Data Table)" transmits data that is commonly required by receivers, such as company logos, and is intended to be stored in non-volatile memory.
The "DDMT (Data Directory Management Table)" provides a directory structure of the files that make up an application.

「ＤＡＭＴ（ＤａｔａＡｓｓｅｔＭａｎａｇｅｍｅｎｔＴａｂｌｅ）」は、アセット内のＭＰＵの構成とＭＰＵ毎のバージョン情報を提供する。
「ＤＣＣＴ（ＤａｔａＣｏｎｔｅｎｔＣｏｎｆｉｇｕｒａｔｉｏｎＴａｂｌｅ）」は、データコンテンツとしてのファイルの構成情報を提供する。
「ＥＭＴ（ＥｖｅｎｔＭｅｓｓａｇｅＴａｂｌｅ）」は、イベントメッセージに関する情報を伝送するために用いる。 "DAMT (Data Asset Management Table)" provides the configuration of MPUs in an asset and version information for each MPU.
The "DCCT (Data Content Configuration Table)" provides configuration information of a file as data content.
The "EMT (Event Message Table)" is used to transmit information about event messages.

＜ＭＭＴパッケージテーブル＞
ＭＰＴ（ＭＭＴパッケージテーブル）は、アセットのリストやアセットのネットワーク上の位置などパッケージを構成する情報を与える。
図７は、本実施形態に係るＭＰＴのデータ構造を示す図である。
「ｔａｂＬｅ_ＩＤ」（テーブル識別子）は、テーブル識別子は８ビットのフィールドで、各テーブルを識別する。
「ｖｅｒｓｉｏｎ」（バージョン）は、テーブルのバージョン番号を書き込む領域である。
「Ｌｅｎｇｔｈ」（テーブル長）は、このフィールドより後に続くデータバイト数を書き込む領域である。 <MMT Package Table>
The MPT (MMT Package Table) provides information that constitutes a package, such as a list of assets and their locations on the network.
FIG. 7 is a diagram showing the data structure of the MPT according to this embodiment.
"table_id" (table identifier) is an 8-bit field that identifies each table.
"version" is an area where the version number of the table is written.
"length" (table length) is an area in which the number of data bytes following this field is written.

「ＭＭＴ_ｐａｃｋａｇｅ_ＩＤ_Ｌｅｎｇｔｈ」は、パッケージＩＤバイトの長さをバイト単位で示す。
「ＭＭＴ_ｐａｃｋａｇｅ_ＩＤ_ｂｙｔｅ」は、パッケージＩＤを示す。サービスを識別するためのサービス識別と同じ値とする。
「ＭＰＴ_ｄｅｓｃｒｉｐｔｏｒｓ_Ｌｅｎｇｔｈ」は、ＭＰＴ記述子領域の長さをバイト単位で示す。
「ＭＰＴ＿ｄｅｓｃｒｉｐｔｏｒｓ＿ｂｙｔｅ」（ＭＰＴ記述子領域）は、ＭＰＴの記述子を格納する領域である。
なお、番組がマルチビュー番組である場合、ＭＰＴの記述子領域には、ＭＨ－コンポーネントグループ記述子（ＭＨ－Ｃｏｍｐｏｎｅｎｔ＿Ｇｒｏｕｐ＿Ｄｅｓｃｒｉｐｔｏｒ（））が含まれる。これに対して、番組がマルチビュー番組ではない場合、ＭＰＴの記述子領域には、ＭＨ－コンポーネントグループ記述子が含まれない。
「ｎｕｍｂｅｒ＿ｏｆ＿ａｓｓｅｔｓ」（アセット数）は、本テーブルが情報を与えるアセットの数を示す。
「ＩＤｅｎｔｉｆｉｅｒ＿ｔｙｐｅ」（識別子タイプ）は、ＭＭＴＰパケットフローのＩＤ体系を示す。アセットＩＤを示すＩＤ体系であれば特定値（０ｘ００）とする。 "MMT_package_id_length" indicates the length of the package ID bytes in bytes.
"MMT_package_id_byte" indicates the package ID, which has the same value as the service ID that identifies the service.
"MPT_descriptors_length" indicates the length of the MPT descriptor area in bytes.
"MPT_descriptors_byte" (MPT descriptor area) is an area that stores the MPT descriptors.
If the program is a multi-view program, the descriptor area of the MPT includes an MH-component group descriptor (MH-Component_Group_Descriptor()). On the other hand, if the program is not a multi-view program, the descriptor area of the MPT does not include an MH-component group descriptor.
"number_of_assets" (number of assets) indicates the number of assets for which this table provides information.
"identifier_type" (identifier type) indicates the ID scheme of the MMTP packet flow. For the ID scheme that indicates an asset ID, it is set to a specific value (0x00).

ＭＰＴは、１又は複数のアセットの各々を記述する領域を有する。この領域には、アセット毎に、次のフィールドが格納される。
「ａｓｓｅｔ＿ＩＤ＿ｓｃｈｅｍｅ」（アセットＩＤ形式）は、アセットＩＤの形式を示す。「ａｓｓｅｔ_ＩＤ」について、受信機２は、ｃｏｍｐｏｎｅｎｔ_ｔａｇ値を受信動作に使う。受信機２は、アセットの識別に、ｃｏｍｐｏｎｅｎｔ_ｔａｇ値を用いる。
「ａｓｓｅｔ＿ＩＤ＿ｌｅｎｇｔｈ」（アセットＩＤ長）は、アセットＩＤバイトの長さをバイト単位で示す。
「ａｓｓｅｔ＿ＩＤ＿ｂｙｔｅ」（アセットＩＤバイト）は、アセットＩＤを示す。 The MPT has an area that describes each of one or more assets. This area stores the following fields for each asset:
"asset_id_scheme" (asset ID scheme) indicates the format of the asset ID. For "asset_id", the receiver 2 uses the component_tag value in its receiving operation; that is, the receiver 2 identifies an asset by its component_tag value.
"asset_id_length" (asset ID length) indicates the length of the asset ID bytes in bytes.
"asset_id_byte" (asset ID byte) indicates the asset ID.

「ａｓｓｅｔ＿ｔｙｐｅ」（アセットタイプ）は、アセットの種類を示す。
アセットタイプには、例えば、ＨＥＶＣで符号化された映像データを示すｈｃｖ１、ＭＰＥＧ－４オーディオで符号化された音声データを示すｍｐ４ａ、又は、ＭＰＥＧ－Ｈオーディオで符号化された音声データを示すｍｈａ１、ｍｈａ２、ｍｈｍ１、ｍｈｍ２などが記述される。 "asset_type" (asset type) indicates the type of the asset.
The asset type may be, for example, hcv1, which indicates video data encoded using HEVC; mp4a, which indicates audio data encoded using MPEG-4 audio; or mha1, mha2, mhm1, or mhm2, each of which indicates audio data encoded using MPEG-H audio.
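
As an illustration of how the asset type values listed above could drive audio selection, the following sketch picks an audio asset according to the receiver's decoding capability. It is a simplification made for illustration: the embodiment determines the presence of MPEG-H audio from a descriptor, whereas this sketch inspects only `asset_type`. The class `AssetEntry`, the functions `has_mpegh_audio` and `select_audio_asset`, and the flag `can_decode_mpegh` are hypothetical names, and `AssetEntry` models only two of the per-asset fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Asset type codes taken from the description above.
MPEGH_AUDIO_TYPES = frozenset({"mha1", "mha2", "mhm1", "mhm2"})
MPEG4_AUDIO_TYPES = frozenset({"mp4a"})


@dataclass(frozen=True)
class AssetEntry:
    """Reduced model of one per-asset entry in the MPT (assumed shape)."""
    asset_id: bytes
    asset_type: str


def has_mpegh_audio(assets: Iterable[AssetEntry]) -> bool:
    """True if any listed asset carries MPEG-H audio."""
    return any(a.asset_type in MPEGH_AUDIO_TYPES for a in assets)


def select_audio_asset(assets: Iterable[AssetEntry],
                       can_decode_mpegh: bool) -> Optional[AssetEntry]:
    """Prefer MPEG-H when the receiver can decode it, else fall back to MPEG-4."""
    assets = list(assets)
    if can_decode_mpegh:
        for asset in assets:
            if asset.asset_type in MPEGH_AUDIO_TYPES:
                return asset
    for asset in assets:
        if asset.asset_type in MPEG4_AUDIO_TYPES:
            return asset
    return None
```

With a list containing one video asset, one `mhm1` asset, and one `mp4a` asset, a receiver that can decode MPEG-H gets the `mhm1` asset, and a receiver that cannot gets the `mp4a` asset.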

「ａｓｓｅｔ＿ｃｌｏｃｋ＿ｒｅｌａｔｉｏｎ＿ｆｌａｇ」（クロック情報フラグ）は、アセットのクロック情報フィールドの有無を示す。
「ｌｏｃａｔｉｏｎ＿ｃｏｕｎｔ」（ロケーション数）は、アセットのロケーション情報の数を示す。
「ＭＭＴ＿ｇｅｎｅｒａｌ＿ｌｏｃａｔｉｏｎ＿ｉｎｆｏ」（ロケーション情報）は、アセットのロケーション情報を示す。
「ａｓｓｅｔ＿ｄｅｓｃｒｉｐｔｏｒｓ＿ｌｅｎｇｔｈ」（アセット記述子長）は、後続の記述子の全バイト長を示す。
「ａｓｓｅｔ＿ｄｅｓｃｒｉｐｔｏｒｓ＿ｂｙｔｅ」（アセット記述子領域）は、アセットの記述子を格納する領域とする。 "asset_clock_relation_flag" (clock information flag) indicates whether or not the asset has a clock information field.
"location_count" (number of locations) indicates the number of location information items for the asset.
"MMT_general_location_info" (location information) indicates the location information of the asset.
"asset_descriptors_length" (asset descriptor length) indicates the total byte length of the subsequent descriptors.
"asset_descriptors_byte" (asset descriptor area) is an area for storing asset descriptors.

＜記述子の種類と識別＞
放送で使用するＴＬＶ－ＳＩの記述子は、次の通りである。
「サービスリスト記述子（ＳｅｒｖｉｃｅＬｉｓｔＤｅｓｃｒｉｐｔｏｒ）」は、編成チャンネルとその種別の一覧の記述である。
「衛星分配システム記述子（ＳａｔｅＬＬｉｔｅＤｅＬｉｖｅｒｙＳｙｓｔｅｍＤｅｓｃｒｉｐｔｏｒ）」は、衛星伝送路の物理的条件の記述である。
「システム管理記述子（ＳｙｓｔｅｍＭａｎａｇｅｍｅｎｔＤｅｓｃｒｉｐｔｏｒ）」は、放送／非放送などの識別である。
「ネットワーク名記述子（ＮｅｔｗｏｒｋＮａｍｅＤｅｓｃｒｉｐｔｏｒ）」は、ネットワーク名の記述である。 <Descriptor types and identification>
The TLV-SI descriptors used in broadcasting are as follows:
The "Service List Descriptor" is a description of a list of programming channels and their types.
"Satellite Delivery System Descriptor" is a description of the physical conditions of a satellite transmission path.
The "System Management Descriptor" is an identification such as broadcast/non-broadcast.
"Network Name Descriptor" is a description of the network name.

放送で使用するＭＭＴ－ＳＩの記述子は、次の通りである。
「リモートコントロールキー記述子」は、受信機用リモコン（リモートコントローラー）のワンタッチキーに割り当てるサービスをユニークに提供する。
「アセットグループ記述子」は、アセットのグループ関係とグループ内での優先度を提供する。
「ＭＰＵタイムスタンプ記述子」は、ＭＰＵの提示時刻を提供する。
「アクセス制御記述子」は、限定受信方式を識別する。
「スクランブル方式記述子」は、スクランブルサブシステムを識別する。
「緊急情報記述子（ＭＨ）」は、緊急警報信号としての必要な情報及び機能の記述を提供する。
「ＭＨ－イベントグループ記述子」は、複数イベントのグループ化情報を記述する。
「ＭＨ－サービスリスト記述子」は、編成チャンネルとその種別の一覧を記述する。
「ＭＨ－短形式イベント記述子」は、番組名と番組の簡単な説明を記述する。
「ＭＨ－拡張形式イベント記述子」は、番組に関する詳細情報を記述する。 The MMT-SI descriptors used in broadcasting are as follows:
The "remote control key descriptor" uniquely provides a service for assigning to one-touch keys on the receiver's remote control (remote controller).
The "asset group descriptor" provides the group relationship of assets and their priority within the group.
The "MPU timestamp descriptor" provides the presentation time of the MPU.
The "access control descriptor" identifies the conditional access method.
The "Scrambling Scheme Descriptor" identifies the scrambling subsystem.
The "Emergency Information Descriptor (MH)" provides a description of the necessary information and functions as an emergency alert signal.
The "MH-Event Group Descriptor" describes grouping information for multiple events.
The "MH-service list descriptor" describes a list of organized channels and their types.
The "MH-short event descriptor" describes the program name and a brief description of the program.
The "MH-extended event descriptor" describes detailed information about a program.

「映像コンポーネント記述子」は、番組要素信号のうち映像信号に関するパラメータ、説明などを記述する。
「ＭＨ－ストリーム識別記述子」は、個々の番組要素信号の識別に用いる。
「ＭＨ－コンテント記述子」は、番組ジャンルを記述する。
「ＭＨ－パレンタルレート記述子」は、視聴許可年齢制限を記述する。
「ＭＨ－音声コンポーネント記述子」は、番組要素のうち音声信号に関するパラメータを記述する。
「ＭＨ－対象地域記述子」は、対象とする地域を記述する。
「ＭＨ－シリーズ記述子」は、複数イベントにまたがるシリーズ情報を記述する。
「ＭＨ－ＳＩ伝送パラメータ記述子」は、ＳＩ伝送のパラメータ（周期グループや再送周期等）を記述する。
「ＭＨ－ブロードキャスタ名記述子」は、ブロードキャスタ名を記述する。
「ＭＨ－サービス記述子」は、編成チャンネル名とその事業者名を記述する。 The "video component descriptor" describes parameters, explanations, etc. relating to the video signal among the program element signals.
The "MH-Stream Identification Descriptor" is used to identify individual program element signals.
The "MH-Content Descriptor" describes the program genre.
The "MH-Parental Rate Descriptor" describes the age limit for viewing.
The "MH-audio component descriptor" describes parameters relating to audio signals among program elements.
The "MH-target area descriptor" describes the target area.
The "MH-series descriptor" describes series information spanning multiple events.
The "MH-SI transmission parameter descriptor" describes the parameters of SI transmission (such as periodicity group and retransmission period).
The "MH-broadcaster name descriptor" describes the broadcaster name.
The "MH-service descriptor" describes the name of the channel and the name of its operator.

「ＭＨ－データ符号化方式記述子」は、データ符号化方式を識別するために使用する。
「ＵＴＣ－ＮＰＴ参照記述子」は、ＮＰＴとＵＴＣの関係を伝達する。
「イベントメッセージ記述子」は、イベントメッセージ一般に関する情報を伝達する。
「ＭＨ－ローカル時間オフセット記述子」は、サマータイム制度実行時の、実際の時刻と人間系への表示時刻との差分時間を記述する。
「ＭＨ－ロゴ伝送記述子」は、簡易ロゴ用文字列、ＣＤＴ形式のロゴへのポインティングなどを記述する。
「ＭＰＵ拡張タイムスタンプ記述子」は、ＭＰＵ内のアクセスユニットの復号時刻等を提供する。
「ＭＰＵダウンロードコンテンツ記述子」は、ＭＰＵを用いてダウンロードされるコンテンツの属性情報を記述する。
「ＭＨ－アプリケーション記述子」は、アプリケーションの情報を記述する。
「ＭＨ－伝送プロトコル記述子」は、伝送プロトコルの指定と伝送プロトコルに依存したアプリケーションのロケーション情報を記述する。
「ＭＨ－簡易アプリケーションロケーション記述子」は、アプリケーションの取得先の詳細を記述する。 The "MH-Data Encoding Descriptor" is used to identify the data encoding method.
The "UTC-NPT reference descriptor" conveys the relationship between the NPT and the UTC.
The "Event Message Descriptor" conveys information about the event message in general.
The "MH-local time offset descriptor" describes the difference in time between the actual time and the time displayed to the human system when daylight saving time is in effect.
The "MH-logo transmission descriptor" describes a simple logo character string, pointing to a CDT format logo, and the like.
The "MPU extended timestamp descriptor" provides the decoding time of the access unit within the MPU, etc.
The "MPU download content descriptor" describes attribute information of content downloaded using the MPU.
The "MH-Application Descriptor" describes information about an application.
The "MH-transmission protocol descriptor" specifies the transmission protocol and describes the location information of the application depending on the transmission protocol.
The "MH-Simple Application Location Descriptor" describes in detail where the application is obtained.

「ＭＨ－アプリケーション境界権限設定記述子」は、アプリケーションバウンダリの設定、領域(ＵＲＬ)毎の放送リソースアクセス権限の設定を記述する。
「リンク先ＰＵ記述子」は、リンク先プレゼンテーションユニットの情報を記述する。
「アプリケーションサービス記述子」は、サービスに関連するアプリケーションのエントリー情報等を記述する。
「ＭＰＵノード記述子」は、当該ＭＰＵがデータディレクトリ管理テーブルにて規定されるディレクトリノードに対応することを示す。
「ＰＵ構成記述子」は、プレゼンテーションユニットを構成するＭＰＵのリストを示す。
「ＭＨ－階層符号化記述子」は、階層符号化された映像ストリームコンポーネントを識別するための情報を記述する。 The "MH-application boundary authority setting descriptor" describes the application boundary setting and the broadcast resource access authority setting for each area (URL).
The "link destination PU descriptor" describes information about the link destination presentation unit.
The "application service descriptor" describes entry information of the application related to the service.
The "MPU node descriptor" indicates that the MPU corresponds to a directory node defined in the data directory management table.
The "PU configuration descriptor" indicates a list of MPUs that make up a presentation unit.
The "MH-Hierarchical Coding Descriptor" describes information for identifying a hierarchically coded video stream component.

「コンテンツコピー制御記述子」は、当該サービス全体に対して、デジタルコピーに関する制御情報を示す場合か、あるいは最大伝送レートを記述する場合に配置する。
「コンテンツ利用制御記述子」は、当該番組に対して、蓄積や出力に関する制御情報を記述する場合に配置する。また当該番組またはアセットに対して、「個数制限コピー可」を運用するかどうかの指定を行う場合に配置する。
「関連ブロードキャスタ記述子」は、ＮＶＲＡＭへのアクセスに必要なＢＳ／広帯域ＣＳデジタル放送のブロードキャスタおよび地上デジタル放送の系列の識別値を示す。
「マルチメディアサービス情報記述子」は、データコンテンツの有無や字幕の有無などマルチメディアサービスの個々のコンテンツに関する詳細情報を記述する。
「緊急ニュース記述子」は、安心安全に関わる緊急ニュース速報（緊急地震速報、臨時ニュース、速報スーパー）が放送中であることを示す。
「ＭＨ－ＣＡ契約情報記述子」は、サービス又はイベントが予約可能であることを確認する情報を記述する。
「ＭＨ－ＣＡサービス記述子」は、自動表示メッセージを運用する事業体の編成チャンネルを示し、当該メッセージの表示制御情報を記述する。 The "content copy control descriptor" is placed when indicating control information related to digital copying for the entire service or when describing the maximum transmission rate.
The "content usage control descriptor" is used to describe control information related to storage and output for the program. It is also used to specify whether or not "number-limited copying" is to be applied to the program or asset.
The "associated broadcaster descriptor" indicates the identification value of the broadcaster of the BS/broadband CS digital broadcasting and the affiliate of the terrestrial digital broadcasting required for accessing the NVRAM.
The "multimedia service information descriptor" describes detailed information about each content of a multimedia service, such as whether or not there is data content or subtitles.
The "emergency news descriptor" indicates that an emergency news flash (emergency earthquake alert, special news, super flash news) related to safety and security is currently being broadcast.
The "MH-CA contract information descriptor" describes information that confirms that a service or event can be reserved.
The "MH-CA service descriptor" indicates the organization channel of the business entity that operates the automatic display message, and describes the display control information of the message.

＜ＭＨ－音声コンポーネント記述子の配置＞
「ＭＨ－音声コンポーネント記述子」は、次のテーブルに配置される。
・ＭＰＴ（アセット記述子領域）
・ＭＨ－ＥＩＴ[ｐ/ｆａｃｔｕａｌ] （ＭＨ－ＥＩＴ[ｐ/ｆ]）
・ＭＨ－ＥＩＴ[ｓｃｈｅｄｕｌｅａｃｔｕａｌｂａｓｉｃ]（ＭＨ－ＥＩＴ[ｓｃｈｅｄｕｌｅｂａｓｉｃ]） <Placement of the MH-audio component descriptor>
The "MH-audio component descriptor" is placed in the following tables.
・MPT (asset descriptor area)
・MH-EIT[p/f actual] (MH-EIT[p/f])
・MH-EIT[schedule actual basic] (MH-EIT[schedule basic])

「ＭＰＴ」は、「ＰＡメッセージ」に格納される。
「ＭＨ－ＥＩＴ[ｐ/ｆ]」は、現在と次のイベントに関する時系列情報であり、前者をｐｒｅｓｅｎｔ、後者をｆｏｌｌｏｗｉｎｇという。
「ＭＨ－ＥＩＴ[ｐ/ｆａｃｔｕａｌ]」及び「ＭＨ－ＥＩＴ[ｓｃｈｅｄｕｌｅａｃｔｕａｌｂａｓｉｃ]」は、自ＴＬＶストリームで運用しているサービスに含まれるイベントに関して記述したテーブルであり、「Ｍ２セクションメッセージ」に格納される。 The "MPT" is stored in the "PA message."
"MH-EIT[p/f]" is time-series information about the current and next events, the former being called "present" and the latter being called "following."
"MH-EIT[p/f actual]" and "MH-EIT[schedule actual basic]" are tables that describe events included in the service operated by the own TLV stream, and are stored in the "M2 section message."

なお、「ＭＨ－ＡＩＴ」は、アプリケーションのライフサイクル、制約等を指示する制御情報を示すテーブルでもある。「ＭＭＴ」は、複数の伝送路での一体的な伝送を可能とする多重化方式でもある。「ＭＰ４ＡＡＣ」は、ＩＳＯ/ＩＥＣ１４４９６－３により規定される音声符号化方式である。「ＭＰ４ＡＬＳ」（ＡＬＳ：ＡｕｄｉｏＬｏｓｓｌｅｓｓＣｏｄｉｎｇ）は、ＩＳＯ/ＩＥＣ１４４９６－３により規定される音声ロスレス符号化方式である。「ＭＰＴ」は、ＭＭＴパッケージテーブルの略である。「ＭＰＴ」は、アセットのリストやその位置等サービス（パッケージ）を構成する情報を与えるテーブルである。特定の情報を示す要素や属性をもつ。「テーブル」は、メッセージに格納され、ＭＭＴＰパケットにて伝送される。「テーブル」は、テーブルを格納するメッセージはテーブルに応じて決まっている。「パッケージ」とは、ＭＭＴ規格では、コンテンツの単位のことを表す。「メッセージ」は、テーブルや記述子を格納する。メッセージは、ＭＭＴＰペイロードに格納され、ＭＭＴＰパケットを用いて伝送される。 The "MH-AIT" is also a table that indicates control information that indicates the application lifecycle, constraints, etc. "MMT" is a multiplexing method that enables integrated transmission over multiple transmission paths. "MP4 AAC " is an audio encoding method defined by ISO/IEC 14496-3. "MP4 ALS" (ALS: Audio Lossless Coding) is an audio lossless encoding method defined by ISO/IEC 14496-3. "MPT" is an abbreviation for MMT Package Table. "MPT" is a table that provides information that constitutes a service (package), such as a list of assets and their locations. It has elements and attributes that indicate specific information. "Tables" are stored in messages and transmitted in MMTP packets. The messages that store "tables" are determined according to the table. "Package" refers to a unit of content in the MMT standard. "Messages" store tables and descriptors. The message is stored in an MMTP payload and transmitted using an MMTP packet.

「ＳＩ情報」は、多重化された情報の内容、識別情報などを記述した情報でもある。受信機２は、例えば「地上デジタル放送受信機」であり、ＩＦ信号の中から受信チャンネルの選局・復調、希望番組を選択・デコードしてベースバンド信号を出力する機能をもつ。ただし、受信機２は、「高度ＢＳデジタル放送受信機」であってもよく、この場合、これらの機能をもつことに加えて、１１.７ＧＨｚ～１２.７５ＧＨｚの周波数帯の高度ＢＳデジタル放送が受信可能な機器である。受信機２は、ＳＴＢ、ＩＲＤとの呼称もある。
「アイテム」は、ＭＭＴ伝送方式に基づくアプリケーションデータ伝送においてＭＰＵを構成する伝送の最小単位である。「アイテム」は、ファイルに相当する。「ＭＰＵ」は、１つのコンポーネント内に含まれる、アイテムの集合で構成される伝送単位である。「ＭＰＵ」は、提示単位（ＰＵ）或いは更新単位、蓄積制御単位に対応させる運用が想定される。 "SI information" is also information that describes the contents of the multiplexed information, identification information, etc. Receiver 2 is, for example, a "terrestrial digital broadcasting receiver" that has the functions of selecting and demodulating a receiving channel from the IF signal, selecting and decoding a desired program, and outputting a baseband signal. However, receiver 2 may also be an "advanced BS digital broadcasting receiver," in which case, in addition to having these functions, it is a device that can receive advanced BS digital broadcasting in the frequency band of 11.7 GHz to 12.75 GHz. Receiver 2 is also called an STB or IRD.
An "item" is the smallest unit of transmission that constitutes an MPU in application data transmission based on the MMT transmission method. An "item" corresponds to a file. An "MPU" is a transmission unit that is comprised of a collection of items contained within one component. It is expected that an "MPU" will be operated in correspondence with a presentation unit (PU), an update unit, or an accumulation control unit.

「コンポーネント」（アセット）は、１つのＩＰデータフローにおいて同一のパケットＩＤを持つ単位である。ＭＰＴにおいて、アセットとして参照される。「コンポーネント」は、後述するｃｏｍｐｏｎｅｎｔ＿ｔａｇで識別される。
データイベントにより伝送するアプリケーションセットが切り替わる。「アセット」は、ＭＭＴ方式により多重化された映像、音声などの伝送単位である。「アセットタイプ」は、各アセットにおいて伝送されている内容を示す種類である。「サイマル音声」は、同一イベント内において、異なる複数の音声モードで同時に伝送することである。「イベント」は、ニュース、ドラマなど、同一サービス（編成チャンネル）内で開始・終了時刻の決まったストリームの集合である。 A "component" (asset) is a unit that has the same packet ID in one IP data flow. In MPT, it is referred to as an asset. A "component" is identified by a component_tag, which will be described later.
The application set to be transmitted is switched depending on the data event. An "asset" is a transmission unit for video, audio, etc. multiplexed using the MMT method. An "asset type" indicates the type of content being transmitted in each asset. "Simultaneous audio" is the simultaneous transmission of multiple different audio modes within the same event. An "event" is a collection of streams with fixed start and end times within the same service (programming channel), such as news or drama.

［受信機２のハードウェア構成］
図８は、本実施形態に係る受信機２のハードウェア構成を示す概略図である。
受信機２は、チューナー２１１、復調器２１２、分離器２２、セレクタ２３１、音声デコーダー２３２、スピーカー２３４、映像デコーダー２４１、提示処理器２４２、ディスプレイ２４３、入出力装置２５１、補助記憶装置２５２、ＲＯＭ（ＲｅａｄＯｎＬｙＭｅｍｏｒｙ）２５３、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）２５４、ＣＰＵ（中央演算処理装置）２５５、及び、通信チップ２５６を含んで構成される。
復調器２１２、分離器２２、セレクタ２３１、音声デコーダー２３２、スピーカー２３４を、音声処理部Ｍとも呼ぶ。なお、データを処理する構成（例えば、分離器２２、セレクタ２３１、音声デコーダー２３２、映像デコーダー２４１、提示処理器２４２）については、ソフトウェア（ＣＰＵ２５５による演算処理）で実現されてもよい。
図２、図４、又は図５の受信機２の各構成に対応するハードウェア構成については、図８において、図２、図４、又は図５の構成に付した番号の数字部分と同じ番号を付す。 [Hardware configuration of receiver 2]
FIG. 8 is a schematic diagram showing the hardware configuration of the receiver 2 according to this embodiment.
The receiver 2 is configured to include a tuner 211, a demodulator 212, a separator 22, a selector 231, an audio decoder 232, a speaker 234, a video decoder 241, a presentation processor 242, a display 243, an input/output device 251, an auxiliary storage device 252, a ROM (Read Only Memory) 253, a RAM (Random Access Memory) 254, a CPU (Central Processing Unit) 255, and a communication chip 256.
The demodulator 212, the separator 22, the selector 231, the audio decoder 232, and the speaker 234 are also referred to as an audio processing unit M. Note that the configuration for processing data (for example, the separator 22, the selector 231, the audio decoder 232, the video decoder 241, and the presentation processor 242) may be realized by software (arithmetic processing by the CPU 255).
In FIG. 8, the hardware components corresponding to the components of the receiver 2 in FIG. 2, FIG. 4, or FIG. 5 are assigned the same numbers as the numeric part of the reference numerals used in FIG. 2, FIG. 4, or FIG. 5.

アンテナで受信されたデジタル放送信号は入力端子経由で受信機２に入力され、チューナー２１１、復調器２１２によりＴＬＶストリームに変換され、分離器２２によるＴＬＶ／ＭＭＴ分離処理を経て映像、音声、その他のアセット、及びＭＭＴの各種メッセージ・テーブルに分離される。スクランブルされたアセットは、ＴＬＶ／ＭＭＴ分離処理で抽出したＥＭＭ／ＥＣＭをＣＡＳモジュール（不図示）で処理し、得られた鍵によってデスクランブラにて復号される。映像アセットは、映像デコーダー２４１による映像デコード処理が行われ、文字及びグラフィックス画像の提示処理を経て出力される。音声アセットは、音声デコーダー２３２による音声デコード処理の後、出力される。映像・音声の出力については、受信機本体に映像・音声出力手段（ディスプレイ２４３・スピーカー２３４）を備えてもよいし、デコードされた映像音声信号を外部装置に出力するデジタル映像音声出力や、音声のみを外部装置に出力するデジタル音声出力を備えてもよい。さらに高速デジタルインタフェースを備えてもよい。
また、受信機内部に補助記憶装置２５２（ＨＤＤ等）等の蓄積手段を備え、放送の蓄積機能を備えてもよい。受信機２は、ＥＰＧに代表される受信機アプリやマルチメディアサービスで使用されるＲＡＭ２５４、サービスのロゴデータやＥＰＧデータを保存する補助記憶装置２５２（不揮発性メモリ：ＮＶＲＡＭ等）、フォントなどを保存するＲＯＭ２５３（ＮＶＲＡＭで代用も可）のメモリを持つ。 A digital broadcast signal received by an antenna is input to the receiver 2 via an input terminal, converted into a TLV stream by a tuner 211 and a demodulator 212, and separated into video, audio, other assets, and various MMT message tables through a TLV/MMT separation process by a separator 22. The scrambled assets are decoded by a descrambler using the key obtained by processing the EMM/ECM extracted by the TLV/MMT separation process in a CAS module (not shown). The video assets are decoded by a video decoder 241 and output after processing to present text and graphics images. The audio assets are decoded by an audio decoder 232 and then output. Regarding video and audio output, the receiver itself may be equipped with video and audio output means (display 243 and speaker 234), or may be equipped with a digital video and audio output that outputs decoded video and audio signals to an external device, or a digital audio output that outputs only audio to an external device. A high-speed digital interface may also be provided.
The receiver 2 may also be provided with a storage means such as an auxiliary storage device 252 (HDD, etc.) inside it, providing a broadcast storage function. The receiver 2 has the following memories: RAM 254 used by receiver applications such as EPG and multimedia services, auxiliary storage device 252 (non-volatile memory: NVRAM, etc.) for storing service logo data and EPG data, and ROM 253 (NVRAM can be used as a substitute) for storing fonts, etc.

分離器２２は、アセット情報を記述するテーブルにおいて、ＭＰＥＧ－Ｈオーディオ信号が存在するかどうかを記述子で判断する。ＭＰＥＧ－Ｈオーディオ信号は、ＭＰＥＧ－Ｈの音声のアセット（「ＭＰＥＧ－Ｈ音声アセット」とも称する）を含む。
分離器２２は、ＭＰＥＧ－Ｈオーディオ信号が存在すると判断した場合、受信機２がＭＰＥＧ－Ｈの音声デコード能力があるときには、ＭＰＥＧ－Ｈオーディオ信号のデータからＭＰＥＧ－Ｈ音声アセット、及び、１又は複数のＭＰＥＧ－４の音声のアセット（ＭＰＥＧ－４アセット）の各々を分離する。分離器２２は、ＭＰＥＧ－Ｈオーディオ信号が存在しないと判断した場合、ＭＰＥＧ－４オーディオ信号のデータから、１又は複数のＭＰＥＧ－４音声アセットの各々を分離する。 The separator 22 determines whether an MPEG-H audio signal is present based on a descriptor in a table describing asset information. The MPEG-H audio signal includes an MPEG-H audio asset (also referred to as an "MPEG-H audio asset").
If the separator 22 determines that an MPEG-H audio signal is present, and if the receiver 2 has MPEG-H audio decoding capability, it separates each of the MPEG-H audio assets and one or more MPEG-4 audio assets (MPEG-4 assets) from the data of the MPEG-H audio signal. If the separator 22 determines that an MPEG-H audio signal is not present, it separates each of the one or more MPEG-4 audio assets from the data of the MPEG-4 audio signal.

＜入力端子・チューナー・復調器＞
受信機２には、デジタル放送信号を入力するための端子として、ＩＦ入力と光入力の２種類を有する。ただし、受信機はこのうちＩＦ入力を有し、光入力は有しなくてもよい。
チューナー２１１は、右旋帯域用ＩＦ周波数か左旋帯域用ＩＦ周波数、あるいはその両方に対応する。
復調器２１２は、フロントエンド信号処理を行う。 <Input terminal, tuner, demodulator>
The receiver 2 has two types of terminals for inputting digital broadcast signals: an IF input and an optical input. However, the receiver only needs to have the IF input; the optical input is optional.
The tuner 211 supports the right-hand circular polarization band IF frequency, the left-hand circular polarization band IF frequency, or both.
Demodulator 212 performs front-end signal processing.

＜分離器・映像デコーダー＞
分離器２２によるＴＬＶ／ＭＭＴ分離処理は、ＴＬＶ分離、ＭＭＴ分離の２つの処理で構成される。放送伝送における受信機２は、最小でも１サービスあたり１２本のアセットを同時処理する能力を有する。受信機２は、１サービスあたりのアセット数は最大２２とされてもよい。映像アセットは、画面分割符号化が行われてもよい。また、受信機２は、本体に映像復号処理を内蔵せず、高速デジタルインタフェースからストリーム配信する機能等を搭載してもよい。受信機２は、ＳＤＲ（ＳｔａｎｄａｒｄＤｙｎａｍｉｃＲａｎｇｅ）対応ディスプレイへＨＤＲ（ＨｉｇｈＤｙｎａｍｉｃＲａｎｇｅ）映像を出力してもよい。映像伝達特性による映像切替受信機２は、映像コンポーネント記述子のｖＩＤｅｏ_ｔｒａｎｓｆｅｒ_ｃｈａｒａｃｔｅｒｉｓｔｉｃｓ値を監視し、受信した映像信号の伝達特性を識別する。 <Separator/Video Decoder>
The TLV/MMT separation process by the separator 22 consists of two processes: TLV separation and MMT separation. In broadcast transmission, the receiver 2 has the ability to simultaneously process at least 12 assets per service. The number of assets per service may be limited to a maximum of 22. Video assets may be split-screen encoded. Furthermore, the receiver 2 need not have built-in video decoding and may instead be equipped with a function for streaming via a high-speed digital interface. The receiver 2 may output HDR (High Dynamic Range) video to an SDR (Standard Dynamic Range) compatible display. For video switching based on video transfer characteristics, the receiver 2 monitors the video_transfer_characteristics value of the video component descriptor and identifies the transfer characteristics of the received video signal.

＜音声デコーダー＞
外部擬似サラウンドプロセッサ用ダウンミックス処理及びステレオ音場拡大用ダウンミックス処理をオプションとして追加している受信機２においては、提示処理部２４２は、ダウンミックス設定状態をディスプレイ２４３に表示する。これにより、受信者は、設定状態を把握できる。
受信機２は、ＭＰＥＧ－４ＡＡＣ（ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇ）音声ストリームのデジタル音声出力を装備する場合は、ＡＡＣ拡張に準拠し、放送の形式であるＬＡＴＭ/ＬＯＡＳ（Ｌｏｗ－ｏｖｅｒｈｅａｄＭＰＥＧ－４ＡｕｄｉｏＴｒａｎｓｐｏｒｔＭｕｌｔｉｐｌｅｘ／ＬｏｗＯｖｅｒｈｅａｄＡｕｄｉｏＳｔｒｅａｍ）によって多重化された形式で出力する。受信機２は、ＭＰＥＧ－４ＡＬＳ音声ストリームのデジタル音声出力を装備する場合は、ＡＬＳ拡張に準拠し、放送の形式であるＬＡＴＭ/ＬＯＡＳによって多重化された形式で出力する。 <Audio decoder>
In a receiver 2 that has downmix processing for an external pseudo surround processor and downmix processing for stereo sound field expansion added as options, the presentation processing unit 242 displays the downmix setting status on the display 243. This allows the viewer to check the setting status.
If the receiver 2 is equipped with a digital audio output for an MPEG-4 AAC (Advanced Audio Coding) audio stream, it conforms to the AAC extension and outputs the stream multiplexed in LATM/LOAS (Low-overhead MPEG-4 Audio Transport Multiplex/Low Overhead Audio Stream), which is the broadcast format. If the receiver 2 is equipped with a digital audio output for an MPEG-4 ALS audio stream, it conforms to the ALS extension and outputs the stream multiplexed in LATM/LOAS, which is the broadcast format.

＜出力端子＞
受信機２が備える出力端子として、デジタル映像音声出力端子、デジタル音声出力端子について以下に記載する。ただし、受信機２は、これらの出力端子の代わりに高速デジタルインタフェースを搭載してもよい。
なお、表示装置を本体に内蔵する受信機２の場合は、デジタル映像音声出力端子の装備をしなくてもよい。受信機２は、ＳＴＢなど表示装置（ディスプレイ２４３）を搭載しない受信機２の場合は、デジタル映像音声出力端子として、ＨＤＭＩ（登録商標、以下同じ）端子、ＭＨＬ／ｓｕｐｅｒＭＨＬ出力用の端子、又は、無線によるデジタル映像音声出力機能の端子のいずれかを装備する。 <Output terminal>
The following describes a digital video and audio output terminal and a digital audio output terminal as output terminals provided on the receiver 2. However, the receiver 2 may be equipped with a high-speed digital interface instead of these output terminals.
Note that the receiver 2 having a built-in display device does not need to be equipped with a digital video and audio output terminal. In the case of a receiver 2 that does not have a display device (display 243) such as an STB, the receiver 2 is equipped with an HDMI (registered trademark, the same applies hereinafter) terminal, a terminal for MHL/super MHL output, or a terminal for wireless digital video and audio output function as the digital video and audio output terminal.

受信機はデジタル音声出力端子として、光デジタル音声出力端子あるいは同軸デジタル音声出力端子を備えてもよい。またＨＤＭＩ端子を搭載し、ＨＤＭＩ１.４で定義されたＨＤＭＩオーディオリターンチャンネル（ＨＤＭＩ－ＡＲＣ）によるデジタル音声出力機能を設けてもよい。
デジタル音声出力端子にＭＰＥＧ－４ＡＡＣ音声ストリームを出力する場合には、ＡＡＣ拡張に準拠するが、２２.２ｃｈのマルチチャンネル音声の出力については、ＴＢＤとしてもよい。デジタル音声出力端子にＭＰＥＧ－４ＡＬＳ音声ストリームを出力する場合には、ＡＬＳ拡張に準拠するが、ＭＰＥＧ－４ＡＬＳストリームの出力については、ＴＢＤとしてもよい。 The receiver may be equipped with an optical digital audio output terminal or a coaxial digital audio output terminal as a digital audio output terminal. It may also be equipped with an HDMI terminal and have a digital audio output function using the HDMI Audio Return Channel (HDMI-ARC) defined in HDMI 1.4.
When an MPEG-4 AAC audio stream is output to the digital audio output terminal, it must conform to the AAC extension, but the output of 22.2ch multi-channel audio may be TBD. When an MPEG-4 ALS audio stream is output to the digital audio output terminal, it must conform to the ALS extension, but the output of the MPEG-4 ALS stream may be TBD.

ＮＶＲＡＭは、受信機ソフトウェアや全受信機共通データのダウンロード用のメモリ、ロゴデータなどＭＨ－ＣＤＴ方式で送信されるデータのダウンロード用メモリとして用いられる。ＮＶＲＡＭには、データの種類、全受信機共通データ（ジャンルコード表、番組特性コード表、予約語表）、ロゴデータ、マルチメディアサービス、メール受信等が保存され、例えば、ＭＰＥＧ－Ｈオーディオのデジタルミキサーが保存される。 NVRAM is used as memory for downloading receiver software and data common to all receivers, as well as downloading data transmitted using the MH-CDT system, such as logo data. NVRAM stores data types, data common to all receivers (genre code table, program characteristic code table, reserved word table), logo data, multimedia services, received emails, and, for example, MPEG-H audio digital mixers.

［音声アセットの選択と切替処理］
図９は、本実施形態に係る音声モードの一覧の一例を表す図である。
放送においては、音声モードとして図９のモードが運用される。このうち、受信機２単体で復号できる必要があるものはＭＰＥＧ－４ＡＡＣ１ｃｈ（高度広帯域ＣＳデジタル放送のみで運用）、ＡＡＣ２ｃｈ、ＡＡＣ５.１ｃｈ（２ｃｈダウンミックス処理可）の３つである。他のモードの音声アセットは受信機２内部で復号せず外部アンプにストリーム形式で出力し、外部アンプで復号してもよい。一方、ＡＡＣ１ｃｈ、ＡＡＣ２ｃｈ、ＡＡＣ５.１ｃｈ以外の音声モードが主音声として使用されている番組を、外部アンプが接続されていない受信機２で受信する場合を想定し、この３モード以外の音声を放送する場合は、すべての受信機２で復号可能なサイマル音声が異なる音声アセットで同時に放送される。 [Audio asset selection and switching process]
FIG. 9 is a diagram showing an example of a list of audio modes according to this embodiment.
In broadcasting, the audio modes shown in FIG. 9 are used. Of these, the three that the receiver 2 must be able to decode on its own are MPEG-4 AAC 1ch (used only in advanced wideband CS digital broadcasting), AAC 2ch, and AAC 5.1ch (2ch downmix processing possible). Audio assets of other modes need not be decoded within the receiver 2; they may instead be output in stream format to an external amplifier and decoded there. On the other hand, a program whose main audio uses a mode other than AAC 1ch, AAC 2ch, or AAC 5.1ch may be received by a receiver 2 that is not connected to an external amplifier. For this reason, when audio in a mode other than these three is broadcast, simulcast audio that all receivers 2 can decode is broadcast simultaneously as a different audio asset.

例えば、音声コーデックが「ＭＰＥＧ－Ｈ」の場合には、ＭＰＥＧ－４ＡＡＣ１ｃｈ、２ｃｈ、５．１ｃｈのいずれかがサイマル音声の組合せとして、異なる音声アセットで同時に放送される。なお、音声モードには、複数のＭＰＥＧ－Ｈ音声アセットが含まれてもよく、この場合、それぞれ、異なる音声アセットには異なるレベルのＭＰＥＧ－Ｈ音声アセットが含まれてもよい。例えばレベル（Ｌｅｖｅ３）が高い音声モードが放送される場合、それよりも低い特定のレベル（例えば、Ｌｅｖｅｌ１、２：チャンネル数が少ないレベル）の音声モードのサイマル音声が、異なる音声アセットで同時に放送されてもよい。
これにより、特定レベル（例えば、レベル３）のＭＰＥＧ－Ｈの放送を行うとき、そのレベルより低いレベル（例えば、レベル２）にしか対応していない受信機２でも、その低いレベルの音声アセットを選択することで、ＭＰＥＧ－Ｈ音声を再生できる。 For example, if the audio codec is "MPEG-H," any of MPEG-4 AAC 1ch, 2ch, and 5.1ch will be broadcast simultaneously using different audio assets as a simulcast audio combination. An audio mode may include multiple MPEG-H audio assets, and in this case, different audio assets may include MPEG-H audio assets of different levels. For example, when an audio mode with a high level (Level 3) is broadcast, simulcast audio of an audio mode with a lower specific level (e.g., Levels 1 and 2: levels with fewer channels) may be broadcast simultaneously using different audio assets.
This means that when MPEG-H of a particular level (e.g., level 3) is broadcast, even a receiver 2 that only supports a lower level (e.g., level 2) can play MPEG-H audio by selecting the audio asset of that lower level.
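The level-based fallback described above can be sketched as follows. This is a minimal illustration, assuming that levels are plain integers and that the receiver prefers the highest level it supports; the function name `pick_mpeg_h_level` and its parameters are hypothetical and do not appear in the original text.

```python
from typing import Iterable, Optional

def pick_mpeg_h_level(broadcast_levels: Iterable[int],
                      max_supported_level: int) -> Optional[int]:
    """Pick the MPEG-H level to play from the simulcast MPEG-H assets.

    Assumption: the receiver prefers the highest level it can decode.
    Returns None when no broadcast level is decodable, in which case the
    receiver would fall back to an MPEG-4 simulcast asset.
    """
    candidates = [lv for lv in broadcast_levels if lv <= max_supported_level]
    return max(candidates, default=None)
```

For example, a receiver that supports up to Level 2 and receives Level 1, 2, and 3 assets would pick Level 2 under this assumption.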

外部アンプが接続されていない受信機２は、このような複数の音声アセットの中から出音するアセットを１つ選択し、音声出力（スピーカー２３４など）に出力する。出力するアセットは、受信者が任意に選択可能とする、もしくは受信機２が対応可能な音声モードのアセットを自動的に選択する。自動選択の場合は、対応可能な音声モードのアセットの中で、コンポーネントタグ値の小さいアセットを優先することを基本とするが、言語や音声種別（解説音声など）の選択状況によって判断を変えてもよい。
受信機２は、ＭＰＥＧ－Ｈのアセットのコンポーネントタグ値を最小にすることで、ＭＰＥＧ－Ｈのアセットを自動選択してもよい。逆に、受信機２は、ＭＰＥＧ－４のアセットのコンポーネントタグ値を最小にすることで、ＭＰＥＧ－４のアセットを自動選択してもよい。ＭＰＥＧ－Ｈのアセットが存在する場合に、ＭＰＥＧ－４のアセットを選択した場合、受信機２は、ＭＰＥＧ－Ｈのアセットが存在する旨のメッセージを表示してもよい。 A receiver 2 that is not connected to an external amplifier selects one asset to output sound from among these multiple audio assets and outputs it to an audio output (such as a speaker 234). The asset to be output can be selected arbitrarily by the receiver, or the receiver 2 automatically selects an asset of a compatible audio mode. In the case of automatic selection, priority is basically given to an asset with a smaller component tag value among assets of a compatible audio mode, but the decision may be changed depending on the selection status of language or audio type (such as commentary audio).
The receiver 2 may automatically select an MPEG-H asset by minimizing the component tag value of the MPEG-H asset. Conversely, the receiver 2 may automatically select an MPEG-4 asset by minimizing the component tag value of the MPEG-4 asset. If an MPEG-H asset is present and an MPEG-4 asset is selected, the receiver 2 may display a message indicating that an MPEG-H asset is present.

＜音声処理部Ｍにおける信号処理の流れ＞
図１０は、本実施形態に係る受信機内の信号処理の流れの一例を表す概略図である。
この図は、音声処理部Ｍの一例である。音声処理部Ｍは、復調部２１２、ＴＬＶ／ＭＭＴ分離部２２、音声アセット選択部２３１、デコーダー部２３２、ミキサー部２３３１、ダウンミキサー（ＤＭＩＸ）部２３３２、スイッチ（ＳＷ）部２３３３、ＤＡＣ（Ｄｉｇｉｔａｌ－ＡｎａｌｏｇＣｏｎｖｅｒｔｅｒ）部２３３４、外部出力Ｉ／Ｆ（インターフェース）部２５１を含んで構成される。
図２、図４、又は図５の受信機２の各構成に対応する構成については、図１０において、図２、図４、又は図５の構成に付した番号の数字部分と同じ番号を付す。なお、ミキサー部２３３１、ダウンミキサー部２３３２、スイッチ部２３３３、及びＤＡＣ部２３３４は、図２のミキサー２３３に対応する。 <Signal processing flow in the audio processing unit M>
FIG. 10 is a schematic diagram showing an example of the flow of signal processing in the receiver according to this embodiment.
This diagram shows an example of the audio processing unit M. The audio processing unit M includes a demodulation unit 212, a TLV/MMT separation unit 22, an audio asset selection unit 231, a decoder unit 232, a mixer unit 2331, a downmixer (DMIX) unit 2332, a switch (SW) unit 2333, a DAC (Digital-Analog Converter) unit 2334, and an external output I/F (interface) unit 251.
In FIG. 10, the components corresponding to the components of the receiver 2 in FIG. 2, FIG. 4, or FIG. 5 are denoted by the same numbers as the numeric part of the reference signs used in FIG. 2, FIG. 4, or FIG. 5. The mixer unit 2331, downmixer unit 2332, switch unit 2333, and DAC unit 2334 correspond to the mixer 233 in FIG. 2.

この図は、受信機２内における音声の信号処理の流れを示す。受信機２は、ＴＬＶ／ＭＭＴ分離処理部２２を経て、複数の音声アセットを取り出す。音声アセット選択部２３１は、この中から出音する音声アセットを選択し、デコーダー部２３２で復号し、出音する。ここで、デコーダー部２３２には、音声アセット選択部２３１で選択された音声アセットが入力され、その音声モード（音声コーデック）に応じた復号化が行われる。 This diagram shows the flow of audio signal processing within the receiver 2. The receiver 2 extracts multiple audio assets via the TLV/MMT separation processing unit 22. The audio asset selection unit 231 selects the audio asset to be output from these, and the decoder unit 232 decodes and outputs the audio. Here, the audio asset selected by the audio asset selection unit 231 is input to the decoder unit 232, which decodes it according to its audio mode (audio codec).

なお、スイッチ部２３３３は、複数の音声アセットの中から外部のＡＶアンプに適応するアセットを選択し、デコーダー部２３２と外部出力Ｉ／Ｆ部２５１に出力する。
デコーダー部２３２は、入力された音声アセットに応じて、音声アセットを復号化する。復号化されたデータ列がＭＰＥＧ－Ｈのデータ列の場合、つまり、音声アセットががＭＰＥＧ－Ｈ音声アセットの場合、デコーダー部２３２は、そのデータ列を、データ列をミキサー部２３３１に出力する。復号化されたデータ列が５．１ｃｈのＰＣＭデータ列である場合、デコーダー部２３２は、そのデータ列を、ダウンミキサー部２３３１へ出力する。復号化されたデータ列が２ｃｈのＰＣＭデータ列である場合、デコーダー部２３２は、そのデータ列を、スイッチ部２３３３へ出力する。
ミキサー部２３３１は、入力されたデータ列に対して、ＭＰＥＧ－Ｈ音声アセット内の音の素材ごとの音声を合成し、ダウンミックス処理を行う。ダウンミキサー部２３３１は、入力されたデータ列を、２ｃｈのＰＣＭデータに変換するダウンミックス処理を行う。ダウンミックス処理が行われたデータ列は、スイッチ部２３３３へ出力される。
スイッチ部２３３３は、音声アセット選択部２３１からの制御情報に基づく指示に応じて、ＤＡＣ部２３３４又は外部出力Ｉ／Ｆ（インターフェース）部２５１へ、データ列を出力する。ＤＡＣ部２３３４は、入力されたデータ列をアナログ音声信号に変換し、スピーカー２３４へ出力する。 The switch unit 2333 selects an asset suitable for an external AV amplifier from among a plurality of audio assets, and outputs the asset to the decoder unit 232 and the external output I/F unit 251 .
The decoder unit 232 decodes the input audio asset according to its type. If the decoded data sequence is an MPEG-H data sequence, that is, if the audio asset is an MPEG-H audio asset, the decoder unit 232 outputs the data sequence to the mixer unit 2331. If the decoded data sequence is a 5.1ch PCM data sequence, the decoder unit 232 outputs the data sequence to the downmixer unit 2332. If the decoded data sequence is a 2ch PCM data sequence, the decoder unit 232 outputs the data sequence to the switch unit 2333.
The mixer unit 2331 synthesizes the audio of each sound element in the MPEG-H audio asset from the input data sequence and performs downmix processing. The downmixer unit 2332 performs downmix processing that converts the input data sequence into 2ch PCM data. The downmixed data sequence is output to the switch unit 2333.
The switch unit 2333 outputs a data string to the DAC unit 2334 or the external output I/F (interface) unit 251 in accordance with an instruction based on the control information from the audio asset selection unit 231. The DAC unit 2334 converts the input data string into an analog audio signal and outputs it to the speaker 234.
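The routing of decoded data through the audio processing unit M can be sketched as a small lookup. This is only an illustration: the format labels ("MPEG-H", "PCM 5.1ch", "PCM 2ch") and the function names are assumptions made for the example, and the unit numbers follow the component list given for FIG. 10 (mixer unit 2331, downmixer unit 2332, switch unit 2333).

```python
from typing import List

def route_decoded_output(decoded_format: str) -> List[str]:
    """Return the units a decoded data sequence passes through, ending at the switch unit."""
    if decoded_format == "MPEG-H":
        # The mixer unit synthesizes the sound elements and downmixes them.
        return ["mixer unit 2331", "switch unit 2333"]
    if decoded_format == "PCM 5.1ch":
        # The downmixer unit converts 5.1ch PCM into 2ch PCM.
        return ["downmixer unit 2332", "switch unit 2333"]
    if decoded_format == "PCM 2ch":
        # 2ch PCM goes straight to the switch unit.
        return ["switch unit 2333"]
    raise ValueError(f"unsupported decoded format: {decoded_format}")

def final_sink(to_external: bool) -> str:
    """The switch unit forwards to the external output I/F or to the DAC."""
    return "external output I/F unit 251" if to_external else "DAC unit 2334"
```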

＜音声切替メニュー＞
図１１は、本実施形態に係る音声切替メニューの一例を示す図である。
音声切替メニューＦ８１は、ＭＰＥＧ－Ｈに対応している受信機２で表示される音声切替メニューの一例である。音声切替メニューＦ８２は、ＭＰＥＧ－Ｈに対応していない受信機２ａで表示される音声切替メニューの一例である。なお、音声切替メニューＦ８１は、ＭＰＥＧ－Ｈに対応している受信機２において、ＭＰＥＧ－Ｈ音声アセットが存在している場合（ＭＰＥＧ－Ｈ音声アセットとＭＰＥＧ－４音声アセットが存在する）に表示される音声切替メニューの一例でもある。音声切替メニューは、サイマル音声のいずれかの音声種別を選択するためのメニューである。なお、ＭＰＥＧ－Ｈ音声アセットを選択した場合、受信機２は、１つのアセットから、言語の異なる音声種別を分離して、音声切替メニューに再度表示するといった方法が考えられる。 <Audio switching menu>
FIG. 11 is a diagram showing an example of the audio switching menu according to the present embodiment.
The audio switching menu F81 is an example of an audio switching menu displayed on a receiver 2 that supports MPEG-H. The audio switching menu F82 is an example of an audio switching menu displayed on a receiver 2a that does not support MPEG-H. The audio switching menu F81 is also an example of the audio switching menu displayed on a receiver 2 that supports MPEG-H when an MPEG-H audio asset is present (that is, when both an MPEG-H audio asset and MPEG-4 audio assets are present). The audio switching menu is a menu for selecting one of the audio types of simulcast audio. When an MPEG-H audio asset is selected, one conceivable method is for the receiver 2 to separate the audio types in different languages from the single asset and display them again in the audio switching menu.

音声種別Ｆ８１１は、音声がＭＰＥＧ－Ｈの音声を選択するための音声種別である。２つの音声種別Ｆ８１２は、言語が日本語、音声がＭＰＥＧ－４、５．１ｃｈ或いは２ｃｈの音声を選択するための音声種別である。音声種別Ｆ８１３は、言語が英語、音声がＭＰＥＧ－４、５．１ｃｈ或いは２ｃｈの音声を選択するための音声種別である。
なお、５．１ｃｈに対応している受信機２は、２ｃｈを選択するための音声種別は、表示してもしなくてもよい。また、音声種別は、ＭＨ－音声コンポーネント記述子のｔｅｘｔ＿ｃｈａｒ領域に記載の音声表記が用いられる。ＭＰＥＧ－Ｈ音声アセットにおいて、言語が複数存在する場合、ｔｅｘｔ＿ｃｈａｒ領域には、複数の音声種別（音声表記）が記載されてもよい。音声がＭＰＥＧ－Ｈである場合、メニュー画面では言語を選択させなくてもよい。この場合、例えば、音声種別Ｆ８１１は、音声がＭＰＥＧ－Ｈの音声を選択するための音声種別となる。 The audio type F811 is an audio type for selecting MPEG-H audio. The two audio types F812 are audio types for selecting Japanese as the language and MPEG-4, 5.1ch, or 2ch audio. The audio type F813 is an audio type for selecting English as the language and MPEG-4, 5.1ch, or 2ch audio.
Note that a receiver 2 that supports 5.1ch may or may not display the audio type for selecting 2ch. The audio type is labeled using the audio notation described in the text_char field of the MH-audio component descriptor. If an MPEG-H audio asset contains multiple languages, multiple audio types (audio notations) may be described in the text_char field. If the audio is MPEG-H, the menu screen need not offer a language selection. In this case, for example, audio type F811 is simply the audio type for selecting MPEG-H audio.

［ＭＨ－音声コンポーネント記述子］
図１２は、本実施形態に係るＭＨ－音声コンポーネント記述子の構造の一例を示す概略図である。
ＭＨ－音声コンポーネント記述子は、アセットに音声エレメンタリストリームの各パラメータを記述し、エレメンタリストリームを文字形式で表現するためにも使用される。ＭＰＥＧ－４オーディオは、音声構成（例えば、言語、チャンネル数）ごとに音声エレメンタリストリームとして多重化されている。ＭＰＥＧ－Ｈオーディオは、１つの音声エレメンタリストリームに、様々な音声構成が含まれる。 [MH-audio component descriptor]
FIG. 12 is a schematic diagram showing an example of the structure of the MH-audio component descriptor according to this embodiment.
The MH-Audio Component Descriptor describes each parameter of an audio elementary stream in an asset and is also used to express the elementary stream in text format. MPEG-4 audio is multiplexed as an audio elementary stream for each audio configuration (e.g., language, number of channels). MPEG-H audio includes various audio configurations in one audio elementary stream.

ＭＨ－音声コンポーネント記述子において、各フィールドの意味は、次の通りである。
「ｄｅｓｃｒｉｐｔｏｒ＿ｔａｇ」は、ＭＨ－音声コンポーネント記述子であることを示す固定値を記述する。
「ｄｅｓｃｒｉｐｔｏｒ＿ｌｅｎｇｔｈ」は、ＭＨ－音声コンポーネント記述子の記述子長を記述する。 In the MH-audio component descriptor, the meaning of each field is as follows:
"descriptor_tag" describes a fixed value indicating that the descriptor is an MH-audio component descriptor.
"descriptor_length" describes the descriptor length of the MH-audio component descriptor.

「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」（フィールドＦ９１）は、ＭＰＥＧ－Ｈ音声アセットが存在するか否か（ＭＰＥＧ－Ｈオーディオの有無）を示す。また、ＭＰＥＧ－Ｈが存在する場合には、そのプロファイル及びレベルを示す。
プロファイルは、目的用途別に定義された機能の集合を表す。プロファイルには、「ＢａｓｉｃＰｒｏｆｉｌｅ」と「ＬｏｗＣｏｍｐｌｅｘｉｔｙＰｒｏｆｉｌｅ」（ＬＣ）の２種類がある。ＬＣは、標準的なプロファイルである。ＬＣから特定機能を省略したものが「ＢａｓｅｌｉｎｅＰｒｏｆｉｌｅ」である。なお、プロファイルは、３種類以上あってもよい。レベルは、処理能力を表し、処理能力に応じた情報である。レベルは、例えば処理の負荷や使用メモリ量であるが、チャンネル数に対応してもよい。プロファイルとレベルの組によって、例えば、機器の性能やビットストリームをデコードするのに必要な性能が特定されてもよい。 "nga_profile_level" (field F91) indicates whether or not an MPEG-H audio asset exists (whether or not MPEG-H audio exists). If MPEG-H exists, it indicates its profile and level.
A profile represents a set of functions defined for a specific purpose. There are two types of profiles: "Basic Profile" and "Low Complexity Profile" (LC). LC is a standard profile. A "Baseline Profile" is an LC with specific functions omitted. There may be three or more types of profiles. A level represents processing capability and is information corresponding to the processing capability. The level may correspond to, for example, the processing load or the amount of memory used, but may also correspond to the number of channels. A pair of profile and level may specify, for example, the performance of a device or the performance required to decode a bitstream.

「ｓｔｒｅａｍ＿ｃｏｎｔｅｎｔ」には、ＭＰＥＧ－４ＡＡＣの音声ストリームに対しては特定値（０ｘ０３）、ＭＰＥＧ－４ＡＬＳの音声ストリームに対しては別の値（０ｘ０４）が設定される。なお、ＭＰＥＧ－Ｈの音声ストリームに対しては、さらに別の値（例えば、０ｘ０５）が設定されてもよい。 "stream_content" is set to a specific value (0x03) for MPEG-4 AAC audio streams, and a different value (0x04) for MPEG-4 ALS audio streams. Note that an even different value (e.g., 0x05) may be set for MPEG-H audio streams.

「ｃｏｍｐｏｎｅｎｔ＿ｔｙｐｅ」は、音声コンポーネントの種別を規定し、８ビット（ｂ７－ｂ０）を、ｂ７: ダイアログ制御、ｂ６－ｂ５：障がい者用音声、ｂ４－ｂ０:音声モードと定義する。なお、「ｃｏｍｐｏｎｅｎｔ＿ｔｙｐｅ」は、ビット数を増やされ、値（例えばｂ８）を追加してもよく、追加された値をＭＰＥＧ－Ｈと定義してもよい。
「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」（コンポーネントタグ）は、コンポーネントストリームを識別するためのラベルであり、ＭＨ－ストリーム識別記述子内のコンポーネントタグと同一の値である。
「ｓｔｒｅａｍ＿ｔｙｐｅ」は、ＬＡＴＭ／ＬＯＡＳストリーム形式であることを示す固定値を記載する。 "component_type" specifies the type of audio component, with its 8 bits (b7-b0) defined as b7: dialogue control, b6-b5: audio for the disabled, and b4-b0: audio mode. Note that "component_type" may be extended with additional bits (e.g., b8), and the added bit may be defined to indicate MPEG-H.
"component_tag" is a label for identifying a component stream, and has the same value as the component tag in the MH-stream identification descriptor.
"stream_type" describes a fixed value indicating the LATM/LOAS stream format.

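As an illustration of the 8-bit "component_type" layout (b7: dialogue control, b6-b5: audio for the disabled, b4-b0: audio mode), the following sketch splits a value into its three sub-fields. The function name and the dictionary keys are hypothetical; only the bit positions come from the text.

```python
def split_component_type(component_type: int) -> dict:
    """Split the 8-bit component_type into its three sub-fields."""
    if not 0 <= component_type <= 0xFF:
        raise ValueError("component_type is an 8-bit field")
    return {
        "dialog_control": (component_type >> 7) & 0x1,      # b7
        "audio_for_disabled": (component_type >> 5) & 0x3,  # b6-b5
        "audio_mode": component_type & 0x1F,                # b4-b0
    }
```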
「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」は、サイマルキャスト（同一内容を異なる符号化方式や音声モードで伝送）を行なっているコンポーネントに対して同じ番号を与える。サイマルキャストを行なっていないコンポーネントに対しては、特定値（０ｘＦＦ）に設定する。
「ｍａｉｎ＿ｃｏｍｐｏｎｅｎｔ＿ｆｌａｇ」は、その音声コンポーネントが主音声であるとき、特定値とする。
「ｑｕａｌｉｔｙ＿ｉｎｄｉｃａｔｏｒ」は、音質モードを表す。
「ｓａｍｐｌｉｎｇ＿ｒａｔｅ」は、サンプリング周波数を示す。
「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ」は、音声コンポーネントの言語を示す。ＥＳ多言語モードのときは、第１音声コンポーネントの言語を示す。言語コードは、アルファベット３文字コードで表す。各文字は８ビットで記述し、その順で２４ビットフィールドに挿入される。
「ｔｅｘｔ＿ｃｈａｒ」は、音声種類名を記述する。この記述がデフォルトの文字列である場合はこのフィールドを省略してもよい。 The "simulcast_group_tag" is assigned the same number to components that are performing simulcast (transmitting the same content using different encoding methods or audio modes). For components that are not performing simulcast, a specific value (0xFF) is set.
"main_component_flag" has a specific value when the audio component is the main audio.
"quality_indicator" indicates the sound quality mode.
"sampling_rate" indicates the sampling frequency.
"ISO_639_language_code" indicates the language of the audio component. In ES multilingual mode, it indicates the language of the first audio component. The language code is expressed as a three-letter alphabet code. Each character is written in 8 bits and inserted in that order into the 24-bit field.
"text_char" describes the name of the audio type. If this description is a default character string, this field may be omitted.

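The packing of "ISO_639_language_code" (three letters, 8 bits each, inserted in order into a 24-bit field) can be sketched as follows. The function names are hypothetical, and the use of ASCII code points for each letter is an assumption of this example.

```python
def pack_language_code(code: str) -> int:
    """Pack a three-letter language code into a 24-bit integer, first letter in the top byte."""
    if len(code) != 3 or not code.isascii() or not code.isalpha():
        raise ValueError("language code must be three ASCII letters")
    value = 0
    for ch in code:
        value = (value << 8) | ord(ch)  # append each letter as one 8-bit unit
    return value

def unpack_language_code(value: int) -> str:
    """Recover the three-letter code from the 24-bit field."""
    return value.to_bytes(3, "big").decode("ascii")
```

For example, "jpn" packs to 0x6A706E and "eng" packs to 0x656E67.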
ＭＰＥＧ－Ｈ、２２．２ｃｈサラウンド、又は５．１ｃｈサラウンドとサイマルで送出されるステレオ音声や、ＡＬＳ符号化方式とサイマルで送出されるＭＰＥＧ－４ＡＡＣステレオ音声などを受信機側で区別するために、ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ（サイマルキャストグループ識別）を運用する。サイマルで送出する音声ではｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ値を同じ値で送出する。 The simulcast_group_tag (simulcast group identification) is used to distinguish between stereo audio transmitted simultaneously with MPEG-H, 22.2ch surround, or 5.1ch surround, and MPEG-4 AAC stereo audio transmitted simultaneously with the ALS encoding method, on the receiver side. The same simulcast_group_tag value is transmitted for audio transmitted simultaneously.
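A receiver could use simulcast_group_tag to collect the assets that carry the same content, for example as in the sketch below. The input shape (pairs of component tag and group tag) and the function name are assumptions of this example; the value 0xFF for components that are not simulcast is taken from the field description.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NOT_SIMULCAST = 0xFF  # value set for components that are not simulcast

def group_simulcast(assets: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group component tags by simulcast_group_tag.

    `assets` yields (component_tag, simulcast_group_tag) pairs.
    Components tagged 0xFF are left out because they have no simulcast partner.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for component_tag, group_tag in assets:
        if group_tag != NOT_SIMULCAST:
            groups[group_tag].append(component_tag)
    return dict(groups)
```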

＜送出運用規則・受信処理規準＞
図１３は、本実施形態に係る送出運用規則の一例を表す概略図である。
図１４は、本実施形態に係る受信処理規準の一例を表す概略図である。
図１３は、ＭＨ－音声コンポーネント記述子の送出運用規則の一例であり、図１４は、ＭＨ－音声コンポーネント記述子の受信処理規準の一例である。 <Transmission Operation Rules and Reception Processing Standards>
FIG. 13 is a schematic diagram showing an example of the transmission operation rule according to this embodiment.
FIG. 14 is a schematic diagram showing an example of a receiving processing standard according to this embodiment.
FIG. 13 shows an example of the transmission operation rules for the MH-audio component descriptor, and FIG. 14 shows an example of the reception processing standard for the MH-audio component descriptor.

これらの例では、「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」は、ＭＰＥＧ－Ｈの有無、プロファイル、及びレベルを識別するためのラベルである。「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」は、４ビットのうち、先頭の１ビットにプロファイル、残りの３ビットにＭＰＥＧ－Ｈの有無及びレベルが設定される。プロファイルを表すフィールドは、「ｎｇａ＿ｐｒｏｆｉｌｅ」であり、ｂｓｌｂｆ（ｂｉｔｓｔｒｉｎｇ、ｌｅｆｔｂｉｔｆｉｒｓｔ：ビット列、左ビットが先頭)である。「ｎｇａ＿ｐｒｏｆｉｌｅ」は、値が「１」の場合にプロファイルが「ＢａｓｉｃＰｒｏｆｉｌｅ」であることを示し、値が「０」の場合にプロファイルが「ＬｏｗＣｏｍｐｌｅｘｉｔｙＰｒｏｆｉｌｅ」であることを示す。
ＭＰＥＧ－Ｈの有無及びレベルを表すフィールドは、「ｎｇａ＿ｌｅｖｅｌ」であり、ｕｉｍｓｂｆ（ｕｎｓｉｇｎｅｄｉｎｔｅｇｅｒｍｏｓｔｓｉｇｎｉｆｉｃａｎｔｂｉｔｆｉｒｓｔ：符号無し整数、最上位ビットが先頭）である。「ｎｇａ＿ｌｅｖｅｌ」は、値が「０」の場合、ＭＰＥＧ－Ｈオーディオ信号が存在しない（ＭＰＥＧ－Ｈオーディオでない）ことを示し、それ以外の値の場合、ＭＰＥＧ－Ｈオーディオ信号が存在する（ＭＰＥＧ－Ｈオーディオである）ことを示す。また、「ｎｇａ＿ｌｅｖｅｌ」は、値が「１」の場合にレベルが「Ｌｅｖｅｌ１」、値が「２」の場合にレベルが「Ｌｅｖｅｌ２」、値が「３」の場合にレベルが「Ｌｅｖｅｌ３」、値が「４」の場合にレベルが「Ｌｅｖｅｌ４」であることを示す。「ｎｇａ＿ｌｅｖｅｌ」の値が「５」～「７」については未使用であり、将来、これらの値を割り当てることができる。 In these examples, "nga_profile_level" is a label for identifying the presence or absence of MPEG-H, the profile, and the level. Of the four bits of "nga_profile_level," the profile is set in the first bit, and the presence or absence and level of MPEG-H are set in the remaining three bits. The field representing the profile is "nga_profile," which is bslbf (bit string, left bit first). When "nga_profile" has a value of "1," it indicates that the profile is "Basic Profile," and when it has a value of "0," it indicates that the profile is "Low Complexity Profile."
The field indicating the presence or absence and level of MPEG-H is "nga_level", which is uimsbf (unsigned integer, most significant bit first). When "nga_level" has a value of "0", it indicates that an MPEG-H audio signal is not present (not MPEG-H audio), and when it has any other value, it indicates that an MPEG-H audio signal is present (MPEG-H audio). Furthermore, "nga_level" indicates that the level is "Level 1" when its value is "1", "Level 2" when its value is "2", "Level 3" when its value is "3", and "Level 4" when its value is "4". The "nga_level" values "5" to "7" are unused and may be assigned in the future.
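As a minimal sketch of this bit layout, the following function splits a 4-bit "nga_profile_level" value into "nga_profile" (the leading bit) and "nga_level" (the remaining three bits). The function name, the return type, and the choice to report the unassigned values 5 to 7 as a level of None are illustrative assumptions.

```python
from typing import NamedTuple, Optional

class NgaProfileLevel(NamedTuple):
    profile: str            # profile name as given in the text
    mpeg_h_present: bool    # False only when nga_level == 0
    level: Optional[int]    # 1..4, or None when absent or unassigned (5..7)

def parse_nga_profile_level(value: int) -> NgaProfileLevel:
    """Split the 4-bit field: leading bit = nga_profile, low 3 bits = nga_level."""
    if not 0 <= value <= 0xF:
        raise ValueError("nga_profile_level is a 4-bit field")
    nga_profile = (value >> 3) & 0x1
    nga_level = value & 0x7
    profile = "Basic Profile" if nga_profile == 1 else "Low Complexity Profile"
    present = nga_level != 0
    level = nga_level if 1 <= nga_level <= 4 else None
    return NgaProfileLevel(profile, present, level)
```

For example, the value 0b0011 is read as Low Complexity Profile with MPEG-H present at Level 3, while 0b1000 is read as MPEG-H not present.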

「ｓｔｒｅａｍ＿ｃｏｎｔｅｎｔ」では、ＭＰＥＧ－４ＡＡＣの音声ストリームに対しては０ｘ０３、ＭＰＥＧ－４ＡＬＳの音声ストリームに対しては０ｘ０４、ＭＰＥＧ－Ｈの音声ストリームに対しては０ｘ０５を設定する。受信機２は、「ｓｔｒｅａｍ＿ｃｏｎｔｅｎｔ」の値がこれらの値以外である場合、記述子を無効とする。 "stream_content" is set to 0x03 for MPEG-4 AAC audio streams, 0x04 for MPEG-4 ALS audio streams, and 0x05 for MPEG-H audio streams. If the value of "stream_content" is anything other than these values, the receiver 2 invalidates the descriptor.
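The reception rule for "stream_content" amounts to a lookup with rejection of unknown values, as in the sketch below. The table and function names are hypothetical; the three values are those listed above.

```python
from typing import Optional

STREAM_CONTENT_CODECS = {
    0x03: "MPEG-4 AAC",
    0x04: "MPEG-4 ALS",
    0x05: "MPEG-H",
}

def codec_from_stream_content(stream_content: int) -> Optional[str]:
    """Return the codec name, or None when the descriptor is to be treated as invalid."""
    return STREAM_CONTENT_CODECS.get(stream_content)
```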

「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」については、ＭＰＥＧ－Ｈオーディオを送出・受信する場合には、ＭＰＥＧ－Ｈ音声アセットと１又は複数のＭＰＥＧ－４音声アセットに、同じ値の「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」が設定される。この場合に、この値とは異なる値の「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」が、１又は複数のＭＰＥＧ－４音声アセットに設定される。後者の「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」は、ＭＰＥＧ－Ｈ音声アセットには設定されない。つまり、後者のサイマル音声は、ＭＰＥＧ－４音声アセットのみであり、ＭＰＥＧ－Ｈ音声アセットを含まない。
「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ２」は、ＭＰＥＧ－Ｈ音声アセットの１又は複数の言語名を示してもよい。具体的には、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ」に第１言語（例えば、日本語）、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ２」に第２言語（例えば、英語）を記述してもよい。また、ＭＰＥＧ－Ｈ音声アセットについては、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ２」に複数の言語（例えば、日本語と英語）を記述してもよい。 Regarding "simulcast_group_tag," when transmitting/receiving MPEG-H audio, the same value of "simulcast_group_tag" is set for the MPEG-H audio asset and one or more MPEG-4 audio assets. In this case, a value of "simulcast_group_tag" different from this value is set for one or more MPEG-4 audio assets. The latter "simulcast_group_tag" is not set for the MPEG-H audio asset. In other words, the latter simulcast audio includes only MPEG-4 audio assets and does not include MPEG-H audio assets.
"ISO_639_language_code2" may indicate one or more language names of an MPEG-H audio asset. Specifically, a first language (e.g., Japanese) may be described in "ISO_639_language_code" and a second language (e.g., English) may be described in "ISO_639_language_code2." Furthermore, for an MPEG-H audio asset, multiple languages (e.g., Japanese and English) may be described in "ISO_639_language_code2."

ＭＨ－音声コンポーネント記述子の送出運用において、放送局１は、同一イベント内における音声ストリームのパラメータ更新時は、原則としてＭＰＴのＭＨ－音声コンポーネント記述子の内容を変更しＭＰＴのバージョン更新を行うが、例外として、本記述子を更新しない送出運用を行うことがある。この場合、音声ストリームとＭＨ－音声コンポーネント記述子の内容が一時的に不一致となる。例えば、番組本編からＣＭ等に移行する時や流動編成時などが想定される。この場合、放送局１は、ＭＰＴのバージョン更新をしないため、受信機側では同じコンポーネントタグ値の音声ストリームを再生し続ける。
このような音声ストリームのパラメータ更新時に本記述子を更新しない運用は、音声符号化方式がＡＡＣで、５．１ｃｈ以下の音声モード間で音声モードを切り替えるときに限り許容される。 In the transmission operation of the MH-audio component descriptor, when updating the parameters of the audio stream within the same event, the broadcasting station 1 generally changes the contents of the MH-audio component descriptor in the MPT and updates the MPT version. However, as an exception, there are cases where transmission operation is performed without updating this descriptor. In this case, the contents of the audio stream and the MH-audio component descriptor temporarily become inconsistent. For example, this may occur when transitioning from the main program to a commercial or during flexible programming. In this case, the broadcasting station 1 does not update the MPT version, so the receiver continues to play the audio stream with the same component tag value.
Such an operation in which this descriptor is not updated when updating the parameters of the audio stream is permitted only when the audio encoding method is AAC and the audio mode is switched between audio modes of 5.1ch or less.

ＭＨ－音声コンポーネント記述子の受信処理において、ＭＰＴのバージョンが更新し、音声ストリーム数や本記述子の内容が更新された場合は、受信機２は、本記述子の内容に従い、適切に音声再生を行う。受信機２は、ＭＰＴのバージョン更新が行われていなければ、原則として、同じコンポーネントタグ値の音声ストリームを再生し続ける。５．１ｃｈ以下の音声モード間での切り替えでは、音声ストリームと本記述子の内容が異なる場合がある。その時は、受信機２は、音声ストリームの内容を優先してデコードする。 If the MPT version is updated during the MH-Audio Component Descriptor reception process, and the number of audio streams or the contents of this descriptor are updated, the receiver 2 will play audio appropriately in accordance with the contents of this descriptor. As long as the MPT version has not been updated, the receiver 2 will, in principle, continue to play audio streams with the same component tag value. When switching between audio modes of 5.1ch or lower, the contents of the audio stream and this descriptor may differ. In such cases, the receiver 2 will prioritize decoding the contents of the audio stream.

［音声アセットの選択］
放送では複数の音声モード（ＭＰＥＧ－Ｈ、ＭＰＥＧ－４ＡＡＣ２ｃｈ、ＡＡＣ５．１ｃｈ、ＡＡＣ７．１ｃｈ、ＡＡＣ２２．２ｃｈ、ＡＬＳ２ｃｈ、ＡＬＳ５．１ｃｈ）を運用する。モノラル、デュアルモノラルは運用しない。 [Selection of audio asset]
In broadcasting, multiple audio modes (MPEG-H, MPEG-4 AAC 2ch, AAC 5.1ch, AAC 7.1ch, AAC 22.2ch, ALS 2ch, ALS 5.1ch) are operated. Monaural and dual monaural are not operated.

受信機２は、音声デコード機能として次の機能を有する。
・ＭＰＥＧ－４ＡＡＣ２ｃｈ再生
・ＭＰＥＧ－４ＡＡＣ５．１ｃｈから２ｃｈへのダウンミックス再生機能
これらの条件を満たすため、ＭＰＥＧ－Ｈ、ＭＰＥＧ－４ＡＡＣ７．１ｃｈ、又はＡＡＣ２２．２ｃｈの音声モードではＡＡＣ２ｃｈをサイマル運用とする（ＡＡＣ５．１ｃｈがサイマル運用となる場合も有る）。また、ＭＰＥＧ－Ｈオーディオモード又はＡＬＳ音声モードではＡＡＣ２ｃｈ、又はＡＡＣ５．１ｃｈをサイマル運用とする。 The receiver 2 has the following audio decoding functions:
・MPEG-4 AAC 2ch playback
・Downmix playback from MPEG-4 AAC 5.1ch to 2ch
To meet these requirements, AAC 2ch is simulcast in the MPEG-H, MPEG-4 AAC 7.1ch, and AAC 22.2ch audio modes (AAC 5.1ch may also be simulcast). Likewise, AAC 2ch or AAC 5.1ch is simulcast in the MPEG-H audio mode and the ALS audio mode.

受信機２は、複数の音声アセット運用時は下記に従い切替・選択できる機能を有する。
受信機２は、受信機２本体における再生の際、受信機２本体で再生できる音声モードを判別して、その内コンポーネントタグ値の小さい順にアセットを優先して切替え再生する。なお、選局時には再生可能な一番小さいコンポーネントタグ値のアセットをデフォルト音声として再生する。
受信機２は、２ｃｈまでの再生環境の場合で、ＭＰＥＧ－Ｈ又はＡＡＣ５．１ｃｈに合わせＡＡＣ２ｃｈ音声がサイマル運用されている場合はＡＡＣ２ｃｈ音声を優先して再生する。受信機２は、特定のレベルまでの再生環境の場合で、ＭＰＥＧ－Ｈが運用されている場合は、特定レベル以下のレベルのうち、最大のレベル或いは最小のレベルを優先して再生する。
ただし、ＭＰＴのバージョン更新なく音声モードが切り替わった場合は、再生中のアセットをそのまま再生し続ける。 The receiver 2 has the function to switch and select as follows when multiple audio assets are in operation.
When playing back on the receiver 2 itself, the receiver 2 determines which audio modes it can play back and, among those, switches between assets in ascending order of component tag value. At channel selection, the playable asset with the smallest component tag value is played back as the default audio.
In a playback environment of up to 2ch, if AAC 2ch audio is simulcast alongside MPEG-H or AAC 5.1ch, the receiver 2 gives priority to the AAC 2ch audio. In a playback environment that supports MPEG-H only up to a specific level, if MPEG-H is in operation, the receiver 2 gives priority to either the highest or the lowest of the levels at or below that specific level.
However, if the audio mode is switched without updating the MPT version, the asset that is currently being played will continue to be played.
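The default-selection rules above can be sketched as follows. This is a minimal illustration, not the implementation described in this document: the `Asset` record, the mode strings, and the `pick_default` helper are hypothetical names introduced here, and the rule about continuing playback when the MPT version is unchanged is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Asset:
    component_tag: int  # smaller value = higher priority
    mode: str           # hypothetical label such as "AAC 2ch"


def pick_default(assets: List[Asset], playable_modes: Set[str],
                 stereo_only: bool = False) -> Optional[Asset]:
    """Pick the asset to play at channel selection, or None if nothing is playable."""
    playable = sorted((a for a in assets if a.mode in playable_modes),
                      key=lambda a: a.component_tag)
    if stereo_only:
        # Up-to-2ch environment: prefer a simulcast AAC 2ch asset if one exists.
        for asset in playable:
            if asset.mode == "AAC 2ch":
                return asset
    return playable[0] if playable else None


assets = [Asset(0x0010, "MPEG-H"), Asset(0x0011, "AAC 5.1ch"), Asset(0x0012, "AAC 2ch")]
modes = {"AAC 2ch", "AAC 5.1ch"}  # a receiver without MPEG-H support
default_asset = pick_default(assets, modes)
stereo_asset = pick_default(assets, modes, stereo_only=True)
```

With these inputs, the receiver without MPEG-H support falls back to the AAC 5.1ch asset by tag order, and to the AAC 2ch asset when the playback environment is limited to 2ch.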

受信機２は、サイマルキャストグループ識別を参照して複数言語運用を判別し、コンポーネントタグ値の小さいアセット（言語）をデフォルト言語として再生する。受信機２は、ＭＰＥＧ－Ｈについては、予め定められたデフォルト言語で、音声を再生してもよい。受信機２は、言語切り替えを行った場合でも、再選局の場合はデフォルト言語に復帰する。受信機２は、ＭＰＥＧ－Ｈについては、言語固定モードを設けてもよい。
受信機２では、リモコンの音声ボタン等で、有効な音声アセットの選択がサイクリックに切り換えられる。例えば、受信機２では、ＭＰＥＧ－Ｈ音声アセットと１又は複数のＭＰＥＧ－４音声アセットの選択が、サイクリックに切り換えられる。
受信者がメニュー上で音声を選択するユーザーインタフェースでは、ＭＨ－音声コンポーネント記述子の情報に従い、音声情報を表示すること。なお、音声種別の表記文字にはＭＨ－音声コンポーネント記述子内のｔｅｘｔ＿ｃｈａｒ領域に記載の音声表記を優先する。ただし、受信機２は、ＭＰＥＧ－Ｈについては、予め定められた音声表記を優先してもよい。 The receiver 2 determines whether multiple languages are used by referring to the simulcast group identification and plays the asset (language) with the smallest component tag value as the default language. For MPEG-H, the receiver 2 may play audio in a predetermined default language. Even if the language is switched, the receiver 2 returns to the default language when the channel is reselected. For MPEG-H, the receiver 2 may have a language fixed mode.
In the receiver 2, the selection of a valid audio asset is cyclically switched by using an audio button on a remote control, etc. For example, in the receiver 2, the selection of an MPEG-H audio asset and one or more MPEG-4 audio assets is cyclically switched.
In a user interface where the recipient selects audio from a menu, audio information is displayed according to the information in the MH-Audio Component Descriptor. The label shown for the audio type gives priority to the audio label written in the text_char field of the MH-Audio Component Descriptor. However, for MPEG-H, the receiver 2 may give priority to a predetermined audio label.
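The cyclic switching by the audio button can be illustrated with a short sketch. The function name `next_asset` and the choice to cycle in ascending component tag order are assumptions made here for illustration; the text only states that the selection is switched cyclically.

```python
def next_asset(valid_tags, current_tag):
    """Return the tag selected after one press of the audio button."""
    ordered = sorted(valid_tags)  # ASSUMPTION: the cycle follows ascending tag order
    if not ordered:
        return None
    if current_tag not in ordered:
        return ordered[0]
    # Wrap around to the first asset after the last one.
    return ordered[(ordered.index(current_tag) + 1) % len(ordered)]
```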

受信機２は、同一音声アセット内で音声モードが切り替わる場合、及び受信機が自動的に異なるアセットの音声に切り替える場合、受信者に不自然さを感じさせないように切り替える。受信機２の切替動作は、次のような動作である。
（１）約０．５秒前のＭＰＴ更新により音声モードやアセットの切替を把握した受信機２は、先行音声の出力をフェード処理後ミュートする。
（２）受信機２は、切替に必要な処理を実行後、ミュートを解除し後続音声の出力を再開する。切替処理にかかる時間は、音声アセットの切替有無や、更新される音声モードの種類によって異なる。一般的には、符号化方式の切替時間が最も長くかかる。切替処理の間は、送出側にて無音区間が設けられる。
（３）受信機２は、ＭＰＥＧ－４音声アセットから、ＭＰＥＧ－Ｈ音声アセットに切り替わる場合、ＭＰＥＧ－Ｈオーディオのデジタルミキサーを表示する。 When the audio mode is switched within the same audio asset, or when the receiver 2 automatically switches to the audio of a different asset, the receiver 2 switches in a way that does not feel unnatural to the recipient. The switching operation of the receiver 2 is as follows.
(1) The receiver 2, having learned of the audio mode or asset switch from an MPT update made approximately 0.5 seconds beforehand, fades out the preceding audio and then mutes its output.
(2) After the receiver 2 has performed the necessary processing for switching, it cancels the mute and resumes outputting the subsequent audio. The time required for the switching process varies depending on whether the audio asset has been switched and the type of audio mode being updated. Generally, the time required for switching the encoding method is the longest. During the switching process, a silent interval is provided on the sending side.
(3) When the receiver 2 switches from an MPEG-4 audio asset to an MPEG-H audio asset, it displays the MPEG-H audio digital mixer.
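Step (1) above, fading the preceding audio before muting, can be sketched as a gain ramp applied to the last samples before the switch. The linear ramp shape, the `fade_out` name, and the `fade_len` parameter are assumptions for illustration only; the text does not specify the fade curve or its duration.

```python
def fade_out(samples, fade_len):
    """Linearly ramp the last `fade_len` samples down to silence."""
    n = len(samples)
    fade_len = min(fade_len, n)
    out = list(samples)
    for k in range(fade_len):
        gain = 1.0 - (k + 1) / fade_len  # reaches 0.0 on the final sample
        out[n - fade_len + k] = samples[n - fade_len + k] * gain
    return out
```

After the ramp reaches zero the output stays muted until the switching processing in step (2) has finished.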

［音声の切替動作］
図１５は、本実施形態に係る切替えの詳細例を表すフローチャートである。
この図は、受信機２の音声の切替動作を表す。次のステップＳ１０１～Ｓ１０４、Ｓ１１１、Ｓ１１２、Ｓ１２１の処理、及びステップＳ１２２、Ｓ１２３の制御は、受信機２のコンピュータ（ＣＰＵ２５５：制御部）が行う。 [Audio switching operation]
FIG. 15 is a flowchart showing a detailed example of switching according to this embodiment.
This figure shows the audio switching operation of the receiver 2. The processes of steps S101 to S104, S111, S112, and S121 described below, and the control of steps S122 and S123, are performed by the computer of the receiver 2 (CPU 255: control unit).

（ステップＳ１０１）受信者がリモコン等で、又は、受信機２が自動で選局を行う。その後、ステップＳ１０２の処理が行われる。
（ステップＳ１０２）受信機２は、ＭＰＴを更新する。その後、ステップＳ１０３の処理が行われる。
（ステップＳ１０３）受信機２は、デフォルトアセットを確認する。具体的には、デフォルトアセットは、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」が特定値のアセットである。デフォルトアセットは、アセット種別ごとに予め定められている。アセット種別が「放送伝送音声」の場合、特定値「０ｘ００１０」がデフォルトアセットに割り当てられている。この特定値が、変数ｉの初期値（ｉ＝０ｘ００１０）に設定される。その後、ステップＳ１０４の処理が行われる。 (Step S101) The recipient selects a channel using a remote control or the like, or the receiver 2 selects a channel automatically. Then, the process of step S102 is performed.
(Step S102) The receiver 2 updates the MPT, and then the process of step S103 is performed.
(Step S103) The receiver 2 checks the default asset. Specifically, the default asset is an asset whose "component_tag" has a specific value. Default assets are predetermined for each asset type. If the asset type is "broadcast transmission audio," a specific value "0x0010" is assigned to the default asset. This specific value is set as the initial value of the variable i (i = 0x0010). Then, the processing of step S104 is performed.

（ステップＳ１０４）受信機２は、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」が放送伝送音声のアセットであるか否かを判定する。アセット種別が「放送伝送音声」のアセットには、「０ｘ００１０」～「０ｘ００２Ｆ」の値が割り当てられている。受信機２は、変数ｉの値が、「０ｘ００２Ｆ」以下であるか否かを判定することで、放送伝送音声のアセットであるか否かを判定する。
なお、ＭＰＥＧ－Ｈ音声アセットには、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」において、ＭＰＥＧ－４音声アセットよりも、小さい値が割り当てられている。この場合、先に、ＭＰＥＧ－Ｈ音声アセットの再生可能性が判定される。ただし、ＭＰＥＧ－Ｈ音声アセットには、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」において、ＭＰＥＧ－４音声アセットよりも、大きい値が割り当てられてもよい。
放送伝送音声のアセットであると判定された場合（Ｙｅｓ）、ステップＳ１１１の処理が行われる。一方、放送伝送音声のアセットでないと判定された場合（Ｎｏ）、ステップＳ１２１の処理が行われる。 (Step S104) The receiver 2 determines whether the asset indicated by "component_tag" is a broadcast transmission audio asset. Assets whose asset type is "broadcast transmission audio" are assigned values from "0x0010" to "0x002F." The receiver 2 makes this determination by checking whether the value of the variable i is equal to or less than "0x002F."
Note that MPEG-H audio assets are assigned smaller "component_tag" values than MPEG-4 audio assets. In this case, the playability of the MPEG-H audio assets is determined first. However, MPEG-H audio assets may instead be assigned larger "component_tag" values than MPEG-4 audio assets.
If the asset is determined to be a broadcast transmission audio asset (Yes), the process of step S111 is performed. Otherwise (No), the process of step S121 is performed.
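The range check of step S104 can be written as a one-line predicate. The constant and function names below are hypothetical; only the values 0x0010 and 0x002F come from the text. Because the variable i starts at 0x0010 and only increases, the flowchart needs to test the upper bound alone, while this sketch checks both bounds so that it can be called with an arbitrary value.

```python
BROADCAST_AUDIO_FIRST = 0x0010  # default asset for "broadcast transmission audio"
BROADCAST_AUDIO_LAST = 0x002F   # last component_tag value of that asset type


def is_broadcast_audio_tag(i: int) -> bool:
    """Return True while i still points at a broadcast transmission audio asset."""
    return BROADCAST_AUDIO_FIRST <= i <= BROADCAST_AUDIO_LAST
```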

（ステップＳ１１１）受信機２は、自装置が再生可能なストリームであるか否かを判定する。具体的には、受信機２は、「ｓｔｒｅａｍ＿ｃｏｎｔｅｎｔ」、「ｃｏｍｐｏｎｅｎｔ＿ｔｙｐｅ」、「ｓｔｒｅａｍ＿ｔｙｐｅ」、及び「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」を用いて、自装置が再生可能なストリームであるか否かを判定する。 (Step S111) The receiver 2 determines whether the stream can be played back by the receiver 2 itself. Specifically, the receiver 2 makes this determination using "stream_content," "component_type," "stream_type," and "nga_profile_level."

ＭＰＥＧ－Ｈオーディオが再生可能であるか否かについては、受信機２は、次の判定を行う。 To determine whether MPEG-H audio can be played, receiver 2 performs the following determination:

受信機２は、「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」から、「ｎｇａ＿ｐｒｏｆｉｌｅ」及び「ｎｇａ＿ｌｅｖｅｌ」を分離する。受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」であるかを判定することで、ＭＰＥＧ－Ｈオーディオ信号が存在しない（ＭＰＥＧ－Ｈオーディオでない）か否かを判定する。換言すれば、受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」でないか否かを判定することで、ＭＰＥＧ－Ｈオーディオ信号が存在する（ＭＰＥＧ－Ｈオーディオである）か否かを判定する。 Receiver 2 separates "nga_profile" and "nga_level" from "nga_profile_level". Receiver 2 determines whether an MPEG-H audio signal is not present (not MPEG-H audio) by determining whether "nga_level" is "0". In other words, receiver 2 determines whether an MPEG-H audio signal is present (is MPEG-H audio) by determining whether "nga_level" is not "0".
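The separation of "nga_profile_level" into "nga_profile" and "nga_level" might look like the sketch below. The bit layout is an assumption made purely for illustration (an 8-bit field with the profile in the upper 4 bits and the level in the lower 4 bits); this passage does not give the layout, and the function names are hypothetical.

```python
def split_nga_profile_level(value: int):
    """Split the field into (nga_profile, nga_level).

    ASSUMPTION: 8-bit field, upper 4 bits = profile, lower 4 bits = level.
    """
    nga_profile = (value >> 4) & 0x0F
    nga_level = value & 0x0F
    return nga_profile, nga_level


def has_mpeg_h(value: int) -> bool:
    """MPEG-H audio is present exactly when nga_level is not 0."""
    _, nga_level = split_nga_profile_level(value)
    return nga_level != 0
```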

受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」である場合、ＭＰＥＧ－Ｈ音声アセットが存在しないＭＰＥＧ－４オーディオストリームであるので、自装置がこのＭＰＥＧ－４オーディオストリーム（ＭＰＥＧ－４オーディオ）を再生可能であるか否かを判定する。受信機２は、このＭＰＥＧ－４オーディオストリームを再生可能ではない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。一方、再生可能である場合（Ｙｅｓ）、ステップＳ１１２の処理が行われる。 If "nga_level" is "0", the MPEG-4 audio stream does not contain any MPEG-H audio assets, so the receiver 2 determines whether its own device can play this MPEG-4 audio stream (MPEG-4 audio). If the receiver 2 cannot play this MPEG-4 audio stream (No), it proceeds to step S104. On the other hand, if it can be played (Yes), it proceeds to step S112.

受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」でない場合（「１」～「４」である場合、「１」以上である場合）、ＭＰＥＧ－Ｈ音声アセットが存在するＭＰＥＧ－Ｈオーディオストリームであるので、自装置がこのＭＰＥＧ－Ｈオーディオストリーム（ＭＰＥＧ－Ｈオーディオ）を再生可能であるか否かを判定する。受信機２は、このＭＰＥＧ－Ｈオーディオストリームを再生可能ではない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。 If "nga_level" is not "0" (is between "1" and "4", or is "1" or greater), the MPEG-H audio stream contains an MPEG-H audio asset, so the receiver 2 determines whether its own device can play this MPEG-H audio stream (MPEG-H audio). If the receiver 2 cannot play this MPEG-H audio stream (No), it proceeds to step S104.

一方、再生できる場合、受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が示すレベルが、自装置が再生可能なレベルであるか（再生可能なレベル以下であるか）を判定する。受信機２は、再生可能なレベルでない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。
再生可能なレベルである場合、受信機２は、「ｎｇａ＿ｐｒｏｆｉｌｅ」が示すプロファイルが、自装置が再生可能なプロファイルであるかを判定する。再生可能なプロファイルでない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。一方、再生可能なプロファイルである場合（Ｙｅｓ）、ステップＳ１１２の処理が行われる。
なお、ステップＳ１１１の判定の少なくとも１つで再生可能ではないと判定された場合（Ｎｏ）、変数ｉを１加算し、次の「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」値のアセットに対して、ステップＳ１０４の処理が行われる。 On the other hand, if playback is possible, the receiver 2 determines whether the level indicated by "nga_level" is at a level at which playback is possible for the receiver 2 itself (or whether it is at or below the level at which playback is possible). If the receiver 2 determines that the level is not at a level at which playback is possible (No), the process proceeds to step S104.
If the level is playable, the receiver 2 determines whether the profile indicated by "nga_profile" is a profile playable by the receiver 2 itself. If it is not a playable profile (No), the process proceeds to step S104. On the other hand, if it is a playable profile (Yes), the process proceeds to step S112.
If it is determined in at least one of the determinations in step S111 that the asset is not reproducible (No), the variable i is incremented by 1, and the process in step S104 is performed on the asset with the next "component_tag" value.
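The determinations of step S111 can be condensed into a single predicate, checked in the order given above (presence, MPEG-H capability, level, then profile). This is a sketch under stated assumptions: `ReceiverCaps`, `is_playable`, and the profile value 1 are hypothetical, and `mpeg4_playable` stands in for the result of the check based on "stream_content," "component_type," and "stream_type," which is not modelled here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ReceiverCaps:
    max_nga_level: int            # 0 means no MPEG-H decoding capability
    nga_profiles: FrozenSet[int]  # profiles the receiver supports


def is_playable(nga_profile: int, nga_level: int, caps: ReceiverCaps,
                mpeg4_playable: bool) -> bool:
    if nga_level == 0:
        return mpeg4_playable            # MPEG-4 stream: use the MPEG-4 check
    if caps.max_nga_level == 0:
        return False                     # receiver cannot play MPEG-H at all
    if nga_level > caps.max_nga_level:
        return False                     # level above what the receiver supports
    return nga_profile in caps.nga_profiles


receiver_b = ReceiverCaps(max_nga_level=3, nga_profiles=frozenset({1}))
receiver_c = ReceiverCaps(max_nga_level=0, nga_profiles=frozenset())
```

A `False` result corresponds to the "No" branch: the variable i is incremented and step S104 is evaluated for the next "component_tag" value.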

（ステップＳ１１２）受信機２は、サイマルの有無、及び言語の確認等を行う。具体的には、受信機２は、「ｓｉｍｕｌｃａｓｔ＿ｇｒｏｕｐ＿ｔａｇ」、「ＥＳ＿ｍｕｌｔｉ＿ｌｉｎｇｕａｌ＿ｆｌａｇ」、「ｍａｉｎ＿ｃｏｍｐｏｎｅｎｔ＿ｆｌａｇ」、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ」、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ２」、及び「ｔｅｘｔ＿ｃｈａｒ」を用いて、この処理が行われる。
受信機２は、ＭＰＥＧ－Ｈオーディオの言語について、「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ」又は「ＩＳＯ＿６３９＿ｌａｎｇｕａｇｅ＿ｃｏｄｅ２」のいずれか或いは組み合わせをもちいてもよい。 (Step S112) The receiver 2 checks for the presence of simulcast, the language, and so on. Specifically, the receiver 2 performs this process using "simulcast_group_tag," "ES_multi_lingual_flag," "main_component_flag," "ISO_639_language_code," "ISO_639_language_code2," and "text_char."
For the language of MPEG-H audio, the receiver 2 may use either "ISO_639_language_code" or "ISO_639_language_code2," or a combination of the two.

（ステップＳ１１３）受信機２は、Ｓ１１２の処理で取得したアセットの情報を、メモリ（ＲＡＭ２５４又は補助記憶装置２５２）において、リストに追加する。これにより、再生可能なストリームがリスト化される。その後、ステップＳ１０４の処理が行われる。ここで、受信機２は、変数ｉを１加算し、次の「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」値のアセットに対して、ステップＳ１０４の処理が行われる。 (Step S113) The receiver 2 adds the asset information acquired in the processing of S112 to a list in memory (RAM 254 or auxiliary storage device 252). This creates a list of playable streams. Then, the processing of step S104 is performed. Here, the receiver 2 increments the variable i by 1, and performs the processing of step S104 for the asset with the next "component_tag" value.

（ステップＳ１２１）受信機２は、再生可能なストリームのリスト（Ｓ１１３の処理でリスト化されたリスト）から、ストリームを選択する。選択は、受信機２が自動選択してもよいし、受信者が手動で選択してもよい。その後、ステップＳ１２２の処理が行われる。
（ステップＳ１２２）受信機２は、選択された言語等の表示を行う。選択は、受信機２に自動選択されてもよいし、受信者に手動で選択されてもよい。その後、ステップＳ１２３へ進む。
なお、ＭＰＥＧ－Ｈオーディオにおいて、１つのアセットに多言語の音声のデータ列が格納される場合がある。この場合、ステップＳ１２１でＭＰＥＧ－Ｈオーディオストリームが選択された場合、受信機２は、アセットを変更（選択）せずに、言語を選択できる。
（ステップＳ１２３）受信機２は、ステップＳ１２１及びＳ１２２で選択された音声ストリームを再生する。その後、ステップＳ１０２の処理が行われることで、受信機２は、このフローチャートの動作を繰り返す。 (Step S121) The receiver 2 selects a stream from the list of playable streams (the list created in the process of S113). The selection may be automatic by the receiver 2 or manual by the recipient. Then, the process of step S122 is performed.
(Step S122) The receiver 2 displays the selected language, etc. The selection may be automatic by the receiver 2 or manual by the recipient. Then, the process proceeds to step S123.
In MPEG-H audio, audio data streams in multiple languages may be stored in one asset. In this case, if an MPEG-H audio stream is selected in step S121, the receiver 2 can select a language without changing (selecting) the asset.
(Step S123) The receiver 2 plays back the audio stream selected in steps S121 and S122. After that, the receiver 2 repeats the operation of this flowchart by performing the process of step S102.

なお、受信機２は、特に指定しない場合は、最も小さい値のｃｏｍｐｏｎｅｎｔ＿ｔａｇ値のストリームを選択する（例えば、ステップＳ１２１の処理）。また、受信機２は、リストから受信者が任意に選択できるようにする（図１１参照）。受信機２は、任意に選択した場合は、その値を保持（ストリームを再生）しつづける。ただし、受信機２は、ｃｏｍｐｏｎｅｎｔ＿ｔａｇ値（再生しているストリーム）がなくなった場合、または、サイマル有無、言語等の条件が変化した場合には、適切な切り替え処理後、直ちに再生可能な最も小さい値のｃｏｍｐｏｎｅｎｔ＿ｔａｇ値のストリームを選択し再生する。 Unless otherwise specified, the receiver 2 selects the stream with the smallest component_tag value (for example, in the processing of step S121). The receiver 2 also lets the recipient choose any stream from the list (see FIG. 11). Once the recipient has chosen a stream, the receiver 2 retains that value (keeps playing that stream). However, if the component_tag value (the stream being played) disappears, or if conditions such as the presence of simulcast or the language change, the receiver 2 performs the appropriate switching processing and then immediately selects and plays the playable stream with the smallest component_tag value.
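The loop of FIG. 15 (steps S103 to S121) can be sketched as below. This is an illustrative outline rather than the receiver's actual code: the dictionary `mpt_assets`, the callback `is_playable`, and both function names are hypothetical, skipping tag values that are absent from the MPT is an assumption, and the simulcast and language checks of step S112 are omitted.

```python
from typing import Callable, Dict, List, Optional

FIRST_TAG, LAST_TAG = 0x0010, 0x002F


def build_playable_list(mpt_assets: Dict[int, dict],
                        is_playable: Callable[[dict], bool]) -> List[int]:
    """Walk component_tag values in ascending order and list the playable ones."""
    playable = []
    i = FIRST_TAG                                   # S103: start at the default asset
    while i <= LAST_TAG:                            # S104: still broadcast audio?
        info = mpt_assets.get(i)                    # ASSUMPTION: absent tags are skipped
        if info is not None and is_playable(info):  # S111
            playable.append(i)                      # S112/S113 (details omitted)
        i += 1
    return playable


def select_stream(playable: List[int], held_tag: Optional[int] = None) -> Optional[int]:
    """S121: keep the recipient's choice while it exists, else the smallest tag."""
    if held_tag in playable:
        return held_tag
    return min(playable) if playable else None


mpt = {0x0010: {"nga_level": 4}, 0x0011: {"nga_level": 0}, 0x0012: {"nga_level": 0}}
level3_receiver = lambda info: info["nga_level"] <= 3
playable = build_playable_list(mpt, level3_receiver)
```

Here the level 4 asset at 0x0010 is skipped by a receiver limited to level 3, so the default falls to 0x0011; a held selection that is no longer in the list also falls back to the smallest playable tag.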

図１６は、本実施形態に係る切替えの別の詳細例を表すフローチャートである。
この図において、図１５と同じ処理については、同じ符号を付し、説明を省略する。 FIG. 16 is a flowchart showing another detailed example of switching according to this embodiment.
In this figure, the same processes as those in FIG. 15 are denoted by the same reference numerals, and the description thereof will be omitted.

（ステップＳ１１１１）受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」であるかを判定することで、ＭＰＥＧ－Ｈオーディオ信号が存在しないか否かを判定する。受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が「０」である場合（Ｙｅｓ）、ステップＳ１１１４の処理が行われる。一方、「ｎｇａ＿ｌｅｖｅｌ」が「０」でない場合（Ｎｏ）、ステップＳ１１１２の処理が行われる。
（ステップＳ１１１２）受信機２は、「ｎｇａ＿ｐｒｏｆｉｌｅ」が示すプロファイルが、自装置が再生（対応）可能なプロファイルであるかを判定する。受信機２は、再生可能なプロファイルでない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。一方、再生可能なプロファイルである場合（Ｙｅｓ）、ステップＳ１１１３の処理が行われる。 (Step S1111) The receiver 2 determines whether or not an MPEG-H audio signal is present by determining whether "nga_level" is "0." If "nga_level" is "0" (Yes), the receiver 2 performs the process of step S1114. On the other hand, if "nga_level" is not "0" (No), the receiver 2 performs the process of step S1112.
(Step S1112) The receiver 2 determines whether the profile indicated by "nga_profile" is a profile that the receiver 2 can play (support). If the receiver 2 determines that the profile is not a playable profile (No), the receiver 2 performs the process of step S104. On the other hand, if the profile is a playable profile (Yes), the receiver 2 performs the process of step S1113.

（ステップＳ１１１３）受信機２は、「ｎｇａ＿ｌｅｖｅｌ」が示すレベルが、自装置が再生（対応）可能なレベルであるか（再生可能なレベル以下であるか）を判定する。再生可能なレベルでない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。再生可能なレベルである場合（Ｙｅｓ）、ステップＳ１１３の処理が行われる。
（ステップＳ１１１４）受信機２は、自装置が再生可能なストリームであるか否かを判定する。具体的には、受信機２は、「ｓｔｒｅａｍ＿ｃｏｎｔｅｎｔ」、「ｃｏｍｐｏｎｅｎｔ＿ｔｙｐｅ」、及び「ｓｔｒｅａｍ＿ｔｙｐｅ」を用いて、自装置が再生可能なストリームであるか否かを判定する。再生可能なストリームでない場合（Ｎｏ）、ステップＳ１０４の処理が行われる。再生可能なストリームである場合（Ｙｅｓ）、ステップＳ１１２の処理が行われる。
なお、ステップＳ１１１３の処理において、再生可能なレベルである場合（Ｙｅｓ）、ステップＳ１１１４又はＳ１１２の処理が行われてもよい。 (Step S1113) The receiver 2 determines whether the level indicated by "nga_level" is a level that the receiver 2 itself can play (support), that is, whether it is at or below the playable level. If it is not a playable level (No), the process proceeds to step S104. If it is a playable level (Yes), the process proceeds to step S113.
(Step S1114) The receiver 2 determines whether or not the stream is playable by the receiver 2 itself. Specifically, the receiver 2 determines whether or not the stream is playable by the receiver 2 itself, using "stream_content", "component_type", and "stream_type". If the stream is not playable (No), the process of step S104 is performed. If the stream is playable (Yes), the process of step S112 is performed.
Note that if the level is determined to be playable (Yes) in the process of step S1113, the process of step S1114 or S112 may be performed instead.
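The ordering of FIG. 16 differs from FIG. 15 in that the profile is checked before the level. A small sketch that records which steps are visited makes the branch structure explicit. The function name, its parameters, and the returned trace are hypothetical devices for illustration, and `mpeg4_playable` again stands in for the step S1114 check.

```python
def fig16_check(nga_profile, nga_level, max_level, profiles, mpeg4_playable):
    """Return (playable, visited_steps) following the FIG. 16 order."""
    trace = ["S1111"]
    if nga_level == 0:                 # no MPEG-H audio signal present
        trace.append("S1114")
        return mpeg4_playable, trace
    trace.append("S1112")
    if nga_profile not in profiles:    # unsupported profile: back to S104
        return False, trace
    trace.append("S1113")
    return nga_level <= max_level, trace
```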

［ユースケース］
以下、本実施形態における放送（アセット）パターン、受信機、受信機における音声選択について、ユースケースの一例を示す。
放送パターン１（例えば番組Ａ）は、次の音声（アセット）１１～１４が放送されるパターンである。
放送パターン１：
（１Ａ）音声１１：ＭＰＥＧ－Ｈレベル４（２２．２ｃｈ（９／１０／３）、セリフ日本語／英語、解説音声日本語）
（１Ｂ）音声１２：ＭＰＥＧ－４（７．１ｃｈ、日本語）
（１Ｃ）音声１３：ＭＰＥＧ－４（ステレオ日本語）
（１Ｄ）音声１４：ＭＰＥＧ－４（ステレオ英語）
音声１１は、ＭＰＥＧ－Ｈオーディオの音声であり、レベル４、２２．２ｃｈの音声（ＢＧＭ）であり、セリフとして日本語と英語、解説音声として日本語の音声がある。音声１３は、音声１２とサイマルで送出される音声である。 [Use Case]
Below, an example of a use case will be shown regarding a broadcast (asset) pattern, a receiver, and audio selection in the receiver in this embodiment.
Broadcast pattern 1 (for example, program A) is a pattern in which the following audio (asset) 11 to 14 are broadcast.
Broadcast pattern 1:
(1A) Audio 11: MPEG-H Level 4 (22.2ch (9/10/3), dialogue Japanese/English, commentary Japanese)
(1B) Audio 12: MPEG-4 (7.1ch, Japanese)
(1C) Audio 13: MPEG-4 (stereo Japanese)
(1D) Audio 14: MPEG-4 (stereo English)
Audio 11 is MPEG-H audio at level 4: 22.2ch audio (BGM) with dialogue in Japanese and English and commentary audio in Japanese. Audio 13 is simulcast with audio 12.

放送パターン２（例えば、番組Ａ以外の番組Ｂ）は、次の音声１１～１４が放送されるパターンである。
放送パターン２：
（２Ａ）音声１１：ＭＰＥＧ－Ｈレベル３（１１．１ｃｈ（４／７／０）、セリフ日本語／英語、解説音声日本語）
（２Ｂ）音声１２：ＭＰＥＧ－４（７．１ｃｈ、日本語）
（２Ｃ）音声１３：ＭＰＥＧ－４（ステレオ日本語）
（２Ｄ）音声１４：ＭＰＥＧ－４（ステレオ英語）
放送パターン１と放送パターン２を比較すると、ＭＰＥＧ－Ｈ（音声１１）のレベルが「レベル４」と「レベル３」であることと、チャンネル音声の数が「２２．２ｃｈ」と「１１．１ｃｈ」であることが異なる。なお、音声１３は、音声１２とサイマルで送出される音声である。 Broadcast pattern 2 (for example, program B other than program A) is a pattern in which the following sounds 11 to 14 are broadcast.
Broadcast pattern 2:
(2A) Audio 11: MPEG-H Level 3 (11.1ch (4/7/0), dialogue Japanese/English, commentary Japanese)
(2B) Audio 12: MPEG-4 (7.1ch, Japanese)
(2C) Audio 13: MPEG-4 (stereo Japanese)
(2D) Audio 14: MPEG-4 (stereo English)
Comparing broadcast pattern 1 with broadcast pattern 2, the MPEG-H audio (audio 11) differs in level ("Level 4" versus "Level 3") and in channel count ("22.2ch" versus "11.1ch"). Audio 13 is simulcast with audio 12.

受信機２には、下記の能力を持つ受信機が存在する。
受信機Ａは、ＭＰＥＧ－Ｈオーディオのレベル４に対応（再生）できる受信機である。
受信機Ｂは、ＭＰＥＧ－Ｈオーディオのレベル３に対応（再生）できる受信機である。
受信機Ｃは、ＭＰＥＧ－Ｈオーディオに対応していない（再生できない）受信機である。 Receiver 2 includes receivers with the following capabilities:
Receiver A is a receiver that can support (play back) level 4 of MPEG-H audio.
Receiver B is a receiver that can support (play back) level 3 of MPEG-H audio.
Receiver C is a receiver that does not support (cannot play) MPEG-H audio.

各受信機Ａ～Ｃにおける音声選択については、次の通りとなる。
受信機Ａは、ＭＰＥＧ－Ｈのレベル４以下及びＭＰＥＧ－４に対応できる。よって、受信機Ａは、放送パターン１、２の両方で、ＭＰＥＧ－Ｈの音声１１を選択する。
受信機Ｂは、ＭＰＥＧ－Ｈのレベル３以下及びＭＰＥＧ－４に対応できるが、ＭＰＥＧ－Ｈのレベル４には対応できない。よって、受信機Ｂは、放送パターン１では、ＭＰＥＧ－４の音声１２、１３、又は１４のいずれかを選択するが、放送パターン２では、ＭＰＥＧ－Ｈの音声１１を選択する。
受信機Ｃは、放送パターン１、２の両方でＭＰＥＧ－４の音声１２、１３、１４のいずれかを選択する。
なお、受信機２は、ＭＰＥＧ－Ｈの音声を選択する（できる）場合でも、ユーザー操作や設定により、ＭＰＥＧ－４の音声を選択してもよい。 The audio selection for each of the receivers A to C is as follows.
Receiver A is compatible with MPEG-H level 4 and below, as well as MPEG-4. Therefore, receiver A selects MPEG-H audio 11 for both broadcast patterns 1 and 2.
Receiver B is compatible with MPEG-H level 3 and below and MPEG-4, but not with MPEG-H level 4. Therefore, receiver B selects either MPEG-4 audio 12, 13, or 14 in broadcast pattern 1, but selects MPEG-H audio 11 in broadcast pattern 2.
Receiver C selects one of MPEG-4 audio 12, 13, and 14 for both broadcast patterns 1 and 2.
Even if the receiver 2 can select MPEG-H audio, it may select MPEG-4 audio through user operation or settings.
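The use case above reduces to comparing the MPEG-H level of audio 11 with the highest level each receiver supports. The following sketch encodes only the facts stated in the use case; the table and function names are hypothetical, and a user override in favour of MPEG-4 audio is not modelled.

```python
PATTERN_LEVEL = {1: 4, 2: 3}           # broadcast pattern -> MPEG-H level of audio 11
RECEIVER_MAX = {"A": 4, "B": 3, "C": 0}  # highest MPEG-H level supported; 0 = none


def default_candidates(receiver: str, pattern: int):
    """Return the audio numbers the receiver chooses from by default."""
    if PATTERN_LEVEL[pattern] <= RECEIVER_MAX[receiver]:
        return [11]          # MPEG-H audio 11 is playable
    return [12, 13, 14]      # fall back to the MPEG-4 audio assets
```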

より具体的には、次の通りである。
＜受信機Ｂ：ＭＰＥＧ－Ｈレベル３に対応＞
放送パターン１の場合、受信機Ｂは、ＭＰＥＧ－４の音声１２、１３、又は１４のいずれかを選択できる。ここで、受信機Ｂは、サイマル音声（音声１２と音声１３）からの選択が重複しないようにするため、音声１２と音声１４の組、又は、音声１３と音声１４の組から、音声を選択する。
放送パターン２の場合、受信機Ｂは、ＭＰＥＧ－Ｈの音声１１を選択できる。 More specifically, it is as follows.
<Receiver B: MPEG-H Level 3 compatible>
In the case of broadcast pattern 1, receiver B can select any one of MPEG-4 audio 12, 13, or 14. Here, receiver B selects audio from the set of audio 12 and audio 14 or the set of audio 13 and audio 14 to prevent overlapping of selections from simulcast audio (audio 12 and audio 13).
In the case of broadcast pattern 2, receiver B can select MPEG-H audio 11.
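The rule that only one member of a simulcast group appears in the selection set can be sketched as follows. The input format (pairs of audio number and a simulcast group label, with `None` for assets outside any group) and the function name are assumptions for illustration; how the receiver chooses between the resulting sets is not specified here.

```python
from itertools import product


def selection_sets(assets):
    """assets: list of (audio_number, simulcast_group or None).

    Returns every selection set that contains exactly one member of each
    simulcast group plus all assets that belong to no group.
    """
    groups = {}
    singles = []
    for number, group in assets:
        if group is None:
            singles.append(number)
        else:
            groups.setdefault(group, []).append(number)
    return [sorted(list(choice) + singles) for choice in product(*groups.values())]


pattern1_mpeg4 = [(12, "g1"), (13, "g1"), (14, None)]  # audio 12 and 13 are simulcast
sets_for_b = selection_sets(pattern1_mpeg4)
```

For broadcast pattern 1 this yields the two sets named in the text: audio 12 with audio 14, and audio 13 with audio 14.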

放送パターン２の場合、受信機Ｂは、例えば７．１スピーカーを備えているときには、音声１１を選択し、選択した音声１１のレンダリング処理（ダウンコンバート）により、７．１ｃｈの音声を生成する。この場合、受信機Ｂは、次の音声（Ｂ１１）～（Ｂ１３）を再生できる。
（Ｂ１１）７．１ｃｈセリフ日本語
（Ｂ１２）７．１ｃｈセリフ英語
（Ｂ１３）７．１ｃｈセリフ日本語＋解説音声日本語 In the case of broadcast pattern 2, if receiver B is equipped with, for example, 7.1 speakers, it selects audio 11 and generates 7.1ch audio by rendering (down-converting) the selected audio 11. In this case, receiver B can play the following audio (B11) to (B13).
(B11) 7.1ch dialogue Japanese
(B12) 7.1ch dialogue English
(B13) 7.1ch dialogue Japanese + commentary audio Japanese

放送パターン２の場合、受信機Ｂは、例えばステレオヘッドホンを利用するときには、音声１１を選択し、選択した音声１１のレンダリング処理（ダウンコンバート）により、ステレオの音声を生成する。この場合、受信機Ｂは、次の音声（Ｂ２１）～（Ｂ２３）を再生できる。
（Ｂ２１）ステレオセリフ日本語
（Ｂ２２）ステレオセリフ英語
（Ｂ２３）ステレオセリフ日本語＋解説音声日本語 In the case of broadcasting pattern 2, when using stereo headphones, for example, receiver B selects audio 11 and generates stereo audio by rendering (down-converting) the selected audio 11. In this case, receiver B can play the following audio (B21) to (B23).
(B21) Stereo dialogue Japanese
(B22) Stereo dialogue English
(B23) Stereo dialogue Japanese + commentary audio Japanese

＜受信機Ａ：ＭＰＥＧ－Ｈレベル４に対応＞
受信機Ａが、２２．２ｃｈのスピーカーを備えている場合について説明する。
放送パターン１の場合、受信機Ａは、音声１１を選択し、選択した音声１１のレンダリング処理により、次の音声（Ａ１１）～（Ａ１３）を再生できる。
（Ａ１１）２２．２ｃｈセリフ日本語
（Ａ１２）２２．２ｃｈセリフ英語
（Ａ１３）２２．２ｃｈセリフ日本語＋解説音声日本語 <Receiver A: MPEG-H Level 4 compatible>
A case where the receiver A is equipped with a 22.2ch speaker will be described.
In the case of broadcast pattern 1, receiver A selects audio 11, and by performing a rendering process on the selected audio 11, can reproduce the following audio (A11) to (A13).
(A11) 22.2ch dialogue Japanese
(A12) 22.2ch dialogue English
(A13) 22.2ch dialogue Japanese + commentary audio Japanese

放送パターン２の場合、受信機Ａは、音声１１を選択し、選択した音声１１のレンダリング処理（アップコンバート）により、次の音声（Ａ２１）～（Ａ２３）を再生できる。
（Ａ２１）２２．２ｃｈセリフ日本語
（Ａ２２）２２．２ｃｈセリフ英語
（Ａ２３）２２．２ｃｈセリフ日本語＋解説音声日本語 In the case of broadcast pattern 2, receiver A selects audio 11, and by rendering (upconverting) the selected audio 11, can reproduce the following audio (A21) to (A23).
(A21) 22.2ch dialogue Japanese
(A22) 22.2ch dialogue English
(A23) 22.2ch dialogue Japanese + commentary audio Japanese

以上のように、本実施形態では、放送システムＳｙｓは、ＭＰＥＧ－Ｈオーディオを音声アセット（コンポーネントの一例）に含む放送を行う。受信機２の分離器２２は、放送の放送波から、ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ（ＭＰＥＧ－Ｈオーディオが存在するか否かを示す識別情報の一例）を、多重化レイヤーで取得する。ＣＰＵ２５５（制御部の一例）は、ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌに基づいて、受信機２の能力に応じた音声アセットを選択する。音声デコーダー２３２は、選択された音声アセットの音声データを復号化する。
これにより、放送システムＳｙｓでは、受信機２は、自装置の能力に応じて、適切な音声再生を行うことができる。 As described above, in this embodiment, the broadcasting system Sys broadcasts audio including MPEG-H audio as an audio asset (an example of a component). The separator 22 of the receiver 2 acquires nga_profile_level (an example of identification information indicating whether MPEG-H audio is present) from the broadcast airwaves at the multiplexing layer. The CPU 255 (an example of a control unit) selects an audio asset according to the capabilities of the receiver 2 based on the nga_profile_level. The audio decoder 232 decodes the audio data of the selected audio asset.
This allows the receiver 2 in the broadcasting system Sys to perform appropriate audio reproduction according to its own capabilities.

また、本実施形態では、ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌは、ＭＭＴ（ＭＰＥＧＭｅｄｉａＴｒａｎｓｐｏｒｔ）の記述子であって、番組要素のうち音声信号に関するパラメータを記述する記述子であるＭＨ－音声コンポーネント記述子に配置される。分離器２２は、ＭＨ－音声コンポーネント記述子を取得する。ＣＰＵ２５５は、ＭＨ－音声コンポーネント記述子に配置されたｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌに基づいて、受信機２の能力に応じた音声コンポーネントを選択する。 In addition, in this embodiment, nga_profile_level is placed in the MH-audio component descriptor, which is an MMT (MPEG Media Transport) descriptor that describes parameters related to audio signals among program elements. The separator 22 acquires the MH-audio component descriptor. The CPU 255 selects an audio component according to the capabilities of the receiver 2 based on the nga_profile_level placed in the MH-audio component descriptor.

これにより、受信機２は、ＭＨ－音声コンポーネント記述子からｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌを読み出すことで、自装置の能力に応じて、適切な音声再生を行うことができる。また、受信機２は、音声デコーダー２３２へ音声データを入力する前（復号化を行う前）に、ＭＰＥＧ－Ｈオーディオが存在するか否かを判定できる。これにより、ＭＰＥＧ－Ｈに受信機２は、ＭＰＥＧ－Ｈオーディオを復号化できない場合や、復号化しなくてもよい場合（受信者がＭＰＥＧ－４オーディオを選択している場合等）に、ＭＰＥＧ－Ｈオーディオのデータの復号化を回避できる。 By reading the nga_profile_level from the MH-audio component descriptor, the receiver 2 can perform appropriate audio playback according to its own device's capabilities. Furthermore, the receiver 2 can determine whether MPEG-H audio is present before inputting the audio data to the audio decoder 232 (before decoding). This allows the receiver 2 to avoid decoding MPEG-H audio data when it cannot decode MPEG-H audio or when decoding is not necessary (for example, when the receiver has selected MPEG-4 audio).

また、本実施形態では、放送は、ＭＰＥＧ－Ｈオーディオ音声アセットと、ＭＰＥＧ－４オーディオ音声アセットと、を含む。音声デコーダー２３２は、ＭＰＥＧ－Ｈ音声アセットの音声データを復号化するミキサー部２３３１（第１復号化部の一例）と、ＭＰＥＧ－４音声アセットの音声データを復号化するダウンミキサー部２３３２（第２復号化部の一例）を備える。分離器２２は、ＭＰＥＧ－ＨオーディオとＭＰＥＧ－４オーディオを分離する。
ＣＰＵ２５５は、ＭＨ－音声コンポーネント記述子に配置されたｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌに基づいて、ＭＰＥＧ－Ｈオーディオを再生可能であると判定した場合に、ＭＰＥＧ－ＨオーディオとＭＰＥＧ－４オーディオのいずれかの音声アセットを選択する。音声デコーダー２３２は、選択されたＭＰＥＧ－Ｈオーディオ又はＭＰＥＧ－４オーディオのいずれかの音声アセットの音声データを復号化する。 In this embodiment, the broadcast includes MPEG-H audio assets and MPEG-4 audio assets. The audio decoder 232 includes a mixer unit 2331 (an example of a first decoding unit) that decodes audio data from the MPEG-H audio assets, and a downmixer unit 2332 (an example of a second decoding unit) that decodes audio data from the MPEG-4 audio assets. The separator 22 separates the MPEG-H audio from the MPEG-4 audio.
If the CPU 255 determines that MPEG-H audio can be played back based on the nga_profile_level placed in the MH-audio component descriptor, it selects either the MPEG-H audio or MPEG-4 audio audio asset. The audio decoder 232 decodes the audio data of the selected audio asset, either the MPEG-H audio or the MPEG-4 audio.

これにより、受信機２は、ＭＰＥＧ－Ｈオーディオ又はＭＰＥＧ－４オーディオを選択でき、受信機２は、自装置の能力に応じて、ＭＰＥＧ－Ｈオーディオ又はＭＰＥＧ－４の音声再生を行うことができる。ＭＰＥＧ－Ｈオーディオを再生可能である受信機２において、受信者は、ＭＰＥＧ－Ｈオーディオ又はＭＰＥＧ－４の音声を選択できる。 This allows the receiver 2 to select either MPEG-H audio or MPEG-4 audio, and the receiver 2 can play either MPEG-H audio or MPEG-4 audio depending on its own device's capabilities. On a receiver 2 that can play MPEG-H audio, the recipient can select either MPEG-H audio or MPEG-4 audio.

また、本実施形態では、放送は、ＭＰＥＧ－Ｈオーディオ音声アセットと、ＭＰＥＧ－４音声アセットと、を含む。音声デコーダー２３２は、ＭＰＥＧ－４音声アセットの音声データを復号化する。分離器２２は、ＭＰＥＧ－ＨオーディオとＭＰＥＧ－４オーディオを分離する。ＣＰＵ２５５は、ＭＨ－音声コンポーネント記述子に配置されたｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌに基づいて、ＭＰＥＧ－Ｈオーディオを再生可能ではないと判定した場合に、ＭＰＥＧ－４音声アセットを選択する。音声デコーダー２３２は、選択されたＭＰＥＧ－４オーディオアセットの音声データを復号化する。
これにより、受信機２は、自装置（外部アンプ、外部スピーカーを含む）がＭＰＥＧ－Ｈオーディオに対応していない場合に、ＭＰＥＧ－４オーディオを選択でき、受信機２は、自装置の能力に応じて、ＭＰＥＧ－Ｈオーディオ又はＭＰＥＧ－４の音声再生を行うことができる。 In this embodiment, the broadcast also includes MPEG-H audio assets and MPEG-4 audio assets. The audio decoder 232 decodes the audio data of the MPEG-4 audio assets. The separator 22 separates the MPEG-H audio from the MPEG-4 audio. The CPU 255 selects the MPEG-4 audio asset when it determines that the MPEG-H audio cannot be played based on the nga_profile_level placed in the MH-audio component descriptor. The audio decoder 232 decodes the audio data of the selected MPEG-4 audio asset.
This allows the receiver 2 to select MPEG-4 audio if its own device (including external amplifiers and external speakers) does not support MPEG-H audio, and the receiver 2 can play MPEG-H audio or MPEG-4 audio depending on the capabilities of its own device.

また、本実施形態では、音声アセットの各々は、ｃｏｍｐｏｎｅｎｔ＿ｔａｇで識別される。ＭＰＥＧ－４オーディオは、１又は複数の音声アセットから構成される（多重化されて送出される）。分離器２２は、ｃｏｍｐｏｎｅｎｔ＿ｔａｇの順序に従って、ＭＰＥＧ－Ｈ音声アセットと、ＭＰＥＧ－４音声アセットと、のＭＨ－音声コンポーネント記述子（構成情報の一例）を読み込み、ＭＨ－音声コンポーネント記述子からｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌを取得する。
これにより、受信機２は、音声アセットごとに、音声アセットの情報とともに、ＭＰＥＧ－Ｈオーディオであるか否かを判定できる。 In this embodiment, each audio asset is identified by a component_tag. MPEG-4 audio is composed of one or more audio assets (they are multiplexed and sent). The separator 22 reads the MH-audio component descriptors (an example of configuration information) of the MPEG-H audio asset and the MPEG-4 audio asset in the order of the component_tag, and obtains the nga_profile_level from the MH-audio component descriptor.
This allows the receiver 2 to determine for each audio asset whether it is MPEG-H audio along with the audio asset information.

また、本実施形態では、ＭＨ－音声コンポーネント記述子には、ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ（ｎｇａ＿ｌｅｖｅｌが０であるか否か）と、ｎｇａ＿ｌｅｖｅｌ（ＭＰＥＧ－Ｈオーディオの処理能力を表すレベルの一例）或いはｎｇａ＿ｐｒｏｆｉｌｅ（機能の集合を表すプロファイルの一例）と、が配置される。ＣＰＵ２５５は、ｎｇａ＿ｌｅｖｅｌ又はｎｇａ＿ｐｒｏｆｉｌｅに基づいて、受信機２の能力に応じた音声コンポーネントを選択する。
これにより、受信機２は、ＭＰＥＧ－Ｈオーディオが存在するか否かを判定できるとともに、ＭＰＥＧ－Ｈオーディオのレベル又はプロファイルを取得できる。受信機２は、レベルやプロファイルと自装置の能力に応じて、適切な音声再生を行うことができる。 In this embodiment, the MH-audio component descriptor contains nga_profile_level (whether nga_level is 0 or not), and nga_level (an example of a level indicating the processing capability of MPEG-H audio) or nga_profile (an example of a profile indicating a set of functions). The CPU 255 selects an audio component according to the capabilities of the receiver 2 based on nga_level or nga_profile.
This allows the receiver 2 to determine whether MPEG-H audio is present and to obtain the level or profile of the MPEG-H audio, allowing the receiver 2 to perform appropriate audio playback according to the level, profile, and the capabilities of its own device.

［変形例：ＭＰＥＧ－Ｈの再生可能判定］
受信機２は、ＭＰＥＧ－Ｈが再生可能であるか否かを、次のように判定してもよい。
＜ケース１＞
受信機２は、ＭＰＥＧ－Ｈオーディオであるか否かを、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」を用いて判定してもよい。この場合、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」において、ＭＰＥＧ－Ｈ音声アセットには特定値が割り当てられる。例えば、０ｘ００１０～０ｘ００１ＦにはＭＰＥＧ－４音声アセットのうち言語が日本語であるものを割り当て、０ｘ００２０～０ｘ００２５にはＭＰＥＧ－４音声アセットのうち言語が英語であるものを割り当てる。０ｘ００２６～０ｘ００２Ｆに、ＭＰＥＧ－Ｈ音声アセットを割り当てる。受信機２は、「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」が０ｘ００２６～０ｘ００２Ｆのいずれかである場合、ＭＰＥＧ－Ｈ音声アセットであると判定する。「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」が０ｘ００２６～０ｘ００２Ｆのいずれでもない場合、受信機２は、ＭＰＥＧ－Ｈ音声アセットでないと判定する。この場合、受信機２は、ＭＰＥＧ－Ｈのプロファイルとレベルも識別可能に「ｃｏｍｐｏｎｅｎｔ＿ｔａｇ」を割り当ててもよい。受信機２は、ＭＰＥＧ－Ｈのプロファイルとレベルを、「ｎｇａ＿ｐｒｏｆｉｌｅ＿ｌｅｖｅｌ」に割り当ててもよい。 [Modification: MPEG-H Playability Determination]
The receiver 2 may determine whether or not MPEG-H can be played back as follows.
<Case 1>
The receiver 2 may determine whether an asset is MPEG-H audio using "component_tag." In this case, specific values are assigned to MPEG-H audio assets in "component_tag." For example, MPEG-4 audio assets in Japanese are assigned to 0x0010 to 0x001F, and MPEG-4 audio assets in English are assigned to 0x0020 to 0x0025. MPEG-H audio assets are assigned to 0x0026 to 0x002F. The receiver 2 determines that an asset is an MPEG-H audio asset if "component_tag" is any of 0x0026 to 0x002F. If "component_tag" is not any of 0x0026 to 0x002F, the receiver 2 determines that the asset is not an MPEG-H audio asset. In this case, the receiver 2 may assign the "component_tag" so that the MPEG-H profile and level can also be identified. The receiver 2 may assign the MPEG-H profile and level to the "nga_profile_level".

In Case 1 of this modification, the receiver 2 can determine whether or not an asset is an MPEG-H audio asset in the process of step S104 (without performing the process of step S111). Furthermore, the receiver 2 can make this determination without a new field being provided.
Compared with Case 1 of this modification, the above embodiment stores in "nga_profile_level" whether the audio is MPEG-H audio, together with the MPEG-H profile and level. The values, restrictions (count, number of bytes, etc.), and rules of the other fields (component_tag in this modification) can therefore continue to be operated as they are under the existing operation.

<Case 2>
The receiver 2 may determine whether an asset is MPEG-H audio using "component_type." For example, a "component_type" value may be defined as b8: MPEG-H. If "component_type" is b8, the receiver 2 determines that the asset is an MPEG-H audio asset. If "component_type" is not b8, the receiver 2 determines that the asset is not an MPEG-H audio asset. In this case, the receiver 2 may assign "component_type" so that the MPEG-H profile and level can also be identified. The receiver 2 may assign the MPEG-H profile and level to "nga_profile_level."
In Case 2 of this modification, the receiver 2 can determine whether or not an asset is an MPEG-H audio asset without a new field being provided.
Compared with Case 2 of this modification, the above embodiment stores in "nga_profile_level" whether the audio is MPEG-H audio, together with the MPEG-H profile and level. The values, restrictions (count, number of bytes, etc.), and rules of the other fields (component_type in this modification) can therefore continue to be operated as they are under the existing operation.

<Case 3>
The receiver 2 may use "stream_content" to determine whether the audio is MPEG-H audio. As described above, the "stream_content" value is set to a specific value (0x03) for an MPEG-4 AAC audio stream, and to another value (0x04) for an MPEG-4 ALS audio stream. For an MPEG-H audio stream, yet another value (e.g., 0x05) is set.
If "stream_content" is 0x05, the receiver 2 determines that the asset is an MPEG-H audio asset. If "stream_content" is not 0x05, the receiver 2 determines that the asset is not an MPEG-H audio asset. In this case, the receiver 2 may assign "stream_content" so that the MPEG-H profile and level can also be identified. The receiver 2 may assign the MPEG-H profile and level to "nga_profile_level."
In Case 3 of this modification as well, the receiver 2 can determine whether or not an asset is an MPEG-H audio asset without a new field being provided.
Compared with Case 3 of this modification, the above embodiment stores in "nga_profile_level" whether the audio is MPEG-H audio, together with the MPEG-H profile and level. The values, restrictions (count, number of bytes, etc.), and rules of the other fields (stream_content in this modification) can therefore continue to be operated as they are under the existing operation.
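The "stream_content" check of Case 3 amounts to a lookup on a single value. The sketch below assumes the values stated above, where 0x05 for MPEG-H is only the example value; the names `codec_name` and `is_mpeg_h_stream` are hypothetical.

```python
# stream_content values from Case 3; 0x05 for MPEG-H is only the example value.
STREAM_CONTENT_MPEG4_AAC = 0x03
STREAM_CONTENT_MPEG4_ALS = 0x04
STREAM_CONTENT_MPEG_H = 0x05

_CODEC_NAMES = {
    STREAM_CONTENT_MPEG4_AAC: "MPEG-4 AAC",
    STREAM_CONTENT_MPEG4_ALS: "MPEG-4 ALS",
    STREAM_CONTENT_MPEG_H: "MPEG-H",
}


def codec_name(stream_content: int) -> str:
    """Return the codec name for a stream_content value, or 'unknown'."""
    return _CODEC_NAMES.get(stream_content, "unknown")


def is_mpeg_h_stream(stream_content: int) -> bool:
    """Return True only when stream_content carries the MPEG-H value."""
    return stream_content == STREAM_CONTENT_MPEG_H
```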

Similarly, information indicating whether the audio is MPEG-H audio, or information from which the profile and level can be identified, may be set in other fields referenced in steps S111 and S112, such as "stream_type" or "asset_type."

<Case 4>
In this modification, a new descriptor, the "MH-MPEG-H audio descriptor," is introduced. The receiver 2 uses the "MH-MPEG-H audio descriptor" to determine whether the audio is MPEG-H audio and to identify the profile and level.

FIG. 17 is a schematic diagram showing an example of the structure of the MH-MPEG-H audio descriptor according to this modification. Note that the "MH-MPEG-H audio descriptor" (MH-MPEG-H Audio Descriptor()) is one of the descriptors of MMT-SI, and is described in MMT-SI in addition to the descriptors mentioned above, such as the "MH-audio component descriptor."

In the "MH-MPEG-H audio descriptor," the meaning of each field is as follows.
"descriptor_tag" describes a fixed value indicating that the descriptor is an MH-MPEG-H audio descriptor.
"descriptor_length" describes the descriptor length of the MH-MPEG-H audio descriptor.

"nga_type" indicates the type of next-generation audio. For example, '1' is set for MPEG-H audio. Note that for audio other than MPEG-H audio (in the case of MPEG-4), '0' may be set, or another next-generation audio type may be set.
"profile_level" indicates a profile and a level. Of the three bits in "profile_level," the first bit indicates a profile, and the remaining two bits indicate a level. For example, if the first bit has a value of "1," it indicates that the profile is a "Basic Profile," and if the value is "0," it indicates that the profile is a "Low Complexity Profile." For the remaining two bits, if the value is "00," the level is "Level 1," if the value is "01," the level is "Level 2," if the value is "10," the level is "Level 3," and if the value is "11," the level is "Level 4."
The profile and the level may be separate fields.
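Decoding the 3-bit "profile_level" field described above can be sketched as follows. The sketch assumes that "the first bit" is the most significant of the three bits; the names `ProfileLevel` and `decode_profile_level` are hypothetical.

```python
from typing import NamedTuple


class ProfileLevel(NamedTuple):
    profile: str
    level: int


def decode_profile_level(value: int) -> ProfileLevel:
    """Split a 3-bit profile_level value into profile (1 bit) and level (2 bits).

    Assumption: the leading profile bit is the most significant bit.
    """
    if not 0 <= value <= 0b111:
        raise ValueError("profile_level must fit in 3 bits")
    profile_bit = (value >> 2) & 0b1
    level_bits = value & 0b11
    profile = "Basic Profile" if profile_bit == 1 else "Low Complexity Profile"
    # Level bits 00, 01, 10, 11 map to Level 1, 2, 3, 4.
    return ProfileLevel(profile, level_bits + 1)
```

For instance, under these assumptions the bit pattern 110 decodes to the Basic Profile at Level 3.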

"component_tag" has the same value as the component_tag in the "MH-audio component descriptor." Note that the "component_tag" in the "MH-MPEG-H audio descriptor" also has the same value as the component_tag in the MH-stream identification descriptor.

"preset()" indicates whether presets exist and how many there are. A preset is, for example, a set of audio values (values for adjusting the audio) defined in advance for each of multiple sound materials. Different preset values result in a different audio balance among the sound materials. Presets are prepared in advance, for example, for the audio playback environment (speaker type and placement), for improving accessibility, and for matching viewer preferences. Presets for improving accessibility include presets that raise (or lower) only the dialogue volume, insert audio commentary, or play audio on a second device (for example, a speaker close at hand).
By selecting a preset, the viewer can choose and enjoy the preferred audio (sound balance, etc.) without making detailed settings for every sound material.

"interactive()" indicates whether interactivity is available and what operations are offered.
The interactivity flag indicates whether the viewer can adjust each sound material. The operation details describe the operations for adjusting each sound material. They include, for example, information about the material (the object whose audio is adjusted), the adjustment details (types such as level or effects, the type of user interface, and conditions such as upper and lower limits), and the adjustment tool (tool name, provider, download location, etc.). The viewer can adjust the MPEG-H audio according to the broadcast program.

<Transmission Operation Rules and Reception Processing Standards>
FIG. 18 is a schematic diagram showing an example of the transmission operation rules according to this modification.
FIG. 19 is a schematic diagram showing an example of the reception processing standards according to this modification.
FIG. 18 shows an example of the transmission operation rules for the MH-MPEG-H audio descriptor, and FIG. 19 shows an example of the reception processing standards for the MH-MPEG-H audio descriptor.

"descriptor_tag" is set to the fixed value "0x8040". This value is greater than the fixed value "0x8014" of the "descriptor_tag" in the "MH-audio component descriptor". In this case, the receiver 2 treats the "MH-audio component descriptor" as having a higher priority than the "MH-MPEG-H audio descriptor" and reads it first. In other words, the "MH-MPEG-H audio descriptor" serves as auxiliary data for the "MH-audio component descriptor". Note that the value of "descriptor_tag" may instead be smaller than "0x8014".
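One way to picture this reading order is to sort already-parsed descriptors by their descriptor_tag value. This is a sketch under assumptions: the text states only that the "MH-audio component descriptor" is read before the "MH-MPEG-H audio descriptor"; generalizing that to an ascending sort, the dict representation, and the name `in_reading_order` are all hypothetical.

```python
MH_AUDIO_COMPONENT_DESCRIPTOR_TAG = 0x8014
MH_MPEG_H_AUDIO_DESCRIPTOR_TAG = 0x8040


def in_reading_order(descriptors):
    """Return descriptors sorted so that smaller descriptor_tag values come first.

    Each descriptor is assumed to be a dict with a 'descriptor_tag' key.
    """
    return sorted(descriptors, key=lambda d: d["descriptor_tag"])
```

With the tag values above, the audio component descriptor (0x8014) is processed first and the MPEG-H descriptor (0x8040) then supplements it.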

If the receiver 2 determines that an "MH-MPEG-H audio descriptor" exists in the MPT (see FIG. 7), that is, that there is a descriptor whose descriptor_tag value is "0x8040", it determines that an MPEG-H audio asset exists.
In this case, the receiver 2 identifies the audio asset of the "component_tag" described in the "MH-MPEG-H audio descriptor" as the MPEG-H audio asset.
In addition, if the "MH-MPEG-H Audio Descriptor" also describes audio assets other than MPEG-H audio assets, the receiver 2 may identify an asset with a "component_tag" corresponding to the "nga_type" value (e.g., a value of "0": MPEG-H) as an MPEG-H audio asset.
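The identification of MPEG-H audio assets from the MPT can be sketched as a scan over parsed descriptors. The dict layout, the function name `find_mpeg_h_component_tags`, and the default `mpeg_h_nga_type=1` (taken from the example in the "nga_type" field description) are assumptions for illustration; the value that marks MPEG-H is left as a parameter because the text gives it only by way of example.

```python
MH_MPEG_H_AUDIO_DESCRIPTOR_TAG = 0x8040


def find_mpeg_h_component_tags(mpt_descriptors, mpeg_h_nga_type=1):
    """Collect component_tag values that identify MPEG-H audio assets.

    mpt_descriptors: iterable of dicts with 'descriptor_tag' and
    'component_tag' keys, and optionally 'nga_type'. A descriptor without
    'nga_type' is treated as describing MPEG-H audio.
    """
    tags = set()
    for desc in mpt_descriptors:
        if desc["descriptor_tag"] != MH_MPEG_H_AUDIO_DESCRIPTOR_TAG:
            continue
        if desc.get("nga_type", mpeg_h_nga_type) == mpeg_h_nga_type:
            tags.add(desc["component_tag"])
    return tags
```

An empty result corresponds to the case where no MPEG-H audio asset is present in the MPT.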

In the process of step S111 in FIG. 15, if the variable i matches the "component_tag" value of the audio asset identified as an MPEG-H audio asset, the receiver 2 determines that an MPEG-H audio signal is present. If they do not match, the receiver 2 determines that no MPEG-H audio signal is present.
When the receiver 2 determines that an MPEG-H audio signal is present, it determines whether it can play back this MPEG-H audio stream. If it cannot (No), the process of step S104 is performed. If it can, the receiver 2 determines whether the level indicated by "profile_level" is a level it can play back. If the level is not playable (No), the process of step S104 is performed. If the level is playable, the receiver 2 determines whether the profile indicated by "profile_level" is a profile it can play back. If the profile is not playable (No), the process of step S104 is performed. If the profile is playable (Yes), the process of step S112 is performed.
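The decision flow for a detected MPEG-H audio signal (decoder support, then level, then profile) can be sketched as follows. The class `ReceiverCapability`, its fields, and the function `next_step_for_mpeg_h` are hypothetical; the returned strings simply name the step that the flow proceeds to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiverCapability:
    """Hypothetical description of what a receiver can play back."""
    can_decode_mpeg_h: bool
    playable_levels: frozenset = frozenset()
    playable_profiles: frozenset = frozenset()


def next_step_for_mpeg_h(level: int, profile: str, cap: ReceiverCapability) -> str:
    """Return the next step once an MPEG-H audio signal has been detected."""
    if not cap.can_decode_mpeg_h:
        return "S104"  # the receiver cannot play MPEG-H at all
    if level not in cap.playable_levels:
        return "S104"  # the signalled level is not playable
    if profile not in cap.playable_profiles:
        return "S104"  # the signalled profile is not playable
    return "S112"      # all three checks passed
```

Every failed check leads to step S104, and step S112 is reached only when the stream, its level, and its profile are all playable.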

In the process of step S113 in FIG. 15, the receiver 2 adds the asset information acquired in step S112 to a list in memory. For MPEG-H audio assets, the receiver 2 also records the "preset()" and "interactive()" information in the list, thereby storing it in memory.
In the process of step S122 or S123, the receiver 2 may automatically select a preset based on the information in "preset()," or may display the contents of the presets and let the viewer select one. In the process of step S122 or S123, the receiver 2 may display an MPEG-H audio digital mixer based on the information in "interactive()." In these cases, in step S123, the receiver 2 plays back the audio stream based on the selected preset or on the result of the adjustment made with the digital mixer.

As described above, in this modification, the broadcast includes an MH-audio component descriptor, which is an MMT (MPEG Media Transport) descriptor that describes parameters related to audio signals among program elements, and an MPEG-H audio descriptor, which is a descriptor that describes parameters related to MPEG-H audio. The nga_type is placed in the MPEG-H audio descriptor. The separator 22 acquires the MH-audio component descriptor and the MPEG-H audio descriptor. The CPU 255 selects an audio component according to the capabilities of the receiver 2, based on the nga_type placed in the MPEG-H audio descriptor.
This allows the receiver 2 to read out nga_type from the MPEG-H audio descriptor and perform appropriate audio playback according to the capabilities of its own device.

In this modification, preset() and interactive() are also placed in the MPEG-H audio descriptor. The separator 22 acquires the MH-audio component descriptor and the MPEG-H audio descriptor. The CPU 255 plays back audio based on the preset() or interactive() placed in the MPEG-H audio descriptor.
This allows the receiver 2 to play back audio based on preset(). The viewer can choose and enjoy the preferred audio (sound balance, etc.) without making detailed settings for every sound material. The receiver 2 can also adjust the audio based on interactive() and play it back. The viewer can adjust the MPEG-H-based audio according to the broadcast program.

In the above embodiment (including modifications), asset priority may be as follows. For video and audio assets, if multiple assets of the same asset_type are defined in one MPT, or if multiple component descriptors (MH-audio component descriptors) are placed in the MH-EIT, the receiver 2 prioritizes the assets in ascending order of component_tag value. In other words, the receiver 2 determines that the default asset has the highest priority and that the priority decreases as the value increases. This priority can be used, for example, as the display order when a list of streams is shown on an EPG or when a stream switch button is pressed. However, for audio that is simulcast using assets with the same simulcast_group_tag value, if the playback environment supports only up to 2ch and AAC 2ch audio is simulcast alongside AAC 5.1ch, the receiver 2 plays back the AAC 2ch audio with priority.
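The priority rule and its 2ch exception can be sketched as below. The `AudioAsset` record, its field encodings (such as the strings "2ch" and "5.1ch"), and the function names are hypothetical; the sketch only illustrates ordering by component_tag and preferring AAC 2ch within a simulcast group on a receiver limited to 2ch output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AudioAsset:
    component_tag: int
    simulcast_group_tag: int
    codec: str     # e.g. "AAC" (illustrative encoding)
    channels: str  # e.g. "2ch" or "5.1ch" (illustrative encoding)


def order_by_priority(assets: List[AudioAsset]) -> List[AudioAsset]:
    """Smaller component_tag means higher priority."""
    return sorted(assets, key=lambda a: a.component_tag)


def pick_for_playback(assets: List[AudioAsset], group: int,
                      stereo_only: bool) -> Optional[AudioAsset]:
    """Pick the asset to play from one simulcast group."""
    candidates = [a for a in order_by_priority(assets)
                  if a.simulcast_group_tag == group]
    if not candidates:
        return None
    has_aac_51 = any(a.codec == "AAC" and a.channels == "5.1ch"
                     for a in candidates)
    if stereo_only and has_aac_51:
        # Exception: on a 2ch environment, prefer the simulcast AAC 2ch audio.
        for a in candidates:
            if a.codec == "AAC" and a.channels == "2ch":
                return a
    return candidates[0]
```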

FIG. 20 is a schematic diagram showing an example of simulcast audio operation according to this embodiment (including modifications). This figure shows an example of how simulcast_group_tag values are operated for audio transmitted by simulcast.
In this figure, an asset whose audio mode is "MPEG-H" is an MPEG-H audio asset in multiple languages (Japanese and English). This audio asset is assigned a component_tag value of "0x0026." This value is larger than those of the other audio assets (MPEG-4 audio assets: 0x0010 to 0x0013 and 0x0020 to 0x0023). In other words, in this example, the priority of the MPEG-H audio asset is set lower than that of the MPEG-4 audio assets.

This figure shows an example of operating two simulcast audios. The first simulcast audio has a simulcast_group_tag value of "0x00", and the second simulcast audio has a simulcast_group_tag value of "0x01".
The first simulcast audio consists of four MPEG-4 audio assets (0x0010, 0x0011, 0x0020, 0x0021) and one MPEG-H audio asset (0x0026). In other words, it is simulcast audio that includes both MPEG-4 audio assets and an MPEG-H audio asset.
The second simulcast audio consists of four MPEG-4 audio assets (0x0012, 0x0013, 0x0022, 0x0023). In other words, it is simulcast audio made up of MPEG-4 audio assets only.
This figure shows that the audio consists of the first simulcast audio and the second simulcast audio.
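The grouping in FIG. 20 can be reproduced by bucketing assets on their simulcast_group_tag. The tuples below restate the component_tag values given in the text; the tuple layout and the name `group_by_simulcast` are hypothetical.

```python
from collections import defaultdict

# (component_tag, simulcast_group_tag, codec), restating the FIG. 20 example.
FIG20_ASSETS = [
    (0x0010, 0x00, "MPEG-4"), (0x0011, 0x00, "MPEG-4"),
    (0x0020, 0x00, "MPEG-4"), (0x0021, 0x00, "MPEG-4"),
    (0x0026, 0x00, "MPEG-H"),
    (0x0012, 0x01, "MPEG-4"), (0x0013, 0x01, "MPEG-4"),
    (0x0022, 0x01, "MPEG-4"), (0x0023, 0x01, "MPEG-4"),
]


def group_by_simulcast(assets):
    """Group assets by simulcast_group_tag, ordered by component_tag."""
    groups = defaultdict(list)
    for component_tag, group_tag, codec in assets:
        groups[group_tag].append((component_tag, codec))
    for members in groups.values():
        members.sort()
    return dict(groups)
```

In the resulting grouping, the MPEG-H asset (0x0026) appears last in group 0x00, which matches its lower priority relative to the MPEG-4 assets in that group.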

FIG. 21 is a schematic diagram showing another example of simulcast audio operation according to this embodiment (including modifications). This figure shows an example of how simulcast_group_tag values are operated for audio transmitted by simulcast.
In this figure, an asset whose audio mode is "MPEG-H" is an MPEG-H audio asset in multiple languages (Japanese and English). This audio asset is assigned a component_tag value of "0x0010." This value is smaller than those of the other audio assets (MPEG-4 audio assets: 0x0010, 0x0012, 0x0020, 0x0021). In other words, in this example, the priority of the MPEG-H audio asset is set higher than that of the MPEG-4 audio assets.
The simulcast audio in this figure has a simulcast_group_tag value of "0x00" and consists of four MPEG-4 audio assets (0x0010, 0x0011, 0x0020, 0x0021) and one MPEG-H audio asset (0x0026). This figure shows an example of operation with a single simulcast audio.

Note that multiple simulcast audios may exist in FIG. 21, as in FIG. 20. For example, the four MPEG-4 audio assets of FIG. 20 (0x0012, 0x0013, 0x0022, 0x0023, with a simulcast_group_tag value of "0x01") may be added to the audio assets of FIG. 21.
Conversely, FIG. 20 may use a single simulcast audio as in FIG. 21 (with the MPEG-H audio asset at a lower priority), and the four audio assets with a simulcast_group_tag value of "0x01" may be omitted.

In the above embodiment (including modifications), the "MH-audio component descriptor" may include preset() or interactive().
In the above embodiment, "voice" may be replaced with "audio." "Asset" may be replaced with "component," and "component" may be replaced with "asset."

In addition, at least some of the broadcast station 1 (broadcast device 1), receiver 2, broadcast station server 3, and provider server 4 in the above-described embodiment, for example, the receiver 2's separator (Demux, TLV/MMT separator) 22, 22a, selector (audio asset selector) 231, audio decoder (decoder) 232, mixer 2331, downmixer 2332, switch 2333, DAC 2334, mixer 233, video decoder 241, presentation processor 242, input/output device 251, CPU 255, and communication chip 256, may be implemented by a computer. In this case, a program for implementing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. Note that the "computer system" referred to here is a computer system built into the broadcast station 1, receiver 2, broadcast station server 3, or provider server 4, and includes an OS and hardware such as peripheral devices. "Computer-readable recording media" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems. "Computer-readable recording media" may also include media that hold a program dynamically for a short period of time, such as communication lines used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a fixed period of time, such as volatile memory within the computer systems that serve as servers or clients in such cases.
Furthermore, the above-mentioned programs may be programs that realize some of the aforementioned functions, or may be programs that can realize the aforementioned functions in combination with programs already stored in the computer system.
Furthermore, some or all of the broadcast station 1, receiver 2, broadcast station server 3, and provider server 4 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the broadcast station 1, receiver 2, broadcast station server 3, and provider server 4 may be individually implemented as a processor, or some or all of them may be integrated into a processor. Furthermore, the integrated circuit implementation method is not limited to LSI, and may be implemented using a dedicated circuit or a general-purpose processor. Furthermore, if an integrated circuit implementation technology that can replace LSI emerges due to advances in semiconductor technology, an integrated circuit based on that technology may be used.

One embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to that described above, and various design modifications can be made without departing from the spirit of the present invention.

Broadcasting System Sys
Relay Station Sa
Broadcasting station, broadcasting equipment 1
MPEG-H Encoder 11
MPEG-4 Encoder 111, 112, 113
Mux (multiplexer) 12
Receiver 2
Broadcasting station server 3
Provider server 4
Tuner 211
Demodulator 212
Separator (Demux, TLV/MMT separation section) 22, 22a
Selector (audio asset selection unit) 231
Audio decoder (decoder section) 232
MPEG-H Audio Core Decoder 232-1
MPEG-4 Decoder 232-2
Mixer section 2331
Downmixer section 2332
MPEG-H Audio Renderer 233-1
Mixer 233-2
Switch section 2333
DAC section 2334
Mixer 233
Speaker 234
Video decoder 241
Presentation Processor 242
Display 243
Input/output device (external output I/F) 251
Auxiliary storage device 252
ROM 253
RAM 254
CPU 255
Communication chip 256

Claims

a receiving unit for receiving a broadcast including MPEG-H audio in the audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast, identification information indicating whether or not MPEG-H audio is present at a multiplexing layer;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition unit acquires the audio component descriptor,
the selection unit selects an audio component according to a receiver's capability based on the identification information included in the audio component descriptor;
the broadcast wave includes a descriptor that describes information related to object-based audio;
The acquisition unit acquires a descriptor that describes information about the object-based audio.
Receiver.

The descriptor describing information related to the object-based audio includes a level indicating processing capability for MPEG-H audio or a profile indicating a set of functions;
The receiver according to claim 1, wherein the selection unit selects an audio component according to a capability of the receiver based on the level or the profile.

a receiving unit for receiving a broadcast including MPEG-H audio in the audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast, identification information indicating whether or not MPEG-H audio is present at a multiplexing layer;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition unit acquires the audio component descriptor,
the selection unit selects an audio component according to a receiver's capability based on the identification information included in the audio component descriptor;
the broadcast wave includes an audio component of MPEG-H audio and an audio component of MPEG-4 audio;
the selection unit selects an audio component of either MPEG-H audio or MPEG-4 audio when determining that MPEG-H audio can be played back based on the identification information included in the audio component descriptor;
The decoding unit decodes the audio data of the selected audio component, either MPEG-H audio or MPEG-4 audio.
Receiver.

a receiving unit for receiving a broadcast including MPEG-H audio in the audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast, identification information indicating whether or not MPEG-H audio is present at a multiplexing layer;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition unit acquires the audio component descriptor,
the selection unit selects an audio component according to a receiver's capability based on the identification information included in the audio component descriptor;
the broadcast wave includes an audio component of MPEG-H audio and an audio component of MPEG-4 audio;
the selection unit selects an audio component of MPEG-4 audio when determining that MPEG-H audio cannot be played back based on the identification information included in the audio component descriptor;
The decoding unit decodes the audio data of the selected audio component of the MPEG-4 audio.
Receiver.

each of the audio components is identified by a component identifier;
the MPEG-4 audio is composed of one or more audio components;
The receiver according to claim 3 or claim 4, wherein the acquisition unit reads configuration information of the audio components of MPEG-H audio and the audio components of MPEG-4 audio in accordance with the order of the component identifiers, and acquires the identification information from the configuration information.

a receiving unit that receives a broadcast including MPEG-H audio as an audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast and at a multiplexing layer, identification information indicating whether or not MPEG-H audio is present;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition unit acquires the audio component descriptor,
the selection unit selects an audio component according to a receiver's capability based on the identification information included in the audio component descriptor;
the audio component descriptor includes a level that indicates the processing capability of MPEG-H audio or a profile that indicates a set of functions;
the selection unit selects an audio component according to the capabilities of the receiver based on the level or the profile.
Receiver.
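
A level- and profile-based capability check might look like the sketch below. The class names, the profile labels used in the usage note, and the rule that a receiver can play any level up to its maximum are assumptions, not taken from the claim. The claim recites selection based on the level or the profile; checking both, and falling back to MPEG-4 audio as in the earlier claim, is a choice made only for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MpegHSignaling:
    profile: str  # hypothetical profile label carried in the descriptor
    level: int    # hypothetical numeric level carried in the descriptor


@dataclass(frozen=True)
class ReceiverCapability:
    supported_profiles: FrozenSet[str]
    max_level: int


def can_play_mpeg_h(sig: MpegHSignaling, cap: ReceiverCapability) -> bool:
    # Assumption: a receiver handles every level up to and including max_level.
    return sig.profile in cap.supported_profiles and sig.level <= cap.max_level


def choose_codec(sig: MpegHSignaling, cap: ReceiverCapability) -> str:
    # Fall back to MPEG-4 audio when the signaled MPEG-H stream exceeds the receiver.
    return "MPEG-H" if can_play_mpeg_h(sig, cap) else "MPEG-4"
```

For example, a receiver declared as `ReceiverCapability(frozenset({"LC"}), 3)` would choose MPEG-H for a stream signaled as profile "LC", level 3, and MPEG-4 for the same profile at level 4.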

a receiving unit that receives a broadcast including MPEG-H audio as an audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast and at a multiplexing layer, identification information indicating whether or not MPEG-H audio is present;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the broadcast wave includes an audio component descriptor and a descriptor that describes information related to object-based audio,
the identification information is included in the descriptor that describes information related to the object-based audio;
the acquisition unit acquires the audio component descriptor and the descriptor that describes information related to the object-based audio;
the selection unit selects an audio component according to the capabilities of the receiver based on the identification information included in the descriptor that describes information related to the object-based audio.
Receiver.
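
When the identification information is carried in a descriptor separate from the audio component descriptor, the receiver has to locate that descriptor before it can select a component. The sketch below assumes a simple one-byte tag, one-byte length, payload layout, hypothetical tag values, and a presence flag in the top bit of the first payload byte. None of these details come from the claim or from any broadcast standard.

```python
from typing import Iterator, Tuple

# Hypothetical tag values, chosen only for illustration.
AUDIO_COMPONENT_TAG = 0xA0
OBJECT_AUDIO_TAG = 0xA1


def split_descriptors(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (tag, payload) pairs from an assumed tag/length/payload loop."""
    pos = 0
    while pos + 2 <= len(buf):
        tag, length = buf[pos], buf[pos + 1]
        payload = buf[pos + 2:pos + 2 + length]
        if len(payload) < length:
            break  # truncated descriptor: stop rather than guess
        yield tag, payload
        pos += 2 + length


def mpeg_h_present(buf: bytes) -> bool:
    """Read the assumed presence flag from the object-based audio descriptor."""
    for tag, payload in split_descriptors(buf):
        if tag == OBJECT_AUDIO_TAG and payload:
            return bool(payload[0] & 0x80)
    return False
```

A buffer that contains only the audio component descriptor yields `False` here, so a receiver following this sketch would treat the broadcast as having no signaled MPEG-H audio.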

A receiving method in a receiver, comprising:
a receiving step of receiving a broadcast including MPEG-H audio as an audio component;
an acquisition step of acquiring, from the broadcast wave of the broadcast and at a multiplexing layer, identification information indicating whether or not MPEG-H audio is present;
a selection step of selecting an audio component according to the receiver's capabilities based on said identification information;
a decoding step of decoding audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition step acquires the audio component descriptor,
the selection step selects an audio component according to the capabilities of the receiver based on the identification information included in the audio component descriptor;
the broadcast wave includes a descriptor that describes information related to object-based audio;
the acquisition step acquires the descriptor that describes information related to the object-based audio.
Receiving method.

a receiving unit that receives a broadcast including MPEG-H audio as an audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast and at a multiplexing layer, identification information indicating whether or not MPEG-H audio is present;
the broadcast wave includes a descriptor that describes information related to object-based audio;
the acquisition unit acquires an audio component descriptor and the descriptor that describes information related to the object-based audio, and a computer of the receiver is caused to execute:
selecting an audio component according to the receiver's capabilities based on the identification information;
decoding the audio data of the selected audio component;
the identification information is included in the audio component descriptor;
and selecting an audio component according to the capabilities of the receiver based on the identification information included in the audio component descriptor.
Program.

A broadcasting system that performs a broadcast including MPEG-H audio as an audio component, wherein
a receiver comprises:
a receiving unit that receives a broadcast including MPEG-H audio as an audio component;
an acquisition unit that acquires, from the broadcast wave of the broadcast and at a multiplexing layer, identification information indicating whether or not MPEG-H audio is present;
a selection unit that selects an audio component according to the capabilities of the receiver based on the identification information;
a decoding unit that decodes the audio data of the selected audio component;
the identification information is included in an audio component descriptor;
the acquisition unit acquires the audio component descriptor,
the selection unit selects an audio component according to a receiver's capability based on the identification information included in the audio component descriptor;
the broadcast wave includes a descriptor that describes information related to object-based audio;
the acquisition unit acquires the descriptor that describes information related to the object-based audio.
Broadcasting system.