JP6204313B2

JP6204313B2 - Electronics

Info

Publication number: JP6204313B2
Application number: JP2014174641A
Authority: JP
Inventors: 竜太田邨
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2014-08-28
Filing date: 2014-08-28
Publication date: 2017-09-27
Anticipated expiration: 2034-08-28
Also published as: JP2016051228A

Description

The present invention relates to an electronic device.

A certain information processing apparatus executes specific processes on dedicated hardware such as an ASIC (Application Specific Integrated Circuit) and also executes specific processes by software processing on a SIMD (Single Instruction Multiple Data stream) processor (see, for example, Patent Document 1).

Patent Document 1: JP 2011-191903 A

FIG. 4 is a block diagram illustrating an example of a data flow in an electronic device. If a series of processes to be executed sequentially is called a process block, then, as shown in FIG. 4, a plurality of process blocks are executed in the electronic device by, for example, four ASICs and three processes run on one SIMD processor. In process block #0, for example, processing by ASIC #0 is first performed on band data in the image data (a set of line data of a certain width), then process #0 is executed by the SIMD processor, and after that processing by another ASIC, ASIC #1, is executed. The plurality of process blocks may be executed in parallel or sequentially.

Specifically, in an image forming apparatus such as a multifunction peripheral, for example, one process block first acquires image data by image reading using an ASIC and then applies a spatial filter to that image data in a SIMD process. Next, as another process block, image processing such as color conversion is performed on the image data, after which halftoning is performed by an ASIC.

FIG. 5 is a diagram explaining the data processing bandwidth required of the ASICs by process blocks (PB) #0 to #2 shown in FIG. 4. In general, interrupting a process block to execute another process block causes a context switch and its associated overhead, so process blocks #0 to #2 are executed one after another, as shown in FIG. 5. Consequently, when the data processing speed of the SIMD processes is higher than that of the ASICs, a specific ASIC, such as ASIC #2 in FIG. 5, may be required to provide a wide bandwidth (that is, high throughput), depending on the type of processing. Either the cost of the apparatus rises in order to provide that bandwidth, or the required bandwidth is not achieved and the processing speed has to be kept low.

The present invention has been made in view of the above problem. Its object is to obtain an electronic device that keeps the maximum required bandwidth low by leveling, along the time axis, the bandwidth required of each of a plurality of hardware processing units such as ASICs, while suppressing any decrease in the overall data processing speed.

An electronic device according to the present invention sequentially performs hardware processing and software processing on data, and includes: a hardware processing unit that performs the hardware processing; a software processing unit that performs the software processing; an internal memory for the software processing unit; a read direct memory access controller that reads, from an external memory, data for one subline obtained by dividing a line into a predetermined number of parts in the main scanning direction; a load transfer unit that transfers data for one subline to the internal memory from whichever of the hardware processing unit and the read direct memory access controller is the transmission source corresponding to a process identifier; a write direct memory access controller that writes data for one subline to the external memory; a store transfer unit that transfers data for one subline stored in the internal memory to whichever of the hardware processing unit and the write direct memory access controller is the destination corresponding to a process identifier; and a controller that designates the process identifier to the load transfer unit and the store transfer unit and causes them to transfer data for one subline. The internal memory has a plurality of buffer areas, the load transfer unit writes data to the buffer areas, and the software processing unit, operating in multiple threads, reads data for one subline from the plurality of buffer areas and processes it. The controller performs exclusive control of the plurality of buffer areas and locks a buffer area until the software processing unit has finished using the data in it. When none of the buffer areas designated by a certain process identifier is unlocked, the controller does not cause the load transfer unit to transfer the data designated by that process identifier to the internal memory; if any of the buffer areas designated by another process identifier is unlocked, the controller causes the load transfer unit to transfer the data designated by that other process identifier to the internal memory.

According to the present invention, the granularity of data and processing is made small while the overhead of context switching is suppressed. As a result, the bandwidth required of each of a plurality of hardware processing units such as ASICs is leveled along the time axis, and the maximum required bandwidth is kept low, while any decrease in the overall data processing speed is suppressed.

FIG. 1 is a block diagram showing the configuration of an electronic device according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the operation of the electronic device shown in FIG. 1.
FIG. 3 is a diagram showing an example of the format of a process identifier used in the electronic device shown in FIG. 1.
FIG. 4 is a block diagram illustrating an example of a data flow in an electronic device.
FIG. 5 is a diagram explaining the data processing bandwidth required of the ASICs by process blocks #0 to #2 shown in FIG. 4.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an electronic device according to an embodiment of the present invention. The electronic device shown in FIG. 1 performs hardware processing and software processing on data sequentially. For example, the electronic device is an image forming apparatus such as a multifunction peripheral, and performs image processing on image data by hardware processing and software processing.

The electronic device shown in FIG. 1 includes a plurality of ASICs 1 as hardware processing units and a SIMD processor 2 as a software processing unit. The SIMD processor 2 has a higher data processing speed than each ASIC 1.

Each of the ASICs 1 performs predetermined hardware processing (that is, specific data processing by dedicated hardware). The SIMD processor 2 operates in multiple threads and performs software processing (that is, data processing programmed in software) in the respective threads corresponding to the plurality of process blocks.

In this embodiment, the SIMD processor 2 executes image processing and includes at least as many processing elements as there are pixels in a subline. That is, all pixels in one subline are processed in parallel by the processing elements.

A read DMAC (Direct Memory Access Controller) 3 reads data for one subline from an external memory 102. One subline is the amount of data obtained by dividing one line of data into a predetermined number of parts in the main scanning direction.
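As an illustration of the subline granularity, the following Python sketch splits one line of pixels into a fixed number of sublines in the main scanning direction. The function name `split_into_sublines` and the restriction to equal-width sublines are assumptions made for this example; the description only states that a line is divided into a predetermined number of parts.

```python
def split_into_sublines(line, num_sublines):
    """Split one line (a sequence of pixels) into num_sublines contiguous parts.

    Assumption: the line length is a multiple of num_sublines, so every
    subline holds the same number of pixels.
    """
    if num_sublines <= 0:
        raise ValueError("num_sublines must be positive")
    if len(line) % num_sublines != 0:
        raise ValueError("line length must be a multiple of num_sublines")
    width = len(line) // num_sublines
    return [line[i * width:(i + 1) * width] for i in range(num_sublines)]


line = list(range(8))                    # one line of 8 pixels
sublines = split_into_sublines(line, 4)  # four sublines of 2 pixels each
```

Each element of `sublines` corresponds to the unit of data that is read, transferred, and processed at one time.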

The read DMAC 3 includes a built-in memory (for example, SRAM (Static RAM) or DRAM) and stores context information for each process identifier in that built-in memory. It reads the context information corresponding to the process identifier designated by a controller 31 from the built-in memory, switches context, and then reads the data corresponding to the designated process identifier based on the context information after the switch. This reduces the delay caused by reading context information compared with storing the context information in the external memory 102.
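The per-identifier context handling can be pictured with the following toy model in Python. It is a sketch under assumptions, not the actual hardware: the class `ReadDmacModel`, the `Context` fields, and the use of a `bytes` object to stand in for the external memory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Context:
    next_address: int   # address of the next subline to read
    subline_bytes: int  # size of one subline in bytes


class ReadDmacModel:
    """Toy model of a read DMAC that keeps one context per context ID."""

    def __init__(self, external_memory):
        self.external_memory = external_memory  # bytes object modelling external memory
        self.contexts = {}                      # built-in memory: context ID -> Context

    def register(self, context_id, start_address, subline_bytes):
        self.contexts[context_id] = Context(start_address, subline_bytes)

    def read_subline(self, context_id):
        ctx = self.contexts[context_id]        # context switch: a local lookup only
        start = ctx.next_address
        data = self.external_memory[start:start + ctx.subline_bytes]
        ctx.next_address += ctx.subline_bytes  # update the context after the read
        return data


memory = bytes(range(16))
dmac = ReadDmacModel(memory)
dmac.register(context_id=0, start_address=0, subline_bytes=4)
dmac.register(context_id=1, start_address=8, subline_bytes=4)
first = dmac.read_subline(0)   # first subline of context 0
second = dmac.read_subline(1)  # switch to context 1
third = dmac.read_subline(0)   # back to context 0, continuing where it left off
```

Because each context is held locally, alternating between contexts on every subline costs only a table lookup in this model.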

A write DMAC 4 writes data for one subline to the external memory 102. The write DMAC 4 also includes a built-in memory and stores context information for each process identifier in that built-in memory. It reads the context information corresponding to the process identifier designated by the controller 31 from the built-in memory, switches context, and then writes the data corresponding to the designated process identifier based on the context information after the switch.

In FIG. 1, a routing unit 11 performs static routing between the ASICs 1 and between the ASICs 1 and buffers 12. The buffers 12 are provided for the ASICs 1 and, operating as FIFOs (First-In First-Out), accept and temporarily store data transferred to the ASICs 1 and data transferred from the ASICs 1. The number of buffers 12 may equal the number of ASICs 1 or may be smaller.

Also in FIG. 1, an internal memory 21 is an internal memory for the SIMD processor 2 (that is, a memory shared by the processing elements in the SIMD processor 2) and is a RAM (Random Access Memory). A memory interface 22 executes reads from and writes to the internal memory 21.

The internal memory 21 has a plurality of buffer areas, and the SIMD processor 2, operating in multiple threads, reads data for one subline from the buffer areas and processes it. For example, the internal memory 21 has a ring buffer area for each process block (that is, for each thread of the SIMD processor 2), and in each thread the SIMD processor 2 reads data from the ring buffer area in order and processes it.
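A ring buffer area per process block could be modelled as in the following Python sketch. The class name `RingBuffer`, the capacity of two slots, and the three process blocks are assumptions chosen for illustration.

```python
class RingBuffer:
    """Fixed-capacity FIFO ring buffer holding one subline per slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # index of the oldest stored subline
        self.count = 0  # number of occupied slots

    def has_free_slot(self):
        return self.count < len(self.slots)

    def push(self, subline):
        if not self.has_free_slot():
            raise BufferError("ring buffer is full")
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = subline
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise BufferError("ring buffer is empty")
        subline = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return subline


# One ring buffer per process block (thread); two slots each is an assumption.
buffers = {block_id: RingBuffer(capacity=2) for block_id in range(3)}
buffers[0].push("subline A")
buffers[0].push("subline B")
full = not buffers[0].has_free_slot()  # no room for a further load transfer
oldest = buffers[0].pop()              # the thread consumes sublines in order
buffers[0].push("subline C")           # the freed slot is reused
order = [buffers[0].pop(), buffers[0].pop()]
```

In this model, a load transfer for a process block is only possible while `has_free_slot()` is true for that block's buffer.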

A load transfer unit 23 transfers data for one subline to the internal memory 21, using the memory interface 22, from whichever of the ASICs 1 and the read DMAC 3 is the transmission source corresponding to the process identifier. Specifically, the load transfer unit 23 writes the data to the buffer areas described above.

A store transfer unit 24 transfers data for one subline stored in the internal memory 21 to whichever of the ASICs 1 and the write DMAC 4 is the destination corresponding to the process identifier.

A routing unit 25 routes data for one subline from the read DMAC 3 to either an ASIC 1 (specifically, the corresponding buffer 12) or the load transfer unit 23, and routes data for one subline from the store transfer unit 24 to either an ASIC 1 (specifically, the corresponding buffer 12) or the write DMAC 4. Specifically, a channel to the destination is determined according to the process identifier, and the data is transferred to the destination over that channel. The data for one subline is burst-transferred to the destination.
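The selection of a channel from the process identifier can be sketched as a table lookup. The table contents, the identifier values, and the channel names in the following Python example are hypothetical; the description does not specify how the mapping is stored.

```python
# Hypothetical routing table: process identifier -> destination channel name.
ROUTES = {
    0x00: "asic0_buffer",
    0x01: "load_transfer_unit",
    0x10: "write_dmac",
}


def route_subline(process_id, subline, channels):
    """Deliver one whole subline to the channel selected by process_id."""
    destination = ROUTES[process_id]       # raises KeyError for an unknown identifier
    channels[destination].append(subline)  # the whole subline is delivered at once
    return destination


channels = {name: [] for name in ROUTES.values()}
dest = route_subline(0x10, [1, 2, 3, 4], channels)
```

Here the sender supplies only the identifier and the data; the destination follows from the table.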

The controller 31 uses an I/O (Input/Output) bus 32 to perform inter-processor communication with the SIMD processor 2 and to output commands to the read DMAC 3, the write DMAC 4, the load transfer unit 23, the store transfer unit 24, and so on.

The controller 31 designates a process identifier to the load transfer unit 23 and the store transfer unit 24 and causes them to transfer data for one subline.

The controller 31 also designates a process identifier and causes the read DMAC 3 to read data for one subline. Likewise, the controller 31 designates a process identifier and causes the write DMAC 4 to write data for one subline.

The controller 31 also performs exclusive control of the plurality of buffer areas in the internal memory 21 and locks a buffer area until the SIMD processor 2 has finished using the data in it. When none of the buffer areas designated by a certain process identifier is unlocked, the controller 31 does not cause the load transfer unit 23 to transfer the data designated by that process identifier to the internal memory 21. If any of the buffer areas designated by another process identifier is unlocked, the controller 31 causes the load transfer unit 23 to transfer the data designated by that other process identifier to the internal memory 21.
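The locking rule above can be sketched in Python as follows. This is a minimal model under assumptions: the names `BufferLocks` and `select_load`, the fixed candidate order, and one buffer area per process identifier are illustrative choices, and the actual arbitration scheme is not specified here.

```python
class BufferLocks:
    """Tracks which buffer areas of each process identifier are locked."""

    def __init__(self, areas_per_process):
        # process identifier -> list of lock flags, one per buffer area
        self.locked = {pid: [False] * n for pid, n in areas_per_process.items()}

    def acquire(self, pid):
        """Lock and return the index of a free area, or None if all are locked."""
        flags = self.locked[pid]
        for index, is_locked in enumerate(flags):
            if not is_locked:
                flags[index] = True
                return index
        return None

    def release(self, pid, index):
        """Unlock an area once the software processing unit is done with it."""
        self.locked[pid][index] = False


def select_load(locks, candidates):
    """Return (pid, area) for the first candidate that has an unlocked area."""
    for pid in candidates:
        area = locks.acquire(pid)
        if area is not None:
            return pid, area
    return None  # every candidate is blocked; no load transfer is issued


locks = BufferLocks({0: 1, 1: 1})
first = select_load(locks, [0, 1])   # process 0 has a free area
second = select_load(locks, [0, 1])  # process 0 is blocked, so process 1 is served
third = select_load(locks, [0, 1])   # every area is locked
locks.release(0, 0)                  # processing of the data for process 0 is done
fourth = select_load(locks, [0, 1])
```

The second call shows the key behaviour: a blocked process identifier is skipped rather than stalling the load transfer unit.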

A host interface 33 handles communication between a main CPU (Central Processing Unit) 101 and the controller 31. The main CPU 101, for example, generates band data from image data, further generates subband data by dividing the band data in the main scanning direction, and stores the subband data in the external memory 102. The main CPU 101 does not intervene in the controller 31 while the above-described process blocks are being executed. The external memory 102 is, for example, a DRAM serving as the main memory of the main CPU 101. The main CPU 101 and the external memory 102 belong to the system clock domain, and the SIMD processor 2 belongs to a clock domain different from the system clock domain.

Next, the operation of the electronic device is described.

FIG. 2 is a diagram explaining the operation of the electronic device shown in FIG. 1. FIG. 3 is a diagram showing an example of the format of a process identifier used in the electronic device shown in FIG. 1.

First, the main CPU 101 prepares the subband data in the external memory 102.

Then, as described below, data is read from the external memory 102 one subline at a time, and a plurality of process blocks are executed concurrently at a granularity of one subline.

The controller 31 exclusively controls the buffer areas of the internal memory 21. Using a predetermined arbitration scheme, it selects one of the process blocks (threads) that has a free buffer area, designates the process identifier corresponding to the selected process block (thread), and outputs a command to the read DMAC 3 and the load transfer unit 23.

As shown in FIG. 3, the process identifier has a process block ID unique to the process block and a local process ID indicating the process type. A local process ID value is assigned to each of a plurality of predetermined sets of processing sequences, and the local process ID contained in the process identifier determines the processes to be executed in the process block designated by the process block ID, and their order.
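One way to picture such an identifier is as two bit fields packed into a single integer. In the following Python sketch the field widths (4 bits each), the field order, and the contents of the `SEQUENCES` table are assumptions, since the actual format of FIG. 3 is not reproduced in this text.

```python
LOCAL_ID_BITS = 4  # assumed width of the local process ID field
BLOCK_ID_BITS = 4  # assumed width of the process block ID field


def pack_process_id(block_id, local_id):
    """Combine a process block ID and a local process ID into one identifier."""
    if not 0 <= block_id < (1 << BLOCK_ID_BITS):
        raise ValueError("block_id out of range")
    if not 0 <= local_id < (1 << LOCAL_ID_BITS):
        raise ValueError("local_id out of range")
    return (block_id << LOCAL_ID_BITS) | local_id


def unpack_process_id(process_id):
    """Split an identifier back into (block_id, local_id)."""
    block_id = process_id >> LOCAL_ID_BITS
    local_id = process_id & ((1 << LOCAL_ID_BITS) - 1)
    return block_id, local_id


# Hypothetical table: local process ID -> ordered series of processing steps.
SEQUENCES = {
    0: ["asic0", "simd_process0", "asic1"],
    1: ["read_dmac", "simd_process1", "write_dmac"],
}

pid = pack_process_id(block_id=2, local_id=1)
block_id, local_id = unpack_process_id(pid)
steps = SEQUENCES[local_id]  # what to run for this data, and in which order
```

With such a layout, the process block ID part can double as a context ID while the local process ID part selects the processing sequence.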

On receiving the command, the read DMAC 3 sets the context information (such as the read address of the data for one subline) corresponding to the process identifier designated by the command, reads data for one subline from the external memory 102, and updates the context information. The load transfer unit 23 transfers that data for one subline to the buffer area of the internal memory 21 corresponding to the process identifier. Here, the process block ID in the process identifier is used as a context ID, and context information is stored for each process block ID.

When the transfer is complete, the load transfer unit 23 notifies the controller 31 of the transfer completion. On receiving the notification, if the transferred data is to be processed by the SIMD processor 2, the controller 31 immediately sends a command to the SIMD processor 2 and causes it to process that data. The processed data is stored in the buffer area corresponding to the process identifier. When data processing for one subband is complete, the SIMD processor 2 notifies the controller 31 of the processing completion.

Using a predetermined arbitration scheme, the controller 31 selects data that the SIMD processor 2 has finished processing and that is to be transferred from the internal memory 21, designates the process identifier of the selected data, and outputs a command to the store transfer unit 24. On receiving the command, the store transfer unit 24 reads the data from the buffer area of the internal memory 21 corresponding to the designated process identifier, designates the process identifier, and sends the read data to the routing unit 25. The data is received, via the routing unit 25, by the destination corresponding to the process identifier (the buffer 12 of an ASIC 1 or the write DMAC 4).

When the data is to be stored in the external memory 102, the destination is the write DMAC 4. In that case, the controller 31 designates the process identifier and outputs a command to the write DMAC 4. On receiving the command, the write DMAC 4 sets the context information (such as the write address of the data for one subline) corresponding to the process identifier designated by the command, receives the data via the routing unit 25, and writes the received data to the external memory 102.

When data is input to its buffer 12, an ASIC 1 processes that data and writes the processed data to the buffer 12. When processing of the data is complete, the ASIC 1 notifies the controller 31 of the completion. On receiving the notification, if there is a free buffer area corresponding to the process identifier of that data, the controller 31 designates the process identifier and outputs a command to the load transfer unit 23. In accordance with the command, the load transfer unit 23 reads the data from the buffer 12 and transfers it to the buffer area of the internal memory 21.

When data processed by an ASIC 1 is to be stored in the external memory 102 without being processed by the SIMD processor 2, the data processed by the ASIC 1 is first transferred to the internal memory 21 by the load transfer unit 23, as described above. It is then transferred from the internal memory 21 to the write DMAC 4 by the store transfer unit 24, without processing by the SIMD processor 2, and written to the external memory 102. When the data transfer by the store transfer unit 24 ends, the controller 31 is notified of the end of the transfer and releases the lock on the buffer area in the internal memory 21 that was used for this data.

Similarly, when data stored in the external memory 102 is to be transferred to an ASIC 1 without being processed by the SIMD processor 2, the data is first transferred to the internal memory 21 by the read DMAC 3 and the load transfer unit 23, as described above. It is then transferred from the internal memory 21 to the ASIC 1 (via the routing unit 25 and the buffer 12) by the store transfer unit 24, without processing by the SIMD processor 2. When the data transfer by the store transfer unit 24 ends, the controller 31 is notified of the end of the transfer and releases the lock on the buffer area in the internal memory 21 that was used for this data.

In this way, for each process block that has a free buffer area in the internal memory 21, software processing by the SIMD processor 2 proceeds at a granularity of one subline, with hardware processing by the ASICs 1 executed before or after it. Because the series of processes to be executed on a piece of data is identified by the process identifier attached to that data, the controller 31 does not need to specify, step by step, the processing to be executed on each piece of data.

Accordingly, because the processing by the SIMD processor 2 in the plurality of process blocks is performed concurrently in multiple threads, that processing is distributed along the time axis, and the processing of the ASICs 1 executed before or after the SIMD processor 2 processes is likewise distributed along the time axis.
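The leveling effect can be illustrated with a toy model in Python. The model is an assumption made for illustration only: time is divided into slots, the SIMD processor emits one subline per slot, and the load on the hardware that follows a given process block is measured as the largest number of that block's sublines falling in any window of consecutive slots.

```python
def peak_demand(schedule, block_id, window):
    """Maximum number of sublines of block_id in any `window` consecutive slots."""
    peak = 0
    for start in range(len(schedule) - window + 1):
        count = schedule[start:start + window].count(block_id)
        peak = max(peak, count)
    return peak


blocks = [0, 1, 2]
sublines_per_block = 4

# Coarse-grained execution: each process block runs to completion in turn.
sequential = [b for b in blocks for _ in range(sublines_per_block)]
# Subline-granularity execution: the blocks take turns, one subline at a time.
interleaved = [b for _ in range(sublines_per_block) for b in blocks]

seq_peak = peak_demand(sequential, block_id=2, window=3)  # 3 sublines per window
int_peak = peak_demand(interleaved, block_id=2, window=3)  # 1 subline per window
```

Both schedules deliver the same total amount of data for block 2, but in the interleaved schedule the peak load within a window drops to one third in this model.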

When the series of processes has been completed for all of the subband data and the processed subband data has been stored in the external memory 102, the main CPU 101 combines the processed subband data into one piece of band data.
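The division of band data in the main scanning direction and the final recombination can be sketched as follows in Python. The representation of a band as a list of lines, the function names `split_band` and `join_subbands`, and the equal-width split are assumptions for illustration.

```python
def split_band(band, num_subbands):
    """Split every line of a band into num_subbands column ranges."""
    width = len(band[0])
    if width % num_subbands != 0:
        raise ValueError("band width must be a multiple of num_subbands")
    step = width // num_subbands
    return [[line[i * step:(i + 1) * step] for line in band]
            for i in range(num_subbands)]


def join_subbands(subbands):
    """Concatenate subbands line by line to rebuild the band."""
    num_lines = len(subbands[0])
    return [sum((sb[row] for sb in subbands), []) for row in range(num_lines)]


band = [[1, 2, 3, 4], [5, 6, 7, 8]]  # two lines of four pixels
subbands = split_band(band, 2)       # left half and right half
restored = join_subbands(subbands)   # identical to the original band
```

In the device, the per-subline processing would take place between the split and the join; it is omitted here so that the round trip can be checked directly.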

As described above, and as explained further below, the above embodiment sets the processing granularity to one subline and levels the bandwidth required of each ASIC 1 along the time axis, keeping the maximum required bandwidth low while suppressing the decrease in overall data processing speed caused by context switching.

In general, when hardware processing is cascaded with software processing that has a small number of instruction steps, and the data buffer between the software processing and the hardware processing is not large enough for the processing granularity (context switching granularity) of the software processing, the software processing is made to wait if the subsequent hardware processing is not fast enough relative to the software processing. In the above embodiment, the processing granularity is one subline, which is sufficiently small, so a sufficient data buffer is secured between the software processing and the hardware processing.

Also, in general, reducing the processing granularity increases the overhead caused by context switching. In the above embodiment, the read DMAC 3 and the write DMAC 4 are each provided with a local built-in memory that is used for context switching, and the SIMD processor 2 switches processing by multithreading, so the context switching overhead is reduced.

Furthermore, when software processing is multithreaded and one hardware processing unit is used by a plurality of threads (here, corresponding to a plurality of process blocks), complicated control is generally needed to share the hardware processing unit. In the above embodiment, the processing order of each piece of data is determined at a granularity of one subline based on the process identifier described above, so one ASIC 1 can be shared by a plurality of process blocks with relatively simple control.

The above embodiment is a preferred example of the present invention, but the present invention is not limited to it, and various modifications and changes are possible without departing from the gist of the present invention.

For example, in the arbitration described above, later-stage (backward) processes may be given priority in order to avoid deadlock.
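One possible reading of this variation is an arbiter that always serves the pending request that is furthest along its processing sequence, so that downstream buffers drain before new data is admitted upstream. The following Python sketch is written under that assumption; the request representation as `(process_id, stage_index)` pairs and the tie-breaking rule are hypothetical.

```python
def pick_request(requests):
    """Remove and return the pending request with the highest stage index.

    Each request is a (process_id, stage_index) pair; ties go to the request
    that was queued first.
    """
    if not requests:
        return None
    best = max(range(len(requests)), key=lambda i: (requests[i][1], -i))
    return requests.pop(best)


pending = [(0x01, 0), (0x12, 2), (0x21, 2), (0x30, 1)]
first = pick_request(pending)   # a stage-2 request, the earliest one queued
second = pick_request(pending)  # the remaining stage-2 request
```

Requests at stage 0, which would bring new data into the pipeline, are served only when no later-stage request is waiting.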

The present invention is applicable to, for example, an image forming apparatus.

1 ASIC (an example of a hardware processing unit)
2 SIMD processor (an example of a software processing unit)
3 Read DMAC
4 Write DMAC
21 Internal memory
23 Load transfer unit
24 Store transfer unit
31 Controller

Claims

1. An electronic device that sequentially performs hardware processing and software processing on data, comprising:
a hardware processing unit that performs the hardware processing;
a software processing unit that performs the software processing;
an internal memory for the software processing unit;
a read direct memory access controller that reads, from an external memory, data for one subline obtained by dividing a line into a predetermined number of parts in the main scanning direction;
a load transfer unit that transfers data for one subline to the internal memory from whichever of the hardware processing unit and the read direct memory access controller is the transmission source corresponding to a process identifier;
a write direct memory access controller that writes data for one subline to the external memory;
a store transfer unit that transfers data for one subline stored in the internal memory to whichever of the hardware processing unit and the write direct memory access controller is the destination corresponding to a process identifier; and
a controller that designates the process identifier to the load transfer unit and the store transfer unit and causes them to transfer data for one subline,
wherein the internal memory has a plurality of buffer areas,
the load transfer unit writes data to the buffer areas,
the software processing unit, operating in multiple threads, reads data for one subline from the plurality of buffer areas and executes processing, and
the controller performs exclusive control of the plurality of buffer areas, locks a buffer area until use of the data in that buffer area by the software processing unit is completed, does not cause the load transfer unit to transfer the data designated by a certain process identifier to the internal memory when none of the buffer areas designated by that process identifier is unlocked, and, if any of the buffer areas designated by another process identifier is unlocked, causes the load transfer unit to transfer the data designated by the other process identifier to the internal memory.

2. The electronic device according to claim 1, wherein, when the transmission source is the read direct memory access controller, the controller causes the data reading by the read direct memory access controller to be executed in conjunction with the data transfer by the load transfer unit, and, when the destination is the write direct memory access controller, the controller causes the data writing by the write direct memory access controller to be executed in conjunction with the data transfer by the store transfer unit.

3. The electronic device according to claim 1, wherein the read direct memory access controller includes a built-in memory, stores context information in the built-in memory for each process identifier, reads the context information corresponding to the process identifier designated by the controller from the built-in memory, switches context, and then reads the data corresponding to the designated process identifier based on the context information after the switch.

4. The electronic device according to any one of claims 1 to 3, wherein
the data is image data, and
the hardware processing and the software processing include image processing on the image data.

5. The electronic device according to any one of claims 1 to 4, wherein
the software processing unit is a SIMD processor,
the data for one subline is image data, and
the SIMD processor includes at least as many processing elements as there are pixels in a subline, and the processing elements process the data for one subline in parallel.