AU767372B2

AU767372B2 - Combined control and data pipeline path in computer graphics system

Info

Publication number: AU767372B2
Application number: AU57772/01A
Authority: AU
Inventors: Timothy Merrick Long; Benjamin John Widdup
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-08-03
Filing date: 2001-08-02
Publication date: 2003-11-06
Anticipated expiration: 2021-08-02
Also published as: AU5777201A

Description

S&F Ref: 562652

AUSTRALIA

PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT

ORIGINAL

Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan

Actual Inventor(s): Benjamin John Widdup, Timothy Merrick Long

Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)

Invention Title: Combined Control and Data Pipeline Path in Computer Graphics System

ASSOCIATED PROVISIONAL APPLICATION DETAILS

[33] Country: AU
[31] Applic. No(s): PQ9159
[32] Application Date: 03 Aug 2000

The following statement is a full description of this invention, including the best method of performing it known to me/us:-

COMBINED CONTROL AND DATA PIPELINE PATH IN COMPUTER GRAPHICS SYSTEM

Technical Field of the Invention

The present invention relates generally to bus architectures on semiconductor integrated circuits (ICs) and, in particular, to graphics processing ICs.

Background Art

Computer graphics processing ICs typically contain a main data pipeline for processing data sequentially in stages. Fig. 1 shows a prior art arrangement, wherein an instruction engine 100 sends data on a connection 102 to a polygon processor 104, which sends processed data on a connection 106 to a concentrator 108. The concentrator 108 sends processed data on a connection 110 to a pixel processor 112, which sends the data after processing on a connection 114 to a pixel cache 116, which then sends the processed data to a frame buffer 132 on a connection 118, and the frame buffer outputs the data on a connection 134.

The instruction engine 100 also sends configuration and/or command messages to the other stages 104, 116 and 132. These messages are sent on a bus 130, and by means of a series of connections depicted by arrows 122 to 128, to the various processing stages 104, 108, 112, 116 and 132 respectively. A disadvantage of this arrangement is that the data bus in the forward direction, which comprises the connections 102, 106, 110 and 114, must be "flushed" prior to the instruction engine 100 sending a block of configuration/command messages on the bus 130 to the various processing modules. This must be done in order to avoid a situation where, for example, a command message sent by the instruction engine 100 on the bus 130 reaches the pixel processor 112 via the connections 120, 130, 126 before the pixel processor 112 has fully completed processing data previously received via the data bus 110.

Fig. 2 shows an arrangement which overcomes the aforementioned requirement to flush the pipeline. The prior art arrangement as shown in Fig. 2 comprises a series of consecutive processing stages 200, 204, 208, 212, 216 and 220, interconnected by a forward bus depicted by arrow segments 202, 206, 210, 214 and 218. The instruction engine 200 generates all configuration and command messages in this arrangement. The single forward bus arrangement carries both graphic data and command messages in an interleaved fashion. This avoids the need to flush the pipeline, ie. the forward data bus, of graphic data prior to sending configuration commands. This feature derives from the fact that the interleaving structure of data and commands on the forward bus maintains a relative order between the data and the commands, and accordingly, commands do not reach a particular processing stage prior to data which must first be processed.

Fig. 3 depicts the graphics processor bus architecture of Fig. 2, to which an additional bus, typically called a "register bus", has been added using a brute force "star" arrangement. Substantial graphics processing systems implemented in hardware generally require this register bus, which is used for chip configuration (eg. for setting registers), and also for testing and debugging purposes. The forward data bus (depicted by the arrow segments 302, 306, 310, 314 and 318), which carries commands as well as data, and the register bus (depicted by dashed arrows 326-336), are separate entities, each of which connects to a substantial number of points on the IC. As chip design becomes more advanced and ICs get larger, the wire interconnections within the IC become a dominating factor in the design, as opposed to design of the various logic cells. Routing of large buses on ICs is hence becoming an increasingly difficult problem.

The star connected bus configuration shown in Fig. 3 allows the initiating module, ie. the register bus controller 324, to communicate directly with all the slave modules 300, 304, 308, 312, 316 and 320. As noted, however, this solution is very wire intensive.

Fig. 4 shows a less wire intensive approach, when compared to Fig. 3, to implementation of the register bus. This approach uses a daisy-chain arrangement to provide communication between a register bus controller 426 and the slave modules 400, 404, 408, 412, 416 and 420. The primary processing modules comprise the instruction engine 400, the polygon processor 404, the concentrator 408, the pixel processor 412, the pixel cache 416 and the frame buffer 420. These modules are directly connected in a pipeline fashion as depicted by forward bus arrow segments 402, 406, 410, 414 and 418, where the frame buffer 420 provides an output on a connection 422. The register bus controller 426 is connected to the instruction engine 400 as depicted by a dashed arrow 428. The instruction engine 400 is, in turn, connected in a daisy chain fashion to the polygon processor 404 as depicted by a dashed arrow segment 430. In a similar fashion, the polygon processor is connected to the concentrator 408 as depicted by a dashed arrow segment 432, the concentrator being connected to the pixel processor 412 as depicted by a dashed arrow segment 434. The pixel processor 412 is connected to the pixel cache 416 as depicted by a dashed arrow segment 436, and the pixel cache 416 is connected to the frame buffer 420 as depicted by a dashed arrow segment 438. The frame buffer 420 is connected to the register bus controller 426 as depicted by a dashed arrow segment 424.

Solid arrow segments are used to denote the data and command message data connections, while dashed arrow segments are used to denote register bus data connections. The register bus need not be connected in the same order as the pipeline.

In the daisy chain arrangement, the master, ie. the register bus controller 426, can send out, for example, a register command (eg. "read_register"), which flows around the loop of dashed arrow segments until the command arrives at the appropriately addressed module (called a "target" module). If a module receives a command outside its address range (ie. a command for which it is not the target module), the module simply passes the command, unchanged, on to the following module. When the target module receives the command, the target module processes the command, and sends an acknowledgment signal back to the master (ie. the register bus controller 426) along with return data, if appropriate (eg. a "read_register_acknowledge" command in this example), on the daisy chain. If a module, other than the master, receives an acknowledgment command, it simply passes the acknowledgment command on unchanged.
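By way of illustration only, the daisy-chain behaviour described above can be sketched in software. The module names, address ranges, register contents and the dictionary packet layout are assumptions made for this sketch, and the command names are spelled with underscores as an assumption; none of these details are taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A slave module on a daisy-chained register bus (illustrative only)."""
    name: str
    base: int                                  # first register address owned (assumed)
    size: int                                  # number of registers owned (assumed)
    regs: dict = field(default_factory=dict)   # register address -> value

    def owns(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

    def handle(self, packet: dict) -> dict:
        # Acknowledgements, and commands outside this module's address
        # range, are passed on unchanged.
        if packet["kind"] != "read_register" or not self.owns(packet["addr"]):
            return packet
        # Target module: process the command and send an acknowledgement
        # carrying the return data onwards on the chain.
        return {"kind": "read_register_acknowledge",
                "addr": packet["addr"],
                "data": self.regs.get(packet["addr"], 0)}


def send_around_loop(chain, packet):
    """The master sends a packet; it visits each module in chain order
    and whatever comes out of the last module returns to the master."""
    for module in chain:
        packet = module.handle(packet)
    return packet


chain = [
    Module("instruction_engine", 0x00, 0x10),
    Module("polygon_processor", 0x10, 0x10, {0x12: 0xABCD}),
    Module("pixel_processor", 0x20, 0x10),
]
reply = send_around_loop(chain, {"kind": "read_register", "addr": 0x12})
unclaimed = send_around_loop(chain, {"kind": "read_register", "addr": 0x99})
```

In this sketch a command whose address no module owns simply returns to the master unchanged, which is one way a master could detect an unmapped address.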

Summary of the Invention

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

According to a first aspect of the invention, there is provided a graphic processor bus architecture comprising: a series of pipelined processing stages wherein at least two stages of said series are configured to generate a packet comprising at least one of a command, a register instruction, and data; and a bus interconnecting said stages.

According to a second aspect of the invention, there is provided a graphic processor bus architecture comprising: a series of pipelined processing stages wherein at least two stages of said series are configured to generate a packet comprising at least one of a command, a register instruction, and data; a forward bus interconnecting said stages; and a reverse bus interconnecting a last stage in said series to a first stage in said series.

Brief Description of the Drawings

A number of preferred embodiments of the present invention will now be described with reference to the drawings, in which:

Fig. 1 shows a prior art graphics processing pipeline architecture;
Fig. 2 shows another prior art graphics processing pipeline architecture;
Fig. 3 shows conventional addition of a chip configuration bus to the architecture shown in Fig. 2;
Fig. 4 shows a prior art daisy chain architecture; and
Fig. 5 depicts an advantageous arrangement of a graphics processor bus architecture.

Detailed Description including Best Mode

Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.

In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only of". Variations of the word comprising, such as "comprise" and "comprises" have corresponding meanings.

Fig. 5 shows one arrangement of an improved graphics processor bus architecture. A series of consecutive processing stages 500, 504, 508, 512, 516 and 520 are interconnected by a forward bus depicted by arrow segments 502, 506, 510, 514 and 518. The pixel cache 516 is connected to a frame buffer 520 by a forward bus segment 518. The frame buffer stage 520 makes a decision about where to send data incoming on the forward bus segment 518, and delivers data either on an output bus segment 522, or alternatively, sends the data on a reverse bus 524 to a register bus controller 526, and thereafter by means of a reverse bus segment 528, back to the instruction engine 500. In addition to the closed loop nature of the architecture shown in the figure, it is also noted that the forward bus (ie. 502, 506, 510, 514 and 518) carries data, commands and register instructions in the forward direction. The reverse bus, depicted by the arrow segments 524 and 528, carries data, commands and register instructions in the reverse direction. Accordingly, the described arrangement shows an architecture having a single closed-loop bus structure, which is described in terms of a single forward bus and a single reverse bus for clarity of explanation.

In Fig. 5, the instruction engine 500 is capable of generating commands, and in addition, any of the series of consecutive processing stages 500, 504, 508, 512, 516, 520 and 526 are capable of generating commands. This command generation can occur either independently, or as a consequence of processing another command that the aforementioned processing stage has received.

Furthermore, the instruction engine 500 is capable of generating register instructions, and in addition any of the series of consecutive processing stages 500, 504, 508, 512, 516, 520 and 526 are capable of generating register instructions. This register instruction generation can occur either independently, or as a consequence of processing another register instruction that the aforementioned processing stage has received.

Furthermore, the instruction engine 500 is capable of generating data, and in addition any of the series of consecutive processing stages 500, 504, 508, 512, 516, 520 and 526 are capable of generating data. This data generation can occur either independently, or as a consequence of processing data or commands that the aforementioned processing stage has received.

Furthermore, generation of a command, a register instruction, and data, can occur independently in any one of the series of consecutive processing stages 500, 504, 508, 512, 516, 520 and 526, or in the most general case, can be generated as a consequence of processing any one of a command, a register instruction, and data that the aforementioned processing stage has received.

The term "packet" is used to denote either a configuration or command message (eg. "set background colour"), or a register bus instruction (eg. "set register"), or data.

A structure of the packet is described in more detail in regard to the inset 530.

Forward packets flow down the forward bus and are processed by respective target processing stages in order. At the end of the forward processing path, the frame buffer 520 makes a decision about where to send a received forward packet. The buffer 520 thus can terminate the forward packet and deliver associated data, if the packet is a data packet, on the output bus 522. The data is typically pixel data. Alternatively, the buffer 520 can pass the packet back around the loop by means of the reverse bus 524-528, the packet then being referred to as a reverse packet.

The closed loop architecture described in the figure can also be utilised in an "open loop" configuration, by disconnecting the reverse bus 524, while maintaining other functionality as described in relation to the closed loop architecture. In the open loop architecture, the "forward bus" depicted by the arrow segments 502, 506, 510, 514 and 518 can be referred to simply as a "bus", since there is no "reverse" bus from which the bus needs to be distinguished. Similarly, the term "packets" can be used in this case, since there is no need to distinguish between "forward" and "reverse" packets.

The frame buffer 520 can also provide a debug mode capability to assist in debugging the various elements of the graphics processor architecture. In the debug mode, the frame buffer 520 delivers a copy of the system bus, instead of the usual output data, on the output bus segment 522. This allows a designer to monitor what is happening internally in the chip, and facilitates debugging of any problems. Any packets which would normally be passed on by the frame buffer 520 on the reverse path 524 continue to be so passed in debug mode, a copy of the packet also appearing at the output 522.
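The routing decision made by the frame buffer 520, in both normal and debug modes, might be sketched as follows. The specification does not state the criterion used for the decision, so this sketch assumes that data packets are terminated and all other packets are returned on the reverse bus; the dictionary packet layout and the function name are likewise hypothetical.

```python
def frame_buffer_route(packet, debug_mode=False):
    """Return (output_522, reverse_524) for one forward packet.

    Assumption for this sketch: data packets terminate at the frame
    buffer, and every other packet is passed back on the reverse bus.
    """
    is_data = packet["kind"] == "data"
    # Packets normally passed on the reverse path are still passed in debug mode.
    reverse = None if is_data else packet
    if debug_mode:
        # Debug mode: the output carries a copy of the bus traffic
        # instead of the usual output data.
        output = dict(packet)
    else:
        output = packet["payload"] if is_data else None
    return output, reverse


pixel = {"kind": "data", "payload": 0x00FF00}
command = {"kind": "command", "payload": 1}
normal_pixel = frame_buffer_route(pixel)
normal_command = frame_buffer_route(command)
debug_command = frame_buffer_route(command, debug_mode=True)
```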

The physical structure of the forward and reverse bus in the preferred embodiment is illustrated in an inset 530, which shows a bus structure 38 bits wide, with bits 0 to 31 being reserved for data, and bits 32 to 37 being reserved for an identifier field.
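As a minimal sketch of the 38-bit layout shown in the inset 530, the following packs and unpacks a bus word with bits 0 to 31 as data and bits 32 to 37 as the identifier field. The function names and the example identifier value are hypothetical; the specification does not define any identifier encodings.

```python
DATA_BITS = 32                       # bits 0 to 31 carry data
ID_BITS = 6                          # bits 32 to 37 carry the identifier field
DATA_MASK = (1 << DATA_BITS) - 1
ID_MASK = (1 << ID_BITS) - 1


def pack(identifier: int, data: int) -> int:
    """Combine an identifier and a data value into one 38-bit bus word."""
    if not 0 <= identifier <= ID_MASK:
        raise ValueError("identifier does not fit in 6 bits")
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data does not fit in 32 bits")
    return (identifier << DATA_BITS) | data


def unpack(word: int):
    """Split a 38-bit bus word into (identifier, data)."""
    return (word >> DATA_BITS) & ID_MASK, word & DATA_MASK


word = pack(0x05, 0xDEADBEEF)        # 0x05 is an arbitrary example identifier
identifier, data = unpack(word)
```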

In a traditional multi-bus system where data and configuration/command messages are carried on one bus, and chip configuration instructions are carried on another separate register bus, the main bus would be 37 bits wide, while the register instruction bus would also be 37 bits wide, ie. a total of 74 bits wide. In a graphic processor containing ten modules, therefore, the combined bus according to the described arrangement, as depicted in the inset 530, results in the saving of 360 connection wires on the chip. This is calculated by noting that (10 X 38) wires are required, instead of (10 X 74) wires.
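The saving quoted above can be checked with a few lines of arithmetic. The function name is hypothetical; the widths and the module count are those given in the text.

```python
def wires_saved(modules: int, combined_width: int, separate_widths) -> int:
    """Wires saved when one combined bus replaces several separate buses,
    counting one bus connection per module."""
    return modules * (sum(separate_widths) - combined_width)


# Ten modules, a 38-bit combined bus, versus two separate 37-bit buses.
saved = wires_saved(10, 38, (37, 37))   # (10 X 74) - (10 X 38)
```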

Bearing in mind the bus structure depicted in the inset 530, it is noted that everything sent on forward and reverse bus structures in the present arrangement is referred to as a packet, whether data, register instructions, or configuration/commands are being sent. If data is being sent, then the identifier field of the bus indicates that the packet, in the case being considered, comprises data only, and the identifier indicates what particular processing should be applied to the data. The identifier field of the bus structure shown in the inset 530 indicates whether the contents of the bus are, at the time being considered, in the form of data, or in the form of a command or a register instruction. The destination of a current packet can be defined implicitly, or alternatively, the "data" portion of the bus can be used to explicitly identify a destination, ie. the target, of a packet.

It is useful to define a "transaction" as being either a configuration/command, or a register instruction. Using this terminology, it is noted that two types of transactions are available. Transactions without an acknowledge (eg. "register_write", "pixel_out") are known as type 1 transactions. Alternatively, transactions with return data or an acknowledge (eg. "register_read") are known as type 2 transactions. Type 1 transactions are sent by an originating processing stage, wherever that stage may be in the bus architecture loop, and it is clear that this transaction will, in due course, be processed by a relevant target processing stage. Type 2 transactions on the other hand, such as register reads, require that a response be returned to the originating stage, this response typically being a type 1 transaction. When a type 2 transaction is sent, the target stage which processes the transaction performs the associated action (eg. the stage reads the specified register), and then sends the return acknowledgment back as a type 1 transaction, which in the present example is a "return_data_from_register_read" command.

It is noted that all packets on the bus retain a relative ordering, and are therefore executed in that order. At a given instant in time, if an originator of a type 2 transaction is still waiting for a response from the associated target stage, the type 2 transaction is termed an "outstanding transaction".
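The relationship between type 1 and type 2 transactions, and the notion of an outstanding transaction, might be modelled as in the following sketch. The class and function names, the explicit "target" and "source" fields, the three-stage loop and the register contents are all assumptions for illustration; the command names are spelled with underscores as an assumption.

```python
TYPE2_KINDS = {"register_read"}   # transactions that require a response


class Stage:
    """One processing stage on the looped bus (illustrative only)."""

    def __init__(self, name, regs=None):
        self.name = name
        self.regs = regs or {}
        self.outstanding = 0      # type 2 transactions still awaiting a response
        self.received = []

    def originate(self, kind, target, **fields):
        if kind in TYPE2_KINDS:
            self.outstanding += 1
        return dict(kind=kind, source=self.name, target=target, **fields)

    def process(self, packet):
        """Return the packet to forward, or None if it terminates here."""
        if packet["target"] != self.name:
            return packet                        # not the target: pass on unchanged
        if packet["kind"] == "register_read":    # type 2: act, then reply with a type 1
            return dict(kind="return_data_from_register_read",
                        source=self.name, target=packet["source"],
                        data=self.regs[packet["reg"]])
        if packet["kind"] == "return_data_from_register_read":
            self.outstanding -= 1                # the response has arrived
        self.received.append(packet)
        return None


def run_loop(stages, start_index, packet):
    """Move one packet around the loop until it terminates; return the hop count."""
    i, hops = start_index, 0
    while packet is not None and hops < 2 * len(stages):
        i = (i + 1) % len(stages)
        packet = stages[i].process(packet)
        hops += 1
    return hops


engine = Stage("instruction_engine")
polygon = Stage("polygon_processor", regs={0x4: 0x1234})
cache = Stage("pixel_cache")
loop = [engine, polygon, cache]

request = cache.originate("register_read", target="polygon_processor", reg=0x4)
outstanding_before = cache.outstanding
hops = run_loop(loop, loop.index(cache), request)
```

Here the pixel cache reads a register in an upstream stage: the request wraps around the loop to the polygon processor, and the type 1 response continues in the same direction until it reaches the originator.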

The act of processing a type 1 transaction can result in generation of consequent type 1 transactions. For example, a "repeat_pixel" command can be processed by the pixel cache stage 516, which will cause the pixel cache to output a number of "pixel_out" commands, where the number is specified in the "repeat_pixel" command. Accordingly, while the pixel cache 516 is busy generating the "pixel_out" commands, it must stall the pipeline.
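The expansion of one command into several consequent commands can be sketched with a generator. The field names "count" and "colour" are hypothetical, and the command names are spelled with underscores as an assumption. The stall is modelled implicitly: the generator does not take the next incoming packet until the current expansion has been fully emitted.

```python
def pixel_cache(incoming):
    """Yield the packets leaving the pixel cache, in order.

    A "repeat_pixel" command expands into `count` "pixel_out" commands;
    all other packets pass through unchanged.
    """
    for packet in incoming:
        if packet["kind"] == "repeat_pixel":
            # Upstream is effectively stalled while this loop runs.
            for _ in range(packet["count"]):
                yield {"kind": "pixel_out", "colour": packet["colour"]}
        else:
            yield packet


stream = [
    {"kind": "repeat_pixel", "count": 3, "colour": 0xFF0000},
    {"kind": "data", "payload": 7},
]
outgoing = list(pixel_cache(stream))
```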

The described arrangement allows any processing stage to generate a packet, whether that packet is independently generated, or generated as a consequence of processing another packet that the particular stage has received from elsewhere. The arrangement also concentrates the functions of data flow, command and register instruction flow, into a single bus system. Furthermore, a looped bus structure is disclosed, whereby data, commands, and register instructions, originating as previously noted from any processing stage, can be directed to any other processing stage, either upstream, as denoted by an arrow 532, or down-stream, as denoted by an arrow 534, in the architecture.

A "First In First Out" (FIFO) buffer can be added between one or more pairs of modules in order to smooth the data flow and reduce effects of stalls.

A number of design rules should preferably be followed in order to prevent the architecture from "locking up". Lockups can, for example, occur if every processing stage in the system stalls its upstream stage, while waiting for a downstream stage to accept data currently being processed by the module, or stage, in question. In order to avoid such a situation, the following rules should preferably be followed:

1. each packet, where the term packet encompasses configuration/command messages (eg. "set a background colour"), register instructions (eg. "set register"), or data, should be correctly directed to a target processing stage. In other words, every possible packet, legal or illegal in the context of the particular implementation, should be terminated in a sink stage. An example of an illegal packet is a packet having an undefined format as a result of signal corruption;

2. while a processing stage is waiting for a return value, or an acknowledgment, the stage should continue processing data, and passing other packets as required;

3. no more than one stage should be capable of having multiple outstanding transactions at any given time;

4. there should be at least one stage which does not independently generate new transactions;

5. modules which are capable of independently generating packets should give priority to incoming packets over self-generated packets;

6. packets which are routed on the reverse bus should not expand into multiple packets, although this expansion may be allowed provided that the design prevents the expansion from filling the pipeline loop to capacity.

Design rule nos. 1 to 6 ensure that the bus system does not lock up as a result of every module blocking transactions while waiting for its outstanding transactions to be processed.
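As one possible reading of the rule that incoming packets take priority over self-generated packets, a stage's output arbitration might be sketched as follows. The two-queue structure and the function name are assumptions for illustration and are not taken from the specification.

```python
from collections import deque


def next_packet_to_forward(incoming: deque, self_generated: deque):
    """Choose the next packet a stage places on the bus.

    Incoming packets always win; a self-generated packet is sent only
    when no incoming packet is waiting. Returns None if both are empty.
    """
    if incoming:
        return incoming.popleft()
    if self_generated:
        return self_generated.popleft()
    return None


incoming = deque(["in_a", "in_b"])
self_generated = deque(["own_x"])
order = []
while True:
    packet = next_packet_to_forward(incoming, self_generated)
    if packet is None:
        break
    order.append(packet)
```

Giving way to incoming traffic means a stage never holds up the loop in order to inject its own packets, which is consistent with the stated aim of avoiding lockups.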

Industrial Applicability

It is apparent from the above that the embodiment of the invention is applicable to the computer and data processing industries.

The foregoing describes only one embodiment of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiment being illustrative and not restrictive.


Claims

1. A graphic processor bus architecture comprising: a series of pipelined processing stages wherein at least two stages of said series are configured to generate a packet comprising at least one of a command, a register instruction, and data; and a bus interconnecting said stages.

2. A graphic processor bus architecture according to claim 1, wherein one of said at least two stages is adapted to generate said packet in response to receipt of a prior packet.

3. A graphic processor bus architecture according to claim 2, wherein said generated packet is a plurality of packets.

4. A graphic processor according to claim 1, wherein one of said at least two stages can output a copy of packets on the bus, whereby monitoring of the bus can thereby be performed.

5. A graphic processor bus architecture comprising: a series of pipelined processing stages wherein at least two stages of said series are configured to generate a packet comprising at least one of a command, a register instruction, and data; a forward bus interconnecting said stages; and a reverse bus interconnecting a last stage in said series to a first stage in said series.

6. A graphic processor bus architecture according to claim 5, wherein one of said at least two stages is adapted to generate said packet in response to receipt of a prior packet.

7. A graphic processor bus architecture according to claim 6, wherein said generated packet is a plurality of packets.

8. A graphic processor bus architecture according to claim 5, wherein a second or subsequent stage of said series of pipelined stages is an originating stage which is adapted to send said packet to a target stage of said series of pipelined stages, said target stage being upstream of said originating stage.

9. A graphic processor according to claim 5, wherein one of said at least two stages can output a copy of packets on the bus, whereby monitoring of the bus can thereby be performed.

10. A graphic processor bus architecture substantially as described herein with reference to Fig.

DATED this thirty-first Day of July, 2001

Canon Kabushiki Kaisha

Patent Attorneys for the Applicant

SPRUSON & FERGUSON