Centralised interactive graphical application server

09852490 · 2017-12-26

    Abstract

    A system for processing a plurality of graphical programs on a centralized computer system whereby the images produced by the programs are compressed and transmitted to a plurality of remote processing devices where they are decompressed. Compression assistance data (CAD) is produced by intercepting instructions outputted by the programs and the CAD is then used in the compression step.

    Claims

    1. A system for producing a compressed image data transmission comprising: i) a graphics instruction modification module (GIMM); ii) a graphics processor unit (GPU) accessible by the GIMM; iii) a transmission module; iv) memory accessible by the transmission module and the GPU; wherein operation of the system causes the system to perform at least the following steps: a) receiving a first set of unmodified graphics instructions from a first graphics generating computer program; b) modifying at least one of the first set of unmodified graphics instructions by the GIMM to create at least one modified graphics instruction; c) processing at least one of the first set of unmodified graphics instructions or at least one of the modified graphics instructions or a combination thereof by the GPU to generate a plurality of pixels; d) compressing at least one of the plurality of pixels to generate compressed image data; e) transmitting the compressed image data by the transmission module for decompression by a remote decompression device; wherein the step of modifying comprises at least one action selected from a list consisting of adding at least one graphics instruction, removing at least one of the graphics instructions, replacing at least one of the graphics instructions, translating at least one of the graphics instructions and any combination thereof; and wherein the step of modifying facilitates the generation of the compressed image data transmission.

    2. The system as claimed in claim 1 wherein step (b) occurs before step (c).

    3. The system as claimed in claim 1 wherein the step of modifying is responsive to a bandwidth constraint, a latency constraint or a combination thereof.

    4. The system as claimed in claim 1 wherein the step of modifying is performed to facilitate the compressing of the image data.

    5. The system as claimed in claim 1 wherein operation of the system further causes the system to perform the step of determining at least one difference in at least one pixel of the plurality of pixels between two different points in time to further facilitate the generation of the compressed image data transmission.

    6. The system as claimed in claim 1 wherein operation of the system further causes the system to perform the step of determining at least one similarity between at least one pixel of the plurality of pixels between two different points in time to further facilitate the generation of the compressed image data transmission.

    7. The system as claimed in claim 1 wherein operation of the system further causes the system to perform the step of identifying an attribute of at least one of the unmodified graphics instructions; wherein the step of identifying an attribute comprises at least one step selected from a group consisting of identifying a difference between two of the unmodified graphics instructions, identifying the type of one of the unmodified graphics instructions, identifying the contents of one of the unmodified graphics instructions and any combination thereof; and wherein the step of modifying is responsive to the attribute.

    8. The system as claimed in claim 7 wherein the step of modifying is performed to facilitate the compression of the image data.

    9. The system as claimed in claim 1 wherein the generation of the compressed image data is performed by the GPU.

    10. The system as claimed in claim 1 wherein operation of the system further causes the system to perform the step of receiving a second set of unmodified graphics instructions from a second graphics generating computer program and wherein the step of modifying is performed to facilitate sharing the GPU between instructions associated with the first graphics generating computer program and instructions associated with the second graphics generating computer program.

    11. A system for producing a compressed image data transmission comprising: i) a graphics instruction modification module (GIMM); ii) a graphics processor unit (GPU) accessible by the GIMM; iii) a transmission module; iv) memory accessible by the transmission module and the GPU; wherein operation of the system causes the system to perform at least the following steps: a) receiving a first set of unmodified graphics instructions from a first graphics generating computer program; b) modifying at least one of the first set of unmodified graphics instructions by the GIMM to create at least one modified graphics instruction; c) processing at least one of the first set of unmodified graphics instructions or at least one of the modified graphics instructions or a combination thereof by the GPU to generate a plurality of pixels; d) determining at least one difference in at least one pixel of the plurality of pixels between two different points in time, or determining at least one similarity between at least one pixel of the plurality of pixels between two different points in time, or a combination thereof; e) compressing at least one of the plurality of pixels to generate compressed image data; f) transmitting the compressed image data by the transmission module for decompression by a remote decompression device; wherein the step of determining facilitates the compression; wherein the step of modifying comprises at least one action selected from a list consisting of adding at least one graphics instruction, removing at least one of the graphics instructions, replacing at least one of the graphics instructions, translating at least one of the graphics instructions and any combination thereof; and wherein the step of modifying facilitates the generation of the compressed image data transmission.

    12. The system as claimed in claim 11 wherein step (d) occurs before step (e).

    Description

    (1) In order to provide a better understanding of the present invention, an embodiment will now be described by way of example only, with reference to the accompanying figures, in which:

    (2) FIG. 1 illustrates in schematic form a centralised interactive graphical application system in accordance with the present invention;

    (3) FIG. 2 illustrates in schematic form a typical pipeline for the processing of a game;

    (4) FIG. 3 illustrates the typical processing flow of rendering a scene by a graphics processor;

    (5) FIG. 4 illustrates a texture being projected onto a simple 3D scene;

    (6) FIG. 5 illustrates an overview of a gaming system according to one embodiment of the invention;

    (7) FIG. 6 is a schematic diagram illustrating a single games server according to an embodiment of the invention;

    (8) FIG. 7 illustrates, in schematic form, a vertex shader;

    (9) FIG. 8 illustrates, in schematic form, the architecture in accordance with a preferred embodiment of the present invention;

    (10) FIG. 9 illustrates a flowchart of the steps for the processing of a first frame in accordance with a preferred embodiment of the present invention; and

    (11) FIG. 10 illustrates a flowchart of the steps for the processing of a second and subsequent frames in accordance with a preferred embodiment of the present invention.

    (12) The present invention relates to a system for interactive applications that functions to generate compressed data streams representing video images which include computer-generated graphics. Information relating to the field of computer-generated graphics can be found in the textbook “3D Computer Graphics”, 3rd Edition, Alan Watt, Addison-Wesley, 2000.

    (13) Typically, a computer-generated 3D graphical figure is generated in the form of a mesh of polygons that define the shape of a 3D object and a texture which is laid across the mesh using a texture mapping process. Herein, the term “graphical object” may refer to a graphical figure, a single vertex, a group of vertices, a “sprite” (collection of pixels) or a parameter to a “Bezier patch” or “n patch” function object. A number of separate graphical objects may be combined into a smaller number of logical objects, e.g. walls and floors may be defined as being a single object and thus coated with the same texture. Herein, the term “flat color” refers to a single color, without variation across its surface.

    (14) With reference to FIG. 1, a system for interactive applications in accordance with the present invention 10 has an instruction interception module 11, a main graphics processor 12, a subsidiary graphics processor 13 responsive to the instruction interception module, and an encoder 14 responsive to the graphics processors. In this embodiment, the encoder is a DSP, alternatively the encoder may be a graphics processor, any CPU, or other processing device. In this embodiment, the instruction interception module is a device driver, alternatively the instruction interception module is an addition to existing middleware, such as DirectX™ or OpenGL™. In this embodiment, one instruction interception module feeds two graphics processors, but in another embodiment, there may be more than one instruction interception module, e.g. one per graphics processor. The main graphics processor and the subsidiary graphics processor are graphics processor modules. They may be separate hardware units, or separate processes executed on one hardware unit.

    (15) FIG. 2 shows a typical pipeline for the processing of a game comprising games software 21, 3D graphics middleware 22, a device driver 23, a dedicated graphics processor card 24, a rendered image 25, and a compression system or encoder 26. The dotted lines A to E denote the different sources of information that may be used by the encoding process in the compression sub-system.

    (16) Because of the immense processing power required to render the complex scenes of a typical game, the graphics are normally rendered by a dedicated graphics processor chip. In a PC, the chip normally resides on a separate card and in a console it is a separate chip in the main box.

    (17) The games software 21 feeds the graphics processor 24 with a 3D scene description, including the 3D co-ordinates of points describing objects, their position in 3D space, how they are to be distorted (e.g. a dent in a racing car), what skins (‘textures’) they are to be covered with, how they are to be lit or shaded (and hence the textures further altered). Following a large amount of processing, a fully rendered 2D image 25 is produced suitable for viewing on a device such as a computer monitor.

    (18) FIG. 3 shows a typical processing flow of rendering a scene by a graphics processor. The steps are: transformation 31 including movement and distortion of shapes; lighting 32 including lighting and shading of a scene; set-up and clipping 33 involving working out which object obscures others; and rendering 34 that is the drawing of the final scene.

    (19) With reference to FIG. 2, a difficulty in trying to obtain information about the shape and position of objects before the graphics processor (paths C to E) is that the position and shape of the object will not have been determined until it has been through the transformation step 31 of the graphics processor. As the objects lie in 3D space, knowledge of which objects are overlapped by others is not available until rendered by the graphics processor.

    (20) Taking the information from source A would mean that parallel processing cannot be performed and taking the information from source B, the graphics processor, would mean having to develop a new graphics chip instead of using the best already available on the market at any point in time.

    (21) With reference to FIG. 1, in the present invention an instruction interception module 11 feeds a subsidiary graphics processor.

    (22) In this topology the main graphics processor 12 is a standard device as used in popular computer graphics cards or games consoles and operates as normal. The subsidiary processors operate in parallel but are fed a different set of instructions according to the requirements of the encoder.

    (23) Object Identification

    (24) To determine the shape and position of individual graphics objects, the instruction interception module 11 is designed so as to feed the same 3D scene instructions to a subsidiary graphics processor 13 but with the lighting and shading aspects turned off or altered so as to minimise alterations of the textures in a scene. Instead of applying textures, the second processor is instructed to render each shape using a unique, flat color. Alternatively, it may have an aspect of the color that makes it unique, such as by using a preset bit-range of the textures/colors to uniquely identify an object. It is then a relatively simple task for the compression encoder 14 to determine the shapes and positions of objects and use this information to pick out the individual objects from the properly rendered image that had been produced by the main graphics processor 12.
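
    The object identification step above can be sketched as follows. This is a minimal illustration only, assuming the subsidiary processor's flat-color output is available as a 2D array of per-pixel object colors; the function name and data layout are not taken from the patent:

```python
def find_objects(id_buffer):
    """Map each unique flat color to the bounding box of its pixels.

    id_buffer: rows of per-pixel object IDs (the flat colors).
    Returns {object_id: (min_row, min_col, max_row, max_col)}.
    """
    boxes = {}
    for r, row in enumerate(id_buffer):
        for c, obj in enumerate(row):
            if obj in boxes:
                r0, c0, r1, c1 = boxes[obj]
                boxes[obj] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
            else:
                boxes[obj] = (r, c, r, c)
    return boxes

frame = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],   # object 7 occupies a 2x2 region
    [0, 7, 7, 3],   # object 3 is a single pixel
]
print(find_objects(frame)[7])  # (1, 1, 2, 2)
```

    The encoder can then use these extents to pick each object out of the properly rendered image from the main graphics processor.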

    (25) In the logic used by the graphics card, the bit-range may cross over the boundary of the range used to describe, say, the blue and red elements of a color; the whole green range and parts of the blue range may be free to change.

    (26) The encoder may comprise two elements: a pre-processor that works with the subsidiary graphical processor unit (GPU) to extract certain details such as motion vectors; and a ‘pure’ encoder that receives ‘short cuts’ from the pre-processor. This enables the use of off-the-shelf MPEG-4 encoders. The pre-processor may be software that runs on the same processor as the encoder, the instruction interception module, the games processor or a separate processor.

    (27) A pre-determined range of colors, shades or color content could be used to signify that a given shape was translucent. (MPEG-4 allows for objects to be either solid or to have one of 255 levels of transparency).

    (28) Furthermore, the direct feed-forward path 15 from the instruction interception module to the encoder could be used to allow the encoder to detect complete scene changes. In such cases that scene should not be compressed as a difference from a previous image (temporal compression) as the differences would be too great.

    (29) Although cheaper chips could be used, ideally the subsidiary processor is substantially identical to the main processor, so as to avoid any mismatches between the images rendered, and operates at the same resolution as the main processor.

    (30) A number of these subsidiary processors may be used; the number will depend on the encoding technology being used, cost, space and power considerations. As an alternative to having one processor for each piece of information required by the encoder, if the processors and encoder were fast enough, fewer or indeed potentially only one graphics processor could be used. Hence more than one graphics processor function could be implemented with one unit of dedicated graphics processor hardware. In this scenario, the graphics processors could be clocked to render frames more quickly than the final display rate and the results stored in buffer memory in a series of adjacent, non-overlapping locations for the encoder to work on. Handshake signalling between the encoder and the instruction interception module would co-ordinate the processes.

    (31) Motion Estimation

    (32) The most computationally intensive element of MPEG-4 compression is ‘motion estimation’ which consumes between 50% and 80% of the processing power.

    (33) One of the methods of compressing moving video is to look for similarities between subsequent frames of video. In the example of a person walking across a field, if the camera is still, then for each frame of video the field will be almost identical. The major changes occur between where the person was in one frame and where he appears a split second later in the subsequent frame.

    (34) In conventional motion estimation the screen is divided into blocks and then compared with blocks of the previous frame, looking for as close a match as possible. Once identified, instead of transmitting the entire block a signal can be transmitted to the decoder that a particular block can be found in a previously transmitted frame but at a given offset from where it should appear in the new frame. This offset is called a motion vector.
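
    The exhaustive search described above can be sketched as follows; this is a minimal illustration of conventional block matching on greyscale values, not an implementation from the patent:

```python
def sad(prev, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur`
    at column bx, row by and the block of `prev` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def motion_vector(prev, cur, bx, by, n, search):
    """Try every offset in [-search, search] and keep the lowest SAD."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= by + dy and by + dy + n <= len(prev)
                    and 0 <= bx + dx and bx + dx + n <= len(prev[0])):
                cost = sad(prev, cur, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A bright 2x2 square moves one pixel to the right between frames.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    prev[r][c] = 9
    cur[r][c + 1] = 9
mv = motion_vector(prev, cur, 2, 1, 2, 2)  # block at column 2, row 1 of `cur`
```

    Every candidate offset must be tested, which is why the text puts motion estimation at 50% to 80% of the encoder's processing load.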

    (35) Each of these blocks can be just 8×8 pixels. To conduct a search the encoder would first have to look at an offset of just one pixel to the left, subtract each of the 64 pixels of that block from those of the test block, then move one pixel to the left and one pixel up, and subtract all 64 pixels again and compare the results. In a fast moving scene, the offset could be say, 100 pixels away in any direction. This process is thus computationally intensive.

    (36) However, in the case of the graphics server of the present invention, to reduce the complexity, the subsidiary graphics processor is used to coat each of the 3D objects with a new texture (a coating, or ‘skin’). These textures have one aspect of their colors varying with horizontal position and another varying with vertical position. If a fixed point is taken on the screen and the pixels of the texture of the object are compared at that position between two consecutive frames, it can be determined in which direction that object has moved. So, for example, if that pixel has a higher color value in the red component, it is detected that the object has moved to the right. If it has a higher color value in the blue component, it is detected that the object has moved downwards.

    (37) According to one embodiment of the present invention, 32-bit pixels are used (a ‘bit’ being a binary digit, equalling either 0 or 1). The following is an example of how these bits might be allocated in each pixel:

    (38) uuuuuuuuuu aaaaaaaa yyyyyyy xxxxxxx

    (39) 10×u (unique code for each object/texture combo)

    (40) 8×a (alpha—used to denote transparency. 255=opaque)

    (41) 7×x (variation of texture in the x axis)

    (42) 7×y (variation of texture in the y axis)
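
    The 32-bit layout above can be expressed as simple bit-field packing. A sketch, assuming the u field occupies the upper-most bits in the order shown (the patent does not fix the bit order):

```python
def pack_pixel(u, a, y, x):
    """Pack 10 bits of object/texture ID, 8 bits of alpha and 7 bits
    each of y and x texture variation into one 32-bit pixel."""
    assert 0 <= u < 1024 and 0 <= a < 256 and 0 <= y < 128 and 0 <= x < 128
    return (u << 22) | (a << 14) | (y << 7) | x

def unpack_pixel(p):
    """Recover the (u, a, y, x) fields from a packed 32-bit pixel."""
    return (p >> 22) & 0x3FF, (p >> 14) & 0xFF, (p >> 7) & 0x7F, p & 0x7F
```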

    (43) So, using an example, but in decimal to make it easier to understand, and only considering the x and y digits, the following pattern may be used:

    (44) 11 12 13 14
         21 22 23 24
         31 32 33 34
         41 42 43 44

    (45) In the example pattern above, consider the pixel one row down and third from the left. In this frame it holds the number 23. If in the next frame the object had moved to the left by one pixel, the apparatus would detect the number 24 in the same position on the screen. As the ‘x’ digit had increased from 3 to 4, the movement is detected as being 1 pixel to the left. If the ‘y’ digit, namely ‘2’, had decreased to 1, the object would have moved down by one pixel.
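
    The decimal example can be reproduced directly. A sketch, using the convention above that the tens digit carries the y component and the units digit the x component:

```python
def digit_motion(prev_code, cur_code):
    """Return the change in the (x, y) digits at a fixed screen position.
    Per the text, an x increase of 1 means the object moved one pixel to
    the left, and a y decrease of 1 means it moved one pixel down."""
    y0, x0 = divmod(prev_code, 10)
    y1, x1 = divmod(cur_code, 10)
    return x1 - x0, y1 - y0

print(digit_motion(23, 24))  # (1, 0): one pixel to the left
```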

    (46) These numbers are interpreted by the graphics card as colors, so in a 32-bit pixel, the 12 upper-most bits may be reserved for red, the next 10 for blue and the last 10 for green.

    (47) Motion vectors are calculated on a block by block basis, either 8×8 or 16×16, in the current version of MPEG-4. Such motion vectors may be calculated by taking an average of all the x elements and an average of all the y elements.

    (48) This block based information may be used to get a motion vector for an entire object (or part of one) by detecting which pixels in each of the blocks belong to a given object and averaging them to deduce the motion vector(s) of that object. (An object may have more than one vector, e.g. a person walking will have limbs moving differently to the body).

    (49) Where an object has no sub-elements that might move separately, then once the motion vector is calculated for that object in one block, the apparatus may instantly deduce the motion vector of another block in which that object occupies the majority of pixels.

    (50) Motion vectors are applied on a block or macroblock basis. The encoder is adapted to generate a motion vector for an entire block by aggregating the motion vectors of the parts of the objects that occupy that block. In one embodiment, this involves calculating the motion vector of each pixel for each object in a block then calculating the motion vector of each object within that block (which may be different for the entire object over the whole screen) and then calculating the motion vector of that block using a weighted average of the objects within that block.
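
    The weighted average described in this embodiment might be sketched as follows, assuming each pixel of a block is labelled with the ID of the object covering it and each object's motion vector is already known (all names are illustrative):

```python
def block_motion_vector(block_object_ids, object_vectors):
    """Average the per-object motion vectors over a block, weighting
    each object by the number of block pixels it covers."""
    counts = {}
    for row in block_object_ids:
        for obj in row:
            counts[obj] = counts.get(obj, 0) + 1
    total = sum(counts.values())
    vx = sum(object_vectors[o][0] * n for o, n in counts.items()) / total
    vy = sum(object_vectors[o][1] * n for o, n in counts.items()) / total
    return vx, vy

# Object 1 covers three pixels of a 2x2 block, object 2 covers one.
mv = block_motion_vector([[1, 1], [1, 2]], {1: (4, 0), 2: (0, 4)})
print(mv)  # (3.0, 1.0)
```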

    (51) Typically, the entire object includes all the blocks that are deemed to be associated with an object by the encoder whether or not one of these blocks actually contains a pixel that belongs to that object. The motion vector therefore applies to all of the blocks.

    (52) When encoding objects (rather than just using objects internally), they are defined as a bounding rectangle such that the rectangle conforms to the grid of 16×16 pixel macroblocks. Many blocks or macroblocks may be totally empty but belong to that Video Object Plane, and hence that object.

    (53) In one embodiment, instead of wrapping each object with a unique texture, the method involves projecting a single, screen-sized texture onto the scene, much like a slide projector, and then blending (i.e. binding) that texture with the blank surfaces of all the objects in a scene. The objects' surfaces have been ‘cleaned’ and replaced with flat, unique colors beforehand so that individual objects can be identified. The blending process combines the projected, ‘reference’, texture with the objects' surfaces to produce unique textures on each object. Movement causes a ‘tear’ in this projected ‘fabric’ and so movement can be easily detected. The reference grid is located at the same position as the viewer.

    (54) This last method involves projecting a texture onto the entire scene. This process comprises first determining a “projection matrix” which maps the texture onto the back plane (the furthest visible plane in the scene). This matrix is then used on each object so as to set its texture mappings to the projected texture. The underlying steps will be familiar to a person skilled in the art of graphics programming.

    (55) If the projected texture had the following pattern:

    (56)  1  2  3  4  5  6
          7  8  9 10 11 12
         13 14 15 16 17 18
         19 20 21 22 23 24
         25 26 27 28 29 30

    (57) Then, after a 3×3 pixel square in the mid-left of the pattern moves one pixel to the left, the following pattern is attained:

    (58)  1  2  3  4  5  6
          8  9 10  0 11 12
         14 15 16  0 17 18
         20 21 22  0 23 24
         25 26 27 28 29 30
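
    The ‘tear’ in the projected pattern can be decoded mechanically. A sketch, assuming the reference values are unique and 0 marks a hole opened by the movement; for each changed cell, looking up where its value sat in the previous frame gives the motion directly:

```python
def per_pixel_motion(prev_grid, cur_grid):
    """For each screen cell whose reference value changed, find where
    that value sat in the previous frame; the offset is the motion."""
    where = {}
    for r, row in enumerate(prev_grid):
        for c, v in enumerate(row):
            where[v] = (r, c)
    motion = {}
    for r, row in enumerate(cur_grid):
        for c, v in enumerate(row):
            if v != prev_grid[r][c] and v != 0:  # 0 is a 'torn' hole
                pr, pc = where[v]
                motion[(r, c)] = (c - pc, r - pr)  # (dx, dy) in pixels
    return motion

# The two patterns from the text: a 3x3 square moves one pixel left.
prev_grid = [list(range(1 + 6 * r, 7 + 6 * r)) for r in range(5)]
cur_grid = [
    [1, 2, 3, 4, 5, 6],
    [8, 9, 10, 0, 11, 12],
    [14, 15, 16, 0, 17, 18],
    [20, 21, 22, 0, 23, 24],
    [25, 26, 27, 28, 29, 30],
]
motion = per_pixel_motion(prev_grid, cur_grid)  # every moved cell: (-1, 0)
```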

    (59) One calculation pass is used to project the texture; the scene is then moved, and the GPU performs the transformation calculations and moves the pixels on the screen in a second pass. The scene is then reset, by removing the projected texture, resetting the objects' surfaces to their initial states and re-projecting the texture. The whole process takes two passes (the re-setting and projecting happen in one pass).

    (60) Each of the objects may be pre-colored with a unique, flat color and when applying the grid texture, combining the texture with these colors to produce a unique texture for each object (e.g. the red channel is reserved to produce a unique color for each object and the other channels are reserved to produce the grid texture).
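
    That channel split might be sketched as follows, assuming for illustration a 24-bit texel with 8 bits per channel, the red channel reserved for the object ID and the green and blue channels carrying the grid coordinates:

```python
def combine(object_id, grid_x, grid_y):
    """Pack the object ID into the red channel and the reference-grid
    x and y into the green and blue channels of a 24-bit texel."""
    assert all(0 <= v < 256 for v in (object_id, grid_x, grid_y))
    return (object_id << 16) | (grid_x << 8) | grid_y

def split(texel):
    """Recover (object_id, grid_x, grid_y) from a packed texel."""
    return (texel >> 16) & 0xFF, (texel >> 8) & 0xFF, texel & 0xFF
```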

    (61) In one embodiment, these methods are implemented by an instruction interception function feeding instructions to a subsidiary graphics processor module that cause it to generate new object textures. These instructions cause it to generate a single texture that covers the entire image frame and binds to the surfaces of the objects in said image frame to produce new object textures. The single texture is projected onto or reflected from the image frame. The step of generating a single texture may include setting the surface of an object to a default and then altering the single texture so as to create and apply a new surface texture for the object. The default indicates whether the object is translucent and is unique to each object in an image frame. The encoder determines the shape and position of objects based on the new object textures.

    (62) The invention provides a way to associate a 3D object with its 2D screen representation and to identify not only motion but the amount and direction of the motion. The preferred embodiment is to use a projective texture to cast a reference grid on a scene. A texture is defined that has the same dimensions as the screen and is projected from a point at the same position as the camera. The effect is that the entire scene is covered by the texture.

    (63) A texel is a single element within a texture, similar to a pixel being a single screen element. Each texel of the projective texture may, in this embodiment of the invention, be split into a unique X and Y component representative of its position in the texture. Various texture formats can be selected to accomplish this. The most common 32-bit texture formats typically have red, green, blue and alpha components, each occupying 8 bits. More suitable formats, such as 10-bit colors or even floating point colors, exist but are currently not as widely supported.

    (64) By projecting such a grid on the scene, a reference grid is provided that corresponds exactly with the pixel locations of the screen. By projecting this texture before any movement has occurred and then comparing the result against the original after movement, not only can it be detected that motion has occurred, but also by how many pixels, and what the x and y components of that motion are.

    (65) FIG. 4 shows a texture being projected onto a simple 3D scene. The dotted lines on the screen 42 show the outlines 44 of the 2D representations of the two 3D cubes 46 that comprise the scene with its background 48.

    (66) In order to track motion, a method is provided of storing the current state of the positions of objects relative to the screen, at a relatively fine level of resolution.

    (67) Data that could be used are the object vertices, the texels that form the surface of an object, or the variables that are used to create or move the polygons or texels.

    (68) One method of accomplishing this is to allocate a texture to store the screen position of the object across its surfaces: a “texture cache”. Each texel in this texture will store a representation of its screen position, determined by projection of the reference texture during rendering. Care must be taken in selecting the size of the texture in order to yield a high enough sampling resolution across the surfaces. Since the apparatus tracks the movement of pixels on the screen that is occurring in the original image, a constant texel-to-pixel ratio is preferred. Thus, the texture cache used is the same size as the original texture that had been allocated to that object. The original game may have allocated more than one texture to cover various parts of an object, and a texture may be tiled (repeated) across an object's surface. In the case of tiling, the texture cache is made larger by the same multiple as the original texture is repeated across that object's surface.

    (69) Using projective texture mapping with the reference grid as the projected texture allows the apparatus, on a per texel basis, to identify where on the screen that texel would appear if it were in the ‘real’ image (i.e. the one the original programmer had intended).

    (70) When the object moves, the texture mapping of the texture cache also changes so that it now corresponds to the new position, as confirmed by the fact that it now contains a different set of reference grid texels. The primary texture now ‘knows’ where the object's texels are in terms of 2D screen co-ordinates and the storage texture ‘knows’ where it was in the previous frame.

    (71) The screen position is sampled from within texture space, so the normal rendering pipeline cannot be used, as it results in screen space. In other words, the screen space position is rendered into the texture, instead of the normal arrangement, which is to render the texture into screen space, i.e. produce an image on the screen. When the object moves, the texture mapping of the texture cache will also change so that it corresponds to the new position. The cache will therefore ‘remember’ the orientation of its texels with regard to the screen.

    (72) Two texture caches are, in this embodiment, maintained: one for the previous frame, and one for the current frame. In an alternative embodiment only one may be used per object or even only one for all objects whereby objects are allocated a portion of a texture instead of the whole texture.

    (73) For each frame, render targets may be swapped once at the beginning of the scene to a different texture cache, and the current positions are rendered into that cache. The other cache will still have the previous frame's screen positions. On a per texel basis then, a subtraction finds the exact screen movement of that texel from the previous frame to this frame. Programmable pipeline hardware can be used to render the texel differences of the current scene into an area accessible by the video encoder for use in completion of the compression sequence. The data stored in this area is the exact motion of the texels that represented each pixel and hence the motion of that pixel.
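
    The per-texel subtraction can be sketched as follows, assuming each cache is held as a 2D array of (x, y) screen positions (the patent performs this step on the GPU's programmable pipeline; plain Python is used here only to show the arithmetic):

```python
def texel_motion(prev_cache, cur_cache):
    """Per-texel (dx, dy) between two frames' screen-position caches."""
    return [
        [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prow, crow)]
        for prow, crow in zip(prev_cache, cur_cache)
    ]

# A 2x2 patch of texels whose screen positions moved 2 pixels right.
prev_cache = [[(5, 2), (6, 2)], [(5, 3), (6, 3)]]
cur_cache = [[(7, 2), (8, 2)], [(7, 3), (8, 3)]]
deltas = texel_motion(prev_cache, cur_cache)  # all (2, 0)
```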

    (74) Since both the texture caches are bound to the object vertices in the same way they will not shift relative to each other. This technique therefore works with movement in the x, y or z axes as well as rotation or distortion through morphing.

    (75) Since some parts of a texture may be reused in a mapping, such as the example of a half-face texture being mapped twice, the invention provides a method of ensuring a “unique texture space” is used, i.e. one in which there are no overlaps. Otherwise, each time the texel appears in the image, its current value will be overwritten with the new screen position, and the apparatus would know of only one location of the texel (the last to be rendered), as opposed to all the locations of the texel. Overcoming this limitation means re-mapping the vertices to a new texture by copying the original texture to a new texture. This method may be performed by doubling the size of the original texture and adding a constant offset to the vertices that referred to previously mapped areas.
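
    The doubling re-map might be sketched as follows. This is an illustrative sketch only, handling one re-use per co-ordinate: the second user of any (u, v) is offset into the added half of the doubled texture:

```python
def make_unique_uvs(uvs, tex_height):
    """Give every vertex a distinct texture co-ordinate by moving the
    second user of any (u, v) into the doubled texture's lower half."""
    seen = set()
    out = []
    for u, v in uvs:
        if (u, v) in seen:
            out.append((u, v + tex_height))  # constant offset, no overlap
        else:
            seen.add((u, v))
            out.append((u, v))
    return out

# A half-face texture mapped twice: the second mapping is offset.
uvs = make_unique_uvs([(0, 0), (1, 0), (0, 0), (1, 0)], 16)
```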

    (76) The new texture may be squeezed back to the size of the original, thus shrinking the original polygons. Alternatively, the original polygon dimensions could be retained but the remapping re-arranged so as not to leave so much unused space.

    (77) A further method would be to keep the texture caches the same size as the original texture. However, if N triangles are mapped to the original texture, then the caches would be divided into N rectangles. These rectangles would then be divided into two so that there were twice the number of triangles as the original.

    (78) The system may then run through all the triangles that are mapped to the original and assign them one of the triangles in each of the texture caches. The fact that the relative positions have changed when being remapped is not problematic as the texture caches are used to render to and hence only require a “unique texture space”.

    (79) Overview of Implementation

    (80) Reference is now made to FIG. 5, which illustrates the main components of a system implementing the present invention. The components include a bank 50 of centralised games servers serving a plurality of user stations such as television terminal 52. The centralised servers and user stations are connected via one, or more, data communications networks 54, such as a cable or satellite broadcasting network and/or a public telephone network. A set top box 56 is provided for, amongst other functions, decoding the compressed video data. A user input device 58, such as a joystick, is provided for detecting user input during a game. The network or networks 54 provide for the transmission of compressed video data 60 from the centralised servers 50 to the user terminals, where the set top box 56 converts the signal to a decompressed video signal 62, and the transmission of user input 64 in the other direction.

    (81) FIG. 6 schematically illustrates features of an embodiment of a single game server in the bank 50 of game servers. User input 64 comes in through a network interface 70 into a central processing unit (CPU) 72 of a conventional personal computer (PC), on which a conventional PC game program is running. The game program sends a first set of graphics instructions to a first graphics processing unit (GPU1) 76; these are intercepted by an instruction interception module 74, which may be embodied as a software wrapper or in hardware. The first set of instructions, including vertex data, transformation data and texture data, is passed to GPU1 76 whilst a specially manipulated version of the instructions is generated and passed to a second graphics processing unit (GPU2) 78. Both GPUs may be provided on a common graphics card. GPU1 76 renders the image data as the game intended whilst GPU2 78 is used to render specially adapted graphics data from which to extract compression assistance data used for compression, e.g. motion vectors. Each image frame is divided into blocks of pixels and then, for each block, where possible, the x,y co-ordinates of that block of pixels are found in the previous image frame. A digital signal processing unit (DSP) 84 uses the compression assistance data from GPU2 to compress the image data from GPU1, using a known video compression algorithm such as MPEG-4, and passes the resulting compressed video stream 86 to the CPU 72 for transmission across the network 54. Handshake signalling 88 between the DSP 84 and the CPU 72 is used for quality control if there is congestion on the network.

    (82) Note, in relation to the above, that the functions of the DSP 84 can be replaced by another CPU or even a different time slice on the CPU 72. Further, GPU2 78 can alternatively be embodied using another time slice on GPU1 76.

    (83) The method of the preferred embodiment allows the movement of objects, and hence the movement of the pixels on the 2D screen, to be tracked. Knowledge of the shape and position of a 2D screen representation of an ‘object’ is also useful for another compression technique aside from motion estimation. MPEG-4 provides for the possibility of encoding arbitrarily shaped objects. Amongst other benefits this allows the ‘blockiness’ at the periphery of a shape to be reduced.

    (84) It is important to note that the task at this level is to identify the best possible match between a block of pixels in the current frame and a block in the previous frame. It does not have to be an exact match. The encoder can then decide whether it is a good enough match to encode that block in inter-frame mode or to use an intra-frame technique.

    (85) A further embodiment of the invention will now be described with reference to FIGS. 7 to 10. The latest graphics chips now have programmable transform and lighting stages instead of (and as well as) the previous fixed graphics pipelines.

    (86) These are often referred to as “vertex shaders” (VS) and “pixel shaders” (PS) by Microsoft®. They are sometimes also referred to as “vertex program”, “fragment program”, “vertex processor” or “fragment processor” by other organisations.

    (87) The VS handles all the functions that deal with the moving and transforming of vertices (the 3D points that describe the structure of an object) as well as the lighting, color and the texture mapping co-ordinates. The PS however deals with such things as textures and how they are applied to objects and some lighting.

    (88) Whereas previously a games programmer had to use the fixed pipeline (a non-configurable process), with the VS he now has the ability to write a short program (a programmable pipeline module) that will execute on the graphics processing unit (GPU).

    (89) The game communicates with the VS by first loading a program and then passing in data through a number of “constant” registers. It then passes vertex data through some “input” registers and executes the vertex program. The resulting, transformed vertices then appear in “output” registers. However, these registers do not contain 2D screen co-ordinates but instead hold “reciprocal homogeneous co-ordinates”. Another stage of the GPU pipeline called the “stepper” converts these to 2D screen co-ordinates and clips the vertices according to whether or not they lie within the field of view (“frustum”).
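
    The role of the “stepper” may be sketched as follows, assuming the usual perspective divide and viewport transform; the function and its parameters are illustrative rather than taken from any particular API.

```python
# Illustrative sketch: the VS output register holds a 4-D homogeneous
# position (x, y, z, w); dividing by w and applying the viewport transform
# yields the 2D screen co-ordinates used by the rest of the pipeline.
def to_screen(pos, width, height):
    x, y, z, w = pos
    ndc_x, ndc_y = x / w, y / w        # perspective divide -> [-1, 1]
    sx = (ndc_x + 1.0) * 0.5 * width   # viewport transform
    sy = (1.0 - ndc_y) * 0.5 * height  # screen y conventionally grows downwards
    return sx, sy

print(to_screen((1.0, 0.5, 2.0, 2.0), 640, 480))  # -> (480.0, 180.0)
```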

    (90) Therefore, to track the motion of pixels across the screen, only the changes to the vertices after they have been through the transformation stage need to be determined. To get pixel-level resolution the apparatus then interpolates across that polygon's surface using the same techniques as would normally be used to apply a texture to the surface of an object. That interpolation function is part of the standard processing pipeline of a GPU.

    (91) One way of processing the vertices would be to store them after they have been transformed by the VS and then subtract the vertices of two consecutive frames to calculate the screen movement. This is a viable proposition but then the issue of generating unique texture space should also be resolved.

    (92) Alternatively, the apparatus could take the difference between the input parameters and process those to end up with the difference between a vertex over two frames. The apparatus would thus store all the variables that are used in determining the transformation process of a vertex from the previous frame and then feed them into a new VS program at the same time as the variables used to transform the vertex of the current frame.

    (93) In frame 0 the apparatus stores the input variables. In the processing for frame 1, frame 0's variables plus frame 1's variables are input. The altered VS program is very similar to the original except that after having calculated the transformation of a vertex as was intended by the original game programmer, it has been altered so that it will also process the previous frame's vertex using variables stored in different registers. It will then go on to calculate the difference between these two vertices and output that result instead of a single, transformed vertex as was originally intended.
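
    The altered VS program of paragraph (93) may be sketched as follows. This is a minimal NumPy sketch standing in for the shader: reducing the per-frame “constants” to a single transformation matrix is an assumption for illustration only.

```python
import numpy as np

# Sketch of the altered VS: transform the same vertex twice, once with the
# previous frame's constants and once with the current frame's, and output
# the difference instead of a single transformed vertex.
def motion_vs(vertex, constants_prev, constants_curr):
    v = np.append(vertex, 1.0)   # homogeneous position
    p0 = constants_prev @ v      # transform as in frame 0
    p1 = constants_curr @ v      # transform as in frame 1
    return p1 - p0               # per-vertex difference

identity = np.eye(4)             # frame 0: object unmoved
shifted = np.eye(4)
shifted[0, 3] = 2.0              # frame 1: object moved +2 in x
print(motion_vs(np.array([1.0, 1.0, 1.0]), identity, shifted))
# -> [2. 0. 0. 0.]
```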

    (94) FIG. 7 shows a standard vertex shader 102. Note that the input vertex data 104 and vertex shader constants 106 could be in any order. The selection of which constants refer to color information, lighting, vertex positions, or otherwise is unknown. However, the outputs 108 are defined: oPos is the ‘screenspace’ position of the vertex, oD0 is the diffuse color, oD1 is the specular color, oT0 up to oT3 are four pairs of texture coordinates. oFog is the range of the fog and oPts is the point size for the point sprite processor.

    (95) In order to determine which inputs (“vertex shader constants”) and which instructions in the VS program are used to determine the new positions of the vertices, the process begins with the output, oPos and works backwards.

    (96) With reference to FIG. 8, in the graphics pipeline 200, a game program on the CPU 202 is shown with a DirectX wrapper 204 and storage for parameters 26. The vertex shader 208 outputs both transformed vertices and motion vectors 210.

    (97) With reference to FIG. 9, the steps 300 of the system are shown in processing the first frame, F0.

    (98) First 302, the primary (original) VS program is analysed by the DX wrapper and a new, associated program is created: the process analyses the original VS program, determines which of the instructions and input parameters deal purely with the position of a vertex, and creates a new VS program.

    (99) The original VS program is then loaded 304 onto the GPU.

    (100) Next 306, F0 data is loaded as parameters into the VS program and a copy is made. All the input parameters used by the original VS program that affect a vertex's position in F0 are stored.

    (101) A single vertex of the model is loaded and processed 308. One vertex of the model, transformed into screen space, is now available. (It is actually a 4-D vector in canonical device co-ordinates, also known as reciprocal homogeneous co-ordinates (RHC); this is converted to screen co-ordinates by a “stepper”.)

    (102) Finally 310: if more vertices of the model are left, go back to step 306.

    (103) With reference to FIG. 10, the steps 400 of the system are shown in processing the second, F1, and subsequent frames.

    (104) The original VS program is loaded 402 onto the GPU.

    (105) Next 404, F1 data is loaded as parameters into the VS program and a copy is made. All the input parameters used by the original VS program that affect a vertex's position in F1 are stored for processing of subsequent frames.

    (106) A single vertex of the model is loaded and processed 408.

    (107) The previously created new, associated VS program is loaded 410.

    (108) The input parameters relating to a vertex position from the previous frame, F0, and the parameters for the current frame, F1, are loaded 412 into the new VS program; the same vertex is then processed 414 using F0 data and then F1 data, and the difference is output. The new VS program processes these two frames' worth of vertices in one pass, subtracts the two results and outputs the difference through the normal output register of the VS.

    (109) If more vertices of the model are left 416, go back 418 to step 402.
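
    The flow of FIGS. 9 and 10 may be sketched as a per-frame loop, under the assumption that the shader programs can be modelled as plain functions and the per-frame parameters as a single value; all names here are illustrative.

```python
# Sketch of the per-frame flow: frame 0 only runs the original VS and
# stores its parameters; from frame 1 onwards each vertex is also run
# through the new VS with both frames' parameters to output the difference.
def process_frame(frame_no, vertices, params, stored, original_vs, motion_vs):
    outputs, motions = [], []
    for v in vertices:
        outputs.append(original_vs(v, params))           # render as intended
        if frame_no > 0:                                 # needs a previous frame
            motions.append(motion_vs(v, stored[-1], params))
    stored.append(params)                                # keep for the next frame
    return outputs, motions

stored = []
orig = lambda v, p: v + p                                # stand-in transform
diff = lambda v, p0, p1: orig(v, p1) - orig(v, p0)       # stand-in altered VS
out0, mv0 = process_frame(0, [1.0, 2.0], 10.0, stored, orig, diff)
out1, mv1 = process_frame(1, [1.0, 2.0], 13.0, stored, orig, diff)
print(mv1)  # -> [3.0, 3.0]
```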

    (110) Instead of loading and unloading two VS programs, one possibility is to join the two to make one big VS program. An input parameter or an internal variable is used to jump to the appropriate part of the program according to whether the motion vector is to be calculated or the image is to be rendered. Using one VS program halves the render state changes (i.e. the system can loop back 420 to step 404) and thus speeds things up slightly.

    (111) The rest of the rendering pipeline then continues as normal with the “Stepper” interpolating across a polygon, culling occluded pixels and handing over to the pixel shader for rendering into memory. The only difference is that instead of rendering into screen memory, the pipeline is instructed to render into a different memory location.

    (112) Optionally, the two transformed vertices may be output instead of the differences.

    (113) The whole of this process is called “parameterisation” and the determining of which parts of the existing VS are relevant to us is known as “register coloring”. These techniques are used in compiler theory.

    (114) The benefits are in using existing hardware and in avoiding having to recompile the game code. This is done by altering the existing game code, specifically by: using register coloring, in the context of modifying an existing game program, to ultimately determine “information useful for compression”; using the vertex shader to determine motion vectors; and determining which parameters influence the position transformation of a vertex and storing them for later processing. The register coloring step essentially reduces the number of input parameters to be stored and then fed into the new VS program, as there are only a limited number of input registers. It also allows the new VS program to be shortened by avoiding copying the instructions that determine lighting or coloring of vertices. It is not essential that this is done, but it would be wasteful without it, and the risk exists that the new VS program would be too large for the GPU or would need more input registers than the GPU possesses.

    (115) Vertices could be created dynamically by the game code and sent to the GPU in any order. Some way of recognising them between frames, so that the correct pair are subtracted, is required. This can be achieved by creating a signature for each vertex. The signature is made up of everything (or at least a number of things) used in rendering that vertex, for example which VS program is used to process it.
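
    A vertex signature of this kind may be sketched as a hash over the data used to render the vertex; the particular fields hashed here (VS program identifier, model-space position, other attributes) are an assumption for illustration.

```python
import hashlib

# Sketch: hash everything used in rendering a vertex so the same vertex can
# be recognised between frames even if vertices arrive in a different order.
# Model-space data is used because it is stable from frame to frame.
def vertex_signature(vs_program_id, model_position, attributes):
    payload = repr((vs_program_id, model_position, sorted(attributes.items())))
    return hashlib.sha256(payload.encode()).hexdigest()

sig_f0 = vertex_signature("vs_main", (1.0, 2.0, 3.0), {"bone": 4})
sig_f1 = vertex_signature("vs_main", (1.0, 2.0, 3.0), {"bone": 4})
print(sig_f0 == sig_f1)  # -> True
```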

    (116) Current GPU hardware cannot store post-transformed vertices, so if there are any complex special effects that require multiple passes, the vertex transformations will have to be calculated each time. If it is really complex then the programmer may opt to do those transformations on the CPU instead.

    (117) Optionally, a cache may be used to store post-transformed vertices between two or more frames (for use in determining motion vectors).

    (118) Normally, a single pass through a VS processes only a single vertex. A GPU may have several VSes that work in parallel, but they all use the same render target. To calculate the difference between two vertices would mean swapping the render target. The normal process would be to load vertex 1, run the original VS, load and run the modified VS program, then load the original VS program again, load vertex 2, and so on. Swapping targets and render states are overheads, and doing so for every vertex would slow the overall processing considerably.

    (119) To minimise the delay, all the input details for that complete object may be stored and arranged so that the render target is changed only once before the vertices are processed. This would reduce both the number of render state changes and render target changes.

    (120) Another way of reducing state changes would be to have a single VS program. The process could combine the original VS program with the new part to make a larger program and then use one of the input constants as a switch, so the single new VS knows which part of its code to process: the original or the new addition. The GPU can only render one pixel at a time, so both VS programs may not be processed at the same time. Thus, there may be more than one VS program.

    (121) Furthermore, instead of having two distinct render targets, the process could have just one that is twice as large as the screen and simply direct the desired image to one half and the difference-between-vertices data to the other half. This would also make it possible to use only one GPU instead of two. Alternatively, the process could embed the two pieces of information in a single image by restricting the rendered image to one range of colors and the difference-between-vertices data to another range.

    (122) For parameterisation to work, it is assumed that there is enough space for the altered VS program and that there are enough registers to pass in two sets of variables. That will largely depend on the VS program and the capabilities of the graphics chip. However, there are things that can be done to ensure that the process does not run into problems.

    (123) The first is that when the host game program interrogates the device driver of the GPU to ask what capabilities it has, the process can state that the GPU has a smaller VS program store and fewer registers to store variables than that GPU actually supports.

    (124) At any one point in time there is a large variation in the graphics cards used by end consumers, and games software must therefore cope with a variety of different capabilities. It is thus common practice for a game to check the facilities offered by a GPU before trying to invoke one. By stating that the GPU is of lower sophistication, the chances of breaking the game are lessened. Instead, the game would adapt by either dropping that graphical feature or by passing that set of commands to the host CPU instead. The result may only be a minor diminishing of the graphical special effects rather than a break-down of the games program altogether.
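
    The capability under-reporting of paragraphs (123) and (124) may be sketched as a clamp applied by the wrapper when the game queries the device; the capability names and reserve sizes here are purely illustrative.

```python
# Sketch: report a smaller VS program store and fewer constant registers
# than the GPU really has, reserving headroom for the added motion-vector
# code and its extra set of input parameters.
def clamp_caps(real_caps, reserve_instructions=32, reserve_registers=16):
    return {
        "max_vs_instructions": real_caps["max_vs_instructions"] - reserve_instructions,
        "max_vs_constants": real_caps["max_vs_constants"] - reserve_registers,
    }

real = {"max_vs_instructions": 128, "max_vs_constants": 96}
print(clamp_caps(real))  # -> {'max_vs_instructions': 96, 'max_vs_constants': 80}
```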

    (125) As stated earlier, as well as doing vertex transformation calculations, a VS also deals with aspects such as lighting. Some of the VS program and some registers would therefore be used in a way that has no effect on the motion of pixels on the screen. To reduce the number of registers needed, and to make the modified version of the VS program run faster, the existing VS program is pre-analysed to determine which parts are relevant.

    (126) Sorting out what is and is not relevant is not straightforward without prior knowledge of the game programmer's intentions, as the input registers are not segregated. However, in normal operation the VS program will output a transformed vertex using a pre-designated register (which will then be used as the input to another stage in the overall graphics pipeline). Armed with that knowledge, the process can work backwards through the VS program and identify every instruction and memory location that has an influence on a vertex co-ordinate, thus building a “dependency graph”. From this the process can identify the relevant instructions and input registers. This is a well-known problem in compiler theory, called “register coloring”. The process can then strip out any unwanted instructions.
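
    The backwards walk from oPos may be sketched as a simple dependency analysis over a list of (destination, sources) instructions; the register names and the flat instruction format are assumptions for illustration.

```python
# Sketch of building the "dependency graph": walk the VS program backwards
# from the position output, keeping every instruction whose destination is
# (transitively) needed and discarding lighting/colour-only instructions.
def relevant_instructions(program, target="oPos"):
    needed, keep = {target}, []
    for dest, srcs in reversed(program):  # program: list of (dest, [sources])
        if dest in needed:
            keep.append((dest, srcs))
            needed.update(srcs)
    return list(reversed(keep))

program = [
    ("r0", ["v0", "c0"]),   # position transform
    ("r1", ["v1", "c4"]),   # lighting only
    ("oD0", ["r1"]),        # diffuse colour output
    ("oPos", ["r0"]),       # position output
]
print(relevant_instructions(program))
# -> [('r0', ['v0', 'c0']), ('oPos', ['r0'])]
```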

    (127) A GPU may also support a fixed graphics pipeline as well as the programmable type described above. For the techniques described to work, the process would first have to produce PS and VS programs that mimic the instructions that would have used the fixed pipeline. These would then replace the original fixed pipeline instructions before applying the motion vector generation techniques of the invention.

    (128) Methods are well known to produce a single block-based motion vector from a group of pixel-based motion vectors and to decide between inter and intra coding.
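
    One such well-known reduction is sketched below, assuming a component-wise median over the per-pixel vectors of a 16x16 block; the choice of median is an illustrative robust option rather than the method prescribed by any particular encoder.

```python
import numpy as np

# Sketch: reduce the per-pixel motion vectors of one block to a single
# block-based motion vector using the component-wise median, which is
# robust to a few outlier pixels.
def block_motion_vector(pixel_mvs):
    return np.median(pixel_mvs.reshape(-1, 2), axis=0)

mvs = np.tile(np.array([2.0, -1.0]), (16, 16, 1))  # uniform motion field
mvs[0, 0] = [9.0, 9.0]                             # one outlier pixel
print(block_motion_vector(mvs))  # -> [ 2. -1.]
```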

    (129) This embodiment of the present invention has the following features: using the post-transformed vertices of two frames to calculate motion vectors of pixels on the screen; doing so by first storing them in memory in such a way that the vertices no longer overlap and then subtracting the two sets of vertices from each other to get the differences; using the vertex shader to process the differences in the input variables in order to generate the difference between vertices over two frames (the “vertex shader” is a type of “programmable pipeline”, as is a “pixel shader”); using the vertex shader and/or pixel shader to determine the differences in color and/or luminance over two frames; using a larger render target so that the real image and the differences between vertices may be stored in the same “render target”; recording input parameters for later processing so the process does not need to keep swapping render targets; analysing the vertex shader program and altering it so as to isolate only those instructions and data that influence the vertex positions themselves; lying to the host games program when it interrogates the GPU for its capabilities; using the above techniques within one or more than one VS/PS/GPU (i.e. hacking the existing code in a single GPU, running a new processing cycle with the new code, or using another GPU altogether to run the new code); and extrapolating the differences between vertices over the surfaces of the polygons so as to get “motion vectors” that correspond with the screen pixels.

    (130) Differences do not have to be between two consecutive frames; they could be between any two frames.

    (131) GPUs can have more than one VS and/or PS.

    (132) Instead of polygons, a game may describe an object using such techniques as “Bezier patches” or “n patches” meaning that mathematical functions are used to describe 3D surfaces instead. Instead of tracking discrete vertices, the process may therefore have to track the input variables to higher-level shape description functions.

    (133) The techniques of the present invention could be used for 2D objects such as billboards, not just 3D ones.

    (134) Optionally, the method may be modified to encompass the use of tags with vertices so that it is easier to relate the same vertex between two frames. Some games use dynamically generated vertices to describe scenes. As it is not certain that these will be produced by the game in the same order, the process cannot be sure, when determining the difference between two vertices, that it is in fact dealing with the same point of the same object in both cases.

    (135) Optionally, instead of tracking all the vertices the process could track a lower resolution model such as a simple bounding box or even just two points in space that represent the extremes of the z positions of an object. The process could then apply the transformation functions to these points (maybe in the CPU) and simply extrapolate those post-transformed results to the pre-transformed vertices of the object.

    (136) An advantage of the present invention is that it exploits the fact that with synthetic images there is, by the particular methods and means provided by the invention, access to information relating to the images prior to their rendering, and that this information can facilitate the compression process. By processing this information in parallel with the rendering of the image, the speed of compression can be improved and hence the latency of the system reduced.

    (137) Further modifications and improvements may be added without departing from the scope of the invention herein described.