DEVICES AND METHODS FOR GENERATING ELEMENTARY GEOMETRIES
20180005427 · 2018-01-04
Assignee
Inventors
CPC classification
G06F8/45
PHYSICS
International classification
G06T1/20
PHYSICS
Abstract
Elementary geometries for rendering objects of a 3D scene are generated from input geometry data sets. Instructions of a source program are transformed into a code executable in a rendering pipeline by at least one graphics processor, by segmenting the source program into sub-programs, each adapted to process the input data sets, and by ordering the sub-programs as a function of the instructions. Each ordered sub-program is configured in the executable code for being executed only after the preceding sub-program has been executed for all input data sets. Launching the execution of instructions to generate elementary geometries includes determining among the sub-programs a starting sub-program, deactivating all sub-programs preceding it, and activating it as well as all sub-programs following it. Modularity is thereby introduced in generating elementary geometries, allowing time-efficient lazy execution of grammar rules.
Claims
1. An execution pipeline device comprising at least one graphics processor configured to launch the execution of instructions adapted to generate elementary geometries usable for rendering at least one object of a 3D scene, from input geometry data sets, said instructions being grouped into at least two ordered sub-programs, each comprising a part of said instructions and being adapted to process said input geometry data sets according to rules associated with a node of a dataflow graph, and each of said sub-programs that follows a preceding of said sub-programs being arranged for being executed only after said preceding sub-program has been executed for all said input geometry data sets, said execution pipeline device further comprising a transform feedback module implementing a transform feedback mechanism configured to associate each of said ordered sub-programs with a respective Vertex Buffer Object in order to execute each of said sub-programs that follows a preceding sub-program only after said preceding sub-program has been executed for all said input geometry data sets, and an intermediate memory element being associated with each of said sub-programs that follows a preceding of said sub-programs, the intermediate memory element storing intermediate elementary geometries obtained by the execution of the associated sub-program.
2. The execution pipeline device according to claim 1, wherein said at least one processor is further configured to: determine among said ordered sub-programs, a starting sub-program from which execution is needed, deactivate all sub-programs preceding said starting sub-program, activate said starting sub-program and all sub-programs following said starting sub-program.
3. The execution pipeline device according to claim 1, wherein said at least one processor is further configured to determine said starting sub-program for one frame when said sub-programs involve no active feedback loop: when said frame is a first frame, by setting said starting sub-program to the first of said sub-programs, when said frame is a special frame for which at least one parameter associated with at least one of said sub-programs is modified on-the-fly, by setting said starting sub-program to the first of said at least one associated sub-program, when said frame is a frame different from said first frame and from said special frame, by setting said starting sub-program to the last of said sub-programs, associated with rendering said at least one object of a 3D scene.
4. The execution pipeline device according to claim 1, wherein said at least one processor is further configured to determine said starting sub-program for one frame when said sub-programs involve at least one active feedback loop from at least one next sub-program to at least one previous sub-program among said sub-programs: when said frame is a first frame, by setting said starting sub-program to the first of said sub-programs, when said frame is a special frame for which at least one parameter associated with at least one of said sub-programs is modified on-the-fly and said at least one associated sub-program is preceding the first of said at least one previous sub-program, by setting said starting sub-program to the first of said at least one associated sub-program, when said frame is a frame different from said first frame and from said special frame, by setting said starting sub-program to the first of said at least one previous sub-program corresponding to said at least one active feedback loop.
5. A method of executing instructions adapted to generate elementary geometries usable for rendering at least one object of a 3D scene, from input geometry data sets, said instructions being grouped into at least two ordered sub-programs, each comprising a part of said instructions and being adapted to process said input geometry data sets according to rules associated with a node of a dataflow graph, and each of said sub-programs that follows a preceding of said sub-programs being arranged for being executed only after said preceding sub-program has been executed for all said input geometry data sets, said method comprising implementing a transform feedback mechanism configured to associate each of said ordered sub-programs with a respective Vertex Buffer Object in order to execute each of said sub-programs that follows a preceding sub-program only after said preceding sub-program has been executed for all said input geometry data sets, and an intermediate memory element being associated with each of said sub-programs that follows a preceding of said sub-programs, the intermediate memory element storing intermediate elementary geometries obtained by the execution of the associated sub-program.
6. The method according to claim 5, further comprising: determining among said ordered sub-programs, a starting sub-program from which execution is needed, deactivating all sub-programs preceding said starting sub-program, activating said starting sub-program and all sub-programs following said starting sub-program.
7. The method according to claim 5, further comprising determining said starting sub-program for one frame when said sub-programs involve no active feedback loop: when said frame is a first frame, by setting said starting sub-program to the first of said sub-programs, when said frame is a special frame for which at least one parameter associated with at least one of said sub-programs is modified on-the-fly, by setting said starting sub-program to the first of said at least one associated sub-program, when said frame is a frame different from said first frame and from said special frame, by setting said starting sub-program to the last of said sub-programs, associated with rendering said at least one object of a 3D scene.
8. The method according to claim 5, further comprising determining said starting sub-program for one frame when said sub-programs involve at least one active feedback loop from at least one next sub-program to at least one previous sub-program among said sub-programs: when said frame is a first frame, by setting said starting sub-program to the first of said sub-programs, when said frame is a special frame for which at least one parameter associated with at least one of said sub-programs is modified on-the-fly and said at least one associated sub-program is preceding the first of said at least one previous sub-program, by setting said starting sub-program to the first of said at least one associated sub-program, when said frame is a frame different from said first frame and from said special frame, by setting said starting sub-program to the first of said at least one previous sub-program corresponding to said at least one active feedback loop.
9. A graphics card comprising the execution pipeline device according to claim 1.
10. A computer program product comprising program code instructions to execute the method according to claim 5, when this program is executed on a computer.
11. A non-transitory processor readable medium having stored therein instructions for causing a processor to perform the method according to claim 5.
Description
4. LIST OF FIGURES
[0143] The present disclosure will be better understood, and other specific features and advantages will emerge upon reading the following description of particular and non-restrictive illustrative embodiments, the description making reference to the annexed drawings wherein:
5. DETAILED DESCRIPTION OF EMBODIMENTS
[0161] The present disclosure will be described in reference to a particular hardware embodiment of a graphics processing device, as diagrammatically shown on
[0162] The apparatus 1 corresponds for example to a personal computer (PC), a laptop, a tablet, a smartphone or a games console—in particular a specialized games console producing and displaying images live.
[0163] The apparatus 1 comprises the following elements, connected to each other by a bus 15 of addresses and data that also transports a clock signal:
[0164] a microprocessor 11 (or CPU);
[0165] a graphics card 12 comprising:
[0166] several Graphical Processor Units (or GPUs) 120,
[0167] a Graphical Random Access Memory (GRAM) 121;
[0168] a non-volatile memory of ROM (Read Only Memory) type 16;
[0169] a Random Access Memory or RAM 17;
[0170] one or several I/O (Input/Output) devices 14 such as for example a keyboard, a mouse, a joystick, a webcam; other modes for introduction of commands such as for example vocal recognition are also possible;
[0171] a power source 18; and
[0172] a radiofrequency unit 19.
[0173] The apparatus 1 also comprises a display device 13 of display screen type, directly connected to the graphics card 12 by a bus 130, to display synthesized images calculated and composed in the graphics card, for example live. The use of the dedicated bus 130 to connect the display device 13 to the graphics card 12 offers the advantage of having much greater data transmission bitrates and thus reducing the latency time for the displaying of images composed by the graphics card. According to a variant, a display device is external to the device 1 and is connected to the apparatus 1 by a cable or wirelessly for transmitting the display signals. The apparatus 1, for example the graphics card 12, comprises an interface for transmission or connection adapted to transmit a display signal to an external display means such as for example an LCD or plasma screen or a video-projector. In this respect, the RF unit 19 can be used for wireless transmissions.
[0174] According to a variant, the power supply 18 is external to the apparatus 1.
[0175] It is noted that the word “register” used in the description of memories 121, 16, and 17 designates, in each of the memories mentioned, both a memory zone of low capacity (some binary data) and a memory zone of large capacity (enabling a whole program to be stored, or all or part of the data representative of data calculated or to be displayed).
[0176] When switched-on, the microprocessor 11 loads and executes the instructions of the program contained in the RAM 17.
[0177] The random access memory 17 notably comprises:
[0178] in a register 170, the operating program of the microprocessor 11 responsible for switching on the apparatus 1,
[0179] in a register 171, a source code comprising instructions for procedural geometry generation, written in a high-level programming language called language A,
[0180] in a register 172, a main compiler noted compiler A, configured to translate the source code of register 171 into a language for execution in the graphics card 12, called language B, the latter being for example GLSL or HLSL,
[0181] in a register 173, an auxiliary compiler or sub-compiler, noted compiler B, configured to translate sub-programs derived from the source code and expressed in language B, into machine language directly interpretable by the graphics card 12,
[0182] in a register 174, sub-programs extracted from the source code,
[0183] in a register 175, a pipeline launcher able to launch a rendering pipeline constructed in the graphics card 12 from the source code,
[0184] in a register 176, runtime parameters exploited in the rendering pipeline constructed in the graphics card 12,
[0185] in a register 177, parameters representative of the scene (for example modelling parameters of the object(s) of the scene, lighting parameters of the scene).
[0186] The algorithms carrying out geometry generation, described hereafter in relation with the rendering pipeline, are stored in the memory GRAM 121 of the graphics card 12. When switched on, and once the parameters representative of the environment and the runtime parameters are loaded into the RAM 17, the graphics processors 120 of the graphics card 12 load those parameters into the GRAM 121 and execute the instructions of these algorithms in the form of microprograms of “shader” type, using HLSL or GLSL languages for example. A shader is a program designed to run on some stage of a graphics processor, within a rendering pipeline.
[0187] The random access memory GRAM 121 comprises notably:
[0188] in a register 1211, the parameters representative of the scene,
[0189] in a register 1212, the runtime parameters,
[0190] in a register 1213, the rendering pipeline or execution pipeline, having a generation part obtained as described below,
[0191] in a register 1214, VBOs exploited in the pipeline stored in the register 1213,
[0192] in a register 1215, intermediate buffers and rendering buffer exploited in the pipeline stored in the register 1213.
[0194] The CPU 11 includes notably:
[0195] a module 111 for segmenting an input source code into two or more sub-programs, to be exploited in an execution pipeline of the graphics card 12,
[0196] a module 112 for ordering the extracted sub-programs in a sequential order, used in the execution pipeline,
[0197] a module 113 for synchronizing the sequential execution of the sub-programs in the graphics card 12,
[0198] a module 114 for compiling the sub-programs extracted from the source code, called the sub-compiling module 114,
[0199] a module 115 for generating a geometry generation part of the execution pipeline to be run in the graphics card 12,
[0200] a module 116 for generating a pipeline launcher, exploited for launching the execution pipeline,
[0201] a module 117 for pipeline launching, adapted to launch the execution pipeline in particularly efficient ways compliant with the present disclosure.
[0202] The graphics processors or GPUs 120 of the graphics card 12 are more detailed with reference to
[0203] Each GPU 120 thus includes two main parts associated with rendering pipelines: an upstream part 20 dedicated to geometry generation stages before rasterization; and a downstream part 21 dedicated to rasterization and subsequent fragment stages. In operation, these parts are applied to one or several input geometries, corresponding respectively to surfaces to be rendered, generally called patches. For rendering purposes, each input geometry may be subdivided into several elementary geometries. An input geometry corresponds typically to a quadrilateral or a square, but may be any kind of geometric surface, such as e.g. a triangle.
[0204] The upstream part 20 comprises a vertex shader 201, which constitutes a first programmable stage associated with the rendering pipeline, handling the processing of each vertex of the input geometries. The vertex shader 201 is implemented as a microprogram comprising instructions for processing each vertex. For each input vertex, associated with user-defined attributes such as e.g. its position, a normal vector and texture coordinates, the vertex shader 201 outputs in operation an output vertex to the next stage of the rendering pipeline. That output vertex is associated with user-defined output attributes, including for example the user-defined input attributes and more, e.g. the binormal estimated from the tangent and the normal corresponding to a vertex. The vertex shader 201 processes vertices independently, meaning that it processes each vertex without any information about the other vertices—there is accordingly a 1:1 mapping from input vertices to output vertices. For the sake of illustration, if the input geometry corresponds to a quadrilateral, the vertex shader 201 processes four independent input vertices and outputs four independent output vertices, transmitted to the next stage for further processing.
[0205] The next stage of the upstream part 20 is a tessellation shader 202, which takes the vertices output from the vertex shader 201, assembles them into primitives, and tessellates the latter. The primitives (or geometric primitives) are the simplest geometric objects handled, or elementary geometries, obtained from the conversion of vertices by graphics language APIs such as OpenGL or Direct3D.
[0206] The tessellation stage 202 itself comprises three sub-stages or shaders: the tessellation control 203, the tessellator 204 and the tessellation evaluation 205. The tessellation control shader 203 receives an array with the vertices of the input patch. It is activated for each vertex and computes the attributes for each of the vertices that make up the output patch, also stored in an array. In a variant, if some of the attributes associated with a patch are identical (e.g. same colour for each vertex of a patch), the common attribute(s) is/are associated with the patch, which enables the amount of information to be transmitted to be reduced—namely, one piece of information associated with the patch instead of multiple identical pieces of information associated respectively with multiple vertices. The tessellation control shader 203 is also in charge of associating attributes with the output patch, these attributes defining the subdivision degree of the patch. For example, a patch corresponding to a rectangle may be subdivided into i×j quads, i and j being integers between 1 and 64, 128 or 256; or into 2×i×j triangles. The higher the subdivision degree (i.e. the bigger i and j), the smoother the surface to be rendered, but also the higher the computation needs—so that an appropriate trade-off needs to be adopted. The subdivision degrees of each side of the patch and of the interior of the patch are controlled by tessellation levels, whose values typically lie between 0 and 64. For example, there are 4 outer tessellation levels (one for each side) and 2 inner tessellation levels for a patch being a quad, and there are 3 outer tessellation levels (one for each side) and 1 inner tessellation level for a patch being a triangle.
At the output of the tessellation control stage 203, a set of vertices with attributes and a set of tessellation levels associated with the patch (corresponding to the input geometry) are produced and transmitted to the tessellator 204.
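The quad and triangle counts of the subdivision above can be sketched in a few lines (an illustrative Python sketch, not part of the disclosure; the function name is ours):

```python
def tessellated_primitive_count(i: int, j: int, as_triangles: bool = False) -> int:
    """Count the elementary geometries produced when a rectangular patch is
    subdivided into i x j quads, or into 2 * i * j triangles.

    i and j are the subdivision degrees along the two patch axes (typically
    integers between 1 and 64, 128 or 256, per the description above).
    """
    if i < 1 or j < 1:
        raise ValueError("subdivision degrees must be at least 1")
    quads = i * j
    # each quad may equivalently be rendered as two triangles
    return 2 * quads if as_triangles else quads
```

For instance, an 8×8 subdivision yields 64 quads or 128 triangles, illustrating how quickly computation needs grow with the subdivision degree.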
[0207] The tessellator 204 (also called TPG, i.e. Tessellation Primitive Generator) corresponds to the next sub-stage of the tessellation stage 202. It is responsible for generating primitives as a function of the input geometry (the patch) and of the tessellation levels set by the tessellation control shader 203. A primitive being defined by its vertices, the tessellator 204 is responsible for the generation of new vertices inside the patch, attributes such as tessellation coordinates being associated with each new vertex. The number of primitives generated by the tessellator stage 204 depends directly on the tessellation levels set at the previous stage.
[0208] The tessellation evaluation shader 205, corresponding to the last sub-stage of the tessellation stage 202, is activated for each newly created vertex, and is responsible for placing an input vertex as a function of the tessellation coordinates and possibly other parameters—such as e.g. a displacement map.
[0209] The output of the tessellation stage 202, in the form of the generated primitives, is transmitted to the geometry shader 206. The latter, which constitutes the next stage of the upstream part 20 associated with the rendering pipeline, has access to all the vertices that form every received primitive. It governs the processing of primitives as a function of instructions comprised in this shader 206. Advantageously, those instructions include generating an index value to be assigned to each primitive, as a function of the attributes (for example coordinates) associated with at least two vertices of the primitive, as detailed in previous patent application WO 2013/104448 A1 to Thomson Licensing cited above.
[0210] A next functional block is the Transform Feedback 207, in charge of retrieving the primitives generated at the previous steps and of recording them in Buffer Objects. This enables those primitives to be re-used subsequently any number of times, by resubmitting the same post-transform data. Buffer Objects are generally adapted to store arrays of unformatted memory in GRAM 121, and are used for storing vertex data, as well as pixel data retrieved from images or the framebuffer. More specifically, the data obtained from the chain of vertex, tessellation and geometry shaders 201, 202 and 206 are stored in VBOs (Vertex Buffer Objects), dedicated to vertex array data. As will be apparent below, those VBOs play a determining role in ensuring synchronization in the preferred embodiments described in the present disclosure.
[0211] The outputs of the geometry shader 206 are also transmitted to the downstream part 21 associated with rendering pipelines. They are submitted as inputs to a rasterizer 210, which is responsible for breaking down each individual primitive into discrete elements, based on the data associated with the primitive. This amounts to turning the graphics format governing the primitives into pixels or dots for output on the display device 13 or other output devices, or for storage in a bitmap file format. More precisely, the rasterizer 210 produces fragments, each of which represents a sample-sized segment of a rasterized primitive at the scale of a pixel—namely, the size covered by a fragment corresponds to a pixel area. In this respect, interpolating operations are carried out for the fragments in order to compute data values between vertices—attributes being computed for each pixel based on the vertex attributes and the pixel's distance to each vertex screen position.
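The per-fragment interpolation just described is conventionally realized with barycentric weights, which depend on the pixel's position relative to the three vertex screen positions. A minimal CPU-side sketch (illustrative Python, not shader code; function names are ours):

```python
def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    denom = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / denom
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / denom
    return wa, wb, 1.0 - wa - wb

def interpolate_attribute(p, tri, values):
    """Attribute value at pixel position p, blended from per-vertex values."""
    wa, wb, wc = barycentric_weights(p, *tri)
    return wa * values[0] + wb * values[1] + wc * values[2]
```

At the centroid of a triangle the three weights are equal, so the interpolated attribute is simply the average of the three vertex attributes.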
[0212] A fragment shader 211 of the second part 21 is in charge of processing the fragments generated by the rasterizer 210. The outputs of that fragment shader 211 include colours, depth values and stencil values. More precisely, the fragment shader 211 is adapted to process each fragment so as to obtain a set of colours (e.g. RGB values) and a depth value exploited for example by a z-buffer technique. If a fragment is visible from a given point of view, the colour attributes corresponding to the fragment are then associated with the pixel of the rendered synthesis image.
[0214] The CPU 11 comprises a compiler A, referenced 372, configured to translate a source code 371 (recorded in register 171), directed to generating elementary geometries from input geometry data sets, from language A into language B, so as to produce the execution pipeline 32 in the graphics card 12. Compiler 372 is adapted to split source code 371 into sub-programs 374 (stored in register 174) upon explicitly user-defined instructions in source code 371, defining which rules belong to each sub-program. It is also adapted to proceed with such a splitting when a synchronization step is required, which is for instance the case when a grammar rule has two or more predecessors. The way to force synchronization is then to create a new sub-program starting with this rule.
[0215] As specified above in the summary, in a variant, compiler 372 is able to split source code 371 into sub-programs 374 on the basis of internal program analysis relying on heuristics.
[0216] Sub-programs 374 further define the nodes of a dataflow graph, having edges corresponding to the sub-program execution dependencies and a root node corresponding to the set of rules containing the grammar axiom rule, i.e. the start rule. In the general situation, this dataflow graph is an RCDG. Compiler 372 is configured to create for each node of the dataflow graph: a translation of sub-programs 374 in language B, an expression-instantiated language A interpreter corresponding to the sub-program source code, an adequate number of intermediate buffers for storing intermediate primitives obtained by executing sub-programs, and a Transform Feedback mechanism able to order the sequential sub-program execution at runtime—as will be illustrated below.
[0217] Compiler 372 is also configured to provide associated runtime parameters 376 to the execution pipeline 32, and to generate a pipeline launcher 375 (stored in register 175) responsible for launching the execution pipeline 32—and more precisely the geometry generation part thereof—in the graphics card 12. The runtime parameters 376 are preferably user-controlled.
[0218] Compiler A 372 is completed with auxiliary compiler B, referenced 373, configured to translate sub-programs 374 derived from the source code 371 and expressed in language B, into the machine language directly interpretable by the graphics card 12. Compilers 372 and 373 and pipeline launcher 375 are functional entities, which do not necessarily correspond to physically separated units. In particular, they can be implemented in one or several integrated circuits within the CPU 11. In relation with previously described
[0219] In operation, compiler A 372 turns the source code 371 into a set 31 of decorated parse trees or abstract syntax trees (AST) and derives from them sub-programs 374, which in the example are the three sub-programs P1, P2 and P3. Those sub-programs 374 in language B are then compiled to machine language by compiler B 373 and associated with language A interpreters respectively associated with those sub-programs 374 and created by compiler A 372, so as to form part of the execution pipeline 32.
[0220] As specified in the summary part, the creation of the interpreters is preferably based on the technique described in patent application WO 2013/104504 A1 to Thomson Licensing.
[0221] In addition, compiler A 372 orders the sub-programs 374 and creates the generation part of the execution pipeline 32 (corresponding to the upstream part 20 on
[0222] Quite significantly in the present disclosure, the derived sub-programs 374 are such that in the execution pipeline 32, each sub-program can be executed only after a preceding sub-program has been executed for all the input geometry data sets.
[0223] This will be made clearer through a first detailed example illustrated on
[0244] The instructions SetInput and SetOutput in the source code 371 are compiler directives, which are only used by compiler 372 and are not interpreted at runtime.
[0245] In a variant implementation concerning the Split instruction, the second parameter is a ratio between 0 and 1 indicating the relative length of the two parts along the split axis. In another variant, the number of split parts is greater than 2, so that the subdivision along the split axis leads to at least 3 parts, indicated by 2 or more second parameters instead of 1.
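The ratio-based Split variants can be sketched as follows (an illustrative Python sketch; the function name and its exact conventions, e.g. that the last part receives the remainder, are our assumptions, not the disclosure's):

```python
def split_lengths(total: float, *ratios: float):
    """Lengths of the parts obtained by splitting an extent of the given
    total length along the split axis.

    Each ratio (strictly between 0 and 1) gives the relative length of one
    part; the final part receives the remainder.  One ratio yields a two-way
    split; two or more ratios yield a split into three or more parts.
    """
    if not all(0.0 < r < 1.0 for r in ratios):
        raise ValueError("ratios must lie strictly between 0 and 1")
    if sum(ratios) >= 1.0:
        raise ValueError("ratios must leave a positive remainder")
    parts = [total * r for r in ratios]
    parts.append(total - sum(parts))  # remainder forms the last part
    return parts
```

For instance, splitting an extent of length 10 with ratios 0.2 and 0.3 yields three parts of relative lengths 0.2, 0.3 and 0.5.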
[0246] In the illustrative source code 371 above, all relevant instructions are explicitly user-defined for segmenting the latter into sub-programs P1, P2 and P3 of
[0247] As concerns the segmenting, this is made by explicitly stating U, V and W and by assigning them intermediate buffers b0, b1 and b2.
[0248] As regards the ordering of sub-programs, this is made by specifying the relationship between the rules by means of the intermediate buffers b0, b1 and b2. Accordingly, in the source code 371:
[0249] b0 is an output buffer for rule U2 and an input buffer for rule V;
[0250] b1 is an output buffer for rule U3 and an input buffer for rule W;
[0251] b2 is an output buffer for rule V1 and an input buffer for rule W.
[0252] Consequently, it results from the b0 information that sub-program P1 comes prior to sub-program P2; from the b1 information that sub-program P1 comes prior to sub-program P3; and from the b2 information that sub-program P2 comes prior to sub-program P3. Namely, the ordered sub-programs P1, P2 and P3 constitute a sequential chain—without any branch or cycle.
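The derivation of these precedence constraints from the buffer roles can be sketched as follows (illustrative Python; the sub-program and buffer names come from the example above, the function names are ours):

```python
# Producer/consumer roles of the intermediate buffers: each entry maps a
# buffer to (sub-program writing it, sub-program reading it), as stated
# for b0, b1 and b2 in the example source code.
buffer_roles = {"b0": ("P1", "P2"), "b1": ("P1", "P3"), "b2": ("P2", "P3")}

def precedence_constraints(roles):
    """A buffer written by one sub-program and read by another imposes a
    writer-before-reader ordering constraint."""
    return {(writer, reader) for writer, reader in roles.values()}

def is_valid_order(order, constraints):
    """Check that a sequential ordering satisfies every constraint."""
    pos = {p: k for k, p in enumerate(order)}
    return all(pos[w] < pos[r] for w, r in constraints)
```

With these roles, the chain P1, P2, P3 is the unique valid sequential order, matching the text above.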
[0253] On
[0254] As visible on
[0255] As concerns AST 312 (
[0256] The last AST 313 (
[0257] The outputs of the T nodes 3112, 3122 and 3132 comprise ready-to-render primitives, and can be made available for rendering before the last AST 313 by being stored into rendering buffer 315 br.
[0258] With reference to
[0259] The generated execution pipeline 32, which is ordered and task-sequential, is illustrated on
[0260] The use of VBOs is a significant aspect for synchronization in the present disclosure, and is visible on
[0265] Synchronization is ensured from a sub-program interpreter to the next by means of the Transform Feedback mechanism associated with the VBOs 321. Indeed, the seeds 33 become available to the next sub-program interpreter only when all of them have been processed by the previous sub-program interpreter. Thereby, data parallelism can be kept.
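The barrier semantics of this chaining can be illustrated with a minimal CPU-side sketch (illustrative Python, all names ours; on the GPU the buffer hand-over is performed by the Transform Feedback mechanism, not by a loop): each pass stands for one sub-program interpreter, and the list standing in for a VBO is handed to the next pass only once every seed has been processed.

```python
def run_generation_passes(seeds, sub_programs):
    """Execute ordered sub-programs with transform-feedback-style
    synchronization: pass n+1 reads the whole buffer written by pass n,
    so it starts only once pass n has processed every seed (the VBO
    acting as the barrier between passes)."""
    buffer = list(seeds)
    for prog in sub_programs:
        # one generation pass: prog is applied to every element (in
        # parallel in principle) and outputs are collected into the
        # buffer consumed by the next pass
        buffer = [out for item in buffer for out in prog(item)]
    return buffer

# toy grammar rule: split each elementary geometry into two halves
halve = lambda q: [q / 2.0, q / 2.0]
```

Three halving passes applied to a single seed thus produce eight elementary geometries, mirroring the three intermediate passes of the first example.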
[0266] Therefore, the execution of the pipeline 32 comprises three intermediate passes 331, noted Pass 1, Pass 2 and Pass 3 and respectively associated with sub-programs P1, P2 and P3, and then a rendering pass 332 following the availability of the output primitives in VBOrender.
[0267] In more complex situations than in the previous first example, compiler 372 has to deal with RCDGs including branches and/or cycles. Now, the nodes of an RCDG are ordered to ensure the execution dependencies in agreement with the present disclosure, a node execution being associated with a GPU generation pass. Preferred embodiments for proceeding with such situations with branches or cycles are described below.
[0268] As specified above in the summary part, a preferred implementation consists in exploiting a topological sorting algorithm detecting and breaking cycles, derived from EP-2779101 A1 to Thomson Licensing—the following explanations being partly retrieved from the related disclosure. This is illustrated on
[0269] Clusters of nodes to be evaluated, defined from graph 40, form a flow vector 42—they consist of five clusters (noted 1st, 2nd, 3rd, 4th and 5th on
[0270] First, a dependency counter is associated with each node, so as to assign the nodes to cluster(s) as a function of information representative of the dependencies existing between the nodes in the graph. The dependency counter is advantageously first initialized to the number of direct predecessors of that node. Then, all the nodes having no predecessor (i.e. a null dependency counter) are placed in the first cluster of the flow vector. These nodes have no dependency and are ready to be evaluated. Each insertion of a node in a cluster of the flow vector decrements the dependency counter of all its direct successors. For example, the insertion of the nodes in the first cluster decrements by one the dependency counter associated with direct successors of the nodes assigned to the first cluster. All nodes having an updated dependency counter equal to zero are then assigned to a second cluster, and the dependency counters of their direct successors in the graph are each decremented by one. The new nodes having an updated dependency counter equal to zero are assigned to a third cluster, and so on.
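The dependency-counter construction of the flow vector can be sketched as follows for an acyclic graph (illustrative Python; function and variable names are ours):

```python
from collections import defaultdict

def build_flow_vector(nodes, edges):
    """Group the nodes of an acyclic dependency graph into ordered
    clusters: a node enters a cluster once all its direct predecessors
    have been placed, i.e. once its dependency counter reaches zero."""
    counter = {n: 0 for n in nodes}          # number of direct predecessors
    successors = defaultdict(list)
    for pred, succ in edges:
        counter[succ] += 1
        successors[pred].append(succ)
    clusters = []
    ready = [n for n in nodes if counter[n] == 0]  # first cluster
    while ready:
        clusters.append(ready)
        nxt = []
        for n in ready:
            for s in successors[n]:          # inserting n decrements its successors
                counter[s] -= 1
                if counter[s] == 0:
                    nxt.append(s)
        ready = nxt
    return clusters
```

For a diamond-shaped graph A→B, A→C, B→D, C→D, the flow vector is [A], [B, C], [D]: B and C land in the same cluster and may be evaluated in parallel, or sequentially in any arbitrary order.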
[0271] Consequently, each cluster comprises nodes that can be evaluated in parallel, which amounts to saying that they can be evaluated in any arbitrary order. Since in the present disclosure the sub-programs must be executed sequentially, a further step consists in ordering the nodes of each cluster sequentially. Any arbitrary order is then suitable, which may be based e.g. on the processing order or any initial numbering.
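The clustering procedure described in the two preceding paragraphs can be sketched as follows—a minimal Python illustration of the dependency-counter mechanism, not the claimed GPU implementation; function and variable names are illustrative, and the per-cluster ordering is here a simple stable sort standing in for "any arbitrary order":

```python
def build_flow_vector(nodes, edges):
    """Assign the nodes of an acyclic dependency graph to successive
    clusters of a flow vector, using one dependency counter per node
    initialized to its number of direct predecessors."""
    counter = {n: 0 for n in nodes}          # dependency counter per node
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        counter[dst] += 1
        successors[src].append(dst)

    flow_vector = []
    # nodes with a null dependency counter go into the first cluster
    current = sorted(n for n in nodes if counter[n] == 0)
    while current:
        flow_vector.append(current)
        nxt = []
        for node in current:
            for succ in successors[node]:
                counter[succ] -= 1           # one predecessor consumed
                if counter[succ] == 0:       # ready for the next cluster
                    nxt.append(succ)
        current = sorted(nxt)                # arbitrary but stable order
    return flow_vector
```

On the branching graph of the second example below (P1 feeding P2 and P3, both feeding P4), this yields the clusters [P1], [P2, P3], [P4]; flattening the clusters in order gives a valid sequential ordering of the sub-programs.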
[0272] The specific case of cycles is dealt with in relation with
[0273] Proceeding as described above, through detecting and breaking the cycles, enables the sequential ordering of the sub-programs to be implemented when constructing the flow vector. The cycles nevertheless remain valid for the runtime execution.
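A common way to detect and break cycles before such a topological sorting is a depth-first search that flags back edges—edges closing a loop onto a node still on the current search path. The sketch below is a generic illustration under that assumption (the precise cycle-breaking of EP-2779101 A1 may differ); the back edges are set aside rather than discarded, mirroring the fact that the cycles remain valid at runtime:

```python
def break_cycles(nodes, edges):
    """Split the edge set into (acyclic_edges, back_edges): the back
    edges close cycles and are removed for ordering purposes only."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2             # unvisited / on path / done
    colour = {n: WHITE for n in nodes}
    back_edges = []

    def visit(node):
        colour[node] = GREY                  # node is on the DFS path
        for succ in successors[node]:
            if colour[succ] == GREY:         # edge re-enters the path:
                back_edges.append((node, succ))   # it closes a cycle
            elif colour[succ] == WHITE:
                visit(succ)
        colour[node] = BLACK

    for n in nodes:
        if colour[n] == WHITE:
            visit(n)

    acyclic = [e for e in edges if e not in back_edges]
    return acyclic, back_edges
```

On the cycle of the third example below (P1 to P2, P2 to P3, P3 looping back to P2), the edge P3 to P2 is identified as the back edge, leaving an acyclic graph that can be sorted into clusters.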
[0274] More illustrative details are available in EP-2779101 A1, in which complementary information can be found.
[0275] Two further illustrative examples of graphs, the second one with branching in relation with
[0276] In the second example, a branching dataflow structure 31A as seen on
[0277] The dataflow structure 31A is reflected in a resulting task-sequential execution pipeline 32A, represented on
[0278] As apparent from this arrangement, though sub-program P2 has been ordered prior to P3, the reverse order would be equally valid insofar as P2 and P3 correspond to parallel nodes.
[0279] The dynamic contents of the buffers 314A bij and 315A br associated with the pipeline 32A are shown on
[0280] For the sake of clarity, it is specified that the elementary geometries numbered 1 and 2 differ from one buffer to another, whether the buffers are intermediate buffers bij (the elementary geometries then being intermediate primitives) or the rendering buffer br (the elementary geometries then being ready-to-render primitives exploited at the next stages of the execution pipeline 32). The presentation on
[0281] Each of those parts 351, 352 is itself divided into sub-parts 341-344, respectively directed to various kinds of data. These include:
[0282] sub-part 341 for storing the input quad coordinates, consisting of the four components (homogeneous coordinates) of the respective vertices of the initial quad—such shapes being used as rendering primitives;
[0283] sub-part 342 for storing a matrix transform to be applied to the initial quad, involving translation, rotation and scale, which corresponds to cumulating the instructions for geometric transformations associated with the considered sub-program;
[0284] sub-part 343 for storing a barycentric transform matrix, providing a deformation to be applied to the initial quad and corresponding to the cumulated instructions of barycentric transformations associated with the considered sub-program;
[0285] sub-part 344 for storing colour data, as a field having four real components enabling various types of information to be encoded and applied to the quad, such as notably an RGB colour (Red Green Blue) and a texture identifier.
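The per-element layout of sub-parts 341-344 can be summarized by the following data-structure sketch. This is an illustrative CPU-side mirror only—the actual storage is a GPU buffer—and all field names are assumptions, not identifiers from the disclosure:

```python
from dataclasses import dataclass, field
from typing import List

# 4x4 identity, row-major (ones on the diagonal, i.e. indices 0, 5, 10, 15)
IDENTITY_4X4 = [1.0 if i % 5 == 0 else 0.0 for i in range(16)]

@dataclass
class QuadElement:
    """One elementary geometry as laid out in sub-parts 341-344."""
    # 341: the four quad vertices, each in homogeneous coordinates
    vertices: List[List[float]]
    # 342: cumulated linear transform (translation, rotation, scale)
    transform: List[float] = field(default_factory=IDENTITY_4X4.copy)
    # 343: cumulated barycentric (deformation) transform
    barycentric: List[float] = field(default_factory=IDENTITY_4X4.copy)
    # 344: four-component field, e.g. RGB colour plus a texture identifier
    colour: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0, 0.0])
```

An untransformed seed quad, as stored in buffer b12 of the second example, then simply carries its four vertex positions with both matrices left at identity and the colour field unused.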
[0286] In operation, buffers 314A bij and 315A br are exploited in parallel for the three seeds 33A. Each of the intermediate buffers 314A is written and read at the passes 331A corresponding to its position. For example, buffer b13 is filled by intermediate primitives provided by interpreter I.sub.P1 during Pass 1, and those data are retrieved by interpreter I.sub.P3 at the beginning of Pass 3. As concerns the rendering buffer 315A br, it is empty (which is represented by sign “Ø” on
[0287] For the sake of further clarity, a practical application will now be described in relation with that second example, through a procedural modelling in which the last set of instructions (contained in sub-program P4) depends on the execution of the former sets of instructions (associated with sub-programs P1, P2 and P3).
[0288] In this application, a set of six buildings is modelled, including three style-S1 and three style-S2 buildings. Also, three one-to-one footbridges must connect buildings of different styles. Then:
[0289] each of the three seeds 33A corresponds to the footprint of a style-S1 building—four points forming a quad, the pipeline 32A being executed three times in parallel (data parallelism for each seed);
[0290] the instruction set of sub-program P1 creates for each of the seeds 33A a new quad corresponding to the footprint of a style-S2 building and positions it in space;
[0291] the instruction set of sub-program P2 generates the style-S1 building; in this respect:
[0292] input intermediate buffer b12 contains a unique elementary geometry per seed, namely the 3D coordinates of the four points of the initial quad (seed) without additional transformation—the linear and barycentric transformation matrices being equal to identity and the colour field not being used;
[0293] the instruction set comprises notably a rule of extrusion of the initial quad, then the generation of multiple windows at different floors in compliance with style-S1;
[0294] the ready-to-render elements are directly put into the rendering buffer br—these pertain to the elementary geometries corresponding to various elements of the façade generated with sub-program P2, such as windows and doors;
[0295] the output intermediate buffer b24 contains a unique elementary geometry per seed, corresponding to the quad on which the footbridge will be secured;
[0296] the instruction set of sub-program P3 generates the style-S2 building; in this respect:
[0297] input intermediate buffer b13 contains a unique elementary geometry per seed, namely the 3D coordinates of the four points of the initial quad (seed) with a transformation matrix corresponding to a translation—the barycentric transformation matrix being equal to identity and the colour field not being used;
[0298] the instruction set comprises notably a rule of extrusion of the initial quad, then the generation of multiple windows at different floors in compliance with style-S2;
[0299] the ready-to-render elements are directly put into the rendering buffer br—these pertain to the elementary geometries corresponding to various elements of the façade generated with sub-program P3, such as windows and doors;
[0300] the output intermediate buffer b34 contains a unique elementary geometry per seed, corresponding to the quad on which the footbridge will be secured;
[0301] the instruction set of sub-program P4 generates a footbridge for each seed from the two attachment quads stored in intermediate buffers b24 and b34—the generated elements of the footbridge being pushed into the rendering VBO 322A of the last pass 332A.
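The task-sequential discipline of this branching example—each sub-program executed for all seeds before the next one starts, reading and writing intermediate buffers bij and appending ready-to-render primitives to br—can be sketched as a CPU-side driver. This is an illustrative model only (the disclosure executes the passes on the GPU via Transform Feedback); the interpreter callbacks and buffer keys are assumptions:

```python
def run_pipeline(passes, seeds):
    """Run each pass for all seeds before starting the next pass.
    A pass is (name, read_buffer_names, run); run(seed, inputs) returns
    (written_buffers, ready_to_render) for that seed."""
    buffers = {}   # intermediate buffers bij, keyed by (name, seed)
    br = []        # rendering buffer br
    for name, reads, run in passes:
        for seed in seeds:                   # data parallelism per seed
            inputs = {b: buffers.get((b, seed), []) for b in reads}
            written, rendered = run(seed, inputs)
            for buf, items in written.items():
                buffers.setdefault((buf, seed), []).extend(items)
            br.extend(rendered)              # ready-to-render primitives
    return br, buffers
```

With toy interpreters for P1-P4 of the building application (P1 filling b12/b13, P2 and P3 filling b24/b34 and rendering façades, P4 rendering the footbridges), the footbridge elements necessarily land last in br, since P4 only runs once P2 and P3 have completed for every seed.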
[0302] In the third example, a cycle dataflow structure 31B as seen on
[0303] The dataflow structure 31B is reflected in a resulting task-sequential execution pipeline 32B, represented on
[0304] As apparent from this arrangement, the sequential order of interpreters I.sub.P1 to I.sub.P3 is established without consideration of the loop pointing from sub-program P3 to P2, but the intermediate buffer b32 is provided for execution at runtime.
[0305] In operation, there is one cycle execution per frame, the buffer b32 being empty at the first pipeline execution.
[0306] For the sake of further clarity, another practical application will be described in relation with that third example. In this application, a procedural modelling needs to start from a non-constant current state, which depends on the preceding generation.
[0307] In this application, a set of wind-driven small branches with leaves must be generated and rendered on-the-fly. With time, the branches have changing positions and progressively lose sub-branches and leaves. Then:
[0308] a seed corresponds to the position of a branch—so that the number of seeds is equal to the number of branches to be generated;
[0309] the set of instructions of sub-program P1 positions those seeds in space; in this respect:
[0310] output intermediate buffer b12 contains a unique elementary geometry per seed, namely the 3D coordinates of the four points of the initial quad (seed), with a transformation matrix linked to translation and rotation—the barycentric transformation matrix being equal to identity and the colour field being used for storing the initial number of sub-branches to be generated;
[0311] the set of instructions of sub-program P2 generates a branch knowing its position and the number of sub-branches to be generated; in this respect:
[0312] the instruction set comprises notably rules for extrusion, branching and rotation, so as to generate the sub-tree structure;
[0313] output intermediate buffer b23 contains a number of elementary geometries per seed, corresponding to the various branch elements generated with sub-program P2—such as sub-branches and leaves;
[0314] the set of instructions of sub-program P3:
[0315] copies all elementary geometries into the rendering buffer 322B for the rendering pass 332B;
[0316] computes the new position of the branches and the reduction in the number of sub-branches as a function of various climate parameters—such as the wind strength and direction;
[0317] output intermediate buffer b32 contains a unique elementary geometry per seed, namely the 3D coordinates of the four points of the initial quad—the linear and barycentric transformation matrices being equal to identity, and the colour field storing the new number of sub-branches to be generated.
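The feedback role of buffer b32 in this cyclic example—empty at the first pipeline execution, then carrying the state computed by P3 into P2 at the next frame—can be sketched as follows. This is an illustrative state-machine model only; the seed names and the simple decrement rule standing in for the climate-dependent reduction of sub-branches are assumptions:

```python
def run_cycle_frames(n_frames, seeds, initial_branch_count=3):
    """One cycle execution per frame: P3 writes the new per-seed state
    (here, the remaining sub-branch count, carried in the colour field)
    into b32, which P2 reads back at the following frame. b32 is empty
    at the first execution, in which case P2 falls back on b12."""
    b32 = {}        # feedback buffer: seed -> remaining sub-branch count
    frames = []
    for frame in range(n_frames):
        rendered = []
        for seed in seeds:
            # P1/P2: position the seed and generate its sub-branches,
            # reading the count from b32 (or from b12 at frame 0)
            count = b32.get(seed, initial_branch_count)
            rendered.extend(f"{seed}:branch{i}" for i in range(count))
            # P3: copy to the rendering buffer and compute the new state
            # (the wind progressively removes sub-branches)
            b32[seed] = max(count - 1, 0)
        frames.append(rendered)
    return frames
```

Successive frames thus render a shrinking tree, each frame's generation starting from the non-constant state left by the preceding one—precisely the situation that a purely feed-forward pipeline could not express.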
[0318] The execution pipeline 32 created by compiler 372 is associated with a CPU-based lazy execution pipeline launcher, which at runtime defines a starting node from which the task-sequential pipeline 32 has to be re-executed for the concerned frame. This execution is advantageously controlled by the activation and deactivation of the corresponding Transform Feedback mechanisms.
[0319] In operation:
[0320] all the predecessors of the starting node have their Transform Feedback mechanism deactivated, keeping in cache the generated geometries;
[0321] the starting node and the subsequent nodes have their Transform Feedback mechanism activated, which corresponds to the new generation requested for the concerned frame.
[0322] In specific embodiments, a distinction is made between a dataflow with no cycle and a dataflow with a cycle (meaning an active cycle, with pending iterations). In the former case, at runtime:
[0323] at the first frame, all the nodes are executed and the generated geometries are rendered;
[0324] then the launcher 375 enters into a lazy mode by setting the starting node to the last node (in charge of the rendering pass); consequently, only the rendering pass using the cached generated geometries is executed for the frames after the first one;
[0325] when however a grammar parameter is modified on-the-fly, the launcher 375:
[0326] sets the starting node to the node corresponding to the first sub-program to which the modified grammar parameter belongs;
[0327] calls the pipeline execution for the concerned frame; and then
[0328] enters into the lazy mode by setting the starting node to the last node.
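The launcher behaviour for the no-cycle case can be sketched as a small CPU-side state machine. This is an illustrative model under the assumption that "active" nodes are those whose Transform Feedback pass is re-executed for the frame; the class and method names are not from the disclosure:

```python
class LazyLauncher:
    """Lazy execution launcher for an acyclic dataflow: nodes before
    the starting node keep their cached geometries (Transform Feedback
    deactivated); the starting node and its followers are re-executed."""
    def __init__(self, ordered_nodes):
        self.nodes = ordered_nodes          # sequential order, e.g. ["P1", ..., "P4"]
        self.start = 0                      # first frame: execute everything

    def frame(self):
        """Return the nodes executed for this frame, then fall back to
        lazy mode (only the last node, i.e. the rendering pass)."""
        active = self.nodes[self.start:]
        self.start = len(self.nodes) - 1    # lazy mode for the next frames
        return active

    def on_parameter_change(self, node):
        """A grammar parameter belonging to `node` was modified on-the-fly:
        restart the pipeline from that node at the next frame."""
        self.start = self.nodes.index(node)
```

A frame sequence then reads: full execution at the first frame, rendering pass only afterwards, and a partial re-execution from the modified sub-program onwards whenever a grammar parameter changes.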
[0329] By contrast, in the case of a dataflow with a cycle, at runtime:
[0330] all the nodes are executed and the generated geometries are rendered at the first frame;
[0331] then the launcher 375 enters into a lazy mode by setting the starting node to the first node N0 of the first cycle (which corresponds e.g. to interpreter I.sub.P2 on
[0336] The main steps for creating the tools enabling geometry generation in compliance with the present disclosure, as illustrated on
[0343] The main steps for launching the execution of the pipeline 32 in compliance with the present disclosure, as illustrated on
[0349] More precisely regarding the setting of the starting node in step 81, it is proceeded as follows as shown on
[0359] Naturally, the present disclosure is not limited to the embodiments previously described.
[0360] In particular, the present disclosure extends to any device implementing the described methods. The applications of the present disclosure are not limited to a live utilisation but also extend to any other utilisation for which procedural generation can be exploited, for example for post-production processing in a recording studio for the display of synthesized images.
[0361] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
[0362] Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, a web server, a game console, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
[0363] Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
[0364] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
[0365] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.