Patent classifications
G06F9/3877
Object-Oriented Memory for Client-to-Client Communications
Systems and corresponding methods employ an object-oriented (OO) memory (OOM) to effect inter-hardware-client (IHC) communication among a plurality of hardware clients included in same. A system comprises a centralized OOM and the plurality of hardware clients communicate, directly, to the centralized OOM device via OO message transactions. The centralized OOM device effects IHC communication among the plurality of hardware clients based on the OO message transactions. Another system comprises a plurality of OO memories (OOMs) capable of inter-object-oriented-memory-device communication. A hardware client communicates, directly, to a respective OOM device via OO message transactions. The inter-object-oriented-memory-device communication effects IHC communication among the plurality of hardware clients based on the OO message transactions.
ASSIGNING GEOMETRY FOR PRETESTING AGAINST SCREEN REGIONS FOR AN IMAGE FRAME USING PRIOR FRAME INFORMATION
A method including rendering graphics for an application using graphics processing units (GPUs). Responsibility for rendering of geometry is divided between GPUs based on screen regions, each GPU having a corresponding division of the responsibility which is known. First pieces of geometry are rendered at the GPUs during a rendering phase of a previous image frame. Statistics are generated for the rendering of the previous image frame. Second pieces of geometry of a current image frame are assigned based on the statistics to the GPUs for geometry testing. Geometry testing at a current image frame on the second pieces of geometry is performed to generate information regarding each piece of geometry and its relation to each screen region, the geometry testing performed at each of the GPUs based on the assigning. The information generated for the second pieces of geometry is used when rendering the geometry at the GPUs.
Configurable circuit telemetry system
Aspects of the disclosure provide for a circuit, in some examples, including a storage element, a co-processor, and a telemetry sequencer coupled to the storage element and the co-processor. The telemetry sequencer is configured to implement a digital state machine to receive configuration information indicating a type of telemetry data for generation, retrieve operations and operands, where the operations and the operands define a sequential series of actions for execution to generate the telemetry data, drive the co-processor with the operations and the operands by passing some of the operations and some of the operands to the co-processor for processing by the co-processor, receive, from the co-processor, and store an intermediate output of the series of actions as the telemetry data in a first format, and receive, from the co-processor, and store a final output of the series of actions as the telemetry data in a second format.
Path simplification for computer graphics applications
Systems and methods provide for efficiently and accurately determining a simplified path that conforms to the geometry of an original path by simultaneously minimizing the deviation from the original path and reducing the number of anchor points in the simplified path. A simplified path may be iteratively generated by updating parametric values and anchor points for candidate simplified paths at epochs. A deviation in distance between points on the original path and corresponding points on candidate paths may be iteratively decreased to ensure that the resulting simplified path follows the geometry of the original path to a predetermined threshold. Continuity constrains can also be applied to ensure smoothness of the simplified path.
EFFECTIVE AND SCALABLE BUILDING AND PROBING OF HASH TABLES USING MULTIPLE GPUS
Described approaches provide for effectively and scalably using multiple GPUs to build and probe hash tables and materialize results of probes. Random memory accesses by the GPUs to build and/or probe a hash table may be distributed across GPUs and executed concurrently using global location identifiers. A global location identifier may be computed from data of an entry and identify a global location for an insertion and/or probe using the entry. The global location identifier may be used by a GPU to determine whether to perform an insertion or probe using an entry and/or where the insertion or probe is to be performed. To coordinate GPUs in materializing results of probing a hash table a global offset to the global output buffer may be maintained in memory accessible to each of the GPUs or the GPUs may compute global offsets using an exclusive sum of the local output buffer sizes.
PROACTIVE REQUEST COMMUNICATION SYSTEM WITH IMPROVED DATA PREDICTION USING EVENT-TO-STATUS TRANSFORMATION
A data prediction subsystem stores event data indicating amounts of items removed from locations over a previous period of time and event-to-status transition rules for each location. An event is detected at a first location. The detected event is associated with a change in status of a first item. Based on the detected event and the event-to-status transition rules, an anticipated item status is determined for the first item, indicating whether the first item is believed to be present at the first location at a time during the previous period of time of the event data. Based at least in part on the anticipated item status for the first item, a prediction value is determined that corresponds to a recommended amount of the first item to request for a future time.
Systems, methods, and apparatuses for heterogeneous computing
- Rajesh M. Sankaran ,
- Gilbert Neiger ,
- Narayan Ranganathan ,
- Stephen R. Van Doren ,
- Joseph Nuzman ,
- Niall D. McDonnell ,
- Michael A. O'Hanlon ,
- Lokpraveen B. Mosur ,
- Tracy Garrett Drysdale ,
- Eriko Nurvitadhi ,
- Asit K. Mishra ,
- Ganesh Venkatesh ,
- Deborah T. Marr ,
- Nicholas P. Carter ,
- Jonathan D. Pearce ,
- Edward T. Grochowski ,
- Richard J. Greco ,
- Robert Valentine ,
- Jesus Corbal ,
- Thomas D. Fletcher ,
- Dennis R. Bradford ,
- Dwight P. Manley ,
- Mark J. Charney ,
- Jeffrey J. Cook ,
- Paul Caprioli ,
- Koichi Yamada ,
- Kent D. Glossop ,
- David B. Sheffield
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
Methods and apparatus to emulate graphics processing unit instructions
Embodiments are disclosed for emulation of graphics processing unit instructions. An example method executing an instrumented kernel using a logic circuit, the instrumented kernel including an emulation sequence; saving, in response to determination that the emulation sequence is to be executed, source data to a shared memory; setting an emulation request flag to indicate to processor circuitry separate from the logic circuit that offloaded execution of the emulation sequence is to be executed; monitoring the emulation request flag to determine whether the offloaded execution of the emulation sequence is complete; and accessing resulting data from the shared memory.
Detecting execution hazards in offloaded operations
Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.
Continuation analysis tasks for GPU task scheduling
Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.