G06F2212/455

Method and system for converting a single-threaded software program into an application-specific supercomputer

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.

Smooth image scrolling with disk I/O activity optimization and enhancement to memory consumption
11579763 · 2023-02-14

A system and method for performing image scrolling are disclosed. In one embodiment, a system for image scrolling organizes each set of related images as a series object. The system writes selected images from one of the series objects, from the image cache to the frame buffer, for image scrolling on a display. A garbage collection module performs garbage collection in the image cache. The garbage collection module operates on memory space where a series object is released or can be moved, for reclaiming memory. The image scrolling is smoother than if the garbage collection module were to track and operate on each image as an object.

PERFORMING LOAD AND STORE OPERATIONS OF 2D ARRAYS IN A SINGLE CYCLE IN A SYSTEM ON A CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
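
The accumulate step in the abstract above (a first source operand added to the product of the second and third) can be illustrated in software. The following is only a minimal sketch, not the claimed hardware: the function names `to_bf16` and `dp_accumulate` are hypothetical, the rounding mode (round to nearest even) and the wider accumulator are assumptions, and only finite inputs are handled.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (assumed: nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of an IEEE-754 float32.
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dp_accumulate(src0: float, src1: list, src2: list) -> float:
    """Return src0 + sum(src1[i] * src2[i]) with BF16-rounded multiplicands.

    Assumption: the accumulator is wider than BF16 (a Python float here).
    """
    total = src0
    for a, b in zip(src1, src2):
        total += to_bf16(a) * to_bf16(b)
    return total
```

For example, `dp_accumulate(1.0, [2.0, 3.0], [4.0, 5.0])` adds the two products 8 and 15 to the first source operand 1.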

SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD

Embodiments detailed herein relate to matrix operations, in particular the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand.
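
As a rough software analogy for the strided load described above, the sketch below copies one group of consecutive elements per tile row, with successive groups separated by a fixed stride. The function `tile_load` and its parameters (`base`, `stride`, `rows`, `cols`) are hypothetical names chosen for illustration; the abstract does not specify an instruction encoding.

```python
def tile_load(memory: list, base: int, stride: int, rows: int, cols: int) -> list:
    """Load `rows` groups of `cols` consecutive elements into tile rows.

    Group r starts at memory[base + r * stride]. All parameters are
    illustrative assumptions, not fields defined by the source.
    """
    if rows > 0 and base + (rows - 1) * stride + cols > len(memory):
        raise IndexError("tile load would read past the end of memory")
    tile = []
    for r in range(rows):
        start = base + r * stride
        tile.append(memory[start:start + cols])
    return tile


# A 3x4 tile taken from a flat buffer whose logical rows are 8 elements wide.
memory = list(range(32))
tile = tile_load(memory, base=2, stride=8, rows=3, cols=4)
```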

Selectively writing back dirty cache lines concurrently with processing

A graphics pipeline includes a cache having cache lines that are configured to store data used to process frames in a graphics pipeline. The graphics pipeline is implemented using a processor that processes frames for the graphics pipeline using data stored in the cache. The processor processes a first frame and writes back a dirty cache line from the cache to a memory concurrently with processing of the first frame. The dirty cache line is retained in the cache and marked as clean subsequent to being written back to the memory. In some cases, the processor generates a hint that indicates a priority for writing back the dirty cache line based on a read command occupancy at a system memory controller.
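
The key behavior in this abstract is that a written-back line stays resident and is only marked clean. A minimal sketch of that bookkeeping follows; the class `FrameCache` and its methods are hypothetical, and concurrency with frame processing and the priority hint are not modeled.

```python
class FrameCache:
    """Toy write-back cache: address -> [data, dirty flag]. Illustrative only."""

    def __init__(self):
        self.lines = {}   # cached lines
        self.memory = {}  # backing memory

    def write(self, addr, data):
        # A write by the processor dirties the line.
        self.lines[addr] = [data, True]

    def is_dirty(self, addr):
        return self.lines[addr][1]

    def write_back(self, addr):
        """Copy a dirty line to memory, keep it in the cache, mark it clean."""
        line = self.lines[addr]
        if line[1]:
            self.memory[addr] = line[0]
            line[1] = False


cache = FrameCache()
cache.write(0x40, "pixel-data")
cache.write_back(0x40)  # line remains cached, now clean
```

Because the line is retained, later reads during the same frame still hit in the cache, and a later eviction of the clean line needs no further memory write.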

Methods and apparatus for encrypting camera media
11706382 · 2023-07-18

Apparatus and methods for encrypting captured media. In one embodiment, the method includes capturing media data via use of a lens of an image capture apparatus; obtaining a number used only once (NONCE) value from the captured media data; obtaining an encryption key for use in encryption of the captured media data; using the obtained NONCE value and the obtained encryption key for encrypting the captured media data; and storing the encrypted media data. In some variants, the media is encrypted prior to storage, thereby obviating any instances in which the captured media data resides in a wholly unencrypted instance. Apparatus and methods for decrypting encrypted captured media are also disclosed.

APPLICATION PROGRAMMING INTERFACE TO DISASSOCIATE A VIRTUAL ADDRESS

Apparatuses, systems, and techniques to manage memory arrays. In at least one embodiment an application programming interface (API) is performed to disassociate a virtual address indicated by the API from a corresponding physical address.

SYSTEMS AND METHODS FOR REORDERING DATA IN A STORAGE DEVICE BASED ON DATA ACCESS PATTERNS
20230004318 · 2023-01-05

A method for reordering data for storage includes detecting a data access pattern, associated with an application, for accessing data, generating a remapping function based on data access pattern information, the remapping function including operations to determine a reordering of the data based on address information for the data, receiving the data at a storage device, the data being ordered according to a first layout sequence, reordering the data, by the storage device, based on the remapping function, and storing the data, at the storage device, according to a second layout sequence corresponding to the data access pattern, the second layout sequence being different from the first layout sequence.
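
One way to picture a remapping function of this kind is a row-major to column-major conversion, assuming the detected access pattern is a column-wise walk over a matrix stored row by row. The sketch below is only an illustration under that assumption; `make_column_major_remap` and `reorder` are hypothetical names, and pattern detection itself is not shown.

```python
def make_column_major_remap(rows: int, cols: int):
    """Build a function mapping a row-major address to a column-major one."""
    def remap(addr: int) -> int:
        r, c = divmod(addr, cols)  # position in the first (row-major) layout
        return c * rows + r        # position in the second (column-major) layout
    return remap


def reorder(data: list, remap) -> list:
    """Place each element at the address the remapping function assigns it."""
    out = [None] * len(data)
    for addr, value in enumerate(data):
        out[remap(addr)] = value
    return out


# A 2x3 matrix stored row-major: [[0, 1, 2], [3, 4, 5]]
first_layout = [0, 1, 2, 3, 4, 5]
second_layout = reorder(first_layout, make_column_major_remap(rows=2, cols=3))
```

After reordering, elements of the same column sit next to each other, so a column-wise reader touches consecutive addresses.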

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05

A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state, discarding stored stream data. A return from interrupt changes a frozen stream to an active state.
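
The nested-loop address generation and the save/restore of stream metadata can be sketched in software as follows. This is a simplified model under stated assumptions: the class `StreamEngine`, its `(count, stride)` loop description, and the dictionary used as metadata are all hypothetical, and the stream head register and data buffering are not modeled.

```python
class StreamEngine:
    """Toy address generator for a stream defined by nested loops."""

    def __init__(self, base, dims):
        self.base = base
        self.dims = list(dims)            # (count, stride) pairs, innermost first
        self.counters = [0] * len(self.dims)
        self.done = False
        self.state = "active"

    def next_address(self):
        if self.state != "active" or self.done:
            raise RuntimeError("stream is frozen or exhausted")
        addr = self.base + sum(i * s for i, (_, s) in zip(self.counters, self.dims))
        # Advance the loop counters like an odometer, innermost loop first.
        for level, (count, _) in enumerate(self.dims):
            self.counters[level] += 1
            if self.counters[level] < count:
                break
            self.counters[level] = 0
        else:
            self.done = True              # every loop level wrapped around
        return addr

    def save(self):
        """Stream store: capture the metadata needed to resume later."""
        return {"base": self.base, "dims": list(self.dims),
                "counters": list(self.counters), "done": self.done}

    @classmethod
    def restore(cls, meta):
        """Stream restore: rebuild an engine from saved metadata."""
        engine = cls(meta["base"], meta["dims"])
        engine.counters = list(meta["counters"])
        engine.done = meta["done"]
        return engine

    def interrupt(self):
        self.state = "frozen"

    def return_from_interrupt(self):
        self.state = "active"


engine = StreamEngine(base=100, dims=[(2, 1), (2, 10)])
first = [engine.next_address(), engine.next_address()]
meta = engine.save()
engine.interrupt()
resumed = StreamEngine.restore(meta)
rest = [resumed.next_address(), resumed.next_address()]
```

Only the loop counters and the stream definition are saved here, which is why the resumed engine continues at the third address rather than replaying the first two.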