G06F9/30029

Graphics texture footprint discovery

Accesses to a mipmap by a shader in a graphics pipeline are monitored. The mipmap is stored in a memory or cache associated with the shader and the mipmap represents a texture at a hierarchy of levels of detail. A footprint in the mipmap of the texture is marked based on the monitored accesses. The footprint indicates, on a per-tile, per-level-of-detail (LOD) basis, tiles of the mipmap that are expected to be accessed in subsequent shader operations. In some cases, the footprint is defined by a plurality of footprint indicators that indicate whether the tiles of the mipmap are expected to be accessed in subsequent shader operations. In that case, the plurality of footprint indicators are set to a first value to indicate that the tile was not access during the first frame or a second value to indicate that the tile was accessed during the first frame.

ENCODING AND DECODING DEVICE FOR SYSTEM DATA OF STORAGE DEVICE
20230134339 · 2023-05-04 ·

An encoding device and a decoding device use linear and nonlinear codes for encoding and decoding system data for a storage device. The encoding device includes a linear encoder for encoding first data to generate encoded data and a nonlinear transformer for transforming the encoded data with second data to generate output data. The first data includes data on a physical address corresponding to a logical address. The second data includes the logical address and a timestamp value indicating a version of map data mapping between the logical address and the physical address.

AI SYNAPTIC COPROCESSOR
20230205529 · 2023-06-29 ·

A coprocessor may include a memory configured to store a plurality of Very Long Data Words, each as a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW. A processor generates search terms and a processing logic unit receives a test VLDW from the memory, receives a search term from the processor, and computes a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term. Optionally, buffers within logic circuits of processing pipelines may receive the test VLDWs.

HARDWARE DEVICE FOR ENFORCING ATOMICITY FOR MEMORY OPERATIONS

A system includes a hardware compare and swap (CAS) module communicatively coupled to a bus, the CAS module to perform an atomic operation in response to a first request from a first request agent for the atomic operation to be performed on a data value that is shared among a plurality of request agents and obtain a first result value. The atomic operation includes initiating a CAS command via the bus. The CAS module performs the atomic operation in response to a second request from a second request agent and obtains a second result value. Responsive to determining a failure to successfully process one or more of the first request or the second request, the hardware CAS module repetitively performs the atomic operation, for one or more of the first request or the second request.

Method and apparatus for performing reduction operations on a set of vector elements

An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.

BIT MATRIX MULTIPLICATION
20230195835 · 2023-06-22 ·

Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction have fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.

INCREASED PRECISION NEURAL PROCESSING ELEMENT

Neural processing elements are configured with a hardware AND gate configured to perform a logical AND operation between a sign extend signal and a most significant bit (“MSB”) of an operand. The state of the sign extend signal can be based upon a type of a layer of a deep neural network (“DNN”) that generate the operand. If the sign extend signal is logical FALSE, no sign extension is performed. If the sign extend signal is logical TRUE, a concatenator concatenates the output of the hardware AND gate and the operand, thereby extending the operand from an N-bit unsigned binary value to an N+1 bit signed binary value. The neural processing element can also include another hardware AND gate and another concatenator for processing another operand similarly. The outputs of the concatenators for both operands are provided to a hardware binary multiplier.

CLOCK DRIFT MONITOR
20230195160 · 2023-06-22 ·

Provided are embodiments for monitoring clock drift. Embodiments may include an XOR gate that is configured to receive a first clock signal from a first clock source and a second clock signal from a second clock source, wherein the XOR logic gate is further configured to generate a switching output based on an XOR operation of the first clock signal and the second clock signal, and a rising edge detector and a falling edge detector that are configured to detect a rising edge and a falling edge of the switching output. Embodiments may also include an AND gate that is configured to threshold compare the rising edge to a configurable threshold to determine if a fault condition exists indicating clock drift between the first clock signal and the second clock signal and provide an indication of the fault condition based at least in part on the comparison.

Rendering operations using sparse volumetric data

A ray is cast into a volume described by a volumetric data structure, which describes the volume at a plurality of levels of detail. A first entry in the volumetric data structure includes a first set of bits representing voxels at a lowest one of the plurality of levels of detail, and values of the first set of bits indicate whether a corresponding one of the voxels is at least partially occupied by respective geometry. A set of second entries in the volumetric data structure describe voxels at a second level of detail, which represent subvolumes of the voxels at the first lowest level of detail. The ray is determined to pass through a particular subset of the voxels at the first level of detail and at least a particular one of the particular subset of voxels is determined to be occupied by geometry.

Logical address distribution in multicore memory system
11681554 · 2023-06-20 · ·

A workload distribution scheme is provided for a multicore memory system. The memory system includes a memory device including blocks and a controller including cores. The controller receives multiple logical addresses from a host, determines a range of logical addresses among the multiple logical addresses to be allocated for the cores, and distributes multiple subsets of the logical addresses in the range to the cores, based on an operation of modulo and shuffling on the multiple logical addresses.