G06F9/3001

SOLID STATE DRIVE
20180011663 · 2018-01-11 · ·

A memory stores data, a memory interface circuit reads the data from the memory, and an arithmetic circuit performs a prescribed arithmetic operation on the data. A host interface circuit outputs an arithmetic request to the arithmetic circuit, and also outputs a reading instruction to the memory via the memory interface circuit, upon receipt of an arithmetic instruction from a host device. The host interface circuit receives, from the arithmetic circuit, an arithmetic result of the prescribed arithmetic operation performed on the data read from the memory via the memory interface circuit, and outputs the arithmetic result to the host device.

Enabling removal and reconstruction of flag operations in a processor

In one embodiment, a processor includes fetch logic to fetch instructions, decode logic to decode the fetched instructions, and execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer.

Look-up table initialize

A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to an instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The operational unit is configured to perform a table write in response to a look up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Method and device capable of executing instructions remotely in accordance with multiple logic units

Systems, apparatuses and method related to remotely executable instructions are described. A device may be wirelessly coupled to (e.g., physically separated) another device, which may be in a physically separate device. The another device may remotely execute instructions associated with performing various operations, which would have been entirely executed at the device absent the another device. The outputs obtained as a result of the execution may be transmitted, via the transceiver, back to the device via a wireless communication link (e.g., using resources of an ultra high frequency (UHF), super high frequency (SHF), extremely high frequency (EHF), and/or tremendously high frequency (THF) bands). The another device at which the instructions are remotely executable may include memory resources, processing resources, and transceiver resources; they may be configured to use one or several communication protocols over licensed or shared frequency spectrum bands, directly (e.g., device-to-device) or indirectly (e.g., via a base station).

SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.

SEMICONDUCTOR DEVICE
20230236832 · 2023-07-27 ·

A semiconductor device including a first processor having a first register, the first processor configured to perform region of interest (ROI) calculations using the first register; and a second processor having a second register, the second processor configured to perform arithmetic calculations using the second register. The first register is shared with the second processor, and the second register is shared with the first processor.

SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A COHERENT VICTIM CACHE WITH PARALLEL DATA PATHS

Methods, apparatus, systems and articles of manufacture are disclosed facilitate read-modify-write support in a coherent victim cache with parallel data paths. An example apparatus includes a random-access memory configured to be coupled to a central processing unit via a first interface and a second interface, the random-access memory configured to obtain a read request indicating a first address to read via a snoop interface, an address encoder coupled to the random-access memory, the address encoder to, when the random-access memory indicates a hit of the read request, generate a second address corresponding to a victim cache based on the first address, and a multiplexer coupled to the victim cache to transmit a response including data obtained from the second address of the victim cache.

CONFIGURABLE PARSER AND A METHOD FOR PARSING INFORMATION UNITS
20230007106 · 2023-01-05 ·

A packet processing technique can include receiving a packet, and parsing the packet based on a protocol field to generate a parse result vector. The parse result vector is used to select between forwarding the packet to a virtual machine executing on a host processing integrated circuit, forwarding the packet to a physical media access controller, multicasting the packet to multiple virtual machines executing on the host processing integrated circuit, and sending the packet to a hypervisor.