Patent classifications
G06F13/1652
Technologies for dividing work across accelerator devices
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for parallelization among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute device is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices in parallel to obtain an output of the job.
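A minimal sketch of the divide-and-schedule flow in Python. The Accelerator and Task classes, the throughput-proportional split, and the greedy least-loaded placement are illustrative assumptions standing in for the patent's job analysis; the abstract does not specify a concrete policy.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    throughput: float          # relative work units per second (assumed metric)
    queued: float = 0.0        # work already scheduled on this device

@dataclass
class Task:
    job_id: int
    index: int
    work: float

def divide_job(job_id: int, total_work: float, devices: list[Accelerator]) -> list[Task]:
    """Split a job into one task per device, proportional to device throughput."""
    capacity = sum(d.throughput for d in devices)
    return [Task(job_id, i, total_work * d.throughput / capacity)
            for i, d in enumerate(devices)]

def schedule(tasks: list[Task], devices: list[Accelerator]) -> dict[int, str]:
    """Greedily place each task on the device with the least queued work."""
    placement = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        target = min(devices, key=lambda d: d.queued / d.throughput)
        target.queued += task.work
        placement[task.index] = target.name
    return placement

devices = [Accelerator("fpga0", 2.0), Accelerator("gpu0", 3.0)]
tasks = divide_job(job_id=1, total_work=100.0, devices=devices)
print(schedule(tasks, devices))   # maps task index -> device, e.g. {1: 'fpga0', 0: 'gpu0'}
```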
ON-CHIP INTERCONNECT FOR MEMORY CHANNEL CONTROLLERS
Methods, systems, and apparatus, including computer-readable media, are described for an integrated circuit that accelerates machine-learning computations. The circuit includes processor cores that each include: multiple channel controllers; an interface controller for coupling each channel controller to any memory channel of a system memory; and a fetch unit in each channel controller. Each fetch unit is configured to: receive channel data that encodes addressing information; obtain, based on the addressing information, data from any memory channel of the system memory using the interface controller; and write the obtained data to a vector memory of the processor core via the corresponding channel controller that includes the respective fetch unit.
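A toy Python model of the fetch path described above. The (channel id, offset, length) encoding of the addressing information is an assumption for illustration; the abstract only requires that channel data encode addressing information and that any channel be reachable through the interface controller.

```python
class InterfaceController:
    """Routes a read to any memory channel of the shared system memory."""
    def __init__(self, channels: list):
        self.channels = channels

    def read(self, channel_id: int, offset: int, length: int) -> list:
        return self.channels[channel_id][offset:offset + length]

class FetchUnit:
    def __init__(self, interface: InterfaceController, vector_memory: list):
        self.interface = interface
        self.vector_memory = vector_memory

    def fetch(self, channel_data: tuple, dest: int) -> None:
        """channel_data is assumed to encode (channel id, offset, length)."""
        channel_id, offset, length = channel_data
        data = self.interface.read(channel_id, offset, length)
        self.vector_memory[dest:dest + length] = data   # write via channel controller

system_memory = [[10, 11, 12, 13], [20, 21, 22, 23]]   # two memory channels
vector_memory = [0] * 8
unit = FetchUnit(InterfaceController(system_memory), vector_memory)
unit.fetch(channel_data=(1, 1, 3), dest=0)              # pull 21, 22, 23 into vmem
print(vector_memory)                                    # [21, 22, 23, 0, 0, 0, 0, 0]
```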
MEMORY ACCESS METHOD, SWITCH, AND MULTIPROCESSOR SYSTEM
A memory access method includes receiving, by the switch, a data packet and matching the data packet against a flow table. The flow table includes at least one flow entry, each having a matching field and an action field. The at least one flow entry includes a first flow entry whose matching field is used to match source node information, destination node information, and a protocol type in the data packet, and whose action field indicates an operation command for a storage device embedded in the switch. When the data packet successfully matches the first flow entry, an operation is performed on the storage device according to the operation command in the action field of the matched first flow entry.
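A minimal Python sketch of the match-then-act flow. The field names (src, dst, proto) and the write action against a dictionary standing in for the embedded storage device are assumptions; the abstract specifies only that each flow entry carries a matching field and an action field.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FlowEntry:
    match: dict          # e.g. {"src": ..., "dst": ..., "proto": ...} (assumed fields)
    action: Callable     # operation command against the embedded storage device

def match_packet(packet: dict, flow_table: list) -> Optional[FlowEntry]:
    """Return the first entry whose match fields all equal the packet's fields."""
    for entry in flow_table:
        if all(packet.get(k) == v for k, v in entry.match.items()):
            return entry
    return None

storage = {}   # stands in for the storage device embedded in the switch

flow_table = [
    FlowEntry(match={"src": "node1", "dst": "node2", "proto": "mem"},
              action=lambda pkt: storage.__setitem__(pkt["addr"], pkt["data"])),
]

packet = {"src": "node1", "dst": "node2", "proto": "mem", "addr": 0x100, "data": 42}
entry = match_packet(packet, flow_table)
if entry is not None:                 # packet successfully matched the first entry
    entry.action(packet)              # perform the operation command on the storage
print(storage)                        # {256: 42}
```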
SCALE-OUT HIGH BANDWIDTH MEMORY SYSTEM
A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes an HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one other HBM+ card.
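A structural sketch of this topology as plain Python dataclasses, for orientation only. Cube counts, field names, and the string placeholders for the dies and the card controller are assumptions; the pin connection to the host is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class HBMPlusCube:
    logic_die: str = "logic"
    memory_die: str = "dram"

@dataclass
class HBMPlusCard:
    cubes: list                                 # each cube pairs a logic die with a memory die
    controller: str = "HBM+ card controller"    # coupled to every cube; interfaces with the host
    fabric_links: list = field(default_factory=list)

    def connect_fabric(self, other: "HBMPlusCard") -> None:
        """Fabric connection to at least one other HBM+ card (the scale-out path)."""
        self.fabric_links.append(other)
        other.fabric_links.append(self)

card_a = HBMPlusCard(cubes=[HBMPlusCube() for _ in range(4)])
card_b = HBMPlusCard(cubes=[HBMPlusCube() for _ in range(4)])
card_a.connect_fabric(card_b)
print(len(card_a.cubes), len(card_a.fabric_links))   # 4 1
```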
Technologies for millimeter wave rack interconnects
Racks and rack pods to support a plurality of sleds are disclosed herein. Switches for use in the rack pods are also disclosed herein. A rack comprises a plurality of sleds and a plurality of electromagnetic waveguides. The plurality of sleds are vertically spaced from one another. The plurality of electromagnetic waveguides communicate data signals between the plurality of sleds.
MULTI-CHIP SYSTEM AND DATA TRANSMISSION METHOD THEREOF
A multi-chip system and a data transmission method thereof are provided. The multi-chip system includes a first chip, a link unit, and a second chip. The first chip includes multiple transmitter (TX) channels and a first data processing module. The TX channels are configured to provide at least one piece of transaction information. The first data processing module converts the transaction information into at least one first data packet according to a general packet format and packs the at least one first data packet according to a specific packet format to generate a second data packet. The first data processing module merges two sets of second data packets into a third data packet and transmits the third data packet to the link unit. The second chip receives the third data packet through the link unit.
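A byte-level sketch of the two-stage packing in Python. The header layouts (a 2-byte length prefix for the "general" format, a 1-byte count for the "specific" format) are invented for illustration; the abstract does not define the actual packet formats.

```python
import struct

def to_first_packet(transaction: bytes) -> bytes:
    """General packet format (assumed): 2-byte big-endian length, then the payload."""
    return struct.pack(">H", len(transaction)) + transaction

def pack_second(first_packets: list) -> bytes:
    """Specific packet format (assumed): 1-byte packet count, then the packets back to back."""
    return bytes([len(first_packets)]) + b"".join(first_packets)

def merge_third(second_a: bytes, second_b: bytes) -> bytes:
    """Merge two second packets into one third packet bound for the link unit."""
    return struct.pack(">H", len(second_a)) + second_a + second_b

# TX channels each provide transaction information:
tx0, tx1 = b"write:addr=0x10", b"read:addr=0x20"
second_a = pack_second([to_first_packet(tx0)])
second_b = pack_second([to_first_packet(tx1)])
third = merge_third(second_a, second_b)
print(len(third))   # the second chip can split this back apart using the length prefix
```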
HOST CONTROLLER INTERFACE USING MULTIPLE CIRCULAR QUEUES, AND OPERATING METHOD THEREOF
A host controller interface configured to provide interfacing between a host device and a storage device includes processing circuitry; a doorbell register configured to store a head pointer and a tail pointer of one or more first queues; and an entry buffer configured to store a first command from one of the one or more first queues. The processing circuitry is configured to determine an order in which the commands of the one or more first queues are to be processed, route the first command to be stored in the entry buffer according to the determined order, and route a first response to be stored in one of one or more second queues.
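A toy Python model of circular queues with doorbell-style head and tail pointers, plus one possible "determined order" (round-robin across queues). The queue depth and the ordering policy are assumptions; the abstract leaves both open.

```python
class CircularQueue:
    def __init__(self, depth: int):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0          # consumer index (doorbell head pointer)
        self.tail = 0          # producer index (doorbell tail pointer)

    def push(self, command) -> None:
        if (self.tail + 1) % self.depth == self.head:
            raise RuntimeError("queue full")
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % self.depth   # host rings the tail doorbell

    def pop(self):
        if self.head == self.tail:                 # empty
            return None
        command, self.slots[self.head] = self.slots[self.head], None
        self.head = (self.head + 1) % self.depth   # controller advances the head
        return command

def drain_round_robin(queues: list) -> list:
    """One possible 'determined order': take one command per queue in turn."""
    entry_buffer = []
    while any(q.head != q.tail for q in queues):
        for q in queues:
            cmd = q.pop()
            if cmd is not None:
                entry_buffer.append(cmd)
    return entry_buffer

q0, q1 = CircularQueue(4), CircularQueue(4)
q0.push("READ A"); q0.push("READ B"); q1.push("WRITE C")
print(drain_round_robin([q0, q1]))   # ['READ A', 'WRITE C', 'READ B']
```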
Systems, Apparatus, and Methods for Reordering Image Data
Described examples relate to an apparatus for rearranging or reordering image data of a stream. The apparatus may include control circuitry coupled to a memory array. The control circuitry may be configured to receive the image data of the stream that includes a first image comprising first data elements organized in a row-wise format or a column-wise format and a second image comprising second data elements organized in a row-wise format or a column-wise format. The control circuitry may also be configured to write the first data elements to the memory array in a first order according to a first addressing sequence, read the first data elements from the memory array in a second order according to a second addressing sequence, and write the second data elements to the memory array according to the second addressing sequence.
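A Python sketch of the two-addressing-sequence trick. The first image is written row-major and read back column-major, and the second image is written into each location as it is vacated, using that same column-major sequence; the array dimensions are assumptions.

```python
ROWS, COLS = 2, 3

def row_major(rows: int, cols: int) -> list:
    """First addressing sequence: row-wise order."""
    return [r * cols + c for r in range(rows) for c in range(cols)]

def col_major(rows: int, cols: int) -> list:
    """Second addressing sequence: column-wise order."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

memory = [None] * (ROWS * COLS)
first  = ["a0", "a1", "a2", "a3", "a4", "a5"]
second = ["b0", "b1", "b2", "b3", "b4", "b5"]

# Write the first image with the first addressing sequence (row-major).
for value, addr in zip(first, row_major(ROWS, COLS)):
    memory[addr] = value

# Read the first image out column-wise, immediately writing the second image
# into each just-vacated location using the same (second) addressing sequence.
reordered = []
for value, addr in zip(second, col_major(ROWS, COLS)):
    reordered.append(memory[addr])
    memory[addr] = value

print(reordered)   # ['a0', 'a3', 'a1', 'a4', 'a2', 'a5'] -- the first image, reordered
```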
Multi-core processing and memory arrangement
This invention provides a generalized electronic computer architecture with multiple cores and memory distributed amongst the cores (core-local memory). This arrangement provides predictable, low-latency memory response time, as well as a flexible, code-supplied flow of memory from one specific operation to another (using an operation graph). In one instantiation, the operation graph consists of a set of math operations, each accompanied by an ordered list of one or more input addresses. Input addresses may be specific addresses in memory, references to other math operations in the graph, or references to the next item in a particular data stream, where data streams are iterators through a contiguous block of memory. The arrangement can also be packaged as a PCIe daughter card, which can be selectively plugged into a host server/PC built according to a traditional von Neumann architecture.
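A small Python interpreter for the operation-graph idea. The node tuple layout and the three input kinds ("mem", "node", "stream") are representational assumptions; the abstract specifies only that inputs may be memory addresses, references to other operations, or the next item of a data stream.

```python
import operator

memory = [3.0, 4.0, 10.0, 20.0, 30.0]

class Stream:
    """Iterator through a contiguous block of memory."""
    def __init__(self, start: int):
        self.next_addr = start

    def take(self) -> float:
        value = memory[self.next_addr]
        self.next_addr += 1
        return value

def run_graph(graph: list, streams: list) -> list:
    """Each node is (op, inputs); an input is ('mem', addr), ('node', i), or ('stream', s)."""
    results = []
    for op, inputs in graph:
        args = []
        for kind, ref in inputs:
            if kind == "mem":
                args.append(memory[ref])          # specific address in memory
            elif kind == "node":
                args.append(results[ref])         # another math operation's result
            else:
                args.append(streams[ref].take())  # next item in that data stream
        results.append(op(*args))
    return results

graph = [
    (operator.add, [("mem", 0), ("mem", 1)]),        # node 0: 3 + 4
    (operator.mul, [("node", 0), ("stream", 0)]),    # node 1: node 0 * next stream item
]
print(run_graph(graph, streams=[Stream(start=2)]))   # [7.0, 70.0]
```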
LOOKAHEAD PRIORITY COLLECTION TO SUPPORT PRIORITY ELEVATION
A queuing requester for access to a memory system is provided. Transaction requests are received from two or more requesters for access to the memory system, each transaction request including an associated priority value. A request queue of the received transaction requests is formed in the queuing requester. A highest priority value of all pending transaction requests within the request queue is determined. An elevated priority value is selected when the highest priority value is higher than the priority value of the oldest transaction request in the request queue; otherwise the priority value of the oldest transaction request is selected. The oldest transaction request in the request queue is then provided to the memory system with the selected priority value. An arbitration contest with other requesters for access to the memory system is performed using the selected priority value.
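A Python sketch of the elevation rule, assuming larger numbers mean higher priority and a simple FIFO per queuing requester; the arbitration contest itself is not modeled beyond returning the selected priority.

```python
from collections import deque

class QueuingRequester:
    def __init__(self):
        self.queue = deque()          # FIFO of (priority, request)

    def enqueue(self, priority: int, request: str) -> None:
        self.queue.append((priority, request))

    def next_request(self):
        """Issue the oldest request, elevated to the queue's highest pending priority."""
        if not self.queue:
            return None
        highest = max(p for p, _ in self.queue)        # lookahead over pending requests
        oldest_priority, request = self.queue.popleft()
        # Elevate only if something younger in the queue is more urgent.
        selected = highest if highest > oldest_priority else oldest_priority
        return selected, request

rq = QueuingRequester()
rq.enqueue(1, "read 0x00")     # oldest, low priority
rq.enqueue(7, "read 0x40")     # younger but urgent: elevates the oldest request
print(rq.next_request())       # (7, 'read 0x00')
```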