Patent classifications
G06F13/4013
Data transfer using point-to-point interconnect
In a computer comprising a plurality of integrated circuits (ICs), each IC may be connected to all other ICs via a respective point-to-point interconnect. A source IC divides the data to be transmitted to a destination IC for a transaction to generate multiple data cells so that each data cell includes a different portion of the data. The source IC transmits one of the data cells to the destination IC and remaining data cells to intermediate ICs, wherein an intermediate IC is an IC other than the source IC or the destination IC. The intermediate ICs forward the remaining data cells to the destination IC.
Executing memory requests out of order
An on-chip cache is described which receives memory requests and in the event of a cache miss, the cache generates memory requests to a lower level in the memory hierarchy (e.g. to a lower level cache or an external memory). Data returned to the on-chip cache in response to the generated memory requests may be received out-of-order. An instruction scheduler in the on-chip cache stores pending received memory requests and effects the re-ordering by selecting a sequence of pending memory requests for execution such that pending requests relating to an identical cache line are executed in age order and pending requests relating to different cache lines are executed in an order dependent upon when data relating to the different cache lines is returned. The memory requests which are received may be received from another, lower level on-chip cache or from registers.
REORDERING OF SPARSE DATA TO INDUCE SPATIAL LOCALITY FOR N-DIMENSIONAL SPARSE CONVOLUTIONAL NEURAL NETWORK PROCESSING
Exemplary embodiments maintain spatial locality of the data being processed by a sparse CNN. The spatial locality is maintained by reordering the data to preserve spatial locality. The reordering may be performed on data elements and on data for groups of co-located data elements referred to herein as chunks. Thus, the data may be reordered into chunks, where each chunk contains data for spatially co-located data elements, and in addition, chunks may be organized so that spatially located chunks are together. The use of chunks helps to reduce the need to re-fetch data during processing. Chunk sizes may be chosen based on the memory constraints of the processing logic (e.g., cache sizes).
PERFORMANT INLINE ECC ARCHITECTURE FOR DRAM CONTROLLER
Techniques are disclosed for reducing the time required to read and write data to memory. Data reads and/or writes can be delayed when error correction code (ECC) bits, which are used to detect and/or correct data corruption, are written to memory. Writing ECC bits can take longer in some instances than writing data bits because an ECC write may involve a read/modify/write operation, as opposed to just simply writing the bits to memory. Some latencies associated with writing ECC bits can be hidden by interleaving ECC writes with data writes. However, if insufficient data writes are available for interleaving, hiding such latencies become difficult. Thus, various techniques are disclosed, for example, where ECC writes are deferred until a sufficient number of data writes become available for interleaving. By interleaving ECC writes, the disclosed techniques decrease the overall time required to read and write data to memory.
I/O Coherent Request Node for Data Processing Network with Improved Handling of Write Operations
A method and apparatus for data transfer in a data processing network uses both ordered and optimized write requests. A first write request is received at a first node of the data processing network is directed to a first address and has a first stream identifier. The first node determines if any previous write request with the same first stream identifier is pending. When a previous write request is pending, a request for an ordered write is sent to a Home Node of the data processing network associated with the first address. When no previous write request to the first stream identifier is pending, a request for an optimized write is sent to the Home Node. The Home Node and first node are configured to complete a sequence of ordered write requests before the associated data is made available to other elements of the data processing network.
Shuffler circuit for lane shuffle in SIMD architecture
Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.
Automatically generating efficient remote procedure call (RPC) code for heterogeneous systems
In one embodiment, a system includes at least one processor and logic integrated with and/or executable by the processor, the logic being configured to instantiate, using an interface definition language (IDL) on a first server, a remote procedure call (RPC) function to exchange information between the first server and a second server, generate at least one stub on the first server using the RPC, and generate at least one stub on the second server using the RPC, wherein the at least one stub generated on the second server does not perform any marshalling or un-marshaling of data when endianess of the two servers is the same. Other systems, methods, and computer program products for exchanging information between servers using RPCs are described in more embodiments.
MULTIPLE ENDIANNESS COMPATIBILITY
Examples of the present disclosure provide apparatuses and methods for multiple endianness compatibility. An example method comprises receiving a plurality of bytes and determining a particular endianness format of the plurality of bytes. The method can include, responsive to determining the particular endianness format is a first endianness format, reordering bits of each byte of the plurality of bytes on a bytewise basis, storing the reordered plurality of bytes in an array of memory cells, and adjusting a shift direction associated with performing a number of operations on the plurality of bytes stored in the array. The method can include, responsive to determining the particular endianness format is a second endianness format, storing the plurality of bytes in the array without reordering bits of the plurality of bytes.
Content data receiving device, content data delivery system, and content data receiving method
A content data receiving device is provided with a communication interface, a buffer, and a processor. The processor, in a case in which the sequence included in the content data that has been received by the communication interface is discontinuous, performs a retransmission request to a transmission side after causing the buffer to hold the content data, causes the buffer to keep holding the content data that has been received by the communication interface until receiving content data in a continuous sequence, and, in a case of receiving the content data in a continuous sequence, outputs the content data to a subsequent stage so that the sequence is continuous.
Configurable ordering controller for coupling transactions
A method for coupling transactions with a configurable ordering controller in a computer system. The method comprises sending, by a coupling device, first data packets with an unordered attribute being set to an ordering controller. The method further comprises sending, by the coupling device, second data packets with requested ordering to the ordering controller, back-to-back after the first data packets, without waiting until all of the first data packets are completed. The method further comprises sending, by the ordering controller, the first data packets to a memory subsystem in a relaxed ordering mode, wherein the ordering controller sends the first data packets to the memory subsystem in an arbitrary order, and wherein the ordering controller sends the second data packets to the memory subsystem after sending all of the first data packets to the memory subsystem.