Patent classifications
G06F9/3897
MULTI-PROCESSOR BRIDGE WITH CACHE ALLOCATE AWARENESS
Techniques for loading data, comprising receiving a memory management command to perform a memory management operation to load data into the cache memory before execution of an instruction that requests the data, formatting the memory management command into one or more instruction for a cache controller associated with the cache memory, and outputting an instruction to the cache controller to load the data into the cache memory based on the memory management command.
VIRTUAL NETWORK PRE-ARBITRATION FOR DEADLOCK AVOIDANCE AND ENHANCED PERFORMANCE
A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to, in a first clock cycle, a pre-arbitration winner between a first memory access request and a second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to, in a second clock cycle select a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of a priority of the pre-arbitration winner and a priority of the subsequent memory access request.
Multicore shared cache operation engine
Techniques including receiving configuration information for a trigger control channel of the one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, store the first memory management command, detecting a first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.
Sequential execution of user-submitted code and native functions
Systems and methods are described for modifying input and output (I/O) to an object storage service by implementing any combination of and any number of owner-specified functions and native functions. A function can implement a data manipulation. The functions can be applied prior to implementing a request method (e.g., GET, PUT, LIST, etc.) specified within the I/O request, such that the data to which the method is applied may not match the object specified within the request. For example, a user may request to obtain a data set. The data set may be passed to a native function that filters sensitive data to the data set, the output of the native function may be passed to an owner-specified function that redacts data from the filtered data set, and the request method may then be applied to the output of the owner-specified function.
Delayed snoop for improved multi-process false sharing parallel thread performance
Techniques for maintaining cache coherency comprising storing data blocks associated with a main process in a cache line of a main cache memory, storing a first local copy of the data blocks in a first local cache memory of a first processor, storing a second local copy of the set of data blocks in a second local cache memory of a second processor executing a first child process of the main process to generate first output data, writing the first output data to the first data block of the first local copy as a write through, writing the first output data to the first data block of the main cache memory as a part of the write through, transmitting an invalidate request to the second local cache memory, marking the second local copy of the set of data blocks as delayed, and transmitting an acknowledgment to the invalidate request.
Compiler for a Fracturable Data Path in a Reconfigurable Data Processor
A complier produces a configuration file to configure a fracturable data path of a configurable unit in a coarse-grained reconfigurable processor to concurrently generate different address sequences generated using different address associated with different operations. The fracturable data path includes multiple computation stages respectively including a pipeline register. The compiler analyzes a first address calculation and a second address calculation and assigns a first set of stages to the first operation to generate the first address sequence and a second set of stages to the second operation to generate the second address sequence using the second set of stages, based on the analysis. A configuration file for the configurable unit is generated by the compiler that assigns the first set of stages to the first operation and the second set of stages to the second operation and includes two or more immediate values for each computation stage.
CONFIGURABLE CACHE FOR COHERENT SYSTEM
A device includes a memory bank. The memory bank includes data portions of a first way group. The data portions of the first way group include a data portion of a first way of the first way group and a data portion of a second way of the first way group. The memory bank further includes data portions of a second way group. The device further includes a configuration register and a controller configured to individually allocate, based on one or more settings in the configuration register, the first way and the second way to one of an addressable memory space and a data cache.
CREDIT AWARE CENTRAL ARBITRATION FOR MULTI-ENDPOINT, MULTI-CORE SYSTEM
A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to determine a first destination device connected to the data path and associated with the first memory access request and a first credit threshold corresponding to the first memory access request. The arbiter circuit is further configured to determine a second destination device connected to the data path and associated with the second memory access request and a second credit threshold corresponding to the second memory access request. The arbiter circuit is configured to arbitrate access to the data path by the first memory access request and the second memory access request based on the first credit threshold and the second credit threshold.
MULTICORE, MULTIBANK, FULLY CONCURRENT COHERENCE CONTROLLER
A system includes a multi-core shared memory controller (MSMC). The MSMC includes a snoop filter bank, a cache tag bank, and a memory bank. The cache tag bank is connected to both the snoop filter bank and the memory bank. The MSMC further includes a first coherent slave interface connected to a data path that is connected to the snoop filter bank. The MSMC further includes a second coherent slave interface connected to the data path that is connected to the snoop filter bank. The MSMC further includes an external memory master interface connected to the cache tag bank and the memory bank. The system further includes a first processor package connected to the first coherent slave interface and a second processor package connected to the second coherent slave interface. The system further includes an external memory device connected to the external memory master interface.
NEURAL PROCESSING ACCELERATOR
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.