Patent classifications
G06F2212/251
EMBEDDED DEVICE AND PROGRAM UPDATING METHOD
An object of the present invention is to perform a program updating process without reconstructing a program using a pre-update program and an update differential program. An embedded device has a nonvolatile memory having a plurality of planes from/to which data can be read/written independently and an address translator performing address translation by using an address translation table. When an address obtained by decoding an instruction by a CPU is an address corresponding to a change part in a default program, the address translator translates the address to an address in which a differential program is disposed.
Buffer Addressing for a Convolutional Neural Network
A method for providing input data for a layer of a convolutional neural network “CNN”, the method comprising: receiving input data comprising input data values to be processed in a layer of the CNN; determining addresses in banked memory of a buffer in which the received data values are to be stored based upon format data indicating a format parameter of the input data in the layer and indicating a format parameter of a filter which is to be used to process the input data in the layer; and storing the received input data values at the determined addresses in the buffer for retrieval for processing in the layer.
Deep object graph traversal
A first thread of a garbage collector can determine fields, in class metadata assigned to objects contained in a local memory, that are self-referencing fields. For a plurality of the objects, recursively, the first thread can determine whether the object has at least one child object, indicated by a self-referencing field of the object, that has not been assigned to a destination cache or a previously generated source cache. If so, the first thread can assign the child object to the destination cache to indicate that the child object is live. The first thread can add the destination cache to a global scan queue as a source cache. A second thread of the garbage collector can initiate object scanning on objects indicated in the first source cache. Memory locations where live objects are not located can be reclaimed.
TENSOR CONTROLLER ARCHITECTURE
A machine-learning accelerator system, comprising: a plurality of controllers each configured to traverse a feature map with n-dimensions according to instructions that specify, for each of the n-dimensions, a respective traversal size, wherein each controller comprises: a counter stack comprising counters each associated with a respective dimension of the n-dimensions of the feature map, wherein each counter is configured to increment a respective count from a respective initial value to the respective traversal size associated with the respective dimension associated with that counter; a plurality of address generators each configured to use the respective counts of the counters to generate at least one memory address at which a portion of the feature map is stored; and a dependency controller computing module configured to (1) track conditional statuses for incrementing the counters and (2) allow or disallow each of the counters to increment based on the conditional statuses.
Digital signal processing data transfer
A technique for transferring data in a digital signal processing system is described. In one example, the digital signal processing system comprises a number of fixed function accelerators, each connected to a memory access controller and each configured to read data from a memory device, perform one or more operations on the data, and write data to the memory device. To avoid hardwiring the fixed function accelerators together, and to provide a configurable digital signal processing system, a multi-threaded processor controls the transfer of data between the fixed function accelerators and the memory. Each processor thread is allocated to a memory access channel, and the threads are configured to detect an occurrence of an event and, responsive to this, control the memory access controller to enable a selected fixed function accelerator to read data from or write data to the memory device via its memory access channel.
EVENT-DRIVEN DESIGN SIMULATION
A simulation system that includes a simulation accelerator that uses parallel processing to accelerate the simulation of register transfer level codes (RTLs) while minimizing memory access latency is disclosed. The accelerator has an array of parallel computing resources. The simulation accelerator receives compiled RTLs in which the components of the design are mapped to instructions. The instructions are divided into groups, in which instructions belonging to a same group are logically independent of each other. The simulation accelerator fetches instructions and data for processing by the parallel computing resources for one group of instructions at a time.
DATA PLACEMENT WITH PACKET METADATA
Systems, apparatuses, and methods for determining data placement based on packet metadata are disclosed. A system includes a traffic analyzer that determines data placement across connected devices based on observed values of the metadata fields in actively exchanged packets across a plurality of protocol types. In one implementation, the protocol that is supported by the system is the compute express link (CXL) protocol. The traffic analyzer performs various actions in response to events observed in a packet stream that match items from a pre-configured list. Data movement is handled underneath the software applications by changing the virtual-to-physical address translation once the data movement is completed. After the data movement is finished, threads will pull in the new host physical address into their translation lookaside buffers (TLBs) via a page table walker or via an address translation service (ATS) request.
MEMORY ACCESS
A method for triggering prefetching of memory address translations for memory access requests to be issued by a memory access component of a processor in a data processing system to a memory management function in the data processing system is provided. The method includes obtaining command data from one or more memory access commands in a sequence of memory access commands for the memory access component, predicting one or more memory addresses, for which one or more memory address translations are likely to be required by the memory management function to process one or more memory access requests, from the obtained command data, in response to the predicting, performing one or more trigger operations to trigger a prefetch of the one or more memory address translations, using the predicted one or more memory addresses, in advance of the one or more memory access requests.
LOW-COST ADDRESS MAPPING FOR STORAGE DEVICES WITH BUILT-IN TRANSPARENT COMPRESSION
An infrastructure for mapping between logic block addresses (LBAs) and physical block addresses (PBAs). A disclosed method includes: receiving a request the specifies an LBA; determining an applicable zone based on the LBA from a set of zones, wherein the set of zones expose an LBA address space of the storage device; identifying at least one tree from a set of trees having a root node associated with the applicable zone; traversing the at least one tree to identify a set of leaf nodes based on the LBA, wherein each leaf node points to an mpage; and determining corresponding PBA information for the LBA by examining mapping information contained in each mpage.
Storage system and method for accessing same
A data access system including a processor and a storage system including a main memory and a cache module. The cache module includes a FLC controller and a cache. The cache is configured as a FLC to be accessed prior to accessing the main memory. The processor is coupled to levels of cache separate from the FLC. The processor generates, in response to data required by the processor not being in the levels of cache, a physical address corresponding to a physical location in the storage system. The FLC controller generates a virtual address based on the physical address. The virtual address corresponds to a physical location within the FLC or the main memory. The cache module causes, in response to the virtual address not corresponding to the physical location within the FLC, the data required by the processor to be retrieved from the main memory.