Patent classifications
G06F2212/2542
ARTIFICIAL INTELLIGENCE CHIP AND DATA OPERATION METHOD
An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
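The abstract describes a dispatch decision keyed on where in the extended address space the command's address falls. A minimal sketch of that logic follows; all names, section boundaries, and the concrete operations are hypothetical illustrations, since the patent does not specify them.

```python
FIRST_SECTION_START = 0x1000   # hypothetical start of the first section
FIRST_SECTION_END = 0x2000     # hypothetical end (exclusive)

class ComputingProcessor:
    def second_operation(self, data):
        return f"second operation on {data!r}"

class ExtendedAddressProcessor:
    def __init__(self, computing_processor):
        self.computing_processor = computing_processor

    def handle_command(self, address, data):
        """Determine the operation mode of the data from its address."""
        if FIRST_SECTION_START <= address < FIRST_SECTION_END:
            # Address points into the first section of the extended address
            # space: the extended address processor performs the first
            # operation itself.
            return self.first_operation(data)
        # Any other section: notify the computing processor, which
        # performs the second operation.
        return self.computing_processor.second_operation(data)

    def first_operation(self, data):
        return f"first operation on {data!r}"

eap = ExtendedAddressProcessor(ComputingProcessor())
print(eap.handle_command(0x1800, b"\x01"))  # handled by the extended address processor
print(eap.handle_command(0x3000, b"\x02"))  # delegated to the computing processor
```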
APPARATUSES, SYSTEMS, AND METHODS FOR CONTROLLING CACHE ALLOCATIONS IN A CONFIGURABLE COMBINED PRIVATE AND SHARED CACHE IN A PROCESSOR-BASED SYSTEM
Apparatuses, systems, and methods for controlling cache allocations in a configurable combined private and shared cache in a processor-based system. The processor-based system is configured to receive a cache allocation request to allocate a line in a shared cache structure, which may further include a client identification (ID). The cache allocation request and the client ID can be compared to a sub-non-uniform memory access (sub-NUMA) bit mask and a client allocation bit mask to generate a cache allocation vector. The sub-NUMA bit mask may have been programmed to indicate that processing cores associated with a sub-NUMA region are available whereas processing cores associated with other sub-NUMA regions are not, and the client allocation bit mask may have been programmed to indicate which processing cores are available to the client. The sub-NUMA bit mask and the client allocation bit mask can be combined to create a cache allocation vector that indicates which processing cores may service the cache allocation request to allocate a line.
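A hedged sketch of the mask combination follows. The abstract only says the masks are "combined"; the bitwise AND, the mask widths, and the bit-per-core encoding are assumptions.

```python
def cache_allocation_vector(sub_numa_mask: int, client_alloc_mask: int) -> int:
    """A core may service the allocation only if both masks permit it, so
    the vector here is the bitwise AND of the two masks (an assumption)."""
    return sub_numa_mask & client_alloc_mask

# Cores 0-3 belong to the requester's sub-NUMA region...
sub_numa_mask = 0b00001111
# ...but the client is only allowed to allocate in cores 2-5.
client_alloc_mask = 0b00111100

vector = cache_allocation_vector(sub_numa_mask, client_alloc_mask)
print(f"{vector:08b}")  # 00001100 -> cores 2 and 3 may service the request
```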
Thread associated memory allocation and memory architecture aware allocation
A method and system for thread-aware, class-aware, and topology-aware memory allocations. Embodiments include a compiler configured to generate compiled code (e.g., for a runtime) that, when executed, allocates memory on a per-class, per-thread basis in a way that is aware of the system topology (e.g., a non-uniform memory architecture (NUMA)). Embodiments can further include an executable configured to allocate a respective memory pool during runtime for each instance of a class for each thread. The memory pools are local to the respective processor, core, etc., where each thread executes.
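A minimal sketch of a per-class, per-thread pool registry follows, assuming pools are keyed on (thread, class) and created lazily on first use. In a real NUMA-aware allocator each pool would be backed by node-local memory; here a plain Python list stands in for the local arena, and all names are illustrative.

```python
import threading

_pools: dict[tuple[int, type], list] = {}
_lock = threading.Lock()

def pool_for(cls: type) -> list:
    """Return the memory pool for (current thread, cls), creating it on
    first use so each thread gets a pool local to where it executes."""
    key = (threading.get_ident(), cls)
    with _lock:
        if key not in _pools:
            _pools[key] = []  # NUMA-aware variant: allocate on the local node
        return _pools[key]

class Widget:
    pass

def worker():
    pool = pool_for(Widget)
    pool.append(Widget())  # allocation drawn from this thread's Widget pool

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(_pools))  # one (thread, class) pool per worker thread -> 3
```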
INCREASING PAGE SHARING ON NON-UNIFORM MEMORY ACCESS (NUMA)-ENABLED HOST SYSTEMS
In one set of embodiments, a hypervisor of a host system can determine that a delta between local and remote memory access latencies for each of a subset of NUMA nodes of the host system is less than a threshold. In response, the hypervisor can enable page sharing across the subset of NUMA nodes, where enabling page sharing comprises associating the subset of NUMA nodes with a single page sharing table, and where the single page sharing table holds entries identifying host physical memory pages of the host system that are shared by virtual machines (VMs) placed on the subset of NUMA nodes.
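The two steps here, a latency-delta check per node and a single content-indexed table for the eligible subset, can be sketched as below. The latency figures, the threshold, and the hash-keyed table layout are assumptions, not details from the abstract.

```python
import hashlib

THRESHOLD_NS = 20  # hypothetical latency-delta threshold

# (local_latency_ns, remote_latency_ns) measured per NUMA node
latencies = {0: (80, 95), 1: (82, 96)}

def eligible_nodes(latencies, threshold):
    """Nodes whose remote-minus-local latency delta is below the threshold."""
    return {n for n, (local, remote) in latencies.items()
            if remote - local < threshold}

nodes = eligible_nodes(latencies, THRESHOLD_NS)

# One page-sharing table for the whole subset: content hash -> host page.
page_sharing_table: dict[bytes, int] = {}

def share_page(host_page_number: int, content: bytes) -> int:
    """Return an existing identical page if one is already in the table,
    otherwise register this page; VMs on any node in the subset share it."""
    digest = hashlib.sha256(content).digest()
    return page_sharing_table.setdefault(digest, host_page_number)

print(nodes)                           # {0, 1}
print(share_page(100, b"zeros" * 64))  # 100 (newly registered)
print(share_page(200, b"zeros" * 64))  # 100 (deduplicated against page 100)
```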
Distributed index for fault tolerant object memory fabric
Embodiments of the invention provide systems and methods for managing processing, memory, storage, network, and cloud computing to significantly improve the efficiency and performance of processing nodes. Embodiments can implement an object memory fabric including object memory modules storing memory objects that are created natively within the object memory module and may be managed at a memory layer. The memory module object directory may index all memory objects within the object memory module. Each object router in a hierarchy of object routers communicatively coupling the object memory modules may include a router object directory that indexes all memory objects and portions thereof contained in the object memory modules below that object router in the hierarchy. The hierarchy of object routers may behave in aggregate as a single object directory communicatively coupled to all object memory modules and may process requests based on the router object directories.
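A hedged sketch of the hierarchical lookup follows: each router's directory covers everything indexed by the modules below it, so a request can descend from the root as if the hierarchy were one directory. Class names and the set-union directory representation are illustrative assumptions.

```python
class ObjectMemoryModule:
    def __init__(self, objects):
        self.directory = set(objects)  # indexes all memory objects in this module

class ObjectRouter:
    def __init__(self, children):
        self.children = children       # modules or lower-level routers
        # Router object directory: union of everything indexed below it.
        self.directory = set()
        for child in children:
            self.directory |= child.directory

    def lookup(self, object_id):
        """Route a request toward the module holding object_id."""
        if object_id not in self.directory:
            return None
        for child in self.children:
            if object_id in child.directory:
                if isinstance(child, ObjectRouter):
                    return child.lookup(object_id)
                return child
        return None

m1 = ObjectMemoryModule({"obj-a"})
m2 = ObjectMemoryModule({"obj-b"})
leaf_router = ObjectRouter([m1, m2])
root_router = ObjectRouter([leaf_router])

print(root_router.lookup("obj-b") is m2)  # True: the hierarchy acts as one directory
```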
DEVICE AND METHOD FOR SHARED MEMORY PROCESSING AND NON-TRANSITORY COMPUTER STORAGE MEDIUM
A device for shared memory processing is provided in implementations of the disclosure. The device for shared memory processing includes a set of shared memory units, a set of processing units, and a set of global clock synchronizers. Each shared memory unit corresponds to one global clock synchronizer and is coupled with K processing units via the corresponding global clock synchronizer, and the coupled K processing units perform conflict-free memory access to the shared memory unit during one instruction cycle of the corresponding global clock synchronizer. One instruction cycle of each global clock synchronizer includes N clocks, K is less than or equal to N, and K and N are integers greater than zero. A method for shared memory processing and a non-transitory computer storage medium are also provided.
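One way to read the conflict-free guarantee is that each of the K coupled processing units is granted a distinct clock slot among the N clocks of an instruction cycle (K <= N), so no two units touch the shared memory unit on the same clock. The sketch below assumes that slot-per-unit policy, which the abstract does not spell out.

```python
N = 4                           # clocks per instruction cycle
units = ["PU0", "PU1", "PU2"]   # K = 3 coupled processing units, K <= N
assert len(units) <= N

shared_memory = [0] * 16

def run_instruction_cycle(cycle):
    for clock in range(N):
        # At most one unit is granted access on each clock -> conflict-free.
        if clock < len(units):
            unit = units[clock]
            shared_memory[clock] = cycle  # the granted unit's access
            print(f"cycle {cycle}, clock {clock}: {unit} accesses the memory")
        else:
            print(f"cycle {cycle}, clock {clock}: idle")

run_instruction_cycle(0)
```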
NETWORK ENTITIES AND METHODS PERFORMED THEREIN FOR HANDLING CACHE COHERENCY
A method performed by a coordinating entity in a disaggregated data center architecture wherein computing resources are separated into discrete resource pools and associated together to represent a functional server. The coordinating entity obtains a setup of processor cores that are coupled logically as the functional server and determines an index indicating an identity of a cache coherency domain based on the obtained setup of processor cores. The coordinating entity further configures one or more communicating entities associated with the obtained setup of processor cores to use the determined index when handling updated cache-related data.
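A minimal sketch of the two steps, deriving an index from the core setup and pushing it to the communicating entities, follows. The index scheme (a hash of the sorted core identifiers) and all names are purely illustrative assumptions.

```python
import hashlib

def coherency_domain_index(core_ids: set[str]) -> int:
    """Map a logical server's core setup to a stable domain identity."""
    canonical = ",".join(sorted(core_ids)).encode()
    return int.from_bytes(hashlib.sha256(canonical).digest()[:2], "big")

class CommunicatingEntity:
    def __init__(self, name):
        self.name = name
        self.domain_index = None

    def configure(self, index):
        # Tag updated cache-related data with this index from now on.
        self.domain_index = index

server_cores = {"pool0/core3", "pool1/core7"}
index = coherency_domain_index(server_cores)
for entity in [CommunicatingEntity("nic0"), CommunicatingEntity("nic1")]:
    entity.configure(index)
    print(entity.name, "uses coherency domain", entity.domain_index)
```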
INTEGRATING AND INCREASING PERFORMANCE OF DISAGGREGATED MEMORY IN OPERATING SYSTEMS
Local memory and disaggregated memory may be identified and monitored for integrating disaggregated memory into a computing system. Candidate data may be migrated between the local memory and the disaggregated memory, according to a dynamic set of migration criteria, to optimize the allocation of disaggregated memory and migrated data.
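A hedged sketch of criteria-driven migration follows. The criteria here, an access-frequency threshold that can be retuned at runtime, and all names are assumptions standing in for whatever criteria an embodiment would actually use.

```python
local_memory, disaggregated_memory = {}, {}

criteria = {"hot_threshold": 5}  # dynamic: can be retuned while running

def place(page, access_count):
    """Migrate a page toward local memory if hot, toward disaggregated
    memory if cold, per the current migration criteria."""
    hot = access_count >= criteria["hot_threshold"]
    src, dst = ((disaggregated_memory, local_memory) if hot
                else (local_memory, disaggregated_memory))
    dst[page] = src.pop(page, f"data:{page}")

place("p1", access_count=9)    # hot -> local
place("p2", access_count=1)    # cold -> disaggregated
criteria["hot_threshold"] = 2  # the criteria set changes dynamically
place("p2", access_count=3)    # now counts as hot -> migrated back to local
print(sorted(local_memory), sorted(disaggregated_memory))  # ['p1', 'p2'] []
```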
DYNAMIC COMPRESSION FOR MULTIPROCESSOR PLATFORMS AND INTERCONNECTS
The present disclosure provides an interconnect for a non-uniform memory architecture platform to provide remote access where data can be dynamically and adaptively compressed and decompressed at the interconnect link. A requesting interconnect link can add a delay before transmitting requested data onto an interconnect bus, compress the data before transmission, or packetize and compress data before transmission. Likewise, a remote interconnect link can decompress requested data.
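A sketch of the adaptive choice at the requesting link follows: compress when the data shrinks enough to justify it, otherwise send raw, with the remote link decompressing symmetrically. The 90% payoff ratio and the zlib codec are assumptions, not details from the abstract.

```python
import zlib

def transmit(data: bytes) -> tuple[bytes, bool]:
    """Return (payload, compressed_flag) as the link would put on the bus."""
    packed = zlib.compress(data)
    if len(packed) < 0.9 * len(data):  # adaptive: compress only if it pays off
        return packed, True
    return data, False

def receive(payload: bytes, compressed: bool) -> bytes:
    """Remote interconnect link decompresses requested data if needed."""
    return zlib.decompress(payload) if compressed else payload

payload, flag = transmit(b"A" * 4096)  # highly compressible -> compressed
assert receive(payload, flag) == b"A" * 4096
print(flag, len(payload))
```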
SUB-NUMA CLUSTERING FAULT RESILIENT MEMORY SYSTEM
A sub-Non-Uniform Memory Access (sub-NUMA) clustering fault resilient memory system includes an initialization subsystem that is coupled to a processing system and a memory system. The initialization subsystem determines that the processing system and the memory system are configured to provide a plurality of NUMA nodes, allocates a respective portion of the memory system to each of the plurality of NUMA nodes, and configures each respective portion of the memory system to mirror a mirrored subset of that respective portion of the memory system. Subsequently, respective data that is utilized by each of the plurality of NUMA nodes provided by the processing system and the memory system and that is stored in the mirrored subset of the respective portion of the memory system allocated to that NUMA node is mirrored in that respective portion of the memory system.
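A minimal sketch of that initialization flow follows: split the memory system across NUMA nodes, reserve a mirrored subset within each portion, and write node data to both the primary and mirror copies. The sizes and the flat-array layout are assumptions made for illustration.

```python
NUM_NODES = 2
PORTION_SIZE = 8   # hypothetical units of memory per NUMA node
MIRROR_SIZE = 2    # subset of each portion that is mirrored

# Each node's portion: primary storage plus a mirror region for the subset.
portions = {node: {"primary": [None] * PORTION_SIZE,
                   "mirror": [None] * MIRROR_SIZE}
            for node in range(NUM_NODES)}

def store(node, offset, value):
    """Writes to the mirrored subset land in both copies, so data used by
    the node survives a fault in either copy."""
    portion = portions[node]
    portion["primary"][offset] = value
    if offset < MIRROR_SIZE:  # offset falls in the mirrored subset
        portion["mirror"][offset] = value

store(0, 0, "kernel-critical")  # mirrored
store(0, 5, "bulk-data")        # not mirrored
print(portions[0]["mirror"])    # ['kernel-critical', None]
```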