G06F15/8061

VECTOR PROCESSOR DATA STORAGE

Aspects of the present disclosure provide an aligned storage strategy for stripes within a long vector for a vector processor, such that the extra computation needed to track strides between input stripes and output stripes may be eliminated. As a result, the stripes follow a more predictable memory access pattern, so that memory access bandwidth may be improved and the tendency for memory errors may be reduced.

Technologies for dynamically managing resources in disaggregated accelerators

Technologies for dynamically managing resources in disaggregated accelerators include an accelerator. The accelerator includes acceleration circuitry with multiple logic portions, each capable of executing a different workload. Additionally, the accelerator includes communication circuitry to receive a workload to be executed by a logic portion of the accelerator and a dynamic resource allocation logic unit to identify a resource utilization threshold associated with one or more shared resources of the accelerator to be used by a logic portion in the execution of the workload, limit, as a function of the resource utilization threshold, the utilization of the one or more shared resources by the logic portion as the logic portion executes the workload, and subsequently adjust the resource utilization threshold as the workload is executed. Other embodiments are also described and claimed.
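The limit-then-adjust behavior described above can be illustrated with a minimal sketch. The class name `SharedResourceLimiter`, the representation of utilization as a fraction between 0 and 1, and the clamping rule are assumptions made for illustration; the abstract does not specify how the threshold is represented or adjusted.

```python
class SharedResourceLimiter:
    """Hypothetical model of a per-logic-portion utilization threshold."""

    def __init__(self, threshold):
        # Assumed: threshold is the largest fraction of a shared
        # resource that one logic portion may use.
        self.threshold = threshold

    def grant(self, requested):
        # Limit utilization as a function of the current threshold.
        return min(requested, self.threshold)

    def adjust(self, new_threshold):
        # Adjust the threshold while the workload runs, clamped to [0, 1].
        self.threshold = max(0.0, min(1.0, new_threshold))


limiter = SharedResourceLimiter(threshold=0.5)
first_grant = limiter.grant(0.8)    # capped by the initial threshold
limiter.adjust(0.9)                 # threshold raised during execution
second_grant = limiter.grant(0.8)   # now fits under the threshold
```

In this sketch the same request is capped before the adjustment and granted in full afterwards, which is the effect of adjusting the threshold as the workload executes.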

Technologies for data center multi-zone cabling

Technologies for connecting data cables in a data center are disclosed. In the illustrative embodiment, racks of the data center are grouped into different zones based on the distance from the racks in a given zone to a network switch. All of the racks in a given zone are connected to the network switch using data cables of the same length. In some embodiments, certain physical resources such as storage may be placed in racks that are in zones closer to the network switch and therefore use shorter data cables with lower latency. An orchestrator server may, in some embodiments, schedule workloads or create virtual servers based on the different zones and corresponding latency of different physical resources.
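As a rough illustration of the zoning idea, the sketch below assigns each rack to the first zone whose cable length covers its distance to the switch. The function name `assign_zones`, the rack names, and the distances and cable lengths are all hypothetical; the abstract does not state how zone boundaries are chosen.

```python
def assign_zones(rack_distances_m, zone_limits_m):
    """Map each rack to the nearest zone whose cable reaches it.

    zone_limits_m is assumed to be sorted in ascending order, one
    cable length per zone; every rack in a zone uses that length.
    """
    zones = {}
    for rack, distance in rack_distances_m.items():
        for zone_index, limit in enumerate(zone_limits_m):
            if distance <= limit:
                zones[rack] = zone_index
                break
        else:
            # No zone has a cable long enough for this rack.
            raise ValueError(f"rack {rack!r} is beyond the longest cable")
    return zones


racks = {"rack-a": 3.0, "rack-b": 8.5, "rack-c": 14.0}
zones = assign_zones(racks, zone_limits_m=[5.0, 10.0, 15.0])
```

An orchestrator could then prefer lower zone indices for latency-sensitive resources, since those zones use the shorter cables.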

Techniques to configure physical compute resources for workloads via circuit switching

Embodiments are generally directed to apparatuses, methods, techniques and so forth to select two or more processing units of a plurality of processing units to process a workload, and to configure a circuit switch to link the two or more processing units to process the workload, the two or more processing units being linked to each other via paths of communication and the circuit switch.

Reconfigurable parallel processing

Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.

DYNAMIC PROCESSING MEMORY CORE ON A SINGLE MEMORY CHIP
20210357151 · 2021-11-18

Embodiments of the present invention provide a method for incorporating a dynamic processing memory core into a single memory chip to enable computational processing and memory storage from the single memory chip. The method includes storing data elements by memory storage devices positioned on the single memory chip. The method also includes executing, by processing devices positioned on the single memory chip, memory instructions. The method also includes transitioning the dynamic processing memory core from a memory storage device to a processing device by instructing the processing device to execute the memory instructions. The method also includes transitioning the dynamic processing memory core from the processing device to the memory storage device by instructing the processing device to not execute the memory instructions, thereby terminating the computational processing of the dynamic processing memory core and maintaining the memory storage provided by the memory storage device.

Reconfigurable parallel processing with various reconfigurable units to form two or more physical data paths and routing data from one physical data path to a gasket memory to be used in a future physical data path as input

Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.

Method and apparatus for desynchronizing execution in a vector processor

In one implementation, a vector processor unit has preload registers for at least some of vector length, vector constant, vector address, and vector stride. Each preload register has an input and an output. All of the preload register inputs are coupled to receive new vector parameters. The output of each preload register is coupled to a first input of a respective multiplexor, and the second inputs of all the respective multiplexors are coupled to the new vector parameters.
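The register-plus-multiplexor arrangement can be modeled in a few lines. This is only a behavioral sketch: the class name `PreloadRegister`, the `select_preloaded` control signal, and the example stride values are assumptions, since the abstract does not say what drives the multiplexor select.

```python
class PreloadRegister:
    """Hypothetical behavioral model of one preload register and its mux."""

    def __init__(self):
        self.value = None

    def load(self, new_value):
        # The register input is coupled to the new vector parameter.
        self.value = new_value

    def mux(self, new_value, select_preloaded):
        # First mux input: the register output.
        # Second mux input: the new vector parameter itself.
        return self.value if select_preloaded else new_value


stride = PreloadRegister()
stride.load(4)                                       # preload a stride of 4
from_register = stride.mux(8, select_preloaded=True)
bypassed = stride.mux(8, select_preloaded=False)
```

The second multiplexor input lets a new parameter reach the datapath without first passing through the register, while the first input supplies a value that was staged earlier.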

Vector compare and store instruction that stores index values to memory

The present disclosure is directed to methods to generate a packed result array from an input array and a comparison operation using parallel vector processing. In one aspect, an additive scan operation can be used to generate memory offsets for each successful comparison operation of the input array and to generate a count of the number of data elements satisfying the comparison operation. In another aspect, the input array can be segmented to allow more efficient processing using the vector registers. In another aspect, a vector processing system is disclosed that is operable to receive a data array, a comparison operation, and threshold criteria, and to output a packed array, at a specified memory address, comprising the data elements satisfying the comparison operation.
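The use of an additive scan to derive memory offsets can be sketched in scalar Python. This is an illustrative model under stated assumptions, not the disclosed instruction: the function name `compare_and_pack` is hypothetical, the loops stand in for lane-parallel vector operations, and a Python list stands in for the destination memory.

```python
from itertools import accumulate


def compare_and_pack(values, predicate):
    """Pack the elements of `values` that satisfy `predicate`.

    Returns (packed, count).
    """
    # Per-element compare: 1 where the comparison succeeds, else 0.
    flags = [1 if predicate(v) else 0 for v in values]
    # Additive (prefix-sum) scan over the flags.
    inclusive = list(accumulate(flags))
    # The last scan value is the number of matching elements.
    count = inclusive[-1] if inclusive else 0
    packed = [None] * count
    for value, flag, running in zip(values, flags, inclusive):
        if flag:
            # Offset of a matching element = matches strictly before it.
            packed[running - 1] = value
    return packed, count


packed, count = compare_and_pack([7, 2, 9, 4, 11, 1], lambda x: x > 5)
# packed == [7, 9, 11], count == 3
```

Because each matching element's offset depends only on the scan result, the stores are independent of one another, which is what makes the approach suitable for parallel lanes.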

DYNAMIC PROCESSING MEMORY CORE ON A SINGLE MEMORY CHIP
20230297287 · 2023-09-21

Embodiments of the present invention provide a method for incorporating a dynamic processing memory core into a single memory chip to enable computational processing and memory storage from the single memory chip. The method includes storing data elements by memory storage devices positioned on the single memory chip. The method also includes executing, by processing devices positioned on the single memory chip, memory instructions. The method also includes transitioning the dynamic processing memory core from a memory storage device to a processing device by instructing the processing device to execute the memory instructions. The method also includes transitioning the dynamic processing memory core from the processing device to the memory storage device by instructing the processing device to not execute the memory instructions, thereby terminating the computational processing of the dynamic processing memory core and maintaining the memory storage provided by the memory storage device.