G06F9/52

System, apparatus and method for configurable control of asymmetric multi-threading (SMT) on a per core basis

In one embodiment, a processor includes: a plurality of cores each comprising a multi-threaded core to concurrently execute a plurality of threads; and a control circuit to concurrently enable at least one of the plurality of cores to operate in a single-threaded mode and at least one other of the plurality of cores to operate in a multi-threaded mode. Other embodiments are described and claimed.
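The abstract does not describe how the control circuit stores or applies per-core modes. The following is a minimal software model, assuming a simple table with one mode per core. `ThreadMode`, `Core`, and `SmtControl` are hypothetical names, and each core is assumed to have two hardware threads. It only shows that single-threaded and multi-threaded cores can coexist.

```python
from dataclasses import dataclass
from enum import Enum


class ThreadMode(Enum):
    SINGLE = 1  # only one hardware thread active on the core
    MULTI = 2   # all hardware threads active on the core


@dataclass
class Core:
    core_id: int
    hw_threads: int = 2             # assumption: 2-way SMT per core
    mode: ThreadMode = ThreadMode.MULTI

    def active_threads(self) -> int:
        return 1 if self.mode is ThreadMode.SINGLE else self.hw_threads


@dataclass
class SmtControl:
    """Hypothetical per-core mode table standing in for the control circuit."""
    cores: list[Core]

    def set_mode(self, core_id: int, mode: ThreadMode) -> None:
        # Assumes core_id equals the core's index in the list.
        self.cores[core_id].mode = mode

    def apply(self, config: dict[int, ThreadMode]) -> None:
        for core_id, mode in config.items():
            self.set_mode(core_id, mode)

    def is_asymmetric(self) -> bool:
        # True when single- and multi-threaded cores are enabled concurrently.
        return len({c.mode for c in self.cores}) > 1

    def total_logical_threads(self) -> int:
        return sum(c.active_threads() for c in self.cores)
```

For example, applying `{0: ThreadMode.SINGLE, 1: ThreadMode.SINGLE}` to a four-core processor leaves cores 2 and 3 in multi-threaded mode, so the processor exposes six logical threads instead of eight.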

Method and system for converting a single-threaded software program into an application-specific supercomputer

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated by applying this method. The compilation method comprises: (a) converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) generating interfaces on the hardware and software parts of the application, which (i) perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) maintain memory coherence between the software and hardware memories. If the resulting hardware design is too large for one chip, it is divided into partitions such that each partition fits into a single chip. A single union chip is then created that can realize any of the partitions.
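The abstract does not say how the partitions or the union chip are computed. The sketch below substitutes two simple techniques: first-fit-decreasing bin packing of hardware modules into chips of fixed area, and a union chip that provides, for each component kind, the largest count any single partition needs. Both techniques are assumptions, as are the `(name, kind, area)` module tuples and the function names.

```python
from collections import Counter

# Hypothetical module description: (name, component kind, area units).
Module = tuple[str, str, int]


def partition_first_fit(modules: list[Module], capacity: int) -> list[list[Module]]:
    """First-fit-decreasing packing of modules into chips of equal capacity."""
    partitions: list[list[Module]] = []
    loads: list[int] = []
    for mod in sorted(modules, key=lambda m: m[2], reverse=True):
        name, _kind, area = mod
        if area > capacity:
            raise ValueError(f"module {name} does not fit on a single chip")
        for i, load in enumerate(loads):
            if load + area <= capacity:
                partitions[i].append(mod)
                loads[i] += area
                break
        else:
            # No existing chip has room: open a new partition.
            partitions.append([mod])
            loads.append(area)
    return partitions


def union_chip(partitions: list[list[Module]]) -> Counter:
    """Per component kind, the maximum count needed by any one partition,
    so one chip design can be configured to realize every partition."""
    union: Counter = Counter()
    for part in partitions:
        union |= Counter(kind for _name, kind, _area in part)  # | keeps max counts
    return union
```

With five modules of areas 4, 3, 3, 2, and 2 and a chip capacity of 6, this packing yields three partitions; the union chip then needs only as many units of each kind as the most demanding partition.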

Method and system for performing parallel computations to generate multiple output feature maps
11579921 · 2023-02-14

Systems and methods for performing parallel computation are disclosed. The system can include: a task manager; and a plurality of cores coupled with the task manager and configured to respectively perform a set of parallel computation tasks based on instructions from the task manager, wherein each of the plurality of cores further comprises: a processing unit configured to generate a first output feature map corresponding to a first computation task among the set of parallel computation tasks; an interface configured to receive one or more instructions from the task manager to collect, from other cores of the plurality of cores, external output feature maps corresponding to the set of parallel computation tasks; and a reduction unit configured to generate a reduced feature map based on the first output feature map and the received external output feature maps.
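To make the data flow concrete, the following sketch assumes that each task is a 1x1 (pointwise) convolution over a subset of input channels and that the reduction is an element-wise sum. Under that assumption, every core's reduced map equals the full single-core result. The channel split, the sum, and the names `pointwise_conv`, `reduce_maps`, and `run_parallel` are illustrative choices, not details taken from the abstract.

```python
# Hypothetical model: a feature map is a 2-D list of floats.
Map = list[list[float]]


def pointwise_conv(inputs: list[Map], weights: list[float], h: int, w: int) -> Map:
    """Processing unit: weighted sum of the input channels assigned to this core."""
    out = [[0.0] * w for _ in range(h)]
    for channel, wt in zip(inputs, weights):
        for i in range(h):
            for j in range(w):
                out[i][j] += wt * channel[i][j]
    return out


def reduce_maps(maps: list[Map], h: int, w: int) -> Map:
    """Reduction unit: element-wise sum of the local map and external maps."""
    return [[sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]


def run_parallel(inputs: list[Map], weights: list[float], num_cores: int) -> list[Map]:
    h, w = len(inputs[0]), len(inputs[0][0])
    # Task manager: assign input channels to cores round-robin.
    tasks = [list(range(k, len(inputs), num_cores)) for k in range(num_cores)]
    # Each core produces its own (partial) output feature map.
    local = [pointwise_conv([inputs[c] for c in t], [weights[c] for c in t], h, w)
             for t in tasks]
    # Each core collects the other cores' maps and reduces them with its own.
    return [reduce_maps([local[k]] + [local[o] for o in range(num_cores) if o != k], h, w)
            for k in range(num_cores)]
```

Round-robin assignment tolerates more cores than channels: a core with no channels contributes an all-zero map, which leaves the sum unchanged.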

System and method for controlling access to shared resource in system-on-chips

An access control system controls access by various functional circuits to a shared resource. The access control system can include a comparison circuit, a processing circuit, and a selection circuit. The comparison circuit receives identification data associated with a functional circuit based on a transaction initiated by that functional circuit, and compares the identification data with reference data to generate a select signal. The processing circuit receives error data and response data output by the shared resource as a result of executing the transaction, and generates a second set of response data. Based on the select signal, the selection circuit outputs either the response data from the shared resource or the response data generated by the processing circuit as the transaction response to be provided to the functional circuit.
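The three circuits can be modeled as plain functions. The abstract does not say what the processing circuit computes, so this sketch adopts a hypothetical policy: it masks the read data and keeps only the error status. It also assumes the reference data is a set of permitted initiator IDs. `Response` and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    data: int
    error: bool


def comparison_circuit(initiator_id: int, reference_ids: frozenset) -> bool:
    """Select signal: True selects the shared resource's own response.
    Assumption: reference data is a set of permitted initiator IDs."""
    return initiator_id in reference_ids


def processing_circuit(error_data: bool, resource_response: Response) -> Response:
    """Hypothetical policy: hide the data, but keep the error status."""
    return Response(data=0, error=error_data or resource_response.error)


def selection_circuit(select: bool, resource_response: Response,
                      processed: Response) -> Response:
    """2:1 multiplexer driven by the select signal."""
    return resource_response if select else processed


def access_control(initiator_id: int, reference_ids: frozenset,
                   resource_response: Response, error_data: bool) -> Response:
    select = comparison_circuit(initiator_id, reference_ids)
    processed = processing_circuit(error_data, resource_response)
    return selection_circuit(select, resource_response, processed)
```

In this model, an initiator whose ID matches the reference data receives the shared resource's response unchanged. Any other initiator receives a response with the data zeroed.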

Data race detection with per-thread memory protection

Data race detection in multi-threaded programs can be achieved by leveraging per-thread memory protection technology in conjunction with a custom dynamic memory allocator to protect shared memory objects with unique memory protection keys, allowing data races to be turned into inter-thread memory access violations. Threads may acquire or release the keys used for accessing protected memory objects at the entry and exit points of critical sections within the program. An attempt by a thread to access a protected memory object within a critical section without the associated key triggers a protection fault, which may be indicative of a data race.
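Real per-thread memory protection uses hardware keys (for example, protection keys on x86). The model below only emulates the idea in software: a hypothetical allocator gives each object a unique key, a thread holds the key only inside that object's critical section, and any key-less access raises `ProtectionFault`. That fault stands in for the hardware protection fault. All class and method names are assumptions.

```python
import itertools
import threading
from contextlib import contextmanager


class ProtectionFault(Exception):
    """Software stand-in for the hardware fault on a key-less access."""


class ProtectedAllocator:
    """Toy allocator: each object gets a unique key and its own lock."""

    def __init__(self) -> None:
        self._keygen = itertools.count(1)
        self._alloc_lock = threading.Lock()
        self._objects: dict[int, list] = {}  # key -> [lock, value]
        self._tls = threading.local()        # per-thread set of held keys

    def _held(self) -> set:
        if not hasattr(self._tls, "keys"):
            self._tls.keys = set()
        return self._tls.keys

    def alloc(self, value) -> int:
        with self._alloc_lock:
            key = next(self._keygen)
            self._objects[key] = [threading.Lock(), value]
        return key  # in this model the object handle is its key

    @contextmanager
    def critical_section(self, key: int):
        with self._objects[key][0]:        # entry point: take the lock
            self._held().add(key)          # acquire the protection key
            try:
                yield
            finally:
                self._held().discard(key)  # exit point: release the key

    def _check(self, key: int) -> None:
        if key not in self._held():
            raise ProtectionFault(
                f"{threading.current_thread().name}: access to object {key} "
                "without its key (possible data race)")

    def load(self, key: int):
        self._check(key)
        return self._objects[key][1]

    def store(self, key: int, value) -> None:
        self._check(key)
        self._objects[key][1] = value
```

Inside `critical_section`, loads and stores succeed. The same access from any thread outside the section faults, which is how the sketch turns a potential data race into a detectable access violation.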

Hybrid database for transactional and analytical workloads

A computer-implemented method, medium, and system for global deadlock detection in a hybrid database for transactional and analytical workloads are disclosed. In one computer-implemented method, a daemon is launched on a coordinator segment in a massively parallel processing (MPP) database, where the MPP database is a hybrid database for both transactional and analytical workloads. A local wait-for graph is periodically collected from each of a plurality of segments in the MPP database, where each segment is either the coordinator segment or one of a plurality of worker segments. A global wait-for graph is then built from all of the collected local wait-for graphs and used to determine whether a global deadlock exists in the MPP database. If a global deadlock exists, it is broken using one or more predefined policies.

Surrogate process creation technique for high process-per-server scenarios

A system and method for launching parallel processes on a server configured to run a number of parallel processes. A request to start a number of parallel processes is received from a parallel application. In response to this request, a launcher creates a surrogate, which inherits communication channels from the launcher. The surrogate executes activities related to the launch and then launches the parallel processes. Once the parallel processes have been launched, the surrogate terminates.
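The sequence of steps can be traced with a small simulation that creates no real operating-system processes. `Process`, `Launcher`, `run_surrogate`, the channel names, and the starting PID are all hypothetical. The simulation shows the order of operations: the surrogate inherits the launcher's channels, performs the launch, and then terminates, leaving the launched processes running.

```python
import itertools
from dataclasses import dataclass, field

_pids = itertools.count(100)  # hypothetical PID source for the simulation


@dataclass
class Process:
    pid: int
    role: str
    channels: tuple[str, ...]  # communication channels this process holds
    alive: bool = True


def run_surrogate(surrogate: Process, n: int, log: list[str]) -> list[Process]:
    # Stand-in for the launch-related activities the surrogate performs.
    log.append("surrogate: pre-launch setup")
    # Each launched process receives the channels the surrogate inherited.
    procs = [Process(next(_pids), "worker", surrogate.channels) for _ in range(n)]
    log.append(f"surrogate: launched {n} processes")
    surrogate.alive = False  # the surrogate exits once the launch is complete
    log.append(f"surrogate {surrogate.pid} terminated")
    return procs


@dataclass
class Launcher:
    channels: tuple[str, ...]
    log: list[str] = field(default_factory=list)

    def handle_request(self, n: int) -> tuple[Process, list[Process]]:
        # The surrogate inherits the launcher's communication channels.
        surrogate = Process(next(_pids), "surrogate", self.channels)
        self.log.append(f"created surrogate {surrogate.pid}")
        return surrogate, run_surrogate(surrogate, n, self.log)
```

One plausible motivation, not stated in the abstract, is that the launcher itself stays free to accept further requests while the surrogate does the per-process work.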