Patent classifications
G06F9/3812
Look-ahead staging for time-travel reconstruction
Disclosed herein are system, method, and computer program product embodiments for utilizing look-ahead staging (LAS) to guarantee the ability to roll back and reconstruct a package while minimizing locking duration and enabling multiple packages to be processed in a data pipeline simultaneously. An embodiment operates by receiving a package from a source system for processing through a data pipeline. The embodiment stores the package in a persistent storage together with a respective package status. The embodiment transmits the package to the data pipeline in response to the storing. The embodiment receives a commit notification for the package from a target system in response to the transmitting. The embodiment then removes the package from the persistent storage in response to receiving the commit notification for the package.
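The store, transmit, commit, and remove sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the names StagingArea, PackageStatus, send_to_pipeline, and pending are hypothetical, and an in-memory dict stands in for the persistent storage.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Tuple


class PackageStatus(Enum):
    STAGED = "staged"        # persisted, not yet handed to the pipeline
    IN_FLIGHT = "in_flight"  # handed to the pipeline, awaiting commit


@dataclass
class StagingArea:
    """Hypothetical staging store; a dict stands in for persistent storage."""
    send_to_pipeline: Callable[[str, bytes], None]
    _store: Dict[str, Tuple[bytes, PackageStatus]] = field(default_factory=dict)

    def receive(self, package_id: str, payload: bytes) -> None:
        # Store the package together with its status before it enters the pipeline.
        self._store[package_id] = (payload, PackageStatus.STAGED)
        # Transmit only after the package has been stored.
        self.send_to_pipeline(package_id, payload)
        self._store[package_id] = (payload, PackageStatus.IN_FLIGHT)

    def on_commit(self, package_id: str) -> None:
        # The target system committed the package, so the staged copy can go.
        self._store.pop(package_id, None)

    def pending(self) -> Dict[str, bytes]:
        # Packages that are still available for rollback and reconstruction.
        return {pid: payload for pid, (payload, _) in self._store.items()}
```

Because each package is staged independently, several packages can be in flight at once without holding a lock for the whole pipeline run.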
Modifying a series of lock acquire and release operations to use a single lock reservation
Provided are a computer program product, system, and method for modifying a series of lock acquire and release operations to use a single lock reservation. A representation of source code is scanned to determine a series of acquire lock program statement and release lock program statement pairs to acquire and release a lock by a thread. A first acquire lock program statement in the series is modified to be an acquire with reserve program statement that when executed by the thread causes the thread to acquire the lock and indicate the lock as reserved for use by the thread. A last release lock program statement in the series is modified to be a release with cancel program statement that when executed by the thread causes the thread to release the lock and indicate the lock as not reserved.
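The rewrite described above can be illustrated on a toy statement list. This is a sketch, not the claimed method: the (operation, name) tuple representation and the function name reserve_lock_series are assumptions, and the operation names acquire_with_reserve and release_with_cancel simply mirror the wording of the abstract.

```python
from typing import List, Tuple

Stmt = Tuple[str, str]  # (operation, name), e.g. ("acquire", "m") or ("call", "f")


def reserve_lock_series(stmts: List[Stmt], lock: str) -> List[Stmt]:
    """Rewrite the first acquire and last release of a series of pairs on `lock`."""
    # Positions of every acquire/release statement that targets this lock.
    ops = [(i, op) for i, (op, name) in enumerate(stmts)
           if name == lock and op in ("acquire", "release")]
    # The statements must strictly alternate: acquire, release, acquire, ...
    well_paired = len(ops) % 2 == 0 and all(
        op == ("acquire" if k % 2 == 0 else "release")
        for k, (_, op) in enumerate(ops)
    )
    # A series needs at least two pairs; otherwise leave the code unchanged.
    if len(ops) < 4 or not well_paired:
        return list(stmts)
    out = list(stmts)
    out[ops[0][0]] = ("acquire_with_reserve", lock)
    out[ops[-1][0]] = ("release_with_cancel", lock)
    return out
```

The intermediate acquire and release statements are left as they are; only the endpoints of the series change.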
Dynamic hammock branch training for branch hammock detection in an instruction stream executing in a processor
Dynamic hammock branch training for branch hammock detection in an instruction stream executing in a processor is disclosed. A branch hammock detection circuit is configured to dynamically detect branch hammocks in an instruction stream during run-time processing of the instruction stream. In response to an identified conditional branch instruction, the branch hammock detection circuit starts a training process for a potential branch hammock predicated by the conditional branch instruction. The branch hammock detection circuit is configured to determine if an identified in-training branch hammock is an actual branch hammock based on setting a potential convergence point as the target address for the conditional branch instruction based on whether the branch is taken or not taken. If an instruction is processed at the set convergence point, this means the set convergence point can be an actual convergence point and the in-training branch hammock can be detected as an actual branch hammock.
Method and system for accelerating AI training with advanced interconnect technologies
According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.
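The ring Scatter-Reduce with zero-value compression can be simulated in a single process as follows. This is a minimal sketch under assumptions: the compressed format (a dict of non-zero positions) and all function names are hypothetical, and the abstract does not specify the actual encoding or the interconnect.

```python
from typing import Dict, List

Compressed = Dict[int, float]  # position -> non-zero value (assumed format)


def compress(block: List[float]) -> Compressed:
    # Zero-value compression: keep only the non-zero entries.
    return {i: v for i, v in enumerate(block) if v != 0}


def decompress(block: Compressed, length: int) -> List[float]:
    return [block.get(i, 0.0) for i in range(length)]


def add_compressed(received: Compressed, local: Compressed) -> Compressed:
    # Element-wise sum performed directly on the compressed representation.
    out = dict(received)
    for i, v in local.items():
        total = out.get(i, 0.0) + v
        if total != 0:
            out[i] = total
        else:
            out.pop(i, None)
    return out


def ring_scatter_reduce(data: List[List[List[float]]]) -> List[Compressed]:
    """data[p][c] is block c on processor p; there are as many blocks as processors.

    Returns, for each processor p, the fully reduced block (p + 1) % n it ends up holding.
    """
    n = len(data)
    blocks = [[compress(b) for b in proc] for proc in data]
    for step in range(n - 1):
        # Snapshot what every processor sends in this iteration.
        outgoing = [blocks[p][(p - step) % n] for p in range(n)]
        for p in range(n):
            prev = (p - 1) % n
            c = (prev - step) % n  # index of the block arriving from the prior processor
            blocks[p][c] = add_compressed(outgoing[prev], blocks[p][c])
    return [blocks[p][(p + 1) % n] for p in range(n)]
```

After the n - 1 iterations, each reduced block would be distributed to the other processors and passed through decompress before use in training.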
Protection domains for processes in shared address space
Methods, systems and computer program products provide protection domains for processes in shared address space. Multiple processes may share address space, for example, in a software isolated process running on top of a library operating system (OS). A protection domain (PD), such as a Protection Key (PKEY), may be assigned to a process to protect its allocated address spaces from access by other processes. PDs may be acquired from a host OS. A library OS may manage PDs to protect processes and/or data. A PD may be freed and reassigned to a different process or may be concurrently assigned to multiple processes, for example, when the number of processes exceeds the number of protection domains. Threads spawned by a process may inherit protection provided by a PD assigned to the process. Process PDs may be disassociated from address spaces as they are deallocated for a process or its threads.
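The assign, free, and share behavior of protection domains can be sketched as a small bookkeeping class. This is an illustration only: ProtectionDomainManager and its methods are hypothetical names, the keys are plain integers rather than real PKEYs obtained from a host OS, and sharing the least-used key is an assumed policy.

```python
from typing import Dict, Iterable, List, Set


class ProtectionDomainManager:
    """Hypothetical library-OS bookkeeping for a fixed pool of protection keys."""

    def __init__(self, keys: Iterable[int]) -> None:
        self.free: List[int] = list(keys)       # keys not assigned to any process
        self.owners: Dict[int, Set[str]] = {}   # key -> processes using it
        self.key_of: Dict[str, int] = {}        # process -> its key

    def assign(self, pid: str) -> int:
        if pid in self.key_of:
            return self.key_of[pid]
        if self.free:
            key = self.free.pop(0)
        elif self.owners:
            # More processes than keys: share the least-used key (assumed policy).
            key = min(self.owners, key=lambda k: (len(self.owners[k]), k))
        else:
            raise RuntimeError("no protection keys available")
        self.owners.setdefault(key, set()).add(pid)
        self.key_of[pid] = key
        return key

    def release(self, pid: str) -> None:
        key = self.key_of.pop(pid)
        self.owners[key].discard(pid)
        if not self.owners[key]:
            # Last user is gone, so the key can be reassigned to another process.
            del self.owners[key]
            self.free.append(key)
```

Threads of a process would look up the same key through the process identifier, which is how they inherit its protection in this model.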
Data pipeline controller
A processing system including at least one processor may obtain a first ontology of a first type of data pipeline component, map the first ontology to a second ontology for a second type of data pipeline component that is stored in a catalog of data pipeline component types, provide a second data schema for the second type of data pipeline component as a template for a first data schema for the first type of data pipeline component, and add the first type of data pipeline component to the catalog of data pipeline component types, where the adding comprises storing the first ontology and the first data schema for the first type of data pipeline component in the catalog of data pipeline component types.
Performing atomic store-and-invalidate operations in processor-based devices
Performing atomic store-and-invalidate operations in processor-based devices is disclosed. In this regard, a processing element (PE) of one or more PEs of a processor-based device includes a store-and-invalidate logic circuit used by a memory access stage of an execution pipeline of the PE to perform an atomic store-and-invalidate operation. Upon receiving an indication to perform a store-and-invalidate operation (e.g., in response to a store-and-invalidate instruction execution) comprising a store address and store data, the memory access stage uses the store-and-invalidate logic circuit to write the store data to a memory location indicated by the store address, and to invalidate an instruction cache line corresponding to the store address in an instruction cache of the PE. The operations for storing data and invalidating instruction cache lines are performed as one atomic store-and-invalidate operation, such that the store-and-invalidate operation is considered successful only if both the store and invalidate operations are successful.
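The all-or-nothing behavior of the store-and-invalidate operation can be modeled in software as follows. This is a toy simulation, not a description of the circuit: the ProcessingElement class, the 64-byte line size, and the bounds check used as the failure condition are all assumptions.

```python
class ProcessingElement:
    """Toy model of one PE: flat memory plus a set of valid instruction cache lines."""

    LINE_SIZE = 64  # assumed cache line size in bytes

    def __init__(self, memory_size: int) -> None:
        self.memory = bytearray(memory_size)
        self.valid_icache_lines = set()

    def fetch(self, address: int) -> None:
        # Fetching an instruction marks its cache line as valid.
        self.valid_icache_lines.add(address // self.LINE_SIZE)

    def store_and_invalidate(self, address: int, data: bytes) -> bool:
        end = address + len(data)
        # Validate first so a failing store leaves the instruction cache untouched.
        if not data or address < 0 or end > len(self.memory):
            return False
        # Write the store data to the memory location given by the store address.
        self.memory[address:end] = data
        # Invalidate every instruction cache line covered by the store.
        first_line = address // self.LINE_SIZE
        last_line = (end - 1) // self.LINE_SIZE
        for line in range(first_line, last_line + 1):
            self.valid_icache_lines.discard(line)
        return True
```

The operation reports success only when both the write and the invalidation took place; a rejected store changes neither memory nor the cache state.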
Compilation and execution of parallel code fragments
Systems and methods for executing compiled code having parallel code fragments are provided. One method includes storing executable code having a plurality of parallel code fragments, each of the plurality of parallel code fragments representing alternative executable paths through a code stream. The method further includes determining a code level supported by a processor executable at a computing system, the processor executable supporting a hosted computing environment. The method also includes translating the executable code into machine-readable code executable by a processor of the computing system. Translating the executable code includes selecting a code fragment from among the plurality of parallel code fragments for execution based on the code level supported by the processor executable. The method includes executing the machine-readable code within the hosted computing environment.
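The fragment selection step can be sketched as follows. This is a simplified illustration: it assumes code levels are integers, that each fragment is tagged with the minimum level it requires, and that the highest eligible level is preferred. None of these details are stated in the abstract, and select_fragment is a hypothetical name.

```python
from typing import Callable, Dict


def select_fragment(fragments: Dict[int, Callable[[int], int]],
                    supported_level: int) -> Callable[[int], int]:
    """Pick the fragment with the highest required level that is still supported."""
    eligible = [level for level in fragments if level <= supported_level]
    if not eligible:
        raise ValueError("no fragment is compatible with the supported code level")
    return fragments[max(eligible)]


# Alternative paths that compute the same result (doubling a value).
parallel_fragments = {
    1: lambda x: x + x,   # baseline path, available at every code level
    3: lambda x: x << 1,  # path that assumes a newer code level
}
```

For example, select_fragment(parallel_fragments, 2) falls back to the baseline path, while a supported level of 3 or higher selects the newer one; both return the same value for the same input.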
Controlling the operation of a decoupled access-execute processor
Data processing apparatuses, methods of data processing, instructions, and simulator computer programs for providing a corresponding instruction execution environment are disclosed. Decode circuitry is responsive to an instance of a predetermined instruction type to cause issue circuitry to issue at least one subsequent instruction for execution to one of first and second instruction execution circuitry which support decoupled access-execute instruction execution. The predetermined instruction type is thus a steering instruction for at least one subsequent instruction and the programmer is provided with a mechanism for determining which program instructions are treated as access instructions and which are treated as execute instructions.