Patent classifications
G06F9/3009
Permitting unaborted processing of transaction after exception mask update instruction
A data processing apparatus has processing circuitry with transactional memory support circuitry to support execution of a transaction using transactional memory. In response to an exception mask updating instruction which updates exception mask information to enable at least one subset of exceptions which was disabled at the start of processing of a transaction, the processing circuitry permits un-aborted processing of one or more subsequent instruction of the transaction that follow the exception mask update instruction.
SOFTWARE-DIRECTED DIVERGENT BRANCH TARGET PRIORITIZATION
Instruction set architecture extensions to configure priority ordering of divergent target branch instructions on SIMT computing platforms to enable tools such as compilers (e.g., under influence of execution profilers) or human software developers to configure branch direction prioritization explicitly in code. Extensions for simple (two-way) branch instructions as well as multi-target (more than two branch target instructions) are disclosed.
HARDWARE ASSISTED EFFICIENT MEMORY MANAGEMENT FOR DISTRIBUTED APPLICATIONS WITH REMOTE MEMORY ACCESSES
Systems, apparatuses and methods may provide for technology that uses centralized hardware to detect a local allocation request associated with a local thread, detect a remote allocation request associated with a remote thread, wherein the remote allocation request bypasses a remote operating system, and process the local allocation request and the remote allocation request with respect to central heap, wherein the central heap is shared by the local thread and the remote thread. The local allocation request and the remote allocation request may include one or more of a first request to allocate a memory block of a specified size, a second request to allocate multiple memory blocks of a same size, a third request to resize a previously allocated memory block, or a fourth request to deallocate the previously allocated memory block.
Techniques for efficiently transferring data to a processor
A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS TO REQUEST A HISTORY RESET OF A PROCESSOR CORE
Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
Enhanced processing of user profiles using data structures specialized for graphical processing units (GPUs)
Disclosed are techniques for processing user profiles using data structures that are specialized for processing by a GPU. More particularly, the disclosed techniques relate to systems and methods for evaluating characteristics of user profiles to determine whether to offload certain user profiles to the GPU for processing or to process the user profiles locally by one or more central processing units (CPUs). Processing user profiles may include comparing the interest tags included in the user profiles with logic trees, for example, logic trees representing marketing campaigns, to identify user profiles that match the campaigns.
Apparatuses, methods, and systems for instructions to request a history reset of a processor core
Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
CASCADING EXECUTION OF ATOMIC OPERATIONS
Cascading execution of atomic operations, including: receiving a request for each thread of a plurality of threads to perform an atomic operation, wherein the plurality of threads comprises a plurality of thread subsets each corresponding to a local memory, wherein the local memory for a thread subset is accessible by the thread subset and inaccessible to a remainder of threads in the plurality of threads; generating a plurality of intermediate results by performing, by each thread subset, the atomic operation in the local memory corresponding to the thread subset; and generating a result for the request by aggregating the plurality of intermediate results in a shared memory accessible to all threads in the plurality of threads.
DYNAMIC ASYMMETRIC RESOURCES
Techniques for tuning a processor core and/or accelerator are described. An example of a technique is the use of monitoring circuitry to monitor threads in a processor core; and hardware unit off resources to, in response to commands from the scheduler, to selectively enable and/or disable execution circuitry in the processor core to tune the processor core for a thread.
Interrupt handling method, computer system, and non-transitory storage medium that resumes waiting threads in response to interrupt signals from I/O devices
The present invention provides an interrupt handling system for handling interrupts in a computer system is provided. The interrupt handling system captures and processes the interrupts in a user space of the computer system. The present invention also provides for an interrupt registration method that facilitates interrupt handling in the user space during porting of user applications from one platform to another.