Patent classifications
G06F9/355
Providing code sections for matrix of arithmetic logic units in a processor
The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.
Method and apparatus for performing reduction operations on a set of vector elements
An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
AUTOSCALING AND THROTTLING IN AN ELASTIC CLOUD SERVICE
Techniques described herein can optimize usage of computing resources in a data system. Dynamic throttling can be performed locally on a computing resource in the foreground and autoscaling can be performed in a centralized fashion in the background. Dynamic throttling can lower the load without overshooting while minimizing oscillation and reducing the throttle quickly. Autoscaling may involve scaling in or out the number of computing resources in a cluster as well as scaling up or down the type of computing resources to handle different types of situations.
MEMORY DEALLOCATION ACROSS A TRUST BOUNDARY
A method of memory deallocation across a trust boundary between a first software component and a second software component is described. Some memory is shared between the first and second software components. An in-memory message passing facility is implemented using the shared memory. The first software component is used to deallocate memory from the shared memory which has been allocated by the second software component. The deallocation is done by: taking at least one allocation to be freed from the message passing facility; and freeing the at least one allocation using a local deallocation mechanism while validating that memory access to memory owned by data structures related to memory allocation within the shared memory are within the shared memory.
MEMORY DEALLOCATION ACROSS A TRUST BOUNDARY
A method of memory deallocation across a trust boundary between a first software component and a second software component is described. Some memory is shared between the first and second software components. An in-memory message passing facility is implemented using the shared memory. The first software component is used to deallocate memory from the shared memory which has been allocated by the second software component. The deallocation is done by: taking at least one allocation to be freed from the message passing facility; and freeing the at least one allocation using a local deallocation mechanism while validating that memory access to memory owned by data structures related to memory allocation within the shared memory are within the shared memory.
Computer processor that implements pre-translation of virtual addresses with target registers
A computer processor that implements pre-translation of virtual addresses with target registers is disclosed. The computer processor may include a register file comprising one or more registers. The computer processor may include processing logic. The processing logic may receive a value to store in a register of one or more registers. The processing logic may store the value in the register. The processing logic may designate the received value as a virtual instruction address, the virtual instruction address having a corresponding virtual base page number. The processing logic may translate the virtual base page number to a corresponding real base page number and zero or more real page numbers corresponding to zero or more virtual page numbers adjacent to the virtual base page number. The processing logic may further store in the register of the one or more registers the real base page number and the zero or more real page numbers.
Computer processor that implements pre-translation of virtual addresses with target registers
A computer processor that implements pre-translation of virtual addresses with target registers is disclosed. The computer processor may include a register file comprising one or more registers. The computer processor may include processing logic. The processing logic may receive a value to store in a register of one or more registers. The processing logic may store the value in the register. The processing logic may designate the received value as a virtual instruction address, the virtual instruction address having a corresponding virtual base page number. The processing logic may translate the virtual base page number to a corresponding real base page number and zero or more real page numbers corresponding to zero or more virtual page numbers adjacent to the virtual base page number. The processing logic may further store in the register of the one or more registers the real base page number and the zero or more real page numbers.
Apparatus and method for efficient gather and scatter operations
An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
Apparatus and method for efficient gather and scatter operations
An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.