Patent classifications
G06F15/8076
Multithreading in vector processors
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
QUICK CLEARING OF REGISTERS
A method of clearing of registers and logic designs with AND and OR logics to propagate the zero values provided to write enable signal buses upon the execution of clear instruction of more than one registers, allowing more than one architecturally visible registers to be cleared with one signal instruction regardless of the values of data buses.
INSTRUCTION AND LOGIC TO PROVIDE VECTOR COMPRESS AND ROTATE FUNCTIONALITY
Instructions and logic provide vector compress and rotate functionality. A processor may include a mask register, a decoder, and an execution unit. The mask register may include a data field, wherein the data field corresponds to an element location in a vector. The decoder may be coupled to the mask register. The decoder may decode an instruction to obtain a decoded instruction. The decoded instruction may specify a vector source, the mask register, a vector destination, and a vector destination offset location. The execution unit is coupled to the decoder. The execution unit may read an unmasked value in the data field; copy an vector element from the vector source to a location adjacent to the element; change the unmasked value to a masked value; determine that the vector destination is full; store a vector destination operand associated with the vector destination in a memory; and re-execute the instruction using the masked value and the vector destination offset location.
Temporal clones to identify valid items from a set of items
Techniques are provided for using bitmaps to indicate which items, in a set of items, are invalid. The bitmaps include an active bitmap and one or more temporal clones. The active bitmap indicates which items in the set are currently valid. The temporal clones are outdated versions of the active bitmap that indicate which items in the set were invalid at previously points in time. Temporal clones may not be very different from each other. Therefore, temporal clones may be efficiently compressed. For example, a bitmap may be selected as a base bitmap, and one or more other bitmaps are encoded using delta encoding. Run length encoding may then be applied to further compress the bitmap information. These bitmaps may then be used to determine which items are valid relative to past-version requests.
Temporal clones to identify valid items from a set of items
Techniques are provided for using bitmaps to indicate which items, in a set of items, are invalid. The bitmaps include an active bitmap and one or more temporal clones. The active bitmap indicates which items in the set are currently valid. The temporal clones are outdated versions of the active bitmap that indicate which items in the set were invalid at previously points in time. Temporal clones may not be very different from each other. Therefore, temporal clones may be efficiently compressed. For example, a bitmap may be selected as a base bitmap, and one or more other bitmaps are encoded using delta encoding. Run length encoding may then be applied to further compress the bitmap information. These bitmaps may then be used to determine which items are valid relative to past-version requests.
EXCEPTION PRESERVING PARALLEL DATA PROCESSING OF STRING AND UNSTRUCTURED TEXT
A parallel processing method, system, and/or computer program product for performing data parallel wide accesses on an unstructured text is provided. The parallel processing includes creating a pointer that points to a beginning of the unstructured text and loading into a vector register a string segment of the unstructured text based on the pointer. Then, access permissions of a first byte of the string segment are automatically tested. In turn, a determination is made as to whether the string segment includes an end indication, and a remaining portion of the unstructured text is validated by accessing and loading a last character identified by the end indication into the vector register when the string segment is determined to include the end indication.
Instruction and logic to provide vector compress and rotate functionality
Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of a corresponding field in the mask is changed to a masked value. Alternative embodiments zero elements of the vector destination, in which no element from the vector source is copied.
Granular creation and refresh of columnar data
Techniques are provided for granular load and refresh of columnar data. In an embodiment, a particular data object that contains particular data formatted different from column-major format is maintained, the particular data including first data and second data. First and second data objects contain the first and second data, respectively, organized in the column-major format. In response to changes being committed to the first data in the particular data object, invalidating one or more rows of the first data object. In response to a number of invalidated rows of the first data object exceeding a threshold, automatically performing a refresh operation on the first data object independent of any refresh operation on the second data object.
Granular creation and refresh of columnar data
Techniques are provided for granular load and refresh of columnar data. In an embodiment, a particular data object that contains particular data formatted different from column-major format is maintained, the particular data including first data and second data. First and second data objects contain the first and second data, respectively, organized in the column-major format. In response to changes being committed to the first data in the particular data object, invalidating one or more rows of the first data object. In response to a number of invalidated rows of the first data object exceeding a threshold, automatically performing a refresh operation on the first data object independent of any refresh operation on the second data object.
Endian-mode-independent memory access in a bi-endian-mode processor architecture
Embodiments relate to vector processors. An aspect includes endian-mode-sensitive memory instructions for a vector processor. One embodiment includes a computer-implemented method for copying data between a vector register that includes byte elements 0 to S and a memory that is byte addressable. The computer-implemented method includes obtaining a vector instruction by a processor in a computer. The processor determines that the vector instruction is a memory access instruction specifying the vector register and a memory address. In response to the determination that is instruction is a memory access instruction and independent of a current global endian mode setting that is selectable in the processor, the processor executes the memory access instruction by copying the byte data between the memory and the vector register so that the byte element n of the vector register corresponds to the memory address+n for n=0 to S.