Patent classifications
G06F9/30192
Non-cached loads and stores in a system having a multi-threaded, self-scheduling processor
Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute instructions; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In a representative embodiment, the processor core is further adapted to execute a non-cached load instruction to designate a general purpose register rather than a data cache for storage of data received from a memory circuit. The core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, and to generate one or more work descriptor data packets to another circuit for execution of corresponding execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION USING AN INPUT MASK REGISTER
An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
Method and apparatus for optimizing scan data and method and apparatus for correcting trajectory
A method and an apparatus optimizes scan data obtained by sensors on vehicle, and corrects trajectory for a vehicle/robot based on the optimized scan data. The method for optimizing the scan data obtained by scanning environment elements, includes: step of obtaining the scan data, including obtaining at least two frames of scan data respectively corresponding to different timings; step of cluster processing, based on the characteristic of the data points, including classifying the plurality of data points in each frame of the scan data into one or more clusters; step of establishing correspondence, among the at least two frames of scan data, including searching and obtaining at least one set of clusters having correspondence; step of optimizing clusters, among the at least two frames of scan data, including conducting calculation to each set of the at least one set of clusters having correspondence, to obtain optimized clusters respectively corresponding to each set of the at least one set of clusters having correspondence; and step of optimizing the scan data, including accumulating all optimized clusters to obtain an optimized scan date for the at least two frames of scan data.
Thread commencement and completion using work descriptor packets in a system having a self-scheduling processor and a hybrid threading fabric
Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
Method of secure memory addressing
The problem to be solved is to seek an alternative to known addressing methods which provides the same or similar effects or is more secure. Solution The problem is solved by a method (40) of addressing memory in a data-processing apparatus (10) comprising, when a central processing unit (11), while performing a task (31, 32, 33, 34) of the apparatus (10), executes an instruction involving a pointer (59) into a segment (s, r, d, h, f, o, i, c) of the memory: decoding the instruction by means of an instruction decoder (12), generating a virtual address (45) within the memory by means of a safe pointer operator (41) operating on the pointer (59), augmenting the virtual address (45) by an identifier (43) of the task (31, 32, 33, 34) and an identifier (44) of the segment (s, r, d, h, f, o, i, c), said identifiers (43, 44) being hardware-controlled (42), and, based on the augmented address (45), dereferencing the pointer (59) via a memory management unit (13).
APPLICATION PROGRAMMING INTERFACE TO DECOMPRESS DATA
Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contain compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.
Stacked transistors with different gate lengths in different device strata
Disclosed herein are stacked transistors with different gate lengths in different device strata, as well as related methods and devices. In some embodiments, an integrated circuit structure may include stacked strata of transistors, with two different device strata having different gate lengths.
Apparatus and method for performing operations on capability metadata
An apparatus is provided comprising storage elements to store data blocks, where each data block has capability metadata associated therewith identifying whether the data block specifies a capability, at least one capability type being a bounded pointer. Processing circuitry is then arranged to be responsive to a bulk capability metadata operation identifying a plurality of the storage elements, to perform an operation on the capability metadata associated with each data block stored in the plurality of storage elements. Via a single specified operation, this hence enables query and/or modification operations to be performed on multiple items of capability metadata, hence providing more efficient access to such capability metadata.
Managed multiple die memory QoS
Devices and techniques for implementing quality-of-service (QoS) parameters in a managed memory device having a number of memory dies are disclosed herein.
Microarchitectural sensitive tag flow
Described herein are systems and methods for microarchitectural sensitive tag flow. For example, some methods include detecting dependence of data stored in a second data storage circuitry on the first instruction, where the first instruction will output a value to be stored in the second data storage circuitry, and wherein the second data storage circuitry is associated with a third tag indicating whether the second data storage circuitry has been designated as storing sensitive data; responsive to the dependence of data stored in the second data storage circuitry on the first instruction, checking whether the second tag indicates a sensitive instruction; and, responsive to the second tag indicating a sensitive instruction, updating the third tag to indicate that data stored in the second data storage circuitry has been designated as sensitive.