G06F9/3818

Distributed system task management using a simulated clock

Systems and methods are provided for the deterministic simulation of distributed systems, such as vehicle-based processing systems. A distributed system may be represented as a plurality of subsystems or “nodelets” executing within a single process of a computing device during a simulation. A simulated clock may be used during execution of the nodelets to mitigate the variability in timestamped data that may be caused by latency or jitter. In some embodiments, all timestamps generated during a given frame of work are assigned the same time value, regardless of when within the frame the timestamps were generated. A task scheduler can update the value of the simulated clock as execution proceeds through different frames of work.
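A minimal sketch of the frame-frozen clock idea, assuming a clock that only the scheduler may advance and that each frame lasts a fixed `frame_period` of simulated time units. `SimulatedClock`, `Nodelet`, `TaskScheduler`, and `frame_period` are hypothetical names chosen for illustration, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class SimulatedClock:
    """Hypothetical clock whose value changes only when the scheduler sets it."""

    def __init__(self, start: int = 0) -> None:
        self._now = start

    def now(self) -> int:
        return self._now

    def set(self, value: int) -> None:
        self._now = value


@dataclass
class Nodelet:
    """Hypothetical subsystem that timestamps the start and end of its work."""
    name: str
    log: List[Tuple[str, int, int]] = field(default_factory=list)

    def step(self, clock: SimulatedClock) -> None:
        started = clock.now()
        _ = sum(range(10_000))  # stand-in for work of variable wall-clock duration
        finished = clock.now()
        # Both reads return the frozen frame time, so latency or jitter during
        # the work cannot leak into the recorded timestamps.
        self.log.append((self.name, started, finished))


class TaskScheduler:
    """Runs every nodelet once per frame in one process, advancing the clock between frames."""

    def __init__(self, nodelets: List[Nodelet], frame_period: int) -> None:
        self.clock = SimulatedClock()
        self.nodelets = nodelets
        self.frame_period = frame_period  # assumed fixed frame length

    def run_frames(self, num_frames: int) -> None:
        for frame in range(num_frames):
            # The clock is updated only at frame boundaries.
            self.clock.set(frame * self.frame_period)
            for nodelet in self.nodelets:
                nodelet.step(self.clock)
```

Running `TaskScheduler([Nodelet("a")], frame_period=10).run_frames(3)` records `("a", 0, 0)`, `("a", 10, 10)`, and `("a", 20, 20)`: within a frame the start and end timestamps are identical however long the work took, which is what makes repeated runs reproducible.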

RANGE PREFETCH INSTRUCTION

In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on those three parameters, the prefetching of data from a plurality of specified ranges of addresses into at least one cache. The start address and size of each specified range depend on the first and second address-range-specifying parameters. The stride parameter specifies the offset between the start addresses of successive specified ranges. Use of the range prefetch instruction helps improve programmability and the balance between prefetch coverage and the circuit area of the prefetch circuitry.
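To make the address arithmetic concrete, the sketch below expands a strided set of ranges into the cache-line addresses a prefetcher would request. It assumes the two range parameters encode a base address and a length in bytes, that a separate `count` (not named in the abstract) gives the number of ranges, and a 64-byte cache line; `range_prefetch_lines` and its parameters are hypothetical.

```python
from typing import List

CACHE_LINE_BYTES = 64  # assumed cache line size


def range_prefetch_lines(base: int, length: int, stride: int, count: int,
                         line_bytes: int = CACHE_LINE_BYTES) -> List[int]:
    """Return the cache-line addresses a range prefetch would request, in order.

    Hypothetical encoding: `base` and `length` stand in for the two
    address-range-specifying parameters, `stride` is the offset between the
    start addresses of successive ranges, and `count` is assumed.
    """
    lines = {}  # dict keeps insertion order and drops duplicate lines
    for i in range(count):
        start = base + i * stride
        end = start + length                 # exclusive end of this range
        addr = start - (start % line_bytes)  # align down to a line boundary
        while addr < end:
            lines[addr] = None
            addr += line_bytes
    return list(lines)
```

For example, two 100-byte ranges starting at `0x1000` with a stride of `0x400` touch the lines `0x1000`, `0x1040`, `0x1400`, and `0x1440`. Real hardware would also bound how far ahead it runs; this sketch only shows which lines the parameters cover.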

LOOP DRIVEN REGION BASED FRONTEND TRANSLATION CONTROL FOR PERFORMANT AND SECURE DATA-SPACE GUIDED MICRO-SEQUENCING

Methods and apparatus relating to loop driven region based frontend translation control for performant and secure data-space guided micro-sequencing are described. In an embodiment, Data-space Translation Logic (DTL) circuitry receives a static input and a dynamic input and generates one or more outputs based at least in part on the static input and the dynamic input. A frontend counter generates a count value for the dynamic input based at least in part on an incremented/decremented counter value and a next counter value from the DTL circuitry. The DTL circuitry can receive a new dynamic input before the one or more outputs have been consumed. Other embodiments are also disclosed and claimed.

ISSUE, EXECUTION, AND BACKEND DRIVEN FRONTEND TRANSLATION CONTROL FOR PERFORMANT AND SECURE DATA-SPACE GUIDED MICRO-SEQUENCING

Methods and apparatus relating to issue, execution, and backend driven frontend translation control for performant and secure data-space guided micro-sequencing are described. In an embodiment, Data-space Translation Logic (DTL) circuitry receives a static input and a dynamic input, and generates one or more outputs based at least in part on the static input and the dynamic input. The DTL circuitry generates the one or more outputs prior to commencement of speculation operations in a processor. Other embodiments are also disclosed and claimed.

Method and system for vehicle engagement control

A method includes receiving, by machine-learning logic, observations indicative of states associated with a first and a second group of vehicles arranged within an engagement zone during a first interval of an engagement between the first and second groups of vehicles. Based on the observations, the machine-learning logic determines actions that, when taken simultaneously by the first group of vehicles during the first interval, are predicted to result in the removal of one or more vehicles of the second group from the engagement zone during the engagement. The machine-learning logic is trained, using a reinforcement learning technique, on simulated engagements between the first and second groups of vehicles to determine sequences of actions that are predicted to result in one or more vehicles of the second group being removed from the engagement zone. The machine-learning logic communicates the determined actions to the first group of vehicles.

SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

Embodiments of systems, apparatuses, and methods for fused multiply-add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein the packed data elements of the first and second packed data source operands have a first size that differs from a second size of the packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand: a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source; an addition of the results of these multiplications to the full-sized packed data element at that packed data element position of the third packed data source; and storage of the addition result in the packed data element position of the destination that corresponds to that packed data element position of the third packed data source, where M equals the size of the full-sized packed data element divided by N.
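The per-lane arithmetic can be modeled in a few lines: each full-sized accumulator in the third source absorbs the sum of M products of narrow elements from the first two sources. This is a minimal sketch that assumes signed integer elements, N = 8 and a 32-bit full size (so M = 4) by default, and wraparound on overflow; the abstract does not say whether the hardware wraps or saturates, and `add_partial_products` and `wrap_signed` are hypothetical names.

```python
from typing import List, Sequence


def wrap_signed(value: int, bits: int) -> int:
    """Reduce `value` to a `bits`-wide two's-complement integer (wraparound)."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def add_partial_products(src1: Sequence[int], src2: Sequence[int],
                         src3: Sequence[int], n_bits: int = 8,
                         full_bits: int = 32) -> List[int]:
    """dest[j] = src3[j] + sum over k < M of src1[j*M + k] * src2[j*M + k]."""
    m = full_bits // n_bits  # M = full-sized element width / N
    if len(src1) != len(src2) or len(src1) != m * len(src3):
        raise ValueError("src1 and src2 must hold M narrow elements per src3 element")
    dest = []
    for j, accumulator in enumerate(src3):
        lane = range(j * m, (j + 1) * m)  # narrow elements aligned with position j
        partial = sum(src1[k] * src2[k] for k in lane)
        dest.append(wrap_signed(accumulator + partial, full_bits))
    return dest
```

With `src1 = [1, 2, 3, 4, 5, 6, 7, 8]`, `src2` all ones, and `src3 = [10, 20]`, the result is `[20, 46]`: the first four products fold into the first accumulator and the next four into the second.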

HARDWARE CHANNEL-PARALLEL DATA COMPRESSION/DECOMPRESSION

A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2^N rows and N columns, in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2^N channels of bit streams. Each respective bit stream has a bit-stream length based on the data in the bit stream. The multiplexers in a last column output 2^N channels of packed bit streams, each having the same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N channels of bit streams that each have the same bit-stream length.

METHOD FOR EXECUTING A MACHINE CODE BY MEANS OF A MICROPROCESSOR

A method for executing a machine code using a microprocessor includes, after an operation of decoding a current loaded instruction, constructing a mask from the signals generated by an instruction decoder in response to decoding of the current loaded instruction by the decoder. The constructed mask varies as a function of the current loaded instruction. Subsequently, before an operation of decoding a next loaded instruction, the next loaded instruction is unmasked using the constructed mask.
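The mask chaining can be illustrated with a toy instruction format. This sketch assumes several things the abstract does not state: a hypothetical 32-bit encoding whose decoder "signals" are four byte-sized fields, a mask derived by hashing those signals with BLAKE2b, XOR as the masking operation, an all-zero mask for the first instruction, and straight-line code with no branches. `toy_decode`, `mask_from_signals`, `mask_program`, and `execute` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

Signals = Tuple[int, int, int, int]
INITIAL_MASK = 0  # assumption: the first instruction is stored unmasked


def toy_decode(word: int) -> Signals:
    """Hypothetical decoder: opcode, rd, rs, imm from a 32-bit word."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)


def mask_from_signals(signals: Signals) -> int:
    """Build a 32-bit mask from the decoder's output signals (assumed hash)."""
    digest = hashlib.blake2b(bytes(signals), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def mask_program(plain: List[int]) -> List[int]:
    """Offline step: mask each instruction with a mask built from its predecessor."""
    masked, mask = [], INITIAL_MASK
    for word in plain:
        masked.append(word ^ mask)
        mask = mask_from_signals(toy_decode(word))
    return masked


def execute(masked: List[int]) -> List[Signals]:
    """Unmask each loaded instruction using the mask built while decoding the previous one."""
    decoded, mask = [], INITIAL_MASK
    for loaded in masked:
        word = loaded ^ mask             # unmask before decoding
        signals = toy_decode(word)
        decoded.append(signals)
        mask = mask_from_signals(signals)  # mask for the next loaded instruction
    return decoded
```

Because every mask depends on how the previous instruction was actually decoded, a fault that corrupts one decode also corrupts the unmasking of the next instruction rather than passing silently. Handling branches would require a mask convention at each jump target, which this sketch leaves out.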

LOOK-UP TABLE READ

A digital data processor includes an instruction memory storing instructions that specify data processing operations and a data operand field; an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand; and an operational unit coupled to a data register file and to the instruction decoder, which performs an operation upon an operand corresponding to an instruction decoded by the instruction decoder and stores the results of the operation. In response to a look-up table read instruction, the operational unit is configured to perform a table recall by recalling data elements from a specified location, and from the location adjacent to it, in each of a specified number of tables (at least one), and storing the recalled data elements in successive slots of a destination register. The recalled data elements include at least one interpolated data element from the adjacent location.
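The slot layout of such a table recall can be sketched as follows. The sketch assumes the tables are the first `num_tables` of a list of Python sequences, that the "adjacent location" is `index + 1`, and that the pair is read so software can linearly interpolate between the two entries; the abstract does not specify the interpolation method, so `lerp_pair` is only one plausible use. `lut_read` and `lerp_pair` are hypothetical names.

```python
from typing import List, Sequence


def lut_read(tables: Sequence[Sequence[int]], index: int, num_tables: int) -> List[int]:
    """Recall table[index] and table[index + 1] from each of `num_tables` tables.

    The pairs are packed into successive destination slots:
    [t0[i], t0[i+1], t1[i], t1[i+1], ...].
    """
    if not 1 <= num_tables <= len(tables):
        raise ValueError("num_tables must be between 1 and the number of tables")
    dest: List[int] = []
    for table in tables[:num_tables]:
        if not 0 <= index < len(table) - 1:
            raise IndexError("index and index + 1 must both lie inside the table")
        dest.extend((table[index], table[index + 1]))
    return dest


def lerp_pair(lo: float, hi: float, frac: float) -> float:
    """Assumed use of a recalled pair: linear interpolation between neighbors."""
    return lo + (hi - lo) * frac
```

For instance, `lut_read([[0, 10, 20, 30], [100, 200, 300, 400]], 1, 2)` fills four slots with `[10, 20, 200, 300]`, and `lerp_pair(10, 20, 0.25)` gives `12.5`. Fetching both neighbors in one instruction saves a second table access per interpolated value.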

Programmable re-order buffer for decompression

Examples described herein relate to a decompression engine that can request compressed data to be transferred over a memory bus. In some cases, the memory bus has a width that requires multiple data transfers to move the requested data. Where the requested data must be presented in order to the decompression engine, a re-order buffer can be used to store entries of data. When a head-of-line entry is received, the entry can be provided to the decompression engine. When the last entry in a group of one or more entries is received, all entries in the group are presented in order to the decompression engine. In some examples, a decompression engine can borrow memory resources allocated to another memory client to expand the size of the re-order buffer available for use. For example, the memory client with excess capacity and the slowest growth rate can be chosen as the one to borrow memory resources from.
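Both behaviors described above, in-order release from a re-order buffer and choosing which memory client to borrow from, fit in a short sketch. It assumes that each transfer carries a sequence number starting at 0, that buffer capacity is counted in entries, and that each client reports allocated capacity, current use, and a growth rate in some common unit; `ReorderBuffer`, `MemoryClient`, and `pick_lender` are hypothetical names, not an API from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


class ReorderBuffer:
    """Holds out-of-order entries and releases them strictly in sequence order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # assumed to be counted in entries
        self.pending: Dict[int, bytes] = {}
        self.next_seq = 0                 # sequence number of the head-of-line entry

    def receive(self, seq: int, data: bytes) -> List[bytes]:
        if seq < self.next_seq or seq in self.pending:
            raise ValueError("duplicate or stale sequence number")
        # A head-of-line entry is released immediately, so it never needs a slot.
        if seq != self.next_seq and len(self.pending) >= self.capacity:
            raise OverflowError("re-order buffer full")
        self.pending[seq] = data
        released = []
        # Drain every consecutive entry starting from the head of line.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released

    def grow(self, extra: int) -> None:
        """Expand capacity with entries borrowed from another memory client."""
        self.capacity += extra


@dataclass
class MemoryClient:
    name: str
    allocated: int
    used: int
    growth_rate: float  # assumed metric, e.g. entries consumed per time unit

    @property
    def excess(self) -> int:
        return self.allocated - self.used


def pick_lender(clients: Sequence[MemoryClient], needed: int) -> Optional[MemoryClient]:
    """Among clients with enough spare capacity, pick the slowest-growing one."""
    candidates = [c for c in clients if c.excess >= needed]
    return min(candidates, key=lambda c: c.growth_rate, default=None)
```

Receiving sequence numbers 1, 2, then 0 returns nothing, nothing, and then all three entries in order. Preferring the slowest-growing lender lowers the chance that the lender needs its memory back before the decompression engine is done with it.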