G06F9/345

Prefetch strategy control for parallel execution of threads based on one or more characteristics of a stream of program instructions indicative that a data access instruction within a program is scheduled to be executed a plurality of times

A single instruction multiple thread (SIMT) processor includes execution circuitry, prefetch circuitry and prefetch strategy selection circuitry. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions being executed that identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategies in dependence upon such detected characteristics.
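
The selection mechanism can be illustrated with a minimal sketch. All names and thresholds here are assumptions, not the patented implementation: it simply counts how often a data access instruction's address appears in the observed instruction stream and picks a strategy accordingly.

```python
# Hypothetical sketch: choose a prefetch strategy based on whether a data
# access instruction is observed to execute a plurality of times.
from collections import Counter

STRATEGIES = ("none", "next_line", "strided")

def select_strategy(executed_pcs, pc):
    """Pick a prefetch strategy for the access at `pc` given the observed
    instruction stream `executed_pcs` (a list of executed addresses)."""
    count = Counter(executed_pcs)[pc]
    if count <= 1:
        return "none"        # executed once: prefetching would waste bandwidth
    if count < 8:
        return "next_line"   # a few repeats: cheap sequential prefetch
    return "strided"         # hot loop: worth training a stride prefetcher
```

The threshold of 8 is arbitrary; the point is only that the strategy choice depends on detected repetition, as the abstract describes.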

Streaming engine with deferred exception reporting

This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence is specified by a control register. The streaming engine fetches stream data ahead of use by a central processing unit and stores it in a stream buffer. Upon occurrence of a fault while reading data from memory, the streaming engine identifies the data element triggering the fault, preferably storing its address in a fault address register. The streaming engine defers signaling the fault to the central processing unit until this data element is used as an operand. If the data element is never used by the central processing unit, the streaming engine never signals the fault. The streaming engine preferably stores data identifying the fault in a fault source register. The fault address register and the fault source register are preferably extended control registers accessible only via a debugger.
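
A rough behavioural model of this deferred-exception scheme, with all class and register names invented for illustration: faults seen during prefetch are latched into fault registers, and the exception is raised only if the faulting element is later consumed as an operand.

```python
# Hypothetical model of deferred exception reporting in a streaming engine.
class StreamingEngine:
    def __init__(self, memory):
        self.memory = memory        # address -> value; a missing address faults
        self.buffer = {}            # stream buffer of prefetched data
        self.fault_address = None   # models the fault address register
        self.fault_source = None    # models the fault source register

    def prefetch(self, addresses):
        """Fetch stream data ahead of use; record (do not signal) any fault."""
        for addr in addresses:
            if addr in self.memory:
                self.buffer[addr] = self.memory[addr]
            elif self.fault_address is None:
                self.fault_address = addr
                self.fault_source = "read_fault"

    def use_operand(self, addr):
        """Signal the fault only when the faulting element is actually used."""
        if addr == self.fault_address:
            raise RuntimeError(f"deferred fault at {addr:#x}: {self.fault_source}")
        return self.buffer[addr]
```

If `use_operand` is never called with the faulting address, the fault is never signalled, matching the behaviour the abstract describes.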

Method and device for managing operation of a computing unit capable of operating with instructions of different sizes

An integrated circuit comprises a processing unit configured for booting up with a set of boot instructions, then determining the size of the instructions of an application program and potentially rebooting on its own initiative, while being reconfigured, in order to execute the instructions of the application program. As a consequence, only one boot memory is needed.
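
The boot flow can be sketched as follows. The header field and event names are assumptions for illustration only: the unit boots with a fixed instruction width, inspects the application image, and reboots reconfigured if the application uses a different width.

```python
# Hypothetical sketch of the single-boot-memory scheme.
BOOT_ISA_WIDTH = 32   # bits; instruction width used by the boot code

def detect_instruction_width(app_image_header):
    # Assume the application image header declares its instruction width.
    return app_image_header["isa_width"]

def boot(app_image_header):
    """Boot, determine the application's instruction size, and reboot
    reconfigured on the unit's own initiative if the sizes differ."""
    width = BOOT_ISA_WIDTH
    events = [f"boot@{width}"]
    app_width = detect_instruction_width(app_image_header)
    if app_width != width:
        width = app_width
        events.append(f"reboot@{width}")   # self-initiated, reconfigured
    events.append(f"run_app@{width}")
    return events
```

A single boot memory suffices because the same boot instructions run regardless of which instruction size the application ultimately requires.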

BRANCH PREDICTION CIRCUIT AND INSTRUCTION PROCESSING METHOD
20220350608 · 2022-11-03 ·

A branch prediction circuit includes branch target address storage circuitry, higher order address storage circuitry, address generation circuitry, and branch instruction execution circuitry. The branch target address storage circuitry stores a first address of a branch instruction executed in the past, a lower order address of a second address of an instruction to be executed next, and information pertaining to a reference target for a higher order address of the second address and to whether or not reference is needed. The higher order address storage circuitry stores the higher order address of the second address. The address generation circuitry generates the second address when a third address of an instruction to be newly executed matches the first address. The branch instruction execution circuitry provides an instruction for speculative execution of the instruction having the second address.
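
One plausible reading of this split storage, sketched below with invented structure and field names: the branch target buffer keeps only the low-order target bits plus either an index into a shared higher-order-address table or a flag saying the high bits can be taken from the fetch address itself.

```python
# Hypothetical model of split low/high branch target address storage.
LOW_BITS = 16
LOW_MASK = (1 << LOW_BITS) - 1

class BranchPredictor:
    def __init__(self):
        self.high_table = []  # higher order address storage circuitry
        self.btb = {}         # first address -> (low bits, high index, need_ref)

    def record(self, branch_pc, target):
        """Store an executed branch: low-order target bits plus reference
        information for the higher-order bits."""
        high = target >> LOW_BITS
        if (branch_pc >> LOW_BITS) == high:
            self.btb[branch_pc] = (target & LOW_MASK, None, False)
        else:
            if high not in self.high_table:
                self.high_table.append(high)
            self.btb[branch_pc] = (target & LOW_MASK,
                                   self.high_table.index(high), True)

    def predict(self, fetch_pc):
        """Generate the full second address when fetch_pc matches a stored
        first address; otherwise return None (no prediction)."""
        entry = self.btb.get(fetch_pc)
        if entry is None:
            return None
        low, idx, need_ref = entry
        high = self.high_table[idx] if need_ref else (fetch_pc >> LOW_BITS)
        return (high << LOW_BITS) | low
```

Sharing the higher-order bits this way shrinks each branch target buffer entry, which is presumably the motivation behind the split.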

PROCESSING UNIT AND PROCESSING SYSTEM
20220350636 · 2022-11-03 ·

Provided is a processing unit including a parallel processing determination unit configured to determine whether parallel processing is to be performed on at least some of processing target data, a VM execution control unit configured to cause each of a plurality of cores included in a parallel processing execution unit to execute a VM when the parallel processing determination unit determines that the parallel processing is to be performed, a storage control unit configured to divide at least some of the processing target data into partial data that can be processed in parallel and to cause a memory of the parallel processing execution unit to store the partial data, and a processing result obtaining unit configured to obtain a plurality of processing results of the partial data processed by VMs of the plurality of cores from the parallel processing execution unit.
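
The decide/divide/dispatch/collect flow can be sketched as below. The function names mirror the units in the abstract but the implementation is assumed; threads stand in for per-core VMs.

```python
# Hypothetical sketch of the parallel processing flow.
from concurrent.futures import ThreadPoolExecutor

def should_parallelize(data, threshold=4):
    # parallel processing determination unit
    return len(data) >= threshold

def split(data, n_cores):
    # storage control unit: divide into partial data processable in parallel
    return [data[i::n_cores] for i in range(n_cores)]

def run_vm(partial):
    # stand-in for a VM executing on one core of the execution unit
    return sum(partial)

def process(data, n_cores=4):
    """Determine whether to parallelize, dispatch partial data to VMs,
    and obtain the plurality of processing results."""
    if not should_parallelize(data):
        return run_vm(data)
    parts = split(data, n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        results = list(pool.map(run_vm, parts))  # processing result obtaining unit
    return sum(results)
```

Summation is used only as a placeholder workload; any computation whose input can be divided into independently processable partial data fits the same structure.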

METHOD OF EXECUTING OPERATION, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
20220350607 · 2022-11-03 ·

A method of executing an operation in deep learning training, an electronic device, and a computer-readable storage medium, which relate to the field of artificial intelligence, in particular deep learning. The method of executing an operation in deep learning training includes: acquiring an instruction for the operation including a plurality of vector operations; determining, for each vector operation of the plurality of vector operations, two source operand vectors for a comparison; and executing the vector operation on the two source operand vectors using an instruction format for the vector operation, so as to obtain an operation result including a destination operand vector.
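
An illustrative reading of this flow, with function names invented for the sketch: each vector operation compares its two source operand vectors element-wise and produces a destination operand vector, and the instruction as a whole yields one destination vector per constituent operation.

```python
# Hypothetical sketch of comparison-based vector operations.
def vector_max(src_a, src_b):
    """One vector operation: element-wise comparison of two source operand
    vectors, keeping the larger element in the destination operand vector."""
    return [max(a, b) for a, b in zip(src_a, src_b)]

def execute_operation(vector_ops, operand_pairs):
    """Execute each vector operation on its two source operand vectors and
    collect the resulting destination operand vectors."""
    return [op(a, b) for op, (a, b) in zip(vector_ops, operand_pairs)]
```

Element-wise maximum is just one example of a comparison-based vector operation; the same structure accommodates minimum, equality masks, and similar primitives common in deep learning training kernels.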