G06F9/312

Hardware processors and methods for tightly-coupled heterogeneous computing

Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.

Non-default instruction handling within transaction

Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.

Systems and methods for using error correction and pipelining techniques for an access triggered computer architecture

A method for improving performance of an access triggered architecture for a computer implemented application is provided. The method first executes typical operations of the access triggered architecture according to an execution time, wherein the typical operations comprise: obtaining a dataset and an instruction set; and using the instruction set to transmit the dataset to a functional block associated with an operation, wherein the functional block performs the operation using the dataset to generate a revised dataset. The method further creates a pipeline of the typical operations to reduce the execution time of the typical operations, to create a reduced execution time; and executes the typical operations according to the reduced execution time, using the pipeline.

Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets
10339057 · 2019-07-02 · ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.

Transaction end plus commit to persistence instructions, processors, methods, and systems

A processor of an aspect includes a decode unit to decode a transaction end plus commit to persistence instruction. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to atomically ensure that data associated with all prior store to memory operations made to a persistent memory, which are to have been accepted to memory when performance of the instruction begins, but which are not necessarily to have been stored in the persistent memory when the performance of the instruction begins, are to be stored in the persistent memory before the instruction becomes globally visible. The execution unit, in response to the instruction, is also to atomically end a transactional memory transaction before the instruction becomes globally visible.

Data processing

Data processing circuitry comprises out-of-order instruction execution circuitry to execute program instructions in an instruction execution order; a data store, to store information on a set of instructions for which execution has been initiated, the data store providing ordering information indicating the relative position of each instruction in the set of instructions with respect to a program code order; commit circuitry to commit the results of instructions executed by the instruction execution circuitry; one or more cumulative status registers configured to be set in response to a respective condition generated by execution of an instruction and then to remain set until an unset instruction is executed; and an identifier store, to store for at least those of the one or more cumulative status registers which are not currently set, an identifier of an instruction which is earliest in the program code order in the set of instructions and which generated a condition to set that cumulative status register.

Data read/write method and apparatus, storage device, and computer system
10303474 · 2019-05-28 · ·

Embodiments of the present invention provide a data read/write method and apparatus, a storage device, and a computer system, so as to reduce completion time of a data read/write operation in a multi-core computer system. The method includes: determining, by a host device, N cores used for executing a target process, where the N cores are in a one-to-one correspondence with N execution threads included in the target process; grouping the N execution threads to determine M execution thread groups, and allocating an indication identifier to each execution thread group; and sending M data read/write instructions to a storage device, where each data read/write instruction includes an indication identifier of a corresponding execution thread group.

Computer vision processing in hardware data paths
10296351 · 2019-05-21 · ·

An apparatus includes a processor and a coprocessor. The processor may be configured to generate a command to run a directed acyclic graph. The coprocessor may be configured to (i) receive the command from the processor, (ii) parse the directed acyclic graph into a data flow including one or more operators, (iii) schedule the operators in one or more data paths and (iv) generate one or more output vectors by processing one or more input vectors in the data paths. The data paths may be implemented with a plurality of hardware engines. The hardware engines may operate in parallel to each other. The coprocessor may be implemented solely in hardware.

Method and apparatus for cleaning files in a mobile terminal and associated mobile terminal
10275355 · 2019-04-30 · ·

A method for cleaning files stored in a mobile terminal is disclosed. The mobile terminal receives a file cleaning instruction from a user. In response to the file cleaning instruction, the mobile terminal identifies cache files based on the cache files' associated information and past user activities on the cache files and groups the identified cache files and their associated information into multiple cache file categories. At least one of the multiple cache file categories is located in an extended storage device of the mobile terminal (e.g., a SD card). Next, the mobile terminal displays information of the multiple cache file categories on the display, each cache file category having an associated file cleaning option and cleans at least one of the multiple cache file categories from the mobile terminal in accordance with a user choice of the corresponding file cleaning option.

Data processing apparatus and method for controlling performance of speculative vector operations
10261789 · 2019-04-16 · ·

A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition. The speculation reduction condition is a condition indicating that a reduction in the speculation width indication is expected to improve at least one performance characteristic of the data processing apparatus relative to continued operation without the reduction in the speculation width indication. The speculation control circuitry is responsive to detection of the speculation reduction condition to reduce the speculation width indication. This can significantly increase performance (for example in terms of throughput and/or energy consumption) when performing speculative vector operations.