Patent classifications
G06F9/3552
STREAMING ENGINE WITH COMPRESSED ENCODING FOR LOOP CIRCULAR BUFFER SIZES
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.
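The block size selection described in this abstract can be illustrated with a minimal sketch. The function names, the 0/1 encoding of the selection, and the treatment of the block size numbers as already-decoded sizes are assumptions made for illustration; the abstract does not state how the fields are encoded or how the wrap is applied.

```python
def circular_block_size(first_number, second_number, selection):
    """Pick the circular block size from two block size numbers.

    Assumption: selection 0 is the "first" selection and selection 1 is the
    "second" selection; the real field encoding is not given in the abstract.
    """
    if selection == 0:
        return first_number
    if selection == 1:
        return first_number + second_number
    raise ValueError("unknown circular address block size selection")


def wrap_offset(base, offset, block_size):
    """Hypothetical circular addressing: wrap the offset inside one block."""
    return base + (offset % block_size)
```

Under these assumptions, `circular_block_size(4, 8, 1)` gives a block of 12, and an offset of 13 from base 100 wraps to address 101.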
PROCESSOR PREFETCH THROTTLING BASED ON SHORT STREAMS
In an embodiment, a processor comprises a prefetch history array and a prefetch circuit. The prefetch history array comprises a plurality of entries corresponding to prefetch addresses, each entry of the plurality of entries comprising a sublength value associated with a frequency that a stride is repeated. The prefetch circuit is to: for each entry of the plurality of entries, adjust the sublength value based on stride matches for an address of the entry; adjust a short stream counter based on the sublength values of the plurality of entries in the prefetch history array; determine whether the short stream counter has exceeded a throttling threshold; and in response to a determination that the short stream counter has exceeded the throttling threshold, throttle a prefetch level of the prefetch circuit. Other embodiments are described and claimed.
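The throttling steps listed in this abstract can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the `HistoryEntry` fields, the rule that a stride mismatch resets the sublength, the cutoff that marks a stream as short, and the choice to recompute the short stream counter from scratch on each call.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """Hypothetical prefetch history entry."""
    last_address: int
    stride: int = 0
    sublength: int = 0  # how many times the current stride has repeated


def update_sublength(entry, address):
    """Adjust the sublength value based on whether the stride matched."""
    stride = address - entry.last_address
    if stride == entry.stride:
        entry.sublength += 1
    else:
        # Assumption: a new stride restarts the count.
        entry.stride = stride
        entry.sublength = 0
    entry.last_address = address


def adjust_prefetch_level(entries, level, short_cutoff=2, throttling_threshold=2):
    """Count short streams and lower the prefetch level past the threshold."""
    short_stream_counter = sum(1 for e in entries if e.sublength < short_cutoff)
    if short_stream_counter > throttling_threshold and level > 0:
        level -= 1  # throttle: prefetch less aggressively
    return short_stream_counter, level
```

In this sketch, an entry that sees addresses 64, 128 and 192 after starting at 0 reaches a sublength of 2 and is no longer counted as a short stream.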
STREAMING ENGINE WITH MULTI DIMENSIONAL CIRCULAR ADDRESSING SELECTABLE AT EACH DIMENSION
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
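A minimal sketch of address generation for nested loops with a per-loop addressing mode follows. The `(count, stride, block)` tuple layout, the use of `None` to mean linear addressing, and the simple modulo wrap of each loop's offset are assumptions chosen for illustration, not details taken from the patent.

```python
from itertools import product


def stream_addresses(base, loops):
    """Yield element addresses for nested loops, innermost loop first in `loops`.

    Each loop is (count, stride, block): block is None for linear addressing,
    or a block size for circular addressing (hypothetical representation).
    """
    # product() varies its last range fastest, so list the outermost loop first.
    ranges = [range(count) for count, _, _ in reversed(loops)]
    for indices in product(*ranges):
        address = base
        # Reverse the index tuple so it lines up with `loops` (innermost first).
        for i, (_, stride, block) in zip(reversed(indices), loops):
            offset = i * stride
            if block is not None:
                offset %= block  # circular mode: wrap within the block
            address += offset
        yield address
```

For example, `list(stream_addresses(0, [(4, 1, None), (3, 4, 8)]))` walks a linear inner loop of four elements while the outer loop wraps after two iterations, so the third outer iteration revisits addresses 0 through 3.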
Streaming engine with compressed encoding for loop circular buffer sizes
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.
Systems and methods for high speed remote change data capture
A system and method are described for change data capture wherein a source database is replicated to a remote database. In one embodiment, a data storage device contains source log blocks containing information regarding database transactions processed for a source database, wherein the data storage device and source database are located in a first location, and at least some of the source log blocks include a plurality of database transactions in the same order as processed in the source database. Messages containing source log blocks are sent from the first location to a second location geographically remote from the first location. One or more components at the second location receive source log blocks from the messages, extract the source transaction data from the source log blocks, and use the source transaction data to write database transaction information to a second database based on the extracted source transaction data.
Data processing apparatus and method for decoding program instructions in order to generate control signals for processing circuitry of the data processing apparatus
A data processing apparatus and method for accessing operands stored within a set of registers. Instruction decoder circuitry, responsive to program instructions, generates register access control signals identifying for each program instruction which registers in the register set are to be accessed by the processing circuitry when performing the processing operation specified by that program instruction. The set of registers is logically arranged as a plurality of register groups, with each register in the set being a member of more than one register group. Each program instruction includes a register specifier field, and the instruction decoder circuitry is responsive to each program instruction to determine a selected register group, and to determine one or more selected members of that selected register group. The instruction decoder circuitry then outputs register access control signals identifying the register corresponding to each selected member of the selected register group.
Streaming engine with multi dimensional circular addressing selectable at each dimension
A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
SYSTEM AND METHOD FOR MODULO ADDRESSING VECTORIZATION WITH INVARIANT CODE MOTION
A processing device includes a target processor instruction memory to store a plurality of memory access instructions, and a compiler. A vector invariant candidate detection module of the compiler is to determine whether a memory access instruction is to be replaced by vector invariant access code, and in response: the compiler is to generate first replacement code that vectorizes the memory access instruction using vector invariant access code, and to replace the memory access instruction with the first replacement code. A vector modulo addressing candidate detection module of the compiler is to determine whether the memory access instruction is to be replaced by vector modulo addressing access code, and in response: the compiler is to generate second replacement code that vectorizes the memory access instruction using vector modulo addressing code, and to replace the memory access instruction with the second replacement code.
Accelerating processor based artificial neural network computation
An apparatus employed in a processing device comprises a processor configured to process data of a predefined data structure. A memory fetch device is coupled to the processor and is configured to determine addresses of the packed data for the processor. The packed data is stored on a memory device that is coupled to the processor. The memory fetch device is further configured to provide output data based on the addresses of the packed data to the processor, where the output data is configured according to the predefined data structure.
Integrated circuit with control node circuitry and processing circuitry
Traditionally, providing parallel processing within a multi-core system has been very difficult. Here, however, a system is provided where serial source code is automatically converted into parallel source code, and a processing cluster is reconfigured on the fly to accommodate the parallelized code based on an allocation of memory and compute resources. Thus, the processing cluster and its corresponding system programming tool provide a system that can perform parallel processing from a serial program that is transparent to a user. Generally, a control node connected to the address and data leads of a host processor uses messages to control the processing of data in a processing cluster. The cluster includes nodes of parallel processors, shared function memory, a global load/store, and hardware accelerators all connected to the control node by message busses. A crossbar data interconnect routes data to the cluster circuits separate from the message busses.