G06F7/38

RATE DOMAIN NUMERICAL PROCESSING CIRCUIT AND METHOD

Rate domain numerical processing comprises receiving an input serial data stream on a single input wire in which a multi-valued number is represented as a rate of pulse events comprising pulses, null pulses or a combination thereof. The rate of pulse events in an output serial data stream is varied in accordance with an operand to perform a multi-valued arithmetic operation selected from multiplication, division, addition and subtraction or a combination thereof. The rate may be varied by scaling the rate by an operand, which may be implemented using a count and compare circuit topology. The output serial data stream is output on a single output wire. The rate domain operations, specifically multiplication and division are accomplished without the resource & power intensive binary multiplier and binary divisions circuits. The operations are implemented using simple registers, adders, accumulators, counters, comparators, and basic logic, which is far more SWaP-C efficient.

Method and device for dynamically adjusting decimal point positions in neural network computations

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Lightweight profiling using branch history

Technical solutions are described for profiling an execution of a computer program. An example method includes setting a program-counter indicator to a first state in response to updating a program counter register. The method further includes profiling the execution of the computer program by periodically sampling the program counter register according to a sampling time-interval. Sampling the program counter register includes recording a content of the program counter register in response to the program-counter indicator being in the first state. The method also includes increasing the sampling time-interval.

MAGNETIC PROCESSING UNIT
20230168860 · 2023-06-01 · ·

A magnetic processing unit (“MPU”) includes a magnetic element with one or more input channels and one or more output channels. The magnetic element can acquire magnetic arrangement configured to perform a predetermined operation on the received input signal and provide a resulting output signal.

Computation memory operations in a logic layer of a stacked memory

Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various computation operations. This functionality would be desired where performing the operations locally near the memory devices would allow increased performance and/or power efficiency by avoiding transmission of data across the interface to the host processor.

Instruction for determining histograms
09804839 · 2017-10-31 · ·

A processor is described having a functional unit of an instruction execution pipeline. The functional unit has comparison bank circuitry and adder circuitry. The comparison bank circuitry is to compare one or more elements of a first input vector against an element of a second input vector. The adder circuitry is coupled to the comparison bank circuitry to add the number of elements of the second input vector that match a value of the first input vector on an element by element basis of the first input vector.

Instruction cache behavior and branch prediction

Instruction cache behavior and branch prediction are used to improve the functionality of a computing device by profiling branching instructions in an instruction cache to identify likelihoods of proceeding to a plurality of targets from the branching instructions; identifying a hot path in the instruction cache based on the identified likelihoods; and rearranging the plurality of targets relative to one another and associated branching instructions so that a first branching instruction that has a higher likelihood of proceeding to a first hot target than to a first cold target and that previously flowed to the first cold target and jumped to the first hot target instead flows to the first hot target and jumps to the first cold target.

ARITHMETIC PROCESSING DEVICE AND CONTROL METHOD THEREOF
20170300298 · 2017-10-19 · ·

An arithmetic processing device includes arithmetic processing cores, and a control circuit that includes a request port accepting a request for a memory space; a processing circuit unit that executes processing of the request; a control pipeline that determines whether or not the processing is executable by the processing circuit unit on the request input through the request port, and that executes first abort processing for the request when the processing is not executable on the request, and issues the processing to the processing circuit unit when the processing is executable; and an identical-address request arbitration circuit that holds an occurrence order of requests with an identical address that is aborted due to the processing being not executable, and that executes second abort processing on those of requests input to the control pipeline which have the identical address and which are other than a leading request in the occurrence order.

ARITHMETIC PROCESSING DEVICE AND CONTROL METHOD THEREOF
20170300298 · 2017-10-19 · ·

An arithmetic processing device includes arithmetic processing cores, and a control circuit that includes a request port accepting a request for a memory space; a processing circuit unit that executes processing of the request; a control pipeline that determines whether or not the processing is executable by the processing circuit unit on the request input through the request port, and that executes first abort processing for the request when the processing is not executable on the request, and issues the processing to the processing circuit unit when the processing is executable; and an identical-address request arbitration circuit that holds an occurrence order of requests with an identical address that is aborted due to the processing being not executable, and that executes second abort processing on those of requests input to the control pipeline which have the identical address and which are other than a leading request in the occurrence order.

Processing with compact arithmetic processing element
09792088 · 2017-10-17 · ·

A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).