Patent classifications
G06F9/30105
Memory device including arithmetic circuit and neural network system including the same
Provided is a memory device that includes a memory bank including a plurality of memory cells arranged in a region where a plurality of word lines and a plurality of bit lines of the memory device intersect each other, a sense amplifier configured to amplify a signal transmitted through selected bit lines among the plurality of bit lines, and an arithmetic circuit configured to receive a first operand from the sense amplifier, receive a second operand from outside the memory device, and perform an arithmetic operation by using the first operand and the second operand, based on an internal arithmetic control signal generated in the memory device.
DETECTION OF POTENTIAL NEED TO USE A LARGER DATA FORMAT IN PERFORMING FLOATING POINT OPERATIONS
Detection of whether a result of a floating point operation is safe. Characteristics of the result are examined to determine whether the result is safe or potentially unsafe, as defined by the user. An instruction is provided to facilitate detection of safe or potentially unsafe results.
EFFICIENT EXCEPTION HANDLING IN TRUSTED EXECUTION ENVIRONMENTS
Systems, methods, and apparatuses relating efficient exception handling in trusted execution environments are described. In an embodiment, a hardware processor includes a register, a decoder, and execution circuitry. The register has a field to be set to enable an architecturally protected execution environment at one of a plurality of contexts for code in an architecturally protected enclave in memory. The decoder is to decode an instruction having a format including a field for an opcode, the opcode to indicate that the execution circuitry is to perform a context change. The execution circuitry is to perform one or more operations corresponding to the instruction, the one or more operations including changing, within the architecturally protected enclave, from a first context to a second context.
Compact linked-list-based multi-threaded instruction graduation buffer
A processor and instruction graduation unit for a processor. In one embodiment, a processor or instruction graduation unit according to the present invention includes a linked-list-based multi-threaded graduation buffer and a graduation controller. The graduation buffer stores identification values generated by an instruction decode and dispatch unit of the processor as part of one or more linked-list data structures. Each linked-list data structure formed is associated with a particular program thread running on the processor. The number of linked-list data structures formed is variable and related to the number of program threads running on the processor. The graduation controller includes linked-list head identification registers and linked-list tail identification registers that facilitate reading and writing identifications values to linked-list data structures associated with particular program threads. The linked-list head identification registers determine which executed instruction result or results are next to be written to a register file.
DATA RECORDER
An apparatus that allows for access to any and all registers of a central processing unit in a line replaceable unit (LRU) without a need to open the housing of the LRU is provided. The apparatus may receive write or read packets from an external device and relay the same to an LRU. The apparatus may receive state information from one or more registers of the LRU in response. The apparatus may transmit or transfer the state information to an external device. The apparatus may be used to update firmware in the LRU, for diagnostics or testing.
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
Apparatus and methods for vector operations
Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
Systems, apparatuses, and methods for fused multiply add
Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.
Low-latency register error correction
To implement low-latency register error correction a register may be read as part of an instruction when that instruction is the currently executing instruction in a processor. A correctable error in data produced from reading the register can be detected. In response to detecting the correctable error, the currently executing instruction in the processor can be changed into a register update instruction that is executed to overwrite the data in the register with corrected data. Then, the original (e.g., unchanged) instruction can be rescheduled.
Vector processor and control method therefor
A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.