Patent classifications
G06F9/30141
Compiler for RISC processor having specialized registers
A compiler is disclosed. The compiler is configured to generate executable code based on source code, where the source code includes a plurality of variables. The compiler includes an executable code generator configured to allocate a register to each of the source code variables, where the executable code generator is configured to select one of a group of register types to be allocated for each variable, and where the register allocated to each variable corresponds with the register type determined therefor.
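As a rough illustration of the allocation step described above, the following sketch selects a register type per variable and then takes a free register of that type. The register file names, the type-selection rule, and the helper names (`REGISTER_FILES`, `select_register_type`, `allocate`) are all hypothetical; the abstract does not state how the type is determined.

```python
# Hypothetical register files, one list of register names per register type.
REGISTER_FILES = {
    "address": ["a0", "a1"],
    "data": ["d0", "d1", "d2"],
    "float": ["f0", "f1"],
}


def select_register_type(var_type):
    """Pick a register type from a source-level type name (assumed rule)."""
    if var_type.endswith("*"):
        return "address"
    if var_type in ("float", "double"):
        return "float"
    return "data"


def allocate(variables):
    """Map each variable name to a register of its selected type."""
    free = {kind: list(regs) for kind, regs in REGISTER_FILES.items()}
    allocation = {}
    for name, var_type in variables.items():
        kind = select_register_type(var_type)
        if not free[kind]:
            # A real compiler would spill here; the sketch just reports it.
            raise RuntimeError(f"no free {kind} register for {name}")
        allocation[name] = free[kind].pop(0)
    return allocation


if __name__ == "__main__":
    print(allocate({"p": "int*", "x": "float", "i": "int", "j": "int"}))
```

The sketch omits liveness analysis and spilling, which a real allocator would need.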
ROTATING ACCUMULATOR
A processing unit for generating an output vector is provided. The processing unit comprises an output vector register and a vector unit and is configured to execute machine code instructions, each instruction being an instance of a predefined set of instruction types in an instruction set of the processing unit. The instruction set includes a vector processing instruction defined by a corresponding opcode, which causes the processing unit to: i) process, using the vector unit, at least two input vectors to generate a result value; ii) perform a rotation operation on the plurality of elements of the output register in which the result value or a value based on the result value is placed in the first end element of the output register.
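The two steps of the vector processing instruction can be modelled in a few lines. This is a minimal sketch under stated assumptions: the vector unit computes a dot product, the "first end element" is index 0, and the "value based on the result value" is the result added to the element that wraps around. None of these details is fixed by the abstract, and `rotate_accumulate` is a hypothetical name.

```python
def dot(a, b):
    """Assumed vector-unit operation: dot product of two input vectors."""
    return sum(x * y for x, y in zip(a, b))


def rotate_accumulate(output, vec_a, vec_b, accumulate=True):
    """Model one instruction; returns the new output register contents.

    Step i: process two input vectors into a single result value.
    Step ii: rotate the output register by one element and place the
    result (optionally added to the wrapped-around element) at index 0.
    """
    result = dot(vec_a, vec_b)
    wrapped = output[-1]  # element rotated out of the far end
    first = result + wrapped if accumulate else result
    return [first] + output[:-1]


if __name__ == "__main__":
    reg = [0, 0, 0]
    for a, b in [([1, 2], [3, 4]), ([1, 1], [1, 1]), ([1, 0], [5, 0]), ([1, 0], [1, 0])]:
        reg = rotate_accumulate(reg, a, b)
        print(reg)
```

With a three-element register, the fourth call adds its result to the value produced by the first call, since that value has rotated all the way round.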
Load-store instruction for performing multiple loads, a store, and strided increment of multiple addresses
A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each of the two load operations and a respective store address for the store operation. The load-store instruction further includes three stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.
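The behaviour of such an instruction can be sketched as a small simulation. The assumptions are mine, not the abstract's: memory is a flat list, the three addresses are kept as a three-element list rather than packed into a pair of registers, the loads happen before the store, and every stride operand selects a field of the stride register. The name `load2_store1` is hypothetical.

```python
def load2_store1(mem, regs, addrs, stride_fields, dst0, dst1, src, s0, s1, s2):
    """Model one load-store instruction.

    mem           -- flat list standing in for memory
    regs          -- dict of register name -> value
    addrs         -- [load address 0, load address 1, store address]
    stride_fields -- fields of the stride register, one stride value each
    dst0, dst1    -- destination registers of the two loads
    src           -- source register of the store
    s0, s1, s2    -- stride operands, each an index into stride_fields
    """
    regs[dst0] = mem[addrs[0]]
    regs[dst1] = mem[addrs[1]]
    mem[addrs[2]] = regs[src]
    # Post-increment each address by the stride its operand selects.
    for i, field in enumerate((s0, s1, s2)):
        addrs[i] += stride_fields[field]


if __name__ == "__main__":
    mem = list(range(10, 20)) + [0] * 4
    regs = {"r0": 0, "r1": 0, "r2": 99}
    addrs = [0, 5, 10]
    load2_store1(mem, regs, addrs, [1, 2, 4], "r0", "r1", "r2", 0, 1, 2)
    print(regs, addrs, mem[10])
```

Because the strides are looked up from register fields, one instruction encoding can walk three streams at different rates without extra address arithmetic instructions.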
Microprocessor with shared functional unit for executing multi-type instructions
A microprocessor that includes a shared functional unit, a first execution queue and a second execution queue is introduced. The first execution queue includes a plurality of entries, wherein each entry of the first execution queue includes a first count value which is decremented until the first count value reaches 0. The first execution queue dispatches a first-type instruction to the shared functional unit when the first count value reaches 0. The second execution queue includes a plurality of entries, wherein each entry of the second execution queue comprises a second count value which is decremented until the second count value reaches 0. The second execution queue dispatches a second-type instruction to the shared functional unit when the second count value reaches 0. The issue unit resolves all data dependencies and resource conflicts so that the first and second count values are preset for the first-type and second-type instructions to be mutually executed at the exact time in the future by the shared functional unit.
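A small cycle-level model can make the count-down dispatch concrete. In this sketch the class and function names (`ExecutionQueue`, `run`) and the instruction names are hypothetical, and the "issue unit" is simply the caller, who must preset counts that never collide on the shared unit.

```python
class ExecutionQueue:
    """Queue whose entries carry a count preset at issue time."""

    def __init__(self, name):
        self.name = name
        self.entries = []  # list of [instruction, count]

    def issue(self, instruction, count):
        self.entries.append([instruction, count])

    def tick(self):
        """Advance one cycle: dispatch entries at 0, decrement the rest."""
        ready = [instr for instr, count in self.entries if count == 0]
        self.entries = [[i, c - 1] for i, c in self.entries if c > 0]
        return ready


def run(queues, cycles):
    """Return what the shared functional unit receives in each cycle."""
    timeline = []
    for _ in range(cycles):
        dispatched = [(q.name, i) for q in queues for i in q.tick()]
        # The preset counts are expected to rule out conflicts.
        assert len(dispatched) <= 1, "counts collide on the shared unit"
        timeline.append(dispatched[0] if dispatched else None)
    return timeline


if __name__ == "__main__":
    first, second = ExecutionQueue("first"), ExecutionQueue("second")
    first.issue("add", 0)
    second.issue("vmul", 1)
    first.issue("sub", 2)
    print(run([first, second], 4))
```

No arbitration logic appears at dispatch time; the schedule is fixed when the counts are preset, which is the point of the scheme as the abstract describes it.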
NEURAL NETWORK ACTIVATION COMPRESSION WITH NON-UNIFORM MANTISSAS
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
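The conversion between two block floating-point formats can be sketched as follows. This is an illustration under assumptions: each block shares one exponent, mantissas are signed integers, and the second format is obtained by uniformly narrowing the mantissa, which models only the lossy case. Non-uniform mantissa spacing is not modelled, and the function names are hypothetical.

```python
import math


def to_bfp(values, mantissa_bits):
    """Quantize a block to (shared exponent, signed integer mantissas)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    exponent = math.frexp(max_abs)[1]  # max_abs = m * 2**exponent, 0.5 <= m < 1
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return exponent, mantissas


def from_bfp(exponent, mantissas, mantissa_bits):
    """Decode a block back to floats."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


def compress(exponent, mantissas, from_bits, to_bits):
    """Narrow the mantissas to fewer bits (lossy), keeping the exponent."""
    divisor = 2 ** (from_bits - to_bits)
    limit = 2 ** (to_bits - 1) - 1
    narrow = [max(-limit, min(limit, round(m / divisor))) for m in mantissas]
    return exponent, narrow


if __name__ == "__main__":
    activations = [0.3, 0.75]
    exp, wide = to_bfp(activations, 8)            # first format
    exp, narrow = compress(exp, wide, 8, 4)       # second, compressed format
    print(wide, narrow, from_bfp(exp, narrow, 4)) # values seen by back propagation
```

The compressed block is what would be stored between the forward and backward passes; the decoded values differ slightly from the originals, which is the precision traded for memory.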
EXTENSION OF REGISTER FILES FOR LOCAL PROCESSING OF DATA IN COMPUTING ENVIRONMENTS
A mechanism is described for facilitating extension of register files in computing environments. A method of embodiments, as described herein, includes facilitating, inside an extended register file, performance of one or more tasks relating to an instruction, where the one or more tasks are performed by an extension mechanism being hosted inside the extended register file of a computing device.
EMULATED MULTIPORT MEMORY ELEMENT CIRCUITRY WITH EXCLUSIVE-OR BASED CONTROL CIRCUITRY
Integrated circuits may include memory element circuitry. The memory element circuitry may include multiple dual-port memory elements that are controlled to effectively form a multi-port memory element having multiple read and write ports. A respective bank of dual-port memory elements may be coupled to each write port. Write data may be received concurrently over one or more of the write ports and stored on the banks. Switching circuitry may be coupled between the banks and the read ports of the memory element circuitry. The switching circuitry may be controlled using read control signals generated by logic XOR-based control circuitry. The control circuitry may include dual-port memory elements that store addressing signals associated with the write data. The read control signals may control the switching circuitry to selectively route the most-recently written data to corresponding read ports during a data read operation.
Fibre Channel Scale-Out With Physical Path Discovery and Volume Move
Methods, storage arrays and computer readable media for path discovery to ports of a Fibre Channel storage system that includes a multi-array pool and is part of a group of arrays are provided. One example method includes executing a pull operation via a group leader array of the group of arrays. The pull operation is configured to gather port status of each one of the arrays in the group of arrays. The method further executes a push operation via the group leader array of the group of arrays. The push operation is configured to populate a local cache of each array in the group of arrays with the port status of each one of the arrays in the group of arrays. The method executes the pull operation and the push operation on a periodic schedule, such that changes that occur at particular ones of the arrays of the group of arrays are pushed to each one of the arrays in the group of arrays. Configurations for enabling volume moves, striping data across arrays and pools, pool creation, pool deletes, pool adds, group merge and other processes are also provided.
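The pull and push operations can be illustrated with plain dictionaries. This sketch assumes each array is a dict with a `ports` map and a `cache` map, and that the group leader calls `sync` on a timer; these structures and names are hypothetical and stand in for whatever the storage arrays actually use.

```python
def pull(arrays):
    """Group leader gathers the port status of every array in the group."""
    return {name: dict(array["ports"]) for name, array in arrays.items()}


def push(arrays, status):
    """Group leader populates every array's local cache with that status."""
    for array in arrays.values():
        array["cache"] = {name: dict(ports) for name, ports in status.items()}


def sync(arrays):
    """One pull followed by one push; meant to run on a periodic schedule."""
    push(arrays, pull(arrays))


if __name__ == "__main__":
    group = {
        "array1": {"ports": {"fc0": "up", "fc1": "up"}, "cache": {}},
        "array2": {"ports": {"fc0": "up"}, "cache": {}},
    }
    sync(group)
    group["array2"]["ports"]["fc0"] = "down"  # change at one array
    sync(group)                               # next cycle propagates it
    print(group["array1"]["cache"])
```

Between two sync cycles a cache can be stale, so the period bounds how long an array may hold an outdated view of its peers' ports.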
Graphic Processor Unit with Improved Energy Efficiency
A GPU architecture employs a crossbar switch to preferentially store operand vectors in a compressed form allowing reduction in the number of memory circuits that must be activated during an operand fetch and to allow existing execution units to be used for scalar execution. Scalar execution can be performed during branch divergence.
Processor having read shifter and controlling method using the same
A processor that includes a register file, a read shifter, a decode unit and a plurality of functional units is introduced. The register file includes a read port. The read shifter includes a plurality of shifter entries and is configured to shift out a shifter entry among the plurality of shifter entries every clock cycle. Each of the plurality of shifter entries is associated with a clock cycle and each of the plurality of shifter entries comprises a read value that indicates an availability of the read port of the register file for a read operation in the clock cycle. The decode unit is coupled to the read shifter and is configured to decode and issue an instruction based on the read values included in the plurality of shifter entries of the read shifter. The plurality of functional units is coupled to the decode unit and the register file and is configured to execute the instruction issued by the decode unit and perform the read operation to the read port of the register file.
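A minimal model of the read shifter, assuming a single read port, one boolean per future clock cycle, and a decode step that knows how many cycles ahead an instruction will read its operands. The class name, method names, and `read_delay` parameter are hypothetical.

```python
from collections import deque


class ReadShifter:
    """One entry per future cycle; True means the read port is reserved."""

    def __init__(self, depth):
        self.entries = deque([False] * depth)

    def tick(self):
        """Shift out the entry for the current cycle, every clock cycle."""
        self.entries.popleft()
        self.entries.append(False)

    def try_issue(self, read_delay):
        """Decode step: issue only if the port is free read_delay cycles ahead."""
        if self.entries[read_delay]:
            return False  # port already reserved, so the instruction stalls
        self.entries[read_delay] = True
        return True


if __name__ == "__main__":
    shifter = ReadShifter(depth=4)
    print(shifter.try_issue(2))  # reserves the port two cycles ahead
    print(shifter.try_issue(2))  # same cycle is taken, so this one stalls
    shifter.tick()
    print(shifter.try_issue(2))  # a new cycle is now two ahead and free
```

Checking the entry at decode time means a port conflict is detected before issue rather than when the functional unit attempts the read.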