Patent classifications
G06F9/30
Reference voltage training scheme
Various aspects of the subject technology relate to systems, methods, and machine-readable media for DDR reference voltage training. The method includes receiving a data stream, the data stream including pulses generated from a reference voltage in relation to a voltage input logic low and a voltage input logic high of an input stream. The method also includes receiving a clock signal, the clock signal including an in-phase signal and a quadrature-phase signal, the in-phase signal orthogonal to the quadrature-phase signal. The method also includes utilizing the in-phase signal and the quadrature-phase signal of the clock signal in relation to the data stream to obtain a stream of in-phase samples and a stream of quadrature-phase samples. The method also includes adjusting the reference voltage based on a relationship of the stream of in-phase samples to the stream of quadrature-phase samples.
Signed multiplication using unsigned multiplier with dynamic fine-grained operand isolation
An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2.sup.N/2, the second or the third multiplier are used to multiply the operands. If one operand is less than 2.sup.N/2 and the other operand is equal to or greater than 2.sup.N/2, the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2.sup.N/2, the first, second and third multipliers are used to multiply the operands.
Multiplication and accumulation (MAC) operator
A MAC operator includes a plurality of multipliers, a plurality of floating-point to fixed-point converters, an adder tree, an accumulator, and a fixed-point to floating-point converter. Each of the plurality of multipliers may perform a multiplication operation on first data and second data of a single-precision floating-point (FP32) format to output multiplication result data of the FP 32 format. Each of the plurality of floating-point to fixed-point converters may convert the FP 32 format into a fixed-point format. The adder tree may perform a first addition operation on the data of the fixed-point format. The accumulator may perform an accumulation operation on the data output from the adder tree. And the fixed-point to floating-point converter may convert the data of the fixed-point format into data of the FP32 format.
Implicit integrity for cryptographic computing
In one embodiment, a processor includes a memory hierarchy and a core coupled to the memory hierarchy. The memory hierarchy stores encrypted data, and the core includes circuitry to access the encrypted data stored in the memory hierarchy, decrypt the encrypted data to yield decrypted data, perform an entropy test on the decrypted data, and update a processor state based on a result of the entropy test. The entropy test may include determining a number of data entities in the decrypted data whose values are equal to one another, determining a number of adjacent data entities in the decrypted data whose values are equal to one another, determining a number of data entities in the decrypted data whose values are equal to at least one special value from a set of special values, or determining a sum of n highest data entity value frequencies.
Variable-length instruction buffer management
A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
Variable-length instruction buffer management
A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
Broadside random access memory for low cycle memory access and additional functions
A computational system includes one or more processors. Each processor has multiple registers, as well attached memory to hold instructions. The processor is coupled to one or more broadside interfaces. A broadside interface allows the processor to load or store an entire widget state in a single clock cycle of the processor. The broadside interface also allows the processor to move and store 32 bytes of information into RAM in less than four to five clock cycles of the processor while the processor concurrently performs one or more mathematical operations on the information while the move and store operation is taking place.
Dynamic graphical processing unit register allocation
Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
Processor with conditional-fence commands excluding designated memory regions
An apparatus includes a processor, configured to designate a memory region in a memory, and to issue (i) memory-access commands for accessing the memory and (ii) a conditional-fence command associated with the designated memory region. Memory-Access Control Circuitry (MACC) is configured, in response to identifying the conditional-fence command, to allow execution of the memory-access commands that access addresses within the designated memory region, and to defer the execution of the memory-access commands that access addresses outside the designated memory region, until completion of all the memory-access commands that were issued before the conditional-fence command.
Handling an input/output store instruction
An input/output store instruction is handled. A data processing system includes a system nest coupled to at least one input/output bus by an input/output bus controller. The data processing system further includes at least a data processing unit including a core, system firmware and an asynchronous core-nest interface. The data processing unit is coupled to the system nest via an aggregation buffer. The system nest is configured to asynchronously load from and/or store data to at least one external device which is coupled to the at least one input/output bus. The data processing unit is configured to complete the input/output store instruction before an execution of the input/output store instruction in the system nest is completed. The asynchronous core-nest interface includes an input/output status array with multiple input/output status buffers.