G06F7/44

METHOD OF DETERMINING THE CENTER OF LOADING OF A ROLLING ELEMENT

A method of determining the center of loading of a rolling element includes providing a rolling element body and at least three load sensors. The sensors are each positioned within a bore of the rolling element body at a separate distance from a reference position. Load measurements are taken with each of the sensors at various positions about the circumference of the bearing, and the center of loading is calculated at each of the positions to determine the variation in axial loading about the bearing circumference.
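
The abstract does not spell out the calculation itself; below is a minimal sketch under the assumption that the center of loading is taken as the load-weighted average of the sensors' axial positions, with one set of readings per circumferential position. All distances, loads, and names are hypothetical.

```python
def center_of_loading(distances, loads):
    """Load-weighted average of the sensor positions along the bore.

    distances: axial distance of each load sensor from the reference position
    loads:     load measured by each sensor at one circumferential position
    """
    total = sum(loads)
    if total == 0:
        raise ValueError("no load measured at this position")
    return sum(d * f for d, f in zip(distances, loads)) / total

# One reading set per circumferential position; the spread of the results
# indicates the variation in axial loading about the bearing circumference.
distances = [10.0, 25.0, 40.0]          # mm from the reference position
readings = [
    [1.2, 3.4, 1.1],                    # loads at 0 degrees
    [0.9, 3.0, 2.0],                    # loads at 120 degrees
    [1.5, 2.8, 1.3],                    # loads at 240 degrees
]
print([center_of_loading(distances, loads) for loads in readings])
```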

Neural network processor using dyadic weight matrix and operation method thereof
11562046 · 2023-01-24

A neural network (NN) processor includes an input feature map buffer configured to store an input feature matrix, a weight buffer configured to store a weight matrix trained in the form of a dyadic matrix, a transform circuit configured to perform a Walsh-Hadamard transform on an input feature vector obtained from the input feature matrix and a weight vector included in the weight matrix to output a transformed input feature vector and a transformed weight vector, and an arithmetic circuit configured to perform an element-wise multiplication (EWM) on the transformed input feature vector and the transformed weight vector.
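
A minimal software sketch of the transform-domain dataflow the abstract describes (Walsh-Hadamard transform of both vectors, then element-wise multiplication); the buffers and circuits of the processor are not modeled, and all vector values are hypothetical.

```python
def walsh_hadamard(vec):
    """Fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    v = list(vec)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b   # butterfly stage
        h *= 2
    return v

x = [1, 2, 3, 4, 5, 6, 7, 8]            # input feature vector (hypothetical)
w = [1, 0, -1, 0, 1, 0, -1, 0]          # weight vector from the dyadic weight matrix
tx, tw = walsh_hadamard(x), walsh_hadamard(w)
ewm = [a * b for a, b in zip(tx, tw)]   # element-wise multiplication (EWM) stage
print(ewm)
```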

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double-precision multiplier or a 48-bit output, wherein the MAD instruction is to generate an output in a single clock cycle of the processor.
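
The hardware details (double-precision multiplier, 48-bit output, single-cycle issue) cannot be shown in software, but the chained multiply-add structure can. Below is a minimal sketch assuming hypothetical 24-bit limbs, so each MAD intermediate (product plus partial sum plus carry) fits in a 48-bit value.

```python
def big_multiply(a_limbs, b_limbs, limb_bits=24):
    """Schoolbook large-integer multiply built from a chain of MAD steps.

    Limbs are stored least-significant first. Each inner step computes
    limb*limb + partial + carry, which fits in 48 bits for 24-bit limbs.
    """
    base = 1 << limb_bits
    out = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            mad = a * b + out[i + j] + carry      # one MAD step in the chain
            out[i + j] = mad % base
            carry = mad // base
        out[i + len(b_limbs)] += carry
    return out

# 0x123456789ABC split into two 24-bit limbs (least significant first), squared.
x = [0x789ABC, 0x123456]
print(big_multiply(x, x))
```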

GENERALIZED ACCELERATION OF MATRIX MULTIPLY ACCUMULATE OPERATIONS

A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
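
A minimal numerical sketch of the three dot-product steps the abstract lists (partial products, alignment by exponent, accumulation); the fixed-point width and rounding below are assumptions for illustration, not the processor's datapath.

```python
import math

def aligned_dot(xs, ys, frac_bits=24):
    """Dot product via exponent-aligned fixed-point accumulation of partial products."""
    # 1. Generate partial products, each kept as (mantissa, exponent).
    partials = [math.frexp(x * y) for x, y in zip(xs, ys)]   # x*y == m * 2**e
    # 2. Align every partial product to the largest exponent.
    e_max = max(e for _, e in partials)
    acc = 0
    for m, e in partials:
        shifted = round(m * (1 << frac_bits)) >> (e_max - e)
        acc += shifted                       # 3. Accumulate the aligned terms.
    return math.ldexp(acc, e_max - frac_bits)

print(aligned_dot([1.5, -2.25, 0.125], [4.0, 1.0, 8.0]))     # 4.75
```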

Logic simulation of circuit designs using on-the-fly bit reduction for constraint solving
11615225 · 2023-03-28

A system performs logic simulation of a circuit design specified using a hardware description language such as Verilog. The system performs constraint solving based on an expression specified in the specification of the circuit design. The system identifies required bits for each variable in the expression. The number of required bits is less than the number of bits specified in the variable declaration. The system performs bit-level constraint solving by performing a bit operation on the set of required bits and a simplified processing of the remaining bits of the variable. Since the original circuit design is preserved with the original bit-widths for simulation, those required bits are used on the fly internally during constraint solving. Furthermore, dynamic bit reductions on arithmetic operations are performed on the fly. The system improves computational efficiency by restricting bit operations to fewer bits of variables and operators of the expression.
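
A toy illustration of the bit-reduction idea, not the simulator's solver: a hypothetical constraint whose expression depends only on the low 4 bits of each variable is solved over those 4 bits rather than over the declared 32-bit widths.

```python
from itertools import product

def solve_reduced(constraint, reduced_bits):
    """Brute-force the constraint over only the required (reduced) bits.

    reduced_bits maps each variable to the number of bits the expression
    actually depends on, which may be far fewer than the declared width.
    """
    names = list(reduced_bits)
    for combo in product(*(range(1 << reduced_bits[n]) for n in names)):
        env = dict(zip(names, combo))
        if constraint(env):
            return env          # upper, unconstrained bits stay free (zero here)
    return None

# Hypothetical constraint: (a & 0xF) + b == 10, with a and b declared 32 bits
# wide in the HDL but only 4 bits actually required by the expression.
constraint = lambda env: (env["a"] & 0xF) + env["b"] == 10
print(solve_reduced(constraint, {"a": 4, "b": 4}))   # searches 256 cases, not 2**64
```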

Mixer with improved linearity

Mixers with improved linearity are disclosed. A diode or FET ring mixer is implemented with at least one parallel shunt element coupled to the ring mixer, the shunt element providing a shunt path across a diode or FET, for example, to reduce the effect of nonlinear off resistance and/or capacitance. Linearity, isolation, symmetry, even-order harmonics of the ring mixer, or any combination thereof can be improved as a result. The linearity of the ring mixer with parallel shunt resistors can be further improved by adding series resistors in the ring, according to certain embodiments.
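
The claim is circuit-level, but the effect of a linear parallel shunt can be illustrated with a back-of-the-envelope calculation: a nonlinear off resistance that swings over a wide range contributes far less variation once it sits in parallel with a fixed shunt resistor. The resistance values below are hypothetical.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

r_shunt = 2_000.0                               # ohms, linear shunt element
for r_off in (50_000.0, 100_000.0, 200_000.0):  # nonlinear off-resistance swing (4x)
    print(r_off, "->", round(parallel(r_off, r_shunt), 1))
# The parallel combination varies by only a few percent across the same swing.
```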

NEURAL NETWORK ACCELERATOR WITH CONFIGURABLE POOLING PROCESSING UNIT
20230259743 · 2023-08-17

A neural network accelerator includes a plurality of hardware processing units, each hardware processing unit comprising hardware to accelerate performing one or more neural network operations on data; and a crossbar coupled to each hardware processing unit of the plurality of hardware processing units and configured to selectively form, from a plurality of selectable pipelines, a pipeline from one or more of the hardware processing units of the plurality of hardware processing units to process input data to the neural network accelerator. The plurality of hardware processing units comprises (i) a convolution processing unit configured to accelerate performing convolution operations on data, and (ii) a configurable pooling processing unit configured to selectively perform an operation of a plurality of selectable operations on data, the plurality of selectable operations comprising a depth-wise convolution operation and one or more pooling operations.
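
A minimal functional sketch of the configurable pooling processing unit's selectable operations on a single 1-D channel; the crossbar, pipeline formation, and convolution processing unit are not modeled, and the kernel and window sizes are hypothetical.

```python
def depthwise_conv1d(channel, kernel):
    return [sum(x * k for x, k in zip(channel[i:i + len(kernel)], kernel))
            for i in range(len(channel) - len(kernel) + 1)]

def max_pool1d(channel, window):
    return [max(channel[i:i + window]) for i in range(0, len(channel) - window + 1, window)]

def avg_pool1d(channel, window):
    return [sum(channel[i:i + window]) / window for i in range(0, len(channel) - window + 1, window)]

def configurable_pooling_unit(op, channel, **cfg):
    """Selects one of the unit's operations: depth-wise convolution or a pooling op."""
    ops = {"dwconv": depthwise_conv1d, "maxpool": max_pool1d, "avgpool": avg_pool1d}
    return ops[op](channel, **cfg)

data = [1, 3, 2, 5, 4, 6]
print(configurable_pooling_unit("dwconv", data, kernel=[1, 0, -1]))
print(configurable_pooling_unit("maxpool", data, window=2))
```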

CORE GROUP MEMORY PROCESSING WITH MAC REUSE

A multi-accumulator multiply-and-accumulate (MAC) unit can include a multiplier and a plurality of accumulators. The multiplier can be configured to multiply a given element of a corresponding column of a first matrix and a plurality of elements of a corresponding row of a second matrix to generate a plurality of corresponding partial product elements that can be accumulated by corresponding ones of the plurality of accumulators.
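
A minimal sketch of the dataflow the abstract describes: the multiplier holds a single element of the first matrix while it is reused against a row of the second matrix, and each resulting partial product lands in its own accumulator (here, one accumulator per element of the result matrix). The cycle-level hardware behavior is not modeled.

```python
def matmul_with_mac_reuse(A, B):
    """Matrix multiply arranged as the multi-accumulator MAC unit is described."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]           # one accumulator per result element
    for i in range(rows):
        for k in range(inner):
            a = A[i][k]                             # element held constant at the multiplier
            for j in range(cols):
                C[i][j] += a * B[k][j]              # partial product -> accumulator (i, j)
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_with_mac_reuse(A, B))                  # [[19, 22], [43, 50]]
```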