G06F7/575

Free-form integration of machine learning model primitives

A processor may include a set of primitive operators, receive a set of data-driven operators, at least one of the set of data-driven operators including a machine learning model, and receive an input-output data pair set. Based on a grammar specifying rules for linking the set of primitive operators and the set of data-driven operators, the processor may search among the set of primitive operators and the set of data-driven operators to find a symbolic model that fits the input-output data set.

Free-form integration of machine learning model primitives

A processor may include a set of primitive operators, receive a set of data-driven operators, at least one of the set of data-driven operators including a machine learning model, and receive an input-output data pair set. Based on a grammar specifying rules for linking the set of primitive operators and the set of data-driven operators, the processor may search among the set of primitive operators and the set of data-driven operators to find a symbolic model that fits the input-output data set.

MEMORY DEVICE AND COMPUTING METHOD THEREOF
20230118468 · 2023-04-20 ·

A memory device and a computing method thereof are provided in the present disclosure. The computing method includes the following steps. A plurality of input-values of a model computation are respectively received through a plurality of first-word-lines of a memory array. Inverted logic values of the input-values are respectively received through a plurality of second-word-lines. The input-values are respectively received through a plurality of first-bit-lines. The inverted logic values are respectively received through a plurality of second-bit-lines. Logic XNOR operation is performed according to each of the input-values and each of the inverted values to obtain a first computation result, and multiplied with one of self-coefficients or one of mutual coefficients of the model computation to obtain a plurality of output-values. The output-values are outputted through a plurality of common-source-lines.

MEMORY DEVICE AND COMPUTING METHOD THEREOF
20230118468 · 2023-04-20 ·

A memory device and a computing method thereof are provided in the present disclosure. The computing method includes the following steps. A plurality of input-values of a model computation are respectively received through a plurality of first-word-lines of a memory array. Inverted logic values of the input-values are respectively received through a plurality of second-word-lines. The input-values are respectively received through a plurality of first-bit-lines. The inverted logic values are respectively received through a plurality of second-bit-lines. Logic XNOR operation is performed according to each of the input-values and each of the inverted values to obtain a first computation result, and multiplied with one of self-coefficients or one of mutual coefficients of the model computation to obtain a plurality of output-values. The output-values are outputted through a plurality of common-source-lines.

Arithmetic logic unit design in column analog to digital converter with shared gray code generator for correlated multiple samplings

An arithmetic logic unit (ALU) includes a front end latch stage coupled to latch Gray code (GC) outputs of a GC generator in response to a comparator output. A signal latch stage is coupled to latch outputs of the front end latch stage. A GC to binary stage is coupled to generate a binary representation of the GC outputs latched in the signal latch stage. First inputs of an adder stage are coupled to receive outputs of the GC to binary stage. Outputs of the adder stage are generated in response to the first inputs and second inputs of the adder stage. A pre-latch stage is coupled to latch outputs of the adder stage. A feedback latch stage is coupled to latch outputs of the pre-latch stage. The second inputs of the adder stage are coupled to receive outputs of the feedback latch stage.

Arithmetic logic unit design in column analog to digital converter with shared gray code generator for correlated multiple samplings

An arithmetic logic unit (ALU) includes a front end latch stage coupled to latch Gray code (GC) outputs of a GC generator in response to a comparator output. A signal latch stage is coupled to latch outputs of the front end latch stage. A GC to binary stage is coupled to generate a binary representation of the GC outputs latched in the signal latch stage. First inputs of an adder stage are coupled to receive outputs of the GC to binary stage. Outputs of the adder stage are generated in response to the first inputs and second inputs of the adder stage. A pre-latch stage is coupled to latch outputs of the adder stage. A feedback latch stage is coupled to latch outputs of the pre-latch stage. The second inputs of the adder stage are coupled to receive outputs of the feedback latch stage.

Processor with smart cache in place of register file for providing operands

A processor including a pointer storage that stores pointer descriptors each including addressing information, an arithmetic logic unit (ALU) configured to execute an instruction which includes operand indexes each identifying a corresponding pointer descriptor, multiple address generation units (AGUs), each configured to translate addressing information from a corresponding pointer descriptors into memory addresses for accessing corresponding operands stored in a memory, and a smart cache. The smart cache includes a cache storage, and uses the memory addresses from the AGUs to retrieve and store operands from the memory into the cache storage, and to provide the stored operands to the ALU when executing the instruction. The smart cache replaces a register file used by a conventional processor for retrieving and storing operand information. The pointer operands include post-update capability that reduces instruction fetches. Wasted memory cycles associated with cache speculation are avoided.

Processor with smart cache in place of register file for providing operands

A processor including a pointer storage that stores pointer descriptors each including addressing information, an arithmetic logic unit (ALU) configured to execute an instruction which includes operand indexes each identifying a corresponding pointer descriptor, multiple address generation units (AGUs), each configured to translate addressing information from a corresponding pointer descriptors into memory addresses for accessing corresponding operands stored in a memory, and a smart cache. The smart cache includes a cache storage, and uses the memory addresses from the AGUs to retrieve and store operands from the memory into the cache storage, and to provide the stored operands to the ALU when executing the instruction. The smart cache replaces a register file used by a conventional processor for retrieving and storing operand information. The pointer operands include post-update capability that reduces instruction fetches. Wasted memory cycles associated with cache speculation are avoided.

VECTOR COMPUTATIONAL UNIT
20230115874 · 2023-04-13 ·

A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.

Systems and methods for improving cache efficiency and utilization

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.