G06F7/5318

HYBRID ELECTRO-PHOTONIC NETWORK-ON-CHIP
20240111091 · 2024-04-04

Various embodiments provide for a circuit package including an electronic integrated circuit comprising a plurality of processing elements, and a plurality of bidirectional photonic channels, e.g., implemented in a photonic integrated circuit underneath the electronic integrated circuit, that connect the processing elements into an electro-photonic network. The processing elements include message routers with photonic-channel interfaces. Each bidirectional photonic channel interfaces at one end with a photonic-channel interface of the message router of a first one of the processing elements and at the other end with a photonic-channel interface of the message router of a second one of the processing elements and is configured to optically transfer messages (e.g., packets) between the message routers of the first and second processing elements.

ACCUMULATOR, MULTIPLIER, AND OPERATOR CIRCUIT

This application provides an accumulator, a multiplier, and an operator circuit, and relates to the field of electronic technologies, to reduce an area and power consumption of the accumulator. The accumulator includes W compressor layers, where W is an integer greater than or equal to 1. The W compressor layers include at least one first compressor layer. In an input array of each first compressor layer, a first array includes a plurality of positive-phase bits, and a second array includes a plurality of negative-phase bits. Each first compressor layer includes a first compression circuit configured to compress the first array and a second compression circuit configured to compress the second array. To be specific, bits with different phases in the input array of each first compressor layer are compressed by different compression circuits.
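To make the idea of a compressor layer concrete, here is a minimal Python sketch of carry-save reduction with 3:2 compressors. It is an illustration under stated assumptions, not the circuit of the application. It works on non-negative integer rows and does not model the positive-phase and negative-phase split described above. The names `compress_3_2`, `compressor_layer`, and `accumulate` are hypothetical.

```python
def compress_3_2(a, b, c):
    """Bitwise 3:2 compressor: three input rows become a sum row and a carry row."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def compressor_layer(rows):
    """Apply one layer of 3:2 compressors; leftover rows pass through unchanged."""
    full = len(rows) - len(rows) % 3
    out = []
    for i in range(0, full, 3):
        out.extend(compress_3_2(*rows[i:i + 3]))
    out.extend(rows[full:])
    return out


def accumulate(rows):
    """Reduce rows layer by layer until two remain, then do one final addition.

    Returns the total and the number of compressor layers used.
    Assumes non-negative integers (hypothetical simplification).
    """
    rows = list(rows)
    layers = 0
    while len(rows) > 2:
        rows = compressor_layer(rows)
        layers += 1
    return sum(rows), layers


total, layers = accumulate([3, 5, 7, 9])
```

Each layer preserves the sum of its rows while reducing their count, which is why only one carry-propagating addition is needed at the end.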

APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND DEPENDENT OPERATIONS
20190227982 · 2019-07-25

An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.
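The time-lag idea can be sketched in Python with a wide addition split across slices. The abstract does not say what value one slice hands to the next; this sketch assumes it is the carry of an addition, and assumes a lag of one cycle per slice. The function name `sliced_add` and its parameters are hypothetical, and only a single computation unit is modeled.

```python
def sliced_add(a, b, slice_bits=8, n_slices=4):
    """Add two unsigned integers slice by slice, lowest slice first.

    Slice s consumes the carry produced by slice s - 1, so it is assumed
    to start one cycle after that slice (start cycle s).
    Returns (result, carry_out, start_cycles).
    """
    mask = (1 << slice_bits) - 1
    result = 0
    carry = 0
    start_cycles = []
    for s in range(n_slices):
        shift = s * slice_bits
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (total & mask) << shift
        carry = total >> slice_bits  # output handed to the following slice
        start_cycles.append(s)       # assumed time lag of one cycle per slice
    return result, carry, start_cycles


result, carry_out, start_cycles = sliced_add(0x00FF, 0x0001, slice_bits=8, n_slices=2)
```

In this example the low slice overflows and the high slice, running one cycle later, absorbs the carry.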

PROGRAMMABLE MULTIPLY-ADD ARRAY HARDWARE
20190196788 · 2019-06-27

An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.
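The two data flows can be contrasted in a short Python sketch. It rests on one reading of the abstract: in multiply-accumulate mode each of the N lanes keeps its own running sum, while in multiply-reduce mode the N products are summed into a single value through a tree of adders. The function names and the tree arrangement are assumptions, not details from the application.

```python
def multiply_accumulate(a, b, acc):
    """Per-lane mode: lane i computes acc[i] + a[i] * b[i]."""
    return [acc_i + x * y for acc_i, x, y in zip(acc, a, b)]


def multiply_reduce(a, b):
    """Reduce mode: N products are summed pairwise into a single value."""
    values = [x * y for x, y in zip(a, b)]
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)  # pad so every adder has two inputs
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0] if values else 0


lanes = multiply_accumulate([1, 2, 3], [4, 5, 6], [10, 10, 10])
reduced = multiply_reduce([1, 2, 3], [4, 5, 6])
```

The same multipliers feed both modes; only the routing of their outputs to the adders changes.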

ELECTRO-PHOTONIC NETWORK FOR MACHINE LEARNING
20240201436 · 2024-06-20

Various embodiments provide for electro-photonic networks, including a plurality of processing elements connected by bidirectional photonic channels, suited for implementing neural-network models. Weights of the model may be preloaded into memory of the processing elements based on assignments of neural nodes to processing elements implementing them, and routers of the processing elements can be configured to stream activations between the processing elements based on a predetermined flow of activations in the model.

CLOCK SIGNAL DISTRIBUTION USING PHOTONIC FABRIC
20240201437 · 2024-06-20

Various embodiments provide for clock signal distribution within a processor, such as a machine learning (ML) processor, using a photonic fabric.

Multi-input configurable logic cell with configurable output region

Configurable circuits include an input selection region, a computation region, a switching region, and an output region. The input selection region includes a set of input multiplexers and selects and routes input signals. The computation region includes a set of lookup tables, each lookup table being coupled to selected signals from the input selection stage to generate a respective output signal. The switching region includes a set of output multiplexers, each output multiplexer being coupled to output signals from the set of lookup tables to provide circuit outputs responsive to respective output selection signals. The output region includes a domino logic stage, having a set of transistors, coupled to output signals from the set of lookup tables to provide circuit outputs that determine combinations of the signals output by the set of lookup tables.
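The first three regions can be modeled behaviorally in a few lines of Python. This is a sketch only: the domino logic stage of the output region is not modeled, and the names `lut` and `configurable_cell`, as well as the convention that the first selected input is the most significant index bit, are assumptions.

```python
def lut(truth_table, inputs):
    """Look up an output bit; the first input is the most significant index bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return truth_table[index]


def configurable_cell(signals, input_select, truth_tables, output_select):
    """Behavioral model of input selection, lookup tables, and output switching.

    input_select[k] lists which entries of `signals` feed lookup table k;
    output_select[j] names the lookup table driving circuit output j.
    """
    lut_outputs = [
        lut(table, [signals[i] for i in selection])
        for table, selection in zip(truth_tables, input_select)
    ]
    return [lut_outputs[j] for j in output_select]


AND2 = [0, 0, 0, 1]
XOR2 = [0, 1, 1, 0]
outputs = configurable_cell(
    signals=[1, 0, 1, 1],
    input_select=[(0, 2), (2, 3)],
    truth_tables=[AND2, XOR2],
    output_select=[1, 0],
)
```

Here the AND table sees signals 0 and 2, the XOR table sees signals 2 and 3, and the output multiplexers swap their order.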

Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
10289605 · 2019-05-14

A matrix of execution blocks forms a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions specifying parallel and dependent instructions.

MACHINE LEARNING OPTIMIZATION CIRCUIT AND METHOD THEREOF
20240281209 · 2024-08-22

A machine learning optimization circuit and a method thereof are provided. The method includes the steps of: generating a local feature matrix from an extraction range in a feature tensor matrix, where the local feature matrix includes feature values of X columns, Y rows, and Z channels; partitioning the local feature matrix into W sub-feature matrices, where each of the W sub-feature matrices includes X×Y×Z/W feature values; simultaneously performing parallel dot product operations on the W sub-feature matrices by W×K parallel operation modules to generate W×K temporary feature matrices; and integrating the W×K temporary feature matrices into a local feature output matrix corresponding to the local feature matrix, where the local feature output matrix includes feature values of X columns, Y rows, and Z channels.
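The partition-and-combine steps can be sketched in plain Python. Several things here are assumptions rather than details from the abstract: the local feature matrix is flattened to a list of X×Y×Z values, each of K kernels is partitioned the same way, and "integrating" is modeled as summing the W partial dot products per kernel. The names `partition`, `dot`, and `local_outputs` are hypothetical, and the W×K operations run sequentially here rather than in parallel.

```python
def partition(values, w):
    """Split a flat list into w equal chunks of len(values) / w entries."""
    if len(values) % w:
        raise ValueError("length must be divisible by w")
    size = len(values) // w
    return [values[i * size:(i + 1) * size] for i in range(w)]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def local_outputs(local_features, kernels, w):
    """Compute w * len(kernels) partial dot products, then sum them per kernel."""
    sub_features = partition(local_features, w)
    partials = [
        [dot(sub, k_sub) for sub, k_sub in zip(sub_features, partition(kernel, w))]
        for kernel in kernels
    ]
    return [sum(row) for row in partials]


features = [1, 2, 3, 4, 5, 6, 7, 8]           # X = Y = Z = 2, flattened
kernels = [[1] * 8, [0, 1, 0, 1, 0, 1, 0, 1]]  # K = 2 hypothetical kernels
outputs = local_outputs(features, kernels, w=4)
```

Because each partial product touches only X×Y×Z/W values, the W×K operations are independent of one another, which is what allows them to be issued in parallel.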

ELECTRO-PHOTONIC NETWORK FOR MACHINE LEARNING
20240280747 · 2024-08-22

Electro-photonic networks, including a plurality of processing elements connected by bidirectional photonic channels, suited for implementing neural-network models. Weights of the model may be preloaded into memory of the processing elements based on assignments of neural nodes to processing elements implementing them, and routers of the processing elements can be configured to stream activations between the processing elements based on a predetermined flow of activations in the model.