G06E1/045

TRANSPOSE OPERATIONS USING PROCESSING ELEMENT ARRAY
20210096823 · 2021-04-01

Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into the systolic array, and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
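A minimal sketch of the idea, assuming a weight-stationary array that computes `out[m][n] = sum_k stationary[k][m] * moving[k][n]`, with row `k` of each operand being one buffer-memory partition. Under that convention, loading a data block as the stationary operand and streaming the identity as the moving operand yields the block's transpose. The helper names, the 4 x 4 default array size, and the summation convention are illustrative assumptions, not the patented circuit.

```python
from typing import List

Matrix = List[List[float]]


def systolic_matmul(stationary: Matrix, moving: Matrix) -> Matrix:
    """Assumed weight-stationary array model:
    out[m][n] = sum over k of stationary[k][m] * moving[k][n]."""
    k_dim, m_dim, n_dim = len(stationary), len(stationary[0]), len(moving[0])
    out = [[0.0] * n_dim for _ in range(m_dim)]
    for k in range(k_dim):
        for m in range(m_dim):
            w = stationary[k][m]  # value held in processing element (k, m)
            for n in range(n_dim):
                out[m][n] += w * moving[k][n]  # partial sums accumulate per column m
    return out


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose_via_array(tensor: Matrix, rows: int = 4, cols: int = 4) -> Matrix:
    """Transpose `tensor` block by block on a hypothetical rows x cols array."""
    n_rows, n_cols = len(tensor), len(tensor[0])
    result = [[0.0] * n_rows for _ in range(n_cols)]
    for r0 in range(0, n_rows, rows):
        for c0 in range(0, n_cols, cols):
            # Block sized to fit the array (edge blocks may be smaller).
            block = [row[c0:c0 + cols] for row in tensor[r0:r0 + rows]]
            k = len(block)
            # Identity multiplication: out = block^T @ I = block^T.
            out = systolic_matmul(block, identity(k))
            # Each column-partition result becomes a row of the transposed tensor.
            for i, out_row in enumerate(out):
                result[c0 + i][r0:r0 + k] = out_row
    return result
```

For example, `transpose_via_array([[1, 2, 3], [4, 5, 6]], rows=2, cols=2)` returns `[[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]`; the right-hand 2 x 1 edge block is handled by the same code path as the full blocks.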

Dual phase matrix-vector multiplication system

A processor can scan a portion of a vector to identify first nonzero entries. The processor can scan another portion of the vector to identify second nonzero entries. The processor can scale a portion of a matrix using the first nonzero entries to generate first intermediate elements. The processor can scale another portion of the matrix using the second nonzero entries to generate second intermediate elements. The processor can store the first intermediate elements in a first buffer and store the second intermediate elements in a second buffer. The processor can copy a subset of the first intermediate elements from the first buffer to a memory and copy a subset of the second intermediate elements from the second buffer to the memory. The subsets of first and second intermediate elements can be aggregated to generate an output vector.
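The two phases can be sketched as a sparse matrix-vector product `y = A x`. This assumes the vector is split into two halves, that each intermediate element is a matrix column scaled by a nonzero vector entry, and that plain Python lists stand in for the two buffers and the memory. The abstract copies only a *subset* of each buffer without saying how it is chosen, so this sketch simply copies everything before aggregating.

```python
from typing import List, Tuple


def scan_nonzeros(x: List[float], start: int, stop: int) -> List[Tuple[int, float]]:
    """Scan x[start:stop] and return (index, value) for every nonzero entry."""
    return [(j, x[j]) for j in range(start, stop) if x[j] != 0]


def scale_columns(a: List[List[float]], nonzeros: List[Tuple[int, float]]) -> List[List[float]]:
    """Intermediate elements: column j of A scaled by the nonzero entry x[j]."""
    return [[row[j] * v for row in a] for j, v in nonzeros]


def dual_phase_matvec(a: List[List[float]], x: List[float]) -> List[float]:
    half = len(x) // 2
    # Phase 1: scan each portion of x and scale the matching columns of A.
    first_buffer = scale_columns(a, scan_nonzeros(x, 0, half))
    second_buffer = scale_columns(a, scan_nonzeros(x, half, len(x)))
    # Phase 2: copy the buffered intermediates to memory and aggregate them.
    memory = list(first_buffer) + list(second_buffer)
    y = [0.0] * len(a)
    for column in memory:
        for i, value in enumerate(column):
            y[i] += value
    return y
```

Because zero entries are skipped during the scan, the work scales with the number of nonzeros rather than with the full vector length.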

APPARATUS AND METHODS FOR IMPLEMENTING ARBITRARY UNITARY TRANSFORMATIONS ON OPTICAL MODES VIA A RECTANGULAR ARCHITECTURE
20210096443 · 2021-04-01

An apparatus includes a first optical circuit and a second optical circuit. The first optical circuit has a network of interconnected interferometers to perform an M-mode universal transformation on N input optical modes that are divided into (M−1) groups of pulses. The first optical circuit also includes M input ports, each of the first (M−1) of which is configured to receive a corresponding group of pulses in the (M−1) groups of pulses. The first optical circuit also includes M output ports and a first delay line to couple the Mth output port with the Mth input port. The second optical circuit includes a network of beamsplitters and swap gates to perform a (2M−3)-mode residual transformation. Together, the first optical circuit and the second optical circuit are configured to perform an arbitrary N-mode unitary transformation on the N input optical modes via a rectangular architecture.
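As background for the "rectangular architecture", the sketch below builds a conventional rectangular (Clements-style) mesh of Mach-Zehnder interferometers on N spatial modes and checks that the result is unitary. It does not model the time-multiplexed scheme described above (the delay line, the (M−1) pulse groups, or the (2M−3)-mode residual circuit); the MZI phase convention and the random phase settings are assumptions made for illustration.

```python
import cmath
import math
import random
from typing import List

Matrix = List[List[complex]]


def mzi(theta: float, phi: float) -> Matrix:
    """2x2 transfer matrix of one Mach-Zehnder interferometer (assumed convention)."""
    e = cmath.exp(1j * phi)
    return [[e * math.cos(theta), -math.sin(theta)],
            [e * math.sin(theta), math.cos(theta)]]


def apply_mzi(u: Matrix, i: int, t: Matrix) -> None:
    """Left-multiply modes i and i+1 of u by the 2x2 matrix t, in place."""
    for c in range(len(u)):
        a, b = u[i][c], u[i + 1][c]
        u[i][c] = t[0][0] * a + t[0][1] * b
        u[i + 1][c] = t[1][0] * a + t[1][1] * b


def rectangular_mesh(n: int, rng: random.Random) -> Matrix:
    """n columns of MZIs on alternating mode pairs, n(n-1)/2 MZIs in total."""
    u = [[complex(i == j) for j in range(n)] for i in range(n)]
    for layer in range(n):
        for i in range(layer % 2, n - 1, 2):
            apply_mzi(u, i, mzi(rng.uniform(0, math.pi / 2), rng.uniform(0, 2 * math.pi)))
    return u


def is_unitary(u: Matrix, tol: float = 1e-9) -> bool:
    n = len(u)
    for i in range(n):
        for j in range(n):
            s = sum(u[k][i].conjugate() * u[k][j] for k in range(n))  # (U^dagger U)[i][j]
            if abs(s - (1 if i == j else 0)) > tol:
                return False
    return True
```

Each MZI is unitary, so any product of them is unitary; a rectangular layout with n(n−1)/2 MZIs is enough to reach an arbitrary n-mode unitary when the phases are chosen by a suitable decomposition rather than at random.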

Processor core design optimized for machine learning applications

A computing system includes a plurality of functional units, each functional unit having one or more inputs and an output. A shared memory block is coupled to the inputs and outputs of the functional units, and a private memory block is assigned to each functional unit. An inter-functional-unit data bypass (IFUDB) block is coupled to the functional units and is configured to route signals between them without using the shared memory block.
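A toy model of why the bypass helps: it counts shared-memory accesses for a multiply-add done two ways, once with the product round-tripping through shared memory and once with the product routed directly from the multiplier to the adder, as an IFUDB-style path would do. The class and method names are hypothetical, the per-unit private memory blocks are omitted, and real hardware would route signals rather than call Python functions.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FunctionalUnit:
    name: str
    op: Callable[[float, float], float]


class Core:
    """Toy core: units share one memory block; a bypass path can hand one
    unit's output directly to another unit's input."""

    def __init__(self, units: List[FunctionalUnit]):
        self.units: Dict[str, FunctionalUnit] = {u.name: u for u in units}
        self.shared: Dict[str, float] = {}
        self.shared_accesses = 0  # reads plus writes of the shared memory block

    def read(self, key: str) -> float:
        self.shared_accesses += 1
        return self.shared[key]

    def write(self, key: str, value: float) -> None:
        self.shared_accesses += 1
        self.shared[key] = value

    def mul_add_via_shared(self) -> float:
        # The product is stored to and reloaded from shared memory.
        self.write("t", self.units["mul"].op(self.read("a"), self.read("b")))
        result = self.units["add"].op(self.read("t"), self.read("c"))
        self.write("r", result)
        return result

    def mul_add_via_bypass(self) -> float:
        # The product is routed straight to the adder's input.
        t = self.units["mul"].op(self.read("a"), self.read("b"))
        result = self.units["add"].op(t, self.read("c"))
        self.write("r", result)
        return result
```

With `a=2.0, b=3.0, c=4.0` both paths produce `10.0`, but the bypass path needs 4 shared-memory accesses instead of 6.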

HETEROGENEOUSLY INTEGRATED OPTICAL NEURAL NETWORK ACCELERATOR

Embodiments of the present disclosure are directed toward techniques and configurations for an optical accelerator including a photonics integrated circuit (PIC) for an optical neural network (ONN). In embodiments, an optical accelerator package includes the PIC and an electronics integrated circuit (EIC) that is heterogeneously integrated into the package to provide, in close proximity, pre- and post-processing of the optical signal inputs provided to, and the optical signal outputs received from, an optical matrix multiplier of the PIC. In some embodiments, the EIC is a single EIC or multiple discrete EICs that provide this pre- and post-processing, including optical-to-electrical and electrical-to-optical transduction. Other embodiments may be described and/or claimed.
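The division of labour between the EIC and the PIC can be sketched as a three-stage signal chain: EIC pre-processing (a DAC driving the modulators), the PIC's optical matrix multiplier, and EIC post-processing (photodetection followed by an ADC). This sketch assumes an incoherent, intensity-domain multiplier with nonnegative weights, detection linear in optical power, and 8-bit converters with a full scale of 1.0; none of these details comes from the abstract.

```python
from typing import List


def quantize(x: float, bits: int, full_scale: float) -> float:
    """Uniform quantizer on [0, full_scale], standing in for a DAC or ADC."""
    levels = 2 ** bits - 1
    clipped = min(max(x, 0.0), full_scale)
    return round(clipped / full_scale * levels) / levels * full_scale


def eic_pre(inputs: List[float], bits: int = 8, full_scale: float = 1.0) -> List[float]:
    """Electrical-to-optical: quantized drive levels for the optical modulators."""
    return [quantize(v, bits, full_scale) for v in inputs]


def pic_matvec(weights: List[List[float]], intensities: List[float]) -> List[float]:
    """Optical matrix multiplier, modelled as an intensity-domain matrix-vector
    product with nonnegative weights."""
    return [sum(w * p for w, p in zip(row, intensities)) for row in weights]


def eic_post(powers: List[float], bits: int = 8, full_scale: float = 1.0) -> List[float]:
    """Optical-to-electrical: photocurrent proportional to power, then an ADC."""
    return [quantize(p, bits, full_scale) for p in powers]


def accelerator_forward(weights: List[List[float]], inputs: List[float], bits: int = 8) -> List[float]:
    return eic_post(pic_matvec(weights, eic_pre(inputs, bits)), bits)
```

Keeping the converters physically close to the PIC, as the abstract describes, shortens the electrical paths on both sides of the optical multiplier; the sketch only captures the ordering of those stages and their quantization error.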

TECHNIQUES FOR CONFORMANCE TESTING COMPUTATIONAL OPERATIONS
20210026759 · 2021-01-28

Examples described herein generally relate to conformance testing of a computational operation. A reference result, including one or more reference intermediate products and a reference accumulator output at a first level of precision, can be generated for the computational operation based on one or more inputs. A hardware result can similarly be generated using hardware at a second level of precision. The reference result can be compared with the hardware result to determine a variance value, and a conformance result can be output based on whether the variance value is within a threshold range.
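A minimal sketch of such a test for a dot product, assuming float64 as the first (reference) precision and IEEE 754 half precision as the second. The "hardware result" here is only an emulation that rounds inputs, products, and the running sum to fp16 using the standard `struct` `"e"` format. The variance value is taken to be the largest absolute deviation over the intermediate products and the accumulator output; that metric and the threshold values are assumptions.

```python
import struct
from typing import List, Tuple

Result = Tuple[List[float], float]  # (intermediate products, accumulator output)


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def reference_dot(a: List[float], b: List[float]) -> Result:
    """Reference result at the first (higher) precision: float64 throughout."""
    products = [x * y for x, y in zip(a, b)]
    acc = 0.0
    for p in products:
        acc += p
    return products, acc


def emulated_hardware_dot(a: List[float], b: List[float]) -> Result:
    """Stand-in for hardware at the second (lower) precision: fp16 everywhere."""
    products = [to_fp16(to_fp16(x) * to_fp16(y)) for x, y in zip(a, b)]
    acc = 0.0
    for p in products:
        acc = to_fp16(acc + p)
    return products, acc


def variance_value(ref: Result, hw: Result) -> float:
    """Largest absolute deviation over intermediate products and accumulator."""
    (ref_p, ref_acc), (hw_p, hw_acc) = ref, hw
    deviations = [abs(r - h) for r, h in zip(ref_p, hw_p)] + [abs(ref_acc - hw_acc)]
    return max(deviations)


def conformance_test(a: List[float], b: List[float], threshold: float) -> Tuple[bool, float]:
    v = variance_value(reference_dot(a, b), emulated_hardware_dot(a, b))
    return v <= threshold, v
```

Comparing intermediate products as well as the final accumulator output helps localize a failure to either the multipliers or the accumulation path.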

Matrix multiplication using optical processing

Systems and methods for performing matrix operations using a photonic processor are provided. The photonic processor includes encoders configured to encode a numerical value into an optical signal and optical multiplication devices configured to output an electrical signal proportional to a product of one or more encoded values. Each optical multiplication device includes a first input waveguide, a second input waveguide, a coupler circuit coupled to the first and second input waveguides, a first detector and a second detector coupled to the coupler circuit, and a circuit coupled to the first and second detectors and configured to output a current proportional to the product of a first input value and a second input value.
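One common way a coupler and two detectors can yield a product is balanced detection. If two phase-aligned fields with real amplitudes x and y enter an ideal 50:50 coupler, the detected powers are (x+y)²/2 and (x−y)²/2, and their difference is 2xy. The sketch below assumes this scheme, an ideal lossless coupler, and unit detector responsivity; the abstract does not specify the coupler's split ratio or the subtraction circuit.

```python
import math
from typing import List, Tuple


def coupler(e1: complex, e2: complex) -> Tuple[complex, complex]:
    """Ideal lossless 50:50 coupler acting on two input fields."""
    s = 1 / math.sqrt(2)
    return s * (e1 + e2), s * (e1 - e2)


def optical_multiply(x: float, y: float, responsivity: float = 1.0) -> float:
    """Encode x and y as phase-aligned field amplitudes, interfere them,
    detect both coupler outputs and subtract the photocurrents."""
    out1, out2 = coupler(complex(x), complex(y))
    i1 = responsivity * abs(out1) ** 2  # first detector photocurrent
    i2 = responsivity * abs(out2) ** 2  # second detector photocurrent
    # (x+y)^2/2 - (x-y)^2/2 = 2xy, so halve the difference to recover x*y
    return (i1 - i2) / 2


def optical_matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Matrix-vector product built from the optical multiplier."""
    return [sum(optical_multiply(m, v) for m, v in zip(row, vector)) for row in matrix]
```

Signed operands are handled naturally, because a negative amplitude is a π phase shift that flips which detector receives more power.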

Photonic Blockchain Based on Optical Proof-of-Work

An apparatus for combined digital and optical processing of a cryptocurrency data block includes: a digital processor that computes a hash vector from the cryptocurrency data block; a laser and splitter that produce optical input signals; optical modulators that modulate the optical input signals using binary phase-shift keying (BPSK) based on the hash vector; a photonic matrix multiplier circuit that optically performs a discrete matrix-vector product operation on the modulated optical input signals to produce optical output signals, where the discrete matrix-vector product operation is defined by matrix elements limited to K discrete values, with 2 ≤ K ≤ 17; and photodetectors and comparators that perform optoelectronic conversion of the optical output signals to produce corresponding digital electronic output signals. The digital processor performs a second hash computation on the XOR of the digital electronic output signals and the hash vector to produce a proof-of-work result.
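The data flow can be emulated digitally as follows. The sketch assumes SHA3-256 for both hash computations, a 256 x 256 matrix with K integer levels generated from a seed, BPSK mapping bit 0 to +1 and bit 1 to −1, and comparators thresholding at zero. `discrete_matrix` and `optical_proof_of_work` are hypothetical names, and in the described apparatus the matrix-vector product would be computed optically rather than in Python.

```python
import hashlib
import random
from typing import List


def bits_of(data: bytes) -> List[int]:
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bytes_of(bits: List[int]) -> bytes:
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))


def discrete_matrix(n: int, k: int, seed: int) -> List[List[int]]:
    """Hypothetical n x n matrix whose elements take one of k integer values."""
    if not 2 <= k <= 17:
        raise ValueError("K must satisfy 2 <= K <= 17")
    values = list(range(-(k // 2), k - k // 2))  # k distinct integers centred near 0
    rng = random.Random(seed)
    return [[rng.choice(values) for _ in range(n)] for _ in range(n)]


def optical_proof_of_work(block: bytes, matrix: List[List[int]]) -> bytes:
    hash_bits = bits_of(hashlib.sha3_256(block).digest())  # digital hash vector (256 bits)
    symbols = [1 - 2 * b for b in hash_bits]  # BPSK: bit 0 -> phase 0 (+1), bit 1 -> phase pi (-1)
    products = [sum(m * s for m, s in zip(row, symbols)) for row in matrix]  # photonic matvec
    detected = [1 if p > 0 else 0 for p in products]  # photodetector plus comparator
    xored = [d ^ h for d, h in zip(detected, hash_bits)]  # XOR with the hash vector
    return hashlib.sha3_256(bytes_of(xored)).digest()  # second hash: proof-of-work result
```

Restricting the matrix to a few discrete levels keeps the result exactly reproducible, so a digital verifier can recompute the same bits that the photonic hardware produced.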

Transpose operations using processing element array

Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into the systolic array, and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.

TRANSPOSE OPERATIONS USING PROCESSING ELEMENT ARRAY
20200409664 · 2020-12-31

Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into the systolic array, and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.