Patent classifications
G06F7/57
Tensor Accelerator Capable of Increasing Efficiency of Data Sharing
A tensor accelerator includes two tile execution units and a bidirectional queue. Each of the tile execution units includes a buffer, a plurality of arithmetic logic units, a network, and a selector. The buffer includes a plurality of memory cells. The network is coupled to the plurality of memory cells. The selector is coupled to the network and the plurality of arithmetic logic units. The bidirectional queue is coupled between the selectors of the tile execution units.
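The data-sharing path in this abstract can be pictured with a small software model. The sketch below is only an illustration under assumptions that are not in the abstract: the class and method names (`BidirectionalQueue`, `TileExecutionUnit`, `share`, `add_shared`) are hypothetical, the network is reduced to plain list indexing, and the arithmetic logic units are reduced to a single addition.

```python
from collections import deque


class BidirectionalQueue:
    """Hypothetical model: one FIFO per direction between two tiles."""

    def __init__(self, name_a, name_b):
        self._inbox = {name_a: deque(), name_b: deque()}

    def send(self, dst, value):
        self._inbox[dst].append(value)

    def receive(self, dst):
        return self._inbox[dst].popleft()


class TileExecutionUnit:
    """Hypothetical tile: a buffer of memory cells plus a selector."""

    def __init__(self, name, cells, queue):
        self.name = name
        self.buffer = list(cells)  # the memory cells
        self.queue = queue

    def share(self, peer_name, index):
        # Selector routes a local cell value into the queue toward the peer.
        self.queue.send(peer_name, self.buffer[index])

    def add_shared(self, index):
        # Selector feeds the ALU one local operand and one from the queue.
        return self.buffer[index] + self.queue.receive(self.name)


queue = BidirectionalQueue("t0", "t1")
t0 = TileExecutionUnit("t0", [1, 2, 3], queue)
t1 = TileExecutionUnit("t1", [10, 20, 30], queue)

t0.share("t1", 2)           # t0 sends cell value 3 to t1
result_t1 = t1.add_shared(0)  # 10 + 3
t1.share("t0", 1)           # t1 sends cell value 20 to t0
result_t0 = t0.add_shared(0)  # 1 + 20
```

In this model each tile reads a peer's operand through the queue rather than through its own buffer, which is the sharing path the abstract describes at the selector level.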
PRINTING APPARATUS AND USB EQUIPMENT
There is provided a printing apparatus including: a first receptacle connector that couples a first external device and an arithmetic section; a second receptacle connector that couples a second external device and the arithmetic section; and a USB controller that causes the arithmetic section to function as the device or the host based on a first electric signal input from the first external device and a second electric signal input from the second external device, in which, when the first external device is coupled, the USB controller causes the arithmetic section to function as the device when the second external device that functions as the device is coupled, causes the arithmetic section to function as the device when the second external device that functions as the host is coupled, and causes the arithmetic section to function as the device when the second external device is not coupled.
Elements for in-memory compute
A memory array is arranged in multiple columns and rows. Computation circuits each calculate a computation value from cell values in a corresponding column. A column multiplexer cycles through multiple data lines, each of which corresponds to a computation circuit. Cluster cycle management circuitry determines a number of multiplexer cycles based on the number of columns storing data of a compute cluster. A sensing circuit obtains the computation values from the computation circuits via the column multiplexer as the column multiplexer cycles through the data lines, and combines the obtained computation values over the determined number of multiplexer cycles. A first clock may initiate the multiplexer to cycle through its data lines for the determined number of multiplexer cycles, and a second clock may initiate each individual cycle. The multiplexer or additional circuitry may be used to modify the order in which data is written to the columns.
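The cycling and combining described above can be sketched in a few lines. This is a rough model under stated assumptions, not the patented circuit: the function names are hypothetical, each computation circuit is assumed to sum the cells of its column, the sensing circuit is assumed to combine by addition, and a cluster is assumed to occupy adjacent columns.

```python
def column_compute(array, col):
    """Assumed computation circuit: sum of the cell values in one column."""
    return sum(row[col] for row in array)


def sense_cluster(array, start_col, cluster_cols):
    """Cycle the column multiplexer once per cluster column and combine."""
    total = 0
    # The number of multiplexer cycles equals the number of columns
    # that store the cluster's data.
    for cycle in range(cluster_cols):
        total += column_compute(array, start_col + cycle)
    return total


# 3 rows x 4 columns of single-bit cells; column sums are 2, 2, 3, 1.
cells = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]

wide_cluster = sense_cluster(cells, start_col=0, cluster_cols=3)    # 2 + 2 + 3
narrow_cluster = sense_cluster(cells, start_col=3, cluster_cols=1)  # 1
```

The point of the sketch is that the cycle count follows the cluster width, so a narrow cluster finishes in fewer multiplexer cycles than a wide one.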
Method and apparatus for efficient multiplication to improve performance in computational machines
A method and apparatus are disclosed for determining a stochastic binary string (SBS) representing a value based on the value represented in binary two's complement. Several different generators are disclosed for generating SBSs, each producing strings with particular features that are advantageous under various conditions in which the string is to be multiplied with another SBS. Several such generators can be provided and selected depending upon the particular values to be converted to SBS representation and the functions to be performed on those values.
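To illustrate why the choice of generator matters when two strings are multiplied, the sketch below uses a standard deterministic bitstream pairing from stochastic computing. It is not taken from the patent: the generator names are hypothetical, only non-negative magnitudes are handled (sign handling for two's complement inputs is left out), and a value k out of n is encoded as the fraction of 1 bits in the string.

```python
def unary_stream(k, n):
    """Hypothetical generator: n bits, the first k of which are 1."""
    return [1 if i < k else 0 for i in range(n)]


def repeat_stream(bits, times):
    """Repeat the whole pattern: 1110 -> 1110 1110 ..."""
    return bits * times


def hold_stream(bits, times):
    """Hold each bit: 1110 -> 1111 1111 1111 0000 (for times=4)."""
    return [b for b in bits for _ in range(times)]


def sbs_multiply(a, b):
    """Multiply two strings with a bitwise AND, one gate per bit pair."""
    return [x & y for x, y in zip(a, b)]


def sbs_value(bits):
    """Decode a string as the fraction of bits that are 1."""
    return sum(bits) / len(bits)


n = 8
a = repeat_stream(unary_stream(3, n), n)  # encodes 3/8 over 64 bits
b = hold_stream(unary_stream(5, n), n)    # encodes 5/8 over 64 bits
product = sbs_multiply(a, b)              # encodes 15/64
```

Pairing a repeated pattern with a held pattern makes every bit of one operand meet every bit of the other exactly once, so the AND yields the exact product. Two strings from the same generator would be fully correlated and would yield the minimum of the two values instead, which is one reason a design might select different generators for different operands.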
AI synaptic coprocessor
A synaptic coprocessor may include a memory configured to store a plurality of Very Long Data Words (VLDWs), each serving as a test VLDW that has a length in the range of about one thousand bits to one million or more bits and contains encoded information distributed across its length. A processor generates search terms, and a processing logic unit receives a test VLDW from the memory, receives a search term from the processor, and computes a Boolean inner product between the search term and the test VLDW that indicates the measure of similarity between them. Optionally, buffers within logic circuits of processing pipelines may receive the test VLDWs.
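One way to read the similarity computation above is as a bitwise AND followed by a count of the surviving 1 bits. The sketch below assumes that interpretation, which the abstract does not spell out; the function names are hypothetical, and Python integers stand in for the long words (16 bits here for readability, though the same code works for words of a million bits).

```python
def boolean_inner_product(vldw, term):
    """Assumed similarity score: number of bit positions set in both words."""
    return bin(vldw & term).count("1")


def best_match(memory, term):
    """Return (index, score) of the stored word most similar to the term."""
    scores = [boolean_inner_product(word, term) for word in memory]
    best = max(range(len(memory)), key=scores.__getitem__)
    return best, scores[best]


memory = [
    0b1111000011110000,
    0b1010101010101010,
    0b0000111100001111,
]
term = 0b1111000000000011

index, score = best_match(memory, term)  # scores are 4, 3, 2
```

Because the score depends only on an AND and a population count, each stored word can be scored independently, which fits the pipelined processing logic the abstract mentions.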
Extended memory architecture
Systems, apparatuses, and methods related to extended memory communication subsystems for performing extended memory operations are described. An example apparatus can include a plurality of computing devices. Each of the computing devices can include a processing unit configured to perform an operation on a block of data, and a memory array configured as a cache for each respective processing unit. The example apparatus can further include a first communication subsystem coupled to a host and to each of the plurality of communication subsystems. The example apparatus can further include a plurality of second communication subsystems coupled to each of the plurality of computing devices. Each of the plurality of computing devices can be configured to receive a request from the host, send a command to execute at least a portion of the operation, and receive a result of performing the operation from the at least one hardware accelerator.