Patent classifications
G06F9/3004
Method and apparatus to sort a vector for a bitonic sorting algorithm
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of a vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
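The behavior described in this abstract can be modeled in software as a rough sketch. The function name `vector_sort`, the split of the lanes into two equal halves, and the boolean direction flags are assumptions made for illustration; the abstract only says that two portions of the lanes are sorted in orders indicated by the instruction.

```python
from typing import List, Sequence


def vector_sort(lanes: Sequence[int],
                first_descending: bool,
                second_descending: bool) -> List[int]:
    """Software model of a hypothetical vector sort instruction.

    Assumption: the "first portion" and "second portion" are the lower
    and upper halves of the lanes. Each half is sorted independently in
    the direction selected by its flag.
    """
    half = len(lanes) // 2
    first = sorted(lanes[:half], reverse=first_descending)
    second = sorted(lanes[half:], reverse=second_descending)
    return first + second


# Ascending first half followed by descending second half yields a
# bitonic sequence, the input shape a bitonic merge step expects.
result = vector_sort([7, 2, 9, 4, 1, 8, 3, 6],
                     first_descending=False,
                     second_descending=True)
print(result)
```

Sorting the two halves in opposite directions is what makes the output usable as the starting point of a bitonic merge; sorting both halves the same way would not produce a bitonic sequence in general.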
Central plant control system with control region detection based on control envelope ray-casting
Disclosed herein are a system, a method, and a non-transitory computer readable medium for controlling a central plant. In one aspect, the system obtains a control envelope for a device controlled by the controller. The control envelope comprises boundaries of a plurality of control regions in a multidimensional operating space for the device. Each control region is enclosed by a corresponding boundary. The system counts a number of times that a ray of the device crosses the boundary of a control region. The system determines whether an operating point of the device is within the control region based on the counted number. The system operates the device according to a predetermined control technique corresponding to the control region, in response to determining that the operating point of the device is within the control region.
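The crossing count described in this abstract resembles the classic even-odd ray-casting test. The sketch below assumes a two-dimensional operating space, a polygonal boundary, a ray cast horizontally to the right from the operating point, and the rule that an odd crossing count means "inside". None of these specifics are stated in the abstract, and the names `crossings` and `in_region` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def crossings(point: Point, boundary: List[Point]) -> int:
    """Count the boundary edges crossed by a rightward horizontal ray."""
    x, y = point
    count = 0
    n = len(boundary)
    for i in range(n):
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + 1) % n]
        # The edge crosses the ray's horizontal line only if its
        # endpoints lie on opposite sides of it. The strict/non-strict
        # comparison avoids counting a shared vertex twice.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets the horizontal line.
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_at_y > x:
                count += 1
    return count


def in_region(point: Point, boundary: List[Point]) -> bool:
    """Assumed even-odd rule: an odd crossing count means inside."""
    return crossings(point, boundary) % 2 == 1


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(in_region((2.0, 2.0), square))
print(in_region((5.0, 2.0), square))
```

A controller following the abstract would run a test like this against each control region's boundary and apply the control technique associated with the region that contains the operating point.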
Apparatuses and methods for approximating nonlinear function
The present disclosure relates to a method and an apparatus for approximating a non-linear function. In some embodiments, an exemplary processing unit includes: one or more registers for storing a lookup table (LUT) and one or more operation elements communicatively coupled with the one or more registers. The LUT includes a control state and a plurality of data entries. The one or more operation elements are configured to: receive an input operand; select one or more bits from the input operand; select a data entry from the plurality of data entries using the one or more bits; and determine an approximation value of a non-linear activation function for the input operand using the data entry.
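A minimal sketch of the bit-selected lookup, under several assumptions not found in the abstract: the input operand is an 8-bit unsigned code covering the range [-8, 8), the upper 4 bits select the data entry, each entry stores a base value and a delta, the lower 4 bits drive linear interpolation, and the function being approximated is the sigmoid. The LUT's control state is not modeled.

```python
import math

INDEX_BITS = 4            # assumed: high bits select the LUT entry
FRAC_BITS = 4             # assumed: low bits give position in segment
X_MIN, X_MAX = -8.0, 8.0  # assumed input range of the 8-bit code


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def code_to_x(code: int) -> float:
    """Map an integer code to the real input value it represents."""
    return X_MIN + code * (X_MAX - X_MIN) / (1 << (INDEX_BITS + FRAC_BITS))


# Each data entry holds (value at segment start, rise across the segment).
LUT = []
for i in range(1 << INDEX_BITS):
    lo = sigmoid(code_to_x(i << FRAC_BITS))
    hi = sigmoid(code_to_x((i + 1) << FRAC_BITS))
    LUT.append((lo, hi - lo))


def approx_sigmoid(code: int) -> float:
    """Approximate sigmoid using bits of the operand to pick a LUT entry."""
    index = (code >> FRAC_BITS) & ((1 << INDEX_BITS) - 1)
    frac = code & ((1 << FRAC_BITS) - 1)
    base, delta = LUT[index]
    return base + delta * frac / (1 << FRAC_BITS)


worst = max(abs(approx_sigmoid(c) - sigmoid(code_to_x(c))) for c in range(256))
print(f"max abs error over all codes: {worst:.4f}")
```

With 16 entries the table fits in a handful of registers, and the worst-case error can be traded against table size by changing how many operand bits are used as the index.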
Non-transitory recording medium having computer-readable program recorded thereon, server apparatus, function graph display control apparatus, and function graph display control method
A non-transitory recording medium records a program that causes a computer to execute a process of causing a display to display, in response to one or more input operations accepted via an input device, one first mathematical expression display area including one first mathematical expression, one first graph display area associated with the one first mathematical expression display area, one first slider display area associated with the one first graph display area, one second mathematical expression display area including one second mathematical expression, one second graph display area which is associated with the one second mathematical expression display area and is an area different from the one first graph display area, and one second slider display area which is associated with the one second graph display area and is an area different from the one first slider display area.
Integrated circuit, semiconductor device and control method for semiconductor device
An integrated circuit that allows a band of an external memory to be used effectively in processing a layer algorithm is disclosed. One aspect of the present disclosure relates to an integrated circuit including: a first arithmetic part including a first arithmetic unit and a first memory, wherein the first arithmetic unit performs an operation and the first memory stores data for use in the first arithmetic unit; and a first data transfer control unit that controls transfer of data between the first memory and a second memory of a second arithmetic part including a second arithmetic unit, wherein the second arithmetic part communicates with an external memory via the first arithmetic part.
Performing speculative address translation in processor-based devices
Performing speculative address translation in processor-based devices is disclosed herein. In one exemplary embodiment, a processor-based device provides a processing element (PE) that defines a speculative translation instruction such as an enqueue instruction for offloading operations to a peripheral device. The speculative translation instruction references a plurality of bytes including one or more virtual memory addresses. After receiving the speculative translation instruction, an instruction decode stage of an execution pipeline circuit of the PE transmits a request for address translation of the virtual memory address to a memory management unit (MMU) of the PE. The MMU then performs speculative address translation of the virtual memory address into a corresponding translated memory address. In some embodiments, any address translation errors encountered are raised to an appropriate exception level, and may be raised synchronously or asynchronously with respect to an operation performed when the speculative translation instruction is executed.
Indexing external memory in a reconfigurable compute fabric
Various examples are directed to systems and methods in which a flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The flow controller may determine a first iteration index for a first iteration of the first loop. The flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
Performing multiple point table lookups in a single cycle in a system on chip
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Memory controller with arithmetic logic unit and/or floating point unit
Techniques for performing an operation at a memory controller are described. An example includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields for an opcode to indicate an arithmetic or Boolean operation to be performed by a memory controller, and one or more fields to identify at least one source location; and execution circuitry of the memory controller to execute the decoded instruction according to the opcode.
Memory device and operating method thereof
A memory device includes a memory array for storing a plurality of vector data, each of which has an MSB vector and an LSB vector. The memory array includes a plurality of memory units, each of which has a first bit and a second bit. The first bit is used to store the MSB vector of each vector data, and the second bit is used to store the LSB vector of each vector data. A multiplying operation is performed on each vector data, and a first group-counting operation and a second group-counting operation are performed on the MSB vector and the LSB vector of each vector data, respectively. The threshold voltage distribution of each memory unit is divided into N states, where N is a positive integer less than 2 to the power of 2, and the effective bit number stored by each memory unit is less than 2.
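One possible reading of the MSB/LSB split, offered only as an illustration: each stored value is decomposed into an MSB bit plane and an LSB bit plane, a binary input vector is multiplied against each plane, and the matching ones are counted per plane and then weighted. The choice of N = 3 states (values 0 to 2), the binary inputs, and all function names are assumptions; the abstract does not specify how the multiplying and group-counting operations are combined.

```python
import math
from typing import List, Tuple


def split_planes(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split 2-bit values into an MSB bit vector and an LSB bit vector."""
    msb = [(v >> 1) & 1 for v in values]
    lsb = [v & 1 for v in values]
    return msb, lsb


def count_matches(plane: List[int], inputs: List[int]) -> int:
    """Bitwise multiply (AND) a plane with the inputs and count the ones."""
    return sum(p & x for p, x in zip(plane, inputs))


def dot_by_counting(values: List[int], inputs: List[int]) -> int:
    """Dot product built from per-plane counts (MSB count weighted by 2)."""
    msb, lsb = split_planes(values)
    return 2 * count_matches(msb, inputs) + count_matches(lsb, inputs)


# Assumed N = 3 states per memory unit, so stored values are 0, 1, or 2.
weights = [2, 1, 0, 2, 1]
inputs = [1, 0, 1, 1, 1]
print(dot_by_counting(weights, inputs))

# With N states, a unit carries log2(N) bits of information.
effective_bits = math.log2(3)
print(f"effective bits for N = 3: {effective_bits:.3f}")
```

Under this reading, using three states instead of four leaves one of the four 2-bit codes unused, which is why the effective bit count per memory unit (about 1.585) stays below 2.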