IPIQ

G06F9/38

Neural network training mechanism

11580361 · 2023-02-14

Intel Corporation

An apparatus to facilitate neural network (NN) training is disclosed. The apparatus includes training logic to receive one or more network constraints and train the NN by automatically determining a best network layout and parameters based on the network constraints.

Systems and methods for performing horizontal tile operations

11579883 · 2023-02-14

Intel Corporation

Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
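As a rough software illustration of the behavior described in this abstract (not the patented hardware, and not any real instruction set), the sketch below splits a source matrix into K equal-sized groups and reduces each group to one result. The function name `horizontal_tile_op`, the row-major grouping, and the choice of reduction operations are all assumptions made for the example.

```python
from functools import reduce
import operator

def horizontal_tile_op(source, k, op):
    """Split an M-by-N matrix into k equal-sized groups (row-major order,
    an assumption of this sketch) and reduce each group with the binary
    operation `op`, returning one result per group."""
    flat = [x for row in source for x in row]
    if k <= 0 or not flat or len(flat) % k != 0:
        raise ValueError("elements must divide evenly into k non-empty groups")
    size = len(flat) // k
    # Result i stands in for the value written to the i-th destination.
    return [reduce(op, flat[i * size:(i + 1) * size]) for i in range(k)]

tile = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
row_sums = horizontal_tile_op(tile, 2, operator.add)  # one group per row
row_max = horizontal_tile_op(tile, 2, max)
```

With K equal to the number of rows, each group is one row, so the operation runs "horizontally" across that row; other values of K that divide the element count evenly give smaller or larger groups.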

Register sharing mechanism to equally allocate disabled thread registers to active threads

11579878 · 2023-02-14

Intel Corporation

An apparatus is disclosed. The apparatus includes one or more processors comprising register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled, a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of the plurality of threads, and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.

Scheduler for AMP architecture with closed loop performance and thermal controller

11579934 · 2023-02-14

Apple Inc.

Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing system. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
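The mapping step in this abstract, from a control effort to a recommended core type and DVFS state with a thermal limit applied first, might look roughly like the sketch below. The table `EFFORT_MAP`, its thresholds and frequencies, the core type names, and the function `recommend` are all hypothetical values chosen for illustration; the abstract does not specify any of them.

```python
# Hypothetical table: (minimum control effort, core type, frequency in MHz).
# Rows are sorted by ascending threshold.
EFFORT_MAP = [
    (0.00, "efficiency", 600),
    (0.25, "efficiency", 1200),
    (0.50, "performance", 1800),
    (0.75, "performance", 2400),
]

def recommend(effort, thermal_cap=1.0):
    """Clamp the control effort to the thermal cap, then return the
    (core type, frequency) of the highest table row the effort reaches."""
    limited = max(0.0, min(effort, thermal_cap))
    core, freq = EFFORT_MAP[0][1], EFFORT_MAP[0][2]
    for threshold, core_type, mhz in EFFORT_MAP:
        if limited >= threshold:
            core, freq = core_type, mhz
    return core, freq

unconstrained = recommend(0.8)                 # effort used as computed
throttled = recommend(0.8, thermal_cap=0.4)    # thermal limit lowers the effort
```

Clamping before the lookup mirrors the ordering in the abstract, where the thermal and power management loop limits the effort that the performance controller requested.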

Distributed processing architecture

11580388 · 2023-02-14

Microsoft Technology Licensing, LLC

Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.

Handling load-exclusive instructions in apparatus having support for transactional memory

11579873 · 2023-02-14

Arm Limited

An apparatus is described with support for transactional memory and load/store-exclusive instructions using an exclusive monitor indication to track exclusive access to a given address. In response to a predetermined type of load instruction specifying a load target address, which is executed within a given transaction, any exclusive monitor indication previously set for the load target address is cleared. In response to a load-exclusive instruction, an abort is triggered for a transaction for which the given address is specified as one of its working set of addresses. This helps to maintain mutual exclusion between transactional and non-transactional threads even if there is load speculation in the non-transactional thread.

Instruction address translation and caching for primary and alternate branch prediction paths

11579884 · 2023-02-14

Advanced Micro Devices, Inc.

Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) caches address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB caches address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.

Big data application lifecycle management

11580010 · 2023-02-14

PayPal, Inc.

Aspects of the present disclosure involve systems, methods, devices, and the like for creating an application lifecycle management platform for big data applications. In one embodiment the lifecycle management platform can include a multiple-layer container file that integrates multiple big-data tools/platforms. The system may create a generic template application, create a build environment for the generic template application, create a test environment for the generic template application, and run the built generic template application in the test environment prior to the user writing any new code in the generic template application. In one embodiment, the test environment includes a container management system or virtual machine that launches the big data application (which may be the generic template application before a developer edits the file) on a separate big-data server cluster.

Processing pipeline with first and second processing modes having different performance or energy consumption characteristics

11579879 · 2023-02-14

Arm Limited

An apparatus (2) has a processing pipeline (4) supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure (22, 30, 36, 50, 40, 64, 44) is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry (70) triggers a subset (102) of the entries of the storage structure to be placed in a power saving state.

Univariate density estimation method

11579947 · 2023-02-14

Microsoft Technology Licensing, LLC

A method for use with a computing device. The method may include receiving a data set including a plurality of univariate data points and determining a target kernel bandwidth for a kernel density estimator (KDE). Determining the target kernel bandwidth may include computing a plurality of sample KDEs and selecting the target kernel bandwidth based on the sample KDEs. The method may further include computing the KDE for the data set using the target kernel bandwidth. For one or more tail regions of the data set, the method may further include computing one or more respective tail extensions. The method may further include computing and outputting a renormalized piecewise density estimator that, in each tail region, equals a renormalization of the respective tail extension for that tail region, and, outside the one or more tail regions, equals a renormalization of the KDE.
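The final step in this abstract, a piecewise estimator that uses the KDE in the body and tail extensions outside it, all renormalized to integrate to one, can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel, the fixed bandwidth (the abstract's bandwidth selection from sample KDEs is not reproduced), the exponential form of the tails, and the names `gaussian_kde`, `piecewise_density`, `lo`, `hi`, and `rate` are all choices made for this example rather than details from the patent.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i)/h) with a Gaussian kernel phi."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

def piecewise_density(data, bandwidth, lo, hi, rate, steps=2000):
    """KDE on [lo, hi], exponential tails outside (an assumed tail form),
    rescaled so the whole function integrates to 1."""
    kde = gaussian_kde(data, bandwidth)

    def raw(x):
        if x < lo:   # left tail, continuous with the KDE at lo
            return kde(lo) * math.exp(-rate * (lo - x))
        if x > hi:   # right tail, continuous with the KDE at hi
            return kde(hi) * math.exp(-rate * (x - hi))
        return kde(x)

    # Body mass by the trapezoid rule; each exponential tail has mass kde(edge)/rate.
    width = (hi - lo) / steps
    body = sum(0.5 * (kde(lo + i * width) + kde(lo + (i + 1) * width)) * width
               for i in range(steps))
    total = body + (kde(lo) + kde(hi)) / rate
    return lambda x: raw(x) / total

samples = [1.0, 1.5, 2.0, 2.5, 4.0]
estimate = piecewise_density(samples, bandwidth=0.5, lo=0.5, hi=4.5, rate=2.0)
```

Dividing both the body and the tails by the same total keeps the estimator continuous at `lo` and `hi` while making it a proper density.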
