G06F12/0862

INSTRUCTION PRE-FETCHING
20180011735 · 2018-01-11 ·

Pre-fetching instructions for tasks of an operating system (OS) is provided by calling a task scheduler that determines a load start time for a set of instructions for a particular task corresponding to a task switch condition. The OS calls, and in response to the load start time, a loader entity module that generates a pre-fetch request that loads the set of instructions for the particular task from a non-volatile memory circuit into a random access memory circuit. The OS calls the task scheduler to switch to the particular task.

HARDWARE CONTROLLED INSTRUCTION PRE-FETCHING
20180011736 · 2018-01-11 ·

A task control circuit maintains, in response to task event information, a task information queue that includes task information for a plurality of tasks. Based upon the task information in the task information queue, a future task switch condition is identified as corresponding to a task switch time for a particular task of the plurality of tasks. A load start time is determined for a set of instructions for the particular task. A pre-fetch request is generated to load the set of instructions for the particular task into the memory circuit. The pre-fetch request is forwarded to a hardware loader circuit. In response to the task switch time, a task event trigger is generated for the particular task. The hardware loader circuit is used to load, in response to the pre-fetch request, the set of instructions from a non-volatile memory into the memory circuit.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

ZERO LATENCY PREFETCHING IN CACHES

This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the second level cache are twice the size of the cache lines in the first level instruction cache. The central processing unit core requests additional program instructions when needed via a request address. Upon a miss in the level one instruction cache that causes a hit in the upper half of a level two cache line, the level two cache supplies the upper half level cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half level two cache line employing fewer resources than an ordinary prefetch.

ZERO LATENCY PREFETCHING IN CACHES

This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the second level cache are twice the size of the cache lines in the first level instruction cache. The central processing unit core requests additional program instructions when needed via a request address. Upon a miss in the level one instruction cache that causes a hit in the upper half of a level two cache line, the level two cache supplies the upper half level cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half level two cache line employing fewer resources than an ordinary prefetch.

CAT AWARE LOADS AND SOFTWARE PREFETCHES
20230236973 · 2023-07-27 ·

In one embodiment, a method of selectively reserving portions of a last level cache (LLC) for a multi-core processor, the method comprising: allocating, by an executive system, plural classes of service to the portions of the LLC, wherein the portions comprise ways, and wherein each of the plural classes of service are allocated to one or more of the ways; assigning, by the executive system, one of the plural classes of service to an application as a default class of service, wherein the assignment controls which of the ways the application can allocate into; and overriding, by the application, the default class of service to enable allocation by the application to the one or more of the ways associated with a non-default class of service.

CAT AWARE LOADS AND SOFTWARE PREFETCHES
20230236973 · 2023-07-27 ·

In one embodiment, a method of selectively reserving portions of a last level cache (LLC) for a multi-core processor, the method comprising: allocating, by an executive system, plural classes of service to the portions of the LLC, wherein the portions comprise ways, and wherein each of the plural classes of service are allocated to one or more of the ways; assigning, by the executive system, one of the plural classes of service to an application as a default class of service, wherein the assignment controls which of the ways the application can allocate into; and overriding, by the application, the default class of service to enable allocation by the application to the one or more of the ways associated with a non-default class of service.

HOST TECHNIQUES FOR STACKED MEMORY SYSTEMS
20230004305 · 2023-01-05 ·

Techniques are provided for operating a memory package and more specifically to increasing bandwidth of a system having stacked memory. In an example, a system can include a storage device having a first type of volatile memory and a second type of volatile memory, and a host device coupled to the storage device. The host device can issue commands to the storage device to store and retrieve information of the system. The host device can include a memory map of the storage device and latency information associated with each command of the commands. The host can sort and schedule pending commands according to the latency information and can intermix commands for the first type of volatile memory and commands for the second type of volatile memory to maintain a high utilization or efficiency of a data interface between the host device and the storage device.

Control device of storage system

The present disclosure discloses a control device of data storage system, including a host interface, a peer interface, a storage unit interface, a processor and a local data management module. The host interface is connected and communicated with a storage server for data interaction with the storage server. The peer interface is configured for data communication connection with a storage unit of an adjacent control device in the data storage system. The storage unit interface is configured to connect a storage unit. The local data management module is configured for local data management of the data in the storage unit according to the data management instruction via the processor. The host interface is configured to send result data of local data management to the storage server.