G06F9/30079

Computing chip, hashrate board and data processing apparatus

This disclosure relates to a computing chip, a hashrate board, and a data processing apparatus. The computing chip includes a plurality of operation stages arranged in a pipeline configuration. Each operation stage includes: a first combinational logic circuit occupying a plurality of first cell points adjacent to each other, at least a portion of the first cell points being located in a first incomplete column; one or more second combinational logic circuits each occupying one or more second cell points, at least a portion of the second cell points being located in a second incomplete column; and a plurality of registers each occupying a plurality of third cell points, at least a portion of the third cell points being located in the first incomplete column or the second incomplete column. The first cell points, the second cell points, and the third cell points occupy equal areas on the computing chip.
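
A minimal Python sketch of the cell-point layout this describes: equal-area cell points arranged in columns, with registers filling the leftover cell points of the columns that the logic circuits leave incomplete. The class names, column heights and three-column split are illustrative assumptions, not taken from the abstract.

    from dataclasses import dataclass, field

    CELL_AREA = 1.0  # every cell point occupies the same area on the chip

    @dataclass
    class Column:
        height: int                                   # cell points in this column
        occupied: dict = field(default_factory=dict)  # row -> owning circuit

        def place(self, owner, rows):
            for r in rows:
                self.occupied[r] = owner

        def is_incomplete(self):
            return len(self.occupied) < self.height   # free cell points remain

    def build_stage(stage_id, height=8):
        cols = [Column(height) for _ in range(3)]
        cols[0].place(f"s{stage_id}.logic1", range(height))  # fully occupied column
        cols[1].place(f"s{stage_id}.logic1", range(5))       # first incomplete column
        cols[2].place(f"s{stage_id}.logic2", range(3))       # second incomplete column
        # Registers take the leftover cell points in the incomplete columns.
        for col in cols:
            if col.is_incomplete():
                free = [r for r in range(col.height) if r not in col.occupied]
                col.place(f"s{stage_id}.regs", free)
        return cols

    if __name__ == "__main__":
        for i, col in enumerate(build_stage(0)):
            print(f"column {i}:", dict(sorted(col.occupied.items())))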

Programmable instruction buffering for accumulating a burst of instructions

A processing system 2 includes a processing pipeline 12, 14, 16, 18, 28 which includes fetch circuitry 12 for fetching instructions to be executed from a memory 6, 8. Buffer control circuitry 34 responds to a programmable trigger, such as explicit hint instructions delimiting an instruction burst, or predetermined configuration data specifying the parameters of a burst together with a synchronising instruction, by stalling a stallable portion of the processing pipeline (e.g. issue circuitry 16), accumulating within one or more buffers 30, 32 fetched instructions starting from a predetermined starting instruction, and, once those instructions have been accumulated, restarting the stallable portion of the pipeline.
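
A small Python model of the stall-accumulate-restart behaviour described above. The hint mnemonics BURST_BEGIN/BURST_END and the single-buffer layout are assumptions for illustration only.

    from collections import deque

    BURST_BEGIN, BURST_END = "hint.burst_begin", "hint.burst_end"

    class Pipeline:
        def __init__(self, program):
            self.fetch_q = deque(program)   # instructions waiting to be fetched
            self.buffer = []                # accumulation buffer (30, 32)
            self.issue_stalled = False      # stallable portion (issue circuitry 16)
            self.issued = []

        def step(self):
            if not self.fetch_q:
                return False
            insn = self.fetch_q.popleft()               # fetch circuitry 12
            if insn == BURST_BEGIN:                     # programmable trigger
                self.issue_stalled = True               # stall the issue stage
            elif insn == BURST_END:
                # Burst fully accumulated: restart the stalled portion and drain.
                self.issue_stalled = False
                self.issued.extend(self.buffer)
                self.buffer.clear()
            elif self.issue_stalled:
                self.buffer.append(insn)                # accumulate the burst
            else:
                self.issued.append(insn)                # normal issue
            return True

    if __name__ == "__main__":
        prog = ["add", BURST_BEGIN, "ld1", "ld2", "ld3", BURST_END, "sub"]
        p = Pipeline(prog)
        while p.step():
            pass
        print(p.issued)   # ['add', 'ld1', 'ld2', 'ld3', 'sub']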

METHOD AND APPARATUS FOR DYNAMICALLY ADJUSTING PIPELINE DEPTH TO IMPROVE EXECUTION LATENCY

Apparatus and method for managing the pipeline depth of a data processing device. For example, one embodiment of an apparatus comprises: an interface to receive a plurality of work requests from a plurality of clients; a plurality of engines to perform the plurality of work requests, wherein the work requests are to be dispatched to the plurality of engines from a plurality of work queues, the work queues to store a work descriptor per work request, each work descriptor to include information needed to perform a corresponding work request, wherein the plurality of work queues include a first work queue to store work descriptors associated with first latency characteristics and a second work queue to store work descriptors associated with second latency characteristics; and engine configuration circuitry to configure a first engine to have a first pipeline depth based on the first latency characteristics and to configure a second engine to have a second pipeline depth based on the second latency characteristics.
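
A Python sketch of the idea of configuring per-engine pipeline depth from the latency characteristics of the work queue an engine serves. The depth values and the two latency classes are assumptions chosen for illustration, not values from the abstract.

    from dataclasses import dataclass

    @dataclass
    class WorkQueue:
        name: str
        latency_class: str     # e.g. "low-latency" or "throughput"
        descriptors: list      # one work descriptor per work request

    @dataclass
    class Engine:
        name: str
        pipeline_depth: int = 0

    def configure_engine(engine, queue):
        # Shallow pipeline for latency-sensitive work, deep pipeline for
        # throughput-oriented work (this policy is an illustrative assumption).
        engine.pipeline_depth = 2 if queue.latency_class == "low-latency" else 8

    def dispatch(queue, engine):
        for desc in queue.descriptors:
            yield (engine.name, engine.pipeline_depth, desc)

    if __name__ == "__main__":
        q1 = WorkQueue("wq0", "low-latency", ["crc32 request", "small copy"])
        q2 = WorkQueue("wq1", "throughput", ["bulk copy", "compress page"])
        e1, e2 = Engine("engine0"), Engine("engine1")
        configure_engine(e1, q1)     # engine configuration circuitry
        configure_engine(e2, q2)
        for item in list(dispatch(q1, e1)) + list(dispatch(q2, e2)):
            print(item)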

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating-point operations, the floating-point unit being configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand to the output of the multiplier.
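
The operation reduces to dst = src0 + src1 * src2 with the multiplier inputs narrowed to BF16 and the accumulation kept at float32 precision. A rough numerical model in Python (truncation-based BF16 conversion; rounding modes ignored; function names are illustrative assumptions):

    import struct

    def to_bf16(x):
        # bfloat16 keeps the top 16 bits of an IEEE-754 float32 (simple
        # truncation here, ignoring rounding modes).
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

    def f32(x):
        # Re-quantise a Python double to float32 precision.
        return struct.unpack("<f", struct.pack("<f", x))[0]

    def bf16_dpas(src0, src1, src2):
        product = f32(to_bf16(src1) * to_bf16(src2))   # multiplier (BF16 inputs)
        return f32(src0 + product)                     # accumulator adds src0

    if __name__ == "__main__":
        # 4.140625 rather than ~4.1539: precision is lost narrowing the inputs.
        print(bf16_dpas(1.0, 1.00390625, 3.14159))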

Performing speculative address translation in processor-based devices

Performing speculative address translation in processor-based devices is disclosed herein. In one exemplary embodiment, a processor-based device provides a processing element (PE) that defines a speculative translation instruction such as an enqueue instruction for offloading operations to a peripheral device. The speculative translation instruction references a plurality of bytes including one or more virtual memory addresses. After receiving the speculative translation instruction, an instruction decode stage of an execution pipeline circuit of the PE transmits a request for address translation of the virtual memory address to a memory management unit (MMU) of the PE. The MMU then performs speculative address translation of the virtual memory address into a corresponding translated memory address. In some embodiments, any address translation errors encountered are raised to an appropriate exception level, and may be raised synchronously or asynchronously with respect to an operation performed when the speculative translation instruction is executed.
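
A behavioural Python sketch of that flow: the decode stage requests translation of the virtual addresses referenced by an enqueue payload, the MMU translates them speculatively, and any faults are recorded for synchronous or deferred reporting. The page size, page-table shape, and all names are assumptions for illustration.

    PAGE_SIZE = 4096

    class TranslationFault(Exception):
        pass

    class MMU:
        def __init__(self, page_table):
            self.page_table = page_table       # virtual page -> physical page

        def translate(self, vaddr):
            vpn, offset = divmod(vaddr, PAGE_SIZE)
            if vpn not in self.page_table:
                raise TranslationFault(hex(vaddr))
            return self.page_table[vpn] * PAGE_SIZE + offset

    def decode_enqueue(payload_vaddrs, mmu, faults):
        # Instruction decode stage: request translation of every virtual
        # address referenced by the enqueue payload bytes.
        translated = []
        for vaddr in payload_vaddrs:
            try:
                translated.append(mmu.translate(vaddr))   # speculative translation
            except TranslationFault as err:
                faults.append(err)    # may be reported synchronously or later
                translated.append(None)
        return translated

    if __name__ == "__main__":
        mmu = MMU({0x1000 // PAGE_SIZE: 0x42})
        faults = []
        print(decode_enqueue([0x1010, 0x9000], mmu, faults), faults)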

METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR CUSTOMIZING DATA PLANE PIPELINE PROCESSING USING BERKELEY PACKET FILTER (BPF) HOOK ENTRY POINTS
20230221975 · 2023-07-13

A method for customizing data plane pipeline processing to generate test packets using Berkeley packet filter (BPF) hook entry points is disclosed. The method includes receiving a program code script for customizing the performance of one or more stages of a multistage processing pipeline platform configured for conducting a network operation process, wherein the multistage processing pipeline platform includes one or more predefined hook entry points, and compiling the program code script to generate a customized plug-in component that is deployed in at least one of the one or more predefined hook entry points of the multistage processing pipeline platform. The method further includes initiating execution of the customized plug-in component in response to a triggering of the at least one of the one or more predefined hook entry points of a stage element of the multistage processing pipeline platform, wherein the execution of the customized plug-in component dynamically configures at least one feature of the network operation process at runtime.
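
A plain-Python stand-in for the hook/plug-in mechanism (no actual BPF involved): the "compiled" plug-in is modelled as a callable registered at a named hook entry point and run when the owning stage triggers it. Hook names and packet fields are assumptions for illustration.

    HOOKS = {"pre-parse": [], "post-route": []}     # predefined hook entry points

    def deploy_plugin(hook_name, plugin):
        HOOKS[hook_name].append(plugin)             # deploy at a hook entry point

    def run_hooks(hook_name, packet):
        # Triggered by the stage element that owns this hook entry point.
        for plugin in HOOKS[hook_name]:
            packet = plugin(packet)                 # customise behaviour at runtime
        return packet

    def pipeline(packet):
        packet = run_hooks("pre-parse", packet)
        packet["parsed"] = True                     # fixed stage behaviour
        packet = run_hooks("post-route", packet)
        return packet

    if __name__ == "__main__":
        # "Compile" a user script into a plug-in (here, a lambda) that rewrites
        # the destination port of generated test packets.
        deploy_plugin("post-route", lambda p: {**p, "dst_port": 8080})
        print(pipeline({"dst_port": 80}))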

Container certificate injection

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using certificate injection tasks to generate containers having corresponding digital certificates. One of the methods includes receiving, by a distributed computing system, a source container image. A certificate injection task is executed, including: launching a container instance from the source container image and executing injection code within an execution environment of the launched container instance that writes one or more digital certificates to one or more corresponding locations within a file system of the execution environment. An output container image having the one or more digital certificates is then generated.
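
A toy, filesystem-only Python model of the certificate injection task: a directory stands in for a container image, copying it stands in for launching an instance, and "committing" produces the output image. The paths, names, and single-certificate layout are assumptions for illustration.

    import shutil, tempfile
    from pathlib import Path

    def launch_instance(source_image):
        instance = Path(tempfile.mkdtemp(prefix="container-"))
        shutil.copytree(source_image, instance, dirs_exist_ok=True)
        return instance

    def injection_code(rootfs, certs):
        # Runs "inside" the instance: write each certificate to its location.
        for rel_path, pem in certs.items():
            dest = rootfs / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(pem)

    def commit(instance, output_image):
        shutil.copytree(instance, output_image)     # generate the output image
        return output_image

    if __name__ == "__main__":
        src = Path(tempfile.mkdtemp(prefix="source-image-"))
        (src / "etc").mkdir()
        inst = launch_instance(src)
        injection_code(inst, {"etc/ssl/certs/corp-ca.pem":
                              "-----BEGIN CERTIFICATE-----..."})
        out = commit(inst, Path(tempfile.mkdtemp()) / "output-image")
        print(list(out.rglob("*.pem")))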

64-BIT TWO-DIMENSIONAL BLOCK LOAD WITH TRANSPOSE

An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprises a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages, and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.
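
A behavioural Python sketch of that decomposition: each row of the 2D block becomes one "standard load", and the returned data is laid out transposed across sequentially numbered GRFs. The flat-memory model and the one-GRF-per-column layout are assumptions for illustration.

    def standard_load(memory, base, row, width):
        # One row load of 64-bit elements, modelled as a slice of flat memory.
        start = base + row * width
        return memory[start:start + width]

    def block_load_2d_transpose(memory, base, rows, width):
        loaded = [standard_load(memory, base, r, width) for r in range(rows)]
        # Transpose on return: GRF n receives column n of the block (element n
        # of every row), and the GRFs are numbered sequentially 0..width-1.
        return {n: [loaded[r][n] for r in range(rows)] for n in range(width)}

    if __name__ == "__main__":
        mem = list(range(100))                # flat memory of 64-bit elements
        grfs = block_load_2d_transpose(mem, base=10, rows=3, width=4)
        for grf_id, data in grfs.items():
            print(f"GRF{grf_id}: {data}")
        # GRF0: [10, 14, 18], GRF1: [11, 15, 19], ...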

WATCHPOINTS FOR DEBUGGING IN A GRAPHICS ENVIRONMENT

An apparatus to facilitate watchpoints for debugging in a graphics environment is disclosed. The apparatus includes processing resources to perform graphics operations using a plurality of threads; and load store pipeline hardware circuitry coupled to the processing resources to: configure a watchpoint register with a value of a watchpoint address, the watchpoint address comprising an address of a memory location in the processor; receive a memory access request from a thread of the plurality of threads; determine, using the watchpoint register, whether the memory access request is requesting access to the watchpoint address; and responsive to the memory access request requesting access to the watchpoint address, return an exception payload to the thread, the exception payload comprising watchpoint details corresponding to the watchpoint address and a scoreboard identifier (SBID) associated with the memory access request.
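
A small Python model of the watchpoint check: a memory access whose address matches the watchpoint register returns an exception payload carrying the watchpoint details and the request's SBID. Field names and the single-watchpoint limit are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class MemoryAccess:
        thread_id: int
        address: int
        sbid: int            # scoreboard identifier for the outstanding access

    class LoadStorePipeline:
        def __init__(self):
            self.watchpoint_register = None

        def configure_watchpoint(self, address):
            self.watchpoint_register = address

        def handle_access(self, req):
            # Compare the request against the watchpoint register; on a hit,
            # return an exception payload to the requesting thread.
            if req.address == self.watchpoint_register:
                return {"exception": "watchpoint",
                        "watchpoint_address": hex(req.address),
                        "sbid": req.sbid,
                        "thread": req.thread_id}
            return {"data": f"<memory[{hex(req.address)}]>"}

    if __name__ == "__main__":
        lsp = LoadStorePipeline()
        lsp.configure_watchpoint(0xDEAD0)
        print(lsp.handle_access(MemoryAccess(thread_id=3, address=0xDEAD0, sbid=7)))
        print(lsp.handle_access(MemoryAccess(thread_id=3, address=0x1000, sbid=8)))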

Prefetch mechanism for a cache structure

An apparatus and method are provided, the apparatus comprising a processor pipeline to execute instructions; a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control the issuing of prefetch requests in dependence on reception of the trigger.
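
A behavioural Python sketch of that trigger-driven control: when the pipeline signals that demand requests will drop, the prefetcher issues more prefetch requests per cycle into the cache structure. The specific policy (a per-tick budget) and all names are assumptions for illustration.

    class Prefetcher:
        def __init__(self, cache):
            self.cache = cache
            self.low_demand = False

        def trigger(self, low_demand):
            # Trigger from the processor pipeline: a given event (e.g. a drain
            # or barrier) means fewer demand requests will arrive for a while.
            self.low_demand = low_demand

        def tick(self, predicted_addrs):
            # Issue more prefetch requests per cycle while demand is reduced,
            # using cache bandwidth the pipeline is not currently consuming.
            budget = 4 if self.low_demand else 1
            for addr in predicted_addrs[:budget]:
                self.cache.add(addr)              # prefetch into the cache structure

    if __name__ == "__main__":
        cache = set()
        pf = Prefetcher(cache)
        pf.tick([0x100, 0x140, 0x180, 0x1C0]); print(sorted(cache))  # 1 line prefetched
        pf.trigger(low_demand=True)               # pipeline signals reduced demand
        pf.tick([0x140, 0x180, 0x1C0, 0x200]); print(sorted(cache))  # up to 4 lines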