Patent classifications
G06F115/10
Method to segregate logic and memory into separate dies for thermal management in a multi-dimensional packaging
A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements a logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time-distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
System and method for routing in an electronic design
Embodiments included herein are directed towards a method for electronic circuit design. Embodiments may include enabling data transmission between a plurality of protocol adapters, each of the protocol adapters including one ingress port and one egress port, wherein the ingress port of each of the plurality of protocol adapters maintains an active connection with a single egress port at one time. Embodiments may further include transmitting data between the plurality of protocol adapters using a distributed routing matrix that provides an interface between the plurality of protocol adapters.
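The connection rule in this abstract (each ingress port is tied to exactly one egress port at any moment) can be illustrated with a small sketch. This is not the patented design: the `RoutingMatrix` class, its method names, and the adapter labels are hypothetical, and the direction of data flow from ingress to egress is an assumption made for illustration.

```python
class RoutingMatrix:
    """Toy routing matrix: each ingress port holds at most one active egress connection."""

    def __init__(self, adapters):
        self.adapters = set(adapters)
        self.active = {}  # ingress adapter name -> egress adapter name

    def connect(self, ingress, egress):
        if ingress not in self.adapters or egress not in self.adapters:
            raise ValueError("unknown adapter")
        # Overwriting the entry enforces "a single egress port at one time".
        self.active[ingress] = egress

    def transmit(self, ingress, payload):
        egress = self.active.get(ingress)
        if egress is None:
            raise RuntimeError(f"{ingress} has no active connection")
        # Return where the payload was delivered, standing in for real transfer.
        return egress, payload


matrix = RoutingMatrix(["usb", "pcie", "dp"])  # hypothetical adapter names
matrix.connect("usb", "pcie")
delivered = matrix.transmit("usb", b"data")
matrix.connect("usb", "dp")  # re-routing replaces the earlier connection
```

Storing the connections in a dictionary keyed by ingress port makes the one-connection-at-a-time rule structural rather than something that has to be checked separately.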
Method for latency detection on a hardware simulation accelerator
A method for performing automated detection of transaction latency for a processor design model running an application in a hardware simulation accelerator. The method includes loading the processor design model into the hardware simulation accelerator, loading the application into the processor design model running within the hardware simulation accelerator, simulating the processor design model running the application within the hardware simulation accelerator, and for each individual transaction of the application: establishing a first checkpoint at a start of an execution of the individual transaction by creating a breakpoint and resetting a counter, establishing a second checkpoint at a completion of the transaction by creating another breakpoint and obtaining latency information for the second checkpoint. The latencies of the individual transaction from the start to the completion are measured based on the latency information.
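The two-checkpoint measurement described above can be sketched in a few lines. This is a simplified stand-in, not the accelerator's interface: `CycleCounter`, `measure_latencies`, and the `(name, cycles_taken)` transaction format are hypothetical, and the simulated execution is reduced to advancing a counter.

```python
class CycleCounter:
    """Stand-in for a cycle counter exposed by the simulation environment."""

    def __init__(self):
        self.cycles = 0

    def reset(self):
        self.cycles = 0

    def tick(self, n=1):
        self.cycles += n


def measure_latencies(transactions, counter):
    """Return {name: latency} for (name, cycles_taken) pairs.

    cycles_taken is a hypothetical placeholder for the simulated execution.
    """
    latencies = {}
    for name, cycles_taken in transactions:
        # First checkpoint: break at the start of the transaction and reset the counter.
        counter.reset()
        # Simulated execution between the two checkpoints.
        counter.tick(cycles_taken)
        # Second checkpoint: break at completion and read the latency information.
        latencies[name] = counter.cycles
    return latencies


result = measure_latencies([("load", 12), ("store", 7)], CycleCounter())
```

Resetting the counter at the first checkpoint means the value read at the second checkpoint is the latency directly, with no subtraction of start times needed.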
System and method for non-intrusive debugging at an embedded software breakpoint
The present disclosure relates to a method for use with an electronic design. Embodiments may include simulating a processor model and a hardware model, each executed with a corresponding simulator thread on a simulation platform. Embodiments may also include simulating embedded software using the processor model. The simulating may include updating a given register of the processor model that stores a value that changes in response to switching between processes within the embedded software. Embodiments may further include setting a simulator breakpoint and a software breakpoint and enabling debugging of both non-virtual and virtual addresses at the software breakpoint without leaving the software breakpoint.
Automated overclocking using a prediction model
A system, a method, and a machine-readable medium for overclocking a computer system are provided. An example of a method for overclocking a computer system includes predicting a stable operating frequency for a central processing unit (CPU) in a target system based, at least in part, on a model generated from data collected for a test system. An operating frequency for the CPU is adjusted to the stable operating frequency. A benchmark test is run to confirm that the CPU is operating within limits.
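The predict, adjust, and confirm sequence can be sketched as follows. The abstract does not say what kind of model is used, so the least-squares line here is an assumption, as are the single input feature (core voltage), the sample values, the `run_benchmark` callback, and the fallback to a base frequency when the benchmark fails.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def overclock(target_feature, samples, run_benchmark, base_mhz):
    """Predict a frequency from test-system samples, then confirm it with a benchmark."""
    xs, ys = zip(*samples)
    slope, intercept = fit_line(xs, ys)
    predicted_mhz = slope * target_feature + intercept
    # Assumption: fall back to the base frequency if the benchmark is out of limits.
    return predicted_mhz if run_benchmark(predicted_mhz) else base_mhz


# Hypothetical (core voltage, stable MHz) pairs collected on a test system.
samples = [(1.0, 4000.0), (1.1, 4200.0), (1.2, 4400.0)]
chosen = overclock(1.15, samples, lambda mhz: mhz <= 4350.0, base_mhz=3600.0)
```

Keeping the benchmark as a callback separates the prediction step from the confirmation step, which mirrors the order of operations in the abstract.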
Frozen boundary multi-domain parallel mesh generation
A computer-implemented method for meshing a model of a physical electro-magnetic assembly is disclosed. The method includes separating the base mesh of the model into two domains and freezing the boundary between these domains. Each domain is then sent for mesh refinement by separate computer processors. Each computer processor generates a refined mesh of the respective domain without communication between processors. Two-way boundary mesh mapping is then performed, resulting in a global conformal mesh. Surface recovery and identity assignment are then performed by separate computer processors in parallel for each domain, without communication between processors. Related apparatus, systems, techniques, methods and articles are also described.
Processor core simulator including trace-based coherent cache driven memory traffic generator
A core simulator includes one or more simulated processors, a trace-based traffic generator, and a simulated memory subsystem. Each simulated processor includes a core element and at least one lower-level cache excluded from the core element. The trace-based traffic generator includes a plurality of modeled caches that model the at least one lower-level cache without modeling the core element. The trace-based traffic generator is configured to receive at least one workload trace and, based on the workload trace, simulate actual memory traffic to be processed by the simulated memory subsystem. The simulated memory subsystem is shared between the at least one simulated processor and the trace-based traffic generator. The trace-based traffic generator performs a data exchange with the memory subsystem based on the at least one workload trace. The data exchange impacts a measured performance of the at least one simulated processor.
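The idea of driving memory traffic from a trace through a modeled cache, with no core model, can be shown with a minimal sketch. It assumes a single direct-mapped cache and a trace that is simply a list of byte addresses; the function name and parameters are hypothetical, and cache coherence, which the patent title mentions, is deliberately left out.

```python
def generate_traffic(trace, num_lines=4, line_bytes=64):
    """Replay an address trace through a direct-mapped cache model.

    Returns the line-aligned addresses that miss in the cache, i.e. the
    requests that would be forwarded to the memory subsystem.
    """
    tags = [None] * num_lines  # block number currently held by each cache line
    traffic = []
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            # Miss: fill the line and emit a request to the memory subsystem.
            tags[index] = block
            traffic.append(block * line_bytes)
    return traffic


# Hypothetical workload trace of byte addresses.
requests = generate_traffic([0, 8, 64, 0, 256, 0])
```

In this trace, addresses 0 and 256 map to the same cache line, so the final access to 0 misses again after 256 evicts it. Only misses reach the memory subsystem, which is why a cache model alone is enough to produce realistic traffic from a trace.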
Method and system for converting a single-threaded software program into an application-specific supercomputer
The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.