Patent classifications
G06F2117/04
INTEGRATED CIRCUIT INCLUDING STANDARD CELLS, METHOD OF MANUFACTURING THE INTEGRATED CIRCUIT, AND COMPUTING SYSTEM FOR PERFORMING THE METHOD
An integrated circuit includes a standard cell including a first output pin and a second output pin configured to each output the same output signal, a first routing path connected to the first output pin, and a second routing path connected to the second output pin. The first routing path includes a first cell group including at least one load cell, the second routing path includes a second cell group including at least one load cell, and the first routing path and the second routing path are electrically disconnected from each other outside the standard cell.
Clock-tree transformation in high-speed ASIC implementation
A method includes providing a first clock tree including a root clock and a plurality of levels of integrated clock gates (ICGs) under the root clock. The plurality of levels of ICGs in the first clock tree is flattened to generate a second clock tree including a plurality of ICGs at the same level under the root clock. A fake module is formed to reserve a region between the root clock and the plurality of ICGs. The fake module includes the root clock as a first input, and a first plurality of outputs coupled to clock inputs of the plurality of ICGs. Skew balancing is performed on the second clock tree using a clock tree synthesis (CTS) tool to generate a third clock tree, wherein no buffers are inserted into the fake module, and wherein buffers are inserted by the CTS tool under the plurality of ICGs.
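The flattening step described above can be illustrated with a minimal sketch. The abstract does not say how gating conditions are preserved when nested ICGs are moved up to a single level; the sketch assumes each flattened ICG carries the enables of all its former ancestors, and the `ICG` class, the `flatten` function, and the signal names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ICG:
    """Hypothetical model of an integrated clock gate in a clock tree."""
    name: str
    enable: str                                   # name of the enable signal
    children: list = field(default_factory=list)  # ICGs nested under this one


def flatten(icgs, inherited=()):
    """Return (name, enable_terms) pairs, one per ICG, all at a single level.

    Assumption: a flattened ICG must be gated by its own enable AND the
    enables of every ICG that used to sit above it.
    """
    flat = []
    for icg in icgs:
        terms = inherited + (icg.enable,)
        flat.append((icg.name, terms))
        flat.extend(flatten(icg.children, terms))
    return flat


# Three nested levels (g0 -> g1 -> g2) plus one top-level gate (g3).
tree = [
    ICG("g0", "en0", [ICG("g1", "en1", [ICG("g2", "en2")])]),
    ICG("g3", "en3"),
]
flat = flatten(tree)
for name, terms in flat:
    print(name, "enabled by", " & ".join(terms))
```

After flattening, every gate in `flat` would be driven directly from the reserved region under the root clock, which is the shape the abstract describes for the second clock tree.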
Circuit architecture for expanded design for testability functionality
A circuit architecture for expanded design for testability functionality is provided that includes an Intellectual Property (IP) core for use with a design for an integrated circuit (IC). The IP core provides an infrastructure harness circuit configured to control expanded design for testability functions available within the IC. An instance of the IP core can be included in a circuit block of the design for the IC. The infrastructure harness circuit can include an outward facing interface configured to connect to circuitry outside of the circuit block and an inward facing interface configured to connect to circuitry within the circuit block. The instance of the IP core can be parameterized to configure the infrastructure harness circuit to control a plurality of functions selected from the expanded design for testability functions based on a user parameterization of the instance of the IP core.
Method for finding non-essential flip flops in a VLSI design that do not require retention in standby mode
The invention relates to a method for reducing the number of flip-flops in a VLSI design that require data retention, thereby eliminating the respective backup cells for those flip-flops. The method comprises the steps of: (a) defining one or more criteria for non-essentiality of backup cells; (b) during the physical design stage, analyzing the VLSI design based on said one or more criteria for non-essentiality, and finding those flip-flops that meet these criteria, wherein said analysis is performed at the gate level, independent from any higher level representation of the design; and (c) eliminating from the VLSI design those backup cells for all non-essential flip-flops that meet one or more of said criteria for non-essentiality, thereby leaving in the design only those backup cells for those flip-flops that do not meet any of said criteria.
TIMING ERROR DETECTION AND CORRECTION CIRCUIT
An integrated circuit and a method of designing an integrated circuit including an error detection and correction circuit are described. The integrated circuit includes a data-path arranged between an output of a first register and a second register clocked by a system clock. The integrated circuit includes a timing error detection and correction circuit (EDAC) which has a clock unit configured to receive a reference clock and to provide a delayed reference clock. The EDAC includes a plurality of transition detectors, each coupled to a respective node on the data-path, and an error detection circuit coupled to each transition detector. The error detection circuit flags an error if a transition occurs during a time period between a transition of the reference clock and a corresponding transition of the delayed reference clock. A timing correction circuit coupled to the error detection circuit outputs the system clock derived from the delayed reference clock.
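The detection rule in this abstract (flag an error when a data transition falls between a reference clock edge and the matching edge of the delayed reference clock) can be sketched behaviorally. This is a software illustration only, not the circuit: the function name, the time units, and the choice of a half-open window are assumptions.

```python
def flag_timing_errors(data_transitions, ref_edges, delay):
    """Return the data transition times that fall inside a detection window.

    A window opens at each reference clock edge and closes `delay` later,
    at the corresponding edge of the delayed reference clock.
    Assumption: the window is half-open, [edge, edge + delay).
    """
    errors = []
    for t in data_transitions:
        for edge in ref_edges:
            if edge <= t < edge + delay:
                errors.append(t)
                break  # one flag per transition is enough
    return errors


# Reference edges every 10 time units; delayed reference lags by 2 units.
ref_edges = [0.0, 10.0, 20.0]
late = flag_timing_errors([4.0, 10.5, 19.0, 21.9], ref_edges, delay=2.0)
print(late)
```

Transitions at 10.5 and 21.9 land inside a window and are flagged; those at 4.0 and 19.0 settle outside every window and are not.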
CLOCK SWEEPING SYSTEM
A clock sweeping system includes multiple delay elements and a selection circuit. The delay elements are configured to generate multiple delayed clock signals. Each delay element is configured to receive an input signal and delay the input signal to generate a corresponding first delayed clock signal. The input signal is one of a first clock signal, a second clock signal, and a corresponding output signal generated by a previous delay element. The selection circuit is configured to select and output, based on a first select signal for a plurality of times, a corresponding second delayed clock signal as a first output clock signal. The selection circuit is further configured to select and output, based on a second select signal, one of the first and second clock signals as a second output clock signal. The first output clock signal is asynchronous with respect to the second output clock signal.
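A rough behavioral sketch of the delay chain and the first selection step follows. It models only the case where each delay element is fed by the previous element's output; the abstract also allows an element to take the first or second clock signal directly, which is not modeled here. The function names and delay values are hypothetical.

```python
def delayed_taps(input_edge, element_delays):
    """Return the edge time seen at the output of each delay element.

    Assumption: the elements are chained, so each one delays the output
    of the element before it.
    """
    taps = []
    t = input_edge
    for d in element_delays:
        t += d
        taps.append(t)
    return taps


def sweep(taps, select_sequence):
    """Pick one tap per select value, as a selection circuit stepping through them."""
    return [taps[s] for s in select_sequence]


# Four chained elements of 0.5 time units each, swept from first to last tap.
taps = delayed_taps(0.0, [0.5, 0.5, 0.5, 0.5])
swept = sweep(taps, [0, 1, 2, 3])
print(swept)
```

Stepping the select value moves the output clock edge later by one element delay at a time, which is the sweeping behavior the title refers to.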
LIGHTWEIGHT UNIFIED POWER FORMAT IMPLEMENTATION FOR EMULATION AND PROTOTYPING
A method for designing a circuit includes adding, to a circuit design, a power switch configured to produce only one output over an acknowledgement port. The power switch does not include input and output supply ports. The method also includes adding, to the circuit design, an isolation circuit in which only one select pin is used to produce an output. The isolation circuit does not include isolation power and retention circuitry. The method also includes adding, to the circuit design, a retention circuit. The retention circuit includes a clock gating enabled register, a first AND gate connected to a clear pin of the register, and a second AND gate connected to a chip enable pin of the register. The method further includes compiling, by a processing device, the circuit design.
CLOCK GATING AND CLOCK SCALING BASED ON RUNTIME APPLICATION TASK GRAPH INFORMATION
An apparatus to facilitate clock gating and clock scaling based on runtime application task graph information is disclosed. The apparatus includes a processor to: receive, from a compiler, a bitstream generated from code of an application, the bitstream related to a workload of the application; generate a task graph of the application using at least part of the bitstream, the task graph to represent one of a relationship and dependency of the code; program the bitstream to an accelerator device, wherein the bitstream is to configure the accelerator device to support the workload of the application; execute one or more kernels of the code using the accelerator device; identify one or more optimizations for the accelerator device based on the task graph of the application; and transmit a command to cause the one or more optimizations to be implemented in at least one region of the accelerator device.
Determining and verifying metastability in clock domain crossings
The technology disclosed relates to verifying metastability for a clock domain crossing (CDC) in a circuit design. The technology disclosed may include, for a destination clock domain in the circuit design, creating a circuit graph based, at least in part, on the circuit design. The circuit graph includes start points and stop points. The start points may be data inputs, clocks, and enables of the destination clock domain. The stop points may be synchronizer outputs of the destination clock domain and a source clock domain in the circuit design. The technology disclosed may also include traversing the circuit graph to mark all graph nodes that reside in a source-destination path of the CDC. Based on the marked graph nodes, the start points, and the stop points, the technology disclosed may also include propagating destination domain qualifiers on the circuit graph within an allowed sequential depth.
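The marking step (finding every graph node that lies on some path from a start point to a stop point) can be sketched as the intersection of two reachability searches. This is one common way to do it and is not necessarily the traversal the disclosure uses; the node names and the `mark_path_nodes` helper are hypothetical, and qualifier propagation and the sequential depth limit are not modeled.

```python
from collections import deque


def reachable(adj, roots):
    """Breadth-first search: every node reachable from `roots` via `adj`."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mark_path_nodes(edges, starts, stops):
    """Mark nodes that lie on at least one start-to-stop path.

    A node qualifies if it is reachable forward from a start point and
    backward from a stop point.
    """
    fwd, bwd = {}, {}
    for a, b in edges:
        fwd.setdefault(a, []).append(b)
        bwd.setdefault(b, []).append(a)
    return reachable(fwd, starts) & reachable(bwd, stops)


# Toy circuit graph: a source flop feeds a two-stage synchronizer through a
# mux, plus a debug output that is off the crossing path.
edges = [
    ("src_ff", "mux"),
    ("mux", "sync1"),
    ("sync1", "sync2"),
    ("mux", "dbg_out"),
    ("cfg", "dbg_out"),
]
marked = mark_path_nodes(edges, starts=["src_ff"], stops=["sync2"])
print(sorted(marked))
```

In this toy graph `dbg_out` and `cfg` are left unmarked because neither lies on a path that ends at the synchronizer output.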
METHOD FOR CO-DESIGN OF HARDWARE AND NEURAL NETWORK ARCHITECTURES USING COARSE-TO-FINE SEARCH, TWO-PHASED BLOCK DISTILLATION AND NEURAL HARDWARE PREDICTOR
Methods, systems, and apparatus for combined or separate implementation of coarse-to-fine neural architecture search (NAS), two-phase block NAS, variable hardware prediction, and differential hardware design are provided and described. A variable predictor is trained, as described herein. Then, a controller or policy may be used to iteratively modify a neural network architecture along dimensions formed by neural network architecture parameters. The modification is applied to blocks (e.g., subnetworks) within the neural network architecture. In each iteration, the remainder of the neural network architecture parameters are modified and learned with a differential NAS method. The training process is performed with two-phase block NAS and incorporates a variable hardware predictor to predict power, performance, and area (PPA) parameters. The hardware parameters may be learned as well using the variable hardware predictor.