G06F11/0754

Interface Service Function Monitoring Method and System Based on Data Acquisition
20220321440 · 2022-10-06

The present disclosure provides an interface service function monitoring method and system based on data acquisition. The method includes the following steps: initiating, by a monitoring device, a service function monitoring request for an interface; determining whether the interface responds to the monitoring request; if it responds, determining whether the response content of the interface is as expected; if an abnormality occurs, analyzing the cause of the abnormality according to the abnormality information, forming an abnormality analysis report, storing the abnormality information and the abnormality analysis report on a target server, and sending, by the target server, the abnormality information and the abnormality analysis report to an administrator; and determining, by the administrator, one or more current service function states of the interface based on the monitoring records and the abnormality analysis report stored on the target server.
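The probe-then-check flow described above can be sketched as follows. This is a minimal illustration, not the disclosed system: the interface is modeled as a plain Python callable, the "expected content" is a single string, a Python list stands in for the target server's record store, and a callback stands in for notifying the administrator. The names `probe`, `run_check`, and `MonitorResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MonitorResult:
    responded: bool          # did the interface answer the monitoring request?
    content_ok: bool         # was the response content as expected?
    report: Optional[str]    # abnormality analysis text, None when healthy


def probe(call: Callable[[], Optional[str]], expected: str) -> MonitorResult:
    """Send one monitoring request and classify the outcome."""
    try:
        response = call()
    except Exception as exc:  # any raised error is treated as "no response"
        return MonitorResult(False, False, f"no response: {type(exc).__name__}: {exc}")
    if response is None:
        return MonitorResult(False, False, "no response: empty reply")
    if response != expected:
        return MonitorResult(True, False,
                             f"unexpected content: got {response!r}, expected {expected!r}")
    return MonitorResult(True, True, None)


def run_check(call: Callable[[], Optional[str]], expected: str,
              store: List[MonitorResult], notify: Callable[[str], None]) -> MonitorResult:
    """Probe, keep the monitoring record, and forward any abnormality report."""
    result = probe(call, expected)
    store.append(result)             # stands in for the target server's record store
    if result.report is not None:
        notify(result.report)        # stands in for the server alerting an administrator
    return result
```

The distinction between "no response" and "unexpected content" mirrors the two separate checks in the abstract, so the stored records let an administrator tell an unreachable interface from a misbehaving one.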

POWER SUPPLY UNIT POWER LEVEL PROTECTION LIMITS

Techniques are disclosed for managing power supplies and related power consumption alerts for a server based on the availability of redundant power supplies. When the system detects that a redundant power supply to a server has stopped providing power, it configures a new power consumption threshold for the remaining power supply. The new threshold differs from the initial power consumption threshold (if one was configured) that applied while the redundant power supply was providing power. The threshold that triggers power consumption alerts for a power supply may be based on the number of power supplies actively providing power.
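A threshold that depends on the number of active supplies could look like the sketch below. The specific rule is an assumption, not taken from the abstract: load is assumed to be shared evenly, and while redundancy remains, the per-supply alert threshold is set so that the surviving supplies could still carry the load if one more failed. `psu_alert_threshold`, `PowerMonitor`, and the 0.9 headroom factor are hypothetical.

```python
from typing import Iterable


def psu_alert_threshold(capacity_w: float, active_psus: int, headroom: float = 0.9) -> float:
    """Per-supply alert threshold in watts, derived from the number of active supplies."""
    if active_psus < 1:
        raise ValueError("at least one power supply must be active")
    if active_psus == 1:
        # No redundancy left: alert only near the single supply's own limit.
        return capacity_w * headroom
    # Load is assumed to be shared evenly; alert early enough that the remaining
    # supplies could still carry the load if one more supply failed.
    return capacity_w * headroom * (active_psus - 1) / active_psus


class PowerMonitor:
    def __init__(self, capacity_w: float, psu_ids: Iterable[str], headroom: float = 0.9):
        self.capacity_w = capacity_w
        self.headroom = headroom
        self.active = set(psu_ids)
        self.threshold_w = psu_alert_threshold(capacity_w, len(self.active), headroom)

    def on_psu_lost(self, psu_id: str) -> None:
        """Called when a supply stops providing power: reconfigure the threshold."""
        self.active.discard(psu_id)
        self.threshold_w = psu_alert_threshold(self.capacity_w, len(self.active), self.headroom)

    def should_alert(self, psu_id: str, draw_w: float) -> bool:
        """True if an active supply's draw exceeds the current threshold."""
        return psu_id in self.active and draw_w > self.threshold_w
```

With two 1000 W supplies, the per-supply threshold starts at 450 W; after one supply is lost it rises to 900 W, so a 600 W draw alerts before the failure but not after it.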

Discovery crawler for application dependency discovery, reporting, and management tool

Techniques for monitoring the operating status of an application and its dependencies are provided. A monitoring application may collect and report the operating status of the monitored application and of each dependency. By using existing monitoring interfaces, the monitoring application can collect operating status without requiring modification of the monitored application or its dependencies. The monitoring application may identify a problem service that is the root cause of an unhealthy state of the monitored application. Dependency analyzer and discovery crawler techniques may automatically configure and update the monitoring application. Machine learning techniques may be used to determine performance patterns based on system state information associated with performance events, and to provide health reports relative to a baseline status of the monitored application. Also provided are techniques for testing how the monitored application responds to modified API calls. Such tests may be used to train the machine learning model.
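One simple way to locate a "problem service" is sketched below. The heuristic is an assumption for illustration and not necessarily the patented method: walk the dependency graph from the monitored application and report every unhealthy service whose own direct dependencies are all healthy, since its failure cannot be explained by anything further downstream. The function name and the dictionary-based graph format are hypothetical.

```python
from typing import Dict, List, Set


def find_root_causes(deps: Dict[str, List[str]], healthy: Dict[str, bool], app: str) -> Set[str]:
    """Return unhealthy services reachable from `app` whose direct dependencies are all healthy."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [app]
    while stack:                      # iterative depth-first walk of the dependency graph
        svc = stack.pop()
        if svc in seen:
            continue
        seen.add(svc)
        children = deps.get(svc, [])
        stack.extend(children)
        # Services with no reported status are treated as healthy (an assumption).
        if not healthy.get(svc, True) and all(healthy.get(c, True) for c in children):
            roots.add(svc)
    return roots
```

For example, if `web` depends on `api`, and `api` depends on `db`, with all three unhealthy, only `db` is reported: `web` and `api` each have an unhealthy dependency that explains their state.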

Enabling stateless accelerator designs shared across mutually-distrustful tenants
11651112 · 2023-05-16

An apparatus that enables stateless accelerator designs to be shared across mutually-distrustful tenants is disclosed. The apparatus includes fully homomorphic encryption (FHE)-capable circuitry to: establish a secure session with a trusted environment executing on a host device communicably coupled to the apparatus; generate, as part of establishing the secure session, per-tenant FHE keys for each tenant, the per-tenant FHE keys being used to encrypt tenant data provided to an FHE-capable compute kernel of the FHE-capable circuitry; process tenant data that is in an FHE-encrypted format, encrypted with one of the per-tenant FHE keys; and store that FHE-encrypted tenant data.

Detecting datacenter mass outage with near real-time/offline using ML models

The present embodiments relate to data center outage detection and alert generation. An outage detection service as described herein can process near real-time data from various sources in a data center using a model to determine one or more projected sources of a detected outage. The model can include one or more machine learning models that incorporate a series of rules for processing near real-time data and offline data to determine one or more projected sources of an outage. An alert message can then be generated that identifies the projected sources of the outage along with other data relevant to the outage.
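The rule portion of such a model might resemble the sketch below; the machine learning models are not sketched. The assumptions are that each host is tagged with the shared components it depends on (rack, PDU, switch, and so on) and that a component becomes a "projected source" when most of its hosts are down. `projected_sources`, `min_fraction`, and `min_hosts` are hypothetical names and values.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def projected_sources(topology: Dict[str, Dict[str, str]], down: Set[str],
                      min_fraction: float = 0.8, min_hosts: int = 2) -> List[Tuple[str, str, float]]:
    """Rank shared components (rack, PDU, switch, ...) by the fraction of their hosts that are down."""
    members: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for host, attrs in topology.items():
        for layer, component in attrs.items():
            members[(layer, component)].add(host)
    sources = []
    for (layer, component), hosts in members.items():
        if len(hosts) < min_hosts:          # rule: ignore components with too few hosts
            continue
        fraction = len(hosts & down) / len(hosts)
        if fraction >= min_fraction:        # rule: most of the component's hosts are down
            sources.append((layer, component, fraction))
    sources.sort(key=lambda s: (-s[2], s[0], s[1]))
    return sources
```

If both hosts in rack `r1` are down but the PDU they share also feeds a healthy host, the rack is ranked as the projected source and the PDU is not. A real service would feed these rule outputs, together with offline data, into the learned models.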

Managing data center failure events

Managing data center recovery from failure events can involve a failure event platform whose user interface integrates multiple failure-management, recovery-management, and execution features. These features can include, among others, application drift monitoring between production and recovery environments, real-time health checks of system components, user-modifiable scripting for prioritizing and customizing data center recovery actions, and a recovery execution tool.
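Application drift monitoring can be illustrated as a comparison of what is deployed in each environment. In this minimal sketch, each environment is assumed to be summarized as a mapping from application name to deployed version; the function name `detect_drift` and the `<missing>` marker are hypothetical.

```python
from typing import Dict, List, Tuple


def detect_drift(production: Dict[str, str], recovery: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (app, production_version, recovery_version) for every mismatch, sorted by app."""
    drift = []
    for app in sorted(production.keys() | recovery.keys()):
        prod_version = production.get(app, "<missing>")   # app absent from production
        rec_version = recovery.get(app, "<missing>")      # app absent from recovery site
        if prod_version != rec_version:
            drift.append((app, prod_version, rec_version))
    return drift
```

An empty result means the recovery environment mirrors production; any entry flags an application that would come up at a different version, or not at all, after failover.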

PROGRAMMATIC SELECTION OF BREAKPOINTS AND TABLE VALUES FOR A LOOKUP TABLE THAT CONFORMS WITH AN ERROR TOLERANCE
20230153185 · 2023-05-18

Exemplary embodiments may programmatically determine the number and locations of breakpoints and the table values in a lookup table so as to reduce the number of breakpoints and table values. In some exemplary embodiments, the minimum number and locations of breakpoints that conform with the error tolerance may be determined, along with the table values for those breakpoints. The exemplary embodiments may determine whether a lookup table that conforms with the error tolerance is feasible and may inform a user of the feasibility or infeasibility of generating the lookup table. Where it is feasible to generate the lookup table, the exemplary embodiments may automatically and programmatically generate a lookup table with minimal breakpoints and optimal or near-optimal table values.
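A simple version of breakpoint selection under an error tolerance is sketched below. This is a swapped-in greedy heuristic, not the disclosed method. Breakpoints are chosen from a sorted candidate grid, table values are just the function values at the breakpoints (not optimized), the error is checked only at grid points, and the greedy result is not guaranteed to be minimal. `build_lut` and `_segment_ok` are hypothetical names.

```python
from typing import Callable, List, Tuple


def _segment_ok(xs: List[float], ys: List[float], i: int, j: int, tol: float) -> bool:
    """True if linear interpolation between grid points i and j is within tol at every grid point between them."""
    x0, x1, y0, y1 = xs[i], xs[j], ys[i], ys[j]
    for k in range(i + 1, j):
        t = (xs[k] - x0) / (x1 - x0)
        if abs(y0 + t * (y1 - y0) - ys[k]) > tol:
            return False
    return True


def build_lut(f: Callable[[float], float], xs: List[float], tol: float) -> Tuple[List[float], List[float]]:
    """Greedy breakpoint selection from a sorted candidate grid xs; table values are f(breakpoint)."""
    ys = [f(x) for x in xs]
    chosen = [0]
    i, n = 0, len(xs)
    while i < n - 1:
        j = i + 1
        # Stretch the current segment as far as the tolerance allows.
        while j + 1 < n and _segment_ok(xs, ys, i, j + 1, tol):
            j += 1
        chosen.append(j)
        i = j
    return [xs[k] for k in chosen], [ys[k] for k in chosen]
```

For a linear function, the sketch keeps only the two end points. For a curved function such as x², the segments shrink as the tolerance tightens, which is the trade-off between table size and accuracy that the abstract describes.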

OUT-OF-BOUNDS RECOVERY CIRCUIT
20230205621 · 2023-06-29

Out-of-bounds recovery circuits are configured to detect an out-of-bounds violation in an electronic device and to cause the electronic device to transition to a predetermined safe state when such a violation is detected. The out-of-bounds recovery circuits include detection logic configured to detect that an out-of-bounds violation has occurred when a processing element of the electronic device fetches an instruction from a memory address range that is not allowed for the current operating state of the electronic device; and transition logic configured to cause the electronic device to transition to a predetermined safe state when the detection logic has detected an out-of-bounds violation.
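The detection-plus-transition behavior can be modeled in software as shown below. This is only a behavioral sketch of what the circuit does, not a hardware description. The operating states (`BOOT`, `RUN`, `SAFE`), the half-open address ranges, and the class name `OutOfBoundsMonitor` are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    BOOT = "boot"
    RUN = "run"
    SAFE = "safe"   # the predetermined safe state


class OutOfBoundsMonitor:
    def __init__(self, allowed: Dict[State, List[Tuple[int, int]]]):
        # Allowed instruction-fetch ranges [lo, hi) for each operating state.
        self.allowed = allowed
        self.state = State.BOOT

    def _is_allowed(self, addr: int) -> bool:
        return any(lo <= addr < hi for lo, hi in self.allowed.get(self.state, []))

    def on_fetch(self, addr: int) -> bool:
        """Detection logic: return True and enter the safe state on a violation."""
        if self._is_allowed(addr):
            return False
        self.state = State.SAFE      # transition logic
        return True
```

Because the allowed ranges are looked up by the current state, the same address can be legal during boot and a violation during normal operation, which is the state-dependent check described in the abstract.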

SYSTEM MONITORING METHOD AND APPARATUS

A system monitoring method and apparatus are provided. The method comprises: periodically collecting status indicator data of a monitored system to generate a status indicator data sequence; selecting a predetermined number of the most recently collected status indicator data points, taken in reverse chronological order of collection time; determining, from predetermined categories, the category to which the selected data points belong; selecting, from the historical status indicator data, data that belongs to the determined category and was obtained within a collection period, as characteristic data of the determined category; calculating a predicted value of a status indicator of the system at a prediction moment using the characteristic data; and determining whether the system is abnormal based on the difference between the calculated predicted value and the true value of the status indicator collected at the prediction moment. This implementation can detect system abnormalities quickly and accurately.
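The categorize-predict-compare loop above can be sketched as follows, under several assumptions not stated in the abstract. The categories are taken to be "rising", "falling", and "flat" based on the change across a window of `k` points. The characteristic data is the next-step change observed after every historical window in the same category (using the whole history rather than a specific collection period). The prediction is the last value plus the mean of those changes. All function names and parameters are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def categorize(window: List[float], flat_tol: float) -> str:
    """Assign a window to one of three predetermined trend categories."""
    delta = window[-1] - window[0]
    if delta > flat_tol:
        return "rising"
    if delta < -flat_tol:
        return "falling"
    return "flat"


def predict_next(history: List[float], k: int, flat_tol: float = 1.0) -> Optional[float]:
    recent = history[-k:]                      # the k most recent samples
    category = categorize(recent, flat_tol)
    # Characteristic data: next-step changes after past windows of the same category.
    steps = [history[i + k] - history[i + k - 1]
             for i in range(len(history) - k)
             if categorize(history[i:i + k], flat_tol) == category]
    if not steps:
        return None                            # no comparable history yet
    return recent[-1] + mean(steps)


def is_abnormal(history: List[float], actual: float, k: int,
                threshold: float, flat_tol: float = 1.0) -> bool:
    """Flag an anomaly when the observed value deviates too far from the prediction."""
    predicted = predict_next(history, k, flat_tol)
    return predicted is not None and abs(actual - predicted) > threshold
```

For a series that has been rising by 2 per sample and last read 20, the sketch predicts 22. A reading of 22.5 is treated as normal with a threshold of 2, while a reading of 40 is flagged.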

SYSTEM TO TRACK AND MEASURE MACHINE LEARNING MODEL EFFICACY

Systems and/or techniques for facilitating online monitoring of machine learning models are provided. In various embodiments, a system can receive monitoring settings associated with a machine learning model to be monitored. The monitoring settings can identify a first set of data features that the machine learning model generates as output and a second set of data features that the machine learning model receives as input. The system can compute a first set of statistical metrics based on the first set of data features, which can characterize the performance quality of the machine learning model, and a second set of statistical metrics based on the second set of data features, which can characterize trends or distributions of the input data associated with the machine learning model. The system can store the first and second sets of statistical metrics in a data warehouse that is accessible to an operator. In various embodiments, the system can also render both sets of statistical metrics on an electronic interface so that they are viewable to the operator.
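The two metric sets can be illustrated with a small sketch. The assumptions are a binary classifier whose outputs can be compared against ground-truth labels, a first metric set of accuracy, precision, and recall, and a second metric set of per-feature mean and population standard deviation. A Python list stands in for the data warehouse. The function names and record layout are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def output_metrics(predictions: List[int], labels: List[int]) -> Dict[str, float]:
    """First metric set: performance quality of a binary classifier's outputs."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    correct = sum(1 for p, y in pairs if p == y)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def input_metrics(features: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Second metric set: distribution summary of each monitored input feature."""
    return {name: {"mean": mean(vals), "std": pstdev(vals)} for name, vals in features.items()}


def record(warehouse: list, model_id: str, outputs: Dict, inputs: Dict) -> None:
    """Append one monitoring snapshot to the (stand-in) data warehouse."""
    warehouse.append({"model": model_id, "outputs": outputs, "inputs": inputs})
```

Storing snapshots over time lets an operator compare input statistics across periods, so that distribution drift is visible alongside any drop in accuracy.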