G06F11/3447

Adaptable online breakpoint detection over I/O trace time series via deep neural network autoencoders re-parameterization

One example method includes accessing I/O traces, generating parameters based on the I/O traces, defining an autoencoder deep neural network, training the autoencoder deep neural network using the parameters, collecting and storing new I/O traces, computing an encoded features difference series using the new I/O traces, detecting breakpoints in the encoded features difference series, evaluating a utility of the breakpoints, and performing an action based on the breakpoint utility evaluation.
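The difference-series and breakpoint steps described above can be illustrated with a minimal sketch. It is not the patented method: the `encode` function is a hypothetical stand-in for a trained autoencoder's encoder (it simply returns the mean and standard deviation of a window), and the window size and threshold are assumed values chosen for the toy trace.

```python
import math

def encode(window):
    """Hypothetical stand-in for a trained encoder: (mean, std) of a window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return (mean, math.sqrt(var))

def encoded_difference_series(trace, window_size):
    """Distance between encodings of consecutive non-overlapping windows."""
    codes = [encode(trace[i:i + window_size])
             for i in range(0, len(trace) - window_size + 1, window_size)]
    return [math.dist(a, b) for a, b in zip(codes, codes[1:])]

def detect_breakpoints(diffs, threshold):
    """Index of each window whose encoding jumped past the threshold."""
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Toy I/O trace: latency shifts from 10 to 50 halfway through.
trace = [10.0] * 8 + [50.0] * 8
diffs = encoded_difference_series(trace, window_size=4)
breakpoints = detect_breakpoints(diffs, threshold=5.0)
```

Here the third window (index 2) is flagged because its encoding differs sharply from the previous one. A real system would replace `encode` with the trained network and would still need the utility evaluation step, which this sketch omits.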

Integrated event processing and policy enforcement
11550692 · 2023-01-10

A method may include receiving an event from an event source. The event may correspond to event data. The event source may be a container executing an image. The image may correspond to image metadata including attributes describing the image. The method may further include combining the event data with the image metadata to obtain enriched data, detecting, using the enriched data, a deviation from a policy, and in response to detecting the deviation from the policy, performing an action to enforce the policy.
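A rough sketch of the enrich, detect, and enforce flow follows. All field names (`image_id`, `label`, `process`), the policy shape, and the `kill_container` action are hypothetical illustrations and do not come from the patent.

```python
def enrich(event, image_metadata):
    """Combine event data with the metadata of the image the container runs."""
    image = image_metadata.get(event["image_id"], {})
    return {**event, "image": image}

def violates(enriched, policy):
    """True when the process is not on the allow-list for the image's label."""
    allowed = policy.get(enriched["image"].get("label"))
    return allowed is not None and enriched["process"] not in allowed

def enforce(event, image_metadata, policy):
    """Return the action to take for one event (hypothetical action names)."""
    enriched = enrich(event, image_metadata)
    return "kill_container" if violates(enriched, policy) else "allow"

# Assumed example data: images labelled "web" may only run nginx.
image_metadata = {"img-1": {"label": "web"}}
policy = {"web": {"nginx"}}
action = enforce({"image_id": "img-1", "process": "bash"}, image_metadata, policy)
```

The point of the enrichment step is that the raw event alone (a process started) says nothing about whether it is expected; the image attribute supplies the context the policy is written against.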

Potential replacement algorithm selection based on algorithm execution context information

According to some embodiments, an available algorithm data store may contain information about a pool of available algorithms. An algorithm selection platform coupled to the available algorithm data store may access the information about the pool of available algorithms and compare the information about each of the pool of available algorithms with at least one requirement associated with the current algorithm executing in the real environment. The algorithm selection platform may then automatically determine algorithm execution context information and, based on said comparison and the algorithm execution context information, select at least one of the pool of available algorithms as a potential replacement algorithm. An indication of the selected at least one potential replacement algorithm may then be transmitted (e.g., to be evaluated in a shadow environment by an algorithm evaluation platform).
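One way to picture the selection step is as a filter over the pool followed by a context check. The record fields (`accuracy`, `asset_types`) and the context key `asset_type` are assumptions made for this sketch; the abstract does not specify what the requirements or context contain.

```python
def select_replacements(pool, requirements, context):
    """Return names of algorithms that meet every requirement and fit the context."""
    selected = []
    for algo in pool:
        # Compare the algorithm's stored information with each requirement.
        meets = all(algo.get(key, 0) >= minimum
                    for key, minimum in requirements.items())
        # Use execution context (here, an assumed asset type) as a second filter.
        fits = context["asset_type"] in algo["asset_types"]
        if meets and fits:
            selected.append(algo["name"])
    return selected

# Hypothetical pool of available algorithms.
pool = [
    {"name": "A", "accuracy": 0.92, "asset_types": {"turbine"}},
    {"name": "B", "accuracy": 0.85, "asset_types": {"turbine"}},
    {"name": "C", "accuracy": 0.95, "asset_types": {"pump"}},
]
candidates = select_replacements(pool, {"accuracy": 0.9}, {"asset_type": "turbine"})
```

Only algorithm A survives both checks; the resulting list is what would be handed to a shadow-environment evaluation.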

Creating robustness scores for selected portions of a computing infrastructure

A system for generating a robustness score for hardware components, nodes, and clusters of nodes in a computing infrastructure is provided. The system includes a memory and at least one processing device coupled to the memory. The processing device is to obtain first telemetry data associated with a selected portion of a computing infrastructure, and the selected portion includes a first node and a first hardware component. The processing device is further to obtain first metadata associated with the selected portion, input one or more telemetry inputs corresponding to the first telemetry data into a machine learning model, input one or more metadata inputs corresponding to the first metadata into the machine learning model, and generate, from the machine learning model, a first robustness score for the first hardware component representing a health state of the first hardware component.

Hyper-parameter space optimization for machine learning data processing pipeline
11544136 · 2023-01-03

A data processing pipeline may be generated to include an orchestrator node, a preparator node, and an executor node. The preparator node may generate a training dataset. The executor node may execute machine learning trials by applying, to the training dataset, a machine learning model and/or a different set of trial parameters. The orchestrator node may identify, based on a result of the machine learning trials, a machine learning model for performing a task. Data associated with the execution of the data processing pipeline may be collected for storage in a tracking database. A report including de-normalized and enriched data from the tracking database may be generated. The hyper-parameter space of the machine learning model may be analyzed based on the report. A root cause of at least one fault associated with the execution of the data processing pipeline may be identified based on the analysis.
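The three node roles can be sketched as plain functions. This is an illustrative toy, assuming a one-parameter linear model, a grid of trial parameters, and an in-memory list standing in for the tracking database; none of these details are stated in the abstract.

```python
from itertools import product

def preparator():
    """Generate a toy training dataset following y = 2x."""
    return [(x, 2 * x) for x in range(1, 6)]

def executor(dataset, params):
    """Run one trial: mean squared error of the model y = slope * x."""
    slope = params["slope"]
    return sum((y - slope * x) ** 2 for x, y in dataset) / len(dataset)

def orchestrator(space):
    """Run every trial in the parameter grid and pick the best result."""
    dataset = preparator()
    tracking = []  # stand-in for the tracking database
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        tracking.append({"params": params, "mse": executor(dataset, params)})
    best = min(tracking, key=lambda record: record["mse"])
    return best, tracking

best, tracking = orchestrator({"slope": [1.0, 2.0, 3.0]})
```

Keeping every trial record, not only the winner, is what makes the later hyper-parameter space analysis and root-cause work possible.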

Machine learning performance monitoring and analytics
20220414539 · 2022-12-29

A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive a test dataset comprising data associated with test dataset of a machine learning model applied to target data, generate a set of expected values associated with the test dataset, and analyze the test dataset, based, at least in part, on the set of expected values, to detect a variance between the test dataset and the set of expected values, wherein the variance is indicative of an accuracy parameter of the machine learning model.

Systems and methods for monitoring software services
20220413977 · 2022-12-29

A computer-implemented method for generating a monitor for at least one software service from a monitor template includes, in at least some aspects, providing a monitor template. Further, in certain instances, the method includes determining one or more endpoints included in code for a first software service of the at least one software service. In addition, in some aspects, the method includes generating a first monitor for the first software service code using the monitor template based at least upon a first endpoint of the one or more endpoints included in the first software service code.
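As a hedged illustration of template-driven monitor generation, the sketch below scans service source code for route decorators and fills a template per endpoint. The decorator pattern, the template text, and the monitor format are all hypothetical; the abstract does not describe how endpoints are found or what a monitor looks like.

```python
import re
from string import Template

# Hypothetical monitor template with placeholders for service and endpoint.
MONITOR_TEMPLATE = Template(
    "monitor name=$service-$slug check=GET $endpoint alert_if=status>=500"
)

def find_endpoints(source):
    """Find endpoints declared with an assumed @app.route("...") decorator."""
    return re.findall(r'@app\.route\("([^"]+)"\)', source)

def generate_monitor(service, endpoint):
    """Fill the template for one endpoint of one service."""
    slug = endpoint.strip("/").replace("/", "-") or "root"
    return MONITOR_TEMPLATE.substitute(service=service, slug=slug, endpoint=endpoint)

service_code = '@app.route("/orders")\ndef orders(): ...\n@app.route("/orders/status")\n'
endpoints = find_endpoints(service_code)
monitors = [generate_monitor("shop", e) for e in endpoints]
```

Deriving monitors from the code itself means a newly added endpoint gets a monitor without anyone writing one by hand.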

Application assessment system to achieve interface design consistency across micro services
11537364 · 2022-12-27

Methods and systems are used for achieving interface design consistency across micro services. As an example, a user interface (UI) training request including at least a set of reference objects is received, the set of reference objects including at least a set of reference UIs. A user interface behavior reference model (UIBRM) is trained to generate a trained UIBRM by analyzing reference UI displays rendered on a browser in response to interactions with the set of reference UIs. A UI displays assessment request including at least a set of development objects is received, the set of development objects including at least a set of development UIs. A UI displays assessment is performed to generate an assessment of development UI displays by comparing the trained UIBRM to the development UI displays rendered on the browser in response to interactions with at least a subset of the set of development UIs.

Technologies for providing advanced management of power usage limits in a disaggregated architecture

Technologies for providing advanced management of power usage limits in a disaggregated architecture include a compute device. The compute device includes circuitry configured to execute operations associated with a workload in a disaggregated system. The circuitry is also configured to determine whether a present power usage of the compute device is within a predefined range of a power usage limit assigned to the compute device. Additionally, the circuitry is configured to send, to a device in the disaggregated system and in response to a determination that the present power usage of the present compute device is not within the predefined range of the power usage limit assigned to the present compute device, offer data indicative of an offer to reduce the power usage limit assigned to the present compute device to enable a second power utilization limit of another compute device in the disaggregated system to be increased.
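The offer logic might look like the sketch below. It rests on an assumed reading of the abstract: "not within the predefined range" is taken to mean that present usage sits more than `margin` watts below the limit, so the unused headroom can be offered to another device. The units, field names, and transfer step are hypothetical.

```python
def make_offer(present_usage, limit, margin):
    """Offer unused headroom when usage is more than `margin` below the limit."""
    if limit - present_usage <= margin:
        return None  # usage is within the predefined range of the limit
    return {"reduce_limit_by": limit - present_usage - margin}

def apply_offer(limits, donor, recipient, offer):
    """Move the offered budget from the donor's limit to the recipient's."""
    updated = dict(limits)
    updated[donor] -= offer["reduce_limit_by"]
    updated[recipient] += offer["reduce_limit_by"]
    return updated

# Device "a" uses 150 W of a 200 W limit; it keeps a 20 W margin.
offer = make_offer(present_usage=150, limit=200, margin=20)
limits = apply_offer({"a": 200, "b": 200}, donor="a", recipient="b", offer=offer)
```

The total budget across the two devices is unchanged; only its distribution shifts toward the device that can use it.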

Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests

Systems and methods for unsupervised training and evaluation of anomaly detection models are described. In some embodiments, an unsupervised process comprises generating an approximation of a data distribution for a training dataset including varying values for a metric of a computing resource. The process further determines, based on the size of the training dataset, a first quantile probability and a second quantile probability that represent an interval for covering a prescribed proportion of values for the metric within a prescribed confidence level. The process further trains a lower limit of the anomaly detection model using a first quantile that represents the first quantile probability in the approximation of the data distribution and an upper limit using a second quantile that represents the second quantile probability in the approximation. The trained upper and lower limits may be used to monitor input data for anomalous behavior and, if detected, trigger responsive action(s).
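The dependence of the limits on training set size can be illustrated with the classical order-statistic result for distribution-free tolerance intervals: the interval between the r-th and s-th order statistics of n samples covers at least a proportion p of the population with probability P(Bin(n, p) <= s - r - 1). The sketch below is an assumption-laden simplification: it searches symmetric ranks only and reads limits from an exactly sorted list, where the described system would query a t-digest approximation at the corresponding quantile probabilities.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def tolerance_ranks(n, coverage, confidence):
    """Tightest symmetric ranks (r, s) whose interval still meets the confidence.

    Returns None when even [min, max] cannot reach the confidence level.
    """
    best = None
    for m in range(n // 2):
        r, s = m + 1, n - m
        if s - r - 1 < 0 or binom_cdf(s - r - 1, n, coverage) < confidence:
            break
        best = (r, s)
    return best

def train_limits(values, coverage, confidence):
    """Lower and upper anomaly limits from the sorted training values."""
    ranks = tolerance_ranks(len(values), coverage, confidence)
    if ranks is None:
        return None
    ordered = sorted(values)
    r, s = ranks
    return ordered[r - 1], ordered[s - 1]

# 100 training samples, 90% coverage at 95% confidence.
ranks = tolerance_ranks(100, coverage=0.90, confidence=0.95)
limits = train_limits(list(range(1, 101)), coverage=0.90, confidence=0.95)
```

With only 10 samples the same request cannot be satisfied, which is why the quantile probabilities have to be chosen as a function of the training set size.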