G06F11/3433

Creating robustness scores for selected portions of a computing infrastructure

A system for generating a robustness score for hardware components, nodes, and clusters of nodes in a computing infrastructure is provided. The system includes a memory and at least one processing device coupled to the memory. The processing device is to obtain first telemetry data associated with a selected portion of a computing infrastructure, and the selected portion includes a first node and a first hardware component. The processing device is further to obtain first metadata associated with the selected portion, input one or more telemetry inputs corresponding to the first telemetry data into a machine learning model, input one or more metadata inputs corresponding to the first metadata into the machine learning model, and generate, from the machine learning model, a first robustness score for the first hardware component representing a health state of the first hardware component.

Green cloud computing recommendation system

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating customized recommendations for environmentally-conscious cloud computing frameworks for replacing computing resources of existing datacenters. One of the methods involves receiving, through a user interface presented on a display of a computing device, data regarding a user's existing datacenter deployment and the user's preferences for the new cloud computing framework, generating one or more recommendations for environmentally-conscious cloud computing frameworks based on the received data, and presenting such recommendations through the user interface for the user's review and consideration.

Method to optimize restore based on data protection workload prediction
11693743 · 2023-07-04

An intelligent method of selecting a data recovery site upon receiving a data recovery request. The backup system collects historical activity data of the storage system to identify the workload of every data recovery site. A predicted activity load for each data recovery site is then generated using the collected data. When a request for data recovery is received, the system first identifies which data recovery sites have copies of the files to be recovered. It then uses the predicted workload for these data recovery sites to determine whether to use a geographically local site or a site that may be geographically remote but has a lower workload.
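
The selection step described above can be sketched as follows. This is a minimal illustration, not the patented method: the `RecoverySite` type, the 0.0 to 1.0 load scale, and the `load_threshold` cutoff that decides when a local site counts as too busy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoverySite:
    name: str
    is_local: bool
    predicted_load: float  # assumed scale: 0.0 (idle) to 1.0 (saturated)
    files: frozenset       # files for which this site holds a copy


def select_recovery_site(sites, requested_files, load_threshold=0.8):
    """Pick a site that holds every requested file.

    A local site is preferred unless its predicted load reaches the
    (hypothetical) threshold; otherwise the least-loaded candidate wins.
    """
    requested = set(requested_files)
    # Step 1: keep only the sites that hold copies of all requested files.
    candidates = [s for s in sites if requested <= s.files]
    if not candidates:
        return None
    # Step 2: prefer local sites whose predicted load is acceptable.
    local = [s for s in candidates
             if s.is_local and s.predicted_load < load_threshold]
    pool = local or candidates
    # Step 3: within the chosen pool, take the lowest predicted load.
    return min(pool, key=lambda s: s.predicted_load)
```

A fixed threshold is only one way to trade locality against load; a weighted cost combining distance and predicted load would serve equally well in this sketch.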

Resource allocation in microservice architectures

A method for adjusting the resource allocation ratio between microservices used to run an application. A microservice test sequence is defined which has an order that follows the traffic flow through the microservices. Each microservice is analyzed in order of the test sequence to classify whether or not it is acting as a bottleneck for the application. This is done by measuring whether or not decrementing the microservice's resource causes the application throughput to decrease. For each microservice classified as a bottleneck and in reverse order of the test sequence, its resource is successively incremented until the application throughput starts to increase, indicating it is no longer acting as a bottleneck. The resource allocation ratio can then be adjusted to reflect this procedure.
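
A minimal sketch of the two passes described above, assuming a caller-supplied `measure_throughput` function that returns the application throughput for a given allocation. The function names, the unit `step`, and the `max_steps` guard are hypothetical and not taken from the abstract.

```python
def classify_bottlenecks(allocation, test_sequence, measure_throughput, step=1):
    """Return the microservices whose decrement lowers application throughput."""
    baseline = measure_throughput(allocation)
    bottlenecks = []
    for name in test_sequence:  # order follows the traffic flow
        if allocation[name] - step <= 0:
            continue  # assumption: never test an allocation of zero or less
        trial = dict(allocation)
        trial[name] -= step
        if measure_throughput(trial) < baseline:
            bottlenecks.append(name)
    return bottlenecks


def relieve_bottlenecks(allocation, test_sequence, measure_throughput,
                        step=1, max_steps=10):
    """Increment each bottleneck, in reverse test order, until throughput rises."""
    tuned = dict(allocation)
    bottlenecks = classify_bottlenecks(
        allocation, test_sequence, measure_throughput, step)
    for name in reversed(bottlenecks):
        before = measure_throughput(tuned)
        for _ in range(max_steps):  # guard against a throughput that never moves
            tuned[name] += step
            if measure_throughput(tuned) > before:
                break
    return tuned
```

The ratio between the values in the returned allocation is what would then be used as the adjusted resource allocation ratio.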

Systems and methods for managing distributed database deployments

Various aspects provide for implementation of a cloud service for running, monitoring, and maintaining cloud distributed database deployments and, in particular examples, provide cloud-based services to run, monitor, and maintain deployments of the known MongoDB database. Various embodiments provide services and interfaces, and manage provisioning of dedicated servers for the distributed database instances (e.g., MongoDB instances). Further aspects include providing a database as a cloud service that eliminates the design challenges associated with many distributed database implementations, while allowing the client's input on configuration choices in building the database. In some implementations, clients can simply identify a number of database nodes and the capability of the nodes, and within minutes have a fully functioning, scalable, replicated, and secure distributed database in the cloud.

Metadata control in a load-balanced distributed storage system
11544226 · 2023-01-03

A plurality of computing devices are communicatively coupled to each other via a network, and each of the plurality of computing devices is operably coupled to one or more of a plurality of storage devices. A plurality of failure resilient address spaces are distributed across the plurality of storage devices such that each of the plurality of failure resilient address spaces spans a plurality of the storage devices. The plurality of computing devices maintains metadata that maps each failure resilient address space to one of the plurality of computing devices. The metadata is grouped into buckets. Each bucket is stored in a group of computing devices. However, only the leader of the group is able to directly access a particular bucket at any given time.

Detecting performance regressions in software for controlling autonomous vehicles
11544173 · 2023-01-03

The disclosure relates to detecting performance regressions in software used to control autonomous vehicles. For instance, a simulation may be run using a first version of the software. While the simulation is running, CPU and memory usage by one or more functions of the first version of the software may be sampled. The sampled CPU and memory usage may be compared to CPU or memory usage by each of the one or more functions in a plurality of simulations each running a corresponding second version of the software. Based on the comparisons, an anomaly corresponding to a performance regression in the first version of the software relating to one of the one or more functions may be identified. In response to detecting the anomaly, the first version of the software and the one of the one or more functions may be flagged for review.
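
The comparison step could look like the sketch below. The abstract does not say how an anomaly is identified, so the z-score test, the `z_threshold` of 3.0, and both function names are assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev


def is_regression(sample, baseline_samples, z_threshold=3.0):
    """Treat a sample as anomalous if it sits far above the baseline mean."""
    mu = mean(baseline_samples)
    sigma = pstdev(baseline_samples)
    if sigma == 0:
        # All baseline runs agree exactly; any increase counts as anomalous.
        return sample > mu
    return (sample - mu) / sigma > z_threshold


def flag_functions(current_usage, baseline_usage, z_threshold=3.0):
    """Return the names of functions whose sampled usage looks like a regression.

    current_usage:  {function name: usage sampled from the first version}
    baseline_usage: {function name: [usage from simulations of other versions]}
    """
    return sorted(
        name
        for name, sample in current_usage.items()
        if name in baseline_usage
        and is_regression(sample, baseline_usage[name], z_threshold)
    )
```

The same check would be run separately for CPU and for memory samples; functions returned by `flag_functions` are the ones that would be flagged for review.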

Selecting a host based on quality of stored data
11544258 · 2023-01-03

A database node selects one of a number of database mirror hosts to perform a database read query. In theory, mirror hosts store redundant database entries. In practice, some mirror hosts fail to receive or retain some entries, e.g. if a mirror host was down for maintenance, a mirror host was overburdened, etc. The health of a mirror host, quantified as a health score, represents a likelihood that an insert statement was received and retained. Health scores are generated based on health metrics—data periodically retrieved from mirror hosts used to infer how well the database was operating, e.g. a count of entries that were inserted, a count of errors encountered while the entries were inserted, etc. If the database read query specifies entries inserted during a specific time period, the mirror host may be selected based on the health scores representing that time period.
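
As a rough sketch of the idea, the example below derives a health score per time period from the two example metrics named above (entries inserted and errors encountered) and picks the mirror host with the best score over the queried period. The scoring formula, the use of the worst period within the window, and the data layout are assumptions, not details from the abstract.

```python
def health_score(inserted, errors):
    """Assumed score: fraction of insert attempts that succeeded."""
    total = inserted + errors
    return inserted / total if total else 0.0


def select_mirror(host_metrics, start, end):
    """Choose the host whose worst health score within [start, end] is highest.

    host_metrics: {host: {period: (inserted_count, error_count)}}
    Returns None if no host has metrics for the requested time range.
    """
    best_host, best_score = None, -1.0
    for host, periods in host_metrics.items():
        scores = [health_score(*counts)
                  for period, counts in periods.items()
                  if start <= period <= end]
        if not scores:
            continue
        # A query spanning several periods is only as reliable as the worst one.
        score = min(scores)
        if score > best_score:
            best_host, best_score = host, score
    return best_host
```

Scoring per period is what lets the same host be a good choice for one query and a poor choice for another, depending on the time range the query specifies.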

Tenant database placement in oversubscribed database-as-a-service cluster

Placement of a tenant database in an oversubscribed, database-as-a-service cluster comprised of a plurality of nodes is described. The placement may be based on per-node estimates of a probability of resource demand violation if the tenant database is placed on the node. Past resource usage of similar tenant databases subscribed to the cluster that are collected and stored as compressed traces may be used to obtain the estimates. In some examples, based on the estimates, a per-node expected number of resource violations is determined and compared across nodes, where the determined placement minimizes the number of resource violations. In other examples, when the tenant database is being placed in parallel with other tenant databases, a score assigned to each valid configuration for the placement may be modified based on the estimates, where the determined placement is the configuration having a lowest score.
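
A simplified sketch of the single-tenant case: the probability of a resource demand violation on each node is estimated by replaying a usage trace of a similar tenant against the node's own usage trace, and the node with the lowest estimate is chosen. Trace compression and the parallel-placement scoring are omitted, and the function names and data layout are hypothetical.

```python
def violation_probability(node_usage, tenant_trace, capacity):
    """Fraction of time steps in which combined demand would exceed capacity."""
    steps = list(zip(node_usage, tenant_trace))
    if not steps:
        return 0.0
    violations = sum(1 for used, demand in steps if used + demand > capacity)
    return violations / len(steps)


def place_tenant(nodes, tenant_trace):
    """Return the node name with the lowest estimated violation probability.

    nodes: {node name: (capacity, usage trace of tenants already on the node)}
    tenant_trace: usage trace taken from a similar, already subscribed tenant
    """
    return min(
        nodes,
        key=lambda name: violation_probability(
            nodes[name][1], tenant_trace, nodes[name][0]),
    )
```

Multiplying each probability by the number of time steps gives a per-node expected number of violations, which ranks the nodes identically when all traces have the same length.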

Selecting a node dedicated to transactions of a particular work group for executing a target transaction of another work group

A computing network includes nodes of different work groups. Nodes of a work group are dedicated to transactions of the work group. If a node of a first work group is predicted to have an idleness window, a second work group may borrow the node to execute a transaction of the second work group. At least a subset of steps of the transaction may be categorized into a step group. Trees of a transaction may be categorized into one or more tree groups. A node is selected for executing a transaction, if the predicted idleness duration of the node is sufficient relative to the predicted runtime of the transaction, the step group, and/or tree group. A credit system is maintained. A first work group transfers a credit to a second work group when borrowing a node of the second work group for executing a transaction of the first work group.
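
The borrowing rule and the credit transfer can be sketched as below. Step groups and tree groups are left out, and the tuple layout, the tightest-fit tie-break, the fixed `fee`, and the requirement that the borrower hold enough credit are all assumptions made for the example.

```python
def select_node_to_borrow(nodes, borrower_group, predicted_runtime):
    """Pick another group's node whose predicted idle window fits the runtime.

    nodes: iterable of (node name, owning work group, predicted idle duration)
    """
    candidates = [
        node for node in nodes
        if node[1] != borrower_group and node[2] >= predicted_runtime
    ]
    if not candidates:
        return None
    # Assumed tie-break: take the tightest fit and leave longer windows free.
    return min(candidates, key=lambda node: node[2])


def borrow(nodes, credits, borrower_group, predicted_runtime, fee=1):
    """Borrow a node and move `fee` credits from the borrower to the owner."""
    node = select_node_to_borrow(nodes, borrower_group, predicted_runtime)
    if node is None or credits.get(borrower_group, 0) < fee:
        return None
    owner = node[1]
    credits[borrower_group] -= fee
    credits[owner] = credits.get(owner, 0) + fee
    return node[0]
```

Here `credits` is a plain dictionary mutated in place; a real credit system would need to make the transfer atomic across work groups.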