Patent classifications
G06F11/008
MULTI-DEVICE PROCESSING ACTIVITY ALLOCATION
Allocating processing activities among multiple computing devices can include identifying multiple computing activities of a computer-executable process and, for each computing activity identified, estimating in real time the computing resources it needs. The identifying can be performed in response to detecting a computer-executable instruction executed by one of multiple communicatively coupled computing devices, and the computer-executable instruction can be associated with the computer-executable process. A current condition and configuration of each of the computing devices can be determined in real time. For each computing device, an effect induced by executing one or more of the computing activities can be predicted. The prediction is based on each computing device's current condition and configuration and is performed by a machine learning model trained using data collected from prior real-time processing of example process activities. Based on the prediction, the computing activities can be allocated in real time among the computing devices.
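A minimal sketch of the allocation step, under loud assumptions: the trained machine learning model is replaced by a hypothetical hand-written `predict_effect` function, each device's condition and configuration are reduced to hypothetical `cpu_free`, `mem_free`, and `speed` fields, and allocation is a simple greedy rule (largest activity first, to the device with the smallest predicted effect). None of these names or rules come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    cpu: float  # estimated CPU demand in cores (hypothetical unit)
    mem: float  # estimated memory demand in GB (hypothetical unit)


@dataclass
class Device:
    name: str
    cpu_free: float  # current condition: idle cores
    mem_free: float  # current condition: free memory in GB
    speed: float     # configuration: relative speed factor (hypothetical)
    assigned: list = field(default_factory=list)


def predict_effect(device, activity):
    """Stand-in for the trained model (NOT a learned model): a hand-written
    estimate of the load an activity would induce on a device, scaled down
    for faster devices. Returns inf if the activity does not fit."""
    if activity.cpu > device.cpu_free or activity.mem > device.mem_free:
        return float("inf")
    cpu_util = activity.cpu / max(device.cpu_free, 1e-9)
    mem_util = activity.mem / max(device.mem_free, 1e-9)
    return (cpu_util + mem_util) / device.speed


def allocate(activities, devices):
    """Greedy allocation: largest activities first, each to the device with
    the smallest predicted effect. Device condition is updated after every
    placement so later predictions see the new state."""
    plan = {}
    for act in sorted(activities, key=lambda a: a.cpu + a.mem, reverse=True):
        best = min(devices, key=lambda d: predict_effect(d, act))
        if predict_effect(best, act) == float("inf"):
            raise RuntimeError(f"no device can run {act.name}")
        best.cpu_free -= act.cpu
        best.mem_free -= act.mem
        best.assigned.append(act.name)
        plan[act.name] = best.name
    return plan
```

For example, with a 2-core, 4 GB laptop (speed 1.0) and an 8-core, 32 GB server (speed 2.0), a 4-core, 16 GB `train` activity only fits on the server, and a 1-core, 1 GB `decode` activity also lands there because the server's predicted effect (about 0.16) stays below the laptop's (0.75). In a real system the predictor would be the trained model, re-queried whenever device conditions change.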
Node health prediction based on failure issues experienced prior to deployment in a cloud computing system
To improve the reliability of nodes that are utilized by a cloud computing provider, information about the entire lifecycle of nodes can be collected and used to predict when nodes are likely to experience failures based at least in part on early lifecycle errors. In one aspect, a plurality of failure issues experienced by a plurality of production nodes in a cloud computing system during a pre-production phase can be identified. A subset of the plurality of failure issues can be selected based at least in part on correlation with service outages for the plurality of production nodes during a production phase. A comparison can be performed between the subset of the plurality of failure issues and a set of failure issues experienced by a pre-production node during the pre-production phase. A risk score for the pre-production node can be calculated based at least in part on the comparison.
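One way to picture the selection and comparison steps, assuming (hypothetically) that correlation is measured with the phi coefficient between "issue seen in pre-production" and "outage in production", and that the risk score is the correlation-weighted fraction of selected issues also seen on the new node. The abstract specifies neither formula, and issue names such as `dimm_ecc` are invented for illustration.

```python
import math


def phi(xs, ys):
    """Phi coefficient: Pearson correlation of two binary sequences."""
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def select_issues(history, min_corr=0.3):
    """history: list of (set of pre-production issues, had_outage) pairs for
    production nodes. Returns {issue: correlation} for issues whose presence
    correlates with production outages at least `min_corr` (hypothetical)."""
    outages = [had for _, had in history]
    all_issues = set().union(*(issues for issues, _ in history))
    selected = {}
    for issue in all_issues:
        present = [issue in issues for issues, _ in history]
        corr = phi(present, outages)
        if corr >= min_corr:
            selected[issue] = corr
    return selected


def risk_score(selected, node_issues):
    """Correlation-weighted fraction of selected issues that the
    pre-production node has also experienced (0.0 to 1.0)."""
    total = sum(selected.values())
    if total == 0:
        return 0.0
    return sum(c for issue, c in selected.items() if issue in node_issues) / total
```

With four production nodes where `dimm_ecc` appears on exactly the two nodes that later had outages and `fan` appears independently of outages, only `dimm_ecc` is selected (phi of 1.0 versus 0.0 for `fan`), so a pre-production node that logged `dimm_ecc` scores 1.0 and one that logged only `fan` scores 0.0.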
Disk drive failure prediction with neural networks
Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the model, a recurrent neural network (RNN) with long short-term memory (LSTM) units, is tuned using a set of predefined hyperparameters. The preprocessing, which is performed during the training and evaluation phases as well as later during the prediction phase, uses predefined values for a set of parameters to generate the set of enhanced feature sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired ratio of healthy to failed disks, and they use only samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.
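The sequence-generation part of the preprocessing might look roughly like the sketch below. It assumes daily readings per disk, a fixed sequence length, and that a failed disk's readings end on its failure day, so the last-valid-time sample sits `heads_up` samples before the end. The parameter names and the random downsampling of healthy disks are assumptions, and the RNN LSTM training itself is omitted.

```python
import random


def make_sequences(disks, seq_len, heads_up, healthy_per_failed, seed=0):
    """disks: dict disk_id -> (readings, failed), where readings is a list of
    per-day feature vectors ending on the failure day (failed disks) or the
    last observed day (healthy disks).
    Returns a list of (sequence, label) pairs, label 1 = failed."""
    failed, healthy = [], []
    for readings, is_failed in disks.values():
        # For failed disks, stop at the last-valid-time sample so the model
        # must predict the failure at least `heads_up` days in advance.
        end = len(readings) - heads_up if is_failed else len(readings)
        if end < seq_len:
            continue  # not enough history before the last-valid-time
        seq = readings[end - seq_len:end]
        if is_failed:
            failed.append((seq, 1))
        else:
            healthy.append((seq, 0))
    # Downsample healthy disks to keep the desired healthy/failed ratio.
    rng = random.Random(seed)
    n_healthy = min(len(healthy), healthy_per_failed * len(failed))
    return failed + rng.sample(healthy, n_healthy)
```

With `seq_len=3` and `heads_up=2`, a failed disk with ten daily samples (indices 0 to 9) contributes samples 5, 6, and 7, so the model never sees the last two days before the failure; with `healthy_per_failed=2`, only two of three healthy disks are kept.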
Machine-learning based optimization of data center designs and risks
In exemplary aspects of optimizing data centers, historical data corresponding to a data center is collected. The data center includes a plurality of systems. A data center representation is generated. The data center representation can be a schematic, a collection of data drawn from the historical data, or both. The data center representation is encoded into a neural network model. The neural network model is trained using at least a portion of the historical data. The trained model is deployed using a first set of inputs, causing the model to generate one or more output values for managing or optimizing the data center with respect to design and risk aspects.
TOOL FOR BUSINESS RESILIENCE TO DISASTER
Methods, systems, and computer programs are presented for estimating downtime and recovery time after a disaster. One method includes an operation for calculating component fragility functions for components of a facility that are vulnerable to damage in a disaster. Further, the method includes calculating component recovery functions for the components of the facility. The component recovery functions indicate the probability that a component has recovered, as a function of time after the disaster. The method further includes operations for calculating a facility fragility function and a facility recovery function based on the component fragility functions and the component recovery functions, and for determining a downtime for the facility for a given intensity associated with the disaster. Further, the method includes an operation for causing presentation of the downtime for the facility on a user interface (UI).
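A sketch of how component curves could combine into facility-level curves, under assumptions the abstract does not state: every component's fragility and recovery function is a lognormal CDF (a common choice in earthquake engineering), components are damaged independently, and the facility is down if any component is damaged (a series system). Downtime is then taken as the expected time to full recovery, the integral over time of the probability of not yet being recovered. The component names and parameter values are hypothetical.

```python
import math


def lognormal_cdf(x, median, beta):
    """Lognormal CDF: Phi(ln(x / median) / beta)."""
    if x <= 0:
        return 0.0
    z = math.log(x / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical components: fragility (median intensity, beta) and
# recovery (median days to repair, beta). Values are illustrative only.
COMPONENTS = {
    "generator": {"frag": (0.6, 0.5), "rec": (14.0, 0.6)},
    "chiller": {"frag": (0.9, 0.4), "rec": (30.0, 0.5)},
}


def facility_fragility(intensity, comps=COMPONENTS):
    """P(facility damaged) = 1 - prod(1 - P_i), assuming independent
    components in series."""
    p_ok = 1.0
    for c in comps.values():
        p_ok *= 1.0 - lognormal_cdf(intensity, *c["frag"])
    return 1.0 - p_ok


def facility_recovery(t, intensity, comps=COMPONENTS):
    """P(facility operational by day t): each component is either undamaged
    or damaged and already repaired by day t."""
    p = 1.0
    for c in comps.values():
        pd = lognormal_cdf(intensity, *c["frag"])
        p *= (1.0 - pd) + pd * lognormal_cdf(t, *c["rec"])
    return p


def expected_downtime(intensity, horizon=365.0, dt=0.1, comps=COMPONENTS):
    """E[downtime] in days: midpoint-rule integral of P(not yet recovered)."""
    steps = int(horizon / dt)
    return sum((1.0 - facility_recovery((i + 0.5) * dt, intensity, comps)) * dt
               for i in range(steps))
```

Here `facility_fragility(0.6)` is above 0.5 because the generator alone has a 50% chance of damage at its median intensity, and `expected_downtime` grows with intensity; a value like this is what the UI described in the abstract would present.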
Systems and methods for performing a technical recovery in a cloud environment
A computer-implemented method for testing failover may include: determining one or more cross-regional dependencies and traffic flow of an application in a first region of a cloud environment, wherein the one or more cross-regional dependencies include a dependency of the application in the first region of the cloud environment on one or more applications in at least one other region of the cloud environment; determining a risk score associated with performing failover of the application to a second region of the cloud environment based at least on the determined one or more cross-regional dependencies and traffic flow of the application; comparing the determined risk score with a predetermined risk score; in response to determining that the determined risk score is lower than the predetermined risk score, performing failover of the application to the second region of the cloud environment; isolating the second region of the cloud environment from the first region of the cloud environment for a predetermined period of time; and monitoring operation of the application in the second region of the cloud environment during the predetermined period of time.
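The decision logic in this claim can be sketched as follows. The risk formula (the share of traffic whose dependencies would still lie outside the second region) is a hypothetical choice, since the abstract only says the score is based on cross-regional dependencies and traffic flow; the failover, isolation, and monitoring actions are caller-supplied hooks, not calls to any real cloud API.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    target_app: str
    region: str           # region where the dependency runs
    traffic_share: float  # fraction of the app's traffic flowing to it


def failover_risk(target_region, deps):
    """Hypothetical risk score: share of traffic that would still cross
    regions after failing over to `target_region`."""
    return sum(d.traffic_share for d in deps if d.region != target_region)


def run_failover_test(app, first_region, second_region, deps, max_risk,
                      do_failover, isolate, monitor, isolation_s):
    """Fail over only if the risk is below the predetermined threshold, then
    isolate the second region and monitor the app for `isolation_s` seconds.
    do_failover, isolate, and monitor are hypothetical caller-supplied hooks."""
    risk = failover_risk(second_region, deps)
    if risk >= max_risk:
        return {"performed": False, "risk": risk}
    do_failover(app, second_region)
    isolate(second_region, first_region, isolation_s)
    health = monitor(app, second_region, isolation_s)
    return {"performed": True, "risk": risk, "health": health}
```

If half the traffic goes to a database that stays in the first region, the risk is 0.5: with a threshold of 0.6 the failover proceeds, and with 0.4 it is skipped. Injecting the cloud operations as callables keeps the decision logic testable without touching real infrastructure.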
Electronic system for dynamic analysis and detection of transformed transient data in a distributed system network
Embodiments of the invention are directed to systems, methods, and computer program products for dynamic analysis and detection of transformed transient data in a distributed system network. The system is structured for validating, determining, and evaluating temporal data transformations associated with technology resource components across iterations of technology applications in order to maintain backward compatibility. The system comprises an execution module structured for executing technology resource components in a plurality of testing technology environments concurrently. The system further comprises an analysis module structured for evaluating iterations of a first technology resource component by comparing a transformed first testing output with a transformed second testing output to determine the modifications made to the first technology resource component in its second iteration, which succeeds the first iteration.
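As a rough illustration of the comparison step, the sketch below runs two iterations of a component concurrently on the same inputs, applies a transformation that drops transient fields, and diffs the results. Treating outputs as dictionaries, the `VOLATILE_KEYS` set, and the added/removed/changed classification are all assumptions; the abstract does not define the transformation.

```python
from concurrent.futures import ThreadPoolExecutor

VOLATILE_KEYS = {"timestamp", "request_id"}  # hypothetical transient fields


def transform(output):
    """Normalize an output by dropping transient fields that differ per run."""
    return {k: v for k, v in output.items() if k not in VOLATILE_KEYS}


def compare_iterations(v1, v2, inputs):
    """Run both iterations concurrently on the same inputs (a list) and
    report per-input differences between their transformed outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Executor.map submits all calls immediately, so both iterations run
        # concurrently before either result list is collected.
        r1 = pool.map(v1, inputs)
        r2 = pool.map(v2, inputs)
        out1, out2 = list(r1), list(r2)
    report = []
    for x, a, b in zip(inputs, out1, out2):
        a, b = transform(a), transform(b)
        diff = {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }
        if any(diff.values()):
            report.append((x, diff))
    return report
```

For instance, if the second iteration adds a `currency` field, every input is reported with `added: ["currency"]` and nothing removed or changed, while differing `timestamp` values are ignored because the transformation strips them. Fields reported as `removed` or `changed` are the likelier backward-compatibility risks.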
Configuring new storage systems based on write endurance
A method, performed by a computing device, of configuring a new design of a new data storage system (DSS) having initial configuration parameters is provided. The new design includes an initial plurality of storage drives. The method includes (a) collecting operational information from a plurality of remote DSSs in operation, the operational information including numbers of writes of various write sizes received by respective remote DSSs of the plurality of remote DSSs over time; (b) modeling a number of drive writes per day (DWPD) of the initial plurality of storage drives of the new DSS based on the collected operational information from the plurality of remote DSSs and the initial configuration parameters; (c) comparing the modeled number of DWPD to a threshold value; and (d) in response to the modeled number of DWPD exceeding the threshold value, reconfiguring the new DSS with an updated design.
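The DWPD modeling and threshold check can be sketched with simple arithmetic. DWPD (drive writes per day) is the number of full-capacity writes each drive absorbs per day. The sketch assumes the collected telemetry has been reduced to a histogram of writes per day by write size, applies a hypothetical per-size write-amplification factor, spreads the result evenly over the drives, and reconfigures by adding drives, which is only one possible updated design.

```python
def modeled_dwpd(write_hist, amp_by_size, n_drives, drive_capacity_bytes):
    """write_hist: {write_size_bytes: writes_per_day}, e.g. averaged over the
    remote DSSs' telemetry. amp_by_size: hypothetical write amplification per
    write size (parity/RAID overhead, etc.); sizes not listed use 1.0.
    DWPD = physical bytes written per day / total drive capacity."""
    physical = sum(size * count * amp_by_size.get(size, 1.0)
                   for size, count in write_hist.items())
    return physical / (n_drives * drive_capacity_bytes)


def configure(write_hist, amp_by_size, n_drives, drive_capacity_bytes, max_dwpd):
    """If the modeled DWPD exceeds the threshold, add drives (one possible
    updated design) until it no longer does. Returns (n_drives, dwpd)."""
    dwpd = modeled_dwpd(write_hist, amp_by_size, n_drives, drive_capacity_bytes)
    while dwpd > max_dwpd:
        n_drives += 1
        dwpd = modeled_dwpd(write_hist, amp_by_size, n_drives,
                            drive_capacity_bytes)
    return n_drives, dwpd
```

For example, 100 million 4 KiB writes and 1 million 1 MiB writes per day, with hypothetical amplification factors of 3.0 and 1.2, give about 2.49 TB of physical writes per day; on four 1 TB drives that is about 0.62 DWPD, so against a 0.3 DWPD threshold the loop grows the design to nine drives (about 0.28 DWPD).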
Cloud-based providing of one or more corrective measures for a storage system
An illustrative method includes detecting, by a cloud based storage system services provider based on a problem signature, that a storage system has experienced a problem that is associated with the problem signature; and deploying, without user intervention, one or more corrective measures that modify the storage system to resolve the problem.
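A minimal sketch of signature-driven remediation, assuming a problem signature can be expressed as a predicate over storage-system telemetry, paired with a corrective function that edits the system's configuration. The two signatures, their thresholds, and the configuration keys are hypothetical examples, not taken from any real provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Signature:
    name: str
    matches: Callable[[dict], bool]     # predicate over telemetry
    corrective: Callable[[dict], None]  # mutates the system configuration


# Hypothetical signatures for illustration only.
SIGNATURES = [
    Signature(
        name="write-cache-disabled",
        matches=lambda t: t["write_latency_ms"] > 20 and not t["write_cache_enabled"],
        corrective=lambda cfg: cfg.update(write_cache_enabled=True),
    ),
    Signature(
        name="snapshot-space-exhaustion",
        matches=lambda t: t["snapshot_used_pct"] > 90,
        # Halve snapshot retention, but keep at least one day.
        corrective=lambda cfg: cfg.update(
            snapshot_retention_days=max(1, cfg["snapshot_retention_days"] // 2)),
    ),
]


def remediate(telemetry, config, signatures=SIGNATURES):
    """Apply every corrective measure whose signature matches the telemetry,
    without asking the user; return the names of the signatures applied."""
    applied = []
    for sig in signatures:
        if sig.matches(telemetry):
            sig.corrective(config)
            applied.append(sig.name)
    return applied
```

Because each corrective measure runs as soon as its signature matches, no user action is needed; in practice the signatures would come from the provider's fleet-wide analysis rather than a hard-coded list.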
INFORMATION PROCESSING SYSTEM
According to an embodiment, when a storage status of a first storage unit is recognized as a protected state, a control unit writes data to a second storage unit. When a read target address is recorded in a data migration log area, the control unit reads data from the second storage unit. When the read target address is not recorded in the data migration log area, the control unit reads data from the first storage unit.
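The read/write routing described above can be modeled in a few lines. The sketch assumes each storage unit is a dictionary from address to data, that the protected state is a simple flag, and that the data migration log is a set of addresses written to the second unit after protection began; the class and attribute names are hypothetical.

```python
class MigratingStorage:
    """Routes I/O between a first storage unit, which may enter a protected
    state, and a second storage unit that receives writes once it has."""

    def __init__(self, first, second):
        self.first = first            # first storage unit: addr -> data
        self.second = second          # second storage unit: addr -> data
        self.protected = False        # storage status of the first unit
        self.migration_log = set()    # addresses whose data lives in `second`

    def write(self, addr, data):
        if self.protected:
            # First unit is protected: write to the second unit and log it.
            self.second[addr] = data
            self.migration_log.add(addr)
        else:
            self.first[addr] = data

    def read(self, addr):
        # Logged addresses are read from the second unit, all others
        # from the first unit.
        if addr in self.migration_log:
            return self.second[addr]
        return self.first[addr]
```

Once `protected` is set, a write to address 0 lands in the second unit and is logged, so a later read of address 0 returns the new data, while unlogged addresses are still read from the first unit.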