VISUAL TIMELINE BASED SYSTEM TO RECOMMEND POTENTIAL ROOT CAUSE OF FAILURE AND REMEDIATION OF AN OPERATION USING CHANGE MANAGEMENT DATABASE
20200097350 · 2020-03-26
Assignee
Inventors
- Vallinayagam Pitchaimani (Chennai, IN)
- Praveen Chavali (Hyderabad, IN)
- Sunil Meher (Hyderabad, IN)
- Siva Datla (Hyderabad, IN)
CPC classification
G06F11/321
PHYSICS
G06F11/2635
PHYSICS
International classification
G06F11/07
PHYSICS
G06F11/22
PHYSICS
Abstract
A method and computing system to recommend potential root causes of failure of an operation of a computer system is provided. An indication of a failed operation is received. A number of change orders (CO) that change one or more configuration items (CIs) that are associated with the operation from a baseline state to the current state of operation is determined. A root cause analysis (RCA) graph is displayed for a selected CO and has a plurality of CIs and connections therebetween in a first display area and a change order timeline in a second display area. Any of the number of CIs that were changed are highlighted. A potential cause listing in a third display area provides a list of highlighted CIs and a percentage indication by each highlighted CI that represents a calculated percentage that the CI is a potential cause of the failed operation.
Claims
1. A method by a computer of a computing system, the method comprising: receiving an indication of an operation that failed in a computer system associated with the operation; determining, starting at a baseline state of the operation and ending at a current state of the operation, a number of change orders that change one or more configuration items that are associated with the operation; displaying a baseline state root cause analysis (RCA) graph, the baseline state RCA graph representing a last known state of the operation where the operation passed testing, the baseline state RCA graph comprising a plurality of configuration items and connections between the plurality of configuration items in a first area of a display and a change order timeline in a second area of the display; for each of the number of change orders and responsive to receiving a user selection of a change order in the change order timeline, displaying a RCA graph of a state of the computer system associated with the operation, wherein the RCA graph of the state of the computer system associated with the operation comprises: a number of the plurality of configuration items that remain associated with the operation and connections between the number of the plurality of configuration items displayed in the first area, wherein any of the number of the plurality of configuration items that remain that were changed by the change order are highlighted; an indication of the change order displayed in the change order timeline; and a potential cause listing in a third area of the display, wherein the potential cause listing displays a list of configuration items that are highlighted and a percentage indication by each configuration item in the list of configuration items, each percentage indication representing a calculated percentage that the configuration item is a potential cause of the failure.
2. The method of claim 1 further comprising displaying a number of new configuration items added by the change order in the first area and connections between the number of new configuration items and the number of the plurality of configuration items displayed in the first area and highlighting any of the number of new configuration items that were changed by the change order.
3. The method of claim 1 wherein the RCA graph of the state of the computer system associated with the operation further comprises a suggestion action displayed in a fourth area of the display responsive to a percentage indication of a configuration item listed in the third area of the display being above a first threshold level.
4. The method of claim 3 further comprising: responsive to receiving an indication to perform the suggestion action, initiating performance of the suggestion action.
5. The method of claim 4, further comprising: performing a test routine of the operation that failed after performing the suggestion action; and providing an indication of whether the operation passed the test routine.
6. The method of claim 5, further comprising: updating the RCA graph of the state of the computer system associated with the change order for each of the change orders based on results of the test routine.
7. The method of claim 1 further comprising calculating the calculated percentage for each configuration item in the list using a user configurable weighted calculation based on whether the change order is classified as a major incident, whether the change order is classified as an unauthorized change order, whether the change order is classified as an emergency order, whether any attribute of the configuration item has been changed, whether any of the configuration items being displayed has been added, whether any configuration item was removed, and a focal distance the configuration item is from a focal configuration item.
8. The method of claim 7 wherein calculating the calculated percentage comprises: determining
9. The method of claim 8 further comprising: receiving a change in one of the user configurable weights; responsive to receiving the change in the one of the user configurable weights, recalculating the calculated percentage; and updating the displayed percentage indication based on the recalculating of the calculated percentage.
10. The method of claim 1 wherein the RCA graph of the state of the computer system associated with the operation further comprises an indication displayed in a fifth area of the display indicating that a higher percentage configuration item is in another RCA graph associated with another change order.
11. The method of claim 1 wherein each of the RCA graphs of a state of the computer system associated with the operation is displayed with a default number of configuration item levels, the method comprising: responsive to a highest calculated percentage being below a second threshold level, displaying on a currently displayed RCA graph a suggestion that a higher number of configuration item levels should be displayed; and responsive to receiving an indication of a number of configuration item levels to display, for each RCA graph, displaying the number of configuration item levels when the RCA graph is being displayed.
12. A configuration management system configured to recommend potential root causes of failure of an operation of a computer system, the configuration management system comprising: an incident problem management engine configured to receive a message indicating an operation that failed in the computer system; a configuration management database (CMDB) configured to: receive and store change orders; store associations of configuration items with change orders and operations of computer systems; and store information regarding a baseline state for a plurality of operations performed by the computer system, each baseline state representing a last known state of an operation of the plurality of operations in which the operation passed testing, the information for each baseline state comprising a listing of a plurality of configuration items and connections between the plurality of configuration items; and a change management engine configured to: responsive to receiving an indication of the message indicating the operation that failed, fetch information from the CMDB regarding a baseline state for the operation that failed; identify, from the CMDB and starting at a baseline state of the operation and ending at a current state of the operation, a number of change orders that change one or more configuration items that are associated with the operation; for each of the number of change orders, fetching information from the CMDB regarding configuration items changed by the change order; display a baseline state root cause analysis (RCA) graph in a first area of a display and a change order timeline in a second area of the display; responsive to receiving a user selection of a change order in the timeline, display a RCA graph of a state of the computer system associated with the operation, wherein the RCA graph of the state of the computer system associated with the operation comprises: a number of the plurality of configuration items that remain associated with the operation and connections between the number of the plurality of configuration items displayed in the first area, wherein any of the number of the plurality of configuration items that remain that were changed by the change order are highlighted; a number of new configuration items when the change order indicates new configuration items have been added and connections between the number of new configuration items and the number of the plurality of configuration items displayed in the first area, wherein any of the number of new configuration items that were added by the change order are highlighted; an indication of the change order displayed in the change order timeline; and a potential cause listing in a third area of the display, wherein the potential cause listing displays a list of configuration items that are highlighted and a percentage indication by each configuration item in the list of configuration items, each percentage indication representing a calculated percentage that the configuration item is a potential cause of the failure.
13. The configuration management system of claim 12, wherein the CMDB is further configured to: receive a new change order; parse the change order to determine configuration items changed by the new change order and to determine configuration items added or deleted from the computer system; and store information regarding the configuration items changed by the new change order and information regarding the configuration items added or deleted from the computer system.
14. The configuration management system of claim 12, wherein the change management engine is further configured to: identify when a percentage indication of a configuration item in the list of configuration items is above a first threshold; and display a suggestion action in a fourth area of the display responsive to the percentage indication being above the first threshold.
15. The configuration management system of claim 14, wherein the suggestion action comprises a user-selectable icon and wherein the configuration management engine is further configured to: detect a user selection of the user-selectable icon; and responsive to detecting the user selection, initiating implementation of the suggestion action.
16. The configuration management system of claim 12 wherein the configuration management engine is further configured to calculate the calculated percentage for each configuration item changed by a change order by: determining
17. The configuration management system of claim 16 wherein the configuration management engine is further configured to: receive a change in one of the user configurable weights; responsive to receiving the change in the one of the user configurable weights, recalculate the calculated percentage; and update the displayed percentage indication based on the recalculating of the calculated percentage.
18. The configuration management system of claim 12 wherein the configuration management engine is further configured to display an indication in a fifth area of the display indicating that a higher percentage configuration item is in another RCA graph associated with another change order.
19. The configuration management system of claim 18 wherein the indication in the fifth area of the display comprises a user-selectable icon, wherein the configuration management engine is further configured to display the another RCA graph responsive to detecting a user selection of the user-selectable icon.
20. The configuration management system of claim 12 wherein each of the RCA graphs of a state of the computer system is displayed with a default number of configuration item levels, wherein the configuration management engine is further configured to: responsive to a highest calculated percentage being below a second threshold level, display on a currently displayed RCA graph a suggestion that a higher number of configuration item levels should be displayed; and responsive to receiving an indication of a number of configuration item levels to display, for each RCA graph, displaying the number of configuration item levels in the RCA graph being displayed.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application. In the drawings:
[0016]-[0033] [Brief descriptions of the individual drawings are not reproduced in this text.]
DETAILED DESCRIPTION OF EMBODIMENTS
[0034] Embodiments of the present inventive concepts now will be described more fully hereinafter with reference to the accompanying drawings. Throughout the drawings, the same reference numbers are used for similar or corresponding elements. The inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concepts to those skilled in the art. Like numbers refer to like elements throughout.
[0035] It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present inventive concepts.
[0036] As used herein, the term "or" is used nonexclusively to include any and all combinations of one or more of the associated listed items.
[0037] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof.
[0038] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0039] Some embodiments described herein provide methods or configuration management systems for recommending a potential root cause of failure of an operation of a computer system. According to some embodiments, the configuration management system includes an incident problem management engine configured to receive a message indicating an operation that failed in the computer system. The configuration management system further includes a configuration management database (CMDB) configured to receive and store change orders; store associations of configuration items with change orders and operations of computer systems; and store information regarding a baseline state for a plurality of operations performed by the computer system, each baseline state representing a last known state of an operation of the plurality of operations in which the operation passed testing, the information for each baseline state comprising a listing of a plurality of configuration items and connections between the plurality of configuration items. The configuration management system further includes a change management engine configured to, responsive to receiving an indication of the message indicating the operation that failed, fetch information from the CMDB regarding a baseline state for the operation that failed. The change management engine is further configured to identify, from the CMDB and starting at a baseline state of the operation and ending at a current state of the operation, a number of change orders that change one or more configuration items that are associated with the operation. The change management engine is further configured to, for each of the number of change orders, fetch information from the CMDB regarding configuration items changed by the change order. The change management engine is further configured to display a baseline state root cause analysis (RCA) graph in a first area of a display and a change order timeline in a second area of the display. The change management engine is further configured to, responsive to receiving a user selection of a change order in the timeline, display a RCA graph of a state of the computer system associated with the operation, wherein the RCA graph of the state of the computer system associated with the operation includes: a number of the plurality of configuration items that remain associated with the operation and connections between the number of the plurality of configuration items displayed in the first area, wherein any of the number of the plurality of configuration items that remain that were changed by the change order are highlighted; a number of new configuration items when the change order indicates new configuration items have been added and connections between the number of new configuration items and the number of the plurality of configuration items displayed in the first area, wherein any of the number of new configuration items that were added by the change order are highlighted; an indication of the change order displayed in the change order timeline; and a potential cause listing in a third area of the display, wherein the potential cause listing displays a list of configuration items that are highlighted and a percentage indication by each configuration item in the list of configuration items, each percentage indication representing a calculated percentage that the configuration item is a potential cause of the failure.
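As an illustration only, the content of the RCA graph for a selected change order could be derived along the following lines. This is a minimal sketch, not the disclosed implementation: the names `ChangeOrderDelta` and `rca_view` and the dictionary layout are hypothetical, and the sketch applies a single change order directly to the baseline state rather than accumulating earlier change orders.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeOrderDelta:
    """Hypothetical summary of what one change order did to the CIs."""
    changed: set = field(default_factory=set)          # existing CIs that were modified
    added: set = field(default_factory=set)            # new CIs introduced by the order
    removed: set = field(default_factory=set)          # CIs no longer associated
    new_connections: set = field(default_factory=set)  # (ci, ci) pairs added by the order


def rca_view(baseline_cis, baseline_connections, delta):
    """Return the nodes, edges, and highlighted CIs for one change order."""
    # CIs from the baseline that remain associated with the operation.
    remaining = set(baseline_cis) - delta.removed
    nodes = remaining | delta.added
    # Keep only connections whose two endpoints are still displayed.
    candidate_edges = set(baseline_connections) | delta.new_connections
    edges = {(a, b) for (a, b) in candidate_edges if a in nodes and b in nodes}
    # Highlight remaining CIs that were changed, plus any newly added CIs.
    highlighted = (delta.changed & remaining) | delta.added
    return {"nodes": nodes, "edges": edges, "highlighted": highlighted}


view = rca_view(
    baseline_cis={"web", "app", "db"},
    baseline_connections={("web", "app"), ("app", "db")},
    delta=ChangeOrderDelta(
        changed={"app"},
        added={"cache"},
        new_connections={("app", "cache")},
    ),
)
```

The highlighted set is what would feed the potential cause listing in the third display area.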
[0040]
[0041] As further described in
[0042] Initially, at operation 200, the CMDB 102 receives and stores change orders for operations of computer systems that the CMDB 102 services. Each change order documents changes in one or more operations of a computer system serviced by the CMDB 102. A change order may document which users or group of users is affected by the change, a classification of the change order, a type of the change order, an indication of who approved the change order, and what configuration items and relationships between configuration items are changed by the change order.
[0043] A configuration item may be a service, a device, a device component, software, a software update, a software patch, and the like. A relationship is the logical relation between two configuration items. For example, a computer server containing a Windows operating system constitutes a relationship between the server and the operating system. Whenever a configuration item is changed, a change order documents the change. Each change order may be associated with multiple configuration items.
[0044] In an embodiment, a classification of a change order specifies whether the change order is a major incident, an unauthorized change order, an emergency order, or none of the preceding classifications (i.e., is not a major incident, an unauthorized change order, or an emergency order). A major incident is defined by the entity controlling the CMDB 102. Each change order requires specified conditions to be met. If these conditions are not met, the change order is an unauthorized change order. When a business decides a change is urgent, the change order is classified as an emergency order.
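The change order, configuration item, and relationship records described above might be modeled as follows. This is a sketch under assumptions: every class and field name is hypothetical and chosen only to mirror the attributes listed in the text, and the actual CMDB schema is not specified here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    """The classifications of a change order named in the text."""
    MAJOR_INCIDENT = "major incident"
    UNAUTHORIZED = "unauthorized change order"
    EMERGENCY = "emergency order"
    NONE = "none of the preceding"


@dataclass(frozen=True)
class ConfigurationItem:
    name: str
    kind: str  # e.g. "service", "device", "software", "software patch"


@dataclass(frozen=True)
class Relationship:
    """Logical relation between two configuration items."""
    source: ConfigurationItem
    target: ConfigurationItem
    kind: str


@dataclass
class ChangeOrder:
    number: str
    affected_users: list             # users or groups affected by the change
    classification: Classification
    order_type: str
    approved_by: str
    changed_items: list = field(default_factory=list)
    changed_relationships: list = field(default_factory=list)


# The server/operating-system example from the text, with made-up identifiers.
server = ConfigurationItem("srv-01", "device")
windows = ConfigurationItem("Windows", "software")
contains = Relationship(server, windows, "contains")
order = ChangeOrder(
    number="CO-1",
    affected_users=["finance"],
    classification=Classification.EMERGENCY,
    order_type="patch",
    approved_by="change board",
    changed_items=[windows],
    changed_relationships=[contains],
)
```

Using a single enumeration follows the description of a classification as one of four alternatives; a deployment that allows several classifications at once would need a set of flags instead.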
[0045] The CMDB 102 performs operations on change orders. Turning to
[0046] Returning to
[0047] At operation 204, the CMDB 102 stores information regarding a baseline state for operations performed by the computer systems serviced by the CMDB 102. A baseline state is the last known state where the operation was working properly.
[0048] At operation 206, the incident/problem management engine 100 receives an indication of an operation that failed in a computer system associated with the operation. The failed operation may be a failure of a service or a failure of a device or a failure of a component of a device. The indication may come from a monitoring system that monitors components and services in the computer system and issues alarms when failures occur, from a built-in-test routine, from a user, from a help-desk, etc. At operation 208, the incident/problem management engine 100 notifies the change management engine 104 of the failed operation.
[0049] At operation 210, the change management engine 104 transmits a request to the CMDB 102 for information regarding the failed operation. At operation 212, the CMDB 102 transmits the information requested to the change management engine 104. The information requested includes configuration items associated with the failed operation, change orders associated with the operation or the configuration items associated with the failed operation, and a baseline state of the operation.
[0050] At operation 214, the change management engine 104 determines, from the information from the CMDB 102, starting at the baseline state of the failed operation and ending at a current state of the failed operation, a number of change orders that changed one or more of the configuration items that are associated with the operation.
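One way to picture operation 214 is as a filter over the change orders returned by the CMDB. The sketch below assumes, purely for illustration, that each change order carries a timestamp and a set of changed configuration item names, and that the baseline and current states are identified by times; the names `ChangeOrder`, `applied_at`, and `relevant_change_orders` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeOrder:
    number: str
    applied_at: datetime     # assumed timestamp of the change
    changed_cis: frozenset   # names of the CIs the order changed


def relevant_change_orders(change_orders, operation_cis, baseline_time, current_time):
    """Change orders after the baseline, up to the current state, that touch the operation's CIs."""
    operation_cis = set(operation_cis)
    selected = [
        co for co in change_orders
        if baseline_time < co.applied_at <= current_time
        and co.changed_cis & operation_cis
    ]
    # Oldest first, the order in which a timeline would present them.
    return sorted(selected, key=lambda co: co.applied_at)


orders = [
    ChangeOrder("CO-3", datetime(2020, 3, 1), frozenset({"db"})),
    ChangeOrder("CO-1", datetime(2020, 1, 1), frozenset({"db"})),      # before the baseline
    ChangeOrder("CO-2", datetime(2020, 2, 1), frozenset({"app"})),
    ChangeOrder("CO-4", datetime(2020, 3, 5), frozenset({"printer"})),  # unrelated CI
]
timeline = relevant_change_orders(
    orders,
    operation_cis={"app", "db"},
    baseline_time=datetime(2020, 1, 15),
    current_time=datetime(2020, 3, 10),
)
```

In this example the timeline contains CO-2 and CO-3 only.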
[0051] Turning to
[0052] Returning to
[0053] Turning to
[0054] Turning to
[0055] W_1 to W_7 are user configurable weights, MI is a number of major incidents reported for the change order between the change order number displayed and the current state and is 0 if there are no major incidents, UC is 1 if the change order is classified as an unauthorized change order and 0 if the change order is not classified as an unauthorized change order, EM is 1 if the change order is classified as an emergency change and 0 if the change order is not classified as an emergency change, AC is 1 if any attribute of the configuration item has been changed and 0 if no attribute of the configuration item has been changed, AD is 1 if a configuration item has been added and 0 if a configuration item has not been added, DE is 1 if any configuration item was removed and 0 if no configuration item was removed, and DI is a focal distance the configuration item is from a focal configuration item. For example, in one embodiment, the weights W_1 to W_7 may be W_1=30, W_2=10, W_3=30, W_4=2, W_5=10, W_6=10, and W_7=9. The Pcalc calculation with these weights is
Other weightings can be used. In operation 902, the Pcalc for each configuration item in the list is calculated and normalized using the sum of all Pcalc calculated for the configuration items in the list.
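The Pcalc formula itself does not appear in this text, so the following sketch should be read as an assumption rather than the disclosed calculation: it takes Pcalc to be a plain linear weighted sum of the seven factors (including DI used directly, which may not match the original), and then normalizes by the sum of all Pcalc values as operation 902 describes. The names `Factors`, `pcalc`, and `normalized_percentages` are hypothetical.

```python
from dataclasses import dataclass

# Example weights W_1..W_7 quoted in the text; all are user configurable.
DEFAULT_WEIGHTS = (30, 10, 30, 2, 10, 10, 9)


@dataclass
class Factors:
    mi: int  # number of major incidents reported for the change order
    uc: int  # 1 if classified as an unauthorized change order, else 0
    em: int  # 1 if classified as an emergency change, else 0
    ac: int  # 1 if any attribute of the configuration item changed, else 0
    ad: int  # 1 if a configuration item was added, else 0
    de: int  # 1 if any configuration item was removed, else 0
    di: int  # focal distance from the focal configuration item


def pcalc(f, weights=DEFAULT_WEIGHTS):
    """ASSUMED form: linear weighted sum of the seven factors."""
    terms = (f.mi, f.uc, f.em, f.ac, f.ad, f.de, f.di)
    return sum(w * t for w, t in zip(weights, terms))


def normalized_percentages(factors_by_ci, weights=DEFAULT_WEIGHTS):
    """Normalize each Pcalc by the sum of all Pcalc values, as a percentage."""
    raw = {ci: pcalc(f, weights) for ci, f in factors_by_ci.items()}
    total = sum(raw.values())
    if total == 0:
        return {ci: 0.0 for ci in raw}
    return {ci: 100.0 * value / total for ci, value in raw.items()}


factors = {
    "db": Factors(mi=1, uc=0, em=1, ac=1, ad=0, de=0, di=1),
    "web": Factors(mi=0, uc=0, em=0, ac=1, ad=0, de=0, di=2),
}
percentages = normalized_percentages(factors)
```

Because the weights are passed in as an argument, a change to any weight only requires calling `normalized_percentages` again with the new tuple to refresh the displayed percentages.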
[0056] A user may want to change the weightings. Turning to
[0057] Returning to
[0058]
[0059] Returning to
[0060] The suggestion action 700 has a user-selectable item. Turning to
[0061] Turning now to
[0062] Turning now to
[0063] Turning now to
[0064] In the embodiment shown in
[0065]
[0066] In the embodiment shown in
[0067]
[0068] In the embodiment shown in
[0069] In the embodiment shown in
[0070] Thus, example systems, methods, and non-transitory machine readable media for reducing occurrences of errors in determining a root cause of failure have been described. The advantages provided include reducing occurrence of errors in determining a root cause of failure of an operation, reducing load on the networks used by displaying changes made by a change order and a percentage indication that the configuration items changed are the root cause of failure, and providing an indication that a higher percentage possible root cause of failure is in another RCA graph of another change order and a link to that RCA graph. The advantages result in faster identification of a root cause of a failure of a failed operation of a computer system.
[0071] As will be appreciated by one of skill in the art, the present inventive concepts may be embodied as a method, data processing system, or computer program product. Furthermore, the present inventive concepts may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD ROMs, optical storage devices, or magnetic storage devices.
[0072] Some embodiments are described herein with reference to flowchart illustrations or block diagrams of methods, systems and computer program products. It will be understood that each block of the flowchart illustrations or block diagrams, and combinations of blocks in the flowchart illustrations or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart or block diagram block or blocks.
[0073] These computer program instructions may also be stored in a computer readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart or block diagram block or blocks.
[0074] The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart or block diagram block or blocks.
[0075] It is to be understood that the functions/acts noted in the blocks may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[0076] Computer program code for carrying out operations described herein may be written in an object-oriented programming language such as Java or C++. However, the computer program code for carrying out operations described herein may also be written in conventional procedural programming languages, such as the C programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
[0077] Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, all embodiments can be combined in any way or combination, and the present specification, including the drawings, shall be construed to constitute a complete written description of all combinations and subcombinations of the embodiments described herein, and of the manner and process of making and using them, and shall support claims to any such combination or subcombination.
[0078] In the drawings and specification, there have been disclosed typical embodiments and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation, the scope of the inventive concepts being set forth in the following claims.