SYSTEM AND METHOD FOR MONITORING THE STATUS OF MULTIPLE SERVERS ON A NETWORK

20250193070 · 2025-06-12

Abstract

A system and method for monitoring a plurality of servers by a monitoring server in a computer network. A list of servers and a plurality of services to monitor in the computer network is generated at the monitoring server. A status query is transmitted sequentially by the monitoring server to each of the plurality of servers, the status query including the plurality of services to monitor at each server. A status message report is received from each of the plurality of servers in response to each status query. An event is reported in an event log for each server that has an abnormal service status. The transmission of the status query to each server is performed by the monitoring server at a specified service time interval.

Claims

1. A method comprising: determining to monitor at least one of a plurality of services of at least one of a plurality of servers; transmitting a status inquiry at least about a session layer to the at least one of the plurality of servers; receiving a status of the at least one of the plurality of servers, the status indicating whether an abnormal condition is detected; and reporting where the status is abnormal.

2. The method of claim 1, wherein at least one of transmitting the status inquiry and reporting where the status is abnormal is implemented at a specified service time interval.

3. The method of claim 1, wherein determining the at least one of the plurality of services to monitor is based on at least one of registry settings of a monitoring server.

4. The method of claim 1, further comprising reconfiguring a specified service time interval at which a monitoring server is to at least one of transmit the status inquiry and report where the status is abnormal.

5. The method of claim 1, further comprising reconfiguring the at least one of the plurality of services.

6. The method of claim 1, wherein reporting where the status is abnormal comprises reporting if an error occurred while checking the status.

7. The method of claim 1, wherein reporting where the status is abnormal comprises reporting if the status is other than running.

8. The method of claim 1, wherein reporting where the status is abnormal comprises reporting at a start of a monitoring of the at least one of the plurality of services.

9. The method of claim 1, wherein reporting where the status is abnormal comprises reporting at an end of a monitoring of the at least one of the plurality of services.

10. A non-transitory computer readable medium having computer readable instructions that, when executed by one or more processors cause the one or more processors to implement: determining to monitor at least one of a plurality of services of at least one of a plurality of servers; transmitting a status inquiry at least about a session layer to the at least one of the plurality of servers; receiving a status of the at least one of the plurality of servers, the status indicating whether an abnormal condition is detected; and reporting where the status is abnormal.

11. The non-transitory computer readable medium of claim 10, wherein at least one of transmitting the status inquiry and reporting where the status is abnormal is implemented at a specified service time interval.

12. The non-transitory computer readable medium of claim 10, wherein the computer readable instructions further, when executed by the one or more processors, implement reconfiguring a specified service time interval at which a monitoring server is to at least one of transmit the status inquiry and report where the status is abnormal.

13. The non-transitory computer readable medium of claim 10, wherein the computer readable instructions further, when executed by the one or more processors, implement reconfiguring the at least one of the plurality of services.

14. The non-transitory computer readable medium of claim 10, wherein reporting where the status is abnormal comprises reporting if an error occurred while checking the status.

15. The non-transitory computer readable medium of claim 10, wherein reporting where the status is abnormal comprises reporting if the status is other than running.

16. The non-transitory computer readable medium of claim 10, wherein reporting where the status is abnormal comprises reporting at a start of a monitoring of the at least one of the plurality of services.

17. The non-transitory computer readable medium of claim 10, wherein reporting where the status is abnormal comprises reporting at an end of a monitoring of the at least one of the plurality of services.

18. A system, comprising: a processor; and a memory storing program instructions and communicably coupled to the processor; wherein the processor, when executing the instructions, implements: determining at least one of a plurality of services to monitor at least at one of a plurality of servers; transmitting a status inquiry at least about a session layer to the at least one of the plurality of servers; receiving a status of the at least one of the plurality of servers, the status indicating whether an abnormal condition is detected; and reporting where the status is abnormal.

19. The system according to claim 18, wherein the status inquiry at least about the session layer inquires whether a Service Control Manager of the at least one of the plurality of servers is accessible, wherein the Service Control Manager is at the session layer, and wherein the plurality of services are at a network layer.

20. The system according to claim 19, wherein the status comprises either a first event code or a second event code, respectively depending on whether the Service Control Manager at the session layer is determined to be abnormal or, after determining that the Service Control Manager at the session layer is not abnormal, the at least one of the plurality of services at the network layer is determined to be abnormal, wherein the session layer is a Level 5 layer, wherein the network layer is a Level 3 layer, and wherein reporting where the status is abnormal comprises reporting in an event log where the status is abnormal.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] These and other advantages and aspects of the present invention will become apparent and more readily appreciated from the following detailed description of the invention taken in conjunction with the accompanying drawings, as follows.

[0015] FIG. 1 illustrates a main parameters setting page in accordance with an exemplary embodiment of the invention.

[0016] FIG. 2 illustrates a folder selection interface page to enable the user to select a location for installation of the heartbeat monitor service executable file in accordance with an exemplary embodiment of the invention.

[0017] FIG. 3 illustrates a special event log for recording the operational status of a service on a monitored server in accordance with an exemplary embodiment of the invention.

[0018] FIG. 4 illustrates an event properties page for an event recorded in the event log in accordance with an exemplary embodiment of the invention.

[0019] FIG. 5 illustrates a registry editor page in accordance with an exemplary embodiment of the invention.

[0020] FIG. 6 illustrates processing logic for the heartbeat monitor service in accordance with an exemplary embodiment of the invention.

[0021] FIGS. 7A-7C illustrate user interface displays for editing the service time interval and services to be monitored on each server in an exemplary embodiment of the invention.

[0022] FIG. 8 illustrates an exemplary command line prompt to check the status of other services that are not run in an automatic mode.

[0023] FIG. 9 illustrates an exemplary server list display for administration of the heartbeat monitor service.

DETAILED DESCRIPTION OF THE INVENTION

[0024] The following description of the invention is provided as an enabling teaching of the invention and its best, currently known embodiment. Those skilled in the art will recognize that many changes can be made to the embodiments described while still obtaining the beneficial results of the present invention. It will also be apparent that some of the desired benefits of the present invention can be obtained by selecting some of the features of the present invention without utilizing other features. Accordingly, those who work in the art will recognize that many modifications and adaptations of the invention are possible and may even be desirable in certain circumstances and are part of the present invention. Thus, the following description is provided as illustrative of the principles of the invention and not in limitation thereof since the scope of the present invention is defined by the claims.

[0025] In an exemplary embodiment, the heartbeat monitor service of the invention is loaded on a designated .NET Framework-based monitoring server. Using a simple text-based list of servers to monitor and a parameter-based list of Windows NT services, the heartbeat monitor service checks the status of each service on each server. If a service is active and running, the monitor moves on to the next service. If the service is in any state other than running, a report is made in a special event log. The heartbeat monitor service is thus a simple, concise method of checking the status of selected services on selected servers. More specifically, the heartbeat monitor service is designed to check the status of the server service on selected servers in order to determine whether each server is viable on the network. If the heartbeat monitor service reports in the event log that the server service is not functioning, for whatever reason, it is very likely that the server is not functioning on the network and requires attention.

[0026] The heartbeat monitor service works at a different level on the OSI (Open Systems Interconnection) Model than most other tools. This allows the heartbeat monitor service to provide a more reliable monitoring solution.

[0027] The monitor works at the Session Layer (layer 5) of the OSI model. By interrogating layer 5, the monitor is able to determine that all lower network layers are functional and that layer 5 of the operating system is accepting client sessions and directing them to the proper resource on the operating system. This aspect is unique in that all other commercial products stop at the network layer (layer 3) and never test the actual operational state of the operating system.

[0028] The heartbeat monitor service is able to provide more accurate alarms by limiting the interrogation to critical components of the device being monitored. By limiting the scope of interrogation and providing the service in a resource-friendly package, the invention is designed to prevent the monitor from raising false positive alarms.

[0029] The heartbeat monitor service can scale to a large server infrastructure. It can also be tuned to meet the needs of the specific network being monitored.

[0030] The monitor is scalable because it incorporates simple methods for changing all aspects of the tool. For example, configurable settings include: (1) services to monitor; (2) remote hosts to poll; and (3) polling intervals.

[0031] Other available monitoring tools write the entire scope of the polling into scripts that are difficult to maintain and sometimes cannot be changed. A unique aspect of the heartbeat monitor service is the ability to manipulate the polling criteria via a user-friendly graphical user interface and to change almost all parameters as needed.

[0032] The heartbeat monitor service includes a quality reporting capability. It offers versatility by reporting real-time events and by generating reports that show the operational state of the remote agents.

[0033] For comparison, in the Microsoft Operations Manager (MOM) monitoring tool, all data reporting is driven by remote agents. This is a constraint because a remote agent could fail, which would result in no reports being available. Because MOM functions in an agent-driven environment, the manager can respond only if the agent calls it.

[0034] The heartbeat monitor service utility addresses this deficiency by providing a parent-driven environment that constantly polls the child host in a resource-friendly manner. This provides a more reliable view of the availability of the child host.

[0035] Table 1 below illustrates the limitations of other products and methods and summarizes the corresponding capabilities of the heartbeat monitor service:

TABLE 1
Technical Option | Disadvantage | Comparative Products | Heartbeat Monitor Service Capabilities
Host Agent | If the host is not responsive, no reports are available. | Microsoft MOM | Does not rely on agents for monitoring. Reporting identifies the agents that are not running. Alarms can be generated.
SNMP | Server functionality is not impacted by a degraded service. | Shareware products; other licensed monitoring suites | SNMP can fail without impacting server health and thus cause false positives. The invention uses a core service that other products do not provide.
PING | Only reports at OSI layer 3. | Scripting technologies; other licensed monitoring suites | Ping only monitors at OSI layer 3, which is network driven. On Windows-based machines, as long as power is given to the network interface card, it will respond. This is because the card has internal memory. The heartbeat monitor service uses layer 5, which is the session layer of the operating system. The session layer is responsible for accepting client requests and directing them to the proper resource. Thus, it is a more accurate indication of the status of actual server functionality.
Monitoring Suites (Health check) | Numerous false positives. No regard for network latency or momentary load averages. | Microsoft MOM; NETIQ | MOM and other server monitors include a variety of tests that are susceptible to numerous false positives. They do not take into account network latency or temporary resource spikes. The heartbeat monitor service is driven at the session level, which incorporates a longer timeout and provides a session-level connection.

[0036] In an exemplary embodiment, the heartbeat monitor service is distributed as a Microsoft Windows Installer file (.msi). Running this .msi file properly installs the service on the monitoring server. The user simply follows the prompts of the install process. In most cases, accepting the default entries is all that is required for a proper installation.

[0037] FIG. 1 illustrates the main parameters setting page. The first entry, Event Log folder name 102, is the name given to the special Event Log folder that will be created when the service starts. Service Time Interval (ms) 104 is the interval, in milliseconds, at which the service runs its monitoring process. The default entry of 10000, which is equivalent to 10 seconds, should be changed to a more appropriate value. Machine List File Path 106 is the location of the text-based list of servers to be monitored. The default location points to the folder where the heartbeat monitor service executable will be deployed. Finally, Service List (semi-colon delimited) 108 is the list of services that will be monitored on each of the servers in the server list.
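
For illustration, the four parameters on this page can be held in a single settings object. The Python sketch below is a minimal illustration under stated assumptions, not the service's actual code: `MonitorParameters` and `parse_service_list` are hypothetical names, the Event Log folder default reuses the default folder name "Heartbeat Monitor Service", and the machine list path is a placeholder for the install folder.

```python
from dataclasses import dataclass, field
from typing import List


def parse_service_list(raw: str) -> List[str]:
    """Split a semi-colon delimited Service List, ignoring empty entries and stray spaces."""
    return [name.strip() for name in raw.split(";") if name.strip()]


@dataclass
class MonitorParameters:
    # Hypothetical container mirroring the main parameters page (FIG. 1).
    event_log_folder: str = "Heartbeat Monitor Service"
    service_time_interval_ms: int = 10000  # installer default: 10 seconds
    machine_list_path: str = "MachineList.txt"  # placeholder; the real default is the install folder
    services: List[str] = field(default_factory=list)

    @classmethod
    def from_entries(cls, folder: str, interval_ms: str, path: str, service_list: str) -> "MonitorParameters":
        interval = int(interval_ms)
        if interval <= 0:
            raise ValueError("Service Time Interval (ms) must be a positive number of milliseconds")
        return cls(folder, interval, path, parse_service_list(service_list))


# Example: a 15-minute interval monitoring two services.
params = MonitorParameters.from_entries("Heartbeat Monitor Service", "900000", "MachineList.txt", "server;telnet")
print(params.service_time_interval_ms / 60000, params.services)  # 15.0 ['server', 'telnet']
```

Parsing the semi-colon list once, when the parameters are loaded, keeps each monitoring pass working with a clean list of service names.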

[0038] As illustrated in FIG. 2, the final installation page contains the location 202 to which the service executable will be installed. The default location should be fine in most cases and will match the default location for the MachineList.txt file.

[0039] Once the install process has finished deploying the files, a prompt will appear for the user to enter the credentials that the heartbeat monitor service will use to access the Service Control Manager of each monitored server. Note that in most cases this should be a Windows NT domain ID that has administrative access to the servers. The user enters the Username using the format domain\userID, then enters the Password and confirms it. If the two password entries do not match, the password does not match the ID, or the ID is not available on the network, the user will be prompted to try again. After three failed attempts, the installer will uninstall the service and prompt the user to run the installer again.
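
The three-attempt credential prompt amounts to a bounded retry loop. In this minimal Python sketch, the entry, validation, and uninstall steps are hypothetical callbacks standing in for the installer's dialog, its check of the ID against the network, and its uninstall action; none of these names come from the actual installer.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # three attempts, after which the installer uninstalls the service


def prompt_for_credentials(
    read_entries: Callable[[], Tuple[str, str, str]],
    credentials_valid: Callable[[str, str], bool],
    uninstall: Callable[[], None],
) -> bool:
    """Return True once valid credentials are entered; uninstall and return False otherwise."""
    for _ in range(MAX_ATTEMPTS):
        username, password, confirmation = read_entries()
        # The two password entries must match before the ID and password are checked.
        if password == confirmation and credentials_valid(username, password):
            return True
    uninstall()
    return False


# Example: the first attempt has mismatched passwords, the second succeeds.
entries = iter([("CORP\\monitor", "secret", "secre"), ("CORP\\monitor", "secret", "secret")])
ok = prompt_for_credentials(lambda: next(entries), lambda user, pw: pw == "secret", lambda: print("uninstalling"))
print(ok)  # True
```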

[0040] Once the installation is complete, the user will receive a prompt. When the user selects Close, the heartbeat monitor service will be completely installed and running on the monitoring server. The first monitor pass will begin after the Service Time Interval (ms) period has elapsed.

[0041] While the heartbeat monitor service is running, it launches a monitor process at every interval specified in the Service Time Interval (ms) parameter. If the parameter is set to 900000 milliseconds, which is equivalent to 15 minutes, the monitor process will launch every 15 minutes. When it launches, the process reads the MachineList.txt file and steps through each machine, opening the Service Control Manager and checking the status of each service listed in the Service List parameter. If the service status is returned as running, the process moves to the next service or the next server, depending on how many services are being checked. If the service is in any other state, a report is made in the EventLog.
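
One monitoring pass, as described above, can be sketched as follows. This is a minimal Python illustration under stated assumptions: `read_machine_list`, `check_server`, and `write_event` are hypothetical callbacks (a real implementation would read the text file, query each server's Service Control Manager, and write to the EventLog), the event numbers follow the EventLog event list, and logging Event 2 even after an Event 3 failure is an assumption the description does not settle.

```python
from typing import Callable, Iterable, List

EVENT_PASS_STARTED = 1
EVENT_PASS_COMPLETED = 2
EVENT_MACHINE_LIST_UNREADABLE = 3


def run_monitor_pass(
    machine_list_path: str,
    services: List[str],
    read_machine_list: Callable[[str], Iterable[str]],
    check_server: Callable[[str, List[str]], None],
    write_event: Callable[[int, str], None],
) -> None:
    """Read the server list, then check every listed service on each server in turn."""
    write_event(EVENT_PASS_STARTED, "The monitoring process has begun.")
    try:
        machines = list(read_machine_list(machine_list_path))
    except OSError as exc:
        write_event(EVENT_MACHINE_LIST_UNREADABLE, f"Unable to open {machine_list_path}: {exc}")
    else:
        for machine in machines:  # servers are checked sequentially, one at a time
            check_server(machine, services)
    # Assumption: the pass is reported as complete even when the list could not be read.
    write_event(EVENT_PASS_COMPLETED, "The monitoring process has completed.")


# Example with in-memory stand-ins for the file, the status check, and the EventLog.
log = []
run_monitor_pass(
    "MachineList.txt",
    ["server"],
    lambda path: ["SRV01", "SRV02"],
    lambda machine, svcs: log.append(("checked", machine)),
    lambda event_id, message: log.append(event_id),
)
print(log)  # [1, ('checked', 'SRV01'), ('checked', 'SRV02'), 2]
```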

[0042] FIG. 3 illustrates an exemplary special event log for recording the operational status of a service on a monitored server. When the heartbeat monitor service starts the first time, it creates a special folder in the Event Viewer 300. The name of the folder was established upon installation of the service in the parameters page, the default of which is Heartbeat Monitor Service.

[0043] There are only seven types of event entries that the heartbeat monitor service will make in the EventLog 310:
[0044] 1. Event 91 (Information): the heartbeat monitor service has started.
[0045] 2. Event 92 (Information): the heartbeat monitor service has stopped.
[0046] 3. Event 1 (Information): the monitoring process has begun.
[0047] 4. Event 2 (Information): the monitoring process has completed.
[0048] 5. Event 3 (Warning): the service was unable to open the MachineList.txt file.
[0049] 6. Event 101 (Error): an error occurred while checking the status of the service.
[0050] 7. Event 102 (Information): the service is in a state other than running.
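
Because the set of event entries is fixed, it maps naturally onto an enumeration. The Python sketch below simply restates the seven event IDs and their severities as data; the enumeration member names are hypothetical labels, not identifiers used by the service.

```python
from enum import Enum, IntEnum


class Severity(Enum):
    INFORMATION = "Information"
    WARNING = "Warning"
    ERROR = "Error"


class HeartbeatEvent(IntEnum):
    # Event IDs written to the special EventLog folder.
    SERVICE_STARTED = 91
    SERVICE_STOPPED = 92
    PASS_STARTED = 1
    PASS_COMPLETED = 2
    MACHINE_LIST_UNREADABLE = 3
    STATUS_CHECK_ERROR = 101
    SERVICE_NOT_RUNNING = 102


SEVERITY = {
    HeartbeatEvent.SERVICE_STARTED: Severity.INFORMATION,
    HeartbeatEvent.SERVICE_STOPPED: Severity.INFORMATION,
    HeartbeatEvent.PASS_STARTED: Severity.INFORMATION,
    HeartbeatEvent.PASS_COMPLETED: Severity.INFORMATION,
    HeartbeatEvent.MACHINE_LIST_UNREADABLE: Severity.WARNING,
    HeartbeatEvent.STATUS_CHECK_ERROR: Severity.ERROR,
    HeartbeatEvent.SERVICE_NOT_RUNNING: Severity.INFORMATION,
}

# Every event type has exactly one severity.
assert set(SEVERITY) == set(HeartbeatEvent)
print(HeartbeatEvent(101).name, SEVERITY[HeartbeatEvent(101)].value)  # STATUS_CHECK_ERROR Error
```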

[0051] Where the sole purpose of the heartbeat monitor service is to monitor the network availability of a set of servers, the most pertinent event message is Event 101.

[0052] When this message is written to the EventLog 310, it means that the heartbeat monitor service attempted to check the status of the server service on the monitored server (GATCETS01 in this example) and was unable even to open its Service Control Manager. Assuming that appropriate credentials were supplied during installation, this error indicates that the server is in a state that makes the Service Control Manager inaccessible from the network. In most cases, this means that a problem has occurred with the server service and that the server itself needs attention. Monitoring for this event message is therefore a priority when using this server monitoring tool. FIG. 4 illustrates an exemplary event properties page for an event recorded in the event log.
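
Watching for Event 101 reduces to filtering the log. The Python sketch below assumes a hypothetical, simplified `EventRecord` that carries the server name as a separate field; in practice the server name would have to be taken from the event message, and the sample entries are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List

STATUS_CHECK_ERROR = 101  # Service Control Manager could not be opened


@dataclass(frozen=True)
class EventRecord:
    # Hypothetical, simplified view of one EventLog entry.
    event_id: int
    server: str
    message: str


def servers_needing_attention(events: Iterable[EventRecord]) -> List[str]:
    """Return each server with at least one Event 101, in the order first seen."""
    flagged: List[str] = []
    for event in events:
        if event.event_id == STATUS_CHECK_ERROR and event.server not in flagged:
            flagged.append(event.server)
    return flagged


sample = [
    EventRecord(1, "", "The monitoring process has begun."),
    EventRecord(101, "GATCETS01", "Unable to open the Service Control Manager."),
    EventRecord(102, "SRV02", "telnet is stopped."),
    EventRecord(101, "GATCETS01", "Unable to open the Service Control Manager."),
]
print(servers_needing_attention(sample))  # ['GATCETS01']
```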

[0053] The settings entered at the time of installation are saved in the registry and can be changed at any time using the Registry Editor. FIG. 5 illustrates an exemplary Registry Editor page 500. After opening the Registry Editor, the user or administrator can navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SCSHeartbeatSvc to change the parameter settings.

[0054] Under the Parameters key 502 are the three main parameters which the user may want to change:

[0055] MachineListPath 504: The path to the location of the MachineList.txt file. This value can also give the file a different name; in this example, the file is called gpcservers.txt.

[0056] Services 506: The list of services to be monitored on each server. If more than one service is to be monitored, the services should be separated by semi-colons.

[0057] Service TimeInterval 508: The time interval, in milliseconds, that passes before the service starts the monitoring process. In the example of FIG. 5, the interval has been set to 10 minutes. Note that this interval begins when the previous monitoring process has completed or when the service starts.
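
Because the interval is measured from the end of the previous pass (or from service start), the scheduling behaves as a fixed-delay loop rather than a fixed-rate timer. A minimal Python sketch, assuming hypothetical `monitor_pass` and `should_stop` callbacks and an injectable `sleep` so the loop can be exercised without actually waiting:

```python
import time
from typing import Callable


def run_service(
    interval_ms: int,
    monitor_pass: Callable[[], None],
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait one full interval, run a pass, and repeat; return the number of passes run.

    The wait starts only after the previous pass returns, so a slow pass pushes
    the next one back instead of overlapping it.
    """
    passes = 0
    while True:
        sleep(interval_ms / 1000.0)  # the first pass also waits one full interval
        if should_stop():
            return passes
        monitor_pass()
        passes += 1


# Example: a 10-minute interval (600000 ms) with a fake sleep that records each wait.
waits = []
count = run_service(600000, lambda: None, lambda: len(waits) > 2, waits.append)
print(count, waits)  # 2 [600.0, 600.0, 600.0]
```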

[0058] FIG. 6 illustrates the processing logic for the heartbeat monitor service in an exemplary embodiment. The logic depicted is performed for each server sequentially at the pre-specified service time interval. Processing starts in step 600 with activation of the heartbeat monitor service. Interrogation of a system server is initiated in step 602 with a list of services to be checked for operational status. Status checking continues until all services have been checked, as indicated in step 604. In decision step 606, a determination is made as to whether the server is responding to the status query. If the server is not responding, an error event is generated and logged indicating that the heartbeat monitor service attempted to check the status of a server service and was unable to open the Service Control Manager for the server. If the server is found to be responding in decision step 606, a determination is made in decision step 610 as to whether the service is started. If the service is not started, an event is generated and logged indicating that the service is in a state other than running, as indicated in step 612. Otherwise, the service is started and, in step 614, the processing logic does not write an event to the event log for this normal return from status checking. Processing logic for checking the status of server services then continues, as indicated in step 620.
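
The per-server decisions in FIG. 6 can be sketched as a function that returns the events to log for one server. This minimal Python illustration assumes a hypothetical `query_status(server, service)` callback that returns a state string such as `"running"`, or raises the hypothetical `SCMUnavailable` exception when the remote Service Control Manager cannot be opened. Stopping after the first such failure, so that one unreachable server produces a single Event 101, is also an assumption.

```python
from typing import Callable, List, Tuple

EVENT_STATUS_CHECK_ERROR = 101  # server not responding: SCM could not be opened
EVENT_NOT_RUNNING = 102         # service in a state other than running


class SCMUnavailable(Exception):
    """Raised by the hypothetical query callback when the remote SCM cannot be opened."""


def check_server(
    server: str,
    services: List[str],
    query_status: Callable[[str, str], str],
) -> List[Tuple[int, str]]:
    """Return (event_id, message) pairs for one server; an empty list means all services are running."""
    events: List[Tuple[int, str]] = []
    for service in services:
        try:
            state = query_status(server, service)
        except SCMUnavailable:
            # Decision 606 "no": the session-level query failed, so log an error and skip the rest.
            events.append((EVENT_STATUS_CHECK_ERROR, f"Unable to open the Service Control Manager on {server}."))
            break
        if state != "running":
            # Decision 610 "no", step 612: record the non-running state.
            events.append((EVENT_NOT_RUNNING, f"{service} on {server} is {state}."))
        # Step 614: a running service writes no event.
    return events


def fake_query(server: str, service: str) -> str:
    # Stand-in for a real Service Control Manager query.
    if server == "DOWN01":
        raise SCMUnavailable(server)
    return {"server": "running", "telnet": "stopped"}[service]


print(check_server("SRV01", ["server", "telnet"], fake_query))  # [(102, 'telnet on SRV01 is stopped.')]
print(check_server("DOWN01", ["server", "telnet"], fake_query))
```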

[0059] FIGS. 7A-7C illustrate user interface displays for editing, with the Registry Editor, the service time interval and the services to be monitored on each server. In the example of FIG. 7A, the user has set the service time interval to 600,000 milliseconds (i.e., 10 minutes). The service does not run continuously, but only at the specified, adjustable service time interval. In FIG. 7B, the user has set the services to be monitored to the value "server". In FIG. 7C, the user has edited the services to be monitored to include both the server and telnet services. FIG. 8 illustrates an exemplary command line prompt to check the status of other services that are not run in an automatic mode.

[0060] FIG. 9 illustrates an exemplary server list display for administration of the heartbeat monitor service. The server list is a single flat text file.
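
Because the server list is a single flat text file, reading it takes only a few lines. The Python sketch below assumes one server name per line and that blank lines and surrounding whitespace should be ignored; the description does not specify the file's exact format or encoding, so `utf-8-sig` (which also tolerates a leading byte-order mark) is an assumption, as are the function names.

```python
from typing import Iterable, List


def parse_machine_list(lines: Iterable[str]) -> List[str]:
    """Return server names from the machine list, one per non-blank line."""
    return [line.strip() for line in lines if line.strip()]


def read_machine_list(path: str) -> List[str]:
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if one is present.
    with open(path, encoding="utf-8-sig") as handle:
        return parse_machine_list(handle)


print(parse_machine_list(["GATCETS01\n", "\n", "  SRV02  \r\n"]))  # ['GATCETS01', 'SRV02']
```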

[0061] The server monitoring system and method of the present invention have been described as computer-implemented processes. It is important to note, however, that those skilled in the art will appreciate that the mechanisms of the present invention are capable of being distributed as a program product in a variety of forms, and that the present invention applies regardless of the particular type of signal bearing media utilized to carry out the distribution. Examples of signal bearing media include, without limitation, recordable-type media such as diskettes or CD-ROMs, and transmission-type media such as analog or digital communications links.

[0062] The corresponding structures, materials, acts, and equivalents of all means-plus-function elements in any claims below are intended to include any structure, material, or acts for performing the function in combination with other claim elements as specifically claimed.

[0063] Those skilled in the art will appreciate that many modifications to the exemplary embodiment are possible without departing from the spirit and scope of the present invention. In addition, it is possible to use some of the features of the present invention without the corresponding use of the other features. Accordingly, the foregoing description of the exemplary embodiment is provided for the purpose of illustrating the principles of the present invention and not in limitation thereof since the scope of the present invention is defined solely by the appended claims.