Implementing power up detection in power down cycle to dynamically identify failed system component resulting in loss of resources preventing IPL
11176009 · 2021-11-16
Assignee
Inventors
- Lee N. Helgeson (Rochester, MN, US)
- Derek Howard (Rochester, MN, US)
- Russel L. Young (Rochester, MN, US)
- George J. Romano (Rochester, MN)
- Mussie T. Negussie (West Sacramento, CA, US)
Cpc classification
G06F9/5011
PHYSICS
International classification
G06F9/50
PHYSICS
G06F11/14
PHYSICS
Abstract
A method and apparatus are provided for implementing power up detection in a power down cycle to dynamically determine whether a failed component in a system prevents another Initial Program Load (IPL) or re-IPL, or results in a loss of resources. Predefined mandatory functions are called to collect power down/up data that prevents re-IPL, or results in the reduction of resources. A user is notified, allowing the customer to continually utilize the system, while ordering hardware to be replaced.
Claims
1. An apparatus for implementing power up detection in a power down cycle to dynamically determine whether a failed component in a system would prevent re-IPL or result in a loss of resources comprising: a system wide firmware component manager; the system wide firmware component manager configured to collect power down/up data that prevents re-IPL wherein collecting the power down/up data comprises identifying the system is able to run with the system's actual temperatures, but that incorrect system thermal readings would cause the system to not re-IPL; and the system wide firmware component manager configured to notify a user to prevent re-IPL, allowing the customer to continually utilize the system, while ordering hardware to be replaced.
2. The apparatus as recited in claim 1, includes the system wide firmware component manager configured to perform power off self-test, and build the system wide firmware component manager.
3. The apparatus as recited in claim 1, includes the system wide firmware component manager configured to collect any information that results in resource loss.
4. The apparatus as recited in claim 1, includes global interface communication between the system wide firmware component manager and firmware components.
5. The apparatus as recited in claim 4, wherein the global interface communication between the system wide firmware component manager and firmware components is configured to identify loss of resource.
6. The apparatus as recited in claim 5, wherein the global interface communication between the system wide firmware component manager and firmware components is configured to identify error.
7. The apparatus as recited in claim 6, wherein the system wide firmware component manager is configured to use identified loss of resource and identified error to prevent re-IPL.
8. The apparatus as recited in claim 1, include the system wide firmware component manager configured to alert the user and identify failed hardware and potential loss of resources.
9. The apparatus as recited in claim 8, further includes the system wide firmware component manager configured to notify the user not to power cycle due to hardware failure until failed hardware is available.
10. A method for implementing power up detection in a power down cycle to dynamically determine whether a failed component in a system would prevent re-IPL or result in a loss of resources comprising: providing a system wide firmware component manager; collecting power down/up data that prevents re-IPL, wherein collecting the power down/up data comprises identifying that the system is able to run with the system's actual temperatures, but that incorrect system thermal readings would cause the system to not re-IPL; and notifying a user to prevent re-IPL, allowing the customer to continually utilize the system, while ordering hardware to be replaced.
11. The method as recited in claim 10, wherein notifying the user includes alerting user and identifying failed hardware.
12. The method as recited in claim 11, includes notifying the user of potential loss of resources.
13. The method as recited in claim 10, includes notifying the user not to power cycle due to hardware failure until failed hardware is available.
14. The method as recited in claim 10, wherein providing a system wide firmware component manager includes providing control code for performing power off self-test and building the system wide firmware component manager.
15. The method as recited in claim 10, includes providing predefined mandatory functions to collect power down/up data that prevents re-IPL.
16. The method as recited in claim 10, includes providing global interface communication between the system wide firmware component manager and firmware components.
17. The method as recited in claim 16, includes using the global interface communication between the system wide firmware component manager and firmware components to identify error.
18. The method as recited in claim 16, includes using the global interface communication between the system wide firmware component manager and firmware components to identify loss of resource.
19. The method as recited in claim 10, using identified loss of resource and identified error to notify user not to power cycle.
20. The method as recited in claim 10, includes notifying the user not to power cycle due to hardware failure until failed hardware is available.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The present disclosure together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the disclosure illustrated in the drawings, wherein:
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(6) In the following detailed description of embodiments of the disclosure, reference is made to the accompanying drawings, which illustrate example embodiments by which the disclosure may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the disclosure.
(7) The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
(8) In accordance with features of the disclosure, a method and apparatus are provided for implementing power up detection in a power down cycle to dynamically determine whether a failed component in a system prevents Initial Program Load (IPL) or re-IPL, or results in a loss of resources. Predefined mandatory functions are called to collect power down/up data that prevents re-IPL, or results in the reduction of resources. A user is notified not to re-IPL or power off the system, allowing the customer to continually utilize the system, while ordering hardware to be replaced.
(9) Referring now to
(10) Computer system 100 includes a system memory 106. System memory 106 is a random-access semiconductor memory for storing data, including applications and programs. System memory 106 is comprised of, for example, a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a current double data rate (DDRx) SDRAM, non-volatile memory, optical storage, and other storage devices.
(11) I/O bus interface 114, and buses 116, 118 provide communication paths among the various system components. Bus 116 is a processor/memory bus, often referred to as front-side bus, providing a data communication path for transferring data among CPUs 102 and caches 104, system memory 106 and I/O bus interface unit 114. I/O bus interface 114 is further coupled to system I/O bus 118 for transferring data to and from various I/O units.
(12) As shown, computer system 100 includes a storage interface 120 coupled to storage devices, such as a direct access storage device (DASD) 122 and a CD-ROM 124. Computer system 100 includes a terminal interface 126 coupled to a plurality of terminals 128, #1-M, a network interface 130 coupled to a network 132, such as the Internet, local area or other networks, shown connected to another separate computer system 133, and an I/O device interface 134 coupled to I/O devices, such as a first printer/fax 136A, and a second printer 136B.
(13) I/O bus interface 114 communicates with multiple I/O interface units 120, 126, 130, and 134, which are also known as I/O processors (IOPs) or I/O adapters (IOAs), through system I/O bus 118. System I/O bus 118 is, for example, an industry standard PCI bus, or other appropriate bus technology.
(14) System memory 106 stores an operating system 140, a user interface 142, predefined component functions to collect power down/up data that would prevent re-IPL 144, and a firmware component manager control logic 146 in accordance with the preferred embodiments. The predefined component functions to collect power down/up data that would prevent re-IPL 144 and the firmware component manager control logic 146 together provide a system wide firmware mechanism to manage the firmware components, to identify whether the system will be able to re-IPL, and to warn customers of reduced resources and failed re-IPL.
(15) In accordance with features of the disclosure, the mechanism encompasses multiple system wide components and includes identifying resource reduction and errors, optionally including error logs. Power up detection in a power down cycle is used to manage the firmware components, to identify whether the system will be able to re-IPL, and to warn customers of reduced resources that would result from the re-IPL. Whether a failed component in the system would prevent re-IPL or would result in a reduction of resources is detected automatically and dynamically, and the customer or user is then alerted not to re-IPL or power off the system.
(16) Referring now to
(17) As indicated at a block 204, mandatory functions are pre-defined to collect power down/up data that would prevent re-IPL. For example, a system configuration is identified that would prevent re-IPL, even though the system can currently run in this configuration. For example, incorrect thermal readings could cause the system not to re-IPL, while the system can currently run with the actual temperatures. As indicated at a block 206, any information that would result in resource loss is collected.
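The thermal example above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the names `ThermalSensor` and `check_thermal_reipl`, the limit values, and the assumption that an actual temperature is available separately from the reported reading (for instance from a redundant sensor) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits, chosen for illustration only.
RUN_LIMIT_C = 85.0               # assumed runtime thermal limit
IPL_VALID_RANGE_C = (0.0, 70.0)  # assumed window a reading must fall in for IPL


@dataclass
class ThermalSensor:
    name: str
    reported_c: Optional[float]  # value the sensor reports; None if unreadable
    actual_c: float              # assumed to be known from another source


def check_thermal_reipl(sensor: ThermalSensor) -> dict:
    """Hypothetical mandatory function: would this sensor block a re-IPL?"""
    low, high = IPL_VALID_RANGE_C
    # The running system depends on the actual temperature.
    runs_now = sensor.actual_c < RUN_LIMIT_C
    # The IPL path is assumed to reject missing or out-of-window readings.
    reading_ok = sensor.reported_c is not None and low <= sensor.reported_c <= high
    return {
        "component": sensor.name,
        "runs_now": runs_now,
        "prevents_reipl": not reading_ok,
    }
```

In this sketch, a sensor that reports an implausible value while the actual temperature is normal yields `runs_now` true and `prevents_reipl` true, which is the situation described for block 204.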
(18) As indicated at a block 208, global interface communication is provided between the firmware manager and firmware components. It is determined whether a power cycle would result in a failed re-IPL or loss of resources, as indicated at a decision block 210. If it is determined that a power cycle would result in a failed re-IPL or loss of resources, the user is alerted, the failed part or component is reported, and the potential resource loss is displayed, as indicated at a block 212. At block 212, the user is informed not to power cycle, due to significant hardware failure, until the failed hardware is available. Otherwise, if it is determined that a power cycle would not result in a failed re-IPL or loss of resources, or after the user has been alerted at block 212, operations exit as indicated at a block 214.
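The flow of blocks 210, 212, and 214 might be sketched as follows. This is only one possible reading of the flow: the `ComponentReport` record, its fields, and the message wording are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentReport:
    component: str
    prevents_reipl: bool = False
    loss_of_resource: bool = False
    failed_part: str = ""       # hypothetical part identifier
    resource_at_risk: str = ""  # hypothetical description of the resource


def evaluate_power_cycle(reports: List[ComponentReport]) -> List[str]:
    """Return user alerts; an empty list means a power cycle is safe."""
    # Decision block 210: would a power cycle fail re-IPL or lose resources?
    flagged = [r for r in reports if r.prevents_reipl or r.loss_of_resource]
    if not flagged:
        return []  # block 214: nothing to report, exit

    # Block 212: report the failed part and the potential resource loss.
    messages = []
    for r in flagged:
        if r.prevents_reipl:
            messages.append(
                f"{r.component}: failed part {r.failed_part} would prevent re-IPL"
            )
        if r.loss_of_resource:
            messages.append(
                f"{r.component}: power cycle would lose {r.resource_at_risk}"
            )
    messages.append("Do not power cycle until replacement hardware is available")
    return messages
```

Returning the alerts as a list, rather than printing them, keeps the decision logic separate from whatever user interface delivers the notification.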
(19) Referring now to
(20) It should be understood that the features of the disclosure are not limited to the illustrated criteria of loss of resource, error with priority level, and prevent re-IPL, as listed in FIG. Other criteria could be identified for firmware components, such as the firmware components 304, A, B, C, and D in accordance with features of the disclosure.
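As a rough illustration of how a manager could gather the three criteria named above from several firmware components, consider the following Python sketch. The class `FirmwareComponentManager`, the `ComponentStatus` record, the callable-based registration, and the convention that a lower priority number is more urgent are all assumptions made for this example, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ComponentStatus:
    loss_of_resource: bool
    error_priority: Optional[int]  # None = no error; lower = more urgent (assumed)
    prevents_reipl: bool


class FirmwareComponentManager:
    """Hypothetical manager that polls registered firmware components."""

    def __init__(self) -> None:
        self._components: Dict[str, Callable[[], ComponentStatus]] = {}

    def register(self, name: str, status_fn: Callable[[], ComponentStatus]) -> None:
        # Each component exposes one status function through a shared interface.
        self._components[name] = status_fn

    def collect(self) -> Dict[str, ComponentStatus]:
        # Query every registered component for its current status.
        return {name: fn() for name, fn in self._components.items()}

    def summary(self) -> dict:
        statuses = self.collect()
        return {
            "prevents_reipl": sorted(
                n for n, s in statuses.items() if s.prevents_reipl
            ),
            "loss_of_resource": sorted(
                n for n, s in statuses.items() if s.loss_of_resource
            ),
            # Errors ordered by priority, then by component name.
            "errors": sorted(
                (s.error_priority, n)
                for n, s in statuses.items()
                if s.error_priority is not None
            ),
        }
```

Additional criteria could be supported in such a sketch by adding fields to `ComponentStatus` and matching entries to `summary`.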
(21) In brief summary, if any function is determined to be failing in a way that could cause the system not to re-IPL or could result in loss of resources, the user or customer is automatically alerted, allowing the customer to continue using the system while ordering the hardware that needs to be replaced.
(22) Referring now to
(23) Computer readable program instructions 404, 406, 408, and 410 described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The computer program product 400 may include cloud based software residing as a cloud application, commonly referred to as Software as a Service (SaaS). The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions 404, 406, 408, and 410 from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
(24) A sequence of program instructions or a logical assembly of one or more interrelated modules defined by the recorded program means 404, 406, 408, and 410 directs the system 100 for implementing enhanced resource utilization of the preferred embodiment.
(25) While the present disclosure has been described with reference to the details of the embodiments of the disclosure shown in the drawing, these details are not intended to limit the scope of the disclosure as claimed in the appended claims.