METHOD AND DEVICE FOR MONITORING THE RELIABILITY OF AN ELECTRONIC SYSTEM
20220043056 · 2022-02-10
Cpc classification
G01R31/2849
PHYSICS
G01R31/2872
PHYSICS
Abstract
The invention relates to a method for monitoring the reliability of an electronic system, in particular an electronic system comprising one or more electronic components, and to an apparatus configured to execute the method. The method comprises: (i) repeatedly measuring, at different measurement times and according to a predetermined transmission quality measure, a transmission quality of signals transmitted to or from the electronic system over a wired electrical signal transmission path; (ii) comparing, for each of the measurement times, the associated measured transmission quality with a respective associated transmission quality reference value previously determined according to the transmission quality measure; and (iii) determining a value of a reliability indicator associated with the respective measurement time in dependence on the result of the associated comparison. The transmission quality measure is defined as a measure of the extent of a subrange of a one- or multi-dimensional operating parameter range of the electronic system in which, according to a predetermined reliability criterion, the electronic system operates reliably.
Claims
1. A method (200) for monitoring the reliability of an electronic system (100a; 100b), comprising: repeatedly measuring (225), at different measurement times (t) and according to a predetermined transmission quality measure, a transmission quality (V) of signals transmitted to or from the electronic system over a wired electrical signal transmission path (L1); comparing (245), for each of the measurement times, the associated measured transmission quality (V) with a respective associated transmission quality reference value (R) previously determined according to the transmission quality measure; and determining (255, 260) a value of a reliability indicator (Z) associated with the respective measurement time in dependence on the result of the associated comparison; wherein the transmission quality measure is defined as a measure of the extent of a subrange (T) of a one- or multi-dimensional operating parameter range (B) of the electronic system in which, according to a predetermined reliability criterion (P), the electronic system operates reliably.
2. The method according to claim 1, wherein the transmission quality measure is defined (a) in the case of a one-dimensional operating parameter range (B) as a function of at least one of the following: an extent (A1; A2; A3; A4) of the subrange (T) within the operating parameter range; the ratio of an extent (A1; A2; A3; A4) of the subrange (T) within the operating parameter range to a specified maximum extent of the operating parameter range; (b) in the case of a multidimensional operating parameter range, as a function of at least one of the following: the ratio of an at least two-dimensional extent (A1) of the subrange to the corresponding specified extent of the entire operating parameter range or a defined section thereof; an extent (A2; A3; A4) of the sub-range along an extension direction determined by a selected single dimension of the operating parameter range or along an extension direction determined by a linear combination of a plurality of its dimensions; the ratio of an extent (A2; A3; A4) of the subrange along an extension direction determined by a selected single dimension or along an extension direction determined by a linear combination of multiple dimensions of the operating parameter range to a specified maximum extent of the operating parameter range along that extension direction; or (c) as a function of a transmission quality measured for one or more selected individual operating points within the operating range or a predetermined section thereof in each case and is evaluated in accordance with the predetermined reliability criterion.
3. The method according to claim 2, wherein the transmission quality measure is additionally defined as a function of at least one parameter characterizing the location of the subrange within the operating parameter range.
4. The method according to any one of the preceding claims, wherein the transmission quality is determined by means of the transmission quality measure as a function of data representing a Shmoo diagram spanning the operating range and the subrange therein.
5. The method according to claim 4, wherein the transmission quality (V) to be measured for the respective measurement time (t) is determined as a function of the value of at least one parameter representing a certain at least approximately determined extent (A1, . . . , A4) of the subrange within the Shmoo diagram.
6. The method according to any one of the preceding claims, wherein: for each measurement time (t), the respective measurement of the transmission quality comprises measuring a respective transmission error rate with respect to a digital signal transmission over the signal transmission path (L1) for a plurality of operating points (WP) within the specified multi-dimensional operating parameter range; and the transmission quality for the respective measurement time is determined on the basis of the associated transmission error rates (BER) thus measured.
7. The method according to claim 6, wherein the comparison (245) of the transmission quality measured at a respective measurement time with an assigned respective reference value for the transmission quality determined beforehand in accordance with the transmission quality measure is performed in such a way that the transmission quality measured at the respective measurement time is included in the comparison in the form of a mean value or median value which results from an averaging (225) or median formation on the basis of this measured transmission quality and at least one transmission quality (V) measured at an earlier measurement time with respect to the transmission path, these transmission qualities (V) having been determined in each case in accordance with the transmission quality measure.
8. The method according to any one of the preceding claims, wherein the respective reference value (R) associated with one or more measurement times is determined as a function of at least one of the operating parameters.
9. The method according to any one of the preceding claims, wherein the operating parameter range has, each as a dimension, at least one of the following operating parameters: an operating parameter characterizing a signal level (U) of a signal transmitted via the transmission path; an operating parameter characterizing a signal phase (ϕ) of a signal transmitted via the transmission path; an operating parameter characterizing an operating temperature of the electronic system or the transmission path; an operating parameter characterizing an aging condition of the electronic system or the transmission path; an operating parameter that identifies one or more external influences or impacts to which the electronic system is exposed.
10. The method according to any one of the preceding claims, wherein the reference values (R) for the transmission quality are determined at different measurement times (t) prior to the determination of the subrange (T) under specific reference conditions and are stored in a data structure within a memory device (130) for subsequent comparison with the results of the repeated measurement of the transmission quality.
11. The method according to any of the preceding claims, wherein the respective value of the reliability indicator (Z) is determined to indicate sufficient reliability or lack of reliability depending on the result of the comparison.
12. The method according to claim 11, wherein the determination of the respective value of the reliability indicator is performed on the basis of a test criterion (P) which is dynamically adapted depending on a plurality of previously determined values of the reliability indicator or, as the case may be, of the resulting reliability indicator.
13. The method according to claim 12, wherein the dynamic adaptation of the test criterion (P) is performed in the context of machine learning on the basis of the previously determined values of the reliability indicator or, as the case may be, the resulting reliability indicator.
14. The method according to any one of claims 11 to 13, wherein the value of the reliability indicator (Z) is determined as a function of a deviation (A), determined during the comparison, of the respective transmission quality (V) measured according to the transmission quality measure from the associated respective reference value (R) based on one of the following test criteria (P): if the determined deviation does not fall below a predetermined fixed value for the minimum deviation, the value of the reliability indicator is set to indicate sufficient reliability and otherwise a lack of reliability; if the determined deviation does not fall below a value for the minimum deviation defined as a function of the measurement time, the value of the reliability indicator is set to indicate sufficient reliability and otherwise a lack of reliability; if the respective measurement time is before the time of reaching a predetermined state of aging of the electronic system, the value of the reliability indicator is set to indicate sufficient reliability and otherwise a lack of reliability.
15. The method according to any one of the preceding claims, wherein: at least one electronic component (K) is arranged in the transmission path (L1); and the determination of the transmission quality at the respective measurement time is carried out depending on the type of the at least one component.
16. The method according to any one of the preceding claims, further comprising: applying the method according to any one of the preceding claims to determine a respective further reliability indicator for the measurement times based on a repeated measurement of the transmission quality according to the predetermined transmission quality measure of signals transmitted to or from the electronic system via a second signal transmission path (L2; L3) different from the first signal transmission path; and determining a resulting reliability indicator at the respective measurement time for the electronic system or at least one of the transmission paths as a function of the reliability indicator determined for the first transmission path for the respective measurement time and the further reliability indicator determined for this respective measurement time for the second transmission path.
17. The method according to any one of the preceding claims, further comprising: triggering (265) an action of the electronic system itself or an action of another entity acting thereon, when (250) according to the determined value of the reliability indicator or, as the case may be, the resulting reliability indicator, the electronic system is no longer sufficiently reliable.
18. An apparatus (105) for monitoring the reliability of an electronic system (100a, 100b), the apparatus being configured to perform the method (200) according to any one of the preceding claims.
19. A computer program comprising instructions that, when executed on one or more processors (125) associated with an apparatus (105) according to claim 18, cause the apparatus (105) to execute the method (200) according to any one of claims 1 to 17.
Description
[0056] Further advantages, features and possible applications of the present invention will be apparent from the following detailed description in connection with the figures.
[0062] Throughout the figures, the same reference signs are used for the same or corresponding elements of the invention.
[0063] The system 100a to be monitored, illustrated in
[0064] All components of the system are located on a circuit board (not shown) and are mechanically and electrically connected to it by means of corresponding solder joints on associated conductor tracks of the circuit board.
[0065] Alternatively, the semiconductor circuit 105 alone, rather than the overall arrangement 100a, can be regarded as the system 100b to be monitored. The system 100a or 100b is thus capable of monitoring its own reliability by means of the method according to the invention in the sense of a self-test, so that it simultaneously represents a possible embodiment of the device according to the invention.
[0066] In the following, with reference to
[0067] The flow of the method 200 may be divided into three sequential phases, with a first phase being a preparatory phase that may be run only once and that includes a preparatory process or step 205 (the terms “process” and “step” are used herein synonymously, and may refer to a single action or a plurality of actions combined into a process, depending on the context).
[0068] As part of this preparatory process 205, the values for a time-dependent reference value R(t) for the transmission quality determined according to a defined transmission quality measure are defined and stored in a lookup table LUT in the memory device 130 of the semiconductor circuit 105. The determination of the time history of the reference value, and thus of said values, may be performed in particular to test the fault susceptibility of the system 100a or 100b by performing the second phase of the method described below under well-defined test conditions to generate a data set representing an initial version of the Shmoo diagram shown in
[0069] At the same time, the shape, size and position of a minimum sub-range D of the operating range B required for proper operation of the system 100a or 100b is known (cf.
[0070] The reference value R(t1) is now selected so that it would correspond in the Shmoo diagram to a closed curve or area which, on the one hand, completely envelops the minimum subrange and, on the other hand, runs completely in the subrange T spanned by the operating points in the operating parameter range B of the system at which the system operates faultlessly and reliably at time t1.
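The selection rule for R(t1) can be illustrated by a minimal sketch. It assumes, purely for illustration, that the Shmoo diagram is discretized into operating points given as (level index, phase index) pairs and that the area corresponding to the reference value is represented by the set of points it covers. The function name and the set representation are hypothetical and are not taken from the description.

```python
# Operating points are modeled as (level_index, phase_index) tuples.
# This discretized representation is an assumption made for illustration.
Point = tuple[int, int]


def is_valid_reference_region(
    region: set[Point],
    minimum_subrange: set[Point],
    reliable_subrange: set[Point],
) -> bool:
    """Check the two conditions stated for R(t1).

    The region must contain the minimum subrange D completely and must
    itself lie completely inside the reliable subrange T.
    """
    return minimum_subrange <= region <= reliable_subrange


# Example: T is a 3x3 block of passing points, D is its center point.
T = {(x, y) for x in range(1, 4) for y in range(1, 4)}
D = {(2, 2)}
candidate = {(2, 2), (2, 3)}
print(is_valid_reference_region(candidate, D, T))
```

A candidate region that omits a point of D, or that includes a point outside T, is rejected by the same check.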
[0071] Starting from this reference value R(t1), a desired time course R(t) can then be defined. In particular, the reference value R can also be defined as a constant (cf.
[0072] After step or process 205, which in particular can still be carried out at the factory, the second phase of the method begins, in which the actual monitoring of the system takes place, for example during its use in the field. In a step 210, a new current measurement time t is first determined for this purpose, which can be done in particular by incrementing (t:=t+Δt) a previous measurement time. The value t can be used in particular as a time index to index different successive, but not necessarily equidistant, measurement times. In the present example, however, the above-mentioned incrementation is used, resulting in equidistant measurement times t1, t2, . . . , t10.
[0073] In a further step 215, an associated test criterion P(t) is determined for the current measurement time t. Where such values are available, this is done on the basis of a plurality of values of the reliability indicator Z(t) that were determined at preceding measurement times in steps 255 and 260, respectively, of earlier loop iterations of the second phase of the method. Then, in a step 220, an error threshold F(t) associated with the current measurement time is read out from the lookup table LUT stored in the memory device 130. This error threshold can in particular be defined ab initio in the lookup table LUT as a fixed value or value progression, or it can be defined only in the context of step 205 of the preparatory phase, in particular on the basis of the results of the transmission quality measurements performed there at the various operating points WP, and written to the lookup table LUT.
[0074] Now, in a step 225 for the current measurement time t, the actual measurement of the reliability of the system 100a or 100b can begin. For this purpose, in step 225 the system is tested successively at each operating point from a set of predetermined operating points WP (cf.
[0075] Now, in a further step 230, all operating points i in the operating parameter range B are identified whose respective current averaged transmission quality (BER(i, t)) lies above the error threshold F(t) assigned to the current measurement time t. These operating points i thus span the current subrange T(t) of the operating parameter range B in which the system 100a or 100b, respectively, operates reliably.
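Step 230 can be sketched as a simple filter over the measured operating points. The sketch assumes that the averaged bit error rate is first converted into a score for which larger values mean better transmission (here the negative decadic logarithm of the BER), so that a point belongs to the subrange T(t) when its score lies above F(t). This conversion, the function names and the dictionary layout are assumptions for illustration only.

```python
import math


def quality_from_ber(ber: float, floor: float = 1e-15) -> float:
    """Map a bit error rate to a 'larger is better' score (assumed convention).

    The floor avoids taking the logarithm of zero when no errors were seen.
    """
    return -math.log10(max(ber, floor))


def reliable_subrange(avg_ber: dict, threshold: float) -> set:
    """Return the operating points whose score lies above the threshold F(t)."""
    return {
        point
        for point, ber in avg_ber.items()
        if quality_from_ber(ber) > threshold
    }


# Averaged BER per operating point (level index, phase index); made-up values.
measurements = {(0, 0): 1e-12, (0, 1): 1e-6, (1, 0): 1e-3}
print(reliable_subrange(measurements, threshold=9.0))
```

With these made-up values only the point (0, 0) remains in the subrange for a threshold of 9.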
[0076] Based on these determined operating points, a selected extent V(t) of the subrange T(t) is now determined in step 235 as a measure of the transmission quality at measurement time t. Accordingly, said transmission quality measure is defined by the selected type of extension. In
[0077] The transmission quality measure can in particular also be defined as a relative measure. This can be achieved in particular by relating the specific extent of the subrange T(t) approximately (i) to the corresponding total extent of the operating parameter range B (cf.
[0078] Then, in a step 240, the reference value R(t) assigned to the current measurement time t is read from the lookup table LUT of the memory device 130 and, in a further step 245, a comparison of the determined extent V(t) of the subrange T(t) with the read reference value R(t) is performed by means of difference formation. The difference A(t)=V(t)−R(t) determined in this process represents a deviation of the extent V(t) from the reference value R(t) (cf.
[0079] According to the result of the test, the value of a reliability indicator Z(t) is then set in step 255 in the case of insufficient reliability (250—no) or in step 260 in the case of sufficient reliability (250—yes) to a value indicating the test result, for example to a correspondingly defined Boolean value “0” or “1”.
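The chain from the extent V(t) through the deviation A(t) to the indicator Z(t) can be sketched as follows. The sketch assumes a relative extent (the share of operating points of B that lie in T(t)), a fixed minimum deviation as the test criterion, and the convention that the value 1 indicates sufficient reliability. These choices and all function names are illustrative assumptions, not the only variants covered by the description.

```python
def relative_extent(subrange: set, operating_range: set) -> float:
    """V(t): share of the operating parameter range B covered by T(t)."""
    if not operating_range:
        raise ValueError("operating parameter range must not be empty")
    return len(subrange & operating_range) / len(operating_range)


def reliability_indicator(v: float, r: float, min_deviation: float = 0.0) -> int:
    """Z(t) from the deviation A(t) = V(t) - R(t).

    Returns 1 (sufficient reliability, assumed convention) if the deviation
    does not fall below min_deviation, otherwise 0.
    """
    deviation = v - r
    return 1 if deviation >= min_deviation else 0


# Four operating points in B, three of which currently pass.
B = {(0, 0), (0, 1), (1, 0), (1, 1)}
T = {(0, 0), (0, 1), (1, 0)}
v = relative_extent(T, B)
print(v, reliability_indicator(v, r=0.5), reliability_indicator(v, r=0.9))
```

A time-dependent criterion can be modeled in the same way by passing a value of min_deviation that depends on the measurement time.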
[0080] In case of an optional use (not explicitly shown in
[0081] In particular, it is thus possible in many cases to distinguish system-internal defects or faults, which have their cause in a transmission path or in one or more components of the system itself, from faults which have an external cause and occur, in particular only temporarily, for example through electromagnetic coupling of interference signals. Such a distinction is possible in particular if the reliability curves of the two or more different transmission paths differ clearly, so that it can be assumed that a first subset of the transmission paths is subject to an external disturbance, while this is not the case for the second subset formed by the other transmission path or paths.
[0082] In the case of step 260, the method then immediately branches back to step 210, while in the case of step 255, this is preceded by step 265, in which a predetermined action, for example error signaling, is triggered.
[0083] Some further embodiments of the invention are described below:
[0084] By using communication interfaces and corresponding protocols, data packets in particular can be transmitted between chipsets via the transmission path(s) involved in the method and thus these systems can be monitored with regard to the effects of material aging under automotive loads.
[0085] Furthermore, according to one embodiment, the look-up table can already be stored on the assembly during assembly production in a reference module, for example by means of a programming step in a memory device designed as an NVM or e-Fuse.
[0086] In another embodiment, determining the look-up table may include determining signal-level-, phase-, temperature- and aging-dependent BER expectation values empirically on the basis of the assembly operating concept, in particular using a statistical design of experiments (DoE), for example a Shmoo diagram.
[0087] Other possible embodiments are listed below in numbered order:
[0088] 1. A method for detecting at least one characteristic of an electronic component, the electronic component being determined to be used in a vehicle, on a vehicle or in services for a vehicle, the method comprising providing a communication interface for transmitting data relating to the electronic component to be examined.
[0089] 2. The method according to embodiment 1, wherein the method is used to monitor error-free transmission of data packets transmitted between chipsets by using one or a plurality of communication interfaces, in particular fast communication interfaces, and corresponding protocols.
[0090] 3. The method according to embodiment 1 or embodiment 2, wherein, as part of a built-in self-test, a bit error rate test (BERT) is repeatedly performed at the respective signal operating point and its result is compared with the BER values in a look-up table, wherein a deviation of the mean value of the repeated BERT from the BER values in the look-up table indicates that the signal quality has changed.
[0091] 4. The method according to at least one of embodiments 1 to 3, wherein the look-up table has signal-level- and phase-dependent BER expectation values representing specification and intervention limits, wherein in particular these expectation values are clustered according to temperature and aging condition.
[0092] 5. The method according to at least one of embodiments 1 to 4, further comprising storing expected values on an assembly.
[0093] 6. The method according to at least one of embodiments 1 to 5, further comprising storing expected values during assembly production in a memory section of the assembly or module of the assembly.
[0094] 7. The method according to at least one of embodiments 1 to 6, wherein the look-up table BER expectation values are stored on the assembly during assembly production in a reference module, in particular into an NVM or an e-fuse by means of a programming step.
[0095] 8. The method according to at least one of embodiments 1 to 7, wherein the values of the look-up table are based on an assembly operation concept.
[0096] 9. The method according to at least one of embodiments 1 to 8, wherein the values of the look-up table are based on a statistical experimental design, in particular on a Shmoo diagram.
[0097] 10. The method according to at least one of embodiments 1 to 9, wherein the signal-level-, phase-, temperature- and aging-dependent BER expectation values of the look-up table are determined empirically on the basis of the assembly operating concept (SW), or by means of a statistical design of experiments (DoE), for example a Shmoo diagram.
[0098] 11. The method according to at least one of embodiments 1 to 10, wherein the method is applied to an apparatus of a vehicle component.
[0099] 12. A device for carrying out the method according to at least one of embodiments 1 to 11, comprising a test device suitable for testing components of the automotive industry.
[0100] The present solution is applicable, among other things, to control units, sensors, as well as modules. The application of the method and device according to the invention can be made wherever complex or special functions are used based on semiconductor/electronic products.
[0101] Depending on the respective interface specification (e.g. PCIe Gen1/2, eSATA, USB) for the communication interfaces involved in the signal transmission, specified variance ranges exist for the levels and phases of the signals to be transmitted. During the DoE, signal operating points (level, phase, temperature, different pre-aging of the mounting and interconnect technology) are set within the specified variance range and a BERT is repeatedly performed at each of them. The resulting Shmoo eye diagrams correlate with the operating points, so that specification and intervention limits (reference values or value curves) can be derived from them.
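The DoE sweep can be sketched as a loop over a grid of operating points with repeated measurements at each point. In this sketch the callable bert stands in for the actual bit error rate test hardware and is purely hypothetical; pre-aging is left out for brevity, and the parameter names and example values are assumptions.

```python
from itertools import product
from statistics import fmean
from typing import Callable, Iterable


def run_doe(
    levels: Iterable[float],
    phases: Iterable[float],
    temperatures: Iterable[float],
    bert: Callable[[float, float, float], float],
    repeats: int = 3,
) -> dict:
    """Average repeated BERT results for every (level, phase, temperature) point."""
    results = {}
    for point in product(levels, phases, temperatures):
        # Repeat the test at the same operating point and store the mean BER.
        results[point] = fmean(bert(*point) for _ in range(repeats))
    return results


def fake_bert(level: float, phase: float, temperature: float) -> float:
    """Stand-in for a real measurement: BER grows as the signal level drops."""
    return 1e-9 / level


table = run_doe([0.4, 0.8], [0.0, 0.5], [25.0], fake_bert)
for point, mean_ber in sorted(table.items()):
    print(point, mean_ber)
```

The resulting dictionary has one averaged BER per operating point and could serve as raw data for deriving the limits stored in the look-up table.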
[0102] As an example, the “PCIe” interface is used here to further explain the invention and the advantages that can be achieved with it. PCI Express (“Peripheral Component Interconnect Express”, abbreviated PCIe or PCI-E) is a standard for connecting peripheral devices to the chipset of a main processor and provides a high data transfer rate per pin. The PCI Express protocol contains a very robust connection integrity scheme, but has some reliability limitations that are not immediately obvious. Each application packet contains a link-level cyclic redundancy check (LCRC) that is verified immediately upon receipt. An acknowledged/not acknowledged (ACK/NAK) mechanism handles the seamless retransmission of faulty packets and includes timeouts to ensure that broken connections do not go unnoticed.
[0103] However, one potential limitation is that the LCRC can only protect the data that is actually presented to the PCI Express interface logic—it does not provide a way to confirm that the data is actually correct. Additionally, retransmission of erroneous packets due to lack of acknowledgement (Not-Acknowledged/NAK) hides signal integrity issues in the physical link, as application software and even upper layer hardware are less likely to be aware of the retransmissions. Thus, whether due to a fundamental problem that occurs at design/manufacturing time, or due to aging, all but the most serious PCI Express link errors will be largely invisible to software.
[0104] Thus, an advantageous field for improvements in PCIe achievable according to the invention is in tracking reliability from the perspective of initial error-free transmission. For example, if each packet requires three attempts for successful delivery, the link may be reliable in the sense of correct data delivery, but not in the sense of error-free transmissions. Long experience with PCI Express has shown that channels with poor quality are the main source of problems. By means of the solution according to the invention, however, a deteriorating reliability of the PCIe transmission and thus of the electronic system involved can be detected at an early stage, and this at a time when no functional faults are yet occurring and thus the looming deterioration in reliability could not yet be detected by conventional means.
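The difference between correct delivery and error-free transmission can be made concrete with a small sketch. It assumes that the number of transmission attempts needed per packet is available as a list of integers, which is a hypothetical data source; a real PCIe link does not necessarily expose this to software, as noted above.

```python
def link_statistics(attempts_per_packet: list[int], max_attempts: int) -> dict:
    """Compare eventual delivery with first-attempt success.

    attempts_per_packet holds, per packet, the number of attempts needed;
    values above max_attempts are counted as not delivered.
    """
    if not attempts_per_packet:
        raise ValueError("at least one packet is required")
    total = len(attempts_per_packet)
    delivered = sum(1 for a in attempts_per_packet if a <= max_attempts)
    first_try = sum(1 for a in attempts_per_packet if a == 1)
    return {
        "delivered_fraction": delivered / total,
        "first_attempt_fraction": first_try / total,
    }


# Every packet needs three attempts: all data arrives, none on the first try.
print(link_statistics([3, 3, 3, 3], max_attempts=4))
```

In this example the delivered fraction is 1.0 while the first-attempt fraction is 0.0, which is the situation described above: the link delivers correct data but is not reliable in the sense of error-free transmission.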
[0105] Overall, the invention also provides the following advantages, among others: The cost-intensive assurance of the functional safety of modern assemblies and other electronic systems can be significantly improved. In particular, the degree of innovation of the components actually used in products can be increased in applications that are particularly demanding in terms of reliability, such as automotive engineering or aircraft technology, since the invention makes it possible to implement an additional monitoring and thus control component that did not previously exist. Thus, highly innovative products can also be used at the given time without violating reliability requirements.
[0106] It also provides an alternative to redundancy concepts in terms of cost, weight, and energy.
[0107] While at least one exemplary embodiment has been described above, it should be noted that a large number of variations thereon exist. It should also be noted that the exemplary embodiments described are only non-limiting examples, and it is not intended thereby to limit the scope, applicability, or configuration of the devices and methods described herein. Rather, the foregoing description will provide guidance to those skilled in the art for implementing at least one exemplary embodiment, it being understood that various changes in the operation and arrangement of the elements described in an exemplary embodiment may be made without departing from the subject matter set forth in each of the appended claims as well as its legal equivalents.
REFERENCE LIST
[0108] 100a, electronic system
[0109] 105 first semiconductor circuit, at the same time device according to the invention
[0110] 110 second semiconductor circuit
[0111] 115, 120 communication interfaces
[0112] 125 processor
[0113] 130 storage device
[0114] 200 exemplary embodiment of the method
[0115] 202-265 steps of method 200
[0116] A, A1, A4 various extensions of the subrange T
[0117] B operating parameter range
[0118] BER bit error rate
[0119] C transition range
[0120] D minimum operating range for proper operation
[0121] F fault threshold
[0122] I index for indexing the operating points WP
[0123] K electronic component
[0124] L1, . . . , L3 transmission lines
[0125] LUT look-up table
[0126] P test criterion
[0127] R reference value
[0128] T subrange
[0129] t1, . . . , t10 different measurement times
[0130] t index for indexing the measurement times
[0131] U signal level
[0132] ϕ signal phase
[0133] transmission quality
[0134] WP operating point(s) in the operating parameter range
[0135] Z reliability indicator