Devices, systems and methods for internet and failover connectivity and monitoring
11595483 · 2023-02-28
Assignee
Inventors
- Christopher Van Oort (Ames, IA, US)
- Donald Van Oort (West Okoboji, IA, US)
- Nathan Lafferty (Ames, IA, US)
Cpc classification
H04L43/0876
ELECTRICITY
H04L67/02
ELECTRICITY
H04L12/66
ELECTRICITY
H04L12/28
ELECTRICITY
H04L69/40
ELECTRICITY
International classification
H04L12/66
ELECTRICITY
H04L43/0876
ELECTRICITY
Abstract
The disclosed apparatus, systems and methods relate to a failover and internet connection monitoring system featuring a cloud server running an API, a probe, a firewall and a policy routing system. The failover connection monitoring system is capable of gathering and analyzing performance data and controlling the flow of packets to and from the internet over one or more connections to optimize performance of the network.
Claims
1. A failover connection monitoring and validation system, comprising: a. a server; b. a probe configured to issue routed packets, each packet comprising source information and destination information; c. a firewall configured to enforce policy routing; and d. a policy routing system on the firewall configured to route the routed packets through a network interface over a connection based on destination information for resolution on an endpoint on the server, wherein receipt of routed packets validates the status of the connection.
2. The failover connection monitoring system of claim 1, further comprising a platform adapted to gather performance data on multiple internet connections simultaneously.
3. The failover connection monitoring system of claim 1, further comprising a validation system, wherein the validation system is configured to provide connection validation for backup internet connections while they are otherwise not in use for primary internet traffic flows.
4. The failover connection monitoring system of claim 3, further comprising a secure system capable of making backup internet connection validations out-of-band.
5. The failover connection monitoring system of claim 1, further comprising a failover appliance.
6. The failover connection monitoring system of claim 1, wherein the probe is an installable device.
7. The failover connection monitoring system of claim 1, wherein the probe is software or firmware-based.
8. The failover connection monitoring system of claim 1, wherein the probe is in electronic communication with the firewall.
9. The failover connection monitoring system of claim 1, wherein the server is a cloud server.
10. A LAN connection monitoring and validation system, comprising: a. a cloud server; b. a local probe; and c. a firewall in operational communication with the local probe and cloud server via a primary connection and a secondary connection, wherein the local probe is configured to issue packets utilizing policy routing to route and label packets via the primary connection and secondary connection for resolution at the cloud server, and wherein receipt of labeled packets at the cloud server validates the status of the primary connection or the secondary connection.
11. The system of claim 10, wherein the LAN is configured to failover to the secondary connection when the primary connection is interrupted.
12. The system of claim 11, wherein the LAN is configured to failback to the primary connection when connectivity is restored.
13. The system of claim 10, wherein the system is configured to assess congestion via a summing approach.
14. The system of claim 10, wherein the system is configured to balance traffic via the primary and secondary connections.
15. The system of claim 10, wherein the server is configured to generate outage alerts.
16. The system of claim 10, wherein the server is configured to identify packet latency and induce failover.
17. The system of claim 10, wherein the server is configured to record network performance data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The novel features of the system are set forth with particularity in the claims that follow. A better understanding of the features and advantages of the system will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the system, devices and methods are utilized, and the accompanying drawings of which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
(34)
DETAILED DESCRIPTION
(35) The various embodiments disclosed or contemplated herein relate to a monitoring and failover system and associated devices and methods. In various implementations, the disclosed devices, systems and methods relate to technologies that enable the monitoring of internet connectivity and failures, including the ability to identify and/or predict failures and change connectivity to a failover connection on a local network. In various implementations, the systems, devices and methods employ policy routing techniques to force packets to follow identified routes so as to monitor the connectivity of a local network from a remote server for display and control of the associated components. Several implementations and aspects of these approaches are described herein.
(36)
(37) As best shown in
(38) As is shown in
(39) In the implementations of
(40) Continuing with the implementations of
(41) To illustrate this distinction, in the implementations of
(42) It is understood that such a wireless connection 24 is typically considerably more costly to use over time than the wired primary connection 3, and may be slower or face other challenges understood in the art. Therefore, in certain implementations, when the primary connection 3 is back online, it is typically highly preferable to failback to the primary connection 3 and resume “normal” service, wherein the system returns to routing the network traffic through the primary connection 3. Further, as would be appreciated by one of skill in the art, when the failover appliance 20 is functioning properly and providing seamless transition to the wireless connection 24, a terminal 7 user may not even be aware that the change in route to the internet has been made in either direction, that is between the primary 3 and secondary 24 connections. In certain circumstances, it is possible that the failover appliance 20 will continue to provide internet connectivity after the primary connection 3 has been restored.
(43) One important aspect of the system, according to certain embodiments, is being configured to enforce policy routing, such as via a policy routing system (shown generally at 10). That is, in various implementations, the policy routing system 10 is configured to generate and route packets with specificity and/or precedence through several connections 3, 24. In these implementations, these policy routed packets are received (or not) remotely by resolution on a server 40, thereby allowing users to validate the individual connections. Alternate implementations can incorporate static routes with affinity, precedence, and/or metrics. Further explanation of these implementations follows.
(44) As is also shown in
(45) In accordance with one implementation, the probe 30 performs connection monitoring by generating traffic from within the local network 1 that is tagged and specifically routed through the various connections 3, 24 for collection and analysis on a cloud 40 server which is connected to a database, and various networked components known in the art and understood to enable the systems and methods disclosed herein.
(46) In one implementation, the database (also represented generally at 40) comprises a central processing unit (“CPU”) and main memory, an input/output interface for communicating with various databases, files, programs, and networks (such as the Internet, for example), and one or more storage devices. The storage devices may be disk drive devices, CD ROM devices, or elsewhere in the cloud. The server 40 may also have an interface, including, for example, a monitor or other screen device and an input device, such as a keyboard, a mouse, a touchpad, or any other such known input device. The server 40 in various implementations can have a database 40A that is integral to a processor or accessible to the processor through a computer network or other suitable communication link. In one embodiment, the database 40A is comprised of a plurality of database servers, some or all of which can be integral to the central processor or alternatively can be physically separate from the processor.
(47) As would be appreciated by one of skill in the art, when signals sent from the probe 30 to the server 40 are not routed as described below, that test traffic can validate the performance and function of various network connections 3, 24. In certain implementations, the status of these various connections 3, 24 is able to be viewed on a dashboard 26 on, for example, a terminal 7 or mobile device 28, as is shown in
(48) Turning to
(49) In the implementations of
(50) As shown in
(51) In various implementations, the system 10 validates individual connections by sending “pulses” of data packets from the probe 30 to the cloud 40 servers, where the pulses are received, for example via an application programming interface (“API”). In various alternate implementations, the probe 30 functionality can be integrated with the firewall 5, such that the firewall 5 performs the described functions and generates the disclosed test traffic. In yet further implementations, other network devices including but not limited to wireless access points, network attached storage (“NAS”) devices or the failover appliance 20 (as shown in
(52) As shown in
(53) It is understood that in these implementations, and as shown in
(54) In the implementation of
(55) In various implementations, the system 10 can utilize the timing of the packets being sent to detect a down connection. For example, if packets are being sent every second or every minute, the system may be configured to establish a threshold such that two misses will be considered to indicate a downed connection. It is understood that the network administrator will be able to configure the number of misses that is appropriate for individual applications on the basis of the specific hardware in use in that implementation and the acceptable rate of false positives and/or false negatives. It is understood that in alternate implementations, a network administrator is not essential, and automated processes can be used to configure the system features, some non-limiting examples being artificial intelligence, machine learning and state system logic.
(56) Further, it is understood that the resistance of the system in certain implementations may require that an additional lag time be included in the thresholding calculation. For example, in one exemplary embodiment, a system pulses out a packet over the primary connection every minute, and the system resistance thresholding coefficient is set to a lag time of about 17 seconds. A total time of 2:17 (two misses, one per minute, plus the lag coefficient of 17 seconds) can then be used to validate that a connection is down. Therefore, in such an embodiment, the system can indicate to users that a connection is down after 2:17, utilizing an equation of timing of the misses plus lag time. It is understood that in various implementations, lag calculations of 0.0 seconds to 59 seconds or more may be used. Other amounts are of course possible in alternate implementations. It is further understood that in certain implementations, the pulse frequency and lag coefficients for primary and backup connections may vary, such as pulsing the backup connection less frequently to conserve data usage.
(57) Accordingly, in one exemplary implementation, a test packet is sent to the server every second, and if two packets are missed the system 10 is alerted that a connection may be down. When calculating the time that must pass before a connection is deemed down, the system must consider lag time created by resistance in the system. For example, if a packet is sent every second, two missed packets trigger the alert, and the resistance of the system 10 creates a lag time of 0.17 seconds, then the connection is deemed down if 2.17 seconds pass without the server receiving a packet. It is readily apparent that a large number of possible lag times can be used and calibrated to the individual network environment, resistance times and the like.
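The timing rule described in the preceding paragraphs (time of the allowed misses plus lag time) can be illustrated with a minimal Python sketch. The function names `down_deadline` and `is_down` are hypothetical and are not taken from the disclosure; the sketch assumes the deadline is simply the number of allowed misses multiplied by the pulse interval, plus the lag coefficient.

```python
def down_deadline(pulse_interval_s: float, allowed_misses: int, lag_s: float) -> float:
    """Seconds of silence after which a connection is deemed down.

    Assumption: deadline = allowed_misses * pulse_interval + lag.
    """
    return allowed_misses * pulse_interval_s + lag_s


def is_down(seconds_since_last_pulse: float, pulse_interval_s: float,
            allowed_misses: int = 2, lag_s: float = 0.0) -> bool:
    """True once the silence has reached the deadline for this connection."""
    deadline = down_deadline(pulse_interval_s, allowed_misses, lag_s)
    return seconds_since_last_pulse >= deadline


# One pulse per minute, two misses, 17 s lag: 137 seconds, i.e. 2:17.
print(down_deadline(60, 2, 17))
# One pulse per second, two misses, 0.17 s lag: approximately 2.17 seconds.
print(down_deadline(1, 2, 0.17))
```

Because the interval, miss count and lag are separate parameters, a backup connection could be given a longer pulse interval than the primary connection without changing the logic.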
(58) In various implementations, the system 10 can utilize the policy routing of individual packets to validate the various connections 3, 24 remotely, for example with a remote cloud 40 server. In certain implementations, the status of the various connections 3, 24 is able to be viewed on a dashboard 26 on, for example, the terminal or mobile device 28, as is shown in
(59) In some embodiments, state-machine logic, artificial intelligence, and/or machine learning is implemented to run on a network appliance in the place of server-side logic. The network appliance is used to send data to a server and/or control network routing functionality. Examples may include: applying quality of service during congested periods, keeping connections online during network downtime events, controlling the WAN connection that is used for whole or specific networks, turning some VLANs off and/or leaving others on during outages, and shifting DNS from non-encrypted to encrypted providers and from encrypted to non-encrypted providers or switching providers if issues are noticed.
(60) Returning to the implementations of
(61) In implementations with multiple interfaces 50, 52 and connections 3, 24, each interface/connection pair can be a default gateway, in that either interface 50, 52 can act as the primary default gateway as required.
(62) For example, in various implementations, in the absence of alternate or local routing instructions, packets (shown generally by reference arrows P) sent to and from the terminals 7A, 7B and/or the internet 8 are routed through the primary default gateway. In one example, the wired internet connection 3 may have a default gateway IP of 5.5.5.5 while the wireless internet connection 24 may have a default gateway IP of 6.6.6.6. In this implementation, during normal operation, the firewall 5 is configured to send all non-local traffic to 5.5.5.5, thereby routing it out by way of the primary connection 3. In this example, when the system 10 determines that the primary connection 3 has failed or is otherwise unavailable, the system defaults over to the failover appliance 20 to route all traffic to 6.6.6.6 and out the wireless connection 24.
(63) Various aspects of the system 10 can serve as a policy routing enforcement system. To achieve policy routing, it is understood that these packets P can also contain, in part, a routing table 60 that can dictate packet routing. That is, in certain implementations the packets P contain source IP and destination IP address information which is then evaluated via a router by analyzing the routing table to dictate if the traffic will be routed locally or routed to the internet. It is further understood that individual addresses can be resolved at discrete IP addresses. That is, the system 10 is able to utilize various IP addresses to direct traffic through a specific gateway/interface/connection to be received at a corresponding address to validate the specific connection.
(64) For example, in one illustrative implementation, the system can be configured for policy routing, such that for packets issued by the probe 30:
(65) a.website.com resolves by DNS to an IP address of 1.2.3.4
(66) b.website.com resolves by DNS to an IP address of 1.2.3.5
(67) Accordingly, in this implementation, the probe 30 can be configured to issue routed packets and correspondingly the firewall 5 can be configured to enforce policy routing according to Rules applied to the policy routing component (which can be a firewall, router, or other device configured to dictate policy routing).
(68) In yet another example, the following rules are given:
(69) Rule 1: route 1.2.3.5/32 6.6.6.6 signals to the firewall 5 that any packet sent to b.website.com, should be sent out the second gateway interface 52 to the failover internet connection 24; and
(70) Rule 2: route 0.0.0.0/0 5.5.5.5 signals to the firewall 5 to send all packets out via the primary connection 3.
(71) Accordingly, as would be understood, in this implementation, when packets arrive at the firewall 5, the firewall 5 evaluates the rules in order, such that if the conditions of rule 1 are not met, rule 2 will be executed, sending the packet via the primary connection 3. As rule 1 is specific to one destination (1.2.3.5), it creates a “pinhole” through which only traffic to that host (1.2.3.5) will flow across the backup connection 24 during normal operation, while all other packets arriving at the firewall 5 will not match the limited rule 1, and will then be routed according to rule 2, thus sending the packet via the primary connection 3.
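The ordered rule evaluation described above can be sketched in Python using the standard `ipaddress` module. This is an illustrative model only, not firewall configuration syntax; the `RULES` list and the `select_gateway` function are hypothetical names introduced for this example, while the addresses are those given in the text.

```python
import ipaddress

# (destination network, gateway) pairs, evaluated top to bottom as in the text.
RULES = [
    (ipaddress.ip_network("1.2.3.5/32"), "6.6.6.6"),  # Rule 1: pinhole to the failover connection
    (ipaddress.ip_network("0.0.0.0/0"), "5.5.5.5"),   # Rule 2: everything else via the primary
]


def select_gateway(destination: str, rules=RULES) -> str:
    """Return the gateway of the first rule whose network contains the destination."""
    addr = ipaddress.ip_address(destination)
    for network, gateway in rules:
        if addr in network:
            return gateway
    raise LookupError(f"no rule matches {destination}")


print(select_gateway("1.2.3.5"))  # test packet to b.website.com -> 6.6.6.6
print(select_gateway("1.2.3.4"))  # test packet to a.website.com -> 5.5.5.5
```

Adding a further connection amounts to inserting another specific rule ahead of the catch-all 0.0.0.0/0 entry.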
(72) It is understood that such policy routing can be expanded to encompass n connections. For example:
(73) Rule 1: route 1.2.3.5/32 6.6.6.6 signals to the firewall 5 that any packet sent to b.website.com, should be sent out the second gateway interface 52 to the failover internet connection 24; and
(74) Rule 2: route 1.2.3.6/32 7.7.7.7 signals to the firewall 5 that any packet sent to a third test address (1.2.3.6) should be sent out a third gateway interface (not shown) via a tertiary connection (not shown); and
(75) Rule 3: route 0.0.0.0/0 5.5.5.5 signals to the firewall 5 to send all packets out via the primary connection 3.
(76) In various implementations, any number of connections and rules are possible, as would be readily appreciated by a skilled artisan.
(77) By utilizing such rules, the validation system 10 can utilize the probe 30 to send test packets (shown by reference arrows A and B in
(78) In these implementations, the packets A, B can therefore contain specific IP addresses resolved differently on the cloud or other database 40 server—be they on the same host or multiple hosts—such that the packets are received in the same database on the cloud 40 with sufficient information for the cloud server to establish which connection 3, 24 the packet was routed through.
(79) In accordance with these implementations, the server/database 40 is correspondingly configured to run a module or specific code to correlate the receipt of these packets A, B to validate that either or both connections 3, 24 are “up” or “down.” That is, the packets A, B contain source IP address information and are received and/or resolved by different API endpoints on different domains or subdomains or server IP addresses. The receipt of any packet A, B destined for a specific endpoint (such as an IP address, domain or sub-domain) that is received at an alternate endpoint API can thus be correlated as a “down” event or even in some cases used to detect router misconfigurations for incorrect policy-based routing rules. Correspondingly, the local system 1 can be configured to switch-over traffic via the various interfaces 50, 52 as appropriate.
(80) Further, when these packets A, B are received on the cloud 40 server in these implementations, the tags allow the database 40 to establish the destination address the packets A, B were sent to, as well as the source address they were delivered from. This information is then used to determine whether a packet that was sent, for example, to 1.2.3.4, and that should have egressed via the 5.5.5.5 network, instead egressed via the 6.6.6.6 network. This is critically important because it allows the determination of failover connection 24 status. It is also important because there are several conditions which can cause a firewall (acting as a failover device) to mark a primary connection 50 as failed, and begin routing traffic over a failover connection interface 52 even though a primary connection 3 and interface 50 may be active.
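The server-side comparison of destination address and source address described above can be sketched as follows. This is a minimal sketch under stated assumptions: the source networks 5.5.5.0/24 and 6.6.6.0/24 are hypothetical stand-ins for "the 5.5.5.5 network" and "the 6.6.6.6 network", and the names `SOURCE_NETWORKS`, `EXPECTED_CONNECTION` and `classify_pulse` are introduced only for illustration.

```python
import ipaddress

# Assumption: the server knows which source network each connection egresses from.
SOURCE_NETWORKS = {
    "primary": ipaddress.ip_network("5.5.5.0/24"),
    "failover": ipaddress.ip_network("6.6.6.0/24"),
}

# Connection each test destination is policy-routed through, per the example rules.
EXPECTED_CONNECTION = {"1.2.3.4": "primary", "1.2.3.5": "failover"}


def classify_pulse(source_ip: str, destination_ip: str) -> str:
    """Compare the connection a pulse should have used with the one it actually used."""
    src = ipaddress.ip_address(source_ip)
    actual = next((name for name, net in SOURCE_NETWORKS.items() if src in net), None)
    expected = EXPECTED_CONNECTION.get(destination_ip)
    if actual is None or expected is None:
        return "unknown"
    if actual == expected:
        return f"{expected} up"
    # The pulse egressed over the wrong connection: failover or misconfiguration.
    return f"{expected} pulse arrived via {actual}"


print(classify_pulse("5.5.5.20", "1.2.3.4"))  # primary up
print(classify_pulse("6.6.6.20", "1.2.3.4"))  # primary pulse arrived via failover
```

A mismatch result is the case the text highlights: the firewall is routing primary-destined traffic over the failover interface, whether or not the primary connection has actually failed.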
(81) It is understood that in certain implementations, the firewall 5 can be induced to failover based on assessing the connection at a specific local address based on port number. As one illustrative embodiment, if a packet routed to an address 1.1.1.10 responds to the system via a secondary port (for example, 222), the system 10 will identify this as a network failure and the system 10 will failover to a secondary or tertiary connection. If the address stops responding in this manner, the system 10 switches back to the primary connection.
(82) By enforcing policy routing, the failover system 10 and probe 30 can perform a variety of operations. The system 10 can provide link-state information for multiple Internet connections which are connected to a location and provide failover state information. In certain implementations, the system 10 can also calculate performance statistics: up-time, down-time, time in failover, latency, congestion and other factors. In certain implementations, the system 10 provides a dashboard 26 that offers a “single pane of glass” view of all locations for an enterprise, such that an IT manager or other professional can prioritize repairs and monitor connection usage. As would be appreciated by one of skill in the art, in certain implementations, the data collection and machine learning/artificial intelligence approaches to pattern recognition can be utilized so as to add predictive aspects to the dashboard interface for end users.
(83) Various views of the dashboard 26 are shown in
(84) As discussed herein above, in certain implementations, when the system identifies an outage or other need to failover, be it at the server 40 level, probe 30 level, or by a user, the local network 1 can be commanded to failover to an alternate WAN connection. It is understood that certain detected disruptions will fully prevent direct communication via a down connection between the probe 30 and the server 40. In these implementations, the server 40 and/or probe 30 may quantitatively or qualitatively realize a signal interruption. In the case of a cloud server, the server 40 or probe 30 (and any associated systems and components) can apply logic accordingly (down, traffic coming out of the primary WAN that should have come from secondary WAN, and the like).
(85) In implementations where an interruption is identified at the server 40 level, the system 10 is able to communicate back to the probe 30 or other local components via a variety of network-level electronic communication I/O (input/output), such as polling, as one in the art would appreciate. That is, in certain implementations a “signal” can be dispatched from the server 40 to the probe 30 to induce a failover. Many methods for this polling or other communication are possible, such as via representational state transfer (“REST”), Simple Object Access Protocol (“SOAP”), Web Services Description Language (“WSDL”), WebSockets, an SSH tunnel (or other secure channel), or other custom protocols built on top of TCP, UDP or the like traffic.
(86) Several additional applications and implementations of the system 10 operating at the cloud or external server 40 level are depicted in
(87) In accordance with some implementations, certain quality of service (“QoS”) or other traffic shaping, splitting, and classification and applied routing or load balancing connection switching can be achieved, which may be referred to herein as traffic classification and routing policies. In these implementations, network traffic (shown with reference arrow T) is classified and routed through various connections 3, 24 based on specified rules that can be turned on or off, or selectively applied based on end user preference and/or algorithm. For example, in accordance with certain implementations, the system 10 has traffic classification and routing policies that evaluate one or more parameters, such as latency, application filtering, port filtering and/or other systems and methods of traffic classification that would be apparent to one of skill in the art.
(88) In these implementations, QoS is achieved for the betterment of the local network, such that some traffic can be selectively routed out of one WAN connection 3 or another WAN connection 24, while the terminal user simply experiences being online, unaware of the routing and/or networking taking place between the connections 3, 24. In certain business network implementations, which traffic classification and routing policies are active or applied could be partially or wholly determined or controlled by the combined logic and evaluation of the probe 30 and cloud 40. One of skill in the art would appreciate various alternative configurations that are possible. In various implementations, the probe 30 can therefore trigger the application of or disablement of business rules for QoS, load balancing for WAN and other procedures in real or near real-time, instead of the probe 30 only triggering failover.
(89) One representative example, occurring in a commercial setting, is shown in
(90) Similarly, in certain implementations, as shown in
(91) As shown in
(92) By way of example, in exemplary implementations, the system 10 can monitor actual web page load times for several websites and generate the mean load time, the standard deviation and the variance (the square of the standard deviation). In these implementations, the system 10 can thereby identify user locations which can be compared with thresholds to establish the strength of individual connections (for example “great,” “normal,” or “bad”). It is further appreciated that these given thresholds can be established and revised on an ongoing basis via the examination of the recorded data and can integrate statistical modeling. For example, an end user would be able to compare their connection strength with other comparable users' connection strength, i.e. “How does my hotel's internet speed compare to all my other hotels' internet speed, and to lots of other hotels' internet speed across the country?” By way of a further example, after the system 10 via the server 40 has identified a widespread service outage on an ISP, the system can be configured to apply logic to pre-emptively failover potentially vulnerable locations to avoid service interruptions prior to otherwise being detected discretely. These implementations can also allow for end users to avoid the so-called “last mile” or common/shared infrastructure challenges: for example, when several ISPs use the same fiber optic network, the system 10 can be configured to map and otherwise recognize more complex connection outages and initiate failover rules at several end networks 1.
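The load-time statistics and threshold comparison described above can be sketched in Python with the standard `statistics` module. The threshold values (1.0 and 3.0 seconds) and the function names `summarize_load_times` and `rate_connection` are hypothetical and chosen only for illustration; the disclosure does not specify them. The variance is computed as the square of the standard deviation.

```python
import statistics


def summarize_load_times(load_times_s):
    """Mean, population standard deviation and variance of page load times (seconds)."""
    mean = statistics.mean(load_times_s)
    stdev = statistics.pstdev(load_times_s)
    return {"mean": mean, "stdev": stdev, "variance": stdev ** 2}


def rate_connection(mean_load_s, great_below_s=1.0, bad_above_s=3.0):
    """Map a mean load time onto the labels used in the text (thresholds are assumed)."""
    if mean_load_s < great_below_s:
        return "great"
    if mean_load_s > bad_above_s:
        return "bad"
    return "normal"


summary = summarize_load_times([1.0, 2.0, 3.0])
print(summary["mean"], rate_connection(summary["mean"]))
```

In practice the thresholds would be revised from recorded data, as the text notes, rather than fixed as constants.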
(93) As shown in
(94)
(95) As shown in
(96) In certain implementations of the system 10, congestion can be detected using a summing approach 70, one implementation being shown generally in
(97) Returning to the system 10 as shown in
(98) As also shown in
(99)
(100) An exemplary healthy, or non-congested internet connection is shown in
(101) Accordingly, the uplink and/or downlink connection latencies 90A, 90B, 90C, 90D, 90E between these hops (shown generally at 78 and 80) can be summed (as shown in
(102)
(103) As shown in
(104) It is understood that by summing the latency over a finite number of hops (as shown by 90A, 90B, 90C, 90D, 90E), it is irrelevant if the internet connection is the first, second, third or other hop, because the Internet connection delay will be captured on one of these early hops. The delay for other hops is usually inconsequential (a few milliseconds) and is typically far less than the latency of the saturated link. By not incorporating more hops than the given finite number, the system avoids accidentally incorporating intra-ISP, or Internet backbone latency into the measurement.
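The summing approach over a finite number of hops can be illustrated with a short Python sketch. The hop count of five (matching the five latencies 90A through 90E), the 100 ms threshold and the sample latency values are all assumptions made for this example, as are the function names `summed_latency_ms` and `is_congested`.

```python
def summed_latency_ms(hop_latencies_ms, max_hops=5):
    """Sum the latency of the first `max_hops` hops only, ignoring later hops."""
    return sum(hop_latencies_ms[:max_hops])


def is_congested(hop_latencies_ms, threshold_ms, max_hops=5):
    """Flag congestion when the summed early-hop latency exceeds the threshold."""
    return summed_latency_ms(hop_latencies_ms, max_hops) > threshold_ms


# Made-up per-hop latencies in milliseconds; hops six and seven stand for backbone hops.
healthy = [1, 8, 2, 3, 2, 40, 60]
saturated = [1, 250, 2, 3, 2, 40, 60]

print(summed_latency_ms(healthy), is_congested(healthy, threshold_ms=100))      # 16 False
print(summed_latency_ms(saturated), is_congested(saturated, threshold_ms=100))  # 258 True
```

Note that the position of the slow hop within the first five does not change the sum, and the later backbone hops never enter the measurement.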
(105) As shown in
(106) Although the disclosure has been described with reference to preferred embodiments, persons skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the disclosed apparatus, systems and methods.