Proxy selection by monitoring quality and available capacity
11606438 · 2023-03-14
Assignee
Inventors
CPC classification
H04L43/10 (ELECTRICITY)
H04L67/1029 (ELECTRICITY)
H04L67/1008 (ELECTRICITY)
H04L43/55 (ELECTRICITY)
International classification
G06F15/173 (PHYSICS)
H04L43/10 (ELECTRICITY)
Abstract
Empirical data of exit nodes are continuously monitored and each exit node's overall performance and available capacity are calculated. The empirical data can include monitoring the number of concurrent requests currently being executed by each exit node and the disconnection chronology of each exit node. Further, each exit node is tested by benchmark requests and ping messages and each exit node's quality rate is calculated. Additionally, systems and methods are provided to select an exit node with the highest quality and available capacity value, from a particular pool to route the user request.
Claims
1. A method for reporting an available capacity of an exit node to a database and selecting an exit node pool to implement a user request, the method comprising: reporting, by a proxy supernode, empirical data of an exit node to a first database; computing, by the proxy supernode, an available capacity value for the exit node, wherein the proxy supernode, upon computing the available capacity value, reports the available capacity value to a second database; calculating, by the proxy supernode, a success rate for the exit node to determine whether the success rate is higher than a first numerical value assigned to the exit node by a service provider infrastructure; when the success rate for the exit node is not higher than the first numerical value: changing, by the proxy supernode, a second numerical value assigned to the exit node by the service provider infrastructure, wherein upon the changing of the second numerical value, the proxy supernode computes a new available capacity value for the exit node and reports the new available capacity value to the second database; calculating, by the proxy supernode, a quality rate value for the exit node.
2. The method of claim 1 further comprising: selecting, by the proxy supernode, the exit node by analyzing the available capacity value and the quality rate value, wherein the exit node is used to execute a user request.
3. The method of claim 2, wherein the available capacity value for the exit node is greater than zero.
4. The method of claim 2, wherein the exit node executes the user request by gathering data from a target.
5. The method of claim 2, wherein the exit node, after executing the user request, forwards the data gathered from the target to the proxy supernode.
6. The method of claim 2, wherein the proxy supernode is connected to the exit node.
7. The method of claim 2, wherein the proxy supernode forwards the user request to the exit node.
8. The method of claim 1, wherein the empirical data of the exit node comprises at least one or combination of: disconnection chronology; instances of observed failures; instances of corrupt responses; effective load; pool assignment timestamps; number of user requests executed by the exit node.
9. The method of claim 8, wherein the disconnection chronology comprises a detailed log of the exit node.
10. The method of claim 9, wherein the detailed log comprises at least a list of timestamps indicating instances when the exit node connects to the proxy supernode and instances when the exit node disconnects from the proxy supernode.
11. The method of claim 1, wherein the exit node is in a pool of exit nodes.
12. The method of claim 1, wherein the success rate is a percentage of success in executing a user request by the exit node.
13. The method of claim 1, wherein the service provider infrastructure initially assigns the first numerical value to the exit node denoting a minimum tolerance rate for the exit node.
14. The method of claim 13, wherein the minimum tolerance rate denoted by the first numerical value signifies a minimum tolerated percentage of the success rate.
15. The method of claim 1, wherein the service provider infrastructure initially assigns the second numerical value to the exit node denoting a maximum capacity of the exit node.
16. The method of claim 15, wherein the maximum capacity of the exit node signifies the maximum number of user requests that can be executed concurrently by the exit node with a lowest rate of failure.
17. The method of claim 1, wherein the proxy supernode changes the second numerical value assigned to the exit node such that the success rate is higher than the first numerical value assigned to the exit node.
18. The method of claim 1, wherein the proxy supernode, after changing the second numerical value assigned to the exit node, stores the second numerical value within memory.
19. The method of claim 1, wherein the proxy supernode calculates the quality rate value of the exit node using at least one value or combination from: a) time taken by the exit node to perform a benchmark request; b) latency of the exit node; c) probability of disconnection of the exit node during a time frame set by the service provider infrastructure, calculated from the disconnection chronology.
20. The method of claim 1, wherein the quality rate value for the exit node is represented on a scale of 0-100.
21. The method of claim 1, wherein the proxy supernode, the first database, and the second database are present within the service provider infrastructure.
22. The method of claim 1, wherein the proxy supernode continually monitors the exit node.
Description
DESCRIPTION OF DIAGRAMS
DETAILED DESCRIPTION
(12) A detailed description of one or more exemplary embodiments is provided below, along with the accompanying figures that show the steps involved in the described embodiments. Numerous specific details are provided in the following description in order to provide a thorough understanding of the described embodiments, which may be implemented according to the claims without some or all of these specific details.
(13) Some general terminology descriptions may be helpful and are included herein for convenience; they are intended to be given the broadest possible interpretation.
(14) User Device 102—any device capable of making requests to the proxy, including any physical device that is connected to a network, for example, a laptop, a mobile phone, a tablet computer, or any other smart device. A user can be any person or business entity requesting and using proxies to obtain relevant information from the Web (e.g., for the purpose of collecting information, scraping websites, etc.). Additionally, it should be noted that the term “user” is used in the interest of brevity and may refer to any of a variety of entities that may be associated with a subscriber account, such as a person, an organization, an organizational role within an organization, or a group within an organization, that requests and uses proxy services to obtain relevant information from the web (e.g., scraping, streaming, etc.).
(15) Service Provider Infrastructure 104—an infrastructure of the party providing the proxy as a service to the customer. Service Provider Infrastructure 104 comprises: Front-end Proxy 106, Pool Database 110, Session Database 112, and Proxy Supernode 108. In some embodiments, Proxy Supernode 108 can be situated in a different geographical location, outside the Service Provider Infrastructure 104; the overall functions of both Service Provider Infrastructure 104 and Proxy Supernode 108 remain unchanged and, by architectural design, Proxy Supernode 108 remains a part of Service Provider Infrastructure 104.
(16) Front-end (FE) Proxy Server or front-end (FE) proxy 106—a proxy and a gateway providing an interface into the Service Provider Infrastructure 104 for a User Device 102 or a group of User Devices 102. FE Proxy 106 is a constituent of the Service Provider Infrastructure 104 and can receive and forward requests from User Device 102 and send back the responses to User Devices 102 via Network 122. FE Proxy 106 may provide data caching services and, to control bandwidth utilization at the exit node, serve User Device 102 with data stored in a local cache if the cached data is precisely the data requested by the user.
(17) Proxy Supernode 108—a proxy server and a processing unit configured to perform several complex functions. Proxy Supernode 108 communicates and maintains connections with multiple exit nodes to service the user requests. Proxy Supernode 108 is configured to continuously monitor exit nodes' overall performances and report empirical data of exit nodes' performances to Session Database 112. Further, Proxy Supernode 108 is configured to periodically test, analyze and calculate exit nodes' quality rate individually. Proxy Supernode 108 can report quality rates of exit nodes to Pool Database 110. In addition to calculating quality rates, Proxy Supernode 108 computes available capacity for each exit node and reports the computed available capacity for each exit node to Pool Database 110. Proxy Supernode 108 is responsible for selecting and forwarding the request from User Device 102 to exit node(s) present in several pools of exit nodes based on exit nodes' quality rate and available capacity. In the embodiments disclosed herein, Proxy Supernode 108 is a constituent of Service Provider Infrastructure 104. Proxy Supernode 108 can be located in a different geographical location outside the Service Provider Infrastructure 104; however, the overall functions remain unchanged.
(18) Pool Database 110—a memory storage that stores information about exit nodes according to their respective pools. Specifically, Pool Database 110 can contain data including, but not limited to, the quality rate and available capacity values of each exit node, classified according to their respective pools. Proxy Supernode 108 can populate, amend and retrieve the contents of Pool Database 110 regularly. Pool Database 110 is a part of Service Provider Infrastructure 104 and can be a physical storage unit or cloud-based storage.
(19) Session Database 112—a memory storage that stores empirical data of multiple exit nodes. An exit node's empirical data can include the detailed log of the exit node's connections to and disconnections from Proxy Supernode 108 along with their respective timestamps (disconnection chronology), instances of observed failures and/or corrupt responses before the present concurrency (Pχ) value reaches the maximum capacity (C.sub.max) value, the present concurrency (Pχ) value, effective load, pool assignment timestamps, and the total number of users serviced by the exit node. Proxy Supernode 108 populates and amends Session Database 112 with the aforementioned empirical data continually. Session Database 112 is a part of Service Provider Infrastructure 104 and can be a physical storage unit or cloud-based storage.
(20) Exit Node A 114; Exit Node B 116; exit node(s)—an exemplary instance of proxies that are used to reach specific targets. In simple terms, an exit node is the last gateway before the traffic reaches the target. Several proxy servers can be used to execute a user's request; however, the exit node is the final proxy that contacts the target and retrieves data from the target. Exit nodes can be, for example, laptops, mobile phones, tablet computers, or smart devices. Exit nodes can also be devices that are capable of network connectivity but not primarily intended for networking, such as connected home appliances, smart home security systems, autonomous farming equipment, wearable health monitors, smart factory equipment, wireless inventory trackers, biometric cybersecurity scanners, shipping containers, and others. Exit nodes can be located in different geographical locations. The disclosure presents an exemplary system of such exit nodes, but the total number of exit nodes in the pool may vary according to the proxy service provider's infrastructure.
(21) Exit Node Pool 118—an exemplary instance of a set of exit nodes that is being actively used for servicing requests from User Device 102. There can be an unlimited number of the exit nodes stored in the exit node pool.
(22) Target 120; target(s)—an exemplary instance of a server serving any kind of media content, resources, information, services over the Internet or other network. Target can be, for example, a particular IP address, a domain name, and/or a hostname, possibly with a defined network protocol port, that represents a resource address at a remote system serving the content accessible through industry standard protocols. Target may be a physical or a cloud server that contains the content requested through the target address.
(23) Network 122—a digital telecommunications network that allows nodes to share and access resources. Examples of a network include: local-area networks (LANs), wide-area networks (WANs), campus-area networks (CANs), metropolitan-area networks (MANs), home-area networks (HANs), Intranet, Extranet, Internetwork, Internet. In the current disclosure, the Internet is the most relevant Network for the functioning of the method.
(24) Proxy service provider—a party providing the proxying functionality that is delivered to a user as a service composed of proxies, which act as intermediaries for requests from clients seeking resources from other servers, and the proxy management components. One of the many available typologies for proxy servers is based on the type of IP address the proxy uses, including but not limited to residential IP proxies, datacenter IP proxies, and mobile IP proxies.
(25) Quality rate; Q.sub.r—a numerical value calculated and assigned to an individual exit node by Proxy Supernode 108. Quality rate (Q.sub.r) is an aggregate criterion calculated by periodically testing exit nodes by various methods and evaluating the responses to those tests. In at least one exemplary instance in the current embodiment, the quality rate value for an exit node is calculated and assigned by evaluating at least the following: a) time taken by a particular exit node to perform a benchmark request to a specific target; b) latency while performing ping tests against a particular exit node; c) probability of a particular exit node's disconnection during the next ten minutes. Proxy Supernode 108 calculates this probability from the disconnection chronology of the particular exit node. In the current embodiment, Proxy Supernode 108 is configured by Service Provider Infrastructure 104 to calculate the probability of an exit node's disconnection during the next ten minutes; however, Service Provider Infrastructure 104 can choose, through intelligent analysis, a different time period for which the probability is calculated. More specifically, the quality rate (Q.sub.r) value is calculated using an exemplary formula:
Q.sub.r=(min(β/a,0.5)+min(ψ/b,0.5))×(1−c)
where:
β—benchmark threshold constant, denoting the ideal benchmark request speed (in milliseconds) of an exit node. Here, the value of β is 100.
ψ—ping threshold constant, denoting the ideal ping latency (in milliseconds) of an exit node. Here, the value of ψ is 10.
a—time taken (in milliseconds) by an exit node to perform a benchmark request to a specific target.
b—latency (in milliseconds) while performing ping tests against an exit node.
c—probability that an exit node will disconnect during the next ten minutes, calculated from the disconnection chronology of a particular exit node.
The min() function in the above formula returns the smaller of its two arguments, so that neither of the two terms exceeds 0.5. Additionally, in at least one exemplary instance in the current embodiment, quality rate values are assigned on a scale of 0-100; however, any alternative scale can be used to assign quality rate values.
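The exemplary formula can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and because the formula as written yields a value between 0 and 1, the conversion to the 0-100 scale by multiplying by 100 is an assumption rather than something the description specifies.

```python
BENCHMARK_THRESHOLD_MS = 100.0  # beta in the formula
PING_THRESHOLD_MS = 10.0        # psi in the formula


def quality_rate(benchmark_ms: float, ping_ms: float, p_disconnect: float) -> float:
    """Return (min(beta/a, 0.5) + min(psi/b, 0.5)) * (1 - c), a value in [0, 1]."""
    if benchmark_ms <= 0 or ping_ms <= 0:
        raise ValueError("measured times must be positive")
    if not 0.0 <= p_disconnect <= 1.0:
        raise ValueError("p_disconnect must be a probability in [0, 1]")
    benchmark_term = min(BENCHMARK_THRESHOLD_MS / benchmark_ms, 0.5)
    ping_term = min(PING_THRESHOLD_MS / ping_ms, 0.5)
    return (benchmark_term + ping_term) * (1.0 - p_disconnect)


def quality_rate_percent(benchmark_ms: float, ping_ms: float, p_disconnect: float) -> float:
    # Assumption: the 0-100 scale is obtained by multiplying the result by 100.
    return 100.0 * quality_rate(benchmark_ms, ping_ms, p_disconnect)
```

For example, an exit node that completes the benchmark request in 400 ms, answers pings in 40 ms, and has a disconnection probability of 0.1 scores (0.25 + 0.25) × 0.9 = 0.45, or 45 on the assumed 0-100 scale. With β = 100 and ψ = 10, any benchmark time of 200 ms or less and any ping latency of 20 ms or less already reaches the 0.5 cap for its term.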
(26) Maximum capacity; C.sub.max—a numerical value that denotes the maximum number of concurrent requests that can be executed successfully via a particular exit node. In other words, the maximum capacity of an exit node is the total number of concurrent requests that the exit node can handle without failing or being blocked by the target. Here, the term “request” implies the full flow of data from User Device 102 via Service Provider Infrastructure 104 to an exit node and returning to the User Device 102. Service Provider Infrastructure 104 can initially configure Proxy Supernode 108 to assign, based on intelligent analysis, a standard value of C.sub.max common to every exit node available to Proxy Supernode 108. However, if, through continuous monitoring of exit nodes' empirical data, Proxy Supernode 108 detects a declining success rate for a particular exit node, Proxy Supernode 108 can compute and assign a different maximum capacity (C.sub.max) value for that particular exit node.
(27) Present concurrency; Pχ—a numerical count which indicates the number of concurrent requests currently being executed by an exit node. Through continuous monitoring of exit node's performances, Proxy Supernode 108 records Pχ value for each exit node.
(28) Available capacity; C.sub.avail—a numerical value computed by Proxy Supernode 108 for each exit node using the C.sub.max value and the present concurrency (Pχ) value. Specifically, C.sub.avail for an exit node is computed as:
C.sub.avail=C.sub.max−Pχ
In simple terms, an exit node's available capacity value indicates the number of additional requests that can be executed concurrently without exceeding the maximum capacity value. For an exit node that is currently executing requests, the available capacity value is less than the maximum capacity value, i.e., C.sub.avail<C.sub.max. For a new exit node or an exit node with no active connections, the available capacity is equal to the maximum capacity value, i.e., C.sub.avail=C.sub.max. Therefore, C.sub.avail is always ≤C.sub.max.
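The computation is a single subtraction; a minimal Python sketch is given below. The function name is hypothetical, and clamping the result at zero is an assumption made here to cover the case where C.sub.max is lowered while more than that many requests are still in flight; the description does not address that case.

```python
def available_capacity(c_max: int, present_concurrency: int) -> int:
    """Return C_avail = C_max - P_chi, never less than zero."""
    if c_max < 0 or present_concurrency < 0:
        raise ValueError("c_max and present_concurrency must be non-negative")
    # Assumption: a negative difference is reported as zero available capacity.
    return max(c_max - present_concurrency, 0)
```

With a maximum capacity of 12 and 5 requests in flight, the available capacity is 7; an idle exit node reports the full 12.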
(29) Success rate—a percentage of user requests successfully executed by an exit node at every value of Pχ (present concurrency value).
(30) Minimum tolerance rate—a tolerated or a minimum percentage of success rate for every value of Pχ (present concurrency value).
(31) In one aspect, the present embodiments include a system and a method for effectively managing proxy service quality. Those of ordinary skill in the art will realize that the following detailed description of the present embodiments is illustrative only and is not intended to be in any way limiting. Other embodiments of the present system(s) and method(s) will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the present embodiments as illustrated in the accompanying drawings. The same reference indicators will be used throughout the drawings and the following detailed description to refer to the same or like parts.
(32)
(33) In
(34) Within the Service Provider Infrastructure 104, FE Proxy 106 and Proxy Supernode 108 can communicate with each other, while Proxy Supernode 108 can access Pool Database 110 and Session Database 112. Furthermore, in Service Provider Infrastructure 104, FE Proxy 106 can communicate with an outside element, namely, the User Device 102. Likewise, Proxy Supernode 108 can communicate with outside elements, namely, Exit Node A 114, Exit Node B 116. User Device 102, Service Provider Infrastructure 104, Exit Node A 114, Exit Node B 116, and Target 120 have access to Network 122 and communicate with each other through the same. In
(35) In
(36) Upon receiving the request from User Device 102, FE Proxy 106 forwards the request to Proxy Supernode 108, which checks the request and chooses a suitable exit node pool by accessing the Pool Database 110. After choosing a suitable pool, Proxy Supernode 108 retrieves and checks the metadata of exit nodes belonging to the chosen exit node pool. The retrieved metadata contains the quality rate (Q.sub.r) and available capacity (C.sub.avail) values of each exit node in the respective pool. Proxy Supernode 108 analyzes the retrieved metadata to select an exit node to service the user request. In one of the embodiments, Proxy Supernode 108 identifies, from the retrieved metadata, the exit nodes whose available capacity (C.sub.avail) values are greater than zero, i.e., C.sub.avail>0. Proxy Supernode 108 then arranges the identified exit nodes in descending order of their quality rate (Q.sub.r) values, i.e., beginning with the highest Q.sub.r value. By identifying and arranging only the exit nodes with available capacity (C.sub.avail) values greater than zero, Proxy Supernode 108 excludes the exit nodes with zero available capacity. Proxy Supernode 108 selects the exit node with the highest quality rate (Q.sub.r) value from the arranged list. If multiple exit nodes share the highest quality rate (Q.sub.r) value, Proxy Supernode 108 selects one of them at random.
(37) When a new request from another User Device 102 arrives, Proxy Supernode 108 can again select the previously selected exit node with the highest quality rate (Q.sub.r) if its available capacity (C.sub.avail) value is still greater than zero. If C.sub.avail=0 for an exit node, the exit node's number of concurrent requests has reached its maximum limit and the exit node can no longer execute further requests.
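The selection procedure of this embodiment (keep nodes with C.sub.avail>0, order by Q.sub.r descending, break ties at random) might be sketched in Python as below. The `ExitNodeMeta` record and the function name are hypothetical stand-ins for the metadata retrieved from Pool Database 110, not structures defined by the disclosure.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExitNodeMeta:
    node_id: str
    quality_rate: float      # Q_r
    available_capacity: int  # C_avail


def select_exit_node(
    nodes: Sequence[ExitNodeMeta],
    rng: Optional[random.Random] = None,
) -> Optional[ExitNodeMeta]:
    """Pick the highest-quality node that still has capacity, or None."""
    rng = rng or random.Random()
    # Step 1: keep only exit nodes with available capacity greater than zero.
    candidates = [n for n in nodes if n.available_capacity > 0]
    if not candidates:
        return None
    # Step 2: arrange the candidates by quality rate, highest first.
    ranked = sorted(candidates, key=lambda n: n.quality_rate, reverse=True)
    # Step 3: choose at random among the nodes sharing the highest quality rate.
    best = ranked[0].quality_rate
    top = [n for n in ranked if n.quality_rate == best]
    return rng.choice(top)
```

Returning `None` when no node has capacity is a choice made for this sketch; the description does not say what happens when every exit node in the pool is saturated.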
(38) After selecting the exit node, Proxy Supernode 108 forwards the request for data extraction to the respective exit node, which in turn forwards the request to the intended target. Thus, through the current embodiment, Service Provider Infrastructure 104 is able to select an exit node and utilize it up to its full capacity without the exit node failing or being blocked by the target.
(39) In another embodiment, after choosing a suitable exit node pool and retrieving the metadata of exit nodes belonging to the chosen exit node pool, Proxy Supernode 108 selects an exit node with the highest quality rate (Q.sub.r) and the highest available capacity (C.sub.avail) value. In case of a new request from another User Device 102, Proxy Supernode 108 can still select the previously selected exit node with the highest quality rate (Q.sub.r) if its available capacity value (C.sub.avail) is greater than zero. If C.sub.avail=0 for an exit node, the exit node's number of concurrent requests has reached its maximum and additional requests are not sent to that exit node.
(40) If the available capacity value (C.sub.avail) for the particular exit node with the highest quality rate (Q.sub.r) is zero, Proxy Supernode 108 chooses another exit node with the second-highest quality rate (Q.sub.r) and a non-zero available capacity value (C.sub.avail). After selecting the exit node, Proxy Supernode 108 forwards the request for data extraction to the respective exit node, which in turn forwards the request to the intended target. Thus, through the current embodiment, Service Provider Infrastructure 104 is able to select an exit node and utilize it up to its full capacity without the exit node failing or being blocked by the target.
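One possible reading of this alternative embodiment is sketched below in Python: among exit nodes with non-zero available capacity, prefer the highest quality rate and, on a tie, the highest available capacity. The tie-breaking order and all names (`NodeMeta`, `select_by_quality_then_capacity`) are assumptions made for illustration; the description does not spell out how the two criteria are combined.

```python
from typing import NamedTuple, Optional, Sequence


class NodeMeta(NamedTuple):
    node_id: str
    quality_rate: float      # Q_r
    available_capacity: int  # C_avail


def select_by_quality_then_capacity(nodes: Sequence[NodeMeta]) -> Optional[NodeMeta]:
    """Return the best usable node, or None if every node is saturated."""
    # Nodes with zero available capacity are skipped, so a saturated
    # top-quality node automatically yields to the next-best one.
    usable = [n for n in nodes if n.available_capacity > 0]
    if not usable:
        return None
    # Assumption: quality rate is the primary key, available capacity the tie-breaker.
    return max(usable, key=lambda n: (n.quality_rate, n.available_capacity))
```

If the node with the highest quality rate has C.sub.avail=0, the function falls through to the node with the second-highest quality rate, matching the fallback described above.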
(41) In yet another aspect, in
(42) However, if the success rate of a particular exit node or exit nodes declines below the minimum tolerance rate, Proxy Supernode 108 can detect the decline and can re-compute and assign a different maximum capacity (C.sub.max) value for the affected exit node or exit nodes so that their success rate remains higher than the minimum tolerance rate.
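The description does not state how the new C.sub.max value is computed. One plausible approach, sketched below in Python purely as an assumption, uses the per-concurrency success rates (the success rate is defined for every value of Pχ) and lowers C.sub.max to just below the first concurrency level whose success rate is not higher than the minimum tolerance rate. The function and parameter names are hypothetical.

```python
from typing import Mapping


def recompute_max_capacity(
    success_by_concurrency: Mapping[int, float],
    min_tolerance: float,
    current_c_max: int,
) -> int:
    """Return a C_max under which every observed level exceeds the tolerance.

    success_by_concurrency maps a present-concurrency value to the observed
    success rate (in percent) at that level.
    """
    for level in sorted(success_by_concurrency):
        if level > current_c_max:
            break
        # Following the wording of claim 1, a rate that is "not higher than"
        # the tolerance triggers the change, hence <= rather than <.
        if success_by_concurrency[level] <= min_tolerance:
            return max(level - 1, 0)
    # No observed level breached the tolerance: keep the current value.
    return current_c_max
```

For instance, with a tolerance of 95% and a success rate that first drops to 90% at ten concurrent requests, the sketch lowers C.sub.max from 12 to 9.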
(43) Additionally, Proxy Supernode 108 periodically tests each exit node belonging to several pools. The testing of exit nodes is carried out through, but is not limited to, benchmark requests and ping messages. Proxy Supernode 108 can send benchmark requests to exit nodes, wherein the requests are intended for one target or several different targets. The targets are dynamically determined internally by Proxy Supernode 108. Proxy Supernode 108 can monitor and register several parameter metrics of exit nodes, including, but not limited to: time taken to reach a specific target, number of hops to reach the exit node, availability, and latency while performing ping tests.
(44) Through testing the exit nodes, Proxy Supernode 108 obtains and analyzes the responses provided by the exit nodes to calculate their quality rate (Q.sub.r). While calculating the quality rate (Q.sub.r) for each exit node, Proxy Supernode 108 uses the values of a) time taken (in milliseconds) by a particular exit node to perform a benchmark request to a specific target; b) latency (in milliseconds) while performing a ping test on a particular exit node; c) probability of a particular exit node's disconnection during the next ten minutes, calculated from the disconnection chronology of the particular exit node. More specifically, the quality rate (Q.sub.r) value is calculated using an exemplary formula:
Q.sub.r=(min(β/a,0.5)+min(ψ/b,0.5))×(1−c)
(45) In the current embodiment, quality rate (Q.sub.r) values are assigned on a scale of 0-100. The method and the mathematical formula for quality rate calculation are initially configured into Proxy Supernode 108 by Service Provider Infrastructure 104. After calculating the quality rates of exit nodes, Proxy Supernode 108 reports each exit node's quality rate to the Pool Database 110.
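The disconnection probability c is said to be calculated from the disconnection chronology, but no estimator is given. As a rough illustration only, the Python sketch below assumes a simple empirical estimate: split the observation period into fixed windows (ten minutes by default) and take the fraction of windows that contained at least one disconnection. The function name, parameters, and the estimator itself are all assumptions.

```python
from typing import Sequence


def disconnect_probability(
    disconnect_times_s: Sequence[float],
    observed_from_s: float,
    observed_until_s: float,
    window_s: float = 600.0,
) -> float:
    """Fraction of past windows in which the exit node disconnected at least once."""
    span = observed_until_s - observed_from_s
    if span <= 0 or window_s <= 0:
        raise ValueError("observation span and window must be positive")
    n_windows = max(int(span // window_s), 1)
    windows_with_disconnect = set()
    for t in disconnect_times_s:
        if observed_from_s <= t < observed_until_s:
            # Timestamps in a trailing partial window count toward the last full window.
            index = min(int((t - observed_from_s) // window_s), n_windows - 1)
            windows_with_disconnect.add(index)
    return len(windows_with_disconnect) / n_windows
```

Over one hour of observation (six ten-minute windows), disconnections at 30 s, 100 s, and 1900 s fall into two distinct windows, giving an estimated probability of 2/6.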
(46)
(47) After establishing the connection between User Device 102 and FE Proxy 106, in step 205, User Device 102 sends a request for data extraction intended for a specific target to FE Proxy 106. Together with the request for data extraction, User Device 102 can send requirements for exit node pool selection and verification credentials for user validation carried out at Proxy Supernode 108. Verification credentials can include, but are not limited to, user identifications, passwords, hash identifications, serial numbers and PINs. FE Proxy 106 receives the request for data extraction from User Device 102 and, in step 207, forwards the request to Proxy Supernode 108 present within the Service Provider Infrastructure 104. In some embodiments, FE Proxy 106 can add session identification to the request received from User Device 102 before forwarding the request to Proxy Supernode 108. Session identification can be generated and assigned to ensure a session's association with the context of the same User Device 102. Here, the term session generally refers to temporary and interactive data exchange between the User Device 102 and the Service Provider Infrastructure 104.
(48) Proxy Supernode 108 receives the request for data extraction from FE Proxy 106. Proxy Supernode 108 can carry out the user validation by verifying the credentials sent along with the request against the data from an internal database within Proxy Supernode 108 or an external database. Once user validation is successful, Proxy Supernode 108 checks the request to evaluate the requirements for exit node pool selection that are sent with the request. Requirements can include several attributes such as, but not limited to, exit node geo-location, ability to reach specific targets, and latency. After checking the request, Proxy Supernode 108 accesses the Pool Database 110 to select a suitable exit node pool in order to satisfy the requirements sent with the request. If the requirements for exit node pool selection are absent, Proxy Supernode 108 can select a suitable exit node pool randomly.
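The pool-selection step might look like the following Python sketch. It assumes, hypothetically, that each pool is described by a flat mapping of attributes (such as a `geo` key) and that requirements are matched by exact equality; returning the first matching pool is also an assumption, since the description only specifies random selection when no requirements are sent. Credential validation is left out.

```python
import random
from typing import Mapping, Optional, Sequence

Pool = Mapping[str, str]


def choose_pool(
    pools: Sequence[Pool],
    requirements: Optional[Mapping[str, str]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[Pool]:
    """Return a pool satisfying the requirements, or a random pool if none are given."""
    if not pools:
        return None
    if not requirements:
        # No requirements were sent with the request: pick a pool at random.
        return (rng or random.Random()).choice(list(pools))
    for pool in pools:
        # Assumption: every required attribute must match exactly.
        if all(pool.get(key) == value for key, value in requirements.items()):
            return pool
    return None
```

A request carrying `{"geo": "DE"}` would be routed to the first pool whose `geo` attribute is `DE`, while a request with no requirements receives any pool.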
(49) After choosing a suitable exit node pool, Proxy Supernode 108, in step 209, retrieves the metadata of exit nodes belonging to the chosen pool from Pool Database 110. The metadata retrieved from Pool Database 110 contains information regarding exit nodes available in the particular pool. Metadata includes, but is not limited to, the IP address of each exit node, the geo-location of each exit node, and the quality rate (Q.sub.r) and available capacity (C.sub.avail) values for each exit node. Promptly after, in step 211, Proxy Supernode 108 analyzes the retrieved metadata. Specifically, Proxy Supernode 108 identifies exit nodes whose available capacity (C.sub.avail) values are greater than zero (i.e., C.sub.avail>0).
(50)
(51) In step 215, Proxy Supernode 108 selects the exit node with the highest quality rate (Q.sub.r) value from the arranged list of exit nodes. If multiple exit nodes share the highest quality rate (Q.sub.r) value, Proxy Supernode 108 selects one of them at random.
(52) In step 217, Proxy Supernode 108 forwards the request for data extraction to the selected exit node (represented by Exit Node A 114). In step 219, after receiving the request from Proxy Supernode 108, Exit Node A 114 initiates a connection with Target 120. Consequently, in step 221, Target 120 confirms the connection, thereby establishing the connection with Exit Node A 114. More messages can be exchanged as part of initiating and establishing the connection, according to the norms of the communication protocol. Steps 219 and 221 are meant to include all steps necessary to establish a connection between Exit Node A 114 and Target 120, based on the employed communication protocol.
(53)
(54)
(55) After establishing the connection between User Device 102 and FE Proxy 106, in step 305, User Device 102 sends a request for data extraction intended for a specific target to FE Proxy 106. Together with the request for data extraction, User Device 102 can send requirements for exit node pool selection and verification credentials for user validation carried out at Proxy Supernode 108. Verification credentials can include, but are not limited to, user identifications, passwords, hash identifications, serial numbers and PINs. FE Proxy 106 receives the request for data extraction from User Device 102 and, in step 307, forwards the request to Proxy Supernode 108 present within the Service Provider Infrastructure 104. In some embodiments, FE Proxy 106 can add session identification to the request received from User Device 102 before forwarding the request to Proxy Supernode 108. Session identification can be generated and assigned to ensure a session's association with the context of the same User Device 102. Here, the term session generally refers to temporary and interactive data exchange between the User Device 102 and the Service Provider Infrastructure 104.
(56) Proxy Supernode 108 receives the request for data extraction from FE Proxy 106. Proxy Supernode 108 can carry out the user validation by verifying the credentials sent along with the request against the data from an internal database within Proxy Supernode 108 or an external database. Once user validation is successful, Proxy Supernode 108 checks the request to evaluate the requirements for exit node pool selection that are sent with the request. Requirements can include several attributes such as, but not limited to, exit node geo-location, ability to reach specific targets, and latency. After checking the request, Proxy Supernode 108 accesses the Pool Database 110 to choose a suitable exit node pool in order to satisfy the requirements sent with the request. If the requirements for exit node pool selection are absent, Proxy Supernode 108 can select a suitable exit node pool randomly.
(57) After choosing a suitable exit node pool, Proxy Supernode 108, in step 309, retrieves the metadata of exit nodes belonging to the chosen pool from Pool Database 110. The metadata retrieved from Pool Database 110 contains information regarding exit nodes available in the particular pool. Metadata includes, but is not limited to, the IP address of each exit node, the geo-location of each exit node, and the quality rate (Q.sub.r) and available capacity (C.sub.avail) values for each exit node. Promptly after, in step 311, Proxy Supernode 108 analyzes the quality rate (Q.sub.r) value for each exit node.
(58)
(59) Thus, after the manner described above, Proxy Supernode 108 selects an exit node from the chosen pool of exit nodes. In step 317, Proxy Supernode 108 forwards the request for data extraction to the selected exit node (represented by Exit Node A 114). In step 319, after receiving the request from Proxy Supernode 108, Exit Node A 114 initiates a connection with Target 120. Consequently, in step 321, Target 120 confirms the connection, thereby establishing the connection with Exit Node A 114. There can be more messages exchanged as part of initiating and establishing the connection according to communication protocols' norms. Step 319 and 321 are meant to include all steps necessary to establish a connection between Exit Node A 114 and Target 120, based on the employed communication protocol.
(60)
(61) In another embodiment, Proxy Supernode computes available capacity (C.sub.avail) values for each exit node by continuously monitoring the present number of concurrent requests executed by that exit node.
(62) Further, while constantly monitoring exit nodes' overall performances, in step 403, Proxy Supernode 108 reports empirical data of each exit node to Session Database 112 regularly. Empirical data can include, but is not limited to: present concurrency (Pχ) value, disconnection chronology, success rate, instances of observed failures and/or corrupt responses before reaching maximum capacity value, effective load, pool assignment timestamps, the total number of users serviced by the exit node. In step 405, Proxy Supernode 108 proceeds to compute available capacity (C.sub.avail) value for each exit node by utilizing the present concurrency values (Pχ) of exit nodes. The present concurrency is a numerical count, which indicates the number of concurrent requests currently being executed by an exit node. Proxy Supernode 108 computes available capacity (C.sub.avail) as:
C.sub.avail=C.sub.max−Pχ
(63) In the current embodiment, C.sub.max or maximum capacity value denotes the maximum number of concurrent requests that can be executed successfully via a particular exit node. Here, the term “request” implies the full flow of data from User Device 102 via Service Provider Infrastructure 104 to an exit node and returning to the User Device 102. Initially, Service Provider Infrastructure 104 can configure Proxy Supernode 108 to assign, based on intelligent analysis, a common value of C.sub.max to every exit node known by Proxy Supernode 108. For instance, C.sub.max can be assigned as twelve for every exit node known by Proxy Supernode 108. C.sub.max=12 implies that exit nodes can execute twelve concurrent requests successfully. However, if, through continuous monitoring of exit nodes' empirical data, Proxy Supernode 108 detects a declining success rate for a particular exit node, Proxy Supernode 108 can compute and assign a different maximum capacity (C.sub.max) value for that particular exit node.
(64) By calculating the available capacity (C.sub.avail) value for each exit node, Proxy Supernode 108 can determine the number of requests that can still be executed concurrently by each exit node while avoiding potential failures or being blocked by the target. Therefore, after computing available capacity (C.sub.avail) values, in step 407 Proxy Supernode 108 reports the computed available capacity (C.sub.avail) values for each exit node according to their pool classification to Pool Database 110.
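The capacity computation and the per-pool report can be sketched as follows. This is an illustrative Python sketch only: the function names and the tuple layout are hypothetical, and clamping the result at zero is an assumption (the text gives only C.sub.avail=C.sub.max−Pχ) intended to cover a node whose C.sub.max was lowered while requests were still in flight.

```python
from collections import defaultdict

DEFAULT_C_MAX = 12  # the common initial value used as an example in the text


def available_capacity(c_max, present_concurrency):
    # C_avail = C_max - P_chi; the floor at zero is an assumption of this sketch.
    return max(c_max - present_concurrency, 0)


def capacity_report(nodes):
    """Group C_avail values by pool, mirroring the per-pool report to the pool database.

    `nodes` is an iterable of (node_id, pool_name, present_concurrency, c_max)
    tuples; this layout is hypothetical.
    """
    report = defaultdict(dict)
    for node_id, pool_name, present_concurrency, c_max in nodes:
        report[pool_name][node_id] = available_capacity(c_max, present_concurrency)
    return dict(report)
```

A node running five concurrent requests with the default maximum of twelve therefore has room for seven more, and a node at its maximum reports zero.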
(65)
(66) Service Provider Infrastructure 104 can initially configure Proxy Supernode 108 to assign, based on intelligent analysis, a common value for the minimum tolerance rate for every exit node that is available with the Proxy Supernode 108. Proxy Supernode 108 calculates the success rate for each Pχ value (present concurrency value) of the exit nodes. Proxy Supernode 108 ensures that the success rate at every Pχ value is higher than the minimum tolerance rate.
(67) However, if the success rate for certain exit nodes is lower than the minimum tolerance rate, in step 503, Proxy Supernode 108 detects and identifies the exit node with the declined success rate, i.e., success rate lower than the minimum tolerance rate. Consequently, in step 505, Proxy Supernode 108 determines and assigns a different maximum capacity (C.sub.max) value for the particular exit node (by lowering the original one to some degree) such that the success rate remains higher than the minimum tolerance rate. This is done by lowering the C.sub.max value to a specific Pχ value at which the success rate of the exit node is higher than the minimum tolerance rate. Proxy Supernode 108 uses its internal memory for storing the maximum capacity (C.sub.max) value of every exit node. Proxy Supernode 108 can update its internal memory with the changed maximum capacity (C.sub.max) values for certain exit nodes at any time.
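One way to read the lowering of C.sub.max is sketched below in Python. The disclosure does not say how the specific Pχ value is chosen, so the rule used here is an assumption: the observed concurrency levels are walked in ascending order and the new C.sub.max is the last level before the first one whose success rate is not higher than the minimum tolerance rate. The function name and the dictionary input are hypothetical.

```python
def adjusted_c_max(success_by_concurrency, min_tolerance, c_max):
    """Return a (possibly lowered) maximum capacity value.

    success_by_concurrency maps an observed present-concurrency level (P_chi)
    to the success rate measured at that level, as a fraction in [0, 1].
    """
    new_c_max = 0
    for level in sorted(success_by_concurrency):
        if level > c_max:
            break  # levels above the current maximum are not considered
        if success_by_concurrency[level] <= min_tolerance:
            # First level that fails the tolerance check: fall back to the
            # last level that passed (0 if none passed).
            return new_c_max
        new_c_max = level
    # Every observed level passed, so the current maximum is kept.
    return c_max
```

With a tolerance of 0.9 and measured rates that stay above 0.9 up to eight concurrent requests but drop to 0.8 at ten, the sketch lowers C.sub.max from twelve to eight.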
(68) Further, while constantly monitoring exit nodes' overall performances, in step 507 Proxy Supernode 108 reports empirical data of each exit node to Session Database 112 regularly. Empirical data can include, but is not limited to: present number of concurrent requests, disconnection chronology, success rates, instances of observed failures and/or corrupt responses before reaching maximum capacity value, effective load, pool assignment timestamps, the total number of users serviced by the exit node.
(69) In step 509, Proxy Supernode 108 proceeds to compute available capacity (C.sub.avail) for each exit node by utilizing the present concurrency values (Pχ) of each exit node. The present number of concurrent requests is a numerical count, which indicates the number of concurrent requests currently being executed by an exit node. Proxy Supernode 108 computes available capacity (C.sub.avail) as:
C.sub.avail=C.sub.max−Pχ
(70) In the current embodiment C.sub.max, or maximum capacity value, denotes the maximum number of concurrent requests that can be executed successfully via a particular exit node. Here, the term “request” implies the full flow of data from User Device 102 via Service Provider Infrastructure 104 to an exit node and returning to the User Device 102. Initially, Service Provider Infrastructure 104 can configure Proxy Supernode 108 to assign, based on intelligent analysis, a common value of C.sub.max for every exit node available with Proxy Supernode 108. However, if, through continuous monitoring of exit nodes' empirical data, Proxy Supernode 108 detects a declining success rate for a particular exit node, Proxy Supernode 108 can compute and assign a different maximum capacity (C.sub.max) value for that particular exit node.
(71) By calculating the available capacity (C.sub.avail) values for each exit node, Proxy Supernode 108 can determine the number of requests that can still be executed by each exit node without potential failures. Therefore, after computing available capacity values (C.sub.avail), in step 511 Proxy Supernode 108 reports the computed available capacity (C.sub.avail) values for each exit node according to their pool classification to Pool Database 110.
(72)
(73) When executing a benchmark request test, Proxy Supernode 108 periodically sends benchmark requests to predefined targets via exit nodes in the Exit Node Pool 118. The targets are dynamically determined by Proxy Supernode 108. Similarly, when executing the ping test, Proxy Supernode 108 periodically sends out a ping message to each exit node in the Exit Node Pool 118. Proxy Supernode 108 can use network communication protocols including, but not limited to, Internet Control Message Protocol (ICMP), TCP and UDP to send the ping message. ICMP is one of the supporting protocols within the Internet Protocol (IP) suite and is used to send messages and operational information between network devices. However, ICMP is not typically part of regular data communication; ICMP is instead used for establishing and maintaining network communication as a diagnostic and troubleshooting tool. The ICMP ping message can contain up to 64 data bytes and 8 bytes of protocol header information. Therefore, step 601 is meant to include all necessary steps for sending a benchmark request and a ping message to each exit node in Exit Node Pool 118.
(74) In step 603, each exit node in the Exit Node Pool 118 responds to the tests carried out by Proxy Supernode 108 by providing the appropriate responses. In case of a benchmark test, exit nodes respond by retrieving the necessary data from the intended target and forwarding the retrieved data to Proxy Supernode 108. Likewise, in case of a ping test, exit nodes respond to the ping message. In a ping test, the response is often termed the pong message; it simply echoes back the ping message that was sent by Proxy Supernode 108. Therefore, step 603 is meant to include all necessary steps for sending the appropriate responses to both benchmark requests and ping messages.
(75) In step 605, Proxy Supernode 108 obtains disconnection chronology for each exit node present in the Exit Node Pool 118 by accessing the Session Database 112. In step 607, after obtaining the disconnection chronology from Session Database 112, Proxy Supernode 108 proceeds to calculate the probability of each exit node's disconnection during the next ten minutes. In the current embodiment, Proxy Supernode 108 is configured by Service Provider Infrastructure 104 to calculate the probability of an exit node's disconnection during the next ten minutes. However, Service Provider Infrastructure 104 can decide through intelligent analysis the time period for which the aforementioned probability is calculated.
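The disclosure does not state how the disconnection probability is derived from the disconnection chronology. Purely as an illustration, the Python sketch below uses one simple estimator that is an assumption of this example and not the disclosed method: the observation period is cut into consecutive windows of the configured length (ten minutes by default), and the probability is taken as the fraction of those windows in which at least one disconnect was logged. The function name and parameters are hypothetical; timestamps are plain seconds.

```python
def disconnect_probability(disconnect_times, observed_from, observed_until, window_s=600.0):
    """Estimate the chance of a disconnect within the next window.

    Assumed estimator: the share of past fixed-length windows that contained
    at least one disconnect timestamp. Returns 0.0 when the observation
    period is shorter than one window.
    """
    total_windows = int((observed_until - observed_from) // window_s)
    if total_windows <= 0:
        return 0.0
    windows_with_disconnect = set()
    for t in disconnect_times:
        index = int((t - observed_from) // window_s)
        if 0 <= index < total_windows:
            windows_with_disconnect.add(index)
    return len(windows_with_disconnect) / total_windows
```

Other estimators (for example, weighting recent windows more heavily) would fit the text equally well; the only property the later formula relies on is that the result is a probability between 0 and 1.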
(76) Proxy Supernode 108 analyzes the exit nodes' responses and calculates the quality rate value (Q.sub.r) for each exit node. Proxy Supernode 108 calculates the quality rate (Q.sub.r) value for each exit node using an exemplary formula:
Q.sub.r=(min(β/a,0.5)+min(ψ/b,0.5))×(1−c)
where:
β is the benchmark threshold constant, denoting the ideal benchmark request speed (in milliseconds) of an exit node. Here, the value of β is 100.
ψ is the ping threshold constant, denoting the ideal ping latency (in milliseconds) of an exit node. Here, the value of ψ is 10.
a is the time taken (in milliseconds) by an exit node to perform a benchmark request to a specific target.
b is the latency (in milliseconds) while performing ping tests against an exit node.
c is the probability that an exit node will disconnect during the next ten minutes, calculated from the disconnection chronology of the particular exit node.
Moreover, the min( ) function in the above formula returns the smaller of its two arguments, so that neither of the two terms exceeds 0.5.
(77) For instance, for a particular exit node, if a=300; b=30; c=0.4 (i.e., 40% probability that the particular exit node will be disconnected during the next ten minutes); then Q.sub.r can be calculated as:
Q.sub.r=(min(100/300,0.5)+min(10/30,0.5))×(1−0.4)
Q.sub.r=(min(0.33,0.5)+min(0.33,0.5))×0.6
Q.sub.r≈0.4 Since, in the current embodiment, the quality rate (Q.sub.r) values are assigned on a scale of 0-100, the obtained answer is multiplied by 100. Therefore in the above equation,
Q.sub.r≈0.4×100=40
(78) In another instance, for a particular exit node, if a=150; b=10; c=0 (i.e., 0% probability that the particular exit node will be disconnected during the next ten minutes); then Q.sub.r can be calculated as:
Q.sub.r=(min(100/150,0.5)+min(10/10,0.5))×(1−0)
Q.sub.r=(min(0.67,0.5)+min(1,0.5))×1
Q.sub.r=1 Since, in the current embodiment the quality rate (Q.sub.r) values are assigned on a scale of 0-100, the obtained answer is multiplied by 100. Therefore in the above equation,
Q.sub.r=1×100=100 Also, notice that the exit node in the above example has a quality rate (Q.sub.r) value of 100, which is the maximum possible quality rate value.
(79) In another instance, for a particular exit node, if a=90; b=5; c=0.95 (i.e., 95% probability that the particular exit node will be disconnected during the next ten minutes); then Q.sub.r can be calculated as:
Q.sub.r=(min(100/90,0.5)+min(10/5,0.5))×(1−0.95)
Q.sub.r=(min(1.11,0.5)+min(2,0.5))×0.05
Q.sub.r=0.05 Since, in the current embodiment the quality rate (Q.sub.r) values are assigned on a scale of 0-100, the obtained answer is multiplied by 100. Therefore in the above equation,
Q.sub.r=0.05×100=5 Notice that in the above example, high probability of disconnection can significantly reduce the quality rate (Q.sub.r) value.
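The exemplary formula can be expressed as a short Python function. This is a sketch for illustration: the function name and parameter names are hypothetical, the default constants are the values of β and ψ given in the text, and the input validation is an assumption added so that a zero timing or an out-of-range probability fails loudly instead of producing a misleading score.

```python
BETA_MS = 100.0  # benchmark threshold constant (beta) from the text
PSI_MS = 10.0    # ping threshold constant (psi) from the text


def quality_rate(benchmark_ms, ping_ms, p_disconnect, beta=BETA_MS, psi=PSI_MS):
    """Q_r = (min(beta/a, 0.5) + min(psi/b, 0.5)) * (1 - c), scaled to 0-100."""
    if benchmark_ms <= 0 or ping_ms <= 0:
        raise ValueError("timings must be positive")
    if not 0.0 <= p_disconnect <= 1.0:
        raise ValueError("p_disconnect must be a probability in [0, 1]")
    benchmark_term = min(beta / benchmark_ms, 0.5)  # capped at 0.5
    ping_term = min(psi / ping_ms, 0.5)             # capped at 0.5
    return (benchmark_term + ping_term) * (1.0 - p_disconnect) * 100.0
```

Because each term is capped at 0.5, a node that is faster than both thresholds gains nothing further from extra speed, while the factor (1−c) scales the whole score down as the disconnection probability grows. Called with a=150, b=10, c=0 the function returns 100, and with a=90, b=5, c=0.95 it returns approximately 5, up to floating-point rounding.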
(80) In step 609, after calculating the quality rate for each exit node present in the Exit Node Pool 118, Proxy Supernode 108 reports the calculated quality rate values (Q.sub.r) to Pool Database 110. Specifically, Proxy Supernode 108 reports the calculated quality rate value (Q.sub.r) for each exit node according to their pool classification to Pool Database 110.
(81) Generally, the embodiments disclosed herein relate to the field of proxy technologies and services. The embodiments herein may be combined or collocated in a variety of alternative ways due to design choice. Accordingly, the features and aspects herein are not in any way intended to be limited to any particular embodiment. Furthermore, one must be aware that the embodiments can take the form of hardware, firmware, software, and/or combinations thereof. In one embodiment, such software includes but is not limited to firmware, resident software, microcode, etc.
(82) Furthermore, some aspects of the embodiments herein can take the form of a computer program product accessible from the computer readable medium 706 to provide program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, the computer readable medium 706 can be any apparatus that can tangibly store the program code for use by or in connection with the instruction execution system, apparatus, or device, including the computing system 700.
(83) The computer readable medium 706 can be any tangible electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device). Some examples of a computer readable medium 706 include solid state memories, magnetic tapes, removable computer diskettes, random access memories (RAM), read-only memories (ROM), magnetic disks, and optical disks. Some examples of optical disks include read only compact disks (CD-ROM), read/write compact disks (CD-R/W), and digital versatile disks (DVD).
(84) The computing system 700 can include one or more processors 702 coupled directly or indirectly to memory 708 through a system bus 710. The memory 708 can include local memory employed during actual execution of the program code, bulk storage, and/or cache memories, which provide temporary storage of at least some of the program code in order to reduce the number of times the code is retrieved from bulk storage during execution.
(85) Input/output (I/O) devices 704 (including but not limited to keyboards, displays, pointing devices, I/O interfaces, etc.) can be coupled to the computing system 700 either directly or through intervening I/O controllers. Network adapters may also be coupled to the computing system 700 to enable the computing system 700 to couple to other data processing systems, such as through host systems interfaces 712, printers, and/or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just examples of network adapter types.
(86) The disclosure presents a method for rating proxy servers to implement a user request for data extraction and gathering from a web server, comprising: computing a capacity value (C.sub.avail) for an exit node by utilizing present concurrency values (Pχ) of the exit node, wherein:
C.sub.avail=C.sub.max−Pχ and wherein: “Pχ” is a numerical count by a computing method, which indicates a number of concurrent requests currently being executed by the exit node; “C.sub.max” is a maximum capacity value that denotes a maximum number of concurrent requests that can be executed successfully via the exit node; and calculating a quality rate (Q.sub.r) value for the exit node by: testing the exit node by carrying out benchmark request tests or ping message tests; obtaining empirical data for the exit node; analyzing responses from the exit node; and calculating a quality rate value (Q.sub.r); rating the exit node according to individual C.sub.avail and Q.sub.r values.
(87) In the method the rated exit node is in a pool and wherein the rated exit node is used for implementing the user request from a user device for data extraction and gathering from the web server by: checking the user request to identify requirements for an exit node pool selection that are sent with the request; choosing a suitable exit node pool conforming to requirements of the request; retrieving and analyzing metadata of exit nodes belonging to the chosen suitable exit node pool, wherein the metadata retrieved contains quality rates (Q.sub.r) and available capacity values (C.sub.avail) of each exit node in the pool; identifying the exit nodes with greater than zero available capacity (C.sub.avail) value; arranging the exit nodes identified according to the quality rate (Q.sub.r) values in a descending order; and, selecting the exit node with a highest quality rate (Q.sub.r) value from the order of the exit nodes.
(88) In the method, if there are multiple exit nodes with an equal highest quality rate (Q.sub.r) value, the method selects one of those exit nodes at random. If the exit node with the highest quality rate value has an available capacity value (C.sub.avail) of zero, another exit node with the second highest quality rate value (Q.sub.r) and with an available capacity (C.sub.avail) value greater than zero is provided to implement the user request. The user request from the user device for data extraction and gathering from the web server may include verification credentials for user validation. The user validation is carried out by verifying credentials sent along with the request against the data from an internal database or an external database. The user request from the user device for data extraction and gathering from the web server may include requirements for exit node pool selection, such as exit node geo-location, ability to reach specific targets, and latency. The metadata of the exit nodes in the chosen suitable exit node pool includes, but is not limited to, IP address of each of the exit nodes, geo-location of each of the exit nodes, quality rates (Q.sub.r) and available capacity (C.sub.avail) for each of the exit nodes. The exit node can be used for a new concurrent request from another user device if the exit node has a highest quality rate (Q.sub.r) and if the available capacity value (C.sub.avail) is not zero. The overall performances of the exit nodes belonging to multiple different pools are continuously monitored and empirical data on exit nodes' performances are reported to a database. The available capacity value (C.sub.avail) of the exit node and the quality rate for the exit node in the pool are stored in a pool database.
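The selection steps of the method (keep the exit nodes whose available capacity is greater than zero, arrange them by quality rate in descending order, take the highest, and break ties at random) can be sketched in Python as follows. The `ExitNode` record and the function name are hypothetical, and returning `None` when no node has spare capacity is an assumption, since the text does not describe that case.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitNode:
    # Hypothetical subset of the pool metadata described in the text.
    ip: str
    quality_rate: float       # Q_r on the 0-100 scale
    available_capacity: int   # C_avail


def select_exit_node(nodes, rng=random):
    """Pick the highest-rated exit node that still has spare capacity."""
    usable = [n for n in nodes if n.available_capacity > 0]
    if not usable:
        return None  # assumption: the caller decides what to do in this case
    ranked = sorted(usable, key=lambda n: n.quality_rate, reverse=True)
    best_rate = ranked[0].quality_rate
    # All nodes sharing the highest quality rate are equally eligible.
    top = [n for n in ranked if n.quality_rate == best_rate]
    return rng.choice(top)
```

Filtering on capacity before ranking has the same effect as the fallback described in the text: a top-rated node with zero available capacity is never considered, so the node with the next highest quality rate and spare capacity is chosen.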
(89) By using the described method the exit node is rated and the rated exit node is in a pool and is used for implementing the user request from a user device for data extraction and gathering from the web server by: checking the user request to identify requirements for an exit node pool selection that are sent with the user request; choosing a suitable exit node pool conforming to the requirements of the user request; retrieving and checking metadata of exit nodes belonging to the chosen suitable exit node pool, wherein the metadata retrieved contains quality rates (Q.sub.r) and available capacity values (C.sub.avail) of each exit node in the pool; analyzing the quality rate (Q.sub.r) values; analyzing the available capacity (C.sub.avail) values; selecting the exit node from the chosen pool with a highest quality rate (Q.sub.r) and a highest available capacity (C.sub.avail) value.
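The text does not say how "a highest quality rate and a highest available capacity" are combined when no single exit node leads on both. The Python sketch below assumes one possible reading, which is an assumption of this example only: quality rate is compared first and available capacity is used to break ties, with nodes at zero capacity excluded. The function name and the tuple layout are hypothetical.

```python
def select_by_quality_then_capacity(nodes):
    """Return the id of the preferred node, or None if none has spare capacity.

    `nodes` is an iterable of (node_id, quality_rate, available_capacity) tuples.
    """
    usable = [n for n in nodes if n[2] > 0]
    if not usable:
        return None
    # Tuples compare element by element: quality rate first, then capacity.
    return max(usable, key=lambda n: (n[1], n[2]))[0]
```

Under this reading, two nodes with the same quality rate are separated by whichever has more room for additional concurrent requests.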
(90) The quality rate (Q.sub.r) is calculated by using values of the following attributes: a time taken by the exit node to perform a benchmark request to a specific target; a latency while performing a ping test on the exit node; a probability that the exit node will disconnect in a foreseen time frame which is calculated from a disconnection chronology.
(91) The quality rate (Q.sub.r) value for the exit node is calculated as:
Q.sub.r=(min(β/a,0.5)+min(ψ/b,0.5))×(1−c) wherein, “β” is a benchmark threshold constant, denoting an ideal benchmark request speed (in milliseconds) of the exit node; “ψ” is a ping threshold constant, denoting an ideal ping latency (in milliseconds) of the exit node; “a” is a time taken (in milliseconds) by the exit node to perform a benchmark request to a specific target; “b” is a latency (in milliseconds) while performing a ping test against the exit node; “c” is a probability that the exit node will disconnect in a foreseen time frame, calculated from a disconnection chronology of the exit node. The min( ) function takes the minimum of its two arguments, such that neither term exceeds 0.5.
(92) In the method disclosed, the empirical data of the exit nodes that is used for exit node evaluation can contain, but is not limited to, a disconnection chronology, instances of observed failures and/or corrupt responses before reaching maximum capacity value (C.sub.max), present concurrency (Pχ), effective load, pool assignment timestamps, and a total number of users serviced by the exit node. The disconnection chronology contains a detailed log of connects and disconnects of the exit node from a service provider infrastructure, along with respective timestamps.
(93) In the method, when C.sub.avail=0, it means that a number of concurrent requests has reached a maximum and additional requests are not sent to the exit node. The maximum capacity value (C.sub.max) is a fixed number that is initially assigned to the exit node in the pool based on intelligent analysis. If a success rate declines below a minimum tolerance rate for the exit node, the maximum capacity value (C.sub.max) is re-computed and a different value is assigned to the exit node so that the success rate remains higher than the minimum tolerance value. The minimum tolerance rate denotes a tolerated or accepted success/failure ratio for the exit node executing user requests, is initially a common value configured based on intelligent analysis, and can be changed based on empirical analysis of performance of the exit node.
(94) The maximum capacity value (C.sub.max) of the exit node in the pool is calculated by: calculating a success rate for each Pχ value of the exit node; ensuring that the success rate at every Pχ value is higher than a minimum tolerance rate; detecting and identifying exit nodes in a pool with success rates lower than the minimum tolerance rate; determining and assigning a new maximum capacity (C.sub.max) value of the exit node; calculating a new success rate of the exit node that is higher than the minimum tolerance rate.
(95) The new maximum capacity (C.sub.max) value is lowered to a specific Pχ value at which the success rate of the exit node is higher than the minimum tolerance rate.
(96) The method disclosed tests the exit nodes by carrying out benchmark request tests or ping message tests that are performed at regularly occurring intervals.
(97) In the method disclosed the quality rate (Q.sub.r) values are assigned on a scale of 0-100.
(98) Although several embodiments have been described, one of ordinary skill in the art will appreciate that various modifications and changes can be made without departing from the scope of the embodiments detailed herein. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present teachings. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. The invention is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.
(99) Moreover, in this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises”, “comprising”, “has”, “having”, “includes”, “including”, “contains”, “containing” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a”, “has . . . a”, “includes . . . a”, “contains . . . a” does not, without additional constraints, preclude the existence of additional identical elements in the process, method, article, and/or apparatus that comprises, has, includes, and/or contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “approximately”, “about” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art. A device or structure that is “configured” in a certain way is configured in at least that way but may also be configured in ways that are not listed. For the indication of elements, a singular or plural form can be used, but it does not limit the scope of the disclosure and the same teaching can apply to multiple objects, even if in the current application an object is referred to in its singular form.
(100) This disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing detailed description, it is demonstrated that multiple features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment.