Methods and apparatus for traffic management in peer-to-peer networks

Abstract

Methods and apparatus relating to routing and caching systems for reducing traffic and the bandwidth used by decentralized peer-to-peer (P2P) file sharing networks are described. The peer-to-peer network operates over an underlying network including first and second network portions. The method includes routing a peer-to-peer message in one of said network portions with an intended destination in the other of said network portions to a gateway between peer-to-peer nodes residing on said first and second network portions. The method further includes controlling transport of said message at said gateway to limit propagation of said message into said other of said network portions.

Claims

1. A method of reducing traffic in a decentralized peer-to-peer network, said decentralized peer-to-peer network operating over an underlying network comprising first and second network portions, the method comprising: identifying, with an Internet Service Provider (ISP) router, whether messages in the first network portion are peer-to-peer messages or other messages; routing all peer-to-peer messages in the first network portion with an intended destination in the second network portion outside of a network of an Internet Service Provider (ISP) to a gateway between peer-to-peer nodes residing on said first and second network portions; and controlling transport of said peer-to-peer messages at said gateway to limit propagation of said peer-to-peer messages into said second network portion, wherein the other messages directly flow from the ISP router into the second network portion, while the peer-to-peer messages only flow through the gateway into the second network portion.

2. The method of claim 1, wherein said first network portion comprises a portion of said underlying network managed by the ISP and said second network portion comprises a portion of said underlying network not managed by the ISP that is connected to said first network portion across a boundary.

3. The method of claim 2, further comprising: limiting a number of peer-to-peer connections across said boundary to a permitted maximum.

4. The method of claim 1, wherein said transport controlling further comprises: blocking said peer-to-peer messages at said gateway.

5. The method of claim 1, wherein said transport controlling further comprises: redirecting said peer-to-peer messages to a peer-to-peer node within said first network portion.

6. The method of claim 1, wherein said transport controlling further comprises: responding to said peer-to-peer messages from said gateway.

7. The method of claim 6 wherein said peer-to-peer messages comprise queries, and wherein said responding further comprises: sending a response to said queries comprising cached data derived from previous responses to the queries.

8. The method of claim 6, wherein said peer-to-peer messages comprise file requests, and wherein said responding further comprises: sending a response to said file requests comprising previously cached data for a requested file.

9. The method of claim 1, wherein said peer-to-peer messages comprises file request messages, and wherein said controlling further comprises: modifying a response to a previous file search request such that said response does not indicate that a requested file may be found in said second network portion.

10. The method of claim 9, wherein said requested file is identified by a hash value.

11. The method of claim 9, further comprising: storing requested files in a cache, wherein said response is modified to refer to said cache.

12. The method of claim 9, wherein said underlying network comprises a third network portion, and wherein said modifying further comprises: modifying said response to indicate that said requested file is obtainable from a peer-to-peer node located on said third network portion.

13. The method of claim 1, wherein said physical network comprises a third network portion, wherein use of each of said network portions has an associated cost, wherein data transport over said third network portion has a cost less than a cost associated with said second network portion, and wherein said controlling further comprises: directing said peer-to-peer messages into said third network portion.

14. The method of claim 1, wherein said peer-to-peer messages have message identifiers, and wherein said controlling further comprises: storing said message identifiers for said peer-to-peer messages; monitoring message identifiers of the peer-to-peer messages passing through said gateway to produce identified messages; and limiting propagation of said identified messages such that said messages pass between said first and second network portions no more than a permitted maximum number of times.

15. The method of claim 14, wherein said permitted maximum number of times is one.

16. A computer network message controller that reduces traffic in a decentralized peer-to-peer network, said decentralized peer-to-peer network operating over a physical network comprising first and second network portions, said network message controller comprising: a router that is configured to identify whether messages in the first network portion are peer-to-peer messages or other messages and route all peer-to-peer messages in the first network portion with an intended destination in the second network portion outside of a network of an Internet Service Provider (ISP) to a gateway between peer-to-peer nodes residing on said first and second network portions; and a gateway controller that is configured to control transport of said peer-to-peer messages into said second network portion, wherein the other messages directly flow from the router into the second network portion, while the peer-to-peer messages only flow through the gateway into the second network portion.

17. The computer network message controller of claim 16, wherein said first network portion comprises a portion of said physical network managed by the ISP and said second network portion comprises a portion of said physical network not managed by the ISP that is connected to said first network portion across a boundary.

18. The computer network message controller of claim 17, wherein said gateway controller is configured to limit a number of peer-to-peer connections across said boundary to a permitted maximum.

19. The computer network message controller of claim 16 wherein said gateway controller is configured to block the peer-to-peer messages at said gateway.

20. The computer network message controller of claim 16 wherein said gateway controller is configured to redirect the peer-to-peer messages to a peer-to-peer node within said first network portion.

21. The computer network message controller of claim 16 wherein said gateway controller is configured to respond to the peer-to-peer messages.

22. The computer network message controller of claim 21, further comprising: a cache that is configured to store data, wherein said peer-to-peer messages comprise queries, and wherein said gateway controller is configured to send a response to said queries including data from said cache.

23. The computer network message controller of claim 21 wherein said peer-to-peer messages comprise file requests, further comprising: a cache that is configured to store data derived from previous responses to file requests, and wherein said gateway controller is configured to send a response to said file request including data from said cache.

24. The computer network message controller of claim 16, wherein said peer-to-peer messages comprise file request messages, and said gateway controller is configured to modify a response to a previous file search request such that said response does not indicate that a requested file may be found in said second network portion.

25. The computer network message controller of claim 24, wherein said requested file is identified by a hash value.

26. The computer network message controller of claim 24, further comprising: a cache that is configured to store requested files, wherein said gateway controller is configured to modify said response to refer to said cache.

27. The computer network message controller of claim 16 wherein said underlying network further comprises: a third network portion, wherein said gateway controller is configured to modify said response to indicate that said requested file is obtainable from a peer-to-peer node located on said third network portion.

28. The computer network message controller of claim 16, wherein said peer-to-peer messages have message identifiers, and wherein said gateway controller is configured to store said message identifiers for said peer-to-peer messages, monitor message identifiers of the peer-to-peer messages passing through said gateway to produce identified messages, and limit propagation of said identified messages such that said identified messages pass between said first and second network portions no more than a permitted maximum number of times.

29. The computer network message controller of claim 28, wherein said permitted maximum number of times is one.

30. The computer network message controller of claim 16, wherein said gateway controller further comprises: a processor, and a program memory that is configured to store processor control code coupled to said processor to load and implement said code.

31. A gateway controller, that is configured to reduce traffic in a decentralized peer-to-peer network operating over an underlying network comprising first and second network portions, the gateway controller operating at a gateway between peer-to-peer nodes residing on said first and second network portions, the gateway controller comprising: an interface for said first and second network portions, that is configured to receive all peer-to-peer messages in the first network portion with an intended destination in the second network portion outside of a network of an Internet Service Provider (ISP), wherein a router is configured to identify whether messages in the first network portion are peer-to-peer messages or other messages; and a controller that is configured to limit propagation of the peer-to-peer messages into the second network portion, wherein the other messages directly flow from the router into the second network portion, while the peer-to-peer messages only flow through the gateway into the second network portion.

32. The gateway controller of claim 31, wherein said controller is configured to block the peer-to-peer messages at said gateway.

33. The gateway controller of claim 31, wherein said controller is configured to redirect the peer-to-peer messages to a peer-to-peer node within said first network portion.

34. The gateway controller of claim 31, wherein said controller is configured to respond to the peer-to-peer messages.

35. The gateway controller of claim 34, further comprising: a query cache that is configured to store data derived from responses to queries, wherein said controller is configured to respond to the queries using data from said query cache, and the peer-to-peer messages comprise queries.

36. The gateway controller of claim 34, further comprising: a file request cache that is configured to store data derived from responses to file requests, wherein the peer-to-peer messages comprise file requests and said controller is configured to respond to said file requests using data from said file request cache.

37. The gateway controller of claim 31, wherein said peer-to-peer messages comprise file request messages, and said controller is configured to modify a response to a previous file search request such that said response does not indicate that a requested file may be found in said second network portion.

38. The gateway controller of claim 37, wherein said requested file is identified by a hash value.

39. The gateway controller of claim 37, further comprising: a cache that is configured to store requested files, wherein said controller is configured to modify said response to refer to said cache.

40. The gateway controller of claim 31, wherein said underlying network further comprises: a third network portion, wherein said controller is configured to modify said response to indicate said requested file is obtainable from a peer-to-peer node located on said third network portion.

41. The gateway controller of claim 31, wherein the peer-to-peer messages have message identifiers, said controller is configured to store said message identifiers for the peer-to-peer messages, monitor the message identifiers of the peer-to-peer messages passing through said gateway to produce identified messages, and limit propagation of said identified messages such that said peer-to-peer messages pass between said first and second network portions no more than a permitted maximum number of times.

42. The gateway controller of claim 41, wherein said permitted maximum number of times is one.

43. The gateway controller of claim 31, wherein said first network portion comprises a portion of said underlying network managed by the ISP and said second network portion comprises a portion of said underlying network not managed by the ISP that is connected to said first network portion across a boundary, and wherein said controller is configured to provide a limited number of peer-to-peer connections across said boundary.

44. The gateway controller of claim 31, wherein said controller further comprises: a processor; and a program memory that is configured to store processor control code coupled to said processor to load and implement said code, said code comprising code to configure said controller to control transport of said message into said other of said network portions.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) These and other aspects of the present invention will now be further described by way of example only with reference to the accompanying figures in which:

(2) FIGS. 1a, 1b and 1c show, respectively, an example of a centralised peer-to-peer network, an example of a decentralised peer-to-peer network, and steps illustrating location and retrieval of a file in a decentralised peer-to-peer network;

(3) FIGS. 2a and 2b show a computer in contact with a remote web server directly and via a proxy cache respectively;

(4) FIGS. 3a to 3f show, respectively, a TCP/IP data packet, a P2P message header, a P2P pong message, a P2P query message, a P2P query hit message, and a P2P Get message;

(5) FIGS. 4a to 4c show, respectively, a P2P network, a P2P network including a gateway node, and an implementation of a gateway node;

(6) FIG. 5 shows an example of an internet service provider network including a P2P gateway;

(7) FIGS. 6a and 6b show, respectively, an embodiment of the gateway node, and tables of a data store for the node of FIG. 6a;

(8) FIG. 7 shows messages in a P2P network including a gateway node;

(9) FIG. 8 shows processing of a P2P query/query hit at a gateway node;

(10) FIG. 9 shows processing of a P2P download request at a gateway node.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

(11) To assist in understanding the invention, some background is first given on one example of a P2P protocol, the Gnutella protocol, further details of which can be found at http://www.rfc-gnutella.sourceforge.net/.

(12) FIG. 3a shows a conventional TCP/IP (transmission control protocol/internet protocol) data packet 300 comprising an IP header 302 (including IP source and destination addresses), a TCP header 304 (including, among other data, source and destination port numbers) and payload data 306. To connect to a P2P network, a Gnutella node operating as a client establishes a TCP connection with a server (servent) and sends a connection request, which the server acknowledges. Once a connection is established the two P2P nodes communicate with each other by exchanging messages, sometimes called (protocol) descriptors. These P2P messages or descriptors each have a header 310, as shown in FIG. 3b, comprising a node/message identifier 312, a message function 314, a time to live (TTL) counter 316, a hop count 318 and a payload length field 319. The node/message identifier may comprise a combination of a node identifier and a counter value to create a unique message identifier which can be used, for example, to prevent propagation of duplicate messages arising from loops in the system topology. The message function 314 specifies the message type (Ping, Pong and the like); the hop count 318 is incremented at each network hop from one peer to another and the TTL counter 316 is correspondingly decremented by one; the payload length field 319 specifies the length of the accompanying P2P message.
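The header fields listed above can be illustrated with a short parsing sketch. The byte layout used here (a 16-byte identifier, three single-byte fields and a 4-byte little-endian payload length, 23 bytes in total) is an assumption taken from the public Gnutella v0.4 specification rather than from this description, and the class and method names are hypothetical.

```python
import struct
from dataclasses import dataclass

# Assumed layout: 16-byte identifier, 1-byte function, 1-byte TTL,
# 1-byte hop count, 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")


@dataclass
class P2PHeader:
    message_id: bytes    # node/message identifier 312
    function: int        # message function 314
    ttl: int             # time to live counter 316
    hops: int            # hop count 318
    payload_length: int  # payload length field 319

    @classmethod
    def unpack(cls, data: bytes) -> "P2PHeader":
        """Parse the header from the first bytes of a P2P message."""
        return cls(*HEADER.unpack(data[:HEADER.size]))

    def pack(self) -> bytes:
        """Serialise the header back to its on-the-wire form."""
        return HEADER.pack(self.message_id, self.function, self.ttl,
                           self.hops, self.payload_length)

    def next_hop(self) -> "P2PHeader":
        """Header after one more peer-to-peer hop: TTL down, hop count up."""
        return P2PHeader(self.message_id, self.function,
                         self.ttl - 1, self.hops + 1, self.payload_length)
```

For example, a header created with a TTL of 7 and a hop count of 0 would, after `next_hop()`, carry a TTL of 6 and a hop count of 1, with the identifier unchanged.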

(13) In the Gnutella protocol a Ping message (of zero payload length, not shown in FIG. 3) is used for probing the network, and the response to a Ping message is a Pong message 320 as shown in FIG. 3c. A Pong message includes a port number 322 and an IP address 324 (nodes are identified by their IP addresses), together with a field 326 specifying the number of files shared by the node, and a field 328 specifying the number of kilobytes of data shared by the node. Other messages include a query message 330 as shown in FIG. 3d comprising a minimum required speed field 332 and a field 334 carrying search data. The response to a query is a query hit message 340 as shown in FIG. 3e. A query hit message comprises a number of hits field 342, port and IP address fields 344, 346, a (download) speed field 348, result data 350 and a node or servent identifier 352 (such as an IP address). The result data comprises a set of results each having a format 354 comprising a file index 356, a file hash value 358, a file size 360 and a file name 362. The file index 356 is a number which can be used to index a file shared by a node; the hash value 358 is optional. FIG. 3f shows a so-called push message 370 comprising a node identifier 372, a file index 374, a file hash value 376 (optional), an IP address 378, and a port number 380. (Push messages are used to request a host behind a firewall to make an outgoing upload connection to a requesting node.)

(14) The Gnutella protocol also includes a set of rules for propagating messages. Thus for Ping and Query messages a node propagates a message to all the nodes to which it is directly connected except the originator of the message; that is, ping and query messages are broadcast to all neighbours. Pong, query hit and push messages are sent back along the same path as that which carried the initial associated message (ping or query). A node does not propagate a message with an identical identifier to a message it has previously received. When a message is propagated a node decrements the TTL field and increments the hop count.
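The propagation rules above can be sketched as a small routing function. This is a minimal illustration under stated assumptions: a reply is taken to carry the identifier of the initial message it answers, a message whose TTL reaches zero is dropped, and the `Node` class and its `route` method are hypothetical names, not part of any protocol implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

BROADCAST = {"ping", "query"}               # broadcast to all neighbours
REPLIES = {"pong", "query_hit", "push"}     # sent back along the same path


@dataclass
class Node:
    neighbours: Set[str]
    # message identifier -> neighbour from which the initial message arrived
    came_from: Dict[bytes, str] = field(default_factory=dict)

    def route(self, kind: str, msg_id: bytes, ttl: int, hops: int,
              sender: str) -> List[Tuple[str, int, int]]:
        """Return (neighbour, ttl, hops) for each copy to be sent on."""
        ttl, hops = ttl - 1, hops + 1        # adjusted on every propagation
        if kind in BROADCAST:
            # Assumption: expired messages are dropped; duplicates always are.
            if msg_id in self.came_from or ttl <= 0:
                return []
            self.came_from[msg_id] = sender
            return [(n, ttl, hops) for n in sorted(self.neighbours - {sender})]
        if kind in REPLIES:
            back = self.came_from.get(msg_id)
            return [(back, ttl, hops)] if back is not None else []
        return []
```

With neighbours `a`, `b` and `c`, a query arriving from `a` is forwarded to `b` and `c`; a second copy of the same query arriving from `b` is discarded; and the matching query hit arriving from `c` is returned only to `a`.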

(15) Once a file has been located it is downloaded using the HTTP (Hyper Text Transfer Protocol) GET command. Since a query hit includes an IP address and port as well as an index, file size and file name, the GET command can be used to request the selected file, indicating the index and file name and an HTTP version number such as version 1.0.

(16) An example of a P2P session is given below:

(17)
Client: QUERY “Madonna American Pie”
Server: QUERYHIT IP Address MADONNA - AMERICANPIE.MP3, <INDEX>,<HASH>
Client: GET /get/<INDEX>/ MADONNA - AMERICANPIE.MP3 HTTP/1.0

(18) FIGS. 4a and 4b illustrate the concept of a gateway node. FIG. 4a shows a peer-to-peer network with a plurality of nodes 400 some of which are within a network 402 managed by a service provider, and others of which are located elsewhere in the Internet 404. It can be seen that nodes within the ISP's network each have a number of peers in the outside world, which results in a large number of connections (shown as lines in FIG. 4a) across the boundary 406 of the ISP's network. Since every packet which crosses boundary 406 costs the ISP money, this results in high network costs. Moreover, the large number of connections leads to an untidy topology which is difficult for the ISP to manage.

(19) FIG. 4b shows the effect of introducing a gateway node 408, comprising an active node on the network which acts as a gatekeeper at the edge of the ISP's network 402. All P2P requests are routed through gateway node 408, and attempts to connect to nodes outside the ISP's network result in connections to the gateway node. This facilitates a high level of interconnectivity within the ISP's network and a small number of fixed connections, for example eight, crossing the expensive ISP boundary 406. Since all traffic is routed through the gateway node 408, this node can perform the function of a cache router, a bandwidth throttle, and a content filter. The gateway node 408 monitors and routes P2P traffic transparently and also stores information relating to P2P queries and the associated responses (query hits) in a database. In preferred embodiments query hits are also rewritten so that they refer (directly or indirectly) to indexes of files on the gateway node. The destination port number and internet address of a query hit may also be rewritten to point to the gateway node. Since the gateway node stores the results of past searches it may itself respond to search queries based upon these stored results. The gateway node 408 may also store contents (files) indexed based upon a key or checksum. A file checksum from a query hit packet may be used to access the database for a subsequent P2P download.

(20) Still referring to FIGS. 4a and 4b, consider a network with a number X of hosts on the ISP network and a number Y of hosts outside the network, each node having P peers which are randomly selected from all the nodes on the network. Then for each connection from each internal node, the likelihood of a peer being outside of the network is:

(21) $\frac{Y}{X + Y}$

(22) Therefore, the total number of connections crossing the network border will be:

(23) $\frac{XPY}{X + Y}$

(24) Typical values for an ISP with 10,000 active P2P users would be: X=10,000, Y=1,000,000 and P=4. This results in:

(25) $\frac{10{,}000 \times 4 \times 1{,}000{,}000}{10{,}000 + 1{,}000{,}000} \approx \frac{4 \times 10^{10}}{10^{6}} \approx 4 \times 10^{4}$

(26) The effect of 40,000 permanently established TCP connections constantly relaying data is that large amounts of data are transferred over the ISP's boundary, resulting in high network costs. In particular, with the Gnutella network, network traffic represents a considerable proportion of the overall P2P traffic, possibly as much as 40% of the overall traffic.

(27) With a gateway node, this is reduced to a small fixed number of connections (typically 8), independent of the number of users on the network. At a simplistic level this reduces the network bandwidth used by a factor of about 5,000, a 99.98% reduction. (To model the traffic saving accurately, an understanding of the rules governing query routing and the propagation of messages is needed. These rules are highly specific to the precise P2P network and the configuration of the clients, but the overall results achieved are still approximately the same.)
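The arithmetic of the preceding paragraphs can be checked with a few lines of code. The sketch below simply evaluates the expression XPY/(X+Y) for the typical values given and compares it with eight gateway connections; the function and variable names are hypothetical, and the model inherits the simplifying assumption that bandwidth scales with the number of boundary connections.

```python
def boundary_connections(x: int, y: int, p: int) -> float:
    """Expected number of connections crossing the ISP boundary.

    x: hosts inside the ISP network, y: hosts outside, p: peers per node.
    Each of the x * p internal connections is external with probability
    y / (x + y).
    """
    return x * p * y / (x + y)


without_gateway = boundary_connections(10_000, 1_000_000, 4)
with_gateway = 8                                   # fixed gateway connections
factor = without_gateway / with_gateway            # reduction factor
reduction = 1 - with_gateway / without_gateway     # fractional saving

print(round(without_gateway), round(factor), f"{reduction:.2%}")
```

Evaluated without rounding the intermediate result, the expression gives roughly 39,600 connections and a factor of roughly 4,950, which is consistent with the rounded figures of 40,000 and 5,000 used in the text.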

(28) As previously mentioned, as well as reducing the amount of traffic being relayed the gateway node also stores data based upon the search traffic and responses that it sees. This enables it to build up a database of the locations of files on the network. This information may then be used for intelligent routing and caching of subsequent download requests.

(29) Referring now to FIG. 4c, this illustrates one implementation of a gateway node 408 comprising, in this example, a P2P caching router. As described further below, such a P2P caching router may comprise conventional computer hardware coupled with routing hardware and suitable program code. In FIG. 4c a plurality of computers 410, typically personal computers, are coupled via a network 412 to a router 414, network 412 and router 414 typically comprising part of an ISP's network infrastructure. Non-P2P traffic from router 414 is routed directly to internet 416 whilst P2P traffic is routed to P2P caching router 408 and thence to internet 416. This allows the ISP to reduce the level of network traffic on the network managed by the ISP and, more particularly, to reduce the amount of “upstream” bandwidth required by the ISP, that is bandwidth to the Internet external to the ISP. The gateway node 408 preferably caches both network and download traffic and may cache one or both of inbound and outbound traffic (forward and reverse caching). Router 414 may be configured to recognise P2P traffic based upon, for example, destination ports, or by looking at or snooping packet contents.

(30) Referring now to FIG. 5, this shows further details of the arrangement of FIG. 4c, and like elements are indicated by like reference numerals. Thus in FIG. 5 PCs 410 are coupled to an ADSL (Asymmetric Digital Subscriber Line) or cable modem 502 with an IP backbone connection 514 to a network 412 managed by ISP 500 but typically operating over physical network hardware provided by a telephone or cable company. The ISP router 414 and P2P gateway 408 are, in the example shown, coupled to a common router 506 which connects the ISP to a backbone or core network 508, again, for example, provided by a cable company. Router 506 may separate incoming P2P traffic for gateway 408 in a similar way to that in which router 414 handles outgoing P2P traffic or, alternatively, both incoming and outgoing P2P traffic may be identified by router 414 and sent via gateway 408. Backbone 508 may provide a link to other portions of the ISP's network, as well as one or more links 512 to the networks of other internet service providers; generally backbone 508 will also include a high bandwidth connection into the Internet 416.

(31) As previously mentioned, ISP router 414 (and/or router 506) may identify P2P traffic in a number of ways. For example the Gnutella protocol comprises HTTP traffic sent to ports 6346 and 6347, whilst KaZaa comprises HTTP traffic sent to port 1214 (although version 2.0 of KaZaa selects a random port for incoming P2P connections). The destination port of a P2P message may be read from the message (see, for example, field 322 of the Pong message shown in FIG. 3c). Where there is no fixed port for a P2P message, a P2P packet may potentially be identified by reading the payload of a packet, for example to identify a P2P header or other P2P-protocol format data. FIGS. 6a and 6b show details of the P2P gateway 408 of FIG. 5. Thus the gateway may comprise a conventional computer system including a processor 602, a working memory 604, permanent program memory 606, a data store 608 and (optional) user interface 612, all linked by a common data and control bus 614. The gateway 408 also includes a data communications card 616 linked to bus 614 to provide physical data communications interfaces, packet processing and routing functionality in accordance with control exercised by processor 602. In the illustrated example three data communications connections are provided. A first, direct communications link 618 connects to an “internal” ISP network, for example physically provided by a cable company, which, generally speaking, will not have an associated per byte cost. A second bi-directional communications link 620 may also be provided to a second physical network, for example of a second cable company, which may provide a reduced cost packet data connection. A third bi-directional communications link 622 provides a connection to external networks (the “rest of the world”), in particular the Internet.
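The port-based and payload-based identification of P2P traffic described at the start of paragraph (31) might be sketched as follows. The port numbers are those given in the text; the payload signature (the opening bytes of a Gnutella v0.4 connection request) is an assumption drawn from the public protocol specification, and the function names and the labels `gateway` and `internet` are hypothetical.

```python
# Fixed P2P ports named in the description: Gnutella 6346/6347, KaZaa 1214.
KNOWN_P2P_PORTS = {6346, 6347, 1214}

# Assumed payload signature: the start of a Gnutella v0.4 connection request.
P2P_SIGNATURES = (b"GNUTELLA CONNECT/",)


def is_p2p(dest_port: int, payload: bytes = b"") -> bool:
    """Classify a packet as P2P by destination port, then by payload."""
    if dest_port in KNOWN_P2P_PORTS:
        return True
    # Fallback for protocols with no fixed port: inspect the payload.
    return any(payload.startswith(sig) for sig in P2P_SIGNATURES)


def next_hop(dest_port: int, payload: bytes = b"") -> str:
    """P2P traffic goes to the gateway; other traffic goes straight out."""
    return "gateway" if is_p2p(dest_port, payload) else "internet"
```

Under these assumptions, a packet addressed to port 6346 is sent to the gateway regardless of content, an ordinary web request to port 80 is routed directly, and a connection request on an arbitrary port is caught by the payload check.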

(32) Permanent program memory 606 stores operating system code, (optional) user interface code, data communications control code for controlling data communications card 616, TCP/IP code, P2P protocol code, query/queryhit handling code (described below), and download request handling code (described further below). This code is loaded and implemented by processor 602 to provide the corresponding functions for the gateway node 408. Some or all of this code may be provided on a carrier medium, illustratively shown by removable storage medium 607, such as a CD-ROM.

(33) Data store 608 stores cached data files and cached query hits. FIG. 6b shows file cache tables of data store 608 comprising a source table and a cache table. The cache tables are indexed by a cache ID which, in one embodiment, comprises a number between 0 and $2^{32}$. Files in data store 608 are indexed by the cache ID; in one embodiment a portion of the cache ID comprises a directory identifier and a portion of the cache ID comprises the file name (for example /123/456/789). Files stored in data store 608 may comprise either complete or partial files; in one embodiment data store 608 comprises approximately 1 TB of RAID storage.

(34) In the cache table, associated with the cache ID, are a hash value such as an MD5 or SHA value, an “InOurCache” flag to indicate whether or not the identified file is cached, a time value to provide a timeout for deleting old files, and optionally a file name. Details of SHA1 (US Secure Hash Algorithm 1) can be found in RFC (Request for Comments) 3174; details of message digest (MD) functions such as MD2 and MD5 can be found on the website of RSA Data Security, Inc and in RFCs 1319-1321. Broadly speaking a hash function generates a fixed length output from a variable length input, the output providing a representation of the input file or message. It is desirable that a hash function is collision resistant, that is, that it is unlikely that two different input messages will result in the same output. The MD5 algorithm provides a 128 bit (16 byte) fingerprint or message digest of the input in such a way that it is extremely unlikely that two files with different contents will have the same message digest. The SHA1 algorithm is similar but produces a 20 byte output.

(35) The cache ID in the cache table links to one or more Source IDs for one or more remote sources (that is external to the cache), each having an IP address, port and remote machine index. The cache for queries/query hits is similar but includes the file name which is not needed for the file cache.
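The cache table and source table described above could be modelled along the following lines. This is an illustrative sketch only: the class and field names are hypothetical, cache IDs are allocated sequentially purely for simplicity, and the SHA1 digest is computed with Python's standard `hashlib` module.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    """One row of the source table: a remote (external) copy of a file."""
    ip: str
    port: int
    remote_index: int


@dataclass
class CacheEntry:
    """One row of the cache table, keyed by cache ID."""
    hash_value: bytes                 # e.g. an MD5 or SHA1 digest
    in_our_cache: bool = False        # the "InOurCache" flag
    expires_at: float = 0.0           # timeout for deleting old files
    file_name: Optional[str] = None   # optional
    sources: List[Source] = field(default_factory=list)


class CacheTable:
    def __init__(self) -> None:
        self.entries: Dict[int, CacheEntry] = {}
        self.by_hash: Dict[bytes, int] = {}
        self._next_id = 0             # assumption: sequential cache IDs

    def add_source(self, hash_value: bytes, source: Source,
                   file_name: Optional[str] = None) -> int:
        """Link a source to the entry for this hash, creating it if needed."""
        cache_id = self.by_hash.get(hash_value)
        if cache_id is None:
            cache_id = self._next_id
            self._next_id += 1
            self.by_hash[hash_value] = cache_id
            self.entries[cache_id] = CacheEntry(hash_value, file_name=file_name)
        self.entries[cache_id].sources.append(source)
        return cache_id


# A SHA1 digest is 20 bytes long, as noted in the description.
digest = hashlib.sha1(b"example file contents").digest()
table = CacheTable()
first = table.add_source(digest, Source("192.0.2.1", 6346, 5))
second = table.add_source(digest, Source("192.0.2.2", 6346, 9))
```

Here two sources reporting the same hash value resolve to the same cache ID, so a single cache entry links to both remote sources.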

(36) FIG. 7 illustrates messages flowing in a P2P network including a gateway node as illustrated in FIG. 5, showing steps in locating and then downloading a file. The steps show messages flowing between a user node such as one of personal computers 410, the P2P gateway 408 and a remote P2P node such as a node within internet space 404 in FIG. 4b.

(37) Initially user node 410 issues a query 700 which is received by P2P gateway 408. If this query includes a hash value for a requested file the gateway may be able to respond immediately with a query hit 702 specifying a location for the file, either in the cache or in some other location, preferably within the ISP's network. Likewise if P2P gateway 408 includes a cache of query hits previously sent in response to queries, this cache including file names, then even if query 700 does not include a hash value P2P gateway 408 may be able to respond with a query hit 702 based upon a query hit stored in data store 608 in response to a previous similar query.

(38) If P2P gateway 408 is not able, or not configured, to respond directly to user node 410, the query is relayed in accordance with the P2P protocol to a remote P2P node. This broadcasting step may involve preferentially broadcasting to nodes within the ISP's network or otherwise limiting propagation of the query 700 outside the ISP's network. In accordance with the P2P protocol a query hit 704 in response to query 700 is relayed back to P2P gateway 408. Alternatively, where P2P gateway 408 does not comprise an active node on the network, query hit 704 may nonetheless be sent via P2P gateway 408 by one of routers 414, 506.

(39) When query hit 704 is received by P2P gateway 408 the hash value in the query hit is read and if there is no entry for the hash value in the cache 608 the query hit is added to the cache and a corresponding cache ID is created, thus linking the source of the query hit, cache ID and hash value. Optionally the file name may also be included in the cache. Storing query hits in this way facilitates reducing P2P network search traffic.

(40) If on reading the hash value in query hit 704 it is determined a cached version of a requested file exists within data store 608 then the cache ID of the requested file is substituted for the index in the query hit. This is repeated for all the cached files identified in the query hit. Optionally the IP address of the gateway may be substituted for the source address. The port number of the query hit may also be modified to a known or assigned P2P port number to facilitate subsequent identification of a packet as a P2P data packet (particularly where the P2P protocol employs dynamic port allocation).

(41) The query hit 704 is then relayed back to user node 410, which subsequently issues an HTTP GET request to download the file. Although this request is sent to the address and port specified in the query hit, the request is intercepted by the P2P gateway 408. Then, using either the hash value (present in the GET request) or the index as a cache ID, the gateway 408 checks whether the requested file is stored in the cache. If the requested file is cached the gateway node 408 responds immediately with the requested file 708, but if the file is not cached the gateway selects a source for the requested file using the source table of data store 608. (Where no source is listed in the cache the gateway may use the source indicated by the requesting user node 410 in the GET request.) The gateway 408 then issues its own GET request 710, selecting a source which may be the remote P2P node which responded with query hit 704 or which may comprise an alternative remote P2P node, for example a P2P node on a network which it is cheaper for the ISP to access or, more preferably, a P2P node within the ISP's own network. The file 712 is then retrieved from this P2P node, stored within the cache, and then served 708 to the requesting user node 410.

(42) FIGS. 8 and 9 show flow diagrams illustrating in more detail the procedure described above with reference to FIG. 7.

(43) FIG. 8 illustrates the operation of an embodiment of query/queryhit handling code within gateway 408. Thus at step S800 the gateway 408 receives a P2P query and, at step S802, checks whether it is able to respond from the cache. If gateway 408 is able to respond from the cache it does so at step S804 and the procedure then ends. However, if the gateway is unable to respond directly it relays the query to its intended destination at step S806 and then waits to receive a query hit response including one or more file hash values at step S808. The gateway node then checks, at step S810, whether the one or more hash values received at step S808 are in the cache table of data store 608.

(44) If there is no entry for the hash value within the cache table then, at step S816, the gateway node assigns a free cache ID to the hash value and adds a new record to the cache table comprising the assigned cache ID, the hash value and a time/date stamp. If the hash value is present then, at step S812, the corresponding cache ID is read from the cache table and, at step S814, the query hit is rewritten with the cache ID as the index and, optionally, with a standard P2P protocol port number to facilitate later P2P packet processing. Although it is not necessary, the gateway also rewrites the IP address in the query hit to point to the gateway.

(45) The procedure then updates the cache source table using the hash value and/or cache ID index (S818) to add a new source table record comprising a source ID, cache ID, IP address, port and remote (machine) index. Optionally, at step S820, a search cache comprising corresponding information but also including a file name (to facilitate responding to queries without hash values) may also be updated. Then at step S822 the rewritten query hit is sent back to the requester and the procedure ends at step S824.
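The query hit handling of steps S810 to S822 might be sketched as follows. The table layouts, the names `GatewayTables` and `process_hit`, the gateway address and the port number are assumptions for illustration only. The sketch also rewrites every hit whether or not its cache table entry was newly created, which is a simplification of the flow described above.

```python
import itertools
import time
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

GATEWAY_IP = "192.0.2.1"   # hypothetical gateway address
P2P_PORT = 6346            # hypothetical "known" P2P port number


@dataclass(frozen=True)
class QueryHit:
    file_hash: str
    index: int   # file index on the remote machine
    ip: str
    port: int


class GatewayTables:
    def __init__(self) -> None:
        self._free_ids = itertools.count(1)
        # Cache table: hash value -> (cache ID, time stamp).
        self.cache: Dict[str, Tuple[int, float]] = {}
        # Source table: (cache ID, IP address, port, remote index).
        self.sources: List[Tuple[int, str, int, int]] = []

    def process_hit(self, hit: QueryHit) -> QueryHit:
        # S810/S816: assign a free cache ID if the hash value is new.
        if hit.file_hash not in self.cache:
            self.cache[hit.file_hash] = (next(self._free_ids), time.time())
        # S812: read the cache ID for this hash value.
        cache_id = self.cache[hit.file_hash][0]
        # S818: record the original source so the file can be fetched later.
        self.sources.append((cache_id, hit.ip, hit.port, hit.index))
        # S814: rewrite the index and, optionally, the address and port.
        return replace(hit, index=cache_id, ip=GATEWAY_IP, port=P2P_PORT)
```

Because the original address, port and remote index are kept in the source table, the rewritten hit returned to the requester loses no information that the gateway needs for a later download.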

(46) FIG. 9 shows a flow diagram of a procedure for handling a user download request received from a node such as user node 410.

(47) At step S900 the gateway node 408 receives a user download (GET) request and at step S902 checks whether the request includes a hash value. If the request does include a hash value this is read, at step S904, for accessing the cache tables; if not, the rewritten index, that is the cache ID, is extracted from the file request for accessing the cache table (step S906). Then, at step S908, the procedure checks whether the requested file is within the cache and if so serves the file to the requesting node from the cache (step S910) and the procedure then ends at step S912. If the requested file is not in the cache the procedure checks, at step S914, whether or not a source identifier for the requested file is stored in the cache. If so, at step S916, the available sources for the file are read from the cache and one is selected, either randomly or based upon a cost (monetary or otherwise). For example an internet service provider may have an arrangement with a cable company to provide access to users connected to that cable company network at reduced rates compared with other upstream access from the ISP's network. If, at step S914, there is no source for the file identified in the cache, the intended destination of the download request is selected as the source (step S918), for example from the “envelope” of the download request. Then, at step S920, the gateway connects to the identified source and downloads the requested file, saving it in the cache and updating the InOurCache flag (S922) and sending the file to the requester (S924), the procedure then ending at step S926. It will be appreciated that even when the gateway node is unable to read the network search traffic, for example because it is encrypted, where the download request includes a hash value a file may be served to the requester from the cache to significantly reduce at least download traffic on the P2P network. It will further be appreciated that a requesting user may obtain so called FastTrack file access by simply sending a GET request including the hash value for the desired file to the gateway (since the hash function algorithms are widely known, hash values for commonly accessed files may be readily listed).
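The download handling of steps S908 to S924 might be sketched as follows. The names `CacheEntry` and `handle_get`, the numeric cost attached to each source and the `fetch` callback are assumptions for illustration. The sketch always selects the cheapest source, although the description also allows a random choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Source = Tuple[str, int]  # (IP address, port)


@dataclass
class CacheEntry:
    data: Optional[bytes] = None  # None until the file is held in the cache
    # Known sources, each with an assumed access cost (lower is cheaper).
    sources: List[Tuple[Source, float]] = field(default_factory=list)


def handle_get(
    key: str,                      # hash value or cache ID from the request
    cache: Dict[str, CacheEntry],
    requested_source: Source,      # destination named in the request itself
    fetch: Callable[[Source, str], bytes],
) -> bytes:
    entry = cache.setdefault(key, CacheEntry())
    if entry.data is not None:
        # S908/S910: the file is cached, so serve it directly.
        return entry.data
    if entry.sources:
        # S914/S916: pick the cheapest known source.
        source = min(entry.sources, key=lambda item: item[1])[0]
    else:
        # S918: no known source, so use the request's intended destination.
        source = requested_source
    # S920/S922: download the file and keep it in the cache.
    entry.data = fetch(source, key)
    # S924: return the file for delivery to the requester.
    return entry.data
```

Passing the download function in as `fetch` keeps the selection logic independent of the transport, so the same sketch applies whether the file is retrieved over HTTP or by some other means.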

(48) No doubt many other effective alternatives will occur to the skilled person. For example, although specific embodiments of the invention have been described with reference to P2P networks operating over TCP/IP, the principles described above may be applied to P2P networks operating over other protocols (e.g. UDP), and in other environments such as, for example, mobile communications systems, wireless computer networks and the like.

(49) The invention encompasses modifications apparent to those skilled in the art lying within the scope of the appended claims.
