System and method for traffic volume publication applying differential privacy
10931642 ยท 2021-02-23
Assignee
Inventors
Cpc classification
G08G1/0129
PHYSICS
H04L63/04
ELECTRICITY
H04W12/02
ELECTRICITY
G08G1/096775
PHYSICS
G08G1/096716
PHYSICS
International classification
H04W12/02
ELECTRICITY
Abstract
A method for a traffic volume publication system to publish traffic volumes in a road traffic network includes: receiving traffic information including information on a plurality of road segments and original traffic volume data for the road segments at a first timestamp and calculating a first window size for each road segment for the first timestamp; predicting a second window size for a third timestamp subsequent to the first timestamp, either based on the first window size calculated at the first timestamp or based on the first window size and a window size calculated in advance at a second timestamp prior to the first timestamp; determining a privacy budget allocated to the first timestamp based on the first window size and the second window size; and returning noisy traffic volume data which is obtained by inserting noise into the original traffic volume data, based on the determined privacy budget.
Claims
1. A method for a traffic volume publication system to publish traffic volumes in a road traffic network, the method comprising: receiving traffic information including information on a plurality of road segments and original traffic volume data for the road segments at a first timestamp and calculating a first window size for each road segment for the first timestamp depending on the result of comparing a transition probability calculated for each road segment at the first timestamp to a preset threshold, wherein the traffic information includes information on times when the traffic information is collected, information on vehicles present in road segments at those times, information on trajectories along which those vehicles move, road segment information, and traffic volume data; predicting a second window size for a third timestamp subsequent to the first timestamp, either based on the first window size calculated at the first timestamp or based on the first window size and a window size calculated in advance at a second timestamp prior to the first timestamp; determining a privacy budget allocated to the first timestamp based on the first window size and the second window size; and returning noisy traffic volume data which is obtained by inserting noise into the original traffic volume data, based on the determined privacy budget, wherein published stream of traffic volume data includes sensitive private information of a plurality of individuals and creating differential privacy information which is protected by adding noise to the original traffic data.
2. The method of claim 1, wherein the calculating of a first window size comprises: determining whether entropy for the calculated transition probability is larger than a preset threshold; and if the transition probability is larger than the threshold, setting the first window size to a preset value.
3. The method of claim 2, wherein the transition probability is calculated as the transition probability for each road segment at the first timestamp, and the entropy for the transition probability is calculated using the transition probability and a logarithm of the transition probability.
4. The method of claim 2, wherein the calculating of a first window size further comprises, if the transition probability is smaller than the threshold, recalculating the entropy by using the probability values of past trajectories prior to the first timestamp in the corresponding road segment of which transition probability is smaller than the threshold.
5. The method of claim 1, wherein the determining of a privacy budget further comprises calculating the privacy budget remaining until the first timestamp.
6. The method of claim 5, wherein the determining of a privacy budget comprises, if the first window size is larger than the second window size, allocating the entire remaining privacy budget as the privacy budget for the first timestamp.
7. The method of claim 6, wherein the determining of a privacy budget comprises, if the first window size is smaller than the second window size, allocating the remaining privacy budget divided by the increase from the first window size to the second window size as the privacy budget for the first timestamp.
8. The method of claim 7, wherein the returning of noisy traffic volume data comprises: calculating the amount of noise to be inserted into the original traffic volume data, based on the privacy budget allocated for the first timestamp; and inserting as much noise as calculated into the original traffic volume data.
9. A system for traffic volume publication in a road traffic network, the system comprising: a window size calculation module that calculates a transition probability for each of a plurality of road segments at a first timestamp, based on traffic information including information on the road segments and original traffic volume data for the road segments, and that compares entropy for the calculated transition probability to a preset threshold and calculates a first window size for each road segment for the first timestamp by using the preset threshold or the entropy calculated based on the probability values of past trajectories prior to the first timestamp, wherein the traffic information includes information on times when the traffic information is collected, information on vehicles present in road segments at those times, information on trajectories along which those vehicles move, road segment information, and traffic volume data; a window size prediction module that predicts a second window size for a third timestamp subsequent to the first timestamp, either based on the first window size calculated at the first timestamp or based on the first window size and a window size calculated in advance at a second timestamp prior to the first timestamp; and a traffic volume publication module that publishes noisy traffic volume data by calculating the amount of noise to be inserted into the original traffic volume data at the first timestamp by using the calculated first window size and the predicted second window size, wherein published stream of traffic volume data includes sensitive private information of a plurality of individuals and the differential privacy information is protected by adding noise to the original traffic data and wherein the window size calculation module, the window size prediction module, and the traffic volume publication module is executed by a hardware processor.
10. The system of claim 9, wherein the traffic volume publication module obtains the privacy budget remaining from the privacy budget used for the window for each road segment at the first timestamp, based on the first window size and the second window size.
11. The system of claim 10, wherein, if the first window size is larger than the second window size, the traffic volume publication module allocates the entire remaining privacy budget as the privacy budget for the first timestamp.
12. The system of claim 11, wherein, if the first window size is smaller than the second window size, the traffic volume publication module allocates the remaining privacy budget divided by the increase from the first window size to the second window size as the privacy budget for the first timestamp.
13. The system of claim 12, wherein the traffic volume publication module calculates the amount of noise to be inserted into the original traffic volume data, based on the privacy budget allocated for the first timestamp.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION OF THE EMBODIMENTS
(8) In the following detailed description, only certain exemplary embodiments of the present invention have been shown and described, simply by way of illustration. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive. Like reference numerals designate like elements throughout the specification.
(9) Hereinafter, a traffic volume publication system and a traffic volume publication method using the same according to an exemplary embodiment of the present invention will be described in detail with reference to the drawings.
(10) Prior to explaining an exemplary embodiment of the present invention, the term window as used herein is defined as the minimum amount of time in which a variety of trajectories can be included when a vehicle arrives at the current road segment.
(11)
(12) As shown in
(13) The traffic information provision system 200 collects traffic information and provides it to the traffic information publication system 100. The traffic information collected by the traffic information provision system 200 includes information on times when the traffic information is collected, information on vehicles present in road segments at those times, information on trajectories along which those vehicles move, road segment information, traffic volume data, and so on.
(14) Here, a road traffic network from which actual traffic information is collected and traffic volume data contained in the traffic information received by the traffic volume publication system 100 will be described first with reference to
(15)
(16)
(17) As shown in
(18) For a certain natural number w, a first trajectory S.sub.t and a second trajectory St with length t are designated as w-neighboring if the following two conditions are satisfied.
(19) For 1i.sub.1<i.sub.2t, i.sub.2i.sub.1+1w.
(20) For 1i.sub.1<i.sub.2t, S.sub.t[i.sub.1]S.sub.t[i.sub.1] and S.sub.t[i.sub.2]S.sub.t[i.sub.2]. Here, i denotes an i-th index. That is, St denotes a place or road on a movement trajectory St with a length t a vehicle visits an i-th time.
(21) For instance, a description will be given with an example in which, for t=5 and w=3, S.sub.t={R.sub.1, R.sub.3, R.sub.5, R.sub.2, R.sub.7} and S.sub.t={R.sub.1, R.sub.4, R.sub.6, R.sub.3, R.sub.7}. Here, St and St represent a trajectory with a length 5, and R.sub.k denotes a place or road the vehicle visited. S.sub.t and S.sub.t represent sets, and the sequence of elements in each set is an ordered list of places or roads the vehicle visited.
(22) For w=3, if i.sub.1=2, S.sub.t[i.sub.1] equals R.sub.3 and S.sub.t[i.sub.1] equals R.sub.4. Likewise, for w=3, if i.sub.2=4, [i.sub.2] equals R.sub.2 and S.sub.t[i.sub.2] equals R.sub.3. Hence, given that S.sub.t[i.sub.l]S.sub.t[i.sub.1], S.sub.t[i.sub.2]S.sub.t[i.sub.2], and i.sub.2i.sub.1+13, St and St are 3-neighboring each other.
(23) Here, if a certain algorithm A satisfies a w-event, it means that the two trajectories S.sub.t and S.sub.t w-neighboring each other are taken as input and all outputs r satisfy the following Mathematical Formula 1.
(24)
(25) For example, it is assumed that 3-event -differential privacy is satisfied for traffic volume data in
(26) Meanwhile, referring again to
(27) Now, a structure of the traffic volume publication system 100 which processes traffic information to generate traffic volume information will be described with reference to
(28)
(29) As shown in
(30) The interface 110 works in conjunction with the traffic information provision system 200 and the terminal 300. Also, the interface 110 receives traffic information transmitted from the traffic information provision system 200, and delivers traffic volume information generated by the processor 120 to the terminal 300. Here, the traffic information includes information on times when the traffic information is collected, information on vehicles present in road segments at those times, information on trajectories along which those vehicles move, road segment information, and so on.
(31) The window size calculation module 121 identifies road segments from the traffic information received via the interface 110. Then, it calculates the window size for each of the identified road segments. Here, the window size for each road segments calculated by the window calculation module 121 is called a first window size, and an algorithm for calculating the first window size will be explained first with reference to
(32)
(33) As shown in
(34) That is, if the entropy for the transition probability is larger than a preset threshold , the corresponding road segment is identified as located off an intersection. Once identified as an intersection, the window size calculation module 121 sets the window size of the corresponding segment to 2.
(35) On the other hand, if the entropy for the transition probability is smaller than the threshold, the entropy is recalculated by using the probability values of past trajectories connected to the corresponding road segment by using BFS. This process is repeated until the entropy becomes larger than the threshold. The window size calculation module 121 sets the window size to not exceed a preset size MaxW, in order to prevent the insertion of too much noise.
(36) Meanwhile, after the window size calculation module 121 calculates the window size of a road segment, the window size prediction module 122 of
(37) That is, the window size varies because the transition matrix depends on timestamps. On the other hand, in w-event privacy, a privacy budget is allocated for each time unit. Thus, when the window size increases, there will be no remaining privacy budget to be allocated, causing a problem of allocating a budget of zero. For example, it is assumed that a window with a size of 3 at the timestamp t increases to 4 at the timestamp t+1. Then, a privacy budget of zero is allocated to the timestamp t+1 since there is no remaining privacy budget in the window. This means that an infinite amount of noise is inserted.
(38) Accordingly, the window size prediction module 122 obtains tm.sub.t+1 by multiplying the previous Maxw1 transition matrices according to the Markov assumption, since the w-event privacy technique takes into consideration only the data for the previous w1 time units. Then, the window size is obtained in the same way as the algorithm explained with respect to
(39) The traffic volume publication module 123 obtains the privacy budget .sub.rm,i remaining from the privacy budget used for the window for each road segment until the current timestamp, based on a first window size for the current timestamp calculated by the window size calculation module 121 and a second window size for the next timestamp predicted by the window size prediction module 122.
(40) If the first window size is smaller than the second window size, the remaining privacy budget divided by the increase in window size is allocated as the privacy budget for the current timestamp.
(41) However, if the first window size is larger than or equal to the second window size, the entire remaining privacy budget is allocated as the privacy budget for the current timestamp.
(42) A traffic volume publication algorithm for allocating a privacy budget by the traffic volume publication module 123 will be described first with reference to
(43)
(44) As shown in
(45) On the other hand, if the window size does not increase, the entire remaining privacy budget is allocated as the privacy budget .sub.t,i for the first timestamp.
(46) The traffic volume publication module 123 inserts noise into the original traffic volume data for each road segment R.sub.i according to the Laplace distribution 1/.sub.t,i, and returns noisy traffic volume data.
(47) A process for a proposed technique for MaxW=4 will be described with reference to
(48) Also, it is assumed that the window size for the next timestamp is smaller by 1 than the window size for the sixth timestamp t6. In this case, the entire remaining budget .sub.rm,i=/3 is allocated to .sub.t.sub.
(49) Meanwhile, the memory 130 of
(50) A method of publishing traffic volume data using the above-explained traffic volume publication system 100 will now be described with reference to
(51)
(52) As shown in
(53) Here, the traffic information includes information on times when the traffic information is collected, information on vehicles present in road segments at those times, information on trajectories along which those vehicles move, road segment information, traffic volume data, and so on.
(54) The traffic volume publication system 100 calculates the transition probability for each road segment at the first timestamp at which the traffic information received in the step S100 is collected (S101). If there is no vehicle that has moved from a first road segment to a second road segment, the transition probability is zero. On the other hand, if there are any vehicles that have moved from the first road segment to the second road segment, the transition probability is calculated based on the number of vehicles that have moved. Here, the traffic volume publication system 100 calculates the transition probability by the following Mathematical Formula 2, based on the algorithm explained previously with reference to
Transition probability (Base_p)=tm.sub.t(R.sub.i) [Mathematical Formula 2]
(55) Here, tm.sub.t represents the transition matrix at time t. Each row and column of the transition matrix represent one road segment, and the transition matrix element tm.sub.t[r][c] represents the probability that vehicles on a road segment corresponding to the row r will be on a road segment corresponding to the column c at the previous timestamp t1.
(56) By calculating the transition probability, the traffic volume publication system 100 determines whether the entropy for the calculated transition probability is larger than a preset threshold (S102). Here, the entropy for the transition probability is calculated by the following Mathematical Formula 3.
Entropy for transition probability (E)=Base_p*log(Base_p) [Mathematical Formula 3]
(57) If the entropy calculated by Mathematical Formula 2 is smaller than the preset threshold, the entropy is recalculated by using the probability values of past trajectories in the corresponding road segment (S103). The entropy recalculation procedure is repeated until the calculated entropy is larger than the threshold.
(58) On the other hand, if the calculated entropy is larger than the preset threshold, the traffic volume publication system 100 determines the window size for each road segment for the first timestamp (S104). Here, a description will be given with respect to an example in which the window size is set to 2, and the window size determined in step S104 is called a first window size.
(59) If the first window size is set to 2, the traffic volume publication system 100 predicts the window size for the next timestamp, i.e., the second timestamp, at the first timestamp at which traffic information is collected to determine the first window size (S105). The window size for the second timestamp varies because the transition matrix depends on timestamps.
(60) Accordingly, the traffic volume publication system 100 obtains the window size for the second timestamp by multiplying (maximum window size1) transition matrices at the third timestamp just prior to the first timestamp. That is, the window size for the second timestamp is obtained by the equation tm.sub.t+1=tm.sub.tMaxW+2*tm.sub.tMaxW+3* . . . *tm.sub.t. Eventually, the window size for the second timestamp (hereinafter referred to as second window size) is predicted by using the algorithm of
(61) Once the first window size and the second window size are calculated or predicted, the traffic volume publication system 100 calculates the privacy budget remaining until the first timestamp (S106). To calculate the remaining privacy budget .sub.rm,i in the step S106, the traffic volume publication system 100 uses the following Mathematical Formula 4:
(62)
(63) where represents the privacy budget allocated to the window, and k represents the earliest timestamp included in the window for R.sub.i. That is, the sigma term represents the sum of the privacy budgets allocated to the time units included in the window.
(64) After calculating the remaining privacy budget through Mathematical Formula 3, the traffic volume publication system 100 determines whether the first window size is smaller than the second window size (S107). If the second window size is smaller, the remaining privacy budget calculated in the step S106 is allocated to the first timestamp (S108).
(65) However, if the first window size is smaller than the second window size, the remaining privacy budget divided by the increase from the first window size to the second window size is allocated to the first timestamp (S109). Next, noise is inserted into the original traffic volume data for the corresponding road segment (S110), and then noisy traffic volume data is returned (S111). Here, the amount of noise inserted in the step S110 is determined according to the
(66) Laplace distribution 1/.sub.t,i.
(67) An example of comparing traffic volume data published according to the above procedure to traffic volume data published according to the conventional traffic volume data publication method will now be described with reference to
(68)
(69) As shown in
(70) Formula 5. The results show the average values obtained by repeating the experiment 10 times, and is fixed to 1. The prediction failure probability for each window size is approximately 3 %, but noise insertion is not taken into account.
Mean absolute error (MAE)=.sub.r R(|D(r)D(r)|/|R|)
Mean relative error (MRE)=.sub.r R(|D(r)D(r)|/|R|*D(r)) [Mathematical Formula 5]
(71) Herein, D(r) is the original traffic volume for the road segment r, D(r) is the traffic volume with noise inserted therein, and R is the set of all road segments.
(72) A threshold determines the window size for the road segment. As the threshold value increases, the average window size increases. This finding implies a reduction of the privacy budget allocated to each time unit and the insertion of more noise.
(73) As shown in
(74) As shown in
(75) The mean absolute and relative errors in the traffic volume for each road segment varying with MaxW when the threshold is zero may be estimated. In the existing method, as MaxW increases, more noise is inserted because the same MaxW is applied to all road segments. By contrast, in a method according to an exemplary embodiment of the present invention, it can be seen the errors are relatively constant because there are many road segments with a window size smaller than MaxW.
(76) While this invention has been described in connection with what is presently considered to be practical exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.