NETWORK MONITORING SYSTEM

Abstract

The present disclosure provides a network monitoring system including a plurality of sensor devices which transmits data packets and a monitoring server which performs the transferable reinforcement learning on the data packets to establish a bandwidth allocation policy in which a quality of experience (QoE) satisfies a set reference QoE to allocate the bandwidth to the plurality of sensor devices.

Claims

1. A network monitoring system, comprising: a plurality of sensor devices which transmits data packets; and a monitoring server which performs the transferable reinforcement learning on the data packets to establish a bandwidth allocation policy in which a quality of experience (QoE) satisfies a set reference QoE to allocate the bandwidth to the plurality of sensor devices.

2. The network monitoring system according to claim 1, wherein the monitoring server includes: a storage unit which stores the data packets and the bandwidth allocated to the plurality of sensor devices; and a learning unit which computes the QoE quality by applying the data packets stored in the monitoring server to the bandwidth allocation policy generated with the transferable reinforcement learning, and allocates the bandwidth which satisfies the reference QoE quality.

3. The network monitoring system according to claim 2, wherein the learning unit includes: a flow embedding module which applies a flow state of the data packets to an attention mechanism configured by multi-perceptron to output a vector value which accelerates the training speed; and a bandwidth allocation module which forms a latent action which reduces an action search space to accelerate the training speed based on the vector value and a final action which expresses the latent action with a bandwidth allocation value.

4. The network monitoring system according to claim 3, wherein the flow embedding module includes a vectorization function which generates an intermediate embedding vector value by applying the flow state to a multilayer perceptron and a relation extraction function which generates the vector value which is a flow embedding weighted by applying the intermediate embedding vector value to an attention mechanism.

5. The network monitoring system according to claim 3, wherein the bandwidth allocation module includes an allocation function which forms the latent action according to a position point with respect to the plurality of sensor devices, an adaptation function which derives a control value to allow the latent action to be adapted to the plurality of target network environments, and a sharing function which forms the final action with the latent action and the control value.

6. The network monitoring system according to claim 5, wherein the bandwidth allocation module establishes the bandwidth allocation policy in which the QoE quality satisfies the reference QoE quality according to the final action to allocate the bandwidth to the plurality of sensor devices with a data allocation value which represents the final action.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

[0020] The above and other aspects, features and other advantages of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

[0021] FIG. 1 is a schematic view illustrating a network monitoring system according to the present disclosure;

[0022] FIG. 2 is a view illustrating a transferable reinforcement learning structure which is applied to a network monitoring system according to the present disclosure;

[0023] FIG. 3 is a view specifically illustrating a transferable reinforcement learning structure illustrated in FIG. 2; and

[0024] FIGS. 4 to 7 are exemplary execution diagrams illustrating a performance of a network monitoring system according to the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENT

[0025] Those skilled in the art may make various modifications to the present disclosure and the present disclosure may have various embodiments thereof, and thus specific embodiments will be described in detail with reference to the drawings. However, this does not limit the present disclosure within specific exemplary embodiments, and it should be understood that the present disclosure covers all the modifications, equivalents and replacements within the spirit and technical scope of the present disclosure. In the description of respective drawings, similar reference numerals designate similar elements.

[0026] Terms such as first, second, A, or B may be used to describe various components but the components are not limited by the above terms. The above terms are used only to distinguish one component from the other component. For example, without departing from the scope of the present disclosure, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. A term of and/or includes combination of a plurality of related elements or any one of the plurality of related elements.

[0027] It should be understood that, when it is described that an element is “coupled” or “connected” to another element, the element may be directly coupled or directly connected to the other element or coupled or connected to the other element through a third element. In contrast, when it is described that an element is “directly coupled” or “directly connected” to another element, it should be understood that no element is not present therebetween.

[0028] Terms used in the present application are used only to describe a specific exemplary embodiment, but are not intended to limit the present disclosure. A singular form may include a plural form if there is no clearly opposite meaning in the context. In the present application, it should be understood that the term “include” or “have” indicates that a feature, a number, a step, an operation, a component, a part or the combination thereof described in the specification is present, but do not exclude a possibility of presence or addition of one or more other features, numbers, steps, operations, components, parts or combinations, in advance.

[0029] If it is not contrarily defined, all terms used herein including technological or scientific terms have the same meaning as those generally understood by a person with ordinary skill in the art. Terms defined in generally used dictionary shall be construed that they have meanings matching those in the context of a related art, and shall not be construed in ideal or excessively formal meanings unless they are clearly defined in the present application.

[0030] In the specification and the claim, unless explicitly described to the contrary, the word “comprise” and variations such as “comprises” or “comprising”, will be understood to imply the inclusion of stated elements but not the exclusion of any other elements.

[0031] Hereinafter, exemplary embodiments according to the present disclosure will be described in detail with reference to accompanying drawings.

[0032] FIG. 1 is a schematic view illustrating a network monitoring system according to the present disclosure, FIG. 2 is a view illustrating a transferable reinforcement learning structure which is applied to a network monitoring system according to the present disclosure, and FIG. 3 is a view specifically illustrating a transferable reinforcement learning structure illustrated in FIG. 2.

[0033] Referring to FIGS. 1 to 3, a network monitoring system 100 includes a plurality of sensor devices 110 and a monitoring server 120.

[0034] Each of the plurality of sensor devices 110 transmits data packets according to the bandwidth in which information collected from the sensing area is allocated.

[0035] In the exemplary embodiment, the plurality of sensor devices 110 communicates with an external device, such as a navigation and a mobile terminal, and receive information transmitted from the external device to transmit the information to the monitoring server 120.

[0036] The monitoring server 120 performs the transferable reinforcement learning on the data packets to establish a bandwidth allocation policy in which a quality of experience (QoE) satisfies a set reference QoE to allocate the bandwidth to the plurality of sensor devices.

[0037] The monitoring server 120 may include a storage unit 122 and a learning unit 124.

[0038] First, the storage unit 122 stores the data packets and a bandwidth allocated to the plurality of sensor devices 110, but is not limited thereto.

[0039] The learning unit 124 calculates the QoE quality by applying the data packets stored in the storage unit 122 to the bandwidth allocation policy generated by the transferable reinforcement learning, and allocates the bandwidth which satisfies the reference QoE quality.

[0040] Here, the learning unit 124 may include a flow embedding module 126 and a bandwidth allocation module 128.

[0041] The flow embedding module 126 applies a flow state of the data packets to an attention mechanism configured by multi-perceptron to output a vector value which accelerates the training speed.

[0042] The flow embedding module 126 may include a vectorization function which generates an intermediate embedding vector value by applying the flow state to a multilayer perceptron and a relation extraction function which generates the vector value which is a flow embedding weighted by applying the intermediate embedding vector value to the attention mechanism.

[0043] Prior to describing the flow embedding module 126, in a circumstance with a limited network resource, the QoE quality Σ.sub.i=1.sup.N.sup.Dα.sub.i=A≤L.sub.ε is maximized by bandwidth allocation

[00001] ${.Math.}_{i = 1}^{N_{D}} a_{i}^{t},$

which will be represented as follows.

[00002] $? QoE = {.Math.}_{t = 0}^{T - 1} \int_{t}^{t + 1} QUAL (I^{τ}; a_{1}^{t}, .Math., a_{N_{D}}^{t}) d τ,$ $subject to {.Math.}_{i = 1}^{N_{D}} a_{i}^{t} = A \leq L_{ε}$ $? indicates text missing or illegible when filed$

[0044] Here, QUAL(I.sup.T;α.sub.1.sup.t, . . . , α.sub.N.sub.D.sup.t) is a function of evaluating a QoE quality, L.sub.ε is a link allocation amount according to the network environment ε, T is a total allocation time, N.sub.D is the number of sensor devices, I.sup.T=(I.sub.1.sup.T, . . . , I.sub.N.sub.D.sup.T) is a sum of information transmitted from the plurality of sensor devices 11.

[0045] Here, referring to FIGS. 2 and 3, the vectorization function included in the flow embedding module 126 generates an intermediate embedding vector value e.sub.l.sup.i=MLPϕ(S.sub.i.sup.t) by applying the flow states S.sub.i.sup.t to the multilayer perceptron MLP function MLPϕ(.) at every time step t.

[0046] The relation extraction function included in the flow embedding module 126 generates the flow embedding E.sup.t using the intermediate embedding vector E.sup.l=[e.sub.1.sup.l, . . . ,e.sub.N.sub.D.sup.t] and the attention function ATTψ(.).

[0047] In the relation extraction function, query, key, and value vectors (q.sub.i, k.sub.i, v.sub.i) may be calculated from the multilayer perceptron MLP functions (MLP.sub.ψ.sub.q(.), MLP.sub.ψ.sub.k(.), MLP.sub.ψ.sub.v(.).

x.sub.i=MLP.sub.ψ.sub.x(e.sub.i.sup.l)x∈{q,k,v}

[0048] Next, the attention weight W.sub.i which measures the importance with the flow state S.sub.i.sup.t and the other flow states may be calculated.

[00003] $\begin{matrix} w_{i} = Softmax ([\frac{q_{i}^{T} .Math. k_{1}}{\sqrt{d}}, .Math., \frac{q_{i}^{T} .Math. k_{N_{D}}}{\sqrt{d}}]) \\ = {[\Pr (w_{1}^{'}), .Math., \Pr (w_{N_{D}}^{'})]}^{T} = {[w_{1}, .Math., w_{N_{D}}]}^{T} \end{matrix}$

[0049] Here, it is defined as

[00004] $\Pr (w_{i}^{'}) = \frac{\exp (w_{i}^{'})}{{.Math.}_{j = 1}^{N_{D}} \exp (w_{j}^{'})} .$

[0050] The flow embedding e.sub.i.sup.t may be calculated by (w.sub.1v.sub.1+ . . . +w.sub.N.sub.Dv.sub.N.sub.D)

[0051] That is, the flow embedding process is summarized as follows and is a function EMB.sub.ψ(.) by a trainable parameter ψ.

[00005] $\begin{matrix} E^{t} = {EMB}_{ψ} (S^{t}) \\ = {ATT}_{ψ} ([{MLP}_{ψ} (S_{1}^{t}), .Math., {MLP}_{ψ} (S_{N_{D}}^{t})]) \end{matrix}$

[0052] The bandwidth allocation module 128 includes an allocation function which forms a latent action according to a position point with respect to the plurality of sensor devices 110, an adaptation function which derives a control value to allow the latent action to be adapted to the plurality of target network environments, and a sharing function which forms the final action with the latent action and the control value.

[0053] The allocation function expresses the position of the sensor device 110 with a 2D grid and derives the latent action ã.sup.t=[{tilde over (α)}.sub.1.sup.t, . . . ,{tilde over (α)}.sub.N.sub.p.sup.t].sup.T for the point (p.sub.1, . . . ,p.sub.N.sub.p) which are randomly disposed on the grid.

ã.sup.t=ALLOC.sub.ϕ.sub.1(E.sup.t)

[0054] Here, in order to accelerate the training speed, N.sub.p<<N.sub.D is assumed and ALLOC.sub.ϕ.sub.1(.) is a function parameterized with ϕ.sub.1 trained by the RL and forms the policy π.sub.ϕ.sub.1.

[0055] The adaptation function is a function ADJUST.sub.ϕ.sub.2 parameterized with ϕ.sub.2 which is trained by the transferable reinforcement learning forms the policy ϕ.sub.2 and derives a control value Δ.sup.t=[ã.sub.δ.sup.t, k.sub.δ.sup.t, v.sub.δ.sup.t] for adapting to various network environments.

Δ.sup.t=ADJUST.sub.ϕ.sub.2(E.sup.t)

[0056] Here, in order to increase the training speed for the domain adaptation, a range of the value of ã.sub.δ is limited to z % of ã.

[0057] The shape function derives a final action a.sup.t by passing the control value Δ.sup.t=[ã.sub.δ.sup.t, k.sub.δ.sup.t, v.sub.δ.sup.t] and the latent action ã.sup.t=[{tilde over (α)}.sub.1.sup.t, . . . ,{tilde over (α)}.sub.N.sub.p.sup.t].sup.T through the SHAPE function which is non-trainable.

a.sup.t=SHAPE(ã.sup.t+ã.sub.δ.sup.t)|k+k.sub.δ.sup.t,v+v.sub.δ.sup.t

[0058] Specific calculation of the shape (SHAPE) function is as follows.

[00006] $? = {.Math.}_{j = 1}^{N_{P}} \frac{{\tilde{a}}_{j}^{t} + {\tilde{a}}_{δ, j}^{t} + ϵ}{{.Math. D_{i} - p_{j} .Math.}^{k + k_{δ}^{t}} + ϵ},$ $a_{i}^{'} = {\begin{matrix} {\tilde{a}}_{i}^{'}, & if {\tilde{a}}_{i}^{'} \geq 0.1 N_{p}, \\ c, & otherwise . \end{matrix}$ $\begin{matrix} a^{t} = (1 - v - v_{δ}^{t}) A .Math. Softmax ([a_{1}^{'}, .Math., a_{N_{D}}^{'}]) \\ = (1 - v - v_{δ}^{t}) A .Math. {[\Pr (a_{1}^{'}), .Math., \Pr (a_{N_{D}}^{'})]}^{T} \\ = {[a_{1}^{'}, .Math., a_{N_{D}}^{'}]}^{T} \end{matrix}$ $? indicates text missing or illegible when filed$

[0059] Here, ∥D.sub.i−p.sub.j∥ is a distance between the sensor device D.sub.i and the point p.sub.j,∈<<1 is very small positive number, and c≤−2 is a value for clip.

[0060] The shape (SHAPE) function is not limited to the above equation and another type of calculating method is also sufficiently used.

[0061] Finally obtained bandwidth allocation values (general action) [α.sub.1.sup.t, . . . ,α.sub.N.sub.D.sup.t].sup.T of the sensor devices are transmitted to individual devices through one packet, and the device D.sub.i updates the information in the storage unit 122 according to the allocated bandwidth a.sub.i.sup.t through a network.

[0062] That is, the bandwidth allocation module 128 establishes the bandwidth allocation policy in which the QoE quality satisfies the reference QoE quality according to the final action to allocate the bandwidth to the plurality of sensor devices with a data allocation value which represents the final action.

[0063] As described above, the learning unit 124 establishes the bandwidth allocation policy in various network environments by means of the two-phase learning processes using the flow embedding and action shaping techniques with the transferable reinforcement learning.

[0064] The flow embedding expresses the state for the information update flow of the sensor devices as a low dimension vector value which accelerates the training speed, using an attention mechanism configured by a multilayer perceptron.

[0065] According to the action shaping, an action of determining an action of the policy is formed in two stages. In the first stage, a latent action which reduces an action search space to accelerate the training speed is derived, and in the second stage, a general action which expresses the latent action with a bandwidth allocation value is formed. Here, the process of forming the latent action as the general action is configured by a function which is not necessary to be trained to be adjusted by a parameter to support the domain adaptation.

[0066] In Phase 1, in an easy-to-learn environment (source environment) such as an ideal network, a trainable function is trained for the flow embedding and the action shaping to establish the initial policy.

[0067] In phase 2, in order to adapt the initial policy established in the source environment to an environment (target environment) such as a real network environment, only the adaptation function is trained for the action shaping to establish an optimal policy corresponding to the environment.

[0068] FIGS. 4 to 7 are exemplary execution diagrams illustrating a performance of a network monitoring system according to the present disclosure.

[0069] FIG. 4 illustrates a performance of the present disclosure (Repot) according to a network scale and it is understood that 90% or more of QoE performance is maintained regardless of the network scale.

[0070] FIG. 5 illustrates a performance when the initially established policy is adapted to a test in another network environment.

[0071] During the test, in all the control groups including the present disclosure (Repot), 20% or more of QoE is reduced, but in the present disclosure, it is understood that the performance is recovered to an original performance value through the adaptation performance, that is, the adaptation process of the action shaping.

[0072] FIG. 6 illustrates a gain of the training speed.

[0073] It shows that even though 1.6 million training samples were required for initial policy establishment, when it is adapted to another network environment, the training is possible with 100000 training samples.

[0074] FIG. 6.fwdarw.FIG. 7 illustrates a performance when the initially established policy is adapted in various network environments. It is understood that when the present disclosure is adapted by the action shaping method, 90% or more of QoE is maintained. Even though the present disclosure (Repot) can be adapted to the environment by another method, the QoE is low when the other method (full, top) is used.

[0075] The features, structures, effects and the like described in the foregoing embodiments are included in at least one embodiment of the present disclosure and are not necessarily limited to one embodiment. Moreover, the features, structures, effects and the like illustrated in each embodiment may be combined or modified by those skilled in the art for the other embodiments to be carried out. Therefore, the combination and the modification of the present disclosure are interpreted to be included within the scope of the present disclosure.

[0076] It will be appreciated that various exemplary embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications, changes, and substitutions may be made by those skilled in the art without departing from the scope and spirit of the present disclosure. Therefore, the exemplary embodiments of the present disclosure are provided for illustrative purposes only but not intended to limit the technical concept of the present disclosure. The scope of the technical concept of the present disclosure is not limited thereto. The protective scope of the present disclosure should be construed based on the following claims, and all the technical concepts in the equivalent scope thereof should be construed as falling within the scope of the present disclosure.

NETWORK MONITORING SYSTEM

Assignee

Inventors

Cpc classification

Classification Explorer

H04L47/805

ELECTRICITY

Classification Explorer

H04L47/823

ELECTRICITY

International classification

Classification Explorer

H04L47/80

ELECTRICITY

Classification Explorer

H04L47/70

ELECTRICITY

Abstract

Claims

Description