MANAGING NETWORK EVENT DATA IN A TELECOMMUNICATIONS NETWORK
20220321395 · 2022-10-06
Assignee
Inventors
CPC classification
H04L41/069
ELECTRICITY
H04L41/024
ELECTRICITY
International classification
H04L41/069
ELECTRICITY
H04L41/0686
ELECTRICITY
Abstract
A method (200) is disclosed for managing network event data in a telecommunications network, wherein a network event is associated with a plurality of network event data fields, and wherein the telecommunications network comprises a data storage facility for network event data, the data storage facility comprising a plurality of storage functions. The method comprises obtaining queries submitted to the data storage facility (210) and, for a network event data field, determining a frequency with which data in the network event data field is required in order to respond to the obtained queries (220) and using a trained machine learning model to map the determined frequency to a value of a selection parameter for the network event data field (230). The method further comprises performing at least one of (240) storing data in the network event data field in a storage function in the data storage facility, migrating data in the network event data field between storage functions in the data storage facility or deleting data in the network event data field from a storage function in the data storage facility in accordance with the value of the selection parameter. Also disclosed are a method (400) of training a machine learning model, and apparatus and a computer program product for carrying out methods for managing network event data and training a machine learning model.
Claims
1. A method for managing network event data in a telecommunications network, wherein a network event is associated with a plurality of network event data fields, and wherein the telecommunications network comprises a data storage facility for network event data, the data storage facility comprising a plurality of storage functions, the method comprising: obtaining queries submitted to the data storage facility; and for a network event data field: determining a frequency with which data in the network event data field is required in order to respond to the obtained queries; using a trained machine learning model to map the determined frequency to a value of a selection parameter for the network event data field; and performing at least one of: storing data in the network event data field in a storage function in the data storage facility; migrating data in the network event data field between storage functions in the data storage facility; or deleting data in the network event data field from a storage function in the data storage facility, in accordance with the value of the selection parameter.
2. The method of claim 1, wherein obtaining queries submitted to the data storage facility comprises obtaining queries submitted during an analysis time window comprising a plurality of time slots, and wherein determining a frequency with which data in the network event data field is required to respond to the obtained queries comprises: for a time slot in an analysis time window: accumulating obtained queries submitted within the time slot; extracting network event data fields required to respond to the accumulated queries; and adding the number of times the network event data field appears in the extracted network event data fields to a time slot frequency count for the network event data field.
3. The method of claim 2, wherein determining a frequency with which data in the network event data field is required to respond to the obtained queries further comprises: assembling time slot frequency counts for the network event data field from time slots in the analysis time window into a frequency vector for the network event data field during the analysis time window.
4. The method of claim 1, wherein the selection parameter value indicates a relative importance of the network event data field with respect to responding to queries submitted to the data storage facility, and wherein the trained machine learning model maps the determined frequency to a value of the selection parameter for the network event data field such that a higher frequency maps to a value indicating greater importance.
5. The method of claim 1, wherein using a trained machine learning model to map the determined frequency to a value of a selection parameter for the network event data field comprises: using the trained machine learning model to map the determined frequency to a dynamic value of the selection parameter; and generating a definitive value of the selection parameter by combining the dynamic value of the selection parameter with a static value of the selection parameter.
6. The method of claim 5, wherein the selection parameter comprises a binary value, and wherein combining the dynamic value of the selection parameter with a static value of the selection parameter comprises performing a logical OR operation on the dynamic value of the selection parameter and the static value of the selection parameter.
7. The method of claim 1, wherein the trained machine learning model is trained to map the determined frequency to a selection parameter for the network event data field according to the objective function: minimize f({right arrow over (x)})={right arrow over (c)}·{right arrow over (x)}, subject to |{q∈Q.sub.T: A(q)−B({right arrow over (x)})=∅}|≥C·|Q.sub.T|, wherein {right arrow over (c)} comprises a vector of storage capacity occupied by network event data fields, {right arrow over (x)} comprises a vector of selection parameter values for network event data fields, Q.sub.T comprises a set of queries submitted over an analysis time window T, A(q) comprises a set of network event data fields required by a query q, B({right arrow over (x)}) comprises the set of network event data fields having a particular selection parameter value according to {right arrow over (x)}, and C comprises a threshold for queries for which the required network event data fields have the particular selection parameter value.
8. The method of claim 7, wherein the particular selection parameter value indicates an availability of the network event data field in the data storage facility.
9. The method of claim 1, wherein storing data in the network event data field in a storage function in the data storage facility in accordance with the selection parameter comprises: selecting a storage function for the network event data field in accordance with the selection parameter; and initiating storage of data in the network event data field in the selected storage function.
10. The method of claim 1, wherein migrating data in the network event data field between storage functions in the data storage facility in accordance with the selection parameter comprises: selecting a storage function for the network event data field in accordance with the selection parameter; and on occurrence of a migration trigger, initiating migration of data in the network event data field to the selected storage function.
11. The method of claim 2, wherein migrating data in the network event data field between storage functions in the data storage facility in accordance with the selection parameter comprises: selecting a storage function for the network event data field in accordance with the selection parameter; and, on occurrence of a migration trigger, initiating migration of data in the network event data field to the selected storage function, wherein the migration trigger comprises expiry of the analysis time window.
12. The method of claim 1, wherein deleting data in the network event data field from a storage function in the data storage facility in accordance with the selection parameter comprises: generating an overview selection parameter value by combining selection parameter values over a plurality of analysis time windows; and determining whether to delete data in the network event data field from a storage function in the data storage facility on the basis of the overview selection parameter value.
13. The method of claim 12, wherein the selection parameter comprises a binary value, and combining selection parameter values over a plurality of analysis time windows comprises performing a logical OR operation on the selection parameter values over a plurality of analysis time windows.
14. The method of claim 1, further comprising: generating a vector of selection parameter values for a plurality of network event data fields; and determining whether the generated vector of selection parameter values satisfies a criterion representing a threshold for queries for which the required network event data fields have a particular selection parameter value.
15. The method of claim 14, further comprising: retraining the machine learning model as a result of determining that the generated vector of selection parameter values does not satisfy the criterion; and using the retrained machine learning model to map the determined frequency to a value of a selection parameter for the network event data field.
16-25. (canceled)
26. An apparatus for managing network event data in a telecommunications network, wherein a network event is associated with a plurality of network event data fields, and wherein the telecommunications network comprises a data storage facility for network event data, the data storage facility comprising a plurality of storage functions, the apparatus comprising a processor and a memory, the memory containing instructions executable by the processor such that the apparatus is configured to: obtain queries submitted to the data storage facility; and for a network event data field: determine a frequency with which data in the network event data field is required in order to respond to the obtained queries; use a trained machine learning model to map the determined frequency to a value of a selection parameter for the network event data field; and perform at least one of: storing data in the network event data field in a storage function in the data storage facility; migrating data in the network event data field between storage functions in the data storage facility; or deleting data in the network event data field from a storage function in the data storage facility, in accordance with the value of the selection parameter.
27-29. (canceled)
30. An apparatus for training a machine learning model for use in a method for managing network event data in a telecommunications network, wherein a network event is associated with a plurality of network event data fields, and wherein the telecommunications network comprises a data storage facility for network event data, the data storage facility comprising a plurality of storage functions, the apparatus comprising a processor and a memory, the memory containing instructions executable by the processor such that the apparatus is configured to: obtain queries submitted to the data storage facility; for each of a plurality of network event data fields, determine a frequency with which data in the network event data field is required in order to respond to the obtained queries; calculate a threshold frequency value; label the network event data fields with a selection parameter value generated on the basis of the determined frequencies and the threshold frequency value; and apply a machine learning algorithm to a training data set comprising the labelled network event data fields and determined frequencies to generate a model for mapping a determined frequency for a network event data field to a value of a selection parameter for the network event data field.
31-33. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0059] For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the accompanying drawings.
DETAILED DESCRIPTION
[0076] As discussed above, event data is currently managed such that an entire event is transitioned between Hot, Warm and Cold storage functions on the basis of access patterns. In practice, only very few fields of an event are required in order to satisfy the vast majority of queries, and examples of the present disclosure therefore propose to archive only a subset of the data that is sufficient for a given business use case. An objective of examples of the present disclosure is to find the optimal subset of fields that would suffice for a given use case. As the data in question is semi-structured, feature selection methods designed for structured data are not appropriate. A straightforward field-selection machine learning model could be structured as a multi-label problem, with features setting out a frequency of requests for each field and labels denoting whether or not the fields are included in a particular storage function. However, such an approach, with its rigid field structure, does not scale well. That is, no modification can be made to the schema without abandoning the trained model. Schema fields can change over time, new fields can be added and others may be removed, and small changes can also occur in structure. The above discussed model does not have the flexibility to accommodate such changes.
[0077] The set of fields in a telecommunication database can be considered sufficient if it can answer all queries in its lifetime. This hard constraint prevents removing any field from the database, as the universe of queries can encompass all fields in the schema. However, the constraint can be softened by answering only a proportion, C, of such queries, say C=90%, over a period T. The problem can then be phrased as how to select a subset of fields from a schema such that a proportion C of incoming queries is satisfied.
[0078] Examples of the present disclosure propose a method that enables selection of fields for storage functions based on frequency of query access. A proposed Frequent Field Selection (FFS) technique employs a machine learning model to introduce a learning aspect into this process. The model learns which fields of an event are accessed to respond to queries (specific query processing) over a long period of time. The model uses this learning to find fields of interest in event data and proposes to store information in such fields in the Warm storage function rather than the Cold storage function. Over a period of time, the model may additionally manage the data in the Cold archives to avoid storing large volumes of unnecessary events and data fields. Examples of the proposed model may be capable of adapting to variations in both data and queries, through introduction of new features or deprecation of existing features. Examples of the present disclosure propose a data migration mechanism which may combine query frequency information with properties of the Hot/Warm/Cold storage to optimise selection of events and event fields for storage in particular functions.
[0079] A high level architecture which may be used to implement examples of the present disclosure is illustrated in
[0082] The network event data managed according to the method 100 or 300 may comprise data relating to a plurality of different network event types, including fault events, alarm events, performance events, billing events, charging events etc. Referring initially to
[0083] In step 304, the FFS engine selects a time slot and, for that time slot in the analysis time window, the FFS engine accumulates obtained queries submitted within the time slot in step 306 and extracts network event data fields required to respond to the accumulated queries in step 308. In step 310, the FFS engine selects a network event data field and, in step 312, the FFS engine adds the number of times the network event data field appears in the extracted network event data fields to a time slot frequency count for the network event data field. In step 314, the FFS engine checks whether all data fields have been considered within the current time slot. If not, the FFS engine returns to step 310 to select a new network event data field, and repeats this process until all network event data fields have been considered within the current time slot. Once all network event data fields have been considered in the current time slot, the FFS engine checks, in step 316, whether all time slots within the analysis time window have been considered. If not, the FFS engine returns to step 304 to select a new time slot and repeat steps 306 to 314, until all time slots within the analysis time window have been considered.
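The per-slot counting loop of steps 304 to 316 can be sketched as follows. Note that `slot_of` and `fields_of` are hypothetical helpers standing in for the time-slot assignment and the query parsing of step 308, which the disclosure does not specify:

```python
from collections import defaultdict

def time_slot_frequencies(queries, num_slots, slot_of, fields_of):
    # One frequency count per time slot in the analysis window (step 304).
    counts = [defaultdict(int) for _ in range(num_slots)]
    for q in queries:
        slot = slot_of(q)              # accumulate queries per slot (step 306)
        for field in fields_of(q):     # extract required fields (step 308)
            counts[slot][field] += 1   # update per-slot count (step 312)
    return counts
```

The per-slot dictionaries together form the frequency matrix consumed by the later mapping step.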
[0084] Referring now to
[0085] In step 320, the FFS engine uses a trained machine learning model to map the frequency vector for each network event data field to a dynamic value of a selection parameter. The selection parameter value indicates a relative importance of the network event data field with respect to responding to queries submitted to the data storage facility, and the trained machine learning model may map the determined frequency to a value of the selection parameter for the network event data field such that a higher frequency maps to a value indicating greater importance. In some examples, a particular selection parameter value may indicate an availability of the network event data field in the data storage facility. The availability may indicate presence of the data in the network event data field in the storage facility, or may indicate presence of the data in the network event data field in a particular storage function of the data storage facility. The particular storage function may comprise a function associated with particular read/write capabilities. The particular read/write capabilities may include a speed of read/write operations, and the particular storage function may comprise a short or medium term storage function (also referred to as Hot or Warm storage functions). The trained machine learning model may map the determined frequency to a value of the selection parameter such that a higher frequency maps to a selection parameter value associated with greater availability of the data in the network event data field.
[0086] The mapping step 320 may be performed according to the objective function:

minimize f({right arrow over (x)})={right arrow over (c)}·{right arrow over (x)}, subject to |{q∈Q.sub.T: A(q)−B({right arrow over (x)})=∅}|≥C·|Q.sub.T|,
wherein:
{right arrow over (c)} comprises a vector of storage capacity occupied by network event data fields;
[0087] {right arrow over (x)} comprises a vector of selection parameter values for network event data fields;
[0088] Q.sub.T comprises a set of queries submitted over an analysis time window T;
[0089] A(q) comprises a set of network event data fields required by a query q; and
[0090] B ({right arrow over (x)}) comprises the set of network event data fields having a particular selection parameter value according to {right arrow over (x)};
[0091] C comprises a threshold for queries for which the required network event data fields have the particular selection parameter value.
[0092] The particular selection parameter value may correspond to an availability of the data in the corresponding network event data fields, such that if A(q)−B({right arrow over (x)})=0 for a particular query q, this indicates that all of the network event data fields that are required by query q are available according to the vector {right arrow over (x)}. Further discussion of the above objective function is provided below.
[0093] In step 332, the FFS engine generates a definitive value of the selection parameter by combining the dynamic, frequency-based value of the selection parameter with a static value of the selection parameter. The static value may be preconfigured for the network event data fields on the basis of at least one of legal requirements, regulatory requirements, business requirements etc. In one example, the selection parameter comprises a binary value, and combining the dynamic value of the selection parameter with a static value of the selection parameter comprises performing a logical OR operation on the dynamic value of the selection parameter and the static value of the selection parameter. In this manner, any network event data fields that are required to be maintained in a certain storage function for legal, regulatory or business reasons may be assigned a static value of the selection parameter of 1. The combination using a logical OR function will ensure that such network event data fields are always assigned a positive definitive value of the selection parameter, in addition to any network event data field having a positive dynamic value of the selection parameter.
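For binary selection parameters, the combination of step 332 reduces to a per-field logical OR. The sketch below assumes, for illustration only, that the dynamic and static values are held in dictionaries keyed by field name:

```python
def definitive_selection(dynamic, static):
    # Logical OR ensures fields with a static value of 1 (legal,
    # regulatory or business requirements) are always retained,
    # alongside any field the model selected dynamically.
    return {f: dynamic.get(f, 0) | static.get(f, 0)
            for f in set(dynamic) | set(static)}
```

A field absent from one of the dictionaries is treated as having the value 0 there.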
[0094] In step 324, the FFS engine generates a vector of selection parameter values for a plurality of network event data fields and, in step 326, the FFS engine determines whether or not the generated vector of selection parameter values satisfies a criterion representing a threshold for queries for which the required network event data fields have a particular selection parameter value. The particular selection parameter value may be associated with availability of the data in the network event data fields, such that the criterion represents a threshold for queries for which the required network event data fields are available.
[0095] If the generated vector of selection parameter values does not satisfy the criterion, the FFS engine retrains the machine learning model in step 328 (as discussed below with reference to
[0096] If the generated vector of selection parameter values does satisfy the criterion, the FFS engine may perform any one or more of the three options illustrated in
[0097] In a third option, the FFS engine generates an overview selection parameter value by combining selection parameter values over a plurality of analysis time windows in step 340. In examples in which the selection parameter comprises a binary value, this may comprise performing a logical OR operation on the selection parameter values over the plurality of analysis time windows. This combination may ensure that only those network event data fields that have never had a positive selection parameter value during the plurality of analysis time windows will fail to be assigned a positive value of the overview selection parameter, and so will be selected for deletion from the storage function in step 342 as discussed below. In some examples, the plurality of analysis time windows may be sufficient to ensure that the total time covered by the plurality of analysis time windows fulfils certain criteria applicable to a particular service or service provider. For example, the plurality of analysis time windows may be sufficient to ensure that the total time covered is at least one calendar year, so as to account for variations in the nature of queries that may be submitted over a year-long period. In some examples, the total time covered by the plurality of analysis time windows may comprise between 1.5 and 2 years.
[0098] In step 342, the FFS engine determines whether to delete data in a network event data field from a storage function in the data storage facility on the basis of the overview selection parameter value. The FFS engine checks in step 344 whether all network event data fields have been considered and, if so, initiates deletion of data in the network event data fields in step 346 in accordance with the determination. The third option illustrated in steps 340 to 346 thus has the effect of purging from the storage function any data that has not been required to respond to a received query for a significant period of time (the plurality of analysis time windows), so avoiding the unnecessary incurring of storage cost for data that is not required.
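The purge logic of steps 340 to 346 can be sketched as below, assuming one binary selection dictionary per analysis time window; only fields never selected in any window are returned for deletion:

```python
def fields_to_delete(selection_per_window):
    overview = {}
    for window in selection_per_window:          # step 340: OR over windows
        for field, value in window.items():
            overview[field] = overview.get(field, 0) | value
    # step 342: delete only fields whose overview value never became 1
    return sorted(f for f, v in overview.items() if v == 0)
```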
[0099] In some examples of the method 300, migration of data between storage functions may be performed, for example on a periodic basis and in combination with or independently of the migration that may be performed in step 338 discussed above. This migration is illustrated in
[0100]
[0101] The selection parameter value may be binary and labelling the network event data fields may comprise setting the selection parameter value to 1 for all network event data fields having a frequency over the threshold frequency. In some examples, the method may comprise generating vectors of frequencies for time slots over an analysis time window, which may for example be a retention time for a storage function in the data storage facility. In other examples the vector may comprise frequencies for multiple analysis time windows.
[0102] According to some examples of the present disclosure, calculating a threshold frequency value may comprise generating a vector:
{right arrow over (u)}=[Σ{right arrow over (v)}.sup.(1),Σ{right arrow over (v)}.sup.(2),Σ{right arrow over (v)}.sup.(3), . . . ,Σ{right arrow over (v)}.sup.(n)],
[0103] in which {right arrow over (v)}.sup.(i) comprises a vector of frequencies for a plurality of time slots within an analysis time window for a network event data field i, and calculating the threshold frequency value using the expression:

v.sub.θ=max{θ: Σ.sub.i:u.sub.i≥θ u.sub.i≥C·Σ.sub.i=1.sup.n u.sub.i}, where u.sub.i=Σ{right arrow over (v)}.sup.(i) denotes the ith component of {right arrow over (u)},
in which v.sub.θ comprises the threshold frequency value and C comprises a threshold for queries for which the required network event data fields have a particular selection parameter value.
[0104] In some examples, the method 400 may further comprise determining whether the generated model conforms to a constraint by generating a vector {right arrow over (y)} of mapped selection parameter values using the generated model and determining whether the generated vector satisfies the expression:

Σ.sub.i=1.sup.n(Σ{right arrow over (v)}.sup.(i))·y.sup.(i)≥C·Σ.sub.i=1.sup.n Σ{right arrow over (v)}.sup.(i)
[0105] The machine learning algorithm applied in step 450 may for example comprise at least one of a Random Forest algorithm or a Logistic Regression algorithm.
[0112] The following description provides additional detail as to how the above discussed steps of methods 200, 300 and 400 may be implemented, followed by a presentation of case studies.
[0113] As discussed above, aspects of the present disclosure allow flexibility in accommodating an evolving schema of network events and their data fields by treating the features of each field present in the schema as an individual sample. A learning outcome for a network event data field is achieved that comprises a single label which determines whether the field ought to be retained in a particular storage function or removed (set to null). The features for the learning model are the frequency of requests for the field over an analysis period T split over k equal sub-intervals or time slots t.sub.1, t.sub.2, t.sub.3, . . . , t.sub.k. In some examples, only leaf fields in the schema are considered. A field is a leaf field if it is neither an array nor a structure. Hence, String, Integer, Long, Boolean and Map are the considered types of fields. {right arrow over (x)} denotes the n-dimensional vector [x.sub.1, x.sub.2, x.sub.3, . . . , x.sub.n]. Here n denotes the number of leaf fields and the binary variable x.sub.i determines whether the field i is included (1) or not (0) in a particular storage function. c.sub.i is the cost of the ith field, which represents the space occupied by the field in a database. An objective is to minimise the number of fields A(q) required based on queries available in the database while answering at least a specified proportion of incoming queries Q.sub.T over a period of time T, i.e. to ascertain that at least a fraction C of queries is answered. The optimisation objective function can be considered to be minimizing
f({right arrow over (x)})=Σ.sub.i=1.sup.n c.sub.ix.sub.i
subject to the summation of the constraint function g(q, {right arrow over (x)}) where q represents the incoming query terms. f(x) can be represented as a dot product of two vectors {right arrow over (c)} and {right arrow over (x)}.
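The cost function can be evaluated directly as the dot product just described; the vectors below are illustrative values, not values from the disclosure:

```python
def storage_cost(c, x):
    # f(x) = sum over i of c_i * x_i: the total space occupied by the
    # fields selected (x_i = 1) for a storage function.
    return sum(ci * xi for ci, xi in zip(c, x))
```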
[0114] Q.sub.T: Set of queries asked over a period T
[0115] A(q): Set of attributes required by the query q
[0116] B({right arrow over (x)}): Set of attributes corresponding to {right arrow over (x)}
[0117] The optimization objective can now be represented as:

minimize f({right arrow over (x)})={right arrow over (c)}·{right arrow over (x)}, subject to |{q∈Q.sub.T: A(q)−B({right arrow over (x)})=∅}|≥C·|Q.sub.T|
[0118] A Frequent Field Selection (FFS) algorithm, for example running in an FFS engine as discussed above, optimises the above objective by taking the frequency of query requests for each field into account. The frequency vector of a field i is {right arrow over (v)}.sup.(i)=[v.sub.1.sup.(i), v.sub.2.sup.(i), v.sub.3.sup.(i), . . . , v.sub.k.sup.(i)], where v.sub.j.sup.(i) is the frequency of requests received for the ith field within the jth time slot. The trained machine learning model of methods 200 and 300 provides a mapping function h: R.sup.k→Z.sub.2, where Z.sub.2={0, 1} denotes whether the field corresponding to the frequency vector {right arrow over (v)} is included or not in a particular storage function. Z.sub.2 thus represents the selection parameter of methods 200, 300, 400. An assumption may be made that by selecting a set of attributes corresponding to a C proportion of the attribute request frequency, a C proportion of query requests will be satisfied, as discussed above.
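Applying the mapping h to each field's frequency vector yields the per-field selection values. In the sketch below, `h` is a trivial cumulative-frequency cut-off standing in for the trained model, purely for illustration:

```python
def apply_mapping(h, freq_vectors):
    # Map each field's frequency vector v^(i) through h: R^k -> Z_2
    # to obtain a binary selection value per field.
    return {field: h(v) for field, v in freq_vectors.items()}

# Stand-in hypothesis: select a field when its cumulative frequency
# exceeds 10 (the real h is learned from labelled training data).
h = lambda v: 1 if sum(v) > 10 else 0
```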
[0119] In order to train the model, an initial set of records is labeled using a threshold. A sample is accepted (given a positive selection parameter label) if its cumulative frequency, Σ{right arrow over (v)}, is greater than the threshold frequency, v.sub.θ. To define v.sub.θ, a vector {right arrow over (u)} is defined that is composed of the cumulative frequency of each field i, as follows.
{right arrow over (u)}=[Σ{right arrow over (v)}.sup.(1),Σ{right arrow over (v)}.sup.(2),Σ{right arrow over (v)}.sup.(3), . . . ,Σ{right arrow over (v)}.sup.(n)],
The threshold frequency can be calculated as:

v.sub.θ=max{θ: Σ.sub.i:u.sub.i≥θ u.sub.i≥C·Σ.sub.i=1.sup.n u.sub.i}
[0120] In an implementation, the records may be labeled against the threshold for each period, and the model accuracy may be periodically evaluated to see whether the constraint is respected. {right arrow over (y)}=[y.sup.(1), y.sup.(2), y.sup.(3), . . . , y.sup.(n)] denotes the predicted label vector, where y.sup.(i)=h({right arrow over (v)}.sup.(i)). To evaluate whether the model has deviated from the set constraint, the constraint may be formulated in terms of the frequency of field requests as follows:

Σ.sub.i=1.sup.n u.sub.i·y.sup.(i)≥C·Σ.sub.i=1.sup.n u.sub.i
[0121] Each time this constraint is violated, the threshold v.sub.θ may be recomputed and the model may be trained with the current state of the system. In practice, it is likely that the constraint will only be violated if the value of the constraint proportion C is altered. In one example, {right arrow over (v)} is a sliding window with a fixed window size, k. When the demand for an arbitrary field changes, the change is therefore reflected immediately in the vector {right arrow over (v)} and the model accepts it according to the trained hypothesis.
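The threshold labelling and constraint check of paragraphs [0119] and [0120] might be sketched as follows. The cut-off rule used here (the largest threshold whose selected fields still cover a C-proportion of total request frequency) is an assumption consistent with the surrounding description, not a rule the text states verbatim:

```python
def threshold_frequency(u, C):
    # u[i] is the cumulative request frequency of field i over the window.
    total = sum(u)
    for theta in sorted(u, reverse=True):
        if sum(ui for ui in u if ui >= theta) >= C * total:
            return theta
    return 0

def constraint_respected(u, y, C):
    # Selected fields (y_i = 1) must cover at least a C-proportion of
    # the observed field-request frequency, else the model is retrained.
    return sum(ui * yi for ui, yi in zip(u, y)) >= C * sum(u)
```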
[0122] It will be appreciated that not all of the attributes that are important are frequently queried. A set of attributes P may be defined that are required by legal authorities, financial accountants and business analysts. The frequencies corresponding to these attributes are negligible compared to those of the queries pertaining to customer requests. However, the necessity of such attributes is not stochastic, as it is in the case of customer-related attributes. That is, the attribute set P can be predetermined and does not require a model to predict its importance. This characteristic of certain attributes may be accommodated by the use of dynamic and static values of the selection parameter. If y.sub.p is the vector denoting the set of attributes, P, that should be permanently selected for migration, and y.sub.t is the test vector produced by the trained model, then the final field selection vector can be the disjunction (logical OR) of y.sub.t and y.sub.p, i.e., y=y.sub.t∨y.sub.p.
[0123] An architecture for the apparatus implementing the FFS algorithm is discussed below and illustrated in
[0124] A typical storage architecture for a telecommunication system makes use of an Application Programming Interface (API) to access the storage database by converting a request to queries, analyzing query responses and producing the necessary inputs for the calling program. This API module is referred to as a query manager 900a. The query manager is the interface between the business logic functions and the storage database. The architecture 900 also comprises a migration manager 900b, which listens to all queries sent from the query manager 900a to the storage. The migration manager 900b extracts the attributes required to respond to each query and assesses their frequency to compute a model that produces the subset of attributes selected for migration to long-term storage (archival), to medium term storage, for deletion from storage etc.
[0125] The migration manager 900b comprises three main components: a Request frequency monitor 910, an Optimized field selector 930, and a Data storage processor 950. The frequency monitor 910 feeds the frequency matrix to be processed by the field selector 930. The field selector 930 considers the fields that have a greater frequency to be important and creates a vector that encompasses these fields based on programmed criteria. The storage processor 950 periodically initiates the migration operation to back up only those fields as provided by the field selector 930 from one storage function to another, for example from the short-term storage to the long-term storage.
Request Frequency Monitor
[0126] As described above, the analysis time period T is split into k time slots, each of size Δt. The request frequency monitor 910, illustrated in greater detail in
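The construction of the frequency matrix from the slotted analysis period can be sketched as follows; the query log, field names and values are invented for illustration:

```python
import numpy as np

# Sketch: building the frequency matrix consumed by the field selector.
# Rows are network event data fields; the k columns are time slots of
# size delta_t within the analysis period T. All values are invented.
fields = ["field_a", "field_b", "field_c"]
k = 4  # number of time slots (k = 20 in Case Study 1)

def build_frequency_matrix(query_log, fields, k, t0, delta_t):
    """query_log: iterable of (timestamp, required_fields) pairs."""
    index = {f: i for i, f in enumerate(fields)}
    freq = np.zeros((len(fields), k), dtype=int)
    for timestamp, required in query_log:
        slot = int((timestamp - t0) // delta_t)
        if 0 <= slot < k:
            for field in required:
                if field in index:
                    freq[index[field], slot] += 1
    return freq

query_log = [(0.5, ["field_a"]), (1.5, ["field_a", "field_b"]), (3.2, ["field_c"])]
F = build_frequency_matrix(query_log, fields, k, t0=0.0, delta_t=1.0)
```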
Optimised Field Selector
[0127] The optimized field selector 930 is illustrated in
Data Storage Processor
[0128] In an existing system, data is periodically backed up from the short-term storage to the long-term storage at a fixed time interval, which can vary from months to just over a year. In the proposed FFS architecture, the storage processor 950, illustrated in
[0129] Data Migration Process within Different Storage Functions
[0130] The migration of the optimised event data may be dynamically orchestrated between the different storage options, i.e. transferred between short-term storage (Hot/Warm), with fast read/write capabilities, and long-term storage (Cold), based on the frequency of access. If the optimised event data in the cold storage is frequently accessed, then it may be moved to the Hot/Warm storage. This migration management may be performed using the algorithm below:
[0131] Let: E.sub.i=>sorted ranked frequency of optimized event access in the cold storage
[0132] S.sub.i=>size of storage required for the event
[0133] Max=>calculated space available in the Hot/Warm cluster
[0134] Cr=>criteria for movement
[0135] n=>number of records in a batch
[0136] Th=>minimum chunk size
[0137] U is a function mapping event access frequency to the size of the event required: U(E.sub.i)=S.sub.i, where 1<=i<=n. The newly defined objective function is:
[0138] maximise Σ U(E.sub.j),
[0139] subject to

Th<=Σ S.sub.j<=Max

[0140] where 1<=j<=n
[0141] Depending on the type of sort, the event access frequencies (E.sub.i) are arranged in either ascending or descending order, and accordingly the data migration can be initiated from the cold to the hot/warm storage function, or from the hot/warm to the cold storage function. The example below illustrates a data migration implementation from Cold to Hot/Warm storage functions in storage systems.
TABLE-US-00001

  Event     Access frequency count (E.sub.i)    Size (Kb) (S.sub.i)
  Event1    10                                  4
  Event2    15                                  10
  Event3    20                                  30
  Event4    40                                  20
  Event5    30                                  10
  Event6    15                                  40
  Event7    5                                   30
  Event8    7                                   25
  Event9    1                                   35
  Event10   0                                   15

  Sr: calculated reserved space available (80 Kb) in Hot or Warm or Cold, depending on the direction of migration.
  Th: minimum chunk size (40 Kb), to reduce the copy load.
  Cr: criteria for movement, dependent on the iterative sum of event sizes:
      Event4 + Event5 = 30 Kb, less than 80 Kb: FALSE
      Event4 + Event5 + Event3 + Event2 = 70 Kb, less than 80 Kb: FALSE
      + Event6 = 110 Kb: TRUE, so exit; Event6 is not moved.
  Result: Event4, Event5, Event3 and Event2 are to be moved, in two steps of at most 40 Kb each, to Hot/Warm storage.
The following case studies illustrate example implementations of the methods and apparatus presented in this disclosure.
Case Study 1 (Illustrated in FIG. 13):
[0142] In the next generation charging system, telecommunication data is maintained in different databases. These databases are governed by an Event Processing Stack (EPS). The EPS is managed by an Event Data Management (EDM) server cluster 1304 in a Revenue Manager (RM). The telecommunications data mostly consists of Usage Charge Events as well as other events including Balance Adjustment, Refill, Order management etc. The schema of these events has a complicated tree-like structure in which the leaves are predominantly strings. For illustration, the schema of the UsageChargeEvent event is considered. This is an event generated by the charge server (CHA) 1306 of the Revenue Manager framework. This event has 357 fields in total, including nested structures and arrays, of which 240 are leaf fields. The number of time slots is k=20. Only the leaf attributes are considered for selection. Intermediate nodes are removed from the model to avoid confusion; it can be inferred that if a child node is included, so will be the corresponding parent node. The dimension of the frequency matrix is hence 240 samples with 20 features each. The features are all positive integers (frequency counts for the time slots). As the patterns to be learned are not complicated, deep learning frameworks are not required for building the model. The machine learning model used in the present case study is a Random Forest classifier with 10 trees. The constraint set here is C=0.9, i.e., at least 90% of the requests are to be satisfied on archive. As the query database is unavailable, the outcome of the frequency monitor is generated by a driver for this illustration. The driver is given a probability for each field, and the frequency monitor driver then produces frequencies periodically according to the probability assigned and delivers those frequencies to the field selector.
As the hypothesis space is relatively simple for this function, the training can be completed by accounting for just five periods, where each period, T, contains 240 samples (corresponding to the 240 fields). Thus, a total of 1200 samples is used for the initial training step, which should be sufficient to converge to the required hypothesis. During the update phase, the critique provides the target y.sub.c used to train the field selector.
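A hedged sketch of the field selector model in this case study, using scikit-learn's RandomForestClassifier as one possible implementation and synthetic frequencies in place of the driver output (the labelling rule below merely stands in for the critique's target y.sub.c):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch: a Random Forest with 10 trees classifying each of the 240 leaf
# fields (20 time-slot features each) as selected or not. The synthetic
# Poisson frequencies stand in for the frequency monitor driver.
rng = np.random.default_rng(0)
n_fields, k = 240, 20

X = rng.poisson(lam=2.0, size=(n_fields, k))  # frequency matrix for one period
# Stand-in critique target: label the most-queried ~10% of fields as selected.
y = (X.sum(axis=1) >= np.quantile(X.sum(axis=1), 0.9)).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)
y_t = model.predict(X)  # dynamic selection vector for the current period
```

In the disclosed scheme, training would accumulate such matrices over five periods (1200 samples) before the model is first used.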
[0144] Over the lifetime of the charge events, only a few fields will have a considerably high query frequency, some others may receive fewer requests, and most will not be queried at all. For this illustration, the frequency matrix is initially populated as shown in
[0145] Another phenomenon can also be observed from
[0147] Case Study 2 (Illustrated in
[0150] Examples of the present disclosure thus provide methods and apparatus that facilitate the selection of an optimal subset of fields from a schema, based on the frequency with which queries that require those fields are submitted. The fields may be selected such that an agreed proportion of queries are satisfied. In this manner, storage of telecommunications network data in Hot/Warm/Cold storage can be optimised, with reduced overall storage requirements and faster query response. Example methods of the present disclosure are adaptive in that the model can adapt to changes in the structure of the schema as well as to changes in the demand for different data fields. Also proposed are data migration methods according to which optimised event frequency selection and properties of storage clusters are taken into account to select data for migration between storage functions. Selection of a suitable analysis time window T and time slot Δt can be performed on consideration of a customer query database, as these parameters may be tailored to the nature of the query demands and the extent to which their evolution over time is erratic or structured.
[0151] Examples of the present disclosure offer a reduction in TCO for long term storage of network event data, by reducing the storage of data that is never accessed by the operator, and is therefore unnecessary. By optimising the selection of network event data fields for inclusion in Warm storage, the demand for such storage can be reduced, enabling data in the selected network event data fields to be maintained in the Warm storage, and hence more easily available for query resolution, for a longer period of time. Consequently, overall response time for queries is reduced, as a greater proportion of the data required for query resolution is maintained in Warm storage, where it is more easily and quickly accessible.
[0152] The methods of the present disclosure may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present disclosure also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the disclosure may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.
[0153] It should be noted that the above-mentioned examples illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.