A Method for Automatically Presenting to a User Online Content Based on the User's Preferences as Derived from the User's Online Activity and Related System and Computer Readable Medium
20170357660 · 2017-12-14
CPC classification
G06F16/9537 (PHYSICS)
G06F17/18 (PHYSICS)
G06F16/958 (PHYSICS)
Abstract
The invention relates to a method for automatically presenting to a user online content (C) based on the user's preferences as derived from the user's online activity, wherein the method comprises: generating data structures (IP) representing the online content (C) accessed by the user on one or more user devices; identifying from the generated data structures (IP) one or more patterns (P) representative of the user's preferences in terms of online content (C); and identifying and presenting to the user the online content (C) corresponding to said patterns (P).
Claims
1. A method for automatically presenting to a user online content (C) based on the user's preferences as derived from the user's online activity, wherein the method comprises: for each online content (C) accessed by the user on one or more user devices: extracting (101) at least one keyword (K); extracting (103) a set (S) of metadata elements (M); assigning a weight (W) to the keyword (K) and to one or more metadata elements (M) in the set (S); generating at least one first data structure (IP) including the keyword (K), the set (S) of metadata elements (M) and the weights (W); identifying from the generated first data structures (IP) one or more patterns (P), each pattern (P) comprising at least one keyword (K) or at least one keyword (K) and one or more metadata elements (M), which patterns (P) are representative of the user's preferences in terms of online content (C); and identifying and presenting to the user the online content (C) corresponding to said patterns (P).
2. The method according to claim 1, wherein the method further comprises the step of extracting (102) at least one definition (D) for each keyword (K).
3. The method according to claim 1, wherein the set (S) of metadata elements (M) comprises one or more amongst source, time, date, location and language of the accessed online content (C).
4. The method according to claim 1, wherein the step of identifying one or more patterns (P) comprises running a weighted clustering algorithm (WCA).
5. The method according to claim 1, wherein the step of identifying the online content (C) comprises: generating a text search string (T) including a pattern (P); and feeding said text search string (T) to a web crawling software (WC).
6. The method according to claim 1, wherein the method further comprises the steps of: for each identified online content (C): extracting (101) at least one keyword (K); extracting (103) a set (S) of metadata elements (M); assigning a weight (W) to the keyword (K) and to one or more metadata elements (M) in the set (S); generating at least one second data structure (IP) including the keyword (K), the set (S) of metadata elements (M) and the weights (W); presenting to the user the identified online content (C) whose second data structure (IP) matches said patterns (P).
7. The method according to claim 1, wherein the method further comprises the step of monitoring (113) the user's online activity for updating (114) the weights (W) in the first data structures (IP).
8. A system for automatically presenting to a user online content (C) based on the user's preferences as derived from the user's online activity, wherein the system comprises at least one user device including a processing unit and a database, wherein the processing unit is configured to carry out the method according to claim 1 and the database is configured to store the generated first and/or second data structures (IP).
9. A computer readable medium, wherein the computer readable medium comprises program instructions for causing a computer to carry out the method according to claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
PREFERRED EMBODIMENTS OF THE INVENTION
[0047] In a preferred embodiment of the invention, a Personal Internet Agent PIA selects and presents relevant online content C to the user.
[0048] Firstly, the PIA collects and analyses data related to the user's online activity and, as a result, produces a set of IPs. An IP is a data structure that represents the core meaning of an item of online content C (e.g., a web page or a document). In particular, an IP includes a set S of metadata elements M, each representing a key attribute of the online content C, and associated weights W representing the importance of the different elements to the user. The PIA generates IPs for all types of online content C that the user has accessed, such as the online browsing history on the user's mobile devices and PCs, GPS locations, etc. All IPs are saved in a database, for example, on a server of the service provider.
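By way of non-limiting illustration, an IP as described above could be sketched as a small record type. The class name, field names and example values below are assumptions made for this sketch only; the source does not prescribe any particular layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IP:
    """One data structure (IP) describing an accessed online content C.

    All field names are hypothetical; they are not taken from the source.
    """
    keyword: str                       # keyword K extracted from the content
    metadata: Dict[str, str]           # set S of metadata elements M
    weights: Dict[str, float] = field(default_factory=dict)  # weights W per element
    definition: Optional[str] = None   # optional definition D of the keyword

    def weight_of(self, element: str) -> float:
        """Return the weight W of an element, or 0.0 if none was assigned."""
        return self.weights.get(element, 0.0)


ip = IP(
    keyword="polar bear",
    metadata={"source": "www.wwf.org", "language": "en"},
    weights={"keyword": 1.0, "source": 0.6},
)
```

Keeping the weights in a separate mapping keyed by element name allows a weight to be assigned to the keyword and to only some of the metadata elements, as the claims permit.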
[0049] Secondly, the PIA uses the IPs to identify which online content C should be presented to the user. For example, this may be achieved by a weighted clustering algorithm WCA, which analyses the IPs and identifies patterns P in the interrelationships among them. The most relevant patterns P are the ones that indicate the interests of the user at the present time. The identified patterns P are then used to generate the search strings T that will be employed (e.g., by a web crawling software WC) to search for relevant online content C. The latter may be presented to the user, for example, on a mobile phone application, web pages, RSS feeds, etc.
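The flow from IPs to patterns P to search strings T may be illustrated with a deliberately simple stand-in: IPs sharing the same keyword and language are grouped and their weights summed. This is not the weighted clustering algorithm WCA itself, which the source does not specify; the record layout, the function names and the search string syntax are all assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each IP is reduced here to (keyword, language, weight); the layout is an assumption.
IPRecord = Tuple[str, str, float]
Pattern = Tuple[str, str]


def find_patterns(ips: List[IPRecord], top_n: int = 2) -> List[Tuple[Pattern, float]]:
    """Group IPs sharing keyword and language, and rank the groups by total weight."""
    totals: Dict[Pattern, float] = defaultdict(float)
    for keyword, language, weight in ips:
        totals[(keyword, language)] += weight
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def to_search_string(pattern: Pattern) -> str:
    """Turn one pattern P into a text search string T (hypothetical syntax)."""
    keyword, language = pattern
    return f'"{keyword}" lang:{language}'


ips = [
    ("polar bear", "en", 0.9),
    ("arctic", "en", 0.4),
    ("polar bear", "en", 0.7),
    ("stock market", "de", 0.5),
]
patterns = find_patterns(ips)
search_strings = [to_search_string(p) for p, _ in patterns]
```

A real clustering step would compare IPs across several elements at once; the grouping key used here is only the simplest case of such a comparison.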
[0050] Finally, the user's online activity may be continuously monitored 113, so as to update 114 the weights W of the IPs and consequently the user preferences.
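One possible way to update 114 the weights W from monitored activity is a simple blend of the stored weight with the newly observed interest. The blending rule, the `rate` parameter and the function names are assumptions of this sketch; the source does not state how the update is computed.

```python
from typing import Dict


def update_weight(old_weight: float, observed_interest: float, rate: float = 0.2) -> float:
    """Blend the stored weight W with newly observed interest (both in [0, 1])."""
    return (1.0 - rate) * old_weight + rate * observed_interest


def apply_feedback(
    weights: Dict[str, float],
    observations: Dict[str, float],
    rate: float = 0.2,
) -> Dict[str, float]:
    """Return updated weights; elements without new observations are unchanged."""
    updated = dict(weights)  # copy, so the stored weights are not mutated
    for element, interest in observations.items():
        updated[element] = update_weight(updated.get(element, 0.0), interest, rate)
    return updated


weights = {"polar bear": 0.5, "arctic": 0.5}
new_weights = apply_feedback(weights, {"polar bear": 1.0})
```

With a small `rate`, a single observation shifts a weight only slightly, so that the stored preferences follow sustained changes in behaviour rather than isolated visits.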
[0051]
[0052] The input module encompasses the sources that generate input to the PIA in terms of online content C. Such sources may comprise any platform from which user activity can be recorded, such as a web browser, a mobile browser, a mobile phone application, an RSS feed, a third party application, etc. Data is extracted from these sources either in real time or subsequently, by loading files corresponding to the accessed online content C in batch sequences (e.g., in the case of new users).
[0053] The data processing module selects the online content C that is relevant to the user by generating IPs and identifying patterns P in the IP population. Hence, the purpose of the data processing module is to categorize and analyse the user's online activity, and to select relevant online content C. This is accomplished by: (i) generating IPs; (ii) mining the elements of each IP from the online content C accessed by the user (ref.
[0054]
[0055] All IPs are saved in a database, whose purpose is to enable pattern recognition in the IPs. The database is designed such that patterns P across the elements of the IPs can be identified in a data mining process. IPs need never be removed from the database; nevertheless, the allocation of weights W in the IPs ensures that older IPs gradually have lower weights W.
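The gradual lowering of the weights W of older IPs could, for instance, be obtained with an exponential decay. The decay rule and the 30-day half-life are assumptions of this sketch; the source states only that older IPs gradually have lower weights.

```python
from datetime import date


def decayed_weight(
    base_weight: float,
    accessed_on: date,
    today: date,
    half_life_days: float = 30.0,
) -> float:
    """Halve the weight for every `half_life_days` elapsed since the access date."""
    age_days = (today - accessed_on).days
    return base_weight * 0.5 ** (age_days / half_life_days)


today = date(2024, 1, 31)
fresh = decayed_weight(1.0, date(2024, 1, 31), today)     # accessed today
month_old = decayed_weight(1.0, date(2024, 1, 1), today)  # accessed 30 days ago
```

Because the weight never reaches zero, an old IP can stay in the database and still contribute marginally to a pattern, which is consistent with IPs not being removed.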
[0056]
[0057] The purpose of the weighted cluster analysis is to identify the most significant patterns P in the user's online activity. The elements in the IPs and their corresponding weights W are the basis for the cluster analysis (ref.
[0058] The aim of the online content selection process is to find online content C that is as close as possible to the content that forms the basis of the highest-valued cluster. In essence, the process finds online content C (e.g., by means of a web crawling software WC) through an online search performed with the generated text strings T (ref.
[0059] The output module encompasses the channels on which the selected online content C is presented to the user. The list of URLs identified in the previous process can be presented to the user as content in (ref.
[0060] Optionally, a feedback module monitors 113 the user's online activity and accordingly updates 114 the weights W in the IPs, so that any changes in the user's preferences are recorded (ref.
[0061] Note that the use of a personal profiling technology such as that described in the latter embodiment is mainly aimed at the selection of web news articles. There are, however, other application areas in which the technology may advantageously be used, such as (ref.
Example 1: Polar Bear Article
[0062] The user accesses a web page via a mobile phone application. The web page contains an article about polar bears' reaction to climate change in the Arctic.
[0063] The PIA (which may run on the mobile phone itself or on a server) retrieves the article's URL.
[0064] The text mining application accesses the web page to identify languages, text patterns, word density, etc., and then extracts 101 the keywords K representing the content C of the article. For example, the extracted keywords K could be:
[0065] 1) Polar bear
[0066] 2) Climate change
[0067] 3) Arctic
[0068] 4) Ice season
[0069] 5) Reproductive success
[0070] The 5 keywords will then be converted into 5 corresponding IPs.
[0071] The metadata extraction application will simultaneously access the same web page and extract 103 metadata from the same article. For example, the extracted set S of metadata elements M could be:
[0072] Date: the date the source was accessed
[0073] Source: the name of the web page, e.g., www.wwf.org
[0074] Geography: the location of the user when she accessed the web page
[0075] Time: the time spent on the web page
[0076] Language: the language in which the web page was written
[0077] Publication date: the date the article was published
[0078] The metadata elements M will then populate each of the 5 IPs.
[0079] Optionally, a Wikipedia API, for example, extracts 102 the definition D of each keyword K. For example, the extracted definitions D could be:
[0080] Polar bear: carnivorous bear
[0081] Climate change: weather patterns
[0082] Arctic: polar region
[0083] Ice season: no result
[0084] Reproductive success: passing of genes onto the next generation
[0085] Thus, 4 out of 5 IPs will be enriched with a definition D.
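The steps of this example up to this point (one IP per keyword K, populated with the metadata elements M and optionally enriched with a definition D) can be sketched as follows. The dictionary layout and the function name are assumptions; the definition lookup is a plain in-memory table standing in for an external service, and the metadata values are illustrative.

```python
from typing import Dict, List, Optional

keywords = ["Polar bear", "Climate change", "Arctic", "Ice season", "Reproductive success"]

# Metadata shared by every IP derived from the article (values are illustrative).
metadata = {"source": "www.wwf.org", "language": "English"}

# Stand-in for a definition lookup; None models the "no result" case.
definitions: Dict[str, Optional[str]] = {
    "Polar bear": "carnivorous bear",
    "Climate change": "weather patterns",
    "Arctic": "polar region",
    "Ice season": None,
    "Reproductive success": "passing of genes onto the next generation",
}


def build_ips(
    keywords: List[str],
    metadata: Dict[str, str],
    definitions: Dict[str, Optional[str]],
) -> List[dict]:
    """Create one IP per keyword, copy in the metadata, attach a definition if found."""
    ips = []
    for keyword in keywords:
        ip = {"keyword": keyword, "metadata": dict(metadata)}
        definition = definitions.get(keyword)
        if definition is not None:
            ip["definition"] = definition
        ips.append(ip)
    return ips


ips = build_ips(keywords, metadata, definitions)
enriched = sum(1 for ip in ips if "definition" in ip)
```

Here `enriched` counts the IPs that received a definition, which for the table above is 4 out of 5, matching the example.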
[0086] The PIA will now define a web search string T to search for similar articles. The web search string T will be defined based on the derived user preferences and on the knowledge of the article as represented by the IPs. The user preferences may be derived by means of a weighted cluster analysis, which identifies patterns P in the IPs generated from the article. For example, as a result of the weighted cluster analysis, the web search string T could satisfy the following requirements:
[0087] Contain the keywords K and the definitions D from the IPs in the article
[0088] Only look for articles in English
[0089] Prioritize articles that are less than 6 months old
[0090] Prioritize articles from wwf.org, un.org and cnn.com
[0091] Prioritize articles from the USA
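The requirements listed above could be serialized into a single text string T along the following lines. The query syntax (`lang:`, `prefer-site:` and so on) is invented for this sketch and does not correspond to any particular search engine or crawler; the class and field names are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchRequirements:
    """Requirements derived from the patterns P (hypothetical field names)."""
    terms: List[str]                      # keywords K and definitions D
    language: str = "en"                  # hard filter
    max_age_months: int = 6               # soft preference
    preferred_sources: List[str] = field(default_factory=list)
    preferred_country: str = ""


def build_search_string(req: SearchRequirements) -> str:
    """Serialize the requirements into one text string T (syntax is hypothetical)."""
    parts = [" OR ".join(f'"{term}"' for term in req.terms)]
    parts.append(f"lang:{req.language}")
    parts.append(f"prefer-newer-than:{req.max_age_months}m")
    if req.preferred_sources:
        parts.append("prefer-site:" + ",".join(req.preferred_sources))
    if req.preferred_country:
        parts.append(f"prefer-country:{req.preferred_country}")
    return " ".join(parts)


requirements = SearchRequirements(
    terms=["polar bear", "carnivorous bear", "arctic"],
    preferred_sources=["wwf.org", "un.org", "cnn.com"],
    preferred_country="US",
)
search_string = build_search_string(requirements)
```

The sketch keeps the distinction made in the example between a strict requirement (language) and preferences that only influence prioritization (age, source, country).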
[0092] The PIA will then employ the web search string T to perform a web search via, for example, a web crawler WC, whose output may be a list of search results.
[0093] Optionally, the PIA may generate IPs from the articles in the list of search results (all or only the top ones) in the same way as for the original article. This makes it possible to compare the articles with the requirements of the web search string T and to rank the list of search results, so that the PIA can suggest to the user articles that are as close as possible to her preferences as well as to the content C of the polar bear article.
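The optional ranking step may be illustrated by scoring each search result according to how many weighted pattern keywords its own IPs share. The scoring rule, the function names and the example URLs are assumptions of this sketch; the source does not define how the comparison is carried out.

```python
from typing import Dict, List, Set, Tuple


def score(result_keywords: Set[str], pattern_weights: Dict[str, float]) -> float:
    """Sum the pattern weights of the keywords that also occur in the result."""
    return sum(w for k, w in pattern_weights.items() if k in result_keywords)


def rank_results(
    results: Dict[str, Set[str]],
    pattern_weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Order result URLs from best to worst match."""
    scored = [(url, score(kws, pattern_weights)) for url, kws in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Weighted keywords of the pattern P (illustrative values).
pattern_weights = {"polar bear": 1.0, "arctic": 0.5, "climate change": 0.75}

# Keywords extracted from each search result (placeholder URLs).
results = {
    "https://example.org/a": {"arctic", "shipping"},
    "https://example.org/b": {"polar bear", "climate change"},
    "https://example.org/c": {"polar bear", "arctic", "climate change"},
}
ranking = rank_results(results, pattern_weights)
```

In this example the result sharing all three pattern keywords is ranked first and the result sharing only one is ranked last.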
Example 2: What is of Interest to Me?
[0094] The user accesses the application via her mobile phone, where she expects to be presented with online content C (e.g., as a list of web pages) that is of utmost interest to her in the given situation. In order to do so, the following procedure may be followed by the PIA.
[0095] Web search strings T may be generated according to situation-specific patterns P in the IP population that match the user's current situation in terms of time, date and position. For example:
[0096] Time: the user prefers reading articles on the stock market in the morning before 09:00, when the stock exchange opens; this will generate a corresponding web search string T.
[0097] Date: the user prefers reading articles on Premier League Football on Tuesdays during the football season; this will generate a corresponding web search string T.
[0098] Geography: the user prefers reading articles generated in the city where she lives; this is a general requirement, which will thus be included in all web search strings T generated for the user.
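The situation-specific matching in the examples above can be sketched as a set of simple conditions on the current time and weekday, with the geographic requirement appended to every resulting string. The hard-coded rules, the `football_season` flag, the `location:` syntax and the city name are assumptions; in the described method such rules would be derived from patterns P in the IP population rather than written by hand.

```python
from datetime import datetime, time
from typing import List


def situational_topics(now: datetime, football_season: bool = True) -> List[str]:
    """Return topics whose time or weekday condition matches the current moment."""
    topics = []
    if now.time() < time(9, 0):
        topics.append("stock market")            # morning reading before 09:00
    if now.weekday() == 1 and football_season:   # Monday is 0, so Tuesday is 1
        topics.append("Premier League football")
    return topics


def to_search_strings(topics: List[str], home_city: str) -> List[str]:
    """Append the general location requirement to every search string T."""
    return [f'"{topic}" location:{home_city}' for topic in topics]


tuesday_morning = datetime(2024, 1, 2, 8, 30)   # 2 January 2024 was a Tuesday
search_strings = to_search_strings(situational_topics(tuesday_morning), "Oslo")
```

On a Tuesday morning during the season both rules fire, so two search strings are produced, each carrying the location requirement.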
[0099] Web search strings T may also be generated according to more general patterns P in the IP population. For example:
[0100] The last five articles the user read were about holidays in France; this will generate a corresponding web search string T.
[0101] The topic the user spent the most time reading about in the last 30 days was the new iPhone; this will generate a corresponding web search string T.
[0102] The user prefers reading articles in English, but sometimes also in German; this is a general requirement, which will thus be included in all web search strings T generated for the user.
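One of the general patterns above, the topic on which the user spent the most reading time in the last 30 days, could be computed as sketched below. The record layout, the function name and the sample visits are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

# (topic, access date, seconds spent reading); the record layout is an assumption.
Visit = Tuple[str, date, int]


def top_topic(visits: List[Visit], today: date, window_days: int = 30) -> Optional[str]:
    """Return the topic with the most reading time inside the window, if any."""
    cutoff = today - timedelta(days=window_days)
    seconds: Dict[str, int] = defaultdict(int)
    for topic, accessed_on, spent in visits:
        if accessed_on >= cutoff:
            seconds[topic] += spent
    return max(seconds, key=seconds.get) if seconds else None


visits = [
    ("new iPhone", date(2024, 1, 20), 600),
    ("new iPhone", date(2024, 1, 25), 300),
    ("holiday in France", date(2024, 1, 28), 700),
    ("stock market", date(2023, 11, 1), 5000),   # outside the 30-day window
]
topic = top_topic(visits, today=date(2024, 1, 31))
```

The visit from November is ignored despite its long reading time, because only activity inside the window counts towards the pattern.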
[0103] The way articles are selected from the search strings T follows the same procedure as described in the previous example.