Method for automatically transforming text into video
09607611 · 2017-03-28
Assignee
Inventors
Cpc classification
H04N21/8543
ELECTRICITY
G10L13/027
PHYSICS
G11B27/031
PHYSICS
International classification
H04N9/80
ELECTRICITY
G11B27/031
PHYSICS
H04N21/8543
ELECTRICITY
G10L13/04
PHYSICS
Abstract
According to the present invention there is provided a method for automatically converting text-based information and content to video form. In one embodiment of the invention the method creates a video which preserves the main idea of a given input text, and is adapted to convey the essence of the text. According to the invention data is extracted from the input text and from other sources of information relevant to it, so that the text can be analyzed as a whole and with respect to its main content. After extracting all the possible data, the text is semantically analyzed, summarized and converted to a video as a configuration file.
Claims
1. A method for automatically converting a text to video without user interaction, comprising the steps of: a. extracting the content and information from said text and from sources relevant to said text; b. analyzing said text and said sources relevant to said text, and generating relations between entities; c. automatically summarizing said text; d. defining movie characteristics based on said extracted information; e. selecting entities and elements to present in a video; f. automatically creating a visualization tree, which creates a visualization of the text and which finds a desirable way to represent said selected entities and elements, wherein the creation of animated info-graphics and the decision regarding what information should be shown and how it should be placed in the video automatically uses said tree; g. setting audio characteristics; and h. automatically assembling said video as a configuration file.
2. A method according to claim 1, wherein the information extracted from a text may include: a. images and video from the source page; b. meta-data; c. page-links; d. tweets; e. author details; f. date; g. comments; h. CSS; and i. rating.
3. A method according to claim 1, wherein the analysis of the text and sources information is performed according to the criteria of: a. content categorization; b. entities and data elements extraction and mapping; c. defining the general properties of the text; and d. sentiment analysis of the entire text source and specific entities.
4. A method according to claim 1, wherein said movie characteristics are selected from among: speed; tempo; colors and fonts; sentiment; look and feel; and site properties; type of visual representation; type of video output.
5. A method according to claim 1, wherein selecting entity and element is based on the rules of: rules for different types of content, as decided in the movie characteristics step; priority for entity and elements type; variety of entities and elements types; and timing rules.
6. A method according to claim 1, wherein the setting audio characteristics are according to the content properties determined in the movie characteristic steps and may include: narration; sound effects based on defined characteristics and NLP analysis; and music soundtrack.
7. A method according to claim 1, wherein the configuration file is rendered.
8. A method according to claim 1, wherein the video is displayed as a native language without being rendered.
9. A method according to claim 8, wherein the native language is HTML or XML or JSON.
10. A method according to claim 1, wherein the configuration file is created in a server and is played upon a user's request.
11. A method according to claim 1, wherein said video is an advertisement created from text based content and other media resources.
12. A method according to claim 1, wherein a single script is embedded on a text article, page or master page in a website or publication to automatically convert its contents into short videos and embed it on any page of said website or publication.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
(12) According to the present invention there is provided a method for automatically converting text-based information and content to video form. In one embodiment of the invention the method creates a video which preserves the main idea of a given input text, and is adapted to convey the essence of the text. According to the invention data is extracted from the input text and from other sources of information relevant to it, so that the text can be analyzed as a whole and with respect to its main content. After extracting all the possible data, the text is semantically analyzed, summarized and converted to a video as a configuration file. The method of the present invention can, for example, create a video from an article or display a recipe as a video; another example is a video advertisement that is automatically generated from text-based content and other media resources. The file is generated on a server and is played upon a user's request. In addition, the created configuration file is easy to load and deliver and saves time: only the first few media resources (e.g., two pictures) are pre-downloaded, while most of the photos and videos extracted from the sources of information are downloaded before or during playback.
(13) The present invention matches editorial content as well as animated infographics to visually represent the story in the text or certain data within a text.
(14) In the method of the present invention a single configuration file (script) can be embedded on any text article, page or master page in a website or publication to automatically convert its contents into short videos and embed them on every page. In this way the entire content of a publication or website can be turned into video summaries quickly and at very large scale.
(15) Reference is now made to
(16) The second step 102 is the text and content source analysis, in which the text and the other sources of information relevant to the text are analyzed. The analysis is done according to the following criteria: a. Content categorization, for example: news, sports, finance, recipes, review, bios, e-commerce, etc. The categorization is done with respect both to the text as a whole unit and to each part and sentence in the text. It is done by understanding the main idea of the text (what the text is about in general and what its essence is), in addition to analyzing each sentence in the text. Each source text gets a different treatment depending on the type of content (sports, entertainment, news, etc.); b. Entities and data elements extraction and mapping; c. Creating relations between entities, for example: person to person, person to place, person to age; d. Defining the general properties of the text: main subject, main entity, location, event, etc.; e. Sentiment analysis of the entire text source and specific entities.
(17) Each element is presented separately and, if relevant, the relation between the elements is also presented. Once the relations between entities are created it is possible to better visualize the content. For example, in the sentence "Barack Obama will visit the troops in Iraq next month" there is a Person-Location relation; a person traveling to a place can be visualized graphically in an animation or info-graphic. A different type of Person-Location relation is a person who was born at a location. These two examples would be displayed differently.
(18) Another example is a Person-Person relation, which could be a meeting between two individuals, e.g., "Hillary Clinton is scheduled to meet with David Cameron". Another example of a Person-Person relation is a family relation between two persons, e.g., "Charlie Sheen joined his father Martin Sheen to the award ceremony".
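The relation types described above can be illustrated with a minimal Python sketch. The class names, the relation subtypes and the names of the visual treatments are assumptions made for illustration only; the source does not specify a data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "person" or "location"


@dataclass(frozen=True)
class Relation:
    subject: Entity
    obj: Entity
    kind: str     # e.g. "person-location" or "person-person"
    subtype: str  # e.g. "travel", "birthplace", "meeting", "family"


# Hypothetical mapping from (relation kind, subtype) to a visual treatment.
# The same relation kind is displayed differently depending on its subtype.
VISUALS = {
    ("person-location", "travel"): "animated_route_on_map",
    ("person-location", "birthplace"): "map_pin_with_portrait",
    ("person-person", "meeting"): "side_by_side_portraits",
    ("person-person", "family"): "family_link_infographic",
}


def visual_for(rel: Relation) -> str:
    """Return the visual treatment for a relation, or plain text as a fallback."""
    return VISUALS.get((rel.kind, rel.subtype), "animated_text")


obama = Entity("Barack Obama", "person")
iraq = Entity("Iraq", "location")
visit = Relation(obama, iraq, "person-location", "travel")
print(visual_for(visit))
```

Keying the lookup on both the relation kind and its subtype is one simple way to let two Person-Location relations be shown differently.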
(19) The visualization of relations between the types of elements can be automatically created using a visualization tree. When new elements are added to the tree new relation types can be made automatically to existing elements.
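One possible reading of such a tree is sketched below in Python: each node refines an element type, and the most specific node that defines a visual wins. The node labels and visual names are hypothetical; the actual tree of the invention is not reproduced in this text.

```python
class VisNode:
    """A node in a hypothetical visualization tree."""

    def __init__(self, label, visual=None):
        self.label = label
        self.visual = visual
        self.children = {}

    def add(self, path, visual):
        """Add (or update) the node at the given path of labels."""
        node = self
        for label in path:
            node = node.children.setdefault(label, VisNode(label))
        node.visual = visual
        return node

    def resolve(self, path):
        """Walk the path and return the most specific visual found."""
        node, best = self, self.visual
        for label in path:
            node = node.children.get(label)
            if node is None:
                break
            if node.visual is not None:
                best = node.visual
        return best


tree = VisNode("element", visual="animated_text")
tree.add(["person"], "portrait_image")
tree.add(["person", "athlete"], "action_photo")
tree.add(["location"], "map")

# An unknown subtype falls back to its closest known ancestor.
print(tree.resolve(["person", "athlete", "footballer"]))
# The tree can be expanded later without touching existing nodes.
tree.add(["person", "athlete", "footballer"], "match_footage")
print(tree.resolve(["person", "athlete", "footballer"]))
```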
(20) Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document.
(21) Classifying the polarity of a given text at the document, sentence, or feature/aspect level means determining whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as angry, sad, and happy.
(22) For example, if the text content is about a war or a conflict, the sentiment would most likely be negative, sad or expressing fear. Using this data it is possible to portray this emotional expression visually using various templates, fonts, sound effects and any other means that convey that feeling.
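As a rough illustration of how a detected polarity might drive the look of the video, the Python sketch below uses a tiny word-list classifier. The word lists, theme names, fonts and soundtrack labels are all assumptions; a real system would use a proper NLP sentiment model rather than keyword counting.

```python
# Hypothetical, deliberately tiny sentiment lexicons.
NEGATIVE = {"war", "conflict", "fear", "loss", "crisis"}
POSITIVE = {"win", "victory", "celebrate", "record", "joy"}

# Hypothetical style presets keyed by polarity.
THEMES = {
    "negative": {"template": "somber", "font": "serif", "soundtrack": "slow_minor"},
    "neutral": {"template": "standard", "font": "sans", "soundtrack": "ambient"},
    "positive": {"template": "bright", "font": "rounded", "soundtrack": "upbeat"},
}


def polarity(text: str) -> str:
    """Classify text as positive, negative or neutral by keyword counts."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def style_for(text: str) -> dict:
    """Pick a style preset that matches the sentiment of the text."""
    return THEMES[polarity(text)]


print(style_for("The war and the conflict spread fear."))
```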
(23) The third step 103 is the text summary, in which a summary of the text is prepared. For this step, configurable rules for the text summary are determined: which summary to create, according to the target video length; and what percentage of the original text to keep so that a short-form video is created.
(24) Once the abovementioned rules are determined, a smart summary is created according to the text analysis, which takes into account the category of the source in order to decide which content to keep in the summary. For example, in a sports category it will be decided to keep the relevant scores, while in a financial data category it will be decided to keep the stock information. The length of the text is also taken into consideration when creating the summary; for example, if the text source is over X words long, it will be summarized to Y words, where Y is smaller than X. The exact values of X and Y are adaptively determined according to each individual text source.
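The length rule and the category-dependent choice of content can be sketched as a simple extractive summarizer in Python. This is only an illustration under stated assumptions: `max_words` and `target_words` stand in for X and Y (which the source says are determined adaptively), and the keyword sets are hypothetical stand-ins for the category rules.

```python
import re

# Hypothetical keywords marking content worth keeping per category.
CATEGORY_KEYWORDS = {
    "sports": {"score", "scored", "goal", "goals", "won", "title"},
    "finance": {"stock", "shares", "index", "market", "currency"},
}


def summarize(text, category, max_words, target_words):
    """Keep category-relevant sentences until the word budget is used up."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if sum(len(s.split()) for s in sentences) <= max_words:
        return text.strip()  # short enough: no summary needed
    keywords = CATEGORY_KEYWORDS.get(category, set())

    def rank(item):
        idx, sent = item
        tokens = {w.strip(".,;:!?").lower() for w in sent.split()}
        # More keyword hits first; earlier sentences break ties.
        return (-len(tokens & keywords), idx)

    chosen, used = [], 0
    for idx, sent in sorted(enumerate(sentences), key=rank):
        n = len(sent.split())
        if used + n <= target_words:
            chosen.append(idx)
            used += n
    # Restore the original sentence order in the output.
    return " ".join(sentences[i] for i in sorted(chosen))


article = (
    "Ronaldo scored twice on Sunday. "
    "The weather was mild and the crowd was loud. "
    "Madrid won the title last season. "
    "Fans travelled from many cities to watch."
)
print(summarize(article, "sports", max_words=20, target_words=12))
```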
(25) The fourth step 104 is to define the following movie characteristics according to the extracted information: Speed (animations, soundtrack, etc.); Tempo; Colors and fonts; Sentiment; Look and feel (themes and which media templates to use): news, sports, entertainment, recipes, manuals, education, e-commerce, advertisements, etc.; Site properties; Type of visual representation, such as picture, video footage, info-graphic, etc.; Type of video output: as a video configuration file or as a rendered video file.
(26) Each of the characteristics is adaptively defined according to each individual text source that is received. Many parameters are determined in the characteristics step that will later assist in the movie making process. For example, after the content category is determined, this data can be used during the element selection phase. For instance, if the content category is Finance, then the element selection component would focus on related elements such as company names, stocks, currencies, market trends, etc. Alternatively, if the content category is Sports, the element selection component would focus on selecting elements such as scores, teams, player names, venues, etc.
(27) The fifth step 105 is the entity and elements selection, where elements and entities are selected for use in the video based on appropriate rules. An illustrative set of rules is discussed below, it being understood that additional or alternative rules can be employed and that the invention is in no way limited to any specific rule or set of rules exemplified: Rules for different types of content, as decided in the Movie characteristics step. For example, in the case of a sports article, the emphasis is on scores, player names and venues, while in the case of finance articles the emphasis is on currencies, company names, market indexes, etc. Priority for entity and element types: e.g., high priority for things that can be visualized using images, maps and videos, and lower priority for things that can only be visualized using info-graphics or animated text. It should be understood that, according to the invention, info-graphics are given higher priority than text. Variety of entity and element types: the priority rules are designed to be dynamic so that a certain level of media type variety is reached. An embodiment of the invention creates a good mix of images, videos, maps, info-graphics and animated text. Timing rules, which take into account the amount of time that each element or entity is displayed. It is necessary to have enough spacing between elements so that every element gets its defined display time. The first and last elements are also taken into account and their timing is adjusted as well.
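The priority, variety and timing rules can be combined in a small greedy selection, sketched below in Python. The numeric priorities, the `min_gap` and `max_per_media` parameters and the `position` field (seconds into the narration) are assumptions introduced for illustration; the source describes the rules only qualitatively.

```python
from dataclasses import dataclass

# Hypothetical priorities: lower is preferred. Images, maps and videos
# rank above info-graphics, which rank above animated text.
MEDIA_PRIORITY = {"image": 0, "map": 0, "video": 0, "infographic": 1, "text": 2}


@dataclass
class Element:
    name: str
    media: str
    position: float  # assumed: seconds into the narration


def select_elements(candidates, min_gap, max_per_media):
    """Greedy selection honoring priority, variety and timing rules."""
    chosen, per_media = [], {}
    ordered = sorted(candidates, key=lambda e: (MEDIA_PRIORITY[e.media], e.position))
    for el in ordered:
        if per_media.get(el.media, 0) >= max_per_media:
            continue  # variety rule: cap each media type
        if any(abs(el.position - c.position) < min_gap for c in chosen):
            continue  # timing rule: keep enough spacing between elements
        chosen.append(el)
        per_media[el.media] = per_media.get(el.media, 0) + 1
    return sorted(chosen, key=lambda e: e.position)


candidates = [
    Element("Cristiano Ronaldo", "image", 0.0),
    Element("Lionel Messi", "image", 2.0),
    Element("Real Madrid", "image", 6.0),
    Element("Madrid", "map", 9.0),
    Element("164 goals", "infographic", 12.0),
    Element("Ballon d'Or", "text", 13.0),
]
selected = select_elements(candidates, min_gap=3.0, max_per_media=2)
print([e.name for e in selected])
```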
(30) Spread 725: sort by distance from other instances of the same element.
(31) The sixth step 106 is the media fetching and creation. After the element selection step comes the visualization stage, in which the elements or entities chosen in the selection step are passed to a dynamic visualization tree, which finds the best way to represent a specific entity or element. The visualization tree will be further explained below. After it is decided how to represent each selected element, relevant images are retrieved from a preferred media provider. In some cases an entity may be represented with a general image or video that suits the entity but does not refer specifically to a specific item. Moreover, an image or video that matches the general idea of the text, and which is relevant to the entire text and not just to an entity, can be presented.
(32) The seventh step 107 is the audio setting, which comprises the definitions of: Narration: gender, speed, language, artificial voice (TTS), real voice; Sound effects based on defined characteristics and NLP analysis; Music soundtrack; and every other sound that can help to deliver the story in the best way.
(33) The eighth step 108 is the automatic video assembling. After all the previous steps are completed the video is assembled as a configuration file (e.g., JSON, XML, etc.). In one embodiment of the invention the configuration file is used without rendering. In this case the video can be displayed as HTML or in another native language on various devices such as mobile devices, PCs, smart TVs, smartphones, smart glasses, smart watches, etc., without the need to render it into a physical video file.
(34) In another embodiment of the invention the configuration file is rendered. Whether to render the configuration file is decided on the basis of practical considerations, such as convenience of use under specific conditions. The configuration file contains all the media assets and information for the video, including direction, display and animation instructions.
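The choice of a media provider per element type, with a fallback to a general image for the whole text, might look like the following Python sketch. The provider names and the shape of the returned plan are hypothetical, and no real media service is called.

```python
# Hypothetical mapping from element type to a preferred media provider.
PREFERRED_PROVIDER = {
    "person": "photo_agency",
    "location": "map_service",
    "organization": "logo_library",
}


def fetch_plan(element_name, element_type, category):
    """Describe where media for an element would be requested from."""
    provider = PREFERRED_PROVIDER.get(element_type)
    if provider is not None:
        # A provider exists for this type: ask for the specific entity.
        return {"provider": provider, "query": element_name, "specific": True}
    # Otherwise fall back to a general image matching the text category.
    return {"provider": "stock_library", "query": category, "specific": False}


print(fetch_plan("Cristiano Ronaldo", "person", "sports"))
print(fetch_plan("arrogance", "concept", "sports"))
```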
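To make the idea of a video expressed as a configuration file concrete, the Python sketch below assembles a small JSON document that a player could interpret without rendering. The field names, the `fade` default and the example.com URLs are assumptions; the source does not define the schema.

```python
import json


def assemble_config(title, characteristics, scenes, audio):
    """Build a JSON configuration describing the video scene by scene."""
    config = {
        "title": title,
        "characteristics": characteristics,
        "audio": audio,
        "scenes": [],
    }
    start = 0.0
    for scene in scenes:
        config["scenes"].append({
            "start": start,
            "duration": scene["duration"],
            "media_url": scene["media_url"],
            "animation": scene.get("animation", "fade"),
            "caption": scene.get("caption", ""),
        })
        start += scene["duration"]  # scenes play back to back
    config["total_duration"] = start
    return json.dumps(config, indent=2)


config_text = assemble_config(
    title="Ronaldo admits perceived arrogance has cost him",
    characteristics={"category": "sports", "tempo": "fast"},
    scenes=[
        {"duration": 4.0, "media_url": "https://example.com/ronaldo.jpg",
         "caption": "Cristiano Ronaldo"},
        {"duration": 3.0, "media_url": "https://example.com/madrid-map.png",
         "animation": "zoom"},
    ],
    audio={"narration": "tts", "soundtrack": "upbeat"},
)
print(config_text)
```

Because the output is plain JSON, the same file could be played directly by a client-side player or passed to a renderer, matching the two embodiments described above.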
(36) It should be understood that according to the invention the visualization tree and hence the categorization of each element can be continuously expanded.
EXAMPLE
(38) The following is an example showing the whole process of converting text to video. The system entry point is a queue, starting from the movie maker that is initiated with a queue item which contains the text source or a URL containing the text source to be processed.
(39) In this example the text source is extracted from the following URL: http://edition.cnn.com/2012/11/08/sport/football/cristiano-ronaldo-interview/index.html?hpt=ieu t4
1. At first, the Web page content is retrieved and cleaned of HTML junk so that the pure text is received. In this example, only the title and body of the article are extracted, as follows:
Extracted Text:
Title:
(40) Ronaldo admits perceived arrogance has cost him
(41) Body:
(42) Cristiano Ronaldo believes his arrogant image has prevented him from capturing the hearts of football fans across the globe.
(43) In an exclusive interview with CNN, the Real Madrid forward reveals how his onfield demeanor has left him sitting in the shadow of the sport's golden boy and fans' favorite, Lionel Messi.
(44) "I don't want to cry about it, but sometimes I think yes," said Ronaldo after being asked whether his image had cost him in the past.
(45) "It's a question to which I never give the 100% right answer, because sometimes I really don't know."
(46) "I don't think it's allowed for people to change my personality."
(47) "Maybe sometimes, I agree that I have a bad image on the pitch because I'm too serious."
(48) Ronaldo and Messi will go head-to-head for the prestigious Ballon d'Or in January, with the Barcelona star having won the award on each of the previous three occasions.
(49) Both men have taken the sport to a new level with their record goalscoring feats: Ronaldo has scored an astonishing 164 times in 160 appearances for the Spanish champions, while Messi hit 50 in La Liga alone last term.
(50) Ronaldo, who won the Ballon d'Or in 2008 when at Manchester United, led Madrid to the league title last season and has scored in his past six successive El Clasicos.
(51) The 27-year-old Portugal star is unhappy with how he's often portrayed in the media compared to more loveable Messi and says he has become a victim.
(52) "But if you really know me, if you are my friend and I leave you inside my house and you share the day with me, you will know I hate to lose," he said.
(53) "I learn by my mistakes and that's life. You know, sometimes I'm a victim of that because they don't know the real Cristiano."
(54) While Messi often plays with a smile on his face like a kid in the schoolyard, Ronaldo is often seen moaning, gesticulating and scowling while trying to inspire Real to victory.
2. The analysis module analyzes the text and retrieves the language, category and elements as follows:
Document Attributes: Language: English; Category: Sports; Main entity: Cristiano Ronaldo; Main Location: Madrid; Main organization: Real-Madrid Football Club; Date: Nov. 12, 2012.
3. The analysis module also summarizes the text:
Text Summary:
(55) Cristiano Ronaldo believes his arrogant image has prevented him from capturing the hearts of football fans across the globe.
(56) In an exclusive interview with CNN, the Real Madrid forward reveals how his onfield demeanor has left him sitting in the shadow of the sport's golden boy and fans' favorite, Lionel Messi. Both men have taken the sport to a new level with their record goalscoring feats: Ronaldo has scored an astonishing 164 times in 160 appearances for the Spanish champions, while Messi hit 50 in La Liga alone last term. "But if you really know me, if you are my friend and I leave you inside my house and you share the day with me, you will know I hate to lose," he said. While Messi often plays with a smile on his face like a kid in the schoolyard, Ronaldo is often seen moaning, gesticulating and scowling while trying to inspire Real to victory.
(57)
(58) Cristiano Ronaldo believes his arrogant image has prevented him from capturing the hearts of football fans across the globe.
(59) In an exclusive interview with CNN, the Real Madrid forward reveals how his onfield demeanor has left him sitting in the shadow of the sport's golden boy and fans' favorite, Lionel Messi.
(60) Both men have taken the sport to a new level with their record goal scoring feats: Ronaldo has scored an astonishing 164 times in 160 appearances for the Spanish champions, while Messi hit 50 in La Liga alone last term.
(61) "But if you really know me, if you are my friend and I leave you inside my house and you share the day with me, you will know I hate to lose," he said.
(62) While Messi often plays with a smile on his face like a kid in the schoolyard, Ronaldo is often seen moaning, gesticulating and scowling while trying to inspire Real to victory.
(63) At the end of this step there are 11 elements that were selected to be presented in the video.
5. According to the selected elements, relevant images are retrieved using a mapping between the element type and a preferred provider. An example for the first element Cristiano Ronaldo 901 can be seen in
(64) Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.