SYSTEM AND METHOD FOR SYNTHESIZING SPOKEN DIALOGUE USING DEEP LEARNING
20210125236 · 2021-04-29
Inventors
Cpc classification
G06F16/9535
PHYSICS
H04N21/4126
ELECTRICITY
G06F16/58
PHYSICS
G06F16/9536
PHYSICS
H04N21/25883
ELECTRICITY
G06V20/41
PHYSICS
H04N21/44218
ELECTRICITY
H04N21/4532
ELECTRICITY
H04N21/4312
ELECTRICITY
International classification
G06F16/28
PHYSICS
G06F16/58
PHYSICS
G06F16/9535
PHYSICS
G06Q50/00
PHYSICS
H04N21/258
ELECTRICITY
H04N21/431
ELECTRICITY
H04N21/442
ELECTRICITY
H04N21/45
ELECTRICITY
Abstract
Disclosed are a system, method and software to associate attributes with digital media assets using deep learning. Digital media contains specific assets, such as audio, that can be replaced with other assets. The system, method and software allow for personalizing digital media content based in part on neural network analysis in order to generate synthesized audio, such as speech, spoken prompts, and/or dialogue.
Claims
1. A method for personalizing media content using synthesized audio, comprising: generating, by a server, a script configured to output a digital media stream; capturing, by a computing device communicatively coupled to the server, a user interaction with the digital media stream; analyzing, by the server, the user interaction to determine a user affinity for the digital media stream; selecting, by the server, an audio asset based on the user affinity; and updating, by the server, the script to insert the audio asset into the digital media stream.
2. The method of claim 1, wherein the audio asset is a speech-based audio asset.
3. The method of claim 1, wherein the audio asset is a song.
4. The method of claim 1, wherein the user affinity is selected from a group comprising at least one of a preference, an emotional value, a cognitive value, and a social value.
5. The method of claim 1, wherein the user interaction is selected from a group comprising at least one of a viewing habit, a purchase, a selection, and an answer to a question.
6. The method of claim 1, wherein the selecting utilizes an artificial intelligence method to identify the digital audio asset that matches the user affinity.
7. The method of claim 1, wherein the selecting utilizes a neural network to identify the digital audio asset that matches the user affinity.
8. The method of claim 1, wherein the user interaction is an input indicating a like or dislike by a user.
9. A system for personalizing media content using synthesized audio, comprising: a content distributor configured to generate a script, the script configured to output a digital media stream; a server coupled to the content distributor, the server configured to capture a user interaction with the digital media stream; and a correlation algorithm communicatively coupled to the server, the correlation algorithm configured to correlate a user affinity with the user interaction, wherein the content distributor is further configured to update the script to insert an audio asset into the digital media stream, wherein the audio asset is selected based on the user affinity.
10. The system of claim 9, wherein the audio asset is a speech-based audio asset.
11. The system of claim 9, wherein the audio asset is a spoken prompt.
12. The system of claim 9, wherein the user affinity is selected from a group comprising at least one of a preference, an emotional value, a cognitive value, and a social value.
13. The system of claim 9, wherein the user interaction is selected from a group comprising at least one of a viewing habit, a purchase, a selection, and an answer to a question.
14. The system of claim 9, wherein the content distributor utilizes an artificial intelligence method to select the digital audio asset that matches the user affinity.
15. The system of claim 9, wherein the content distributor utilizes a neural network to select the digital audio asset that matches the user affinity.
16. A method for providing interactive synthesized audio content, comprising: generating, by a server, a script to output a digital audio stream in the form of a spoken question; capturing, by a computing device communicatively coupled to the server, a user response to the spoken question; selecting, by the server, an audio asset based on the user response; updating, by the server, the script to insert the audio asset, thereby generating a second digital audio stream; and outputting to the computing device, by the server, the second digital audio stream.
17. The method of claim 16, wherein the user response is in the form of audio captured from a user.
18. The method of claim 16, wherein the user response is in the form of an emotion captured from a user.
19. The method of claim 16, wherein the user response is an input indicating a like or dislike by a user.
20. The method of claim 16, wherein the selecting utilizes an artificial intelligence method to identify the audio asset based on the user response.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The accompanying drawings, which are incorporated in and form a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0051] In describing an embodiment of the invention illustrated in the drawings, specific terminology will be used for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents which operate in a similar manner to accomplish a similar purpose.
[0052] With respect to the user, a number of terms are used that describe how the user is identified and profiled. As used herein the term “user identification” means a number, symbol, or alphanumeric sequence that identifies a single subscriber, a group of subscribers, a subset of subscribers, or a subscriber in a specific location or on a specific network.
[0053] The term “user profile” includes a stored or calculated profile that describes one or more aspects of the user, such as demographics, psychodemographics, and attributes. The profile can be determined from questions answered by the user, forms filled out by the user, and/or interactions of the user with the digital media narrative. Alternatively, the profile can be determined from the user's web surfing characteristics, shopping habits, television viewing habits, and/or actual purchases. Profiling of users based on these interactions, viewing habits, and purchases is well understood by those skilled in the art.
[0054] “User attributes” include aspects, characteristics, and qualities of the user that are useful for determining (matching, correlating, and selecting) digital media assets. These attributes may include characteristics such as affinities, likes or dislikes as described outside of affinities, perceptions, experiences, and other factors that play a role in determining the internal narrative perception identification framework.
[0055] The term “internal narrative traits preference topology” means a representation of personality, such as a representation that is similar to the Myers-Briggs personality classification scheme. However, the representation establishes a measure of the potential for impact on the individual as specifically applied to narrative and interactive narrative. The Keirsey temperament sorter is a personality test that scores results according to the Myers-Briggs personality classification scheme, and allows testing or classification to occur over the Internet. The Keirsey test allows viewers to create a personality inventory of themselves. The internal narrative traits preference topology thus provides for personality classification and gathers data from a variety of sources, including the individual's interactions with the interactive narrative. As can be seen in
[0056] “Narrative content” includes content used for storytelling, including storytelling containing direct advertising, product placement advertising, or combinations thereof. “Story based” content refers to content that tells a story, which either is based on fact or is fictitious in nature, as opposed to a simple recitation of product characteristics. This is not to imply that advertisements that contain product characteristics or facts cannot be story based, only that the story based content contains those characteristics or facts in the context of a story.
[0057] “Trigger points” are occurrences or time points within a story or narrative content that may cause the recipient (viewer, reader, or listener) to take a particular interpretation at one or more levels or that may affect the user's emotional state. “Personalization trigger points” are those trigger points that allow for modification of the story or narrative content in support of customization of the content to match the internal narrative perception framework and appropriately influence the user.
[0058] “Episodic” content refers to narrative content or stories that contain episodes, which either are arranged in a time sequence or are accessible to the user individually. A time sensitive sequence or set of episodes includes narrative content or stories that create personalized impact if temporally or sequentially changed. For example, a time sensitive sequence may include a mystery or story with a surprise ending that, per individual, may vary with respect to timing of delivery (e.g., waiting 3 seconds versus 5 seconds) to build the optimal level of anticipation per individual.
[0059] “Self-narrating” refers to interjecting elements of a user's life into the narrative that the user is watching. Self-narrative techniques relate stories to the user's life in a deeper way. Users can provide self-narrative content that may be incorporated as narrative or digital media assets that may be included in personalized narrative. As an example, a user may upload a photo of himself and that photo may be used in a personalized narrative being presented to another user. This may occur when a user accepts membership in a group and agrees to share a viewing experience with another user. Self-narrative content may include audience-generated content. This content may be self-sustaining in that it can automatically be incorporated into numerous personalized narratives and add another life or perspective to the narrative, as opposed to simply being inserted or viewed at the direct request of a user.
[0060] A “self narrating audience generated content classification” refers to the labeling of a digital media asset such that the individual or group generating the asset is known, as well as the potential for using that asset. As an example, a user may provide a photo of himself, and indicate that that photo can be inserted into narrative presented to a select group of users or potentially all users. Similarly, the user may upload other digital imagery, text, video, or other material that serves as a background, an overlay or another element in a narrative. The classification of this material determines how and when it may be used.
[0061] “Significance of affinity” is a measure of the strength of affinity and is useful in determining the level of attraction an individual has for a particular element. As an example, a user that repeatedly selects oranges in an interactive presentation would have a higher probability of having a high level of affinity for oranges. The degree of significance may be related to the probability of having an affinity for an element.
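The relationship between repeated selections and the probability of an affinity can be pictured with a short sketch. This is an illustration only, not the disclosed method: the function name `affinity_significance`, the add-one style smoothing, and the `prior` parameter are all assumptions introduced here.

```python
def affinity_significance(times_selected: int, times_offered: int,
                          prior: float = 1.0) -> float:
    """Smoothed estimate of the probability that a user has an affinity
    for an element, from how often the user selected it when offered.

    The smoothing (an assumption) keeps the estimate at 0.5 when nothing
    has been observed yet and moves it toward the observed selection
    rate as evidence accumulates.
    """
    if times_selected < 0 or times_offered < times_selected:
        raise ValueError("need 0 <= times_selected <= times_offered")
    return (times_selected + prior) / (times_offered + 2 * prior)


# A user who picks oranges 9 times out of 10 scores higher than one
# who picks them once out of 10.
frequent = affinity_significance(9, 10)
rare = affinity_significance(1, 10)
```

Under these assumptions, a user with no recorded selections starts at a neutral value, and each additional selection of the element raises the estimated significance.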
[0062] “Association rules” provide the ability to match digital media assets to an individual, through a correlation of the attributes of the asset with the attributes of the individual in order to provide the highest level of impact. The correlation of the attributes may consist of summing the number of matching attributes, identifying key attributes, or providing a true/false test for one or more attributes. A relative weighting scheme may be incorporated into the correlation to give preference to or emphasize certain attributes. Correlation thus refers to the process of matching or selecting a digital media asset based on the overlap between the attributes of the individual and the asset, with the goal of having a greater narrative impact.
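The three correlation options described above (summing matches, a true/false test on key attributes, and relative weighting) can be sketched together as follows. The function names `correlate` and `select_asset`, the attribute strings, and the default weight of 1.0 are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def correlate(user_attrs: Set[str], asset_attrs: Set[str],
              weights: Optional[Dict[str, float]] = None,
              required: Iterable[str] = ()) -> float:
    """Score the overlap between a user's attributes and an asset's.

    required: key attributes acting as a true/false test; if any is
              missing on either side, the score is 0.
    weights:  optional per-attribute emphasis; unlisted attributes
              count as 1.0, which reduces to a plain match count.
    """
    for key in required:
        if key not in user_attrs or key not in asset_attrs:
            return 0.0
    weights = weights or {}
    return float(sum(weights.get(a, 1.0) for a in user_attrs & asset_attrs))


def select_asset(user_attrs: Set[str], assets: Dict[str, Set[str]],
                 weights: Optional[Dict[str, float]] = None,
                 required: Iterable[str] = ()) -> str:
    """Return the name of the asset with the highest correlation score."""
    return max(assets,
               key=lambda name: correlate(user_attrs, assets[name],
                                          weights, required))


user = {"dogs", "country", "outdoors"}
assets = {
    "truck_ad": {"country", "outdoors"},
    "club_ad": {"dance", "city"},
}
best = select_asset(user, assets)
```

With no weights the score is simply the number of shared attributes; supplying `weights` emphasizes particular attributes, and `required` turns selected attributes into pass/fail conditions.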
[0063] A “collective/collaborative classification,” as illustrated in
[0064] An “affinity” is a measure of how much an individual is attracted to a particular element of a narrative. As an example, an individual who was always served fried eggs on Wednesdays by his/her mother may have fond memories of that experience and may have a positive affinity for fried eggs, in particular on Wednesday. Conversely, an individual who was forced to eat cold fried eggs as a punishment may have a negative affinity for fried eggs.
[0066] In terms of economic arrangements, in one embodiment, a sponsor 101 may provide financing 108 to one or more of the artist 112, the creative director 110 and the distributor 102, such that they have resources to provide media assets and to provide a viewing mechanism for the consumer 104. In one embodiment, the artist 112 is a musician who provides songs and musical videos as original content 114. In this embodiment, the sponsor 101 is a manufacturer, such as a manufacturer of consumer packaged goods who desires to advertise its products to the consumer 104. The sponsor 101 provides financing to the artist 112 and the creative director 110/distributor 102 to permit the creative director 110/distributor 102 to create a database system and the appropriate digital media assets for personalized viewing by the consumer 104.
[0067] In an alternate embodiment, the sponsor 101 finances the artist 112 to provide new material directly related to its product, which may incorporate elements of previously produced songs, videos or other artistic works. In this alternate embodiment, the sponsor 101 may retain a closer relationship with the artist 112 to produce content specifically for advertising its products. This content may be further personalized through the content assembly and distribution system 100 to produce modified content 106 for the consumer 104. This modified content 106 is the personalized digital media asset.
[0068] Note that in the description set forth above, the artist 112 need not directly interact with or be financed by the sponsor 101. Rather, one or more intermediaries, such as agents, producers, studios, distributors or other entities may represent the artist 112 to the sponsor 101. Accordingly, the term “artist” is intended to include an actual artist and/or one or more such intermediaries.
[0070] Background images 200 for digital media assets 212 may be still graphic images or MPEG based images, desktop colors, flash files, patterns or any other types of images that are typically used as background elements within current or future digital media narrative experiences. Video sequences 204 may be in the form of MPEG-2 or MPEG-4 video sequences, but may also be other types of video including but not limited to Real Video, AVI video, flash based video, animations, or other video sequences. Text 208 may include overlay text scrolling, masked text, and/or other types of textual messages that appear on a screen, such as tickertape messages that appear at the bottom of a screen. Branding graphics 206 may include icons, symbols, figurines or any other types of element that appear on a screen that are not typically thought of as video images. Foreground images 202 may include still images, such as photographs, drawings, animations or any other types of image, that are brought to the attention of a user in the foreground. Digital audio 210 may be any type of digital audio including MP3 audio or any other compressed or uncompressed digital audio streams, files or segments.
[0071] Although personalized digital media assets 212 are shown to have individual sets of frames, it is to be understood that the digital media assets may be a combination of all of the aforementioned elements and may or may not contain individual frames, but may have certain points that delineate segments or pieces of the asset from other pieces, both temporally and in terms of content.
[0073] An affinity may be the extent to which something creates an impact on the user, based on the user's life experiences. As an example, an old T-shirt may remind a user of an individual that previously wore that T-shirt. This may be a strong affinity if that person was very significant to the user (e.g., a parent, a spouse or a child). In some cases the user may not have an affinity for that object but may subsequently develop an affinity based on something that the user sees. As an example, if the user sees a magazine cover with an attractive person wearing an old T-shirt, this may create a new affinity. As another example, hearing a song while on a first date with a future spouse may create a new affinity to that particular song or artist. The user may have strong emotional feelings upon hearing the song because the song prompts the user to recall the early dating experience. Affinities and the quality of affinity thus define the extent to which the object or, in the case of the present system, digital media assets have an impact on users. Since affinities constantly change and may depend on a user's emotional state as well as past experiences, the system may update user profiles to determine which digital media assets should be used to create stronger emotions in the user.
[0074] The personalized (internal) user story 306 may be what the user perceives the digital media asset to be and may depend, at least in part, on the emotional state, demographics, psycho-demographics, cognitive states, social placement and/or group interaction dynamics within the online community, affinity for certain content features and/or other factors particular to the user. Trigger points 320 may be presented such that the digital media assets are customized to provide the user with a new personalized (internal) user story 312. This new personalized (internal) user story 312 may be composed of new images, audio, text and/or other content that, based on a user profile, are expected to trigger affinity element emotions 310 that are newer and/or stronger than emotions 304 previously experienced by the user or the expected emotions if the media assets were not personalized.
[0075] Trigger points 320 provide the mechanism for content management and the creation of a more personalized digital media asset based on a user's personal experiences. Trigger points 320 can be placed at various points in the digital media content, based on determinations of the creative director 110/distributor 102. For example, the creative director 110/distributor 102 may decide to place trigger points 320 in the digital media content 300 so that they occur at various points in time, when a certain character appears on the screen, when certain text is displayed, when words are spoken or sung, or based on other features of the digital presentation. During presentation of the digital media content, when the presentation reaches a trigger point 320, a script or other software program is executed. The creative director 110 or the distributor 102 may customize the script for each trigger point 320. The script may cause a computing device to access a database containing profile data relating to the user, and, based on the user profile information, the script may cause the insertion and/or replacement of video, graphics, audio or other material in the digital presentation. Furthermore, the timing of the presentation including but not limited to playback speed or changing the sequence of the presentation may be altered.
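The trigger point mechanism described above, in which a script runs when the presentation reaches a trigger point and decides what to insert based on the user profile, might be sketched as follows. The class `TriggerPoint`, the function `run_triggers`, the profile layout (a mapping from element name to affinity value), and the 0.5 threshold are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Profile = Dict[str, float]                   # element name -> affinity value
Script = Callable[[Profile], Optional[str]]  # returns an asset ID or None


@dataclass
class TriggerPoint:
    trigger_id: str
    timestamp: float   # seconds into the presentation (unit assumed)
    script: Script     # customized per trigger point


def run_triggers(triggers: List[TriggerPoint], profile: Profile,
                 start: float, end: float) -> List[Tuple[str, str]]:
    """Execute the script of every trigger point whose timestamp falls in
    [start, end), in time order, and collect the assets to insert."""
    actions = []
    for tp in sorted(triggers, key=lambda t: t.timestamp):
        if start <= tp.timestamp < end:
            asset = tp.script(profile)
            if asset is not None:      # a script may decide to do nothing
                actions.append((tp.trigger_id, asset))
    return actions


def character_ad(profile: Profile) -> Optional[str]:
    """Insert an advertisement only if the user responds favorably
    to the character (threshold is an assumed value)."""
    return "product_ad" if profile.get("character_x", 0.0) > 0.5 else None


triggers = [
    TriggerPoint("t2", 30.0, character_ad),
    TriggerPoint("t1", 10.0, lambda profile: "intro_song"),
]
actions = run_triggers(triggers, {"character_x": 0.9}, 0.0, 60.0)
```

Keeping the script as a per-trigger callable mirrors the statement that the creative director or distributor may customize the script for each trigger point.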
[0076] For example, the creative director 110 may place trigger points 320 in a digital media asset video sequence 300 each time that a particular character appears on screen. When the character appears on the screen, the script may, for example, access the user profile database to determine whether the user responds favorably to the character. If the user responds favorably to the character, the script may cause the insertion of a product advertisement at a location on the video screen whenever the character appears on the screen. Conversely, if the user does not respond favorably to the character, the server may take no action, or the system may insert different content into the screen when the character appears. Another example of a trigger point 320 is a portion of the digital media content that contains a “hidden” advertisement, such as a character driving a particular brand of automobile while background music is played. The trigger point in this case could include both the type of vehicle and the background music. Upon reaching the trigger point 320, a script is run to determine which type of vehicle manufactured by the sponsor may connect with the user and which song among a group may have the most appeal or greatest affinity to the user. For example, certain users may have a strong connection to pickup trucks and country music while other users may respond to Sport Utility Vehicles (SUVs) and classic rock. Upon reaching a trigger point 320, the system may select the make of vehicle that the character drives, the type of music played in the background and the volume of the background music. The system could also access additional information outside of the user profile to determine whether insertion should occur. For example, if a first user's profile indicates that the user has an affinity for a second user, when the character appears on the screen the system may determine whether the second user is also viewing the presentation at the same time. If so, the system may place an icon on the screen to indicate to the first user that the second user is also viewing the presentation.
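The vehicle and music example can be sketched as a per-slot selection in which each slot of the scene is filled with the candidate for which the user has the greatest affinity. The helper names, candidate labels, and fallback defaults ("sedan", "instrumental") are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List


def pick_by_affinity(profile: Dict[str, float], candidates: List[str],
                     default: str) -> str:
    """Return the candidate with the greatest affinity in the profile,
    or the default when the user has no positive affinity for any."""
    best = max(candidates, key=lambda c: profile.get(c, 0.0))
    return best if profile.get(best, 0.0) > 0.0 else default


def resolve_scene(profile: Dict[str, float]) -> Dict[str, str]:
    """Fill the vehicle and music slots of the scene for one user."""
    return {
        "vehicle": pick_by_affinity(profile, ["pickup_truck", "suv"], "sedan"),
        "music": pick_by_affinity(profile, ["country", "classic_rock"],
                                  "instrumental"),
    }


scene = resolve_scene({"pickup_truck": 0.8, "suv": 0.2, "country": 0.9})
```

A user with no recorded affinities receives the default scene, which corresponds to the case in which the system takes no personalization action.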
[0077] In addition to trigger points 320, flexible trigger points 322 may be utilized. Flexible trigger points 322 have the property that they may be moved in time or entirely deleted. Flexible trigger points 322 thus allow further personalization of the digital media asset experience based on specific attributes of the user, typically learned through the user's activity with the digital media asset 212. Trigger points 320 and flexible trigger points 322 may include time stamps and trigger point IDs 324. As represented in
[0079] For example, the default experience 400 may be a new music video having interactive opportunities 402 that include selecting whether the video contains scenes from Spain or Italy followed by selecting scenes from New York City or the Rocky Mountains. The user's selections may be stored and utilized in personalizing future presentations. For example, if a user selects that a character travels from Italy to the Rocky Mountains, the system may infer that the user enjoys mountain scenery and perhaps skiing. Upon viewing an advertisement for a soda, the digital content may include a skier stopping to drink the soda. Conversely, if the user preferred New York City, the digital content may contain a dancer at a club stopping to drink the soda.
[0080] In another embodiment, a new user is asked a series of yes/no questions, multiple choice and/or open-ended questions. For example, a user may be asked to answer questions within the context of the narrative such as: “What is your favorite animal?”; “Would you rather ride a motorcycle or drive a sports car?”; “Do you prefer blue, red, green or yellow?”; or “True or false, I like fishing?” Questions may include demographic questions, such as gender, age, ethnic background, income, education level and region of residence. A user may also be asked general questions about his or her mood, state of mind or personality traits. The answers may be compiled to create a user profile which includes demographic information and personal likes/dislikes. The demographic information may be used to select the appropriate general database to which a member belongs, such as young female or middle-aged male. The questions may be designed to gauge the user's personality traits and affinities, the user's emotional state and the user's emotional response to media content. The queries may be more abstract than direct questions, such as “Pick a happy color,” “Choose a word that saddens you,” or even something similar to projective psychological tests such as an ink blot test or word association test.
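Compiling questionnaire answers into a profile and choosing a general database from the demographic answers could look like the sketch below. The answer tuple layout, the function names, and the age thresholds (35 and 60) are assumptions introduced for illustration; the disclosure names segments such as “young female” and “middle-aged male” but does not define their boundaries.

```python
from typing import Dict, List, Tuple

Answer = Tuple[str, str, str]   # (kind, question key, answer)


def build_profile(answers: List[Answer]) -> Dict[str, Dict[str, str]]:
    """Split answers into demographic information and likes/dislikes."""
    profile: Dict[str, Dict[str, str]] = {"demographics": {}, "preferences": {}}
    for kind, key, value in answers:
        if kind == "demographic":
            profile["demographics"][key] = value
        elif kind == "preference":
            profile["preferences"][key] = value
        else:
            raise ValueError(f"unknown answer kind: {kind}")
    return profile


def general_database(profile: Dict[str, Dict[str, str]]) -> str:
    """Name the general database (coarse segment) for this user.
    Age boundaries are assumed values, not taken from the disclosure."""
    age = int(profile["demographics"]["age"])
    gender = profile["demographics"]["gender"]
    band = "young" if age < 35 else "middle-aged" if age < 60 else "senior"
    return f"{band} {gender}"


profile = build_profile([
    ("demographic", "age", "28"),
    ("demographic", "gender", "female"),
    ("preference", "favorite_animal", "dog"),
    ("preference", "happy_color", "yellow"),
])
segment = general_database(profile)
```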
[0081] Demographic information may be compiled because an individual's demographics have a great effect on his or her interests. A middle-aged parent is more likely to be interested in family-oriented media narrative, while a single young adult is more likely to be interested in more risqué media narrative. Also, a user that is in the economic middle-class may be more interested in high-priced leisure activities such as golf or skiing, while a user that is in a lower economic class may be more interested in less costly activities, such as basketball. Compiling an individual's demographics yields a wealth of information on materials that are more likely to have a personal connection.
[0083] The server 590 may develop the personalized digital media asset 212 from content 531 and the digital asset repository 541. In one embodiment, the server 590 stores modified content in a modified content storage medium 551. In an alternate embodiment, the modified content is not stored, and the personalized digital media asset is presented to user 501 via the server 590.
[0084] Referring again to
[0085] In one embodiment, the user 501 provides a user ID 520, a password 522, and interactions/choices 524 to a server 590. The user 501 may be presented with digital video 500, digital audio 510, background images 200, foreground images 202, text 208, digital media graphics, digital animation, and/or branding graphics 206. The user 501 may participate in an online community system 521 in which the server 590 sends the user ID 520 to the online community system and receives lists of community user attributes 515 and active vs. inactive status 517. In this way, the server 590 may determine which users are online and which users may be able to share a personalized digital asset experience. User profiles 561 are stored, and the server 590 may access a user profile 560 using a user ID 520.
[0086] Content 531 may also be stored and provided to the server 590 in the form of digital graphics or video 530 and/or digital audio 534, optionally based on content requests 536.
[0087] A digital asset repository 541 may receive asset requests 540 from the server 590 and may provide items such as background images 200, foreground images 202, text 208, and branding graphics 206.
[0088] In one embodiment, the modified content is stored in modified content storage 551 and includes a time index 550, an asset ID 552, a media ID 554, a user ID 520, digital video 500, digital audio 510, background images 200, foreground images 202, text 208, and/or branding graphics 206.
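A record in the modified content storage 551 might be modeled as shown below. Only the four identifying fields named above are represented; the field types, the use of seconds for the time index, and the helper `timeline_for` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModifiedContentEntry:
    time_index: float   # time index 550 (seconds assumed)
    asset_id: str       # asset ID 552
    media_id: str       # media ID 554
    user_id: str        # user ID 520


def timeline_for(entries: List[ModifiedContentEntry], user_id: str,
                 media_id: str) -> List[ModifiedContentEntry]:
    """Return one user's stored substitutions for one media item,
    ordered by time index so they can be replayed in sequence."""
    mine = [e for e in entries
            if e.user_id == user_id and e.media_id == media_id]
    return sorted(mine, key=lambda e: e.time_index)


store = [
    ModifiedContentEntry(42.0, "bg_mountains", "video_1", "u1"),
    ModifiedContentEntry(12.5, "song_country", "video_1", "u1"),
    ModifiedContentEntry(12.5, "song_rock", "video_1", "u2"),
]
u1_timeline = timeline_for(store, "u1", "video_1")
```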
[0089] In an alternate embodiment, personalized digital media narrative is created from the content 531 and the digital asset repository 541. The narrative may not be stored as modified content but may be directed at the user 501 without storage.
[0091] Referring again to
[0092] A user profile monitor 500 may also work to understand outside emotion and data mapping 528 to determine whom the user is connecting with online 526 and the traveling profile management 524, which may ensure that an individual profile travels from program to program. Each of these elements may act to create a more complete stored user profile 561 and thus a better customization of the experience 504.
[0094] The three elements, as illustrated in
[0095] Referring to
[0101] The database structures illustrated in
[0103] Software to provide the functionality for a personalized digital media asset creation may be developed using a number of computer languages such as C, C++, Perl, Lisp, Java and other procedural or object oriented languages. Different programming languages may be used for different aspects of the system, such that a first programming language may be used for the content creation process illustrated in
[0104] In one embodiment, the software may be a web-based application containing program modules. The program modules may include Java servlets, Java Server Pages (JSPs), HyperText Markup Language (HTML) pages, Joint Photographic Experts Group (JPEG) images, Macromedia Flash MX movies, and/or a reusable Macromedia Flash MX component. The software may be executed on a compatible server environment including a web server, servlet container, Structured Query Language (SQL) database and Java Database Connectivity (JDBC) driver.
[0105] The Macromedia Flash MX movies and the reusable Macromedia Flash MX component may include multiple Macromedia Flash MX source files. A programmer may supply a first file that contains code for a Time Frame component and/or a reusable Flash MX component that implements the user side of the trigger point 320. An implementation may include visually framing the image to be displayed and resizing the image to be displayed to fit the frame, if necessary. For example, a programmer may supply a second file that includes code having two Time Frame instances and three buttons per Time Frame, the buttons including a “Warmer” button, a “Colder” button and a “Reset” button. The “Warmer” button may set a variable indicative of an affinity value to a lower value and load an image (or images) from the server that correspond to the new variable value. Similarly, the “Colder” button may set the affinity value variable to a higher value and load an image (or images) from the server that correspond to the new variable value. The “Reset” button may reset the variable to a mid-range value or the previously stored value for the user. As an alternative to the second file, a third file may be stored including a Time Frame instance, a “Load preferred image” button and two or more text entry boxes. The user may utilize text entry boxes to enter, for example, an affinity group name and a username. When the user enters valid information into both text entry boxes and clicks on the “Load preferred image” button, the information may be sent to the server. The server may use a database table to select an image based on the received information and may return the selected image to the user.
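The button behavior described above can be sketched as a small state holder. The sketch follows the directions as stated in the text (“Warmer” lowers the affinity value and “Colder” raises it). The class name `AffinityControl`, the 0 to 10 range, the step of 1, and the image naming scheme are assumptions, and Python stands in for the Flash MX component.

```python
from typing import Optional


class AffinityControl:
    """Holds the affinity value variable driven by the three buttons."""

    def __init__(self, stored_value: Optional[int] = None,
                 low: int = 0, high: int = 10, step: int = 1) -> None:
        self.low, self.high, self.step = low, high, step
        self.stored_value = stored_value   # previously stored value, if any
        self.value = self._initial()

    def _initial(self) -> int:
        # Previously stored value for the user, else a mid-range value.
        if self.stored_value is not None:
            return self.stored_value
        return (self.low + self.high) // 2

    def warmer(self) -> int:
        """Per the text, "Warmer" sets the variable to a lower value."""
        self.value = max(self.low, self.value - self.step)
        return self.value

    def colder(self) -> int:
        """Per the text, "Colder" sets the variable to a higher value."""
        self.value = min(self.high, self.value + self.step)
        return self.value

    def reset(self) -> int:
        self.value = self._initial()
        return self.value

    def image_name(self) -> str:
        """Hypothetical name of the image to load for the current value."""
        return f"affinity_{self.value}.jpg"


control = AffinityControl(stored_value=7)
after_warmer = control.warmer()
```

Clamping to the range keeps repeated clicks from moving the variable outside the values for which images exist.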
[0106] The application software may include multiple database tables such as tables of internal narrative perception identification frameworks, current users, user specific social affinities, user specific emotional affinities, and/or trigger points. In an embodiment, the application software may include a table that specifies an image that best represents the element for a specific affinity element group and affinity element.
[0107] The application software may include one or more HTML pages used to access the Macromedia Flash MX source files and to update the stored user profile. The application may include one or more Java servlets. In an embodiment, a first Java servlet is utilized to find the affinity element having the maximum value for the specific user, among all affinity elements in a specified group, and to return the image corresponding to that element. In the embodiment, a second Java servlet is utilized to display the affinity values for the user, the affinity type and the affinity element group and to provide a means for the creative director 110 to update the affinity values.
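The lookup performed by the first servlet, finding the user's highest-valued affinity element within a group and returning its image, can be sketched with a single SQL query. The table and column names below are hypothetical, and Python's built-in `sqlite3` module with an in-memory database stands in for the Java servlet, JDBC driver, and SQL database named in the disclosure.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one table of per-user affinity values and one
# table mapping each (group, element) pair to its representative image.
QUERY = """
    SELECT i.image
    FROM user_affinity AS a
    JOIN element_image AS i
      ON i.grp = a.grp AND i.element = a.element
    WHERE a.username = ? AND a.grp = ?
    ORDER BY a.value DESC
    LIMIT 1
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user_affinity (username TEXT, grp TEXT,
                                    element TEXT, value REAL);
        CREATE TABLE element_image (grp TEXT, element TEXT, image TEXT);
    """)
    db.executemany("INSERT INTO user_affinity VALUES (?, ?, ?, ?)", [
        ("alice", "fruit", "orange", 0.9),
        ("alice", "fruit", "apple", 0.4),
        ("alice", "pets", "dog", 0.7),
    ])
    db.executemany("INSERT INTO element_image VALUES (?, ?, ?)", [
        ("fruit", "orange", "orange.jpg"),
        ("fruit", "apple", "apple.jpg"),
        ("pets", "dog", "dog.jpg"),
    ])
    return db


def preferred_image(db: sqlite3.Connection, username: str,
                    group: str) -> Optional[str]:
    """Return the image for the user's maximum-affinity element in the
    group, or None if the user has no affinities recorded for it."""
    row = db.execute(QUERY, (username, group)).fetchone()
    return row[0] if row else None


db = make_db()
image = preferred_image(db, "alice", "fruit")
```

The username and group parameters correspond to the two text entry boxes (affinity group name and username) of the “Load preferred image” example.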
[0108] The application software may include a plurality of JPEG image files that are provided from one or more sources. The sources may include any public source of image files, public copyrighted files with an appropriate copyright agreement or private files.
[0109]
[0110]
[0111]
[0112] In one example, an advertiser that manufactures various types of pet food, including dog food and cat food, forms an agreement with a record label that distributes music videos on the Internet for free viewing. A new music video of a popular artist may include a scene, segment or image in which a dog or cat walks across the background to eat from a bowl of food, or may simply include a dog or cat graphic. Sitting next to the bowl of food is a bag labeled with one of the advertiser's brand names of pet food. Upon the user entering the website, the user's personal profile may be accessed. The profile may include information that the individual is a dog lover and/or dog owner. During playback of the video, a selection of the species of animal may be determined at the trigger point 320 based on the viewer's profile. For example, a tag 1010 in the profile may indicate to insert a dog into the video. Insertion of a dog into the media as opposed to a cat increases both the effectiveness of the advertising, by allowing the advertiser to highlight dog food to a dog lover, and the enjoyment of the video, since a dog lover is more likely to enjoy a music-based digital media experience featuring a dog. Thus, both the advertiser and the artist may benefit from the enhanced digital media being presented to viewers. The personal profile may further indicate a preferred breed of dog, such as golden retriever or terrier. If such information is specified, the specific breed of dog or cat may be inserted at the trigger point 320. The affinity of the user to the breed of animal may result in the user feeling more personally connected to the video.
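By way of non-limiting illustration, the selection made at the trigger point 320 in this example may be sketched as follows. The sketch is in Python, and the profile keys, the asset catalog, the file names and the default asset are hypothetical; the specification does not define the format of the tag 1010.

```python
# Hypothetical asset catalog: (species, breed) -> clip to insert.
# A breed of None denotes the generic clip for that species.
ASSETS = {
    ("dog", None): "dog_generic.mp4",
    ("dog", "golden retriever"): "dog_golden_retriever.mp4",
    ("cat", None): "cat_generic.mp4",
}
DEFAULT_ASSET = "pet_default.mp4"  # assumed fallback when the profile is silent


def select_asset(profile):
    """Pick the most specific asset that matches the viewer's profile."""
    species = profile.get("species")
    breed = profile.get("breed")
    # Try the breed-specific clip, then the generic species clip, then the default.
    for key in ((species, breed), (species, None)):
        if key in ASSETS:
            return ASSETS[key]
    return DEFAULT_ASSET
```

In this sketch, a profile that names a breed with no matching clip falls back to the generic clip for the species, so the species preference is still honored.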
[0113]
[0114]
[0115]
[0116]
[0117]
[0118]
[0119] The screens of FIGS. 15-21 may be enhanced by operation of a fuzzy-logic-based Enhanced Director Agent.
[0120] The DMA action may include changing any aspect of the digital media narrative that enhances the experience without destroying the integrity of the narrative experience. The action may include time sensitive changes, such as the changing of events, the playback speed, the timing of playback or the sequence of events such as changing the orientation of “scenes.” The action may include changing the audio including the volume of playback, the score (i.e., the background music), the language spoken or even the accent of the speaker. The action may include changing the video aspect such as the gender, race or age of a character, the background scenery, the elements of the episode (e.g., a motorcycle, bicycle or horse is ridden), the color of clothing worn, an overcast or sunny sky, or any other visual aspect of the DMA. The invention is intended to cover any DMA actions that make the digital media asset video sequence 300 more connected to the viewer and enhance the experience.
[0121] In an embodiment, the DMA actions are logical and do not break the flow of a narrative or an episodic narrative. In other words, in an embodiment, a changed asset does not destroy the plotline of a story and does not introduce a character or element that has no logical reason for appearing in the frame. For example, in this embodiment, it would not be appropriate to change the background scenery to a cityscape if the character is shown wearing skis. Conversely, changing the background to a mountain while the main character is carrying shopping bags would destroy the flow of the DMA.
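By way of non-limiting illustration, one simple way to screen a proposed background change for this kind of consistency may be sketched as follows. The sketch is in Python, and the rule table and its entries are hypothetical; the specification does not prescribe how consistency is determined.

```python
# Hypothetical rule table: backgrounds in which each prop makes narrative sense.
PROP_CONTEXTS = {
    "skis": {"mountain"},
    "shopping bags": {"cityscape"},
}


def background_is_coherent(background, props):
    """Return True if no prop in the frame conflicts with the background.

    Props without an entry in PROP_CONTEXTS are treated as unconstrained.
    """
    return all(
        background in PROP_CONTEXTS.get(prop, {background})
        for prop in props
    )
```

Under these assumptions, a change to a cityscape is rejected while the character is wearing skis, and a change to a mountain is rejected while the character is carrying shopping bags.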
[0122] Another collaborative aspect may include enabling another user to control the digital media presented to the user. A first user may experience a digital media narrative that includes the aforementioned automatic enhancements and allows for personal selection of events/media content. The first user may enjoy the content so much that he or she wishes to share the experience with select friends, family or colleagues. The user may save the personalized digital media and enable selected individuals to share the experience by informing the system. The first user may set a security level for further sharing. A low security level may allow general access to the digital media narrative and enable secondary viewers to share the personalized digital media narrative with other viewers. A high security level may limit viewing of the digital media content to users having a direct relationship to the first viewer. A medium security level may limit access to viewers having either a direct link to the first viewer or an indirect connection, such as a friend-of-friend connection.
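By way of non-limiting illustration, the three security levels described above may be sketched as an access check. The sketch is in Python, and the `SecurityLevel` enumeration, the `may_view` function and the representation of relationships as a dictionary of sets are assumptions introduced for illustration only.

```python
from enum import Enum


class SecurityLevel(Enum):
    LOW = "low"        # general access
    MEDIUM = "medium"  # direct or friend-of-friend connection
    HIGH = "high"      # direct relationship only


def may_view(owner, viewer, level, connections):
    """Decide whether `viewer` may see the narrative saved by `owner`.

    `connections` maps each user to the set of users directly related to
    that user (a hypothetical stand-in for a relationship table).
    """
    if viewer == owner or level is SecurityLevel.LOW:
        return True
    direct = connections.get(owner, set())
    if viewer in direct:
        return True
    if level is SecurityLevel.HIGH:
        return False
    # MEDIUM: allow viewers who are directly related to one of the owner's
    # direct connections (a friend-of-friend link).
    return any(viewer in connections.get(friend, set()) for friend in direct)
```

For example, if a first user is directly related to a second user, and the second user to a third, then under these assumptions the third user may view the narrative at the medium level but not at the high level.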
[0123]
[0124] It should be noted that the invention is not limited to viewing on a Personal Computer (PC) or laptop computer but is intended for use with any digital viewing or listening device. This includes, but is not limited to, televisions, Personal Digital Assistants (PDAs), wireless telephones, MP3 players and any other device that is utilized to view or listen to video and audio signals and that can carry on two-way communications.
[0125] The many features and advantages of the invention are apparent from the detailed specification. Thus, the appended claims are intended to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described. Accordingly, all appropriate modifications and equivalents may be included within the scope of the invention.
[0126] Although this invention has been illustrated by reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made which clearly fall within the scope of the invention. The invention is intended to be protected broadly within the spirit and scope of the appended claims.