Patent classifications
G10L15/26
Creating a Printed Publication, an E-Book, and an Audio Book from a Single File
As an example, a server may receive, from a computing device, a submission created by an author. The submission includes book data associated with a book and author data associated with the author. The author data includes incarceration data indicating whether the author was incarcerated. The server may determine, based on the author data and the book data, that the submission is publishable. The server may create, based on the book data, a printable book, an e-book, and an audio book and make one or more of the printable book, the e-book, and the audio book available for acquisition.
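Read as a pipeline, the abstract describes three steps: receive a submission, decide publishability from the author and book data, and render three formats. A minimal illustrative sketch follows; every name and the publishability policy here are hypothetical, since the abstract does not specify them, and real format creation (print layout, EPUB, audio synthesis) is reduced to tagged copies of the text:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    book_text: str       # book data
    author_name: str     # author data
    incarcerated: bool   # incarceration data from the author data

def is_publishable(sub: Submission) -> bool:
    # Hypothetical policy: require non-empty book text and author name.
    # The incarceration flag could feed additional review rules.
    return bool(sub.book_text.strip()) and bool(sub.author_name.strip())

def create_formats(sub: Submission) -> dict:
    # Each "format" here is just a tagged copy of the text; a real server
    # would render a printable book, an e-book, and an audio book.
    if not is_publishable(sub):
        raise ValueError("submission is not publishable")
    return {
        "printable_book": f"[PRINT] {sub.book_text}",
        "e_book": f"[EPUB] {sub.book_text}",
        "audio_book": f"[AUDIO] {sub.book_text}",
    }
```

The single `Submission` input producing all three outputs mirrors the "single file" framing of the title.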
MOBILE INTELLIGENT OUTSIDE SALES ASSISTANT
Systems, methods, and applications for mobile intelligent outside sales assistance are provided. Embodiments include receiving speech for recognition of an outside sales call; converting the speech for recognition to text; parsing the converted text into outside sales triples; storing the outside sales triples in an enterprise knowledge graph of a semantic graph database; generating real-time outside sales insights in dependence upon the speech of the outside sales call and the stored outside sales triples in the enterprise knowledge graph; and presenting the real-time outside sales insights to an outside sales agent.
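The flow above (speech, then text, then triples, then knowledge graph, then insights) can be sketched with a toy parser and an in-memory graph. This is purely illustrative: a real embodiment would use a speech recognizer, a semantic parser, and a semantic graph database, none of which are modeled here, and the "insight" shown is simply everything stored about a subject:

```python
import re
from collections import defaultdict

def parse_triples(text: str):
    # Toy semantic parse: treat each sentence's first word as subject,
    # second as predicate, and the remainder as object.
    triples = []
    for sentence in re.split(r"[.;]\s*", text):
        words = sentence.split()
        if len(words) >= 3:
            triples.append((words[0], words[1], " ".join(words[2:])))
    return triples

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(predicate, object)]

    def store(self, triples):
        for s, p, o in triples:
            self.edges[s].append((p, o))

    def insights(self, subject):
        # Stand-in "insight": all stored facts about the subject.
        return self.edges.get(subject, [])
```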
CONTACT AND ACOUSTIC MICROPHONES FOR VOICE WAKE AND VOICE PROCESSING FOR AR/VR APPLICATIONS
A method to combine contact and acoustic microphones in a headset for voice wake and voice processing in immersive reality applications is provided. The method includes receiving, from a contact microphone, a first acoustic signal, determining a fidelity and a quality of the first acoustic signal, receiving, from an acoustic microphone, a second acoustic signal, and when the fidelity and quality of the first acoustic signal exceed a pre-selected threshold, combining the first acoustic signal and the second acoustic signal to provide an enhanced acoustic signal to a smart glass user. A non-transitory, computer-readable medium storing instructions to cause a headset to perform the above method, and the headset, are also provided.
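The threshold logic can be illustrated with plain lists of samples. The fidelity metric below (mean absolute amplitude) is a stand-in, since the abstract does not define how fidelity and quality are measured, and the equal-weight average is just one hypothetical way of combining the two signals:

```python
def signal_quality(samples):
    # Hypothetical fidelity/quality metric: mean absolute amplitude,
    # used here as a crude proxy for signal strength.
    return sum(abs(s) for s in samples) / len(samples)

def combine_signals(contact, acoustic, threshold=0.1):
    # When the contact-microphone signal is good enough, blend both
    # signals sample by sample; otherwise fall back to the acoustic mic.
    if signal_quality(contact) > threshold:
        return [(c + a) / 2 for c, a in zip(contact, acoustic)]
    return list(acoustic)
```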
Query rephrasing using encoder neural network and decoder neural network
A method includes receiving first data representative of a query. A representation of the query is generated using an encoder neural network and the first data. Words for a rephrased version of the query are selected from a set of words comprising a first subset of words comprising words of the query and a second subset of words comprising words absent from the query. Second data representative of the rephrased version of the query is generated.
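The two-subset selection vocabulary can be shown without any neural network: the first subset is copied from the query itself, the second is drawn from a wider vocabulary and excludes query words. The encoder and decoder networks are omitted; this sketch (with hypothetical names) only illustrates how the candidate word set for the rephrased query is formed:

```python
def rephrase_vocabulary(query_words, base_vocab):
    # First subset: words of the query, deduplicated, order preserved.
    copy_subset = list(dict.fromkeys(query_words))
    # Second subset: vocabulary words absent from the query, from which a
    # decoder could introduce new wording into the rephrased query.
    query_set = set(query_words)
    generate_subset = [w for w in base_vocab if w not in query_set]
    return copy_subset, generate_subset
```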
Multimodal based punctuation and/or casing prediction
Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.
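The alignment and fusion steps can be sketched with scalar stand-ins for the embedding vectors. The equal-width average-pooling alignment below is hypothetical (the abstract does not say how frames are mapped to sub-words), and fusion is shown as simple pairing of lexical and acoustic features per sub-word:

```python
def align_frames_to_subwords(frame_embs, n_subwords):
    # Hypothetical alignment: average-pool the frame-level acoustic
    # embeddings into one embedding per sub-word (equal-width segments).
    # Scalars stand in for embedding vectors to keep the sketch short.
    per = max(1, len(frame_embs) // n_subwords)
    aligned = []
    for i in range(n_subwords):
        seg = frame_embs[i * per:(i + 1) * per] or [frame_embs[-1]]
        aligned.append(sum(seg) / len(seg))
    return aligned

def fuse(lexical, acoustic):
    # Multimodal fusion by pairing: each sub-word position carries its
    # (lexical feature, aligned acoustic feature) pair, from which a
    # classifier would predict punctuation and casing.
    return list(zip(lexical, acoustic))
```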
Receiving voice samples from listeners of media programs
Listeners to media programs provide feedback to creators or other entities associated with the media programs in the form of one or more spoken utterances. When a listener to a media program speaks one or more words to a microphone or other system, the words are captured and processed to determine an emotion of the listener, or to determine whether the words include any objectionable content. Data including the spoken words is stored and presented to the creator of the media program. Notifications of the utterances are provided to the creator, who may identify an utterance and include it in the media program, or invite the listener who provided it to participate in the media program.
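The screening step (emotion detection plus objectionable-content check) can be sketched with keyword matching. The blocklist and emotion lexicon below are invented placeholders; a real system would use trained classifiers rather than word lists:

```python
OBJECTIONABLE = {"spam", "slur"}  # hypothetical blocklist
EMOTION_WORDS = {"love": "positive", "great": "positive",
                 "hate": "negative", "awful": "negative"}

def screen_utterance(words):
    # Flag objectionable content and estimate a coarse emotion from
    # keyword counts before notifying the program's creator.
    flagged = any(w.lower() in OBJECTIONABLE for w in words)
    scores = {"positive": 0, "negative": 0}
    for w in words:
        label = EMOTION_WORDS.get(w.lower())
        if label:
            scores[label] += 1
    emotion = max(scores, key=scores.get) if any(scores.values()) else "neutral"
    return {"objectionable": flagged, "emotion": emotion}
```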