G10L2013/083

Question and answer processing method and electronic device for supporting the same

An electronic device is provided. The electronic device includes a communication interface comprising communication circuitry and a processor configured to functionally connect with the communication interface, wherein the processor is configured to: obtain a voice signal, obtain context information associated with a user in connection with obtaining the voice signal, determine first response information corresponding to the voice signal if the context information meets a first condition, determine second response information corresponding to the voice signal if the context information meets a second condition, and send at least part of response information corresponding to the first response information or the second response information to an output device operatively connected with the electronic device or to an external electronic device.
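The condition-based selection described above can be sketched as a simple dispatch. The `location` key, its values, and the response strings below are illustrative assumptions, not the patented logic:

```python
# Hypothetical sketch: choose between first and second response
# information depending on which condition the user's context meets.

def select_response(context: dict) -> str:
    """Return response information for a voice signal given user context."""
    if context.get("location") == "home":      # assumed first condition
        return "first response information"
    if context.get("location") == "driving":   # assumed second condition
        return "second response information"
    return "default response"
```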

Computerized data-aware agent systems for retrieving data to serve a dialog between human user and computerized system

A system and method for data gathering, comprising: a data-aware knowledge base storing knowledge on the relative costs of obtaining various data items; and a data retrieval decision-making processor operative, when an individual data element is sought to be retrieved, to determine whether or not to retrieve the data element by comparing at least one parameter representing the need for the data element, also termed herein a utility value, with at least one parameter, retrieved from the data-aware knowledge base, which represents the relative cost of obtaining the data element.
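The retrieval decision reduces to comparing a utility value against a cost looked up in the data-aware knowledge base. A minimal sketch, with invented item names and cost values:

```python
# Data-aware knowledge base: data item -> relative cost of obtaining it.
# Item names and cost values are illustrative assumptions.
knowledge_base = {"credit_score": 5.0, "zip_code": 0.5}

def should_retrieve(utility: float, cost: float) -> bool:
    """Retrieve only when the need (utility) outweighs the relative cost."""
    return utility > cost

def decide(item: str, utility: float) -> bool:
    """Look up the item's cost in the knowledge base and decide."""
    return should_retrieve(utility, knowledge_base[item])
```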

REAL-TIME SYSTEM FOR SPOKEN NATURAL STYLISTIC CONVERSATIONS WITH LARGE LANGUAGE MODELS

The techniques disclosed herein enable systems for spoken natural stylistic conversations with large language models. In contrast to many existing modalities for interacting with large language models that are limited to text, the techniques presented herein enable users to carry on a fully spoken conversation with a large language model. This is accomplished by converting a user speech audio input to text and utilizing a prompt engine to analyze a sentiment expressed by the user. A large language model, having been trained on example conversations, generates a text response as well as a style cue to express emotion in response to the sentiment expressed by the speech audio input. A text-to-speech engine can subsequently interpret the text response and style cue to generate an audio output which emulates the sensation of human conversation.
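The prompt-engine stage can be sketched as follows; the sentiment heuristic and prompt format are illustrative assumptions, not the patented implementation:

```python
def analyze_sentiment(text: str) -> str:
    """Toy sentiment heuristic; a real system would use a model."""
    return "excited" if text.endswith("!") else "neutral"

def build_prompt(user_text: str) -> str:
    """Fold the detected sentiment into the prompt so the model can
    return both a text response and a style cue."""
    sentiment = analyze_sentiment(user_text)
    return (
        f"[user sentiment: {sentiment}]\n"
        f"User: {user_text}\n"
        "Assistant (reply with text and a style cue):"
    )
```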

SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH

Systems and methods are disclosed for providing non-lexical cues in synthesized speech. Original text is analyzed to determine characteristics of the text and/or to derive or augment an intent (e.g., an intent code). Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech, including converting the non-lexical cues to speech output.
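A minimal sketch of the insertion step, assuming a toy intent-to-cue table and sentence boundaries as insertion points (both invented):

```python
import re

# Toy mapping from intent to a leading non-lexical cue (assumed values).
CUES = {"hesitation": "um, ", "agreement": "mhm, "}

def insert_cues(text: str, intent: str) -> str:
    """Insert non-lexical cues: a leading cue chosen by intent, plus a
    breath marker at sentence boundaries, yielding augmented text."""
    augmented = CUES.get(intent, "") + text
    return re.sub(r"\. ", ". <breath> ", augmented)
```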

TEXT NORMALIZATION BASED ON A DATA-DRIVEN LEARNING NETWORK
20180330729 · 2018-11-15

Systems and processes for operating an intelligent automated assistant to perform text-to-speech conversion are provided. An example method includes, at an electronic device having one or more processors, receiving a text corpus comprising unstructured natural language text. The method further includes generating a sequence of normalized text based on the received text corpus; and generating a pronunciation sequence representing the sequence of normalized text. The method further includes causing an audio output to be provided to a user based on the pronunciation sequence. At least one of the sequence of normalized text and the pronunciation sequence is generated based on a data-driven learning network.
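As a rule-based stand-in for the data-driven learning network, the normalization step can be sketched like this; the abbreviation table and digit words are invented:

```python
import re

# Toy normalization tables; a data-driven network would learn these.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
DIGIT_WORDS = {"2": "two", "3": "three"}

def normalize(text: str) -> str:
    """Produce a sequence of normalized text from unstructured input."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\b([23])\b", lambda m: DIGIT_WORDS[m.group(1)], text)
```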

Transliteration work support device, transliteration work support method, and computer program product

According to an embodiment, a transliteration work support apparatus includes an input unit, an extraction unit, a presentation unit, a reception unit, and a correction unit. The input unit receives document information. The extraction unit extracts, as a correction part, a surface expression of the document information that matches a correction pattern, which expresses in one form a plurality of surface expressions sharing the same regularity in the way of correction. The presentation unit presents a way of correction defined in accordance with the correction pattern used in the extraction of the correction part. The reception unit receives selection of the way of correction. The correction unit corrects the correction part based on the selected way of correction.
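The extraction/correction flow can be sketched with a regular expression standing in for a correction pattern; the pattern choice and the way of correction below are illustrative:

```python
import re

# A correction pattern expressing many surface expressions in one form:
# here, any run of Arabic numerals (illustrative choice).
PATTERN = re.compile(r"\d+")

def extract_correction_parts(document: str):
    """Extract matching surface expressions with their positions."""
    return [(m.group(), m.start()) for m in PATTERN.finditer(document)]

def apply_correction(document: str, way_of_correction) -> str:
    """Correct every extracted part using the selected way of correction."""
    return PATTERN.sub(lambda m: way_of_correction(m.group()), document)
```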

INTERNET-ENABLED AUDIO-VISUAL GRAPHING CALCULATOR

A method of graphically representing mathematical expressions in both audio and visual formats on a user device is described. Embodiments of the present invention include an Internet-enabled audio-visual graphing calculator that receives input from a user device in at least one of audio, visual, or Braille formats. An embodiment of the present invention interprets input received from the user device as a typeset mathematical expression, parses the typeset mathematical expression into an interpreted mathematical expression, and compiles the interpreted mathematical expression into an evaluation function. At least one point is sampled on the evaluation function. The sampled evaluation function is rendered as a graph on a visual display of the user device. In an embodiment of the invention, an audible representation of the rendered graph is generated for playback on the user device.
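The compile-and-sample steps can be sketched as follows; Python's built-in `compile`/`eval` stand in for the patent's typeset-math parser, and the expression syntax is an assumption:

```python
def compile_expression(expr: str):
    """Compile an expression in x into an evaluation function.
    Toy sketch: eval over a restricted namespace; a real calculator
    would use a proper typeset-math parser."""
    code = compile(expr, "<expr>", "eval")
    return lambda x: eval(code, {"__builtins__": {}}, {"x": x})

def sample(evaluate, points):
    """Sample the evaluation function at the given points, yielding
    (x, y) pairs ready to render as a graph."""
    return [(x, evaluate(x)) for x in points]
```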

Systems and methods for providing non-lexical cues in synthesized speech

Systems and methods may provide non-lexical cues in synthesized speech. A system may generate response text and a response intent based on user input. Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech using speech units associated with the response text and the inserted response intent.

Methods and systems for language learning based on a series of pitch patterns
10019995 · 2018-07-10

A method for teaching a language, comprising: accessing, using a processor of a computer, an audio recording corresponding to a series of pitch patterns; accessing a cantillation representation of said series of pitch patterns, said cantillation representation comprising a plurality of cantillations; processing said audio recording to match the pitch patterns to the cantillations in said cantillation representation; calculating, using said processor, a start time and an end time for each of the cantillations as compared to said audio recording; outputting, using said processor, an aligned output representation comprising an identification of each of the cantillations, an identification of the start time for each of the cantillations, and an identification of the end time for each of the cantillations; receiving a request to play a requested pitch pattern; looking up said requested pitch pattern in said aligned output representation to retrieve one or more requested start times and one or more requested end times for said requested pitch pattern; and outputting said requested pitch pattern, said outputting comprising: playing said audio recording from the one or more requested start times until the one or more requested end times to output one or more instances of said requested pitch pattern from said audio recording, and displaying a textual representation of said audio recording, said displaying comprising: visually distinguishing a word of the textual representation corresponding to the audio recording, said visually distinguishing being performed based at least in part on said aligned output representation.
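The aligned output representation and the lookup step can be sketched with an invented alignment table; the cantillation names and times below are illustrative:

```python
# Aligned output: cantillation -> list of (start, end) times in seconds
# within the recording. Names and times are invented for illustration.
aligned_output = {
    "etnachta": [(0.0, 1.2), (5.4, 6.1)],
    "sof_pasuk": [(2.0, 3.3)],
}

def lookup_pattern(pattern: str):
    """Retrieve the requested start/end segments for a pitch pattern;
    playback would then play the recording over each segment."""
    return aligned_output.get(pattern, [])
```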

USER INTERFACE FOR GENERATING EXPRESSIVE CONTENT

Generation of expressive content is provided. An expressive synthesized speech system provides improved voice authoring user interfaces by which a user is enabled to efficiently author content for generating expressive output. An expressive synthesized speech system provides an expressive keyboard for enabling input of textual content and for selecting expressive operators, such as emoji objects or punctuation objects, for applying predetermined prosody attributes or visual effects to the textual content. A voicesetting editor mode enables the user to author and adjust particular prosody attributes associated with the content for composing carefully-crafted synthetic speech. An active listening mode (ALM) is also provided; when it is selected, a set of ALM effect options is displayed, each option associated with a particular sound effect and/or visual effect. The user is thereby enabled to rapidly respond with expressive vocal sound effects or visual effects while listening to others speak.
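The expressive-operator idea can be sketched as a mapping from trailing emoji or punctuation to prosody attributes; the operator set and attribute names are invented:

```python
# Expressive operators: trailing symbols -> prosody attributes that a
# TTS engine could apply. Symbols and attribute values are illustrative.
OPERATORS = {
    "\U0001F600": {"pitch": "+10%", "rate": "fast"},  # grinning face
    "!": {"volume": "loud"},
}

def voiceset(text: str):
    """Split trailing expressive operators off the text and return the
    plain content plus the merged prosody attributes."""
    attrs = {}
    while text and text[-1] in OPERATORS:
        attrs.update(OPERATORS[text[-1]])
        text = text[:-1]
    return text.strip(), attrs
```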