Patent classifications
H04M2201/39
PRESENTATION OF PART OF TRANSCRIPT BASED ON DETECTION OF DEVICE NOT PRESENTING CORRESPONDING AUDIO
In one aspect, an apparatus may include at least one processor and storage accessible to the at least one processor. The storage may include instructions executable by the at least one processor to receive a transcription of audio from a first client device. The audio may be detected at the first client device and may be streamed from the first client device as part of a video conference. The instructions may also be executable to determine that a second client device is not presenting a first part of the audio. Based on the determination, the instructions may be executable to send a first part of the transcription to the second client device and/or to present the first part of the transcription at the second client device.
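The routing decision described above can be sketched as follows. This is a minimal illustration, assuming each participant exposes a flag indicating whether its device is currently presenting the audio; all names are hypothetical, not taken from the patent.

```python
# Hypothetical sketch: send a transcript segment only to participants whose
# devices are not presenting the corresponding audio (e.g., muted or with a
# failed audio output). The data layout is an assumption for illustration.

def route_transcript(segment, participants):
    """Return the IDs of participants that should receive the segment."""
    return [p["id"] for p in participants if not p["presenting_audio"]]

participants = [
    {"id": "device-A", "presenting_audio": True},
    {"id": "device-B", "presenting_audio": False},  # not playing the audio
]
recipients = route_transcript("Hello, everyone.", participants)
# Only device-B, which is not presenting the audio, receives the text.
```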
Selective performance of automated telephone calls to reduce latency and/or duration of assistant interaction
Implementations are directed to using an assistant to initiate automated telephone calls with entities. Some implementations identify an item of interest, identify a group of entities associated with the item, and initiate calls with those entities. During a given call with a given entity, the assistant can request a status update regarding the item and, based on information received responsive to the request, determine a temporal delay before initiating another call with the given entity to request a further status update. Other implementations receive a request to perform an action on behalf of a user, identify a group of entities that can perform the action, and initiate a given call with a given entity. During the given call, the assistant can initiate an additional call with an additional entity, and generate notification(s), for the user, based on result(s) of the given call and/or the additional call.
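The temporal-delay idea can be sketched as a mapping from the entity's last reported status to a wait time before the next call. The specific statuses and delays below are assumptions for illustration; the patent does not define them.

```python
# Illustrative sketch: choose how long to wait before the next automated
# status-update call based on the entity's last response. The mapping is
# hypothetical, not taken from the patent.

DELAY_BY_STATUS = {
    "in_stock": None,            # item available: no further call needed
    "arriving_today": 3600,      # check back in an hour (seconds)
    "arriving_this_week": 86400, # check back tomorrow
    "unknown": 14400,            # default: check back in four hours
}

def next_call_delay(status: str):
    """Return seconds to wait before re-calling, or None to stop calling."""
    return DELAY_BY_STATUS.get(status, DELAY_BY_STATUS["unknown"])
```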
Intelligent agent for interactive service environments
Techniques are described for providing information during a service session, using an intelligent agent. The intelligent agent executes as a process to monitor communications exchanged during a service session between an individual and a service representative (SR) within a service environment. The agent analyzes the communications to identify questions or other topics that are posed by the individual during the service session. The agent retrieves stored data related to such questions or other topics, and generates a message to address each question or other topic. The message is injected into the service session to be presented to the individual, to supplement the conversation that is taking place between the SR and the individual. In some implementations, the agent monitors the communications, generates the message, and/or injects the message into the service session at least partly autonomously of any explicit action taken by the SR.
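The monitoring-and-injection loop can be sketched as below. This assumes a simple keyword match against a stored FAQ; a real agent would use natural-language understanding, and all names here are illustrative.

```python
# Minimal sketch of the intelligent agent's loop: scan one participant
# utterance for known topics and return messages to inject into the session.
# The FAQ store and keyword matching are stand-ins for retrieval and NLU.

FAQ = {
    "routing number": "Your routing number is printed on your statement.",
    "wire transfer": "Wire transfers can be started from the Transfers tab.",
}

def inject_agent_messages(utterance: str):
    """Return agent messages addressing topics detected in the utterance."""
    text = utterance.lower()
    return [answer for topic, answer in FAQ.items() if topic in text]
```

Because the function acts only on the monitored text, it can run autonomously of any explicit action by the service representative, as the abstract describes.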
Indicating callers for incoming voice calls on a shared speech-enabled device
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for indicating callers for incoming voice calls to a device shared among multiple users. The methods, systems, and apparatus include actions of receiving an incoming voice call, determining a calling number and a called number from the incoming voice call, identifying a user account that corresponds to the called number, determining a contact name for the calling number based on contact entries for the user account, and providing the contact name as audible output through the device speaker.
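The lookup pipeline in the abstract (called number → user account → contact name for the calling number) can be sketched as below. The data layout is an assumption for illustration.

```python
# Hypothetical sketch of the caller-indication pipeline on a shared device:
# resolve the called number to a user account, then resolve the calling
# number against that account's contact entries.

ACCOUNTS_BY_NUMBER = {"+15551230000": "alice"}
CONTACTS = {"alice": {"+15559998888": "Bob Smith"}}

def caller_announcement(calling: str, called: str) -> str:
    """Build the announcement text to speak for an incoming call."""
    account = ACCOUNTS_BY_NUMBER.get(called)
    contacts = CONTACTS.get(account, {})
    name = contacts.get(calling, "an unknown caller")
    return f"Incoming call from {name}"
```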
GENERATION OF AUTOMATED MESSAGE RESPONSES
Systems, methods, and devices for computer-generating responses to communications, and sending those responses when the recipient of the communication is unavailable, are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message-originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message-originating individual.
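The first ordering described (respond immediately, collect the recipient's feedback afterwards) can be sketched as below. The canned reply stands in for the response generator, and all names are hypothetical.

```python
# Hedged sketch: generate and send a reply only when the recipient is
# unavailable, queueing the generated reply so the recipient can later give
# feedback on its correctness. The reply text is a stand-in generator.

def auto_respond(message: str, recipient_available: bool, review_queue: list):
    """Return the generated reply, or None if the recipient will respond."""
    if recipient_available:
        return None  # no automated response is generated
    reply = "I'm unavailable right now and will follow up soon."
    review_queue.append((message, reply))  # recipient reviews this later
    return reply
```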
Systems and methods for providing notifications within a media asset without breaking immersion
Systems and methods for providing notifications without breaking media immersion. A notification delivery application receives notification data while a media device provides a media asset. In response to receiving the notification data while the media device provides the media asset, the notification delivery application generates a voice model based on a voice detected in the media asset. The notification delivery application converts the notification data to synthesized speech using the voice model and causes the media device to output the synthesized speech at an appropriate point in the media asset, based on contextual features of the media asset.
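One plausible reading of "an appropriate point" is a gap between stretches of detected dialogue. The sketch below assumes that interpretation; the patent does not commit to a specific contextual feature.

```python
# Illustrative sketch: find the first silence between detected dialogue
# spans that is long enough to speak a notification without talking over
# the media asset. dialogue_spans are (start, end) times in seconds.

def pick_insertion_time(dialogue_spans, min_gap: float):
    """Return the start time of the first inter-dialogue gap of at least
    min_gap seconds, or None if no such gap exists."""
    for (_, end_a), (start_b, _) in zip(dialogue_spans, dialogue_spans[1:]):
        if start_b - end_a >= min_gap:
            return end_a
    return None
```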
INDICATING CALLERS FOR INCOMING VOICE CALLS ON A SHARED SPEECH-ENABLED DEVICE
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for indicating callers for incoming voice calls. The methods, systems, and apparatus include actions of receiving an incoming voice call, determining a calling number and a called number from the incoming voice call, identifying a user account that corresponds to the called number, determining a contact name for the calling number based on contact entries for the user account, and providing the contact name for output.
Systems and methods for filtering unwanted sounds from a conference call using voice synthesis
To filter unwanted sounds from a conference call, a first voice signal is captured by a first device during the call and converted into corresponding text, which is then analyzed to determine that a first portion of the text was spoken by a first user and a second portion was spoken by a second user. If the first user is relevant to the conference call while the second user is not, the first voice signal is prevented from being transmitted into the conference call; instead, the first portion of the text is converted into a second voice signal using a voice profile of the first user to synthesize the first user's voice, and that second voice signal is transmitted into the conference call. The second portion of the text is not converted into a voice signal, because the second user is determined not to be relevant.
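The filtering step can be sketched as below. Speaker attribution and voice synthesis are stubbed with a callable; a real system would use speaker diarization and a TTS engine conditioned on the relevant user's voice profile.

```python
# Sketch of the filter-and-resynthesize step: keep only text attributed to
# relevant speakers and hand it to a synthesizer. The synthesizer here is a
# stand-in callable; all names are illustrative.

def filter_and_synthesize(segments, relevant_speakers, synthesize):
    """segments: list of (speaker, text) pairs from the transcribed signal.
    Return synthesized audio chunks for relevant speakers only."""
    return [synthesize(speaker, text)
            for speaker, text in segments
            if speaker in relevant_speakers]

segments = [("alice", "The report is ready."), ("tv", "Buy now!")]
audio = filter_and_synthesize(segments, {"alice"},
                              lambda s, t: f"<{s} voice>{t}")
# Only alice's speech is re-synthesized; the TV audio is dropped entirely.
```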
VOICE DIALOGUE PROCESSING METHOD AND APPARATUS
The present application discloses a voice dialogue processing method and apparatus. The voice dialogue processing method includes: determining voice semantics corresponding to a user voice to be processed; determining a reply sentence for the voice semantics using a dialogue management engine whose training sample set is constructed from a dialogue business customization file comprising at least one dialogue flow, where each dialogue flow includes a plurality of dialogue nodes in a set order; and generating a customer service voice replying to the user voice according to the determined reply sentence.
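A dialogue flow of ordered nodes, each mapping recognized semantics to a reply sentence, can be sketched as below. The node schema and intents are assumptions for illustration; the customization-file format is not specified in the abstract.

```python
# Illustrative sketch: a dialogue flow as an ordered list of nodes, each
# pairing an intent (the determined voice semantics) with a reply sentence.
# The schema is hypothetical, not taken from the patent.

FLOW = [
    {"intent": "greeting", "reply": "Hello, how can I help you?"},
    {"intent": "order_status", "reply": "Let me look up your order."},
    {"intent": "goodbye", "reply": "Thank you for calling."},
]

def reply_for(intent: str) -> str:
    """Return the reply sentence for the first matching dialogue node."""
    for node in FLOW:
        if node["intent"] == intent:
            return node["reply"]
    return "Sorry, I didn't understand."
```

The returned sentence would then be passed to a TTS stage to produce the customer service voice described in the abstract.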