G10L15/083

Information processing device and information processing method

Provided is an information processing device that includes a determination unit that determines whether an object that outputs voice is a dialogue target related to voice dialogue based on a result of recognition of an input image, and a dialogue function unit that performs control related to the voice dialogue based on the determination. The dialogue function unit provides a voice dialogue function to the object based on the determination that the object is the dialogue target. Further provided is a method that includes determining whether an object that outputs voice is a dialogue target related to voice dialogue based on a result of recognition of an input image, and performing control related to the voice dialogue based on a result of the determining. The performing of the control further includes providing a voice dialogue function to the object based on the determination that the object is the dialogue target.
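The gating behavior described above can be sketched as a small decision function. This is a minimal illustration, not the disclosed implementation; the class, field names, and the rule that only a "person" label qualifies as a dialogue target are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class RecognizedObject:
    label: str           # image-recognition result, e.g. "person" or "television"
    outputs_voice: bool  # whether the detected voice is attributed to this object

# Illustrative assumption: only a detected person counts as a dialogue target.
DIALOGUE_TARGET_LABELS = {"person"}

def is_dialogue_target(obj: RecognizedObject) -> bool:
    """Determination unit: decide from the image-recognition result whether
    the voice-outputting object is a dialogue target."""
    return obj.outputs_voice and obj.label in DIALOGUE_TARGET_LABELS

def handle_voice(obj: RecognizedObject) -> str:
    """Dialogue function unit: provide the voice dialogue function only to
    objects determined to be dialogue targets."""
    return "provide_dialogue" if is_dialogue_target(obj) else "suppress_dialogue"
```

For example, voice attributed to a detected person would receive the dialogue function, while voice coming from a detected television would be suppressed.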

AUTOMATED ASSISTANT THAT DETECTS AND SUPPLEMENTS VARIOUS VEHICLE COMPUTING DEVICE CAPABILITIES

Implementations set forth herein relate to interactions, between vehicle computing devices and mobile computing devices, that prevent duplicative processes from occurring at either device. Reduction of such processes can be performed, in some instances, via communications between a vehicle computing device and a mobile computing device in order to determine, for example, how to uniquely render content at an interface of each respective computing device while the user is driving the vehicle. These communications can occur before a user has entered a vehicle, while the user is in the vehicle, and/or after a user has left the vehicle. For instance, just before a user enters a vehicle, a vehicle computing device can be primed for certain automated assistant interactions between the user and their mobile computing device. Alternatively, or additionally, the user can authorize the vehicle computing device to perform certain processes immediately after leaving the vehicle.

Edge Appliance to Provide Conversational Artificial Intelligence Based Software Agents

In some aspects, an edge appliance is placed in an active mode and causes a software agent that is based on a machine learning algorithm to engage in a conversation to take an order from a customer that is located at an order post. The edge appliance provides, using a communication interface, audio data that includes the conversation to a communications system of a restaurant. The edge appliance provides, using the communication interface, the contents of a cart associated with the order to a point-of-sale terminal of the restaurant. If the edge appliance determines, using the communication interface, that a microphone of the communications system is receiving audio input from an employee, the edge appliance automatically transitions from the active mode to an override mode, enabling the employee to receive a remainder of the order from the customer.
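The active-to-override transition amounts to a small state machine. The sketch below is an illustrative assumption about how such a transition could be modeled; the class and method names are not from the disclosure.

```python
from enum import Enum

class Mode(Enum):
    ACTIVE = "active"      # the software agent is taking the order
    OVERRIDE = "override"  # a human employee has taken over the order

class EdgeAppliance:
    """Minimal sketch of the described mode transition (names illustrative)."""

    def __init__(self) -> None:
        self.mode = Mode.ACTIVE

    def on_microphone_update(self, employee_mic_active: bool) -> Mode:
        # When the restaurant-side microphone receives employee speech,
        # automatically hand the remainder of the order to the employee.
        if self.mode is Mode.ACTIVE and employee_mic_active:
            self.mode = Mode.OVERRIDE
        return self.mode
```

Note the transition is one-way here: once the employee has taken over, the agent does not automatically resume, matching the abstract's description of the employee receiving the remainder of the order.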

Audio generation system and method

A system for generating audio content in dependence upon an input audio track comprising audio corresponding to one or more sound sources, the system comprising an audio input unit operable to input the input audio track to one or more models, each representing one or more of the sound sources, and an audio generation unit operable to generate, using the one or more models, one or more audio tracks each comprising a representation of the audio contribution of the corresponding sound sources of the input audio track, wherein the generated audio tracks comprise one or more variations relative to the corresponding portion of the input audio track.

Methods and apparatus to determine audio source impact on an audience of media

Methods, apparatus, systems and articles of manufacture to determine audio source impact on an audience of media are disclosed. A disclosed example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: divide audio of monitored media into successive audio segments; perform speaker identification on the audio segments; generate confidence values for speakers identified in the audio segments; generate speaker identification data for ones of the speakers having respective confidence values that satisfy a threshold; and analyze the speaker identification data to determine a speaker impact on audience ratings data.
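The confidence-threshold filtering step can be illustrated with a short function. The segment representation, the default threshold value, and the output shape are assumptions for the example, not details from the disclosure.

```python
def filter_speaker_ids(segments, threshold=0.8):
    """Generate speaker identification data only for speakers whose
    identification confidence satisfies the threshold.

    segments: list of (speaker, confidence) pairs, one per successive
    audio segment of the monitored media (illustrative representation).
    """
    return [
        {"segment": i, "speaker": speaker}
        for i, (speaker, confidence) in enumerate(segments)
        if confidence >= threshold
    ]
```

The retained speaker identification data could then be joined with audience ratings data per segment to estimate each speaker's impact.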

BONE CONDUCTION TRANSDUCERS FOR PRIVACY

A method for routing audio content through an electronic device that is to be worn by a user. The method obtains a communication and determines whether the communication is private. In response to determining that the communication is private, the method drives a bone conduction transducer of the electronic device with an audio signal associated with the communication. In response to determining that the communication is not private, however, the method drives a speaker of the electronic device with the audio signal.
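The routing decision reduces to a two-way branch on the privacy determination. In this sketch the `private` flag stands in for however the device actually determines privacy, and the output names are illustrative.

```python
def route_audio(communication: dict) -> str:
    """Route the communication's audio signal to the bone conduction
    transducer when the communication is private, otherwise to the
    loudspeaker of the worn device."""
    if communication.get("private", False):
        # Private content is conducted through the wearer's bone structure,
        # so bystanders cannot overhear it.
        return "bone_conduction_transducer"
    return "speaker"
```

The design point is that bone conduction delivers audio only to the wearer, so privacy-sensitive content never reaches open air.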

MONITORING OF VIDEO CONFERENCE TO IDENTIFY PARTICIPANT LABELS

In one aspect, a device may include at least one processor and storage accessible to the at least one processor. The storage may include instructions executable by the at least one processor to monitor participation in a video conference and, based on the monitoring of participation, identify an identifier related to a first participant of the video conference. The instructions may also be executable to, based on the identifying of the identifier, present a graphical representation of the identifier on a display along with video showing the first participant participating in the video conference.

Generating customized meeting insights based on user interactions and meeting media
11689379 · 2023-06-27

Methods, systems, and non-transitory computer readable storage media are disclosed for generating meeting insights based on media data and device input data. For example, in one or more embodiments, the disclosed system analyzes media data including audio data or video data and inputs to client devices associated with a meeting to determine a portion of the meeting (e.g., a portion of the media data) that is relevant for a user. In response to determining a relevant portion of the meeting, the system generates an electronic message including content related to the relevant portion of the meeting. The system then provides the electronic message to a client device of the user. For instance, in one or more embodiments, the system generates a meeting summary, meeting highlights, or action items related to the media data to provide to the client device of the user. In one or more embodiments, the system also uses the summary, highlights, or action items to train a machine-learning model for use with future meetings.
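The relevance-then-message pipeline can be sketched with simple keyword matching standing in for the learned relevance model. All names, the segment format, and the matching rule are assumptions for illustration only.

```python
def relevant_portions(transcript_segments, user_keywords):
    """Return the portions of the meeting media deemed relevant to the
    user; keyword matching here is a stand-in for the learned model."""
    return [
        segment for segment in transcript_segments
        if any(kw.lower() in segment["text"].lower() for kw in user_keywords)
    ]

def build_message(user, segments):
    """Generate the electronic message (e.g. highlights or action items)
    from the relevant portions, addressed to the user's client device."""
    highlights = "; ".join(segment["text"] for segment in segments)
    return {"to": user, "highlights": highlights}
```

In a fuller system, the generated messages and the user's reactions to them would also feed back as training data for future meetings, as the abstract describes.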

System, method, and computer-readable medium that facilitate voice biometrics user authentication

A system, method, and computer-readable medium that facilitate user authentication via voice biometrics in a network system featuring interactive voice response system access are provided. The voice biometric authentication mechanisms alleviate identity theft occurring via specific interactive voice response transactions. A voice biometrics authentication system interfaces with an interactive network platform and may be hosted by a third party provider of voice biometric technologies.

Agent assisting system for processing customer enquiries in a contact center

A system is disclosed that assists contact center agents with servicing customer enquiries. A wireless caller with an enquiry calls a contact center and is prompted to leave a voice message and accept a text callback as a response. The voice message is processed by a speech analytics system that extracts certain keywords from the voice message and develops a transcript as well. Upon selecting an available agent to provide the response, the keywords and transcript are presented to the agent along with a draft text response, formulated by the system using the identified keywords. Additional resources may be provided as necessary to the agent, who can also review the original audio recording. Upon reviewing and potentially editing the text response, the agent causes the text response to be sent to the wireless caller, as an SMS text or in some other form.
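The keyword-to-draft step can be illustrated with a template lookup. The template table, the first-match rule, and the fallback wording are assumptions for the example, not the disclosed formulation method.

```python
def draft_response(keywords, templates):
    """Formulate a draft text reply for the agent from the keywords
    extracted by the speech analytics system. The agent reviews and may
    edit this draft before it is sent to the wireless caller."""
    for keyword in keywords:
        if keyword in templates:
            # First matching keyword selects the draft (illustrative rule).
            return templates[keyword]
    # Generic fallback when no keyword maps to a prepared template.
    return "Thank you for your message; an agent will follow up shortly."
```

The draft is only a starting point: in the disclosed flow the agent reviews it, optionally edits it, and then triggers the send to the caller.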