IPIQ

H04M2201/405

Accessory for a voice-controlled device

11823681 · 2023-11-21 ·

Amazon Technologies, Inc.

This disclosure describes techniques and systems for encoding instructions in audio data that, when output on a speaker of a first device in an environment, cause a second device to output content in the environment. In some instances, the audio data has a frequency that is inaudible to users in the environment. Thus, the first device is able to cause the second device to output the content without users in the environment hearing the instructions. In some instances, the first device also outputs content, and the content output by the second device is played at an offset relative to a position of the content output by the first device.
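As an illustration of how an instruction could ride on audio that listeners do not notice, the following is a minimal sketch using two-tone frequency-shift keying near the upper limit of human hearing. The tone frequencies, sample rate, symbol length, and the function names `encode_bits` and `decode_bits` are assumptions made for this example; the abstract does not specify an encoding scheme.

```python
import math

# All constants below are illustrative assumptions, not values from the abstract.
SAMPLE_RATE = 48000          # samples per second
F0, F1 = 19000.0, 20000.0    # tones for bit 0 and bit 1 (near-ultrasonic)
SYMBOL_SAMPLES = 480         # 10 ms per bit; both tones fit a whole number of cycles


def encode_bits(bits):
    """Return a list of samples in which each bit is a short tone burst."""
    samples = []
    for bit in bits:
        freq = F1 if bit else F0
        samples.extend(
            math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            for n in range(SYMBOL_SAMPLES)
        )
    return samples


def tone_power(chunk, freq):
    """Correlate a chunk against one frequency and return the squared magnitude."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(chunk))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(chunk))
    return re * re + im * im


def decode_bits(samples):
    """Recover bits by comparing the power of the two tones in each symbol."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        chunk = samples[start:start + SYMBOL_SAMPLES]
        bits.append(1 if tone_power(chunk, F1) > tone_power(chunk, F0) else 0)
    return bits
```

In this sketch the first device would play the output of `encode_bits` through its speaker and the second device would run `decode_bits` on its microphone samples; synchronization, noise, and the playback offset mentioned in the abstract are left out.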

AI avatar coaching system based on free speech emotion analysis for managing in place of CS managers

11825022 · 2023-11-21 ·

CS Sharing Inc.

Ji Eun Lim

Disclosed is an AI avatar coaching system based on a free speech emotion analysis for acting for CS managers. The AI avatar coaching system includes: an AI avatar coach server generating an AI avatar coach video for practical counseling training, and providing the generated AI avatar coach video; an educated/inexperienced counselor terminal receiving and outputting the AI avatar coach video provided from the AI avatar coach server; a purchase customer terminal performing a voice call for a counseling inquiry of a purchase customer; a counselor terminal performing the voice call for a counselor to perform counseling processing for the counseling inquiry of the purchase customer; and an omni channel customer/company consulting service server setting a voice call session for the voice call between the purchase customer terminal and the counselor terminal, and transmitting a report for the counseling inquiry and the counseling processing, in order to act for counseling services for multiple selling company customers. With this system, a counseling video of an experienced counselor is simulated as an avatar video and provided to educated/inexperienced counselors, so that they can learn counseling and response methods and be trained effectively through specific practical cases.

Document identification device, document identification method, and program

11462212 · 2022-10-04 ·

NIPPON TELEGRAPH AND TELEPHONE CORPORATION

A document identification device that improves class identification precision of multi-stream documents is provided. The document identification device includes: a primary stream expression generation unit that generates a primary stream expression, which is a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, for each speaker; a primary multi-stream expression generation unit that generates a primary multi-stream expression obtained by integrating the primary stream expression; a secondary stream expression generation unit that generates a secondary stream expression, which is a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression, for each speaker; and a secondary multi-stream expression generation unit that generates a secondary multi-stream expression obtained by integrating the secondary stream expression.
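The two-pass structure in this abstract can be sketched as follows. This is only an illustration under loud assumptions: a hashed bag-of-words stands in for the fixed-length vector generator, averaging stands in for the integration step, and the names `stream_vector`, `integrate`, `encode_document`, and `DIM` are hypothetical. The abstract does not say how the vectors are actually produced.

```python
import zlib

DIM = 8  # assumed fixed vector length


def stream_vector(words, context=None):
    """Map one speaker's word sequence to a fixed-length vector.

    A hashed bag-of-words is used here as a stand-in encoder. When a
    multi-stream context vector is supplied, it is added in so the result
    depends on both the speaker's words and the other streams.
    """
    vec = [0.0] * DIM
    for word in words:
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    if context is not None:
        vec = [v + c for v, c in zip(vec, context)]
    return vec


def integrate(vectors):
    """Combine per-speaker vectors into one multi-stream vector (mean)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def encode_document(streams):
    """Run the primary pass, then the secondary pass conditioned on it."""
    primary = [stream_vector(words) for words in streams]
    primary_multi = integrate(primary)
    secondary = [stream_vector(words, primary_multi) for words in streams]
    return integrate(secondary)
```

The returned vector would then be passed to a classifier to identify the document class; that step is omitted here.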

SYSTEMS AND METHODS FOR PRIORITIZING EMERGENCY CALLS

20220303391 · 2022-09-22 ·

Systems for and methods of determining the priority of a call interaction include receiving a call interaction from a call center; validating, by a validation and transcription engine, that the call interaction is authentic; converting, by the validation and transcription engine, the call interaction into text; extracting, by a data calculation engine, organization, location, and time information from the text; calculating, by the data calculation engine, a priority of the call interaction from the extracted information and the text by determining an importance of words in the text and correlating the words to a priority class using a pre-trained algorithm that is trained on emergency-type and emergency services-type language; determining that the call interaction should be transmitted to a queue of the call center for initial handling by a call center agent; and transmitting the call interaction, the calculated priority, and the extracted information to the call center.
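The priority calculation step can be pictured with a small sketch. The weight table below is invented for illustration and stands in for the pre-trained algorithm, which the abstract does not describe in detail; `CLASS_WEIGHTS`, `score_classes`, and `prioritize` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical word-importance weights per priority class. A real system
# would learn these from emergency-type language rather than hard-code them.
CLASS_WEIGHTS = {
    "high": {"fire": 3.0, "unconscious": 3.0, "bleeding": 2.5, "trapped": 2.5},
    "medium": {"accident": 1.5, "injured": 1.5, "smoke": 1.0},
    "low": {"noise": 0.5, "parking": 0.5, "lost": 0.5},
}


def score_classes(text):
    """Sum the weights of the words in the transcript for each class."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        cls: sum(weight * words[term] for term, weight in weights.items())
        for cls, weights in CLASS_WEIGHTS.items()
    }


def prioritize(text, default="low"):
    """Return the best-scoring class, or the default if no word matched."""
    scores = score_classes(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

For example, `prioritize("There is a fire and someone is trapped")` scores 5.5 for the `high` class and zero for the others, so the call would be queued as high priority.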

Security tool

11289080 · 2022-03-29 ·

Bank Of America Corporation

A memory stores a first voice record of a first user and a second voice record of a second user. A processor receives from a device of the first user a recording of a voice conversation between the first and second users and compares the recording with the first and second voice records to determine that the voice conversation is between the first and second users. The processor also determines that the first and second users intend to conduct a transaction with each other and determines a transaction amount for the transaction. The processor further communicates, to the device of the first user, a message, receives, from the device of the first user, a confirmation of the transaction in response to the message, and in response to the confirmation, initiates the transaction.

INTEGRATING AUTOMATIC SPEECH RECOGNITION AND COMMUNITY QUESTION ANSWERING

20210327412 · 2021-10-21 ·

Intuit Inc.

Systems and methods for providing customized automatic speech recognition (ASR) in a customer support system are disclosed. In an example method, one or more data sources for training an ASR language model associated with the customer support system are identified, and one or more weighting models are selected, each weighting model applying a corresponding weight to each data source of the one or more data sources. The ASR language model is then trained based at least in part on the one or more data sources and the one or more weighting models, and a transcript may be generated for one or more customer support calls of the customer support system using the trained ASR language model.
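A minimal sketch of weighting data sources during language model training follows. It substitutes a unigram model for the ASR language model, and the source names, weights, and the function `train_weighted_unigram` are assumptions for this example only.

```python
from collections import Counter


def train_weighted_unigram(sources, weights):
    """Build unigram probabilities where each source's counts are scaled.

    sources: dict mapping a source name to a list of sentences.
    weights: dict mapping a source name to its weight; sources without
             a weight contribute nothing.
    """
    counts = Counter()
    for name, sentences in sources.items():
        weight = weights.get(name, 0.0)
        for sentence in sentences:
            for token in sentence.lower().split():
                counts[token] += weight
    total = sum(counts.values())
    if not total:
        return {}
    return {token: count / total for token, count in counts.items()}
```

With `sources = {"forum": ["refund status"], "calls": ["refund please"]}` and `weights = {"forum": 3.0, "calls": 1.0}`, the word "refund" receives probability 0.5 and "status" 0.375, showing how a heavily weighted source shifts the model toward its vocabulary.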

Captioned Telephone Services Improvement

20210250441 · 2021-08-12 ·

Internet Protocol captioned telephone service often utilizing Automated Speech Recognition can be utilized with conference calls to separate out each of the various parties' speech as text, such as with text bubbles differentiated by caller on a device of the user. Additionally, a prioritized vocabulary can be provided for each user that is not shared with a public so that if the user utilizes words in their speech not common in the general public, those words can be more accurately identified by the telephone service. The service may learn and apply that vocabulary and/or the user may provide words to the service.

MULTI-MODAL CONVERSATIONAL AGENT PLATFORM

20210158811 · 2021-05-27 ·

A method includes receiving data characterizing an utterance of a query associated with a tenant; providing, to an automated speech recognition engine, the received data and a profile selected from a plurality of profiles based on the tenant, the profile configuring the automated speech recognition engine to process the received data; receiving, from the automated speech recognition engine, a text string characterizing the query; and processing, via an ensemble of natural language agents configured based on the tenant, the text string characterizing the query to determine a textual response to the query, the textual response including at least one word from a first lexicon associated with the tenant. Related systems, methods, apparatus, and computer readable mediums are also described.

Electronic apparatus for recognizing keyword included in your utterance to change to operating state and controlling method thereof

10978048 · 2021-04-13 ·

Samsung Electronics Co., Ltd.

An apparatus comprising one or more processors, a communication circuit, and a memory for storing instructions which, when executed, perform a method of recognizing a user utterance. The method comprises: receiving first data associated with a user utterance, performing a first determination to determine whether the user utterance includes the first data and a specified word, performing a second determination to determine whether the first data includes the specified word, transmitting the first data to an external server, receiving a text generated from the first data by the external server, performing a third determination to determine whether the received text matches the specified word, and determining whether to activate the voice-based input system based on the third determination.
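The staged determination can be sketched as below. The abstract does not say whether a failed local determination stops the flow, so the sequential gating here is an assumption, as are the callable parameters `local_detectors` and `transcribe` and the function name `should_activate`.

```python
def should_activate(first_data, specified_word, local_detectors, transcribe):
    """Decide whether to activate the voice-based input system.

    local_detectors: callables taking (data, word) and returning bool,
                     standing in for the first and second determinations.
    transcribe:      callable taking data and returning text, standing in
                     for the external server's speech-to-text result.
    """
    # Assumed gating: stop early if any on-device determination fails,
    # so the data is only sent out when the local checks pass.
    for detect in local_detectors:
        if not detect(first_data, specified_word):
            return False

    # Third determination: does the server's text contain the specified word?
    text = transcribe(first_data)
    return specified_word.lower() in text.lower().split()
```

A caller would supply two on-device keyword detectors of differing cost and a function that posts the audio to a recognition service; only the final text match decides activation in this sketch.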

Security Tool

20210110819 · 2021-04-15 ·
