H04M3/42204

Conversation assistant

Usage data associated with a user of a telephonic device is accessed by a remote learning engine. A service or a product that is likely to be of interest to the user is identified by the remote learning engine based on the accessed usage data. A recommended voice bundle application for the user is determined by the remote learning engine based on the accessed usage data, the recommended voice bundle application being a voice application that, when executed by the telephonic device, results in a simulated multi-step spoken conversation between the telephonic device and the user to enable the user to receive the identified service or the identified product. A recommendation associated with the recommended voice bundle application is transmitted from the remote learning engine to the telephonic device. The recommendation is presented by the telephonic device to the user through voice communications. Whether the user has accepted the recommendation through voice communications is determined by the telephonic device. In response to determining that the user has accepted the recommendation, the recommended voice bundle application on the telephonic device is executed by the telephonic device.

Voice enablement and disablement of speech processing functionality

Methods and devices for enabling and disabling applications using voice are described herein. In some embodiments, an individual may speak an utterance to their electronic device, which may send audio data representing the utterance to a backend system. The backend system may generate text data representing the utterance, and may determine that an intent of the utterance was for an application to be enabled or disabled for their user account on the backend system. If, for instance, the intent was to enable the application, the backend system may receive one or more rules for performing functionalities of the application, as well as one or more sample templates of sample utterances and sample responses that future utterances may use when requesting the application. Furthermore, one or more invocation phrases that may be used within the future utterances to invoke the application may be received, along with slot values for the sample templates.

REDUCING TELEPHONE NETWORK TRAFFIC THROUGH UTILIZATION OF PRE-CALL INFORMATION
20240205331 · 2024-06-20 ·

Implementations receive, via a client device, user input to initiate a telephone call with an entity, and, in response to receiving the user input to initiate the telephone call with the entity and prior to initiating the telephone call with the entity: obtain pre-call information that is stored in association with the entity, and cause the pre-call information that is stored in association with the entity to be provided for presentation to the user via the client device. The pre-call information may include any information that would be provided for presentation to a user subsequent to initiation of the telephone call with the entity. Further, implementations determine, based on user consumption of the pre-call information, whether to (1) proceed with initiating the telephone call with the entity, or (2) refrain from initiating the telephone call with the entity, and cause the client device to implement the appropriate action.
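The decision flow in this abstract can be sketched roughly as follows. This is a minimal illustration only: the `PRE_CALL_INFO` store, the `CallDecision` type, and the `user_still_wants_call` callback (standing in for "user consumption of the pre-call information") are hypothetical names, not taken from the publication.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical store of pre-call information keyed by entity name.
PRE_CALL_INFO = {
    "Example Pharmacy": "Open 9am-6pm. Refills can be requested online.",
}


@dataclass
class CallDecision:
    entity: str
    pre_call_info: Optional[str]
    place_call: bool


def handle_call_request(
    entity: str, user_still_wants_call: Callable[[str], bool]
) -> CallDecision:
    """Present stored pre-call information before the call is initiated."""
    info = PRE_CALL_INFO.get(entity)
    if info is None:
        # Nothing stored for this entity: proceed with the call as usual.
        return CallDecision(entity, None, True)
    # The callback models how the user reacted to the presented information.
    proceed = user_still_wants_call(info)
    return CallDecision(entity, info, proceed)
```

If the callback reports that the information answered the user's question, the call is never placed, which is where the reduction in network traffic would come from.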

REUSABLE MULTIMODAL APPLICATION
20190124148 · 2019-04-25 ·

A method and system are disclosed herein for accepting multimodal inputs and deriving synchronized and processed information. A reusable multimodal application is provided on the mobile device. A user transmits a multimodal command to the multimodal platform via the mobile network. The one or more modes of communication that are inputted are transmitted to the multimodal platform(s) via the mobile network(s) and thereafter synchronized and processed at the multimodal platform. The synchronized and processed information is transmitted to the multimodal application. If required, the user verifies and appropriately modifies the synchronized and processed information. The verified and modified information is transferred from the multimodal application to the visual application. The final result(s) are derived by inputting the verified and modified results into the visual application.

VOICE RECOGNITION-BASED DIALING
20190116260 · 2019-04-18 ·

A voice recognition-based dialing method and a voice recognition-based dialing system are provided. The method includes: determining a recognition result based on a user's voice input, at least one acoustic model and at least one language model, where the at least one acoustic model and the at least one language model are obtained based on information collected in an electronic device. The system is configured to: obtain at least one acoustic model and at least one language model based on information collected in an electronic device; and determine a recognition result based on a user's voice input, the at least one acoustic model and the at least one language model. The acoustic models and the language models are updated based on the information collected in the electronic device, which may be helpful to the voice recognition-based dialing.

Pictures using voice commands
10257401 · 2019-04-09 ·

A system and method are disclosed for enabling user friendly interaction with a camera system. Specifically, the inventive system and method have several aspects to improve the interaction with a camera system, including voice recognition, gaze tracking, touch sensitive inputs and others. The voice recognition unit is operable for, among other things, receiving multiple different voice commands, recognizing the voice commands, associating the different voice commands with one camera command and controlling at least some aspect of the digital camera operation in response to these voice commands. The gaze tracking unit is operable for, among other things, determining the location on the viewfinder image that the user is gazing upon. One aspect of the touch sensitive inputs provides that the touch sensitive pad is mouse-like and is operable for, among other things, receiving user touch inputs to control at least some aspect of the camera operation. Another aspect of the disclosed invention provides for gesture recognition to be used to interface with and control the camera system.
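The association of several different voice commands with one camera command could look something like the sketch below. The phrases, the command names, and the `resolve_command` helper are all illustrative assumptions; the abstract does not specify them, and speech recognition itself is assumed to have already produced text.

```python
from typing import Optional

# Hypothetical alias table: several spoken phrases map to one camera command.
COMMAND_ALIASES = {
    "cheese": "capture",
    "shoot": "capture",
    "take picture": "capture",
    "zoom in": "zoom_in",
    "closer": "zoom_in",
}


def resolve_command(recognized_text: str) -> Optional[str]:
    """Return the camera command for a recognized phrase, or None if unknown."""
    return COMMAND_ALIASES.get(recognized_text.strip().lower())
```

A many-to-one table keeps the camera control code unaware of how many phrasings exist for each action.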

Virtual telephony assistant

Examples are disclosed for placing an outbound telephony call using a smart speaker as a proxy device for a telephone account. A smart speaker may receive a verbal command to initiate the telephone call that includes identifying information for the called party. The verbal command may be forwarded to a smart speaker server where it may be converted to a computer instruction to initiate the telephone call. The computer instruction may then be forwarded to a communications server. The communications server may determine the telephone number to call based on the identifying information for the called party. The communications server may then establish a first communication link between itself and a telephony endpoint of the called party and a second communication link between itself and the smart speaker device. The communication links may then be bridged into a communications session between the smart speaker device and the telephony endpoint of the called party.
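As a rough sketch of the bridging step, the communications server can be modeled as resolving the called party to a number, creating two links that both terminate on itself, and joining them into one session. The `DIRECTORY` lookup, the `Link` and `Session` types, and the `place_call` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical contact directory held by the communications server.
DIRECTORY = {"mom": "+15550123"}


@dataclass(frozen=True)
class Link:
    server: str
    endpoint: str


@dataclass(frozen=True)
class Session:
    legs: Tuple[Link, Link]


def place_call(speaker_id: str, called_party: str, server: str = "comm-server") -> Session:
    """Resolve the called party, create both links, and bridge them."""
    number = DIRECTORY.get(called_party.strip().lower())
    if number is None:
        raise LookupError(f"no telephone number for {called_party!r}")
    first = Link(server, number)       # server <-> called party's endpoint
    second = Link(server, speaker_id)  # server <-> smart speaker
    return Session((first, second))    # bridged communications session
```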

Conversation assistant

A graphical user interface (GUI) on a display of an electronic device visually presents to a user a group of voice bundles that are available for use on the electronic device. Each voice bundle includes a software application for performing a call flow that includes a sequence of prompt instructions and grammar instructions executable to result in a simulated multi-step spoken interaction between the electronic device and the user. An input is received from the user entered through the GUI indicating a selection of a voice bundle from the group of voice bundles. In response to the input, a remote server is identified that stores the selected voice bundle. Network communications are established between the electronic device and the remote server. The selected voice bundle is located on the remote server. A copy of the selected voice bundle is downloaded from the remote server onto the electronic device.

AUTOMATED MESSAGING

Approaches provide for generating an introductory text message to be delivered to a recipient when a voice-enabled communications device is used to send a message to the recipient for a first time. For example, audio input data that includes an instruction to send a text message can be received and an application can analyze the audio input data to determine an instruction to send a text message, a message body, and an intended recipient of the text message. The application can determine whether a text message has previously been sent to the intended recipient using the voice-enabled communications device or another device associated with the customer's account. In the situation where a text message has been sent, a text message is generated that includes the message body and the application causes the text message to be sent to the intended recipient. In the situation where it is determined that this is the first time a text message is being sent to the intended recipient using the voice-enabled communications device or another device associated with the customer's account, an introductory text message is generated and the application causes the introductory text message and the text message that includes the message body to be sent to the intended recipient.
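The first-contact branch described here can be sketched as below, assuming the instruction, message body, and recipient have already been extracted from the audio input. The `MessagingApp` class, its `contacted` record, and the wording of the introductory template are hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class MessagingApp:
    # Hypothetical per-account record of recipients already messaged
    # from any device associated with the account.
    contacted: Set[str] = field(default_factory=set)
    intro_template: str = (
        "Hi, this message was sent on behalf of {sender} "
        "using a voice-enabled device."
    )

    def build_messages(self, sender: str, recipient: str, body: str) -> List[str]:
        """Return the messages to send, prepending an introduction on first contact."""
        messages: List[str] = []
        if recipient not in self.contacted:
            messages.append(self.intro_template.format(sender=sender))
            self.contacted.add(recipient)
        messages.append(body)
        return messages
```

The first call for a given recipient yields two messages (introduction, then body); later calls for the same recipient yield only the body.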

Reusable multimodal application

A method and system are disclosed herein for accepting multimodal inputs and deriving synchronized and processed information. A reusable multimodal application is provided on the mobile device. A user transmits a multimodal command to the multimodal platform via the mobile network. The one or more modes of communication that are inputted are transmitted to the multimodal platform(s) via the mobile network(s) and thereafter synchronized and processed at the multimodal platform. The synchronized and processed information is transmitted to the multimodal application. If required, the user verifies and appropriately modifies the synchronized and processed information. The verified and modified information is transferred from the multimodal application to the visual application. The final result(s) are derived by inputting the verified and modified results into the visual application.