G10L15/00

Multimodal transmission of packetized data
11705121 · 2023-07-18 ·

A system of multi-modal transmission of packetized data in a voice activated data packet based computer network environment is provided. A natural language processor component can parse an input audio signal to identify a request and a trigger keyword. Based on the input audio signal, a direct action application programming interface can generate a first action data structure, and a content selector component can select a content item. An interface management component can identify first and second candidate interfaces, and respective resource utilization values. The interface management component can select, based on the resource utilization values, the first candidate interface to present the content item. The interface management component can provide the first action data structure to the client computing device for rendering as audio output, and can transmit the content item converted for a first modality to deliver the content item for rendering from the selected interface.
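The interface-selection step described above can be sketched as follows: pick the candidate interface with the lowest resource utilization value, then convert the content item for that interface's modality. All names, fields, and values here are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of interface selection by resource utilization value.
from dataclasses import dataclass

@dataclass
class CandidateInterface:
    name: str
    modality: str                 # e.g. "audio", "visual"
    resource_utilization: float   # lower is better (battery, bandwidth, ...)

def select_interface(candidates):
    """Return the candidate with the lowest resource utilization value."""
    return min(candidates, key=lambda c: c.resource_utilization)

def convert_for_modality(content_item, modality):
    """Stand-in for converting a content item to the selected modality."""
    return {"item": content_item, "modality": modality}

candidates = [
    CandidateInterface("speaker", "audio", resource_utilization=0.7),
    CandidateInterface("screen", "visual", resource_utilization=0.3),
]
chosen = select_interface(candidates)
delivered = convert_for_modality("content-item-1", chosen.modality)
```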

Methods and systems for managing communication sessions

A voice enabled device can assist a computing device, such as a server, in processing and analyzing a voice input. The voice enabled device can initiate a network communication session and transmit the voice input to the computing device. The computing device can classify the voice input as a type of communication session (e.g., a conversation). Based on the type of communication session, the computing device can either remain in communication with the voice enabled device and continue to process voice input, or terminate the communication after instructing the voice enabled device to process the voice input.
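The classify-then-branch decision above can be sketched as a toy routine: classify the voice input as a session type, then either keep the session open or instruct the device to finish processing locally and terminate. The classifier heuristic and return labels are invented for illustration.

```python
# Hedged sketch of the session-type decision between server and device.
def classify_session(voice_input: str) -> str:
    """Toy classifier: question-like phrasing -> 'conversation', else 'command'."""
    return "conversation" if voice_input.rstrip().endswith("?") else "command"

def handle_session(voice_input: str) -> str:
    session_type = classify_session(voice_input)
    if session_type == "conversation":
        # Server remains connected and keeps processing voice input.
        return "remain-connected"
    # Server instructs the voice enabled device to process locally, then terminates.
    return "instruct-device-and-terminate"
```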

Phrase vector learning device, method, and program

An appropriate vector of any phrase can be generated. A lattice construction unit 212 constructs a lattice structure formed by links binding adjacent word or phrase candidates based on a morphological analysis result and a dependency analysis result of input text. A first learning unit 213 performs learning of a neural network A for estimating nearby word or phrase candidates from word or phrase candidates based on the lattice structure. A vector generation unit 214 acquires a vector of each of the word or phrase candidates from the neural network A and sets the vector as learning data. A second learning unit performs learning of a neural network B for vectorizing the word or phrase candidates based on the learning data.
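The lattice construction idea can be illustrated with a minimal toy: from a tokenized sentence, build nodes for single-word and adjacent two-word phrase candidates, and link candidates whose text spans are adjacent. This is only a sketch; the patent's lattice construction unit 212 uses morphological and dependency analysis to determine the candidates.

```python
# Toy lattice of word/phrase candidates; nodes are (start, end, text) spans.
def build_lattice(tokens):
    nodes = []
    for i, tok in enumerate(tokens):
        nodes.append((i, i + 1, tok))  # single-word candidate
        if i + 1 < len(tokens):
            nodes.append((i, i + 2, tok + " " + tokens[i + 1]))  # phrase candidate
    # Link candidates that are adjacent: one span ends where the next begins.
    links = [(a, b) for a in nodes for b in nodes if a[1] == b[0]]
    return nodes, links

nodes, links = build_lattice(["deep", "learning", "models"])
```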

Barrier-free intelligent voice system and control method thereof
11705126 · 2023-07-18 ·

A barrier-free intelligent voice system and a method for controlling the same, wherein multiple words are recognized from a voice audio to create multiple independent semantic units. Meanwhile, the system can continuously determine whether the semantic units match one of multiple voice tags created by the user. Thereafter, a target object, a program command, and a remark corresponding to the voice tag can be determined based on the successfully matched voice tag combination. Accordingly, a corresponding program can be started or a remote device can be triggered to operate. The present disclosure can be regarded as an AI intelligent voice processing engine. By allowing users to define different types of voice tag combinations, it can eliminate the grammatical and semantic analysis of natural language processing, eliminate speech translation differences and errors between different languages, effectively reduce the amount of calculations, increase the processing speed of the system, and minimize system judgment errors.
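The voice-tag matching described above can be sketched as a direct table lookup: semantic units recognized from audio are compared against user-defined tag combinations, and a matched combination maps straight to a target object, program command, and remark, bypassing full grammatical analysis. The tag table and semantic units below are invented for illustration.

```python
# Hedged sketch: user-defined voice tag combinations mapped to actions.
voice_tags = {
    ("living room", "light", "on"): {
        "target": "living-room-lamp",
        "command": "power_on",
        "remark": "evening scene",
    },
}

def match_tags(semantic_units, tags):
    """Return the action for the first tag combination fully contained in the units."""
    units = set(semantic_units)
    for combo, action in tags.items():
        if set(combo) <= units:
            return action
    return None

action = match_tags(["please", "living room", "light", "on"], voice_tags)
```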

RETRIEVAL DEVICE

A retrieval device 10 includes an input unit 11 configured to receive a search query from a user, a retrieval unit 12 configured to calculate a degree of fitness between the search query and each of a plurality of pieces of retrieval target data, a query expansion unit 13 configured to generate an expanded search query, and a policy determination unit 14 configured to determine which of a first process and a second process is to be executed on the basis of the degree of fitness for each piece of the retrieval target data calculated by the retrieval unit 12. The first process is presenting the retrieval target data having a high degree of fitness to the user. The second process is proposing to the user that the retrieval unit be caused to calculate the degree of fitness for each piece of the retrieval target data using the expanded search query.
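The policy determination step can be sketched as a simple threshold rule: if the best degree of fitness is high enough, present the top results (the first process); otherwise propose retrying with the expanded query (the second process). The scores and threshold are illustrative assumptions.

```python
# Sketch of the policy determination unit's decision rule.
def decide_policy(fitness_scores, threshold=0.5):
    """fitness_scores: mapping of retrieval-target id -> degree of fitness."""
    if max(fitness_scores.values()) >= threshold:
        top = max(fitness_scores, key=fitness_scores.get)
        return ("present", top)                  # first process
    return ("propose_expanded_query", None)      # second process
```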

Speech signal processing system facilitating natural language processing using audio transduction
11705151 · 2023-07-18 ·

Systems and methods transmit, to a user device across a network, digital communication(s), thereby facilitating displaying the digital communication(s) via a user interface of the user device. Based on a user of the user device providing user input(s) in response to the digital communication(s), response data related to physical location(s) are received, and data processing is performed thereon to determine whether additional data collection sequence(s) should be provided. Based on determining an additional data collection sequence should be provided, a condition-specific data collection sequence is provided via the user interface to facilitate obtaining condition-specific data related to a condition at a physical location, where the condition-specific data includes audio data collected via the user device, and where the obtaining of the condition-specific data comprises using a speech signal processing system to perform audio transduction to generate the audio data from a speech signal and facilitate performing natural language processing thereon.
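The branch on whether to trigger a follow-up collection sequence can be sketched as below: after receiving response data about a physical location, decide whether to start a condition-specific sequence that collects audio for speech processing. The field names and return labels are assumptions for illustration only.

```python
# Hedged sketch of the additional-data-collection decision.
def needs_condition_sequence(response):
    """Trigger a follow-up sequence when a condition is reported at the location."""
    return bool(response.get("condition_reported"))

def next_step(response):
    if needs_condition_sequence(response):
        # The sequence would collect audio via the user device for NLP.
        return "start-condition-specific-sequence"
    return "no-further-collection"
```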

Ambient lighting system with projected image and method for controlling an ambient lighting system

An ambient lighting system for automobiles configured to illuminate a passenger compartment, comprises: an optical system in turn comprising: a plurality of RGB LED light sources; a structural support on which the light sources are placed; and a contact surface on which a user can interact, and through which light rays can exit. The ambient lighting system also includes: a control unit configured to control the RGB LED light sources in order to present light animations; and a touch sensor configured to detect touch of the contact surface in a defined region, which is configured to define, along the optical system at least one soft key. The ambient lighting system further includes: a voice command recognition system electronically connected to the control unit and configured to recognize voice commands; and a projector configured to project an image, such as a symbol and/or logo, on the contact surface.

ENHANCING VIEWING EXPERIENCE BY ANIMATED TRACKING OF USER SPECIFIC KEY INSTRUMENTS

Systems and methods are provided for identifying a key instrument in an event. One example method includes receiving a capture of the event and identifying, at a first computing device, the event. The key instrument is identified at the first computing device. An indicator to apply to and/or around the identified key instrument is generated for display.

Language-agnostic Multilingual Modeling Using Effective Script Normalization

A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text, in a target script, representing the respective native language of the corresponding audio, and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
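The normalization step can be illustrated with a toy: transcriptions in a native script are transliterated into one target script (Latin here) and paired with the original audio to form normalized training samples. The character table below is a tiny stand-in for a real transliteration model, and all identifiers are invented for illustration.

```python
# Illustrative sketch of script normalization for multilingual training data.
TO_LATIN = {"н": "n", "а": "a", "м": "m", "е": "e"}  # toy Cyrillic-to-Latin table

def transliterate(text, table):
    """Map each character into the target script, passing unknowns through."""
    return "".join(table.get(ch, ch) for ch in text.lower())

def normalize_sample(audio_id, native_transcription, table):
    return {
        "audio": audio_id,  # audio stays in the native language
        "transcript": transliterate(native_transcription, table),  # target script
    }

sample = normalize_sample("utt-001", "нам", TO_LATIN)
```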