G10L25/00

Generating neural network outputs using insertion commands

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.
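The abstract does not spell out what an insertion command looks like, so the following is only a minimal sketch under stated assumptions: a command is taken to be a `(position, element)` pair, the `<src>` and `<tgt>` marker elements are invented for illustration, and `scripted` is a hypothetical stand-in for the neural network that would propose each command.

```python
def apply_insertions(partial, propose, max_steps=100):
    """Grow a partial concatenated sequence by applying insertion commands.

    `propose` is a stand-in for the model: given the current sequence it
    returns a (position, element) pair, or None once the sequence is final.
    """
    seq = list(partial)
    for _ in range(max_steps):
        command = propose(seq)
        if command is None:
            break
        position, element = command
        seq.insert(position, element)
    return seq


def scripted(commands):
    """Hypothetical proposer that replays a fixed list of commands."""
    pending = iter(commands)
    return lambda seq: next(pending, None)


# One source element and zero target elements to start with.
partial = ["<src>", "hello", "<tgt>"]
final = apply_insertions(
    partial,
    scripted([(2, "world"), (4, "bonjour"), (5, "monde")]),
)
```

Here the same loop finalizes both the source part and the target part of the sequence, which is one way to read the "combined order" the abstract mentions.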

Automated functional testing systems and methods of making and using the same
11796423 · 2023-10-24

An automatic robot control system and methods relating thereto are described. These systems include components such as a touch screen panel (“TSP”) robot controller for controlling a TSP robot, a camera robot controller for controlling a camera robot and an audio robot controller for controlling an audio robot. The TSP robot operates inside a TSP testing subsystem, the camera robot operates inside a camera testing subsystem, and the audio robot operates inside an audio testing subsystem. Inside the audio testing subsystem, an audio signals measurement system, using a bi-directional coupling, controls the operation of the audio robot controller. In this control scheme, a test application controller is designed to control the different types of subsystem robots. Methods relating to TSP, camera, and audio robots, and their controllers, taken individually or in combination, for automatic testing of device functionalities are also described.

Virtual assistant for generating personalized responses within a communication session

Intelligent agents (IA) for automatically generating responses to content within a communication session (CS) are disclosed. An IA is trained to target the responses to a user and the user's context within the CS. An IA receives CS content that includes natural language expressions encoding users' conversations and determines content features based on natural language models. The content features indicate intended semantics of the expressions. The IA identifies content that is likely relevant to the targeted user and for which a response is to be generated. Identifying such content includes determining a relevance of the content based on content features, a context of the CS, a user-interest model, and a content-relevance model. Identifying the likely-relevant content to respond to is based on the determined relevance of the content and relevance thresholds. Various responses to the identified portions of the content are automatically generated and provided based on a natural language response-generation model targeted to the user.

Continual learning for multi modal systems using crowd sourcing
11809965 · 2023-11-07

Systems, methods, and devices are disclosed for training a model. Media data is separated into one or more clusters, each cluster based on a feature from a first model. The media data of each cluster is sampled and, based on an analysis of the sampled media data, an accuracy of the media data of each cluster is determined. The accuracy is associated with the feature from the first model. Based on a subset dataset of the media data being outside a threshold accuracy, the subset dataset is automatically forwarded to a crowd source service. Verification of the subset dataset is received from the crowd source service, and the verified subset dataset is added to the first model.
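The cluster, sample, and forward steps described above can be sketched as follows. This is an illustration under assumptions, not the patented method: each media item is assumed to be a dict with a `feature` key assigned by the first model, `is_correct` is a hypothetical callback standing in for the analysis of a sampled item, and "outside a threshold accuracy" is read as "below the threshold".

```python
import random
from collections import defaultdict


def cluster_by_feature(items):
    """Group media items by the feature the first model assigned to them."""
    clusters = defaultdict(list)
    for item in items:
        clusters[item["feature"]].append(item)
    return dict(clusters)


def sampled_accuracy(cluster, sample_size, is_correct, rng):
    """Estimate a cluster's accuracy from a random sample of its items."""
    sample = rng.sample(cluster, min(sample_size, len(cluster)))
    return sum(1 for item in sample if is_correct(item)) / len(sample)


def select_for_crowd(items, threshold, sample_size, is_correct, seed=0):
    """Return the items of every cluster whose sampled accuracy is below threshold."""
    rng = random.Random(seed)
    flagged = []
    for cluster in cluster_by_feature(items).values():
        if sampled_accuracy(cluster, sample_size, is_correct, rng) < threshold:
            flagged.extend(cluster)
    return flagged


items = [
    {"feature": "dog", "ok": True},
    {"feature": "dog", "ok": True},
    {"feature": "cat", "ok": False},
    {"feature": "cat", "ok": True},
]
# Only the "cat" cluster falls below 0.9, so it alone goes to the crowd source service.
to_verify = select_for_crowd(items, threshold=0.9, sample_size=10,
                             is_correct=lambda item: item["ok"])
```

The verified items returned by the crowd source service would then be added back to the first model; that step is outside this sketch.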

Optimization method for implementation of mel-frequency cepstral coefficients

An optimization method for an implementation of mel-frequency cepstral coefficients is provided. The optimization method includes performing a framing step, which includes using a 400×16 static random access memory to temporarily store, with overlap, a plurality of sampling points of a sound signal and decomposing the sound signal into a plurality of frames. Each frame contains 400 of the sampling points, and every two adjacent frames share an overlapping region of 240 sampling points. The optimization method further includes performing a windowing step, which includes multiplying each of the plurality of frames by a window function in a bit-level design, and performing a fast Fourier transform (FFT) step, which includes applying a 512-point FFT to a frame signal to obtain a corresponding frequency spectrum.
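The framing, windowing, and FFT steps can be illustrated in software as below. Only the frame length (400), the overlap (240), and the FFT size (512) come from the abstract. The hop of 160 samples follows from 400 minus 240. The Hamming window, the zero-padding from 400 to 512 samples, and the use of floating point instead of a bit-level hardware design are assumptions made for this sketch.

```python
import cmath
import math

FRAME_LEN = 400             # samples per frame (from the abstract)
OVERLAP = 240               # samples shared by adjacent frames (from the abstract)
HOP = FRAME_LEN - OVERLAP   # 160 new samples per frame
FFT_SIZE = 512              # FFT length (from the abstract)


def split_frames(signal):
    """Split a signal into overlapping frames of FRAME_LEN samples."""
    last_start = len(signal) - FRAME_LEN
    return [signal[s:s + FRAME_LEN] for s in range(0, last_start + 1, HOP)]


def hamming(n):
    """Hamming window (assumed; the abstract does not name the window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def frame_spectrum(frame, window):
    """Window one frame, zero-pad it to FFT_SIZE, and return its spectrum."""
    windowed = [s * w for s, w in zip(frame, window)]
    padded = windowed + [0.0] * (FFT_SIZE - len(windowed))
    return fft(padded)


signal = [float(i % 50) for i in range(16000)]
frames = split_frames(signal)
spectrum = frame_spectrum(frames[0], hamming(FRAME_LEN))
```

A 16000-sample signal yields 98 frames with these parameters, and each frame reuses the last 240 samples of the previous one, which is why a 400-sample buffer is enough to hold the working frame.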

Voice input processing method and electronic device for supporting the same

An electronic device is provided. The electronic device includes a microphone, communication circuitry, an indicator configured to provide at least one visual indication, a processor configured to be electrically connected with the microphone, the communication circuitry, and the indicator, and a memory. The memory stores instructions that, when executed, cause the processor to receive a first voice input through the microphone, perform a first voice recognition for the first voice input, if a first specified word for waking up the electronic device is included in a result of the first voice recognition, display a first visual indication through the indicator, receive a second voice input through the microphone, perform a second voice recognition for the second voice input, and if a second specified word corresponding to the first visual indication is included in a result of the second voice recognition, wake up the electronic device.
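The two-stage wake-up flow reads naturally as a small state machine. The sketch below assumes that voice recognition has already produced a text transcript, that both specified words are single words, and that the device shows one fixed indication. The class, state names, and example words are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    ASLEEP = auto()
    AWAITING_SECOND_WORD = auto()
    AWAKE = auto()


class TwoStageWake:
    """Wake only after a wake word and then a word matching the shown indication."""

    def __init__(self, wake_word, indication, second_word):
        self.wake_word = wake_word.lower()
        self.indication = indication          # what the indicator will display
        self.second_word = second_word.lower()
        self.state = State.ASLEEP
        self.shown_indication = None          # nothing displayed yet

    def on_recognized(self, text):
        """Feed one voice-recognition result and return the new state."""
        words = text.lower().split()
        if self.state is State.ASLEEP:
            if self.wake_word in words:
                self.shown_indication = self.indication
                self.state = State.AWAITING_SECOND_WORD
        elif self.state is State.AWAITING_SECOND_WORD:
            if self.second_word in words:
                self.state = State.AWAKE
        return self.state


device = TwoStageWake(wake_word="computer", indication="blue light", second_word="blue")
device.on_recognized("hey computer")   # indicator shows the blue light
device.on_recognized("blue")           # device wakes up
```

Tying the second word to the displayed indication lets a user pick out one device when several nearby devices hear the same wake word, since each device can show a different indication.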

Summarized logical forms for controlled question answering
11829420 · 2023-11-28

Systems, devices, and methods discussed herein provide improved autonomous agent applications that are configured to generate automated answers to a question using summarized logical forms (SLFs). A myriad of techniques may be utilized to manually or automatically generate one or more summarized logical forms for an answer, where the summarized logical form(s) identify the main entities/informative portions of the answer. Instead of indexing the whole of the answer as in conventional methods, an answer can be indexed using the summarized logical forms. A subsequent query may be matched to the SLF and the answer may be provided in response to the question. By indexing the answer with its informative portions, the speed and accuracy of identifying the answer are improved.
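The idea of indexing an answer by its informative portions, and not by its full text, can be sketched as below. This is a deliberate simplification: a real summarized logical form is a logical expression, whereas here each SLF is approximated by a set of key terms, and a query matches when it contains all of them. The function names and sample answers are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(entries):
    """Build an index from (summary_terms, answer_text) pairs."""
    return [(frozenset(term.lower() for term in terms), answer)
            for terms, answer in entries]


def find_answer(index, question):
    """Return the answer whose summary terms all occur in the question.

    When several entries match, the one with the most terms wins because
    it is the most specific. Returns None when nothing matches.
    """
    tokens = tokenize(question)
    best_answer, best_size = None, 0
    for terms, answer in index:
        if terms <= tokens and len(terms) > best_size:
            best_answer, best_size = answer, len(terms)
    return best_answer


index = build_index([
    ({"reset", "password"}, "Use the reset link on the sign-in page."),
    ({"password"}, "Passwords need at least 12 characters."),
])
reply = find_answer(index, "How do I reset my password?")
```

Because only the key terms are indexed, a question that merely shares incidental words with an answer's full text does not match it, which is the controlled behavior the abstract describes.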

Generating diverse and natural text-to-speech samples

A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.

Voice application platform

Among other things, requests are received from voice assistant devices expressed in accordance with different corresponding protocols of one or more voice assistant frameworks. Each of the requests represents a voiced input by a user to the corresponding voice assistant device. The received requests are re-expressed in accordance with a common request protocol. Based on the received requests, responses to the requests are expressed in accordance with a common response protocol. Each of the responses is re-expressed according to a protocol of the framework with respect to which the corresponding request was expressed. The responses are sent to the voice assistant devices for presentation to the users.
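The translation into and out of common protocols resembles an adapter pattern. In the sketch below, the two frameworks (`a` and `b`), their payload shapes, and the `handle` logic are all invented for illustration and do not correspond to any real voice assistant framework.

```python
from dataclasses import dataclass


@dataclass
class CommonRequest:
    framework: str   # framework the request arrived from
    utterance: str   # what the user said


@dataclass
class CommonResponse:
    speech: str      # what the device should say back


# Hypothetical framework-specific decoders and encoders.
def from_framework_a(payload):
    return CommonRequest("a", payload["inputText"])


def to_framework_a(response):
    return {"outputSpeech": response.speech}


def from_framework_b(payload):
    return CommonRequest("b", payload["query"]["text"])


def to_framework_b(response):
    return {"reply": {"text": response.speech}}


ADAPTERS = {
    "a": (from_framework_a, to_framework_a),
    "b": (from_framework_b, to_framework_b),
}


def handle(request):
    """Application logic, written once against the common protocol."""
    return CommonResponse(f"You said: {request.utterance}")


def process(framework, payload):
    """Decode a request, handle it, and re-express the response for its framework."""
    decode, encode = ADAPTERS[framework]
    return encode(handle(decode(payload)))


reply_a = process("a", {"inputText": "hello"})
reply_b = process("b", {"query": {"text": "hello"}})
```

Supporting another framework in this design means adding one decoder and one encoder to `ADAPTERS`, while `handle` stays unchanged.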

Systems and method for third party natural language understanding service integration
11423910 · 2022-08-23

A virtual agent utilizes an in-house natural language understanding (NLU) service and integrates a third-party NLU service. The third-party NLU service is integrated with the virtual agent via a transformation script that establishes a transformation boundary through which communications are directed for adjustment and conditioning. The third-party NLU service communicates with the virtual agent via an application programming interface (API). The virtual agent receives an utterance from a user via a chat session and provides the utterance to the third-party NLU service. Depending on the degree of integration, the third-party NLU service may return intents, entities, and confidence, generate and return a response, and/or take actions within the cloud-based platform via the API. The virtual agent then provides a response to the user via the chat session.