G10L15/00

Techniques for language independent wake-up word detection
11545146 · 2023-01-03

A user device configured to perform wake-up word detection in a target language. The user device comprises at least one microphone (430) configured to obtain acoustic information from the environment of the user device, at least one computer readable medium (435) storing an acoustic model (150) trained on a corpus of training data (105) in a source language different than the target language, and storing a first sequence of speech units obtained by providing acoustic features (110) derived from audio comprising the user speaking a wake-up word in the target language to the acoustic model (150), and at least one processor (415,425) coupled to the at least one computer readable medium (435) and programmed to perform receiving, from the at least one microphone (430), acoustic input from the user speaking in the target language while the user device is operating in a low-power mode, applying acoustic features derived from the acoustic input to the acoustic model (150) to obtain a second sequence of speech units corresponding to the acoustic input, determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to the second sequence of speech units, and exiting the low-power mode if it is determined that the user spoke the wake-up word.
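The comparison step in this abstract can be illustrated with a small sketch. The abstract does not say how the two sequences of speech units are compared, so the Levenshtein edit distance, the `is_wake_word` helper, and the `max_normalized_distance` threshold below are all assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of speech units."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_wake_word(enrolled_units, observed_units, max_normalized_distance=0.25):
    """Hypothetical decision rule: accept when the observed sequence is
    close enough to the enrolled (first) sequence of speech units."""
    if not enrolled_units:
        return False
    distance = edit_distance(enrolled_units, observed_units)
    return distance / len(enrolled_units) <= max_normalized_distance
```

In such a scheme the enrolled sequence would be stored once at enrollment, and the device would leave low-power mode only when `is_wake_word` returns `True`.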

Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof

In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least the following components: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and accuracy of automatic speech transcription; and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.
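The per-segment routing described here might look like the following sketch. The abstract does not state how a hypothesis is accepted, so the confidence threshold `min_confidence`, the `rank_engines` callback, and the `(text, confidence)` return shape of each engine are hypothetical.

```python
def transcribe_segments(segments, engines, rank_engines, min_confidence=0.8):
    """Transcribe each segment with its preferred engine and stop early
    once a hypothesis is accepted.

    engines:      dict mapping an engine name to a callable that returns
                  (text, confidence) for a segment (assumed interface).
    rank_engines: callable returning engine names in preference order
                  for a given segment (assumed interface).
    """
    accepted = []
    for segment in segments:
        best_text, best_confidence = "", -1.0
        for name in rank_engines(segment):
            text, confidence = engines[name](segment)
            if confidence > best_confidence:
                best_text, best_confidence = text, confidence
            if confidence >= min_confidence:
                break  # accepted: no need to call another engine
        accepted.append(best_text)
    return " ".join(accepted)
```

The early `break` is where the speed gain would come from in this sketch: a segment is sent to a second engine only when the first hypothesis falls below the threshold.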

System and method for language-based service hailing

Systems and methods are provided for language-based service hailing. Such a system may comprise one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to obtain a plurality of speech samples, each speech sample comprising one or more words spoken in a language, train a neural network model with the speech samples to obtain a trained model for determining the language of speech, obtain a voice input, identify at least one language corresponding to the voice input based at least on applying the trained model to the voice input, and communicate a message in the identified language.

Adversarial anonymization and preservation of content
11544460 · 2023-01-03

Systems and methods for anonymizing content suggestive of a particular characteristic while preserving relevant content are disclosed. An example method may be performed by one or more processors of a protection system and include defining an anonymization loss indicative of an accuracy at which a trained discriminator model can predict a particular characteristic, defining a content loss indicative of a difference between latent representations of versions of a document, defining a combined objective function incorporating the anonymization and content losses, extracting and anonymizing suggestive content from training documents while preserving relevant content, and adversarially training, using the associated accuracies and differences in the combined objective function, a transformation model to transform a given document representative of credentials of a given person possessing the particular characteristic into an anonymized document maximizing a predicted uncertainty of the trained discriminator model while simultaneously maximizing an amount of relevant information about the person preserved.
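A combined objective of the kind described could be sketched as a weighted sum of the two losses. The abstract gives no formulas, so the use of plain classification accuracy, a mean squared difference between latent vectors, and the two weights are assumptions made only for illustration.

```python
def discriminator_accuracy(predictions, labels):
    """Fraction of documents for which the discriminator still predicts
    the protected characteristic correctly (lower is more anonymous)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def content_loss(latent_original, latent_anonymized):
    """Mean squared difference between the latent representations of the
    original and anonymized versions of a document."""
    squared = [(a - b) ** 2 for a, b in zip(latent_original, latent_anonymized)]
    return sum(squared) / len(squared)


def combined_objective(predictions, labels, latent_original, latent_anonymized,
                       anonymization_weight=1.0, content_weight=1.0):
    """Hypothetical combined objective to be minimized by the
    transformation model during adversarial training."""
    return (anonymization_weight * discriminator_accuracy(predictions, labels)
            + content_weight * content_loss(latent_original, latent_anonymized))
```

Minimizing this quantity pushes the discriminator toward chance-level predictions while keeping the anonymized document close to the original in latent space; the weights would set the trade-off between the two goals.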

Voice review analysis
11545138 · 2023-01-03

Systems and methods for Artificial Intelligence (AI)-based analysis of oral reviews are provided. An example method includes prompting a user to provide an oral review concerning a subject; providing the user with an interface configured to receive the oral review; receiving, via the interface, the oral review concerning the subject in a free format; generating, based on the oral review, a text for review and presenting the text for review to the user; and providing, to the user, an option to publish the text for review via at least one social media. Generating the text for review may include removing filler words from the oral review and converting the oral review from the free format to a format according to a grammar rule of at least one human language.
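The text-generation step (removing filler words and normalizing the free-format transcript) can be sketched with regular expressions. The filler list and the single formatting rule applied here (capitalize the first letter, end with terminal punctuation) are assumptions; the abstract does not enumerate either.

```python
import re

# Hypothetical filler list; a real system would likely use a larger,
# language-specific one.
FILLERS = ("um", "uh", "er", "you know")


def clean_review(transcript, fillers=FILLERS):
    """Strip filler words from a transcribed oral review and apply a
    minimal grammar-style cleanup."""
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in fillers) + r")\b,?\s*"
    text = re.sub(pattern, "", transcript, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text
```

The cleaned text is what would be shown to the user for review before the publish option is offered.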

System and method for establishing dental treatment environment
20220415489 · 2022-12-29

A system for establishing a dental treatment environment includes: a head-mounted device provided at a dental clinic to be mounted on a patient's head, the head-mounted device having an image display unit and an ear-mounted speaker; a microphone for converting sound, including the voice of the medical staff in charge of the patient, into an electric signal; a voice recognition module for recognizing the voice of the medical staff in charge from the electric signal input from the microphone; a content module storing multiple image contents for relaxing the patient mentally and physically; a user interface having a content selection unit configured such that the patient can select, from the multiple image contents, the content to be played on the image display unit; and an output signal generating module for generating an output signal that is output to the head-mounted device.

Generating IoT-based notification(s) and provisioning of command(s) to cause automatic rendering of the IoT-based notification(s) by automated assistant client(s) of client device(s)

Remote automated assistant component(s) generate client device notification(s) based on a received IoT state change notification that indicates a change in at least one state associated with at least one IoT device. The generated client device notification(s) can each indicate the change in state associated with the at least one IoT device, and can optionally indicate the at least one IoT device. Further, the remote automated assistant component(s) can identify candidate assistant client devices that are associated with the at least one IoT device, and determine whether each of the one or more of the candidate assistant client device(s) should render a corresponding client device notification. The remote automated assistant component(s) can then transmit a corresponding command to each of the assistant client device(s) it determines should render a corresponding client device notification, where each transmitted command causes the corresponding assistant client device to render the corresponding client device notification.
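The flow from state change to per-device commands can be sketched as a filter over candidate devices. The dictionary fields (`iot_device`, `state`, `linked_iot_devices`, `id`), the command shape, and the `should_render` callback are hypothetical; the abstract does not define any data format.

```python
def build_render_commands(state_change, assistant_devices, should_render):
    """Turn one IoT state change into render commands for the assistant
    client devices that should announce it."""
    text = f"{state_change['iot_device']} is now {state_change['state']}"
    commands = []
    for device in assistant_devices:
        # Candidate devices are those associated with the IoT device.
        if state_change["iot_device"] not in device["linked_iot_devices"]:
            continue
        # Per-device decision on whether to render the notification.
        if should_render(device, state_change):
            commands.append({
                "target": device["id"],
                "action": "render_notification",
                "text": text,
            })
    return commands
```

Each returned command would then be transmitted to its target device, which renders the notification on receipt.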

Voice signal enhancing method and device

The disclosure provides a voice signal enhancing method and device, which divide a voice signal of the present scene into multiple frame signals based on a preset time interval; feed the frame signals into a trained neural network based on a preset step size; perform convolution operations on the frame signals through skip-connected convolutional layers to obtain multiple enhanced frame signals; and superpose the enhanced frame signals according to the time domain of each enhanced frame signal to obtain an enhanced voice signal. Compared with the prior art, the present disclosure enhances voice signals automatically through the neural network without manual intervention, so the effects and application scenes of voice enhancement are not limited by a preset method or its designers, thereby reducing the frequency of signal distortion and extra noise, which in turn improves the effect of the voice signal enhancement.
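The framing and superposition steps around the network can be sketched without the network itself. Here `enhance_frame` is a placeholder for the trained model, and averaging overlapping samples is an assumed way of superposing frames; the abstract specifies neither.

```python
def split_frames(signal, frame_len, hop):
    """Cut the signal into fixed-length frames taken every `hop` samples."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def overlap_add(frames, hop, total_len):
    """Superpose frames at their original time positions, averaging the
    samples where frames overlap."""
    out = [0.0] * total_len
    count = [0] * total_len
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
            count[k * hop + i] += 1
    return [o / c if c else 0.0 for o, c in zip(out, count)]


def enhance(signal, frame_len, hop, enhance_frame):
    """Frame the signal, enhance each frame, and rebuild the signal."""
    frames = split_frames(signal, frame_len, hop)
    enhanced = [enhance_frame(frame) for frame in frames]
    return overlap_add(enhanced, hop, len(signal))
```

With an identity `enhance_frame`, this round trip reproduces every sample covered by at least one frame; trailing samples that no frame covers are left at zero in this sketch.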

Systems and methods for recording relevant portions of a media asset
11540005 · 2022-12-27

Systems and methods are presented herein for recording portions of a media asset relevant to recording criteria. A media application receives input indicating the recording criteria and identifying a first keyword. The media application accesses a data structure to identify a first node associated with the first keyword. The data structure includes the first node and a plurality of nodes connected to the first node via a plurality of paths. The media application receives audio component data for a portion of the media asset, extracts a term from the audio component data, and identifies a second node in the data structure that is associated with the extracted term. The media application calculates a path score for the portion of the media asset based on a path size in the data structure between the first node and the second node. When the score is high enough, the portion of the media asset is recorded.
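The path score can be sketched as a function of the shortest path between the keyword node and the term node. The abstract only says the score is based on path size, so the adjacency-list graph, the `1 / (1 + hops)` formula, and the `threshold` value are assumptions.

```python
from collections import deque


def path_length(graph, start, goal):
    """Number of edges on the shortest path from start to goal, found by
    breadth-first search, or None if the nodes are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_score(graph, keyword, term):
    """Hypothetical score: shorter paths give scores closer to 1."""
    hops = path_length(graph, keyword, term)
    return 0.0 if hops is None else 1.0 / (1 + hops)


def should_record(graph, keyword, term, threshold=0.5):
    """Record the portion when the score meets the assumed threshold."""
    return path_score(graph, keyword, term) >= threshold
```

Under these assumptions, a term one hop from the keyword scores 0.5 and is recorded, while a term two hops away scores about 0.33 and is not.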

Apparatus, systems and methods for determining a commentary rating

Commentary rating determination systems and methods determine a commentary rating for commentary about a subject media content event that has been generated by a community member. An exemplary embodiment receives video information acquired by a 360° video camera, identifies a physical object from the received video information, determines a physical attribute associated with the identified physical object, wherein the determined physical attribute describes a characteristic of the identified physical object, compares the determined physical attribute of the identified physical object with a plurality of predefined physical object attributes stored in a database, and in response to identifying one of the plurality of predefined physical object attributes that matches the determined physical attribute, associates the quality value of the identified one of the plurality of predefined physical object attributes with the identified physical object. Then, the commentary rating is determined for the commentary based on the associated quality value.