Patent classifications
G10L2021/065
Neural network model for generation of compressed haptic actuator signal from audio input
A method comprises inputting an audio signal into a machine learning circuit to compress the audio signal into a sequence of actuator signals. The machine learning circuit is trained by receiving a training set of acoustic signals, pre-processing the training set into pre-processed audio data that includes at least a spectrogram, and then training the machine learning circuit using the pre-processed audio data. The neural network has a cost function based on a reconstruction error and a plurality of constraints. The machine learning circuit generates a sequence of haptic cues corresponding to the audio input, and the sequence of haptic cues is transmitted to a plurality of cutaneous actuators to generate a sequence of haptic outputs.
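As a rough illustration of the training setup described above, the sketch below wires a spectrogram front end to a small autoencoder whose bottleneck plays the role of the compressed actuator sequence. The layer sizes, the actuator count, and the sparsity penalty standing in for the "plurality of constraints" are assumptions, not the patent's parameters.

```python
# A minimal sketch, not the patented model: an autoencoder whose bottleneck
# is the compressed actuator sequence. Sizes and the constraint term are
# illustrative assumptions.
import torch
import torch.nn as nn

N_FFT, HOP, N_ACTUATORS = 256, 128, 8
FREQ_BINS = N_FFT // 2 + 1

class AudioToHaptics(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder compresses each spectrogram frame to one actuator vector;
        # the Sigmoid bounds outputs to a valid actuator drive range.
        self.encoder = nn.Sequential(nn.Linear(FREQ_BINS, 64), nn.ReLU(),
                                     nn.Linear(64, N_ACTUATORS), nn.Sigmoid())
        # Decoder is used only during training, for the reconstruction term.
        self.decoder = nn.Sequential(nn.Linear(N_ACTUATORS, 64), nn.ReLU(),
                                     nn.Linear(64, FREQ_BINS))

    def forward(self, audio):
        spec = torch.stft(audio, N_FFT, HOP, window=torch.hann_window(N_FFT),
                          return_complex=True).abs()   # (freq, frames)
        frames = spec.T                                # (frames, freq)
        cues = self.encoder(frames)                    # haptic cue sequence
        recon = self.decoder(cues)
        return cues, recon, frames

model = AudioToHaptics()
audio = torch.randn(16000)                  # stand-in for a training clip
cues, recon, frames = model(audio)
# Cost = reconstruction error + constraint terms (sparsity here, as a guess).
loss = nn.functional.mse_loss(recon, frames) + 1e-3 * cues.abs().mean()
loss.backward()
```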
Automatically Captioning Audible Parts of Content on a Computing Device
- Asa Jonas Ivry Block
- Elliott Charles Burford
- Anthony Felice Tripaldi
- Stefanie Bianca Pitaro
- Heather Patricia Luipold
- Brian Kemler
- Kelsie Hope Van Deman
- Nadav Bar
- Robert James Berry
- Daniel Cohen
- Michelle Ramanovich
- Thomas Weedon Hume
- Nicole Kiana Bleuel
- Benjamin Schlesinger
- Justin Wooyoung Lee
- Kevin Rocard
- Eric Laurent
Techniques and computing devices are described that automatically caption content directly from audio data being output from content sources, unlike other captioning systems, which often rely on information contained in audio signals being sent to speakers. The disclosed techniques and computing devices may analyze metadata to determine whether the audio data is suitable for captioning or is some other type of audio data. Responsive to identifying audio data for captioning, the disclosed techniques and computing devices can generate a description of audible sounds interpreted from the audio data, providing automatic captioning of content and making audible content accessible to users who have difficulty hearing or are otherwise unable to listen to content.
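As an illustration of the metadata check, the sketch below routes streams by a usage tag, loosely modeled on Android's AudioAttributes usage types; the tag names and categories are invented for the example, not the patent's actual fields.

```python
# A rough sketch of the metadata check; the usage tags are illustrative
# assumptions, not the patent's actual metadata schema.
CAPTIONABLE_USAGES = {"media", "game", "assistant_speech"}
SKIPPED_USAGES = {"notification", "ringtone", "system_sound", "dtmf"}

def should_caption(metadata: dict) -> bool:
    """Return True when the audio stream is worth sending to the captioner."""
    usage = metadata.get("usage", "unknown")
    if usage in SKIPPED_USAGES:
        return False          # short UI sounds carry no captionable speech
    return usage in CAPTIONABLE_USAGES

# Example: a media stream is captioned, a notification chime is not.
assert should_caption({"usage": "media"})
assert not should_caption({"usage": "notification"})
```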
SYSTEM AND METHOD FOR REPRODUCING TACTILE CONTENT USING SPATIAL INFORMATION
Disclosed are a system and method for reproducing tactile content using spatial information. The system enhances a user's tactile experience of music by converting the sound of music played in a space into tactile sensation in real time, without any perceptible mismatch between the tactile and auditory senses, and by automatically correcting the tactile information to reflect the user's location and the spatial characteristics of the room, so that performance or image information can be realized as tactile content using the spatial information and the user's location.
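A minimal sketch of the location-aware correction follows, assuming simple free-field 1/r amplitude decay; the envelope extraction and the room-gain placeholder are illustrative guesses, not the disclosed method.

```python
# A minimal sketch of location-aware tactile correction, assuming free-field
# 1/r amplitude decay; the room-characteristic gain is a made-up placeholder.
import numpy as np

def tactile_envelope(audio: np.ndarray, frame: int = 512) -> np.ndarray:
    """Per-frame RMS envelope used to drive the tactile actuators."""
    n = len(audio) // frame
    return np.sqrt((audio[:n * frame].reshape(n, frame) ** 2).mean(axis=1))

def spatial_gain(user_pos, source_pos, room_gain: float = 1.0) -> float:
    """Compensate the 1/r loss between the sound source and the listener."""
    r = max(np.linalg.norm(np.subtract(user_pos, source_pos)), 1e-3)
    return r * room_gain   # boost far listeners so tactile level stays even

sr = 16000
music = np.random.randn(sr)                       # stand-in for live audio
cues = tactile_envelope(music) * spatial_gain((3.0, 1.0), (0.0, 0.0))
```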
Glasses with closed captioning, voice recognition, volume of speech detection, and translation capabilities
The glasses with display may include a bridge, two temples hingedly coupled to the bridge, and a directional microphone array comprising two or more microphones positioned on the bridge or the temples. The glasses may also include a user microphone array comprising one or more microphones positioned on the temples and oriented toward the mouth of the user wearing the glasses, or one or more bone conduction microphones. In addition, the glasses include two lenses positioned in the bridge, at least one of the lenses including a display visible to the user, the display including one or more of a directional display, a closed caption display, and a user volume display. The glasses additionally include a processor adapted to receive audio signals from the directional microphone array and the user microphone array, or from a separate mobile device, and to control the display.
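One plausible way to drive the directional display is to estimate the arrival angle from the inter-microphone delay, as in the sketch below; the mic spacing, sample rate, and cross-correlation approach are assumptions rather than anything the abstract specifies.

```python
# A sketch of direction-of-arrival estimation from two bridge microphones
# via cross-correlation. Mic spacing and sample rate are assumed values.
import numpy as np

SR, MIC_SPACING, SPEED_OF_SOUND = 48000, 0.14, 343.0

def arrival_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Angle of the sound source in degrees, 0 = straight ahead."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)        # samples of delay
    tdoa = lag / SR
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Example: a signal delayed by 10 samples at one mic appears off-axis.
sig = np.random.randn(4096)
print(arrival_angle(sig, np.roll(sig, 10)))
```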
PERFORMING ARTIFICIAL INTELLIGENCE SIGN LANGUAGE TRANSLATION SERVICES IN A VIDEO RELAY SERVICE ENVIRONMENT
Video relay services, communication systems, non-transitory machine-readable storage media, and methods are disclosed herein. A video relay service may include at least one server configured to receive a video stream including sign language content from a video communication device during a real-time communication session. The server may also be configured to automatically translate the sign language content into a verbal language translation during the real-time communication session without assistance of a human sign language interpreter. Further, the server may be configured to transmit the verbal language translation during the real-time communication session.
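The skeleton below shows the shape of such a relay-side loop; `translate_frames` is a stub standing in for the sign-language model, which the abstract does not describe, and nothing here is the service's actual API.

```python
# A skeletal sketch of the relay-side translation loop; the model call is
# a placeholder, not the disclosed translation method.
from typing import Iterable, List

def translate_frames(frames: List[bytes]) -> str:
    """Placeholder for the automatic sign-to-verbal translation model."""
    return "hello, how are you"          # canned output for the sketch

def relay_session(video_stream: Iterable[List[bytes]]):
    """Translate each buffered segment and forward it during the session."""
    for segment in video_stream:         # e.g., ~1 s of buffered frames
        text = translate_frames(segment)
        yield text                       # transmitted to the hearing party

for caption in relay_session([[b"frame"] * 30]):
    print(caption)
```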
APPARATUS AND A SYSTEM FOR SPEECH AND/OR HEARING THERAPY AND/OR STIMULATION
The present disclosure relates to solutions within the field of apparatuses or devices for speech and hearing exercises, for instance improving the awareness of persons with hearing and speech impairments of their own voice and surrounding sounds, allowing them to experiment creatively with their own senses and to visualise the sound of their voice and/or additional elements, thereby accelerating the learning process and improving the interaction between patients and therapists.
Communication system for processing audio input with visual display
A reference acoustic input is processed into a quantization representation such that the quantization representation comprises acoustic components determined from the reference acoustic input, wherein the acoustic components comprise amplitude, rhythm, and pitch frequency of the reference acoustic input. A visual representation is generated that simultaneously depicts the acoustic components comprising amplitude, rhythm, and pitch frequency of the reference acoustic input. A user spoken input may be received and similarly processed and displayed.
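For illustration, the sketch below extracts the three named acoustic components, assuming RMS for amplitude, autocorrelation for pitch, and onset spacing for rhythm; the actual quantization in the patent may differ.

```python
# A minimal sketch of the quantization step; the amplitude/pitch/rhythm
# estimators here are common choices, not necessarily the patent's.
import numpy as np

def quantize_acoustics(audio: np.ndarray, sr: int, frame: int = 1024):
    n = len(audio) // frame
    frames = audio[:n * frame].reshape(n, frame)
    amplitude = np.sqrt((frames ** 2).mean(axis=1))          # per-frame RMS
    # Pitch: lag of the autocorrelation peak within a plausible voice range.
    pitches = []
    for f in frames:
        ac = np.correlate(f, f, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 60                         # 60-400 Hz
        pitches.append(sr / (lo + np.argmax(ac[lo:hi])))
    # Rhythm: spacing between amplitude peaks above an assumed threshold.
    onsets = np.flatnonzero(amplitude > 1.5 * amplitude.mean())
    rhythm = np.diff(onsets) * frame / sr if len(onsets) > 1 else np.array([])
    return amplitude, np.array(pitches), rhythm

sr = 16000
tone = np.sin(2 * np.pi * 120 * np.arange(sr) / sr)          # 120 Hz test tone
amp, pitch, rhythm = quantize_acoustics(tone, sr)
print(pitch[:3])   # ~120 Hz per frame
```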
WEARABLE VIBROTACTILE SPEECH AID
A method for training vibrotactile speech perception in the absence of auditory speech can comprise selecting a first word, generating a first control signal configured to cause at least one vibrotactile transducer to vibrate against a person's body with a first vibration pattern based on the first word, sampling a second word, generating a second control signal configured to cause a vibrotactile transducer to vibrate against the person's body with a second vibration pattern based on the second word, and presenting a comparison between the first word and the second word to the person. An apparatus for training vibrotactile speech perception can comprise an array of vibrotactile transducers in contact with the person's body. The array of vibrotactile transducers can replicate a vibration pattern based on one or more spoken words.
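A toy sketch of one training trial follows, assuming each word maps to a fixed per-transducer pattern; the word-to-pattern table is invented for the example.

```python
# A toy sketch of the training trial; the word-to-pattern mapping is
# invented, not the disclosed encoding.
import random

WORD_PATTERNS = {            # word -> per-transducer on/off frames (invented)
    "cat": [[1, 0, 1, 0], [0, 1, 0, 1]],
    "dog": [[1, 1, 0, 0], [0, 0, 1, 1]],
}

def play_pattern(word: str):
    """Stand-in for driving the transducer array with the word's pattern."""
    print(word, "->", WORD_PATTERNS[word])

def training_trial(first_word: str):
    play_pattern(first_word)                           # first control signal
    second_word = random.choice(list(WORD_PATTERNS))   # sampled comparison
    play_pattern(second_word)                          # second control signal
    print("same word" if first_word == second_word else "different words")

training_trial("cat")
```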
Systems and methods for assisting the hearing-impaired using machine learning for ambient sound analysis and alerts
Systems and methods for assisting the hearing-impaired are described. The methods rely on obtaining audio signals from the ambient environment of a hearing-impaired person. The audio signals are analyzed by a machine learning model that can classify them into audio categories (e.g., Emergency, Animal Sounds) and audio types (e.g., Ambulance Siren, Dog Barking) and notify the user via a mobile or wearable device. The user can configure notification preferences and view historical logs. The machine learning classifier is periodically trained externally on labelled audio samples. Additional system features include an audio amplification option and a speech-to-text option for transcribing human speech to text output.
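Schematically, the classify-then-notify flow might look like the sketch below; the category and type labels come from the abstract, but the feature threshold is a placeholder, not the trained classifier.

```python
# A schematic sketch of the classify-then-notify flow; the feature check
# stands in for the externally trained ML classifier.
from dataclasses import dataclass

@dataclass
class Classification:
    category: str   # e.g. "Emergency", "Animal Sounds"
    type: str       # e.g. "Ambulance Siren", "Dog Barking"

def classify(features: dict) -> Classification:
    """Placeholder for the externally trained classifier."""
    if features.get("dominant_hz", 0) > 700:
        return Classification("Emergency", "Ambulance Siren")
    return Classification("Animal Sounds", "Dog Barking")

def notify(result: Classification, prefs: dict, log: list):
    log.append(result)                    # historical log the user can view
    if prefs.get(result.category, True):  # user-configured preference
        print(f"ALERT: {result.type} ({result.category})")

log = []
notify(classify({"dominant_hz": 900}), {"Animal Sounds": False}, log)
```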
AUDIO IMPROVEMENT USING CLOSED CAPTION DATA
Methods and systems are described herein for improving audio for hearing-impaired content consumers. An example method may comprise determining a content asset. Closed caption data associated with the content asset may be determined. At least a portion of the closed caption data may be determined based on a user setting associated with a hearing impairment. Compensating audio comprising a frequency translation associated with at least the portion of the closed caption data may be generated. The content asset may be caused to be output with audio content comprising the compensating audio and the original audio.
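As a crude illustration of frequency translation, the sketch below shifts the whole spectrum downward so that high-frequency content (e.g., sibilants flagged by caption text) lands in a more audible band; real hearing-impairment compensation is far more sophisticated.

```python
# A crude sketch of frequency translation (frequency lowering) via a
# downward shift of spectral bins; a simplification, not the method claimed.
import numpy as np

def shift_down(audio: np.ndarray, sr: int, shift_hz: float) -> np.ndarray:
    """Move the spectrum down by shift_hz so high bands become audible."""
    spec = np.fft.rfft(audio)
    bins = int(round(shift_hz * len(audio) / sr))
    shifted = np.zeros_like(spec)
    shifted[:len(spec) - bins] = spec[bins:]     # drop the lowest `bins` bins
    return np.fft.irfft(shifted, n=len(audio))

sr = 16000
t = np.arange(sr) / sr
sibilant = np.sin(2 * np.pi * 6000 * t)          # 6 kHz component, e.g. "s"
compensating = shift_down(sibilant, sr, 2000)    # now centered near 4 kHz
```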