G10L13/07

Accessibility techniques for presentation of symbolic expressions

Methods for presenting symbolic expressions such as mathematical, scientific, or chemical expressions, formulas, or equations are performed by a computing device. One method includes: displaying a first portion of a symbolic expression within a first area of a display screen; while in a first state in which the first area is selected for aural presentation, aurally presenting first information related to the first portion of the symbolic expression; while in the first state, detecting particular user input; in response to detecting the particular user input, performing the steps of: transitioning from the first state to a second state in which a second area, of the display, is selected for aural presentation; determining second information associated with a second portion, of the symbolic expression, that is displayed within the second area; in response to determining the second information, aurally presenting the second information.
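The state transition described in the claim can be sketched as a small state machine: the reader tracks which display area of the expression is currently selected for aural presentation and, on user input, moves the selection and presents the information associated with the newly selected portion. All class and variable names below are illustrative assumptions, not terms from the disclosure.

```python
# Hypothetical sketch of the claimed state transition: the screen
# reader tracks which display area of a symbolic expression is
# selected and, on user input, shifts the aural focus to the next one.

class ExpressionReader:
    def __init__(self, areas):
        # areas: list of (portion_text, spoken_description) pairs,
        # one per display area of the symbolic expression
        self.areas = areas
        self.selected = 0  # first state: first area selected

    def speak_current(self):
        # "aurally presenting first information": text handed to TTS
        return self.areas[self.selected][1]

    def on_user_input(self):
        # transition to the state in which the next area is selected,
        # then determine and present its associated information
        self.selected = (self.selected + 1) % len(self.areas)
        return self.speak_current()

reader = ExpressionReader([
    ("x^2", "x squared"),
    ("+", "plus"),
    ("1", "one"),
])
```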

Speech synthesizer, and speech synthesis method and computer program product utilizing multiple-acoustic feature parameters selection

A speech synthesizer includes a statistical-model sequence generator, a multiple-acoustic feature parameter sequence generator, and a waveform generator. The statistical-model sequence generator generates, based on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states. The multiple-acoustic feature parameter sequence generator, for each speech section corresponding to each state of the statistical model sequence, selects a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generates a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters. The waveform generator generates a distribution sequence based on the multiple-acoustic feature parameter sequence and generates a second speech waveform based on a second set of acoustic feature parameters generated based on the distribution sequence.
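The three-stage pipeline above can be sketched end to end with stub models: a statistical model sequence is generated per context label, candidate acoustic feature parameters are selected per state from a stored database, and a distribution sequence (here, a per-state mean) stands in for waveform generation. Every function name and data shape is an assumption for illustration, not the disclosed implementation.

```python
# Minimal stand-in for the three generators described above; the real
# system uses trained statistical models and actual speech waveforms.

from statistics import mean

def statistical_model_sequence(context):
    # one statistical model (here: a list of three state labels)
    # per context label derived from the input text
    return [[f"{c}_s{i}" for i in range(3)] for c in context]

def select_feature_parameters(model_seq, database):
    # for each speech section (state), select a plurality of acoustic
    # feature parameters from the set stored in the speech database
    return [[database[state] for state in model] for model in model_seq]

def generate_waveform(multi_param_seq):
    # form a distribution sequence (per-state mean of the selected
    # candidates) and emit one value per state as a toy "waveform"
    flat = [params for model in multi_param_seq for params in model]
    return [mean(p) for p in flat]
```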

SYSTEMS AND METHODS FOR RECONSTRUCTING VOICE PACKETS USING NATURAL LANGUAGE GENERATION DURING SIGNAL LOSS

A device may receive and convert audio data to text data in real-time, and may detect a network fluctuation that causes missing voice packets. The device may process partial text and context of the text data, with a model, to generate a new phrase, and may generate a response phoneme for the new phrase. The device may utilize a text embedding model to generate a text embedding for the response phoneme, and may process the audio data, with the model, to generate a target voice sequence. The device may utilize an audio embedding model to generate an audio embedding for the target voice sequence, and may combine the text embedding and the audio embedding to generate an embedding input vector. The device may process the embedding input vector, with an audio synthesis model, to generate a final voice response, and may provide the audio data and the final voice response.
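The embedding-combination step described above can be illustrated with trivial stand-ins: a text embedding for the response phoneme and an audio embedding for the target voice sequence are concatenated into a single input vector for the audio synthesis model. The embedding functions here are toy substitutes, not the models from the disclosure.

```python
# Stand-in embedding models; the disclosed system uses learned text
# and audio embedding models rather than these hand-written summaries.

def text_embedding(phonemes):
    # toy text embedding: one dimension per phoneme symbol
    return [float(len(p)) for p in phonemes]

def audio_embedding(samples):
    # toy audio embedding: a coarse two-dimensional summary
    return [min(samples), max(samples)]

def combine(text_vec, audio_vec):
    # "combine the text embedding and the audio embedding to generate
    # an embedding input vector" -- here, simple concatenation
    return text_vec + audio_vec
```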

Natural language processing to merge related alert messages for accessibility

A method for merging incoming alerts for accessibility is described. A first input alert and a second input alert intended for presentation by a screen reader are received. If the first input alert and the second input alert have arrived within a specified time interval, the first input alert and the second input alert are combined into an output alert. The output alert is sent to a screen reader for presentation.
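The time-window test can be sketched in a few lines: two alerts that arrive within a specified interval are merged into one output alert for the screen reader. The window length and the merge policy (joining the alert texts with a semicolon) are assumptions for illustration only.

```python
# Hypothetical alert-merging rule; the disclosed method additionally
# applies natural language processing to relate the alerts.

MERGE_WINDOW_SECONDS = 2.0

def merge_alerts(first, second):
    # each alert: (arrival_time_seconds, text)
    t1, text1 = first
    t2, text2 = second
    if abs(t2 - t1) <= MERGE_WINDOW_SECONDS:
        # combine the two input alerts into one output alert
        return (min(t1, t2), f"{text1}; {text2}")
    return None  # outside the window: present separately
```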

ABBREVIATION CLASSIFICATION FOR SPEECH SYNTHESIS AND RECOGNITION
20240144911 · 2024-05-02 ·

A method, computer system, and computer program product are provided for synthesizing and/or recognizing speech. Training data is provided to a machine learning model, wherein the training data comprises a plurality of labeled examples of acronyms and initialisms. The machine learning model is trained to classify strings into an acronym class or an initialism class. An input string is classified with the machine learning model into the acronym class or the initialism class. Based on the classifying, a pronunciation is generated for the input string.
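The downstream pronunciation step can be illustrated as follows: once a string is classified as an acronym (pronounced as a word, e.g. "NASA") or an initialism (spelled out letter by letter, e.g. "FBI"), a pronunciation is generated accordingly. The vowel heuristic below is a crude stand-in for the trained machine learning model and is not part of the disclosure.

```python
# Heuristic stand-in classifier plus the pronunciation rule it drives.

VOWELS = set("AEIOU")

def classify(abbrev):
    # stand-in for the trained model: strings containing a vowel are
    # treated as pronounceable acronyms, the rest as initialisms
    return "acronym" if any(c in VOWELS for c in abbrev) else "initialism"

def pronunciation(abbrev):
    # "Based on the classifying, a pronunciation is generated"
    if classify(abbrev) == "acronym":
        return abbrev.lower()           # speak as a single word
    return " ".join(abbrev.lower())     # spell letter by letter
```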


Natural language processing to merge related alert messages for accessibility

An apparatus and a computer program product for merging incoming alerts for accessibility are described. Two input alerts intended for presentation by a screen reader are received. If the two input alerts have arrived within a specified time interval, the two input alerts are combined into an output alert. The output alert is sent to a screen reader for presentation.