Patent classifications
G10L21/12
AUDIO WAVEFORM DISPLAY USING MAPPING FUNCTION
The described technology is generally directed towards providing a visible waveform representation of an audio signal, by processing the audio signal with a polynomial (e.g., cubic) mapping function. Coefficients of the polynomial mapping function are predetermined based on constraints (e.g., slope information and desired range of the resultant curve), and whether the plotted audio waveform corresponds to sound field quantities or power quantities. Once the visible representation of the reshaped audio waveform is displayed, audio and/or video editing operations can be performed, e.g., by time-aligning other audio or video with the reshaped audio waveform, and/or modifying the reshaped audio waveform to change the underlying audio data.
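The abstract does not give the actual coefficients. Here is a minimal sketch under one assumption: the constraints are f(0) = 0, f(1) = 1 and chosen end slopes f'(0) = s0 and f'(1) = s1. Solving those four conditions gives a = s0 + s1 - 2, b = 3 - 2·s0 - s1, c = s0 and d = 0. The slope values 2.0 and 0.5 below are hypothetical. The sketch applies the curve to peak-normalized sample magnitudes and restores each sample's sign. It does not model the sound-field versus power-quantity distinction.

```python
import math


def cubic_coefficients(slope_at_0: float, slope_at_1: float) -> tuple[float, float, float, float]:
    """Return (a, b, c, d) for f(x) = a*x**3 + b*x**2 + c*x + d with
    f(0) = 0, f(1) = 1, f'(0) = slope_at_0 and f'(1) = slope_at_1."""
    a = slope_at_0 + slope_at_1 - 2.0
    b = 3.0 - 2.0 * slope_at_0 - slope_at_1
    c = slope_at_0
    d = 0.0
    return a, b, c, d


def reshape_waveform(samples: list[float], slope_at_0: float = 2.0,
                     slope_at_1: float = 0.5) -> list[float]:
    """Map each sample through the cubic curve for display (hypothetical slopes).

    Magnitudes are normalized by the peak so the curve's input lies in [0, 1];
    the output stays in [-1, 1] when the chosen curve is monotonic.
    """
    a, b, c, d = cubic_coefficients(slope_at_0, slope_at_1)
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0.0] * len(samples)
    reshaped = []
    for s in samples:
        x = abs(s) / peak                     # normalized magnitude in [0, 1]
        y = ((a * x + b) * x + c) * x + d     # Horner evaluation of the cubic
        reshaped.append(math.copysign(y, s))  # restore the sample's sign
    return reshaped


print(reshape_waveform([0.0, 0.5, -1.0]))  # [0.0, 0.6875, -1.0]
```

With these slopes the curve is steeper near zero, so quiet samples are enlarged on screen (0.5 maps to 0.6875) while the peak stays at full scale.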
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
[Problem] The purpose is to enable a user to perceive the output position of an important part of information presented by a spoken utterance. [Solution] An information processing device is provided. The information processing device includes an output control unit that controls output of a spoken utterance related to information presentation. The output control unit outputs the spoken utterance and visually displays the output position of an important part of the spoken utterance. In addition, an information processing method is provided. The information processing method includes controlling, by a processor, output of a spoken utterance related to information presentation. The controlling further includes outputting the spoken utterance and visually displaying the output position of an important part of the spoken utterance.
Transforming Audio Content into Images
A technique is described herein for transforming audio content into images. The technique may include: receiving the audio content from a source; converting the audio content into a temporal stream of audio features; and converting the stream of audio features into one or more images using one or more machine-trained models. The technique generates the image(s) based on recognition of: semantic information that conveys one or more semantic topics associated with the audio content; and sentiment information that conveys one or more sentiments associated with the audio content. The technique then generates an output presentation that includes the image(s), which it provides to one or more display devices for display thereat. The output presentation serves as a summary of salient semantic and sentiment-related characteristics of the audio content.
Audio recording/playback device
According to one embodiment, an electronic device includes a hardware processor configured to display, on a screen, a first bar corresponding to utterance of a first user of a first zone, a second bar corresponding to utterance of a second user of a second zone, and a seek bar corresponding to a zone of a sound included in audio data when the audio data is played back. The hardware processor plays back, when a first position on the seek bar is specified, audio data from a first time point corresponding to the first position.
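The abstract does not say how a position on the seek bar becomes a time point. Here is a minimal sketch, assuming a linear mapping over the bar's pixel width. The `UtteranceZone` structure, the pixel units and the example zones are hypothetical. The sketch also shows one way to find which user's bar covers the chosen time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtteranceZone:
    """One user's utterance interval on the timeline (hypothetical structure)."""
    speaker: str
    start_s: float
    end_s: float


def seek_position_to_time(x_px: float, bar_width_px: float, duration_s: float) -> float:
    """Linearly map a position on the seek bar to a playback start time."""
    if bar_width_px <= 0:
        raise ValueError("bar width must be positive")
    fraction = min(max(x_px / bar_width_px, 0.0), 1.0)  # clamp to the bar
    return fraction * duration_s


def speaker_at(time_s: float, zones: list[UtteranceZone]) -> Optional[str]:
    """Return the speaker whose zone contains time_s, or None."""
    for zone in zones:
        if zone.start_s <= time_s < zone.end_s:
            return zone.speaker
    return None


zones = [UtteranceZone("user 1", 0.0, 25.0), UtteranceZone("user 2", 25.0, 60.0)]
t = seek_position_to_time(150, 600, 120.0)
print(t, speaker_at(t, zones))  # 30.0 user 2
```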
Method and apparatus for sound event detection robust to frequency change
Disclosed is a sound event detecting method including receiving an audio signal, transforming the audio signal into a two-dimensional (2D) signal, extracting a feature map by training a convolutional neural network (CNN) using the 2D signal, pooling the feature map based on a frequency, and determining whether a sound event occurs with respect to each of at least one time interval based on a result of the pooling.
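The abstract leaves the pooling operator unspecified. Here is a sketch under one assumption: max pooling over the frequency axis of a single-channel feature map. The CNN itself is omitted, and the map values below are made up. The pooled per-frame scores are averaged within each time interval and compared with a threshold. `frames_per_interval` and `threshold` are hypothetical parameters. Max pooling yields the same per-frame score when activity moves to a different frequency bin, which is one way such pooling can make detection tolerant of frequency changes.

```python
def pool_over_frequency(feature_map: list[list[float]]) -> list[float]:
    """Collapse feature_map[freq][frame] to one score per frame (max over freq)."""
    n_frames = len(feature_map[0])
    return [max(row[t] for row in feature_map) for t in range(n_frames)]


def detect_events(feature_map: list[list[float]], frames_per_interval: int,
                  threshold: float = 0.5) -> list[bool]:
    """Decide, for each time interval, whether a sound event occurs."""
    frame_scores = pool_over_frequency(feature_map)
    decisions = []
    for start in range(0, len(frame_scores), frames_per_interval):
        chunk = frame_scores[start:start + frames_per_interval]
        decisions.append(sum(chunk) / len(chunk) >= threshold)  # mean score vs threshold
    return decisions


fm = [[0.1, 0.9, 0.0, 0.1],
      [0.2, 0.1, 0.0, 0.2],
      [0.0, 0.0, 0.1, 0.1]]
shifted = [fm[2], fm[0], fm[1]]  # same activity placed in different frequency bins
print(detect_events(fm, 2), detect_events(shifted, 2))  # [True, False] [True, False]
```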
Residual syncing of sound with light to produce a starter sound at live and latent events
Disclosed are systems, methods, and devices for synchronizing sound with light to produce a starter sound. The starter sound reduces the gap a listener perceives between receiving the image of an action occurring at a distance and receiving the sound produced by that action. In an embodiment, a method includes predicting when a distant action will occur to produce a predicted action time. The method also includes generating a starter sound from a sound generation location near the listener. The starter sound is generated according to the predicted action time and according to the distant action. The starter sound is generated at a time such that it arrives at the listener's location at or after the action image arrives and before the action sound produced by the action arrives.
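The timing window described above can be checked with simple kinematics. Here is a minimal sketch under three assumptions: straight-line propagation, a nominal speed of sound of 343 m/s (dry air at about 20 °C), and a speaker at a known distance from the listener. The function names and the `delay_after_image_s` parameter are hypothetical. The sketch computes when the starter sound must be emitted so that it lands between the image arrival and the action-sound arrival.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def arrival_times(predicted_action_time_s: float, action_distance_m: float) -> tuple[float, float]:
    """Times at which the action image (light) and action sound reach the listener."""
    image = predicted_action_time_s + action_distance_m / SPEED_OF_LIGHT_M_S
    sound = predicted_action_time_s + action_distance_m / SPEED_OF_SOUND_M_S
    return image, sound


def starter_emit_time(predicted_action_time_s: float, action_distance_m: float,
                      speaker_distance_m: float, delay_after_image_s: float = 0.0) -> float:
    """When to emit the starter sound so it arrives at or after the image
    but before the action sound (hypothetical helper)."""
    image_arrival, sound_arrival = arrival_times(predicted_action_time_s, action_distance_m)
    target_arrival = image_arrival + delay_after_image_s
    if not image_arrival <= target_arrival < sound_arrival:
        raise ValueError("starter sound must arrive at/after the image and before the action sound")
    # Subtract the starter sound's own travel time from the nearby speaker.
    return target_arrival - speaker_distance_m / SPEED_OF_SOUND_M_S


print(starter_emit_time(10.0, 686.0, 3.43))
```

For an action 686 m away, the action sound trails the image by about 2 s. A starter sound from a speaker 3.43 m away should therefore be emitted about 10 ms before the image reaches the listener.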