Patent classifications
G10L25/09
Musical sound evaluation device, evaluation criteria generating device, method for evaluating the musical sound and method for generating the evaluation criteria
A musical sound evaluation device includes a musical sound acquisition unit which acquires an inputted musical sound, a feature quantity calculation unit which calculates a feature quantity from the musical sound, a feature quantity distribution data acquisition unit which acquires feature quantity distribution data representing a distribution of respective feature quantities for a plurality of musical sounds previously acquired, an evaluation value calculation unit which calculates an evaluation value for the inputted musical sound based on the feature quantity calculated by the feature quantity calculation unit and the feature quantity distribution data acquired by the feature quantity distribution data acquisition unit, and an evaluation unit which evaluates the musical sound based on the evaluation value.
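The chain described in this abstract (feature quantity, distribution data from earlier sounds, evaluation value, evaluation) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the feature, the form of the distribution data, or the scoring rule, so the RMS level, the mean/standard-deviation summary, the Gaussian-shaped score, and the 0.5 threshold are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def rms(samples):
    """Feature quantity (assumed here): root-mean-square level of the sound."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def distribution_data(previous_features):
    """Summarise features of previously acquired sounds as (mean, std dev)."""
    return mean(previous_features), pstdev(previous_features)


def evaluation_value(feature, dist):
    """Score in [0, 1]: 1.0 at the distribution mean, falling off with distance.

    The Gaussian-shaped score is an assumption, not the patented method.
    """
    mu, sigma = dist
    if sigma == 0:
        return 1.0 if feature == mu else 0.0
    z = (feature - mu) / sigma
    return math.exp(-0.5 * z * z)


def evaluate(value, threshold=0.5):
    """Map the evaluation value to a label (threshold is hypothetical)."""
    return "good" if value >= threshold else "poor"


if __name__ == "__main__":
    dist = distribution_data([1.0, 2.0, 3.0])
    score = evaluation_value(rms([2.0, -2.0, 2.0, -2.0]), dist)
    print(score, evaluate(score))
```

Summarising the earlier sounds as a mean and spread keeps the sketch small; a histogram or a per-note distribution would fit the same interface.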
Time-based frequency tuning of analog-to-information feature extraction
A sound recognition system including time-dependent analog filtered feature extraction and sequencing. An analog front end (AFE) in the system receives input analog signals, such as signals representing an audio input to a microphone. Features in the input signal are extracted by measuring such attributes as zero-crossing events and total energy in filtered versions of the signal with different frequency characteristics at different times during the audio event. In one embodiment, a tunable analog filter is controlled to change its frequency characteristics at different times during the event. In another embodiment, multiple analog filters with different filter characteristics filter the input signal in parallel, and signal features are extracted from each filtered signal; a multiplexer selects the desired features at different times during the event.
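The parallel-filter embodiment can be illustrated with a digital stand-in. This is a rough sketch, not the analog circuit the abstract describes: the one-pole low-pass filter, the `alphas` coefficients, the per-frame `schedule`, and all function names are assumptions made for illustration.

```python
def one_pole_lowpass(samples, alpha):
    """Digital stand-in for one analog filter: y[n] = y[n-1] + alpha*(x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def frame_features(frame):
    """Return (zero-crossing count, total energy) for one frame."""
    zc = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    energy = sum(v * v for v in frame)
    return zc, energy


def multiplexed_features(samples, alphas, schedule, frame_len):
    """Filter the input with every filter in parallel, then, frame by frame,
    keep the features of the filter that `schedule` selects for that frame."""
    filtered = [one_pole_lowpass(samples, a) for a in alphas]
    features = []
    for k, selected in enumerate(schedule):
        start = k * frame_len
        frame = filtered[selected][start:start + frame_len]
        features.append(frame_features(frame))
    return features


if __name__ == "__main__":
    signal = [1.0, -1.0, 1.0, -1.0] * 2
    # Frame 0 uses the pass-through filter (alpha=1.0), frame 1 the smoother one.
    print(multiplexed_features(signal, alphas=[1.0, 0.5], schedule=[0, 1], frame_len=4))
```

The tunable-filter embodiment would correspond to running a single filter whose `alpha` changes from frame to frame instead of selecting among several outputs.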
In-vehicle voice command recognition method and apparatus, and storage medium
An in-vehicle voice command recognition method and apparatus, and a storage medium. The method includes: acquiring a voice command inputted by a user; determining basic information of the user according to a pre-trained deep neural network (DNN) model; identifying contents of the voice command according to the basic information of the user, and determining at least one potential user intention according to the identified contents and a scenario page context at the time when the user inputs the voice command; determining a confidence level of the potential user intention according to the DNN model; determining a real user intention from the potential user intention according to the confidence level; and executing a corresponding action according to the real user intention. The embodiments of the present disclosure can effectively improve the correct recognition rate of voice commands.
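The last steps of the method (candidate intentions from the recognized content plus page context, then selection by confidence level) might look like the sketch below. Everything named here is hypothetical: the `INTENT_TABLE` mapping, the keyword matching, and the 0.6 threshold are illustrative, and the confidence values are passed in directly rather than produced by a DNN model.

```python
# Hypothetical mapping from (page context, keyword) to candidate intentions.
INTENT_TABLE = {
    ("navigation", "home"): ["navigate_home"],
    ("music", "home"): ["open_home_screen"],
    ("music", "next"): ["next_track"],
}


def potential_intents(text, page):
    """Collect candidate intentions whose keyword appears in the recognized
    text and whose context matches the page shown when the user spoke."""
    words = text.lower().split()
    found = []
    for (context, keyword), intents in INTENT_TABLE.items():
        if context == page and keyword in words:
            found.extend(intents)
    return found


def select_intent(scored, min_confidence=0.6):
    """Pick the highest-confidence intention, or None if none is confident enough.

    `scored` is a list of (intention, confidence) pairs; in the abstract the
    confidence comes from the DNN model, which is not modeled here.
    """
    if not scored:
        return None
    intent, confidence = max(scored, key=lambda pair: pair[1])
    return intent if confidence >= min_confidence else None


if __name__ == "__main__":
    print(potential_intents("Go home", "navigation"))
    print(select_intent([("navigate_home", 0.9), ("open_home_screen", 0.4)]))
```

Returning `None` below the threshold leaves room for a fallback such as asking the user to confirm, which the abstract does not specify.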
Robust feature extraction using differential zero-crossing counts
A low-power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A portion of the sparse sound parameter information consists of differential zero-crossing (ZC) counts. The differential ZC counts may be determined by measuring the number of times the analog signal crosses a threshold value during each of a sequence of time frames to form a sequence of ZC counts, and then taking the difference between selected pairs of ZC counts.
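The two steps in this abstract (per-frame threshold-crossing counts, then differences between pairs of counts) can be sketched on sampled data. This is a minimal illustration that assumes a digitized signal and pairs each frame with the one `lag` frames earlier; the abstract says only "selected pairs", so the pairing rule and the function names are assumptions.

```python
def zc_counts(samples, frame_len, threshold=0.0):
    """Count how often the signal crosses `threshold` in each full frame."""
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        crossings = 0
        for prev, cur in zip(frame, frame[1:]):
            # A crossing occurs when consecutive samples lie on opposite sides.
            if (prev < threshold) != (cur < threshold):
                crossings += 1
        counts.append(crossings)
    return counts


def differential_zc(counts, lag=1):
    """Difference between each ZC count and the count `lag` frames earlier."""
    return [counts[i] - counts[i - lag] for i in range(lag, len(counts))]


if __name__ == "__main__":
    signal = [1, -1, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1]
    counts = zc_counts(signal, frame_len=4)
    print(counts)                   # [3, 0, 2]
    print(differential_zc(counts))  # [-3, 2]
```

Crossings that straddle a frame boundary are not counted in this sketch, since each frame is examined on its own.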
Method and apparatus for frame loss concealment in transform domain
The present document discloses a method and apparatus for compensating for a lost frame in a transform domain, comprising: calculating frequency-domain coefficients of a current lost frame using frequency-domain coefficients of one or more frames prior to the current lost frame, and performing frequency-time transform to obtain an initially compensated signal; and performing waveform adjustment, to obtain a compensated signal. Alternatively, extrapolation is performed for all or part of frequency points of the current lost frame using phases and amplitudes of corresponding frequency points of a plurality of previous frames to obtain phases and amplitudes of the corresponding frequency points of the current lost frame, to obtain frequency-domain coefficients of the corresponding frequency points, and frequency-time transform is performed to obtain a compensated signal. The above methods can be selected through a judgment algorithm to compensate for the current lost frame, thereby achieving a better compensation effect.
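The extrapolation alternative can be sketched per frequency point. This is a simplified illustration under stated assumptions: it uses only two previous frames, holds the amplitude of the most recent frame, and advances the phase linearly, whereas the abstract allows a plurality of frames and does not fix the extrapolation rule. The frequency-time transform and the judgment algorithm are omitted, and the function names are hypothetical.

```python
import cmath


def extrapolate_bin(prev2, prev1):
    """Estimate a lost frame's coefficient at one frequency point.

    prev2 and prev1 are the complex coefficients of the same frequency point
    in the two frames before the lost one (older first).
    """
    amplitude = abs(prev1)  # assumption: hold the latest amplitude
    phase_step = cmath.phase(prev1) - cmath.phase(prev2)
    # Assumption: the phase keeps advancing by the same step.
    return cmath.rect(amplitude, cmath.phase(prev1) + phase_step)


def conceal_frame(frame_m2, frame_m1):
    """Extrapolate every frequency point of the lost frame."""
    return [extrapolate_bin(a, b) for a, b in zip(frame_m2, frame_m1)]


if __name__ == "__main__":
    older = [cmath.rect(1.0, 0.1), cmath.rect(2.0, 0.0)]
    newer = [cmath.rect(1.0, 0.3), cmath.rect(1.0, 0.2)]
    for coeff in conceal_frame(older, newer):
        print(abs(coeff), cmath.phase(coeff))
```

A steady sinusoid advances its phase by a constant amount per frame, which is why linear phase extrapolation suits tonal frequency points; noise-like points would more plausibly be handled by the first method in the abstract.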