Patent classifications
G10L25/60
Electronic device and method of controlling thereof
An electronic device and a method for controlling the electronic device are disclosed. The electronic device of the disclosure includes a microphone, a memory storing at least one instruction, and a processor configured to execute the at least one instruction. The processor, by executing the at least one instruction, is configured to: obtain second voice data by inputting first voice data input via the microphone to a first model trained to enhance sound quality, obtain a weight by inputting the first voice data and the second voice data to a second model, and identify input data to be input to a third model using the weight.
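The three-model flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how the weight is applied, so the linear blend of the two signals is an assumption, and `enhance`, `estimate_weight`, and `select_model_input` are hypothetical names standing in for the first model, the second model, and the step that identifies the third model's input.

```python
from typing import Callable, List

Signal = List[float]


def select_model_input(
    first: Signal,
    enhance: Callable[[Signal], Signal],
    estimate_weight: Callable[[Signal, Signal], float],
) -> Signal:
    """Return the data that would be fed to the third model.

    enhance         -- stand-in for the first model (sound quality enhancement)
    estimate_weight -- stand-in for the second model (returns a scalar weight)
    """
    second = enhance(first)
    # Assumption: the weight is a scalar in [0, 1]; clamp it defensively.
    w = min(1.0, max(0.0, estimate_weight(first, second)))
    # Assumption: the weight linearly blends the enhanced and original signals.
    return [w * s + (1.0 - w) * f for f, s in zip(first, second)]
```

With a weight of 1.0 the third model would receive only the enhanced signal, and with 0.0 only the original microphone signal.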
SYSTEM AND METHOD FOR ENHANCING MULTIMEDIA CONTENT WITH VISUAL EFFECTS AUTOMATICALLY BASED ON AUDIO CHARACTERISTICS
Exemplary embodiments of the present disclosure are directed towards a system for enhancing multimedia content with visual effects based on audio characteristics. The system comprises a computing device and a cloud server. The computing device comprises a multimedia content enhancing module that enables an end-user to record multimedia content using a camera, enables the end-user to select an audio track and combine it with the recorded multimedia content, and sends the audio track and the recorded multimedia content to the cloud server. The cloud server comprises a multimedia analyzing and visual effects retrieving module that receives the audio track and the recorded multimedia content, analyzes their beat characteristics, categorizes visual effects and filters, and delivers them to the computing device. The multimedia content enhancing module displays the categorized visual effects and filters on the computing device, enables the end-user to select and apply them to the multimedia content to create enhanced multimedia content, and enables the end-user to share and post the enhanced multimedia content on the computing device.
UNDERSTANDING AND RANKING RECORDED CONVERSATIONS BY CLARITY OF AUDIO
Systems and methods are provided for generating quality scores associated with a contact (e.g., a telephone call including an agent) and with agents. In particular, the disclosed technology classifies frames of the contact's content as speech or noise, and further classifies noise as standard or non-standard. A frame type determiner determines the type of a frame based on waveform analysis and/or speech and noise models trained through machine learning. Standard noise is noise that is expected and consistent across contacts and agents (e.g., hold music). Non-standard noise is noise that is unexpected in its occurrence and source (e.g., a barking dog, a siren from the street, and the like). The disclosed technology enables assessing contacts and agents based on issues associated with remote working environments, which vary among agents.
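A minimal sketch of how per-frame labels might roll up into contact and agent scores is shown here. The abstract does not give a scoring formula, so treating the score as the share of frames free of non-standard noise is an assumption, as are the names `FrameType`, `contact_quality_score`, and `agent_quality_score`. The frame type determiner itself (waveform analysis or trained models) is out of scope, and the labels are taken as given.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class FrameType(Enum):
    SPEECH = "speech"
    STANDARD_NOISE = "standard_noise"          # e.g. hold music
    NON_STANDARD_NOISE = "non_standard_noise"  # e.g. barking dog, siren


def contact_quality_score(frame_types: Sequence[FrameType]) -> float:
    """Score one contact in [0, 1]; 1.0 means no non-standard noise frames."""
    if not frame_types:
        raise ValueError("a contact needs at least one frame")
    counts = Counter(frame_types)
    # Assumption: only non-standard noise lowers the score, because standard
    # noise is expected and consistent across contacts and agents.
    return 1.0 - counts[FrameType.NON_STANDARD_NOISE] / len(frame_types)


def agent_quality_score(contact_scores: Sequence[float]) -> float:
    """Assumption: an agent's score is the mean of that agent's contact scores."""
    if not contact_scores:
        raise ValueError("an agent needs at least one scored contact")
    return sum(contact_scores) / len(contact_scores)
```

Keeping standard noise out of the penalty reflects the distinction the abstract draws: hold music says nothing about an agent's working environment, while a barking dog does.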
Method and apparatus for determining experience quality of VR multimedia
A method for determining experience quality of virtual reality (VR) multimedia includes: in a process of playing VR multimedia, obtaining a first sensory parameter, a second sensory parameter, and a third sensory parameter of the VR multimedia; and determining a mean opinion score (MOS) of the VR multimedia based on the three sensory parameters. The three sensory parameters are obtained by sampling separately according to at least two of the same perceptual dimensions, and are parameters that affect fidelity experience, enjoyment experience, and interaction experience, respectively. Because the third sensory parameter affects the interaction experience, the interaction feature of the VR multimedia is taken into account.
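One way to picture the final step is a weighted combination of the three parameters mapped onto the conventional 1 to 5 MOS scale. The abstract does not state the combining function, so the weighted mean, the normalization of each parameter to [0, 1], and the name `mean_opinion_score` are all assumptions made for illustration.

```python
from typing import Sequence


def mean_opinion_score(
    fidelity: float,
    enjoyment: float,
    interaction: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Map three sensory parameters, each assumed normalized to [0, 1], to a MOS in [1, 5]."""
    params = (fidelity, enjoyment, interaction)
    if any(not 0.0 <= p <= 1.0 for p in params):
        raise ValueError("each sensory parameter must lie in [0, 1]")
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be three non-negative values, not all zero")
    # Assumption: a weighted mean combines the three experience dimensions.
    combined = sum(w * p for w, p in zip(weights, params)) / sum(weights)
    # Linear map from [0, 1] onto the 1 (bad) to 5 (excellent) MOS scale.
    return 1.0 + 4.0 * combined
```

Raising the third weight would make the score more sensitive to interaction experience, which is the dimension the abstract singles out.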
Sound signal processing system apparatus for avoiding adverse effects on speech recognition
A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.
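The split between the two apparatuses can be sketched as below. This is an illustration under stated assumptions: the abstract does not name the non-linear processing, so a simple noise gate stands in for it, and `SoundPacket`, `capture_side`, and `host_side` are hypothetical names. The two processing steps are passed in as callables because the abstract only requires that they differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Samples = Tuple[float, ...]


@dataclass(frozen=True)
class SoundPacket:
    pre: Samples   # signal before non-linear processing
    post: Samples  # signal after non-linear processing


def noise_gate(samples: Samples, threshold: float = 0.1) -> Samples:
    """Assumed example of non-linear processing: zero out quiet samples."""
    return tuple(s if abs(s) >= threshold else 0.0 for s in samples)


def capture_side(
    samples: Sequence[float],
    nonlinear: Callable[[Samples], Samples] = noise_gate,
) -> SoundPacket:
    """Sound signal processing apparatus: send both versions of the signal."""
    pre = tuple(samples)
    return SoundPacket(pre=pre, post=tuple(nonlinear(pre)))


def host_side(
    packet: SoundPacket,
    first_processing: Callable[[Samples], A],
    second_processing: Callable[[Samples], B],
) -> Tuple[A, B]:
    """Information processing apparatus: a different step for each signal."""
    return first_processing(packet.pre), second_processing(packet.post)
```

Given the title, a plausible use is to pass a speech recognizer as `first_processing` so that it sees the unprocessed signal, while the processed signal goes to a step such as playback or transmission. That pairing is an inference from the title, not something the abstract states.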