METHODS TO ASSIST VERBAL COMMUNICATION FOR BOTH LISTENERS AND SPEAKERS
20230122715 · 2023-04-20
Abstract
Methods implemented in a system utilizing computing programs for a speaker and a listener in conversation are provided. Aspects include (i) a reminder provisioner for a speaker, which is triggered according to the speed, pitch or volume of the speaker's speech, (ii) a speech training provisioner for a speaker, and (iii) an application which records and plays back conversation that is difficult to understand.
Claims
1. A system for assisting speakers in conversation comprising: one or more microphones built into a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic devices that a speaker or a listener brings to give the speaker feedback; and one or more computer programs which trigger feedback according to audio data said microphone receives, which is evaluated by one or more of: the speech speed calculated in characters or words per second or other metrics; the pitch/frequency (Hz) and its transition; and the volume (dB).
2. The system of claim 1, wherein a speaker gets feedback by one or more of: a haptic feedback by vibration through an electronic device users have; a visual feedback by screen or user interface of an electronic device users have; and an aural feedback by a speaker or an audio output device users have.
3. The system of claim 1, wherein audio data is evaluated whether it fits within range of the minimum and the maximum value with regards to one or more of speech speed, pitch and volume.
4. The system of claim 1, wherein the threshold values (minimum and maximum values) in evaluation of speakers' speech, which are utilized to determine when to give users feedback, are configured manually by users or by the system, which sets preset threshold values for usually difficult sound or by the system, which provides personalized values according to each user's hearing capability by utilizing the previously recorded audio data labeled as difficult and training the system as described in claim 6.
5. The system of claim 2, wherein said feedback to a speaker comprises speech training by visualization of one or more of pitch, speed and volume of a speaker's speech through a screen or other user interfaces, which shows one or more of: charts, text or numeric information regarding the speaker's speech; one or multiple graphical objects changing in color, size or shape; and content with an element of gamification or coaching utilizing said two elements.
6. A system for assisting listeners in conversation comprising: one or more microphones built into a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic devices that a speaker or a listener brings to give a user feedback; and one or more computer programs which record conversation and let the user play back or save the audio data capturing a part of the conversation which is difficult to understand, so that users can understand the missed conversation immediately or later when they have time to check back, through a trigger by users, by the system which sets preset threshold values in evaluating speech as described in claim 1, or by the system which provides personalized threshold values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.
7. The system of claim 6, wherein duration of said audio data to be played back or saved is short enough to comfortably check or listen back to, and the option of the duration of one audio data to be played back or saved comprises one of: a preset value by the system ranging between 0 second and 30 seconds; or a customized value configured by the user in the system.
8. The system of claim 6, wherein the configurable setting comprises changing one or more of speed, pitch and volume of said audio data to be played back or saved so that a user can understand better, by the user or by the system which provides the personalized values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.
Description
DESCRIPTION OF DRAWINGS
[0006]
[0007]
[0008]
[0009]
DETAILED DESCRIPTION
[0010] In the following description of verbal communication assisting technique implementations, reference is made to the accompanying drawings, which form a part hereof and in which specific implementations in which the verbal communication assisting technique can be practiced are shown by way of illustration. It is understood that other implementations can be utilized and structural changes can be made without departing from the scope of the verbal communication assisting technique implementations.
[0011]
[0012] Component 1
[0013] On behalf of a listener (104), the system lets a speaker (102) know that the speaker's speech is difficult to understand. It does so by evaluating the speed, pitch and volume of the speech.
[0014] For input, the system is operable with any type of end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
[0015] Upon receiving audio input (114, 116) from the microphone, a computing program in the system evaluates the following factors of the speech (202):
[0016] Speech speed: voice data is converted into text by a transcription (audio-to-text) tool (such as the JavaScript SpeechRecognition API). The length of the text divided by the duration of the speech gives characters or words per second, which can be used as a metric to evaluate the speed of speech.
[0017] Volume: voice data is converted into a numeric value by a computing program such as the sound volume detection program in the p5.js library in JavaScript.
[0018] Pitch: voice data is converted into a numeric value by a computing program such as the pitch detection program in the ml5.js library (CREPE) in JavaScript.
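The speech speed calculation in [0016] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `speechRate` is hypothetical, the transcript is assumed to have already been produced by a transcription tool, and excluding whitespace from the character count is an assumption (the text only says "the length of the text").

```javascript
// Hypothetical helper: speech rate from a finished transcript and the clip duration.
function speechRate(transcript, durationSeconds) {
  if (!(durationSeconds > 0)) {
    throw new RangeError("durationSeconds must be positive");
  }
  const text = transcript.trim();
  // Assumption: whitespace is not counted as a character.
  const characters = text.replace(/\s+/g, "").length;
  const words = text === "" ? 0 : text.split(/\s+/).length;
  return {
    charactersPerSecond: characters / durationSeconds,
    wordsPerSecond: words / durationSeconds,
  };
}

// 24 non-space characters and 5 words spoken over 4 seconds.
const rate = speechRate("please speak a little slower", 4);
console.log(rate.charactersPerSecond, rate.wordsPerSecond); // 6 1.25
```

Either value can then serve as the speed metric; which one is more suitable depends on the language, since word boundaries are not marked by spaces in every writing system.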
[0019] Each value is checked against a minimum and a maximum value, and, depending on the result, feedback to the speaker (102) is triggered in the system (206).
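The range check in [0019] might look like the sketch below. The function name `findOutOfRange`, the metric names, and the shape of the threshold object are all hypothetical, and the limits in the usage lines are illustrative values only.

```javascript
// Hypothetical threshold shape: { metricName: { min?: number, max?: number } }.
// A missing min or max means that side is unbounded.
function findOutOfRange(metrics, thresholds) {
  const violations = [];
  for (const [name, limits] of Object.entries(thresholds)) {
    const value = metrics[name];
    // Skip metrics that were not measured for this stretch of audio.
    if (typeof value !== "number" || Number.isNaN(value)) continue;
    if (limits.min !== undefined && value < limits.min) {
      violations.push({ name, value, reason: "below-min" });
    } else if (limits.max !== undefined && value > limits.max) {
      violations.push({ name, value, reason: "above-max" });
    }
  }
  return violations;
}

// Illustrative limits: a speed ceiling with no floor, and a volume band.
const thresholds = {
  charactersPerSecond: { max: 3 },
  volumeDb: { min: 45, max: 65 },
};
const violations = findOutOfRange({ charactersPerSecond: 4.2, volumeDb: 50 }, thresholds);
const shouldTriggerFeedback = violations.length > 0;
console.log(shouldTriggerFeedback, violations);
```

Returning the list of violations rather than a bare boolean lets the feedback step say which factor (speed, pitch or volume) the speaker should adjust.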
[0020] In an example, the preset threshold values (minimum/maximum), which the user can reconfigure, are as follows:
[0021] Speech speed (characters per second): In an example for English, 3 characters per second is set as the maximum value and no minimum value is set.
[0022] Volume: 45 dB for the minimum value and 65 dB for the maximum value.
[0023] Pitch: The share of speech time during which the pitch stays within a 1% range has to be less than 30% (maximum value). In speech-language pathology, speech with rich tone (rich change of pitch) is considered easy for hard-of-hearing people to understand.
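One way to compute the pitch criterion in [0023] is sketched below. The text does not define "within 1%" precisely, so this sketch makes a loud assumption: a frame counts as flat when its pitch differs from the previous frame's pitch by at most 1%, and frames are taken at a fixed interval so that a ratio of frames equals a ratio of time. The function name `flatPitchRatio` and the convention that `null` or `0` marks an unvoiced frame are hypothetical.

```javascript
// pitchesHz: pitch estimates at a fixed frame interval; null or 0 marks an unvoiced frame.
// Assumption: "flat" means a frame-to-frame relative change of at most `tolerance`.
function flatPitchRatio(pitchesHz, tolerance = 0.01) {
  let voicedPairs = 0;
  let flatPairs = 0;
  for (let i = 1; i < pitchesHz.length; i++) {
    const previous = pitchesHz[i - 1];
    const current = pitchesHz[i];
    if (!previous || !current) continue; // skip pairs touching an unvoiced frame
    voicedPairs++;
    if (Math.abs(current - previous) / previous <= tolerance) flatPairs++;
  }
  return voicedPairs === 0 ? 0 : flatPairs / voicedPairs;
}

// Three of the five consecutive pairs change by 1% or less.
const ratio = flatPitchRatio([200, 201, 200.5, 230, 180, 181]);
const tooMonotone = ratio >= 0.3; // the example limit of 30% from the text
console.log(ratio, tooMonotone); // 0.6 true
```

Other readings of the criterion are possible, for example measuring deviation from a running average instead of from the previous frame; the comparison against the 30% limit would stay the same.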
[0024] The threshold values can be configured manually by the user or automatically by the system. In the automatic case, the system either applies preset values for sounds that are normally difficult to understand, or learns each user's hearing preference/capability from audio data labeled as difficult, as described later in Component 3.
[0025] Upon receiving trigger information from said evaluation, the system gives feedback (204) in ways such as the following:
[0026] The system gives speakers haptic feedback through a wristband with a vibrator, a mobile phone or a wearable device (such as Apple Watch, Fitbit or Galaxy Watch) which can be programmed to vibrate for the user.
[0027] The system gives speakers visual feedback by showing numeric information or a graphical representation of speech speed, pitch or volume on the screen or user interface of a computing device, such as a mobile phone, tablet or wearable device, among others.
[0028] The system gives speakers aural feedback by playing a sound through a loudspeaker built into said computing devices.
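The three feedback channels above could be wired together as in the following sketch. Everything here is hypothetical: the name `createFeedbackDispatcher`, the idea of passing each channel in as a function, and in particular the minimum interval between notifications, which is an added assumption (the text does not mention rate limiting) intended to keep the speaker from being interrupted continuously.

```javascript
// Hypothetical dispatcher: each channel is a function supplied by the host device.
function createFeedbackDispatcher(channels, minIntervalMs = 3000) {
  let lastSentAt = -Infinity;
  return function dispatch(message, nowMs = Date.now()) {
    // Assumption, not from the text: suppress feedback that follows too closely.
    if (nowMs - lastSentAt < minIntervalMs) return [];
    lastSentAt = nowMs;
    const used = [];
    for (const [name, send] of Object.entries(channels)) {
      if (typeof send === "function") {
        send(message);
        used.push(name);
      }
    }
    return used; // names of the channels that were actually notified
  };
}

const dispatch = createFeedbackDispatcher({
  haptic: (msg) => console.log("vibrate:", msg),
  visual: (msg) => console.log("show:", msg),
  aural: null, // channel disabled on this device
});
dispatch("Please slow down");
```

In a browser, the haptic function might call `navigator.vibrate(200)` on devices that support the Vibration API; on a smartwatch it would call whatever vibration facility the platform offers.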
[0029]
[0030] Component 2
[0031] Referring again to
[0032] For input, as in Component 1, the system is operable with any type of end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
[0033] Upon receiving audio data through the microphone (114), a computing program in the system evaluates the following factors of the speech (302):
[0034] Speech speed: voice data is converted into text by a transcription (audio-to-text) tool (such as the JavaScript SpeechRecognition API). The length of the text divided by the duration of the speech gives characters or words per second, which can be used as a metric to evaluate the speed of speech.
[0035] Volume: voice data is converted into a numeric value by a computing program such as the volume detection program in the p5.js library in JavaScript.
[0036] Pitch: voice data is converted into a numeric value by a computing program such as the pitch detection program in the ml5.js library (CREPE) in JavaScript.
[0037] From the values measured and calculated in this way, a computing program creates content (304) such as charts, text, numeric information or graphical objects in a browser program such as Google Chrome or Firefox (306, 308). In a user test, it was considered effective to interactively control the size of a graphical object by the volume of the voice or surrounding sounds, and to control its color by the pitch of the voice.
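The mapping from volume to size and from pitch to color can be sketched as a pure function whose result a drawing layer would then render. The function name `visualForSpeech`, the decibel and hertz ranges, the pixel sizes, and the blue-to-red hue scale are all hypothetical defaults chosen for illustration.

```javascript
function clamp01(x) {
  return Math.min(1, Math.max(0, x));
}

// All ranges below are hypothetical defaults, not values from the text.
function visualForSpeech(volumeDb, pitchHz, options = {}) {
  const { minDb = 30, maxDb = 80, minHz = 80, maxHz = 400, minSize = 20, maxSize = 200 } = options;
  const loudness = clamp01((volumeDb - minDb) / (maxDb - minDb));
  const height = clamp01((pitchHz - minHz) / (maxHz - minHz));
  // Louder speech gives a larger object.
  const diameter = Math.round(minSize + loudness * (maxSize - minSize));
  // Low pitch maps to blue (hue 240), high pitch to red (hue 0).
  const hue = Math.round(240 - height * 240);
  return { diameter, color: `hsl(${hue}, 80%, 50%)` };
}

console.log(visualForSpeech(55, 240)); // { diameter: 110, color: 'hsl(120, 80%, 50%)' }
```

Keeping the mapping separate from the rendering code means the same values can drive a canvas sketch, a DOM element's style, or a wearable's display.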
[0038] The system can also include a gamification element, in which a graphical object represents the speaker's way of speaking and a set of rules or a target line is provided to attract more interest from users such as hard-of-hearing children.
[0039] The system can further include a speech training/coaching element that advises users to change or keep their way of speaking according to the evaluation.
[0040]
[0041] Component 3
[0042] People with hearing difficulty can struggle to understand a sentence when they miss one or more words. Even if they miss only a few words, they may find it hard and stressful to keep asking the speaker to repeat them. To address this difficulty, the system lets a listener (104) recover the missed part of the conversation by herself/himself.
[0043] For input, as in Component 1, the system is operable with any type of end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
[0044] Prior to use of this implementation, the participants in the conversation should agree to the conversation being recorded.
[0045] Upon receiving audio input (114, 116) through a microphone, a computing program in the system records (402) the audio data and divides it into multiple small blocks. The system can record and upload audio blocks to a server through a computing program such as Recorder.js in JavaScript.
[0046] During conversation, when a listener finds something hard to hear, she/he can trigger the system (410) to save (404) and play back (406) the most recent audio data, which is kept short enough to listen back to comfortably (414). In user testing, 10 seconds was considered an effective duration for the audio data to be played back, but a user can also change the playback duration. Also, rather than playing back the audio right away, a user can save/mark the difficult audio and play it back later (412).
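Recording in small blocks and retrieving the most recent few seconds on demand can be sketched with a rolling buffer. The class name `RollingAudioBuffer`, its methods, and the block and retention lengths are hypothetical; a block here is any value (for example a chunk of samples from the recorder), and uploading to a server is left out.

```javascript
// Hypothetical rolling buffer of fixed-length audio blocks.
class RollingAudioBuffer {
  constructor(blockSeconds, maxSeconds) {
    this.blockSeconds = blockSeconds;
    this.maxBlocks = Math.ceil(maxSeconds / blockSeconds);
    this.blocks = []; // oldest first
    this.saved = []; // clips marked for later playback
  }

  // Append a newly recorded block and drop the oldest one when full.
  push(block) {
    this.blocks.push(block);
    if (this.blocks.length > this.maxBlocks) this.blocks.shift();
  }

  // Return the blocks covering roughly the last `seconds` of audio.
  recent(seconds) {
    const count = Math.ceil(seconds / this.blockSeconds);
    if (count <= 0) return [];
    return this.blocks.slice(-count);
  }

  // Save the recent clip so the listener can play it back later.
  mark(seconds) {
    const clip = this.recent(seconds);
    this.saved.push(clip);
    return clip;
  }
}

const buffer = new RollingAudioBuffer(2, 30); // 2-second blocks, keep 30 seconds
for (let i = 0; i < 20; i++) buffer.push(`block-${i}`);
console.log(buffer.recent(10)); // the last five blocks, block-15 to block-19
```

With this structure, changing the playback duration is only a matter of passing a different number of seconds to `recent` or `mark`, up to the retained maximum.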
[0047] Playback or marking can also be triggered automatically by the system. The system can classify the audio data that a user previously played back as difficult sound and learn the user's personal hearing capability/preference through a machine learning process (408).
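The text does not specify the machine learning process. As a deliberately simple stand-in, the sketch below fits a one-feature decision stump: it picks the cut-off on a single metric (for example characters per second) that best separates clips the user replayed from clips the user did not. The function name `learnMaxThreshold`, the sample format, and the treatment of non-replayed clips as "not difficult" are all assumptions.

```javascript
// samples: [{ value: number, difficult: boolean }]
// Assumption: clips the user did not replay are labeled difficult: false.
function learnMaxThreshold(samples) {
  const sorted = [...samples].sort((a, b) => a.value - b.value);
  let best = { threshold: Infinity, errors: Infinity };
  // Candidate thresholds are the midpoints between neighbouring values.
  for (let i = 0; i < sorted.length - 1; i++) {
    const threshold = (sorted[i].value + sorted[i + 1].value) / 2;
    // A clip is predicted difficult when its value exceeds the threshold.
    const errors = sorted.filter((s) => (s.value > threshold) !== s.difficult).length;
    if (errors < best.errors) best = { threshold, errors };
  }
  return best;
}

const history = [
  { value: 2.0, difficult: false },
  { value: 2.5, difficult: false },
  { value: 3.0, difficult: false },
  { value: 4.0, difficult: true },
  { value: 5.0, difficult: true },
];
console.log(learnMaxThreshold(history)); // { threshold: 3.5, errors: 0 }
```

The learned threshold could then replace the preset maximum for that user. A production system would likely combine several features (speed, pitch, volume) and use a more capable model.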
[0048] When playing back, a user can change the speed, volume or pitch of the conversation so that it is easier for the user to understand.
[0049]