INTEGRATED SPEECH RECOGNITION TEXT INPUT WITH MANUAL PUNCTUATION
20180342248 · 2018-11-29
Inventors
CPC classification
G10L15/22
PHYSICS
G06F3/023
PHYSICS
H04W4/14
ELECTRICITY
H04M1/72466
ELECTRICITY
International classification
G06F3/023
PHYSICS
G10L15/22
PHYSICS
H04M1/27
ELECTRICITY
H04W4/14
ELECTRICITY
Abstract
An integrated system and method for text input that combines and synchronizes speech input with manual input, improving both the speed and the accuracy of speech-recognition-based text input when punctuation and other symbols are needed and when speech-recognition results are to be combined with previously existing text. The system uses the strong points of speech-recognition technology, which are speed and comfort when inputting common words, together with the strong points of manual key-typing, which are speed, comfort and accuracy when inputting punctuation marks, symbols, or pre-defined text with a single click. It increases the speed, accuracy and comfort of speech-recognition text input by solving the problems of current voice-typing methods, and by further using the data from the manual input to improve speech-recognition results.
Claims
1. A text-input system and method comprising: a speech-recognition module; and a manual input module, specifically for punctuation marks, emoji symbols, digits and other non-alphabet symbols, simultaneously enabled with said speech-recognition module; and an integration module that synchronizes and combines said speech recognition module and said manual-input module and their corresponding inputs and results.
2. The text-input article of claim 1, implemented on a mobile phone.
3. The text-input article of claim 1, implemented on a pc.
4. The text-input article of claim 1, implemented on a virtual reality or augmented reality device.
5. The text-input article of claim 1, wherein said manual input module comprises a virtual keyboard.
6. The text-input article of claim 1, wherein said manual input module comprises a hardware keyboard.
7. The text-input article of claim 1, wherein said manual input module is always available and enabled, including when said speech recognition module is capturing or processing speech.
8. The text-input article of claim 1, wherein said speech recognition module is always available and enabled, even when said manual input is being used.
9. The text-input article of claim 1, wherein said manual input is used after speech was spoken but before speech results are finalized, such that final-resulting text includes integrated results of both the complete speech results and the symbol from the manual input, in the order of input: symbol after speech, and not in the order of results: symbol before speech results.
10. The text-input article of claim 1, wherein said integration module determines whether a space character should be inserted between speech results from said speech recognition module and a punctuation mark or symbol from said manual input module, in either order, based on the specific said mark and said speech results, and inserts the space character when it determines that one is needed.
11. The text-input article of claim 1, wherein said integration module sends punctuation marks or symbols entered by said manual input module after speech was spoken, but before speech results were finalized to said speech-recognition module.
12. The text-input article of claim 1, wherein said speech recognition module takes a punctuation mark or other non-alphabetical symbol as an additional input for the speech processing algorithms and in evaluating the confidence level of speech-recognition results.
13. The text-input article of claim 1, wherein manual input by said manual input module, while said speech recognition module processes prior speech, signals to said speech-recognition module that the current speech utterance is done, enabling said speech recognition module to immediately stop waiting for more speech or a recognizable pause in order to finalize speech results.
14. The text-input article of claim 1, wherein manual input by said manual input module, while said speech recognition module processes prior speech, signals to said speech-recognition module that the current speech utterance is done, enabling said speech recognition module to further process the speech utterance as a complete utterance using sentence-level context in order to improve speech results.
15. The text-input article of claim 1, wherein said manual input module comprises keys that represent full pre-defined (by user or by system) text.
16. The text-input article of claim 1, wherein said manual input module comprises control commands for the currently-processed speech, wherein said control-commands include: End speech utterance; and Cancel speech utterance; and Finalize speech results.
17. The text-input article of claim 1, wherein said manual input module comprises ambiguity-resolutions for the currently-processed speech, based on incoming partial speech results from said speech recognition, enabling real-time selection of best result out of possible ambiguous results.
18. The text-input article of claim 1, wherein said integration-module automatically decides on capitalization of speech-recognition results based on text already existing in the text-field prior to the current caret position.
19. The text-input article of claim 1, wherein said integration-module automatically decides on inserting a space character prior to inserting the speech-recognition results based on text already existing in the text-field prior to the current caret position.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0022]
[0023]
[0024]
[0025]
[0026]
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
[0027] The invention is an integrated system and method for text input that combines and synchronizes speech input with manual input, improving both the speed and the accuracy of speech-recognition-based text input when punctuation and other symbols are needed and when speech-recognition results are to be combined with previously existing text. As such, the invention integrates speech recognition with manual punctuation input, smart context-aware text insertion and user-enhanced real-time ambiguity resolution. The term punctuation input covers both punctuation marks and any other symbols that are significant for the text input following a speech sequence or for the speech-recognition flow, such as end-sequence and cancel-sequence commands.
[0028] Embodiments of these aspects of the invention are discussed with reference to
[0029]
[0030]
[0031]
[0032]
[0033] After the end of sequence is triggered, the speech recognizer begins final processing of the buffered sequence, possibly also using in its analysis the specific punctuation mark that was typed. The ending punctuation mark can hold valuable information about the underlying sentence, which the speech recognizer uses to weigh the statistical likelihood of the different possible results. For instance, if the speech recognizer obtained two possible results for the first words of the utterance, "What are the . . ." and "Water the . . .", then knowing whether the punctuation mark is a question mark or a period helps determine the likelihood of option 1 versus option 2. By typing the punctuation mark, the user therefore helps the speech-recognition algorithms return the more accurate result. If the user typed "?", that information alone makes "What are the . . ." the more likely beginning of the sentence, whereas if the user typed ".", "Water the . . ." is more likely. These considerations are added, when helpful, to the statistical models that calculate the confidence level of each result.
[0034] Since the marks can be typed, there is no need to dictate them, which eliminates possible ambiguity in the speech recognizer's understanding of the marks themselves. The accuracy of the whole text input is therefore improved. For instance, dictating "period" is ambiguous (it could mean either a length of time or a punctuation mark) even when the word is recognized correctly. The situation would be even more ambiguous if the speech recognizer did not fully recognize the mark. All these sources of mistakes are completely eliminated when the user is able to manually type the desired punctuation mark.
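The re-weighting described above can be illustrated with a minimal sketch. It is not the recognizer's actual model: the function name rescore, the table FIRST_WORD_GIVEN_MARK, the fallback DEFAULT_PRIOR and all probability values are assumptions made up for this example. Each hypothesis carries a log-probability from the recognizer, and the typed mark contributes an additional log-prior based on the first word of the hypothesis.

```python
import math

# Hypothetical prior: how likely a sentence ending with a given mark is to
# begin with a given first word. The numbers are illustrative, not measured.
FIRST_WORD_GIVEN_MARK = {
    "?": {"what": 0.30, "water": 0.01},
    ".": {"what": 0.02, "water": 0.05},
}
DEFAULT_PRIOR = 0.01  # assumed fallback for unseen words or marks


def rescore(hypotheses, mark, weight=1.0):
    """Re-rank (text, log_prob) pairs using the manually typed mark."""
    priors = FIRST_WORD_GIVEN_MARK.get(mark, {})
    scored = []
    for text, log_prob in hypotheses:
        words = text.split()
        first_word = words[0].lower() if words else ""
        prior = priors.get(first_word, DEFAULT_PRIOR)
        # Combine the recognizer score with the punctuation-based prior.
        scored.append((text, log_prob + weight * math.log(prior)))
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# The recognizer alone slightly prefers "Water the".
hypotheses = [("What are the", -1.2), ("Water the", -1.0)]
best_for_question = rescore(hypotheses, "?")[0][0]
best_for_period = rescore(hypotheses, ".")[0][0]
```

With these assumed numbers, a typed question mark moves "What are the" to the top, while a typed period leaves "Water the" as the best result. A real system would derive such priors from its language model rather than from a hand-written table.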
[0035] The typed punctuation mark is appended to the speech results 324, and the integrated results are then inserted into the text element 360. The smart-insertion process is described in more detail in
[0036] In parallel with, or right after, finalizing the speech results for the current sequence, a new sequence is started 312, so the user can dictate and type continuously.
[0037] When manual input is typed while no buffered speech is being recognized, the keyboard simply acts as a regular keyboard, and the typed symbols are inserted 370 into the text element.
[0038] Finally, the caret position is updated 380 to the end of the inserted text, making it ready for subsequent text results.
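The two cases described above (a mark typed while speech is buffered, and a mark typed with no buffered speech) can be sketched as a small event handler. This is an illustration under stated assumptions, not the disclosed implementation: the class name IntegrationModule, the methods on_speech and on_key, and the finalize_speech callback standing in for the speech recognizer are all hypothetical, and spacing and capitalization are deliberately left out.

```python
class IntegrationModule:
    """Hypothetical sketch of routing manual input alongside speech input."""

    def __init__(self, finalize_speech):
        # finalize_speech is an assumed callback: (chunks, mark) -> final text.
        self.finalize_speech = finalize_speech
        self.text = ""    # contents of the text element
        self.buffer = []  # buffered speech of the current sequence

    def on_speech(self, chunk):
        # Speech arrives while the manual input stays enabled.
        self.buffer.append(chunk)

    def on_key(self, mark):
        if self.buffer:
            # The typed mark ends the sequence and is passed to the recognizer.
            result = self.finalize_speech(self.buffer, mark)
            # The mark is appended to the speech results (324) and the
            # integrated result is inserted into the text element (360).
            self.text += result + mark
            # A new sequence starts (312).
            self.buffer = []
        else:
            # No buffered speech: act as a regular keyboard (370).
            self.text += mark


# A stub recognizer that simply joins the buffered chunks.
module = IntegrationModule(lambda chunks, mark: " ".join(chunks))
module.on_speech("hello")
module.on_speech("world")
module.on_key("!")   # finalizes the buffered speech, then appends "!"
module.on_key("?")   # no speech buffered, inserted directly
```

Note that the mark ends up after the speech text even though it was typed before the speech results were finalized, which is the ordering the claims call for.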
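The insertion and caret update can be sketched as follows, together with the capitalization and spacing decisions that the claims attribute to the integration module. The rules used here are simplifying assumptions for illustration only: the function name smart_insert and the sets SENTENCE_ENDERS, OPENERS and ATTACHED_MARKS are hypothetical, and a real system would likely use richer, language-specific rules.

```python
SENTENCE_ENDERS = set(".?!")       # assumed: capitalize after these marks
OPENERS = set("([{")               # assumed: no space right after these
ATTACHED_MARKS = set(".,?!:;)]}")  # assumed: marks that attach to the word


def smart_insert(text, caret, speech, mark=""):
    """Insert speech results plus a typed mark at the caret.

    Returns the new text and the new caret position.
    """
    before, after = text[:caret], text[caret:]
    stripped = before.rstrip()

    # Capitalize at the start of the field or after a sentence-ending mark.
    if speech and (not stripped or stripped[-1] in SENTENCE_ENDERS):
        speech = speech[0].upper() + speech[1:]

    # Add a leading space unless the preceding text already provides a break.
    needs_space = (
        bool(speech)
        and bool(before)
        and not before[-1].isspace()
        and before[-1] not in OPENERS
    )

    # Decide whether the mark attaches to the speech or is spaced from it.
    separator = " " if speech and mark and mark not in ATTACHED_MARKS else ""

    piece = (" " if needs_space else "") + speech + separator + mark
    # The caret moves to the end of the inserted text.
    return before + piece + after, caret + len(piece)


text, caret = smart_insert("", 0, "hello world", ".")
text, caret = smart_insert(text, caret, "how are you", "?")
```

After these two calls, text is "Hello world. How are you?" and caret is 25, the position just after the question mark, so the next results are inserted at the end.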
[0039]
[0040] While particular embodiments of the present invention are illustrated and described, it would be obvious to those skilled in the art that various other changes and modifications can be made without departing from the spirit and scope of the invention.