Multimodal Text Input by a Keyboard/Camera Text Input Module Replacing a Conventional Keyboard Text Input Module on a Mobile Device
20170300128 · 2017-10-19
Inventors
CPC classification
G06F3/02
PHYSICS
G06F40/274
PHYSICS
G06F40/58
PHYSICS
G06F3/04886
PHYSICS
G06F2203/0381
PHYSICS
G06V30/224
PHYSICS
G09G5/00
PHYSICS
G06F3/0227
PHYSICS
International classification
G06F3/023
PHYSICS
G06F3/0488
PHYSICS
Abstract
The present invention relates to a method and a module for a multimodal text input in a mobile device (1) via a keyboard or in a camera mode by holding the camera of the mobile device (1) over a written text, such that an image is taken of the written text and the written text is recognized, wherein the input text is output to an application requesting the input text, the method comprising the following steps: a) activating a keyboard mode; b) providing an A-Z-keyboard in a first field (4) for text input; c) activating the camera mode; d) capturing the image of the written text and displaying the captured image with the written text in a second field (5) of a display (2) of the mobile device (1); e) converting the captured image to character text by optical character recognition (OCR) and displaying the recognized character text on the display (2); outputting a selected character as the input text to the application upon a selection of the character on the A-Z-keyboard, or outputting a selected part of the recognized character text as the input text to the application upon a selection of the part of the recognized character text; wherein the respective selection takes place by a single keypress or control command, or by a single gesture.
Claims
1-15. (canceled)
16. A method for multimodal text input in a mobile device comprising a keyboard and a camera, the method comprising: requesting an input text; providing a keyboard and camera module comprising a keyboard mode for character selection and a camera mode, said keyboard mode for character selection comprising a frame for displaying input text in a display of a mobile device comprising a first field, and an A-Z keyboard for text input in said first field; and said camera mode comprising a second field and configured to capture an image of a written text; display said captured image in said second field; convert said captured image to character text by optical character recognition (OCR); and display the recognized character text in said frame, wherein said capture, display of said captured image, conversion, and display of said recognized character text is capable of being accomplished through a single operation; and outputting one or more of said selected characters and said recognized character text, or a combination of both, as input text to an application requesting said input text.
17. The method according to claim 16, wherein said frame of said display is a touch screen and said A-Z keyboard is a touch screen keyboard.
18. The method according to claim 16, wherein said recognized character text is displayed in one or more third fields of said frame or displayed as a text overlay of said captured image in said second field.
19. The method according to claim 16, wherein one or more suggestion candidates of said recognized character text are determined by an algorithm in connection with a database and are displayed in one or more third fields or as another overlay within said second field, and said one or more suggestion candidates are selectable by a keypress event on a key, wherein said key is one or more of a group consisting of a visible key; and a hidden key within one or more of a group consisting of one or more of the fields of said display, on the recognized character text, and on another text on said display.
20. The method according to claim 19, wherein said keypress event providing said selection is from the group consisting of a single mechanical keypress, a single touch keypress and a single swipe gesture on a touch screen.
21. The method according to claim 16, wherein frame size is the same frame size of an original standard keyboard module on said mobile device.
22. The method according to claim 16, wherein said second field is displayed adjacent to said A-Z keyboard in said first field.
23. The method according to claim 16, wherein the steps of capturing, displaying said captured image, converting, and displaying said recognized character text are executed repetitively until a certain keypress is detected to end said repetitive execution, wherein the respective latest recognized text in the part of the respective latest captured image is analyzed for new text in regards to a previously recognized character text that is output to said application requesting said input text, whereupon a control command is generated for the selection of the new text as the recognized character text, and said new text is output to said application requesting said input text.
24. The method according to claim 23, wherein said control command for the selection of said recognized character text is generated automatically via a detection algorithm, wherein said detection algorithm recognizes whether said recognized character text in a previously captured image and in the current captured image are the same.
25. The method according to claim 16, wherein: said keyboard mode for character selection is executed by a keyboard text input module; said camera mode is executed by a camera text input module, wherein the execution of said keyboard text input module is independent from the execution of said camera text input module, but wherein the execution of said camera text input module is dependent on the execution of said keyboard text input module; wherein said camera text input module is always active in the background to detect whether said keyboard text input module is active, wherein detection of execution of said keyboard text input module activates said camera mode, wherein at least the second field and the recognized character text are visible on the display; and wherein said camera mode is deactivated if said camera text input module detects that said keyboard text input module ceases to be active.
26. A method for multimodal text input in a mobile device comprising a keyboard and a camera, the method comprising: requesting an input text; providing a keyboard and camera module comprising a keyboard mode for character selection and a camera mode, said keyboard mode for character selection comprising a frame for displaying input text in a display of a mobile device comprising a first field; an A-Z keyboard for text input in said first field; and a first control key in the first field, wherein said first control key activates said camera mode and deactivates said keyboard mode; and said camera mode comprising a second field; and a second control key in the first field, wherein said second control key activates said keyboard mode and deactivates said camera mode, and said camera mode configured to capture an image of a written text; display said captured image in a second field; convert said captured image to character text by optical character recognition (OCR); and display the recognized character text in said frame, wherein said capture, display of said captured image, conversion, and display of said recognized character text is capable of being accomplished through a single operation; and outputting one or more of said selected characters and said recognized character text as input text to an application requesting said input text.
27. A mobile device configured to facilitate multimodal text input, the mobile device comprising a display configured to display text and image content in a frame having a first field, second field, and third field; an A-Z keyboard having a keyboard mode, wherein said A-Z keyboard is configured to receive input text; a camera having a camera mode; and a processor in communication with said display, said A-Z keyboard, and said camera, said processor being configured to receive input text via said A-Z keyboard, via said camera, or both; communicate with the A-Z keyboard to provide for selection of one or more characters on the A-Z keyboard; communicate with the camera to capture an image of said written text, display said captured image in said second field, convert said captured image to character text by optical character recognition (OCR), and display said recognized character text in said frame, wherein said capture, display of said captured image, conversion, and display of said recognized character text is capable of being accomplished through a single operation; and output said selected one or more characters and recognized character text to an application requesting input text.
28. The mobile device according to claim 27, wherein said processor is configured to generate said A-Z keyboard on said display using a touch screen interface, said A-Z keyboard being compatible with one or more applications requiring text input running on said mobile device, and said frame being maintained at a constant size on the display.
29. The mobile device according to claim 27, wherein said keyboard mode and said camera mode are both active simultaneously.
30. The mobile device according to claim 29, wherein said processor configures said A-Z keyboard as a touchscreen interface such that said touchscreen interface and said camera are active at the same time.
31. The mobile device according to claim 29, wherein said processor configures said A-Z keyboard as a touchscreen interface; the processor being further configured to display said captured image and said recognized character text in said second field; enable by one or more of a group consisting of a single keypress, a control command, and a single gesture the selection of said recognized character text as input text; and the selection of a character key from the A-Z keyboard as input text; and immediately output said input text to said application requesting said input text.
32. The mobile device according to claim 31, wherein said single keypress, said control command, and said single gesture include the selection of said recognized character text as input text; the selection of said character key from the A-Z keyboard as input text; and the immediate output of said input text to said processor.
33. The mobile device according to claim 27, wherein said A-Z keyboard comprises two sub-modules, wherein a first standard keyboard sub-module is activated by the processor in response to a request for input text by said application, said first standard keyboard sub-module being dependent on a second camera sub-module; and said second camera sub-module configured to be always active in the background detecting whether said first keyboard sub-module is activated.
34. The mobile device according to claim 33, wherein said processor is configured to activate said camera mode in response to the activation of said second camera sub-module, such that said second field is displayed adjacent to the A-Z keyboard; and shut down said camera mode when said second keyboard sub-module is no longer activated by said application.
35. The method according to claim 21, wherein said frame further comprises a third field.
36. The method according to claim 22, wherein said frame comprises a third field displayed adjacent to said keyboard in said first field.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
DETAILED DESCRIPTION
[0030] A description of example embodiments of the invention follows.
[0031]
[0032] Preferably the multimodal text input method is implemented in a multimodal text input program module, hereinafter simply called the “module”, which has the same interface to an application as an original keyboard module and runs under the respective operating system of the mobile device 1. In that case the module can replace the original keyboard module on the mobile device 1, advantageously offering all applications the multimodal text input features described above. Thus the user can choose whether to input the text conventionally via the keyboard or in the camera mode by selecting the recognized text or a part thereof.
[0033] An alternative to replacing the original keyboard module with the complete integral multimodal text input program module is to provide a separate camera text input module, wherein the camera text input module complements the original keyboard module and is always active in the background to detect whether the original keyboard module is activated by the application. If the keyboard module is detected as activated, the camera text input module activates the camera mode and preferably displays an additional field on the display 2 showing the captured written text and preferably the recognized text, which can be selected by a keypress, a gesture 9, a voice command or the like. Thus the text can be input either via the keyboard or via the camera text input module in the camera mode, and the input text will be sent to the application. If the original keyboard module is closed, whether by the application, a key, a timer or the like, the camera text input module no longer detects the keyboard module, whereupon it deactivates the camera mode and removes any displayed second field 5 or third field 6 from the display 2.
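By way of a non-limiting illustration, the background detection described above can be sketched as a small state follower. The class name `CameraTextInputModule`, the method `poll` and the field names are hypothetical and are not taken from this disclosure; how the keyboard module's state is actually obtained depends on the operating system and is simply passed in here as a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class CameraTextInputModule:
    """Hypothetical supplement module that follows the keyboard module's state."""

    camera_mode_active: bool = False
    visible_fields: set = field(default_factory=set)

    def poll(self, keyboard_module_active: bool) -> None:
        """Called repeatedly in the background with the keyboard module's state."""
        if keyboard_module_active and not self.camera_mode_active:
            # Keyboard module detected as activated: switch on the camera mode
            # and show the fields for the captured image and recognized text.
            self.camera_mode_active = True
            self.visible_fields = {"second_field", "third_field"}
        elif not keyboard_module_active and self.camera_mode_active:
            # Keyboard module no longer detected: switch off the camera mode
            # and remove the additional fields from the display.
            self.camera_mode_active = False
            self.visible_fields = set()
```

In this sketch the keyboard module never queries the camera module, which mirrors the one-way dependency described above.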
[0034] For clarity, the “application” stands for instance for a translation application, a search application such as Google search or the like searching for a keyword on the internet, a phone application requesting a phone number or a name, or any other application requesting text input.
[0035] The “text” stands preferably for any character text, a single character or multiple characters, a string of connected characters such as a word, a number or the like.
[0036] The “character” stands preferably for any character and for any number.
[0037] The “written text” stands preferably for any written or displayed text on a sheet of paper, in a book or also on a screen or the like which can be captured by the camera.
[0038] The “keyboard text input” is preferably understood as any text input on the keyboard.
[0039] The “keyboard” stands preferably for a touch screen keyboard, a mechanical conventional keyboard, a holographic keyboard or the like, wherein characters can be input manually or wherein a typing is registered.
[0040] The “A-Z-keyboard” stands preferably for a keyboard comprising a full set of keys from A to Z or from 0 to 9 and at least one control key such as ENTER, for instance.
[0041] The “original keyboard module” stands preferably for a standard keyboard module or sub-program, being already available for the respective mobile device 1 with its respective operating system and respective applications, but a new proprietary or separate keyboard module for a certain mobile device 1 is also conceivable.
[0042] A “Control key” stands preferably for a respective key on the keyboard executing a certain function or for a hidden touch key behind one of the displayed fields as the first 4, the second 5 and/or the third field 6, for instance.
[0043] A “keypress” or a respective “keypress event” stands preferably for a manual pressing of a certain key or hidden key of the keyboard or of the touch screen, for a certain gesture 9 on the touch screen or in front of a camera, and also for a swipe gesture 9. A “single keypress” can also be a double click, double press or double touch on the same key or place. A certain signal strength or pattern from an acceleration sensor is preferably also usable as a keypress.
[0044] The “mobile device 1” stands preferably for any mobile phone, smartphone or the like.
[0045] The “display 2” stands preferably for any conventional non-touch-screen display as well as for any touch screen with touch key functionality.
[0046]
[0047]
[0048] In the example shown in
[0049] A selection of the recognized text in the third field can be made preferably by a second keypress on the camera key or a keypress on the third field or on the RETURN key or the like. Other preferred possibilities for the selection of the recognized text are for instance via voice or via a timer, such that if the camera is held for a time longer than a stored time limit over the same written text the selection is executed. Other kinds of selection are not excluded.
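The timer-based selection mentioned above can be illustrated, purely as a sketch, as follows. The class `DwellSelector`, its default time limit of 1.5 seconds and the use of exact string equality to decide that the camera is still over "the same written text" are assumptions made for this example and are not specified in this disclosure.

```python
from typing import Optional


class DwellSelector:
    """Hypothetical helper: selects text once it has stayed unchanged long enough."""

    def __init__(self, time_limit_s: float = 1.5) -> None:
        self.time_limit_s = time_limit_s  # the stored time limit (assumed value)
        self._text: Optional[str] = None
        self._since = 0.0

    def update(self, recognized_text: str, now_s: float) -> Optional[str]:
        """Feed the latest recognized text; returns it when selection is triggered."""
        if not recognized_text or recognized_text != self._text:
            # Nothing recognized or a different text: restart the timer.
            self._text = recognized_text or None
            self._since = now_s
            return None
        if now_s - self._since > self.time_limit_s:
            # Held over the same text for longer than the limit: select it and
            # restart the timer so the text is not selected again on the next frame.
            self._since = now_s
            return recognized_text
        return None
```

A real implementation would probably tolerate small OCR fluctuations between frames instead of requiring identical strings.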
[0050] Another preferred embodiment of the camera mode includes word recognition within the recognized text, such that words are identified and such that upon a selection only the word in a focus or in the middle of the third field or behind cross hairs, respectively, is selected and output. Another preferred selection can be made by touching or pressing, respectively, on the word on the touch screen.
[0051] It is also conceivable that the recognized text is displayed preferably as an overlay over the captured text in the second field 5. In this case the captured text in the second field 5 is preferably erased, such that the recognized text can be overlaid on the displayed image, preferably in a similar size and width as the original text. This would reduce the space required, such that the third field 6 is no longer needed and the first field 4 or the second field 5 can take over that space.
[0052] Another preferred method of selecting one of the words of the recognized text displayed adjacent to the displayed keyboard is via a keypress on a key in close proximity to the desired word, which is then taken as input text.
[0053] Another preferred method of selecting one of the words of the recognized text is by displaying next to each word a respective identifier such as a certain number, for instance, and by a keypress on that respective number on the keyboard.
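The identifier-based selection can be illustrated with the following minimal sketch. The function names `label_words` and `select_by_key`, the use of the digits 1 to 9 as identifiers and the splitting of the recognized text on whitespace are assumptions of this example only.

```python
from typing import Dict, Optional


def label_words(recognized_text: str, max_labels: int = 9) -> Dict[str, str]:
    """Assign the identifiers "1", "2", ... to the first words of the text."""
    words = recognized_text.split()[:max_labels]
    return {str(i): word for i, word in enumerate(words, start=1)}


def select_by_key(labels: Dict[str, str], pressed_key: str) -> Optional[str]:
    """Return the word whose identifier matches the pressed key, if any."""
    return labels.get(pressed_key)


labels = label_words("Call 555 0100 now")
print(select_by_key(labels, "3"))  # prints 0100
```

Here a keypress on the number key 3 selects the third displayed word as input text.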
[0054] Preferably by pressing a KEYBOARD key 7 the mode is switched back from the camera mode to the keyboard mode, wherein the keyboard as shown in
[0055]
[0056] A preferred embodiment of the present invention also provides for sending the captured image, or preferably the part of the image displayed in the second field 5, as image data to a remote server, where the image data are OCR processed and the recognized text and preferably the suggestion candidates are generated and returned to the mobile device 1. This kind of remote computing is advantageous as it keeps the computing power and memory requirements on the mobile device 1 small, especially as regards the database for the OCR and as regards different languages. The development effort is also reduced drastically with regard to implementing image processing and OCR on different types of mobile device 1 and operating systems.
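The exchange with the remote server can be sketched as below. This disclosure does not specify any wire format, so the JSON field names `image_b64`, `language`, `recognized_text` and `suggestions` are purely hypothetical, as is the injected `send` callable that stands in for the actual network transport.

```python
import base64
import json
from typing import Callable, List, Tuple


def build_ocr_request(image_bytes: bytes, language: str) -> str:
    """Pack the image part and a language hint into a hypothetical JSON request."""
    return json.dumps({
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "language": language,
    })


def parse_ocr_response(body: str) -> Tuple[str, List[str]]:
    """Extract the recognized text and optional suggestion candidates."""
    data = json.loads(body)
    return data["recognized_text"], list(data.get("suggestions", []))


def recognize_remotely(
    image_bytes: bytes,
    language: str,
    send: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Send the image data via the given transport and return the server's result."""
    return parse_ocr_response(send(build_ocr_request(image_bytes, language)))
```

Keeping the transport as a parameter leaves the device-side code independent of how the server is reached, and the OCR database and language data remain on the server.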
[0057]
[0058]
[0059]
[0060]
[0061] Preferably the keyboard text input module is independent of the camera text input module, but the camera text input module is dependent on the state of the keyboard text input module, which is checked continuously by the camera text input module.
[0062] In the preferred embodiment shown the recognized text is displayed in the third field 6 below the second field 5. It is also imaginable to overlay the recognized text over the captured written text, such that the third field 6 is either overlaid over the second field 5, or such that the letters of the recognized text are overlaid over the captured written text within the part of the captured and displayed image.
[0063]
[0064] Preferably the embodiment of
[0065] Where technical features mentioned in any claim are followed by reference signs, those reference signs have been included for the sole purpose of increasing intelligibility of the claims and accordingly, such reference signs do not have any limiting effect on the scope of each element identified by way of example by such reference signs.
REFERENCE NUMERALS
[0066] 1 mobile device
[0067] 2 display
[0068] 3 application window
[0069] 3b text input field
[0070] 4 first field
[0071] 5 second field
[0072] 6 third field
[0073] 7 keyboard key
[0074] 8 camera mode key
[0075] 9 gesture
[0076] 10 hidden touch key
[0077] The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
[0078] While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.