METHOD AND APPARATUS FOR DOCUMENT MANAGEMENT
20220092878 · 2022-03-24
Inventors
- Bokul BORAH (Noida, IN)
- Prachi GUPTA (Noida, IN)
- Shalab SHALAB (Noida, IN)
- Ayushi GUPTA (Noida, IN)
- Theophilus THOMAS (Noida, IN)
- Sumit Kumar TIWARY (Noida, IN)
- Bindu Madhavi MISHRA (Noida, IN)
- Dalbir Singh DHILLON (Noida, IN)
- Manoj KUMAR (Noida, IN)
- Santosh Pallav SAHU (Noida, IN)
- Shweta GARG (Noida, IN)
- Sourav CHATTERJEE (Noida, IN)
- Tasleem ARIF (Noida, IN)
- Naresh Kumar GUPTA (Noida, IN)
- Pooja PAWWAR (Noida, IN)
- Vipin TIWARI (Noida, IN)
CPC classification
- G06V30/413 (Physics)
- G06V30/414 (Physics)
- G06V30/412 (Physics)
International classification
- G06V30/413 (Physics)
- G06V10/26 (Physics)
- G06V30/414 (Physics)
Abstract
The disclosure provides a method for document management in a network. The method includes acquiring, by an electronic device, a source document as an image, extracting, by the electronic device, a plurality of multi-modal information from the source document by parsing the source document, automatically determining, by the electronic device, a category of the source document based on a comparison of the extracted plurality of multi-modal information with a plurality of pre-defined features, extracting, by the electronic device, a plurality of data fields corresponding to the determined category from the source document, determining, by the electronic device, a priority for each of the plurality of data fields and storing, by the electronic device, the plurality of data fields in at least one of a secure information source and an unsecure information source based on the determined priority.
Claims
1. A method performed by an electronic device (100) for document management, the method comprising: acquiring a source document as an image; extracting a plurality of multi-modal information from the source document by parsing the source document; automatically determining a category of the source document based on a comparison of the extracted plurality of multi-modal information with a plurality of pre-defined features; extracting a plurality of data fields corresponding to the determined category from the source document; determining a priority for each of the plurality of data fields; and storing the plurality of data fields in at least one of a secure information source and an unsecure information source based on the determined priority.
2. The method of claim 1, further comprising: acquiring a target document as an image; extracting a plurality of multi-modal information from the target document by parsing the target document; automatically determining a category of the target document based on a comparison of the extracted plurality of multi-modal information with the plurality of pre-defined features; retrieving a plurality of data fields corresponding to the determined category from at least one of the secure information source and the unsecure information source; identifying a plurality of target data fields in the target document based on the determined category; creating an augmented reality (AR) overlay over the target document by positioning the retrieved plurality of data fields corresponding to the identified plurality of target data fields; and performing at least one of causing to display the target document with the AR overlay, and storing an image of the target document with the AR overlay in one of the secure information source and the unsecure information source.
3. The method of claim 1, further comprising: retrieving the plurality of data fields based on matching contextual information derived from the plurality of data fields with contextual information pertaining to the electronic device (100); and causing to display notifications based on the matched contextual information.
4. The method of claim 1, further comprising: receiving location information pertaining to a physical copy of the source document; storing the location information in the secure information source; triggering a camera communicably coupled to the electronic device (100) upon receiving a selection of the source document for retrieving location; scanning a location using the camera; and causing to display an AR object indicative of the source document upon successfully matching the scanned location with the stored location information.
5. The method of claim 1, wherein acquiring the source document as an image comprises at least one of: scanning a physical document using a camera communicably coupled to the electronic device (100); retrieving the source document from a local storage source of the electronic device (100); retrieving the source document from a cloud storage source communicably coupled to the electronic device (100).
6. The method of claim 1, wherein the plurality of multi-modal information comprises at least one of textual information, a quick response (QR) code, a barcode, geographical tag, date, time, identifiers indicative of application usage and images.
7. The method of claim 1, wherein the plurality of pre-defined features comprises at least one of a name, identifiers indicative of a category of document, a date of birth and a geographic location.
8. The method of claim 1, wherein automatically determining a category of the source document based on a comparison of the extracted plurality of multi-modal information with the plurality of pre-defined features comprises: transmitting the source document and the extracted plurality of multi-modal information to a server communicably coupled to the electronic device (100); receiving results pertaining to optical character recognition performed over the source document from the server; dividing the source document into a plurality of regions based on the results pertaining to optical character recognition; matching at least one of textual information in each of the plurality of regions and the extracted plurality of multi-modal information with the plurality of pre-defined features to generate a matching score; and automatically categorizing the source document based on the generated matching score.
9. An electronic device (100) for document management, the electronic device (100) comprising: an image sensor (102); an image scanner (104) communicably coupled to the image sensor (102) and configured to acquire any of a source document and a target document as an image; a classification engine (106) communicably coupled to the image sensor (102), the classification engine (106) configured for: extracting a plurality of multi-modal information from the source document by parsing the source document; automatically determining a category of the source document based on a comparison of the extracted plurality of multi-modal information with a plurality of pre-defined features; extracting a plurality of data fields corresponding to the determined category from the source document; determining a priority for each of the plurality of data fields; and storing the plurality of data fields in at least one of a secure information source and an unsecure information source based on the determined priority.
10. The electronic device (100) of claim 9, further comprising an augmented reality (AR) engine (108) communicably coupled to the image sensor (102), the image scanner (104) and the classification engine (106), wherein the AR engine (108) is configured for: extracting a plurality of multi-modal information from the target document by parsing the target document; automatically determining a category of the target document based on a comparison of the extracted plurality of multi-modal information with the plurality of pre-defined features; retrieving a plurality of data fields corresponding to the determined category from at least one of the secure information source and the unsecure information source; identifying a plurality of target data fields in the target document based on the determined category; creating an augmented reality (AR) overlay over the target document by positioning the retrieved plurality of data fields corresponding to the identified plurality of target data fields; and performing at least one of causing to display the target document with the AR overlay, and storing an image of the target document with the AR overlay in one of the secure information source and the unsecure information source.
11. The electronic device (100) of claim 9, wherein acquiring any of the source document and the target document as an image comprises at least one of: scanning a physical document using the image sensor (102); retrieving any of the source document and the target document from a local storage source of the electronic device (100); retrieving any of the source document and the target document from a cloud storage source communicably coupled to the electronic device (100).
12. The electronic device (100) of claim 9, further comprising a contextual engine communicably coupled to the image sensor (102), the image scanner (104), the AR engine (108) and the classification engine (106) configured for: retrieving the plurality of data fields based on matching contextual information derived from the plurality of data fields with contextual information pertaining to the electronic device (100); and providing notifications based on the matched contextual information.
13. The electronic device (100) of claim 9, wherein the plurality of multi-modal information comprises at least one of textual information, a quick response (QR) code, a barcode, geographical tag, date, time, identifiers indicative of application usage and images.
14. The electronic device (100) of claim 9, wherein the plurality of pre-defined features comprises at least one of a name, identifiers indicative of a category of document, a date of birth and a geographic location.
15. The electronic device (100) of claim 9, wherein automatically determining a category of the source document based on a comparison of the extracted plurality of multi-modal information with the plurality of pre-defined features comprises: transmitting, by the electronic device (100), the source document and the extracted plurality of multi-modal information to a server communicably coupled to the electronic device (100); receiving, by the electronic device (100), results pertaining to optical character recognition performed over the source document from the server; dividing, by the electronic device (100), the source document into a plurality of regions based on the results pertaining to optical character recognition; matching, by the electronic device (100), at least one of textual information in each of the plurality of regions and the extracted plurality of multi-modal information with the plurality of pre-defined features to generate a matching score; and automatically categorizing, by the electronic device (100), the source document based on the generated matching score.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] This invention is illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings.
[0047] While embodiments of the present disclosure are described herein by way of example using several illustrative drawings, those skilled in the art will recognize that the present disclosure is not limited to the embodiments or drawings described. It should be understood that the drawings and the detailed description thereto are not intended to limit the present disclosure to the form disclosed, but to the contrary, the present disclosure is to cover all modifications, equivalents and alternatives falling within the spirit and scope of embodiments of the present disclosure as defined by the appended claims.
DETAILED DESCRIPTION
[0048] Various embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. In the following description, specific details such as detailed configuration and components are merely provided to assist the overall understanding of these embodiments of the present disclosure. Therefore, it should be apparent to those skilled in the art that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
[0049] Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. The term “or” as used herein refers to a non-exclusive or, unless otherwise indicated. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those skilled in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein. Further, it should be possible to combine the flows specified in different figures to derive a new flow.
[0050] As is traditional in the field, embodiments may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as managers, engines, controllers, units or modules or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may optionally be driven by firmware and software. The circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like. The circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block. Each block of the embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure. Likewise, the blocks of the embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
[0051] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description.
[0052] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in the figures include blocks which can be at least one of a hardware device, or a combination of a hardware device and a software module.
[0053] In accordance with embodiments disclosed herein, document management involves acquiring any document and then retrieving document properties to map them to a pre-stored set of documents. Depending upon the document category, the relevance of data inside the document in any form, such as text, a QR code, etc., can be determined for providing services to the user.
[0055] In some embodiments, the electronic device 100 can include communication units pertaining to communication with remote computers, servers or remote databases over a communication network. The communication network can include a data network such as, but not restricted to, the Internet, local area network (LAN), wide area network (WAN), metropolitan area network (MAN) etc. In certain embodiments, the communication network can include a wireless network, such as, but not restricted to, a cellular network and may employ various technologies including enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), Internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS) etc.
[0056] The processor 112 can be, but is not restricted to, a Central Processing Unit (CPU), a microprocessor, or a microcontroller. The processor 112 executes sets of instructions stored on the memory 114.
[0057] The memory 114 includes storage locations addressable through the processor 112. The memory 114 is not limited to a volatile memory and/or a non-volatile memory. Further, the memory 114 can include one or more computer-readable storage media. The memory 114 can include non-volatile storage elements. For example, non-volatile storage elements can include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable memories (EEPROM).
[0058] In some embodiments, the memory 114 is coupled to an immersive environment library. The immersive environment library is a source of multi-modal content used for extracting information indicative of various immersive environments. Immersive environments include augmented reality (AR) environments, virtual reality (VR) environments, mixed reality environments and the like. The immersive environment library can be, but is not limited to, a relational database, a navigational database, a cloud database, an in-memory database, a distributed database and the like. In some embodiments, the immersive environment library can be stored on the memory 114. In other embodiments, the immersive environment library is stored on a remote computer, a server, a network of computers or the Internet.
[0059] In some embodiments, the memory 114 is communicably coupled to third party storage, cloud storage and the like.
[0060] The image sensor 102 captures still images or moving images of the real-world environment pointed at by a camera (not shown) placed on the electronic device 100. The camera is communicably coupled to the image sensor 102. The image sensor 102 captures an image of a document pointed at by a user of the electronic device 100. The image scanner 104, in conjunction with the image sensor 102, scans documents to generate images of the documents. The generated images are further converted to documents of types including, but not limited to, word documents, portable document formats, image formats and the like.
[0064] After scanning the file or image, the components of the source document, including text, text regions, QR code, barcode, logo, etc., are extracted at step 304. The components are accumulated and matched with the templates present in the memory 114 and/or in remote storage communicably coupled to the memory 114 at step 306. The template matching helps in classifying the document, which further categorizes the contents into meaningful structured data. This structured data is stored in the remote server 201 to build the profile of the user at steps 312 and 314.
[0065] In the above process, if the components do not match any of the templates of the existing models, the document is detected to be a new template and stored at steps 308 and 310.
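By way of a non-limiting illustration, the following Python sketch shows one possible form of the component-to-template matching of steps 304 through 310; the feature encoding, the Jaccard overlap score, and the threshold value are assumptions introduced here for illustration and are not prescribed by the disclosure.

```python
# Illustrative template matching over extracted components (hypothetical
# feature encoding and threshold; not the claimed implementation).
from dataclasses import dataclass

@dataclass
class Template:
    category: str
    features: set  # e.g. {"logo:acme", "field:invoice_no", "qr"}

def match_template(components: set, templates: list, threshold: float = 0.6):
    """Return the best-matching template, or None if the document is new."""
    best, best_score = None, 0.0
    for t in templates:
        # Jaccard overlap between extracted components and template features.
        overlap = len(components & t.features) / max(len(components | t.features), 1)
        if overlap > best_score:
            best, best_score = t, overlap
    return best if best_score >= threshold else None

# Unmatched documents are detected as new templates (steps 308 and 310).
templates = [Template("invoice", {"logo:acme", "field:invoice_no", "qr"})]
doc_components = {"qr", "field:invoice_no", "text:total"}
if match_template(doc_components, templates) is None:
    templates.append(Template("new_template", doc_components))
```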
[0067] The conditional probability for classification of a document using Bayes' theorem can be stated as:
P(A|B) = P(B|A) P(A) / P(B)
[0068] where P(B|A) is the probability of the evidence B given that the hypothesis A is true, P(A) is the prior probability of the category, and P(B) is the probability of the evidence. When the document is classified, the new template is saved and the training set is updated for reference. The new contents are mapped to the existing templates and the data is converted to structured form.
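By way of a non-limiting illustration, the following sketch applies the Bayes relation above in a simple naive Bayes categorizer over OCR tokens; the categories, training tokens, and Laplace smoothing are assumptions for illustration only.

```python
# Naive Bayes document categorization sketch: P(A|B) = P(B|A) P(A) / P(B),
# with P(B) constant across categories and therefore omitted.
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()               # category -> document count

    def train(self, category: str, tokens: list):
        self.doc_counts[category] += 1
        self.token_counts[category].update(tokens)

    def classify(self, tokens: list) -> str:
        total_docs = sum(self.doc_counts.values())
        best_cat, best_lp = None, float("-inf")
        for cat, n_docs in self.doc_counts.items():
            lp = math.log(n_docs / total_docs)    # log prior, log P(A)
            total = sum(self.token_counts[cat].values())
            vocab = len(self.token_counts[cat]) + 1
            for tok in tokens:                    # log likelihood, log P(B|A)
                lp += math.log((self.token_counts[cat][tok] + 1) / (total + vocab))
            if lp > best_lp:
                best_cat, best_lp = cat, lp
        return best_cat

clf = NaiveBayes()
clf.train("id_card", ["name", "date", "of", "birth", "id", "number"])
clf.train("invoice", ["invoice", "total", "amount", "due", "date"])
print(clf.classify(["id", "number", "name"]))  # -> id_card
```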
[0069] If the content contains a QR code or a barcode, the OCR engine 201B decodes it to text and compares it with the structured data to verify the validity of the information and to correct errors in the data that can arise from wear and tear of the document or from noise in the system capturing the information. Once the structured data is saved, context-based prioritization of fields can be applied to the document.
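By way of a non-limiting illustration, the following sketch shows one way the decoded QR payload could be used to validate and correct the OCR-derived structured data; the field names and payload format are hypothetical.

```python
# Cross-check OCR fields against decoded QR data; QR values are treated as
# authoritative because they are robust to document wear and OCR noise.
def reconcile_fields(ocr_fields: dict, qr_payload: dict) -> dict:
    corrected = dict(ocr_fields)
    for key, qr_value in qr_payload.items():
        if corrected.get(key) != qr_value:
            corrected[key] = qr_value  # fix an OCR misread using the QR code
    return corrected

ocr_fields = {"id_number": "AB12E456", "name": "J0HN DOE"}  # noisy OCR output
qr_payload = {"id_number": "AB123456", "name": "JOHN DOE"}  # decoded QR code
print(reconcile_fields(ocr_fields, qr_payload))
```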
[0070] In some embodiments, the auto-categorization of the source document or a target document begins by acquiring a file using the image sensor 102 and the image scanner 104, or by reading the file from a file system, a mailbox, or any other source. The file is processed by the classification engine 106 to detect whether the file qualifies as a document. The file is then processed for specific features such as the presence of text, a QR code, a barcode, or a logo. The file, along with the extracted features, is then sent to the cloud for categorization, where first OCR is performed over the document.
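The overall flow can be pictured with the following non-limiting sketch; every function body is a placeholder stub standing in for the disclosed components (the image scanner, the classification engine, and server-side OCR), not a real API.

```python
# Acquisition-to-categorization pipeline sketch (all bodies are stubs).
def acquire(source: str) -> bytes:
    with open(source, "rb") as f:       # file system, mailbox, or camera capture
        return f.read()

def qualifies_as_document(data: bytes) -> bool:
    return len(data) > 0                # placeholder "is this a document?" check

def extract_features(data: bytes) -> dict:
    return {"text": True, "qr": False, "barcode": False, "logo": False}

def cloud_ocr(data: bytes, features: dict) -> str:
    return "OCR TEXT"                   # stands in for the server round trip

def auto_categorize(source: str):
    data = acquire(source)
    if not qualifies_as_document(data):
        return None                     # the file does not qualify as a document
    features = extract_features(data)   # presence of text, QR code, barcode, logo
    text = cloud_ocr(data, features)    # OCR is performed in the cloud first
    return {"text": text, "features": features}
```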
[0072] In some embodiments, based on the category of the document, the source document is moved to a secure location such as Knox storage, or the user can be given an option to format the important information, such as an ID number, in the image file stored at the non-secure location. The information from the document will be saved with a profile which can be used in the future to auto-fill forms. The priority of any field inside the given document can be decided on the following basis (a simplified sketch follows the list below):
[0073] Pre-defined Set: Based on the category of document, a pre-defined set of fields inside that document is considered to be of higher priority.
[0074] Stored Data: If the current document contains any information already in the device database and belongs to a secure category, the priority of that field is increased.
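By way of a non-limiting illustration, the following sketch combines the two rules above to score a field and route it to the secure or unsecure information source; the categories, field names, and score threshold are assumptions for illustration only.

```python
# Field prioritization under the two rules above (hypothetical values).
SECURE_CATEGORIES = {"id_card", "passport"}
PREDEFINED_PRIORITY = {"id_card": {"id_number", "name", "date_of_birth"}}

def field_priority(category: str, name: str, value: str, device_db: dict) -> int:
    priority = 0
    if name in PREDEFINED_PRIORITY.get(category, set()):
        priority += 2   # Pre-defined Set: high-priority field for this category
    if device_db.get(name) == value and category in SECURE_CATEGORIES:
        priority += 1   # Stored Data: already-known value in a secure category
    return priority

def route_field(priority: int) -> str:
    # Higher-priority fields go to the secure information source (e.g. Knox).
    return "secure" if priority >= 2 else "unsecure"

device_db = {"id_number": "AB123456"}
score = field_priority("id_card", "id_number", "AB123456", device_db)
print(score, route_field(score))  # -> 3 secure
```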
[0076] It is common user behavior to store physical documents at specific physical locations, which is convenient for the user; but in a digital world, such as on a smartphone, it becomes difficult to map those document files. This leads to extra effort in remembering the locations of all files. The invention addresses these problems by providing AR-based document locating as well as easy retrieval of information from stored documents in the AR view itself. The user can scan any physical form document using the camera, where the AR unit classifies the image preview as a form based on image classification. Then, the electronic device 100 can retrieve the fields mentioned inside the form, such as Name, Date of Birth, Address, etc., and correspondingly retrieve that information from the user profile. This information can be previewed over the camera image of a target document.
[0077] At step 402, a target document is acquired as an image. The target document can be acquired by scanning a form with the image sensor 102 and the image scanner 104, or can be retrieved from the memory 114 or any storage medium communicably coupled to the memory 114. At step 404, a plurality of multi-modal information is extracted from the target document by the classification engine 106. Steps similar to those for automatically categorizing the source document are then performed to categorize the target document.
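By way of a non-limiting illustration, the following sketch positions retrieved profile values over detected form fields to build the AR overlay; the coordinates, field labels, and profile keys are hypothetical.

```python
# Build an AR overlay by pairing detected form fields with profile values.
from dataclasses import dataclass

@dataclass
class TargetField:
    label: str   # e.g. "Name", detected in the scanned form
    x: int
    y: int       # position of the blank next to the label

USER_PROFILE = {"Name": "John Doe", "Date of Birth": "1990-01-01"}

def build_overlay(fields: list) -> list:
    """Return (text, x, y) tuples to render over the camera preview."""
    overlay = []
    for f in fields:
        value = USER_PROFILE.get(f.label)
        if value is not None:           # only overlay fields known to the profile
            overlay.append((value, f.x, f.y))
    return overlay

form_fields = [TargetField("Name", 120, 80), TargetField("Date of Birth", 120, 140)]
print(build_overlay(form_fields))
```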
[0087] In an embodiment, the user can place preset actions with the contextual engine 110 for any actions related to bill payment. For example, the user can opt to direct the contextual engine 110 to automatically pay any detected bill two days before its due date.
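By way of a non-limiting illustration, a preset rule of this kind could be evaluated as in the following sketch; the bill structure, dates, and payment hook are hypothetical.

```python
# Fire the preset "pay two days before the due date" action for detected bills.
from datetime import date, timedelta

def bills_to_pay(bills: list, today: date, lead_days: int = 2) -> list:
    """Return bills whose preset payment action should fire today."""
    return [b for b in bills
            if b["due"] - timedelta(days=lead_days) <= today < b["due"]]

bills = [{"payee": "Electricity", "amount": 42.50, "due": date(2022, 3, 26)}]
for bill in bills_to_pay(bills, today=date(2022, 3, 24)):
    print(f"Paying {bill['payee']}: {bill['amount']}")  # payment action trigger
```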
[0090] In an example, forms for a credit card application can be automatically filled using the extracted information. The target forms are automatically filled, and any e-KYC (Know Your Customer) procedures can be completed using the extracted information.
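By way of a non-limiting illustration, the auto-fill step could look like the following sketch; the form schema and profile keys are hypothetical.

```python
# Auto-fill a form from the stored user profile; unknown fields stay blank.
PROFILE = {"name": "John Doe", "date_of_birth": "1990-01-01", "address": "12 Park St"}

def autofill(form_schema: list, profile: dict) -> dict:
    return {field: profile.get(field, "") for field in form_schema}

application = autofill(["name", "date_of_birth", "address", "income"], PROFILE)
print(application)  # "income" remains blank for the user to complete
```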
[0093] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.