Method and system for automatically detecting errors in at least one data entry using image maps
11580092 · 2023-02-14
Assignee
Inventors
Cpc classification
G06F16/2379
PHYSICS
G06F16/215
PHYSICS
G06F40/232
PHYSICS
International classification
Abstract
A method for automatically detecting errors in at least one data entry in a database, the at least one data entry including an input string of characters that do not match at least one predefined string of characters. The method includes generating a first image map; generating at least one classification parameter by comparing the first image map to a second image map, the second image map based at least partially on the predefined string of characters; determining that the input string of characters correlates to the predefined string of characters; and modifying the at least one data entry to match the predefined string of characters in response to determining that the input string of characters correlates to the predefined string of characters. Various other methods and systems for automatically detecting errors in at least one data entry in a database are also disclosed.
Claims
1. A method of automatically detecting errors in at least one data entry in a database, the at least one data entry comprising text data comprising an input string of characters that do not match text data comprising at least one predefined string of characters, the method comprising: generating, with at least one processor and based at least partially on the input string of characters of the at least one data entry, a first matrix comprising a matrix-representation of the input string of characters; generating, with at least one processor, at least one first classification parameter by comparing the first matrix to a second matrix, the second matrix comprising a matrix representation of the predefined string of characters, wherein the at least one first classification parameter is generated based on a predictive model to determine a correlation between the first matrix and the second matrix; based on the at least one first classification parameter, determining, with at least one processor, that the input string of characters does not correlate to the predefined string of characters; in response to determining that the input string of characters does not correlate to the predefined string of characters, generating at least one second classification parameter by comparing the first matrix to a third matrix, the third matrix comprising a matrix representation of a stored string of characters representative of a prior non-correlated data entry, wherein the prior non-correlated data entry comprises a previously-received input string of characters previously determined to not correlate to the at least one predefined string of characters; and storing, with at least one processor and based at least partially on the at least one second classification parameter, the at least one data entry in a database in association with the prior non-correlated data entry, such that the input string of characters and/or the stored string of characters form a new predefined string of characters for comparison to subsequent data entries.
2. The method of claim 1, wherein determining that the input string of characters does not correlate to the predefined string of characters comprises: generating, with at least one processor, the predictive model for determining correlations between matrices, the predictive model based at least partially on matrices generated from historic data entries; and determining, with at least one processor, that the input string of characters does not correspond to the predefined string of characters based at least partially on application of the predictive model to the first matrix.
3. The method of claim 1, wherein the first matrix, the second matrix, and the third matrix each comprise: (i) columns corresponding to a character position within a respective string of characters and (ii) rows corresponding to a character index in a character set, the columns and rows defining the matrices, wherein each character represented by the first matrix, the second matrix, and the third matrix is represented by a location in the matrix associated with a column and a row.
4. The method of claim 1, wherein a length of the input string of characters is different from a length of the predefined string of characters, and wherein the first matrix and the second matrix are generated to have a same dimension corresponding to either the length of the input string of characters or the length of the predefined string of characters.
5. The method of claim 1, wherein the first matrix, the second matrix, and the third matrix are compressed prior to generating the at least one first classification parameter and/or the at least one second classification parameter.
6. The method of claim 1, wherein the at least one data entry is manually inputted by a user into a freeform input field.
7. A system for automatically detecting errors in at least one data entry in a database, the at least one data entry comprising text data comprising an input string of characters that do not match text data comprising at least one predefined string of characters, the system comprising at least one processor configured to: generate, based at least partially on the input string of characters of the at least one data entry, a first matrix comprising a matrix representation of the input string of characters; generate at least one first classification parameter by comparing the first matrix to a second matrix, the second matrix comprising a matrix representation of the predefined string of characters, wherein the at least one first classification parameter is generated based on a predictive model to determine a correlation between the first matrix and the second matrix; based on the at least one first classification parameter, determine that the input string of characters does not correlate to the predefined string of characters; in response to determining that the input string of characters does not correlate to the predefined string of characters, generate at least one second classification parameter by comparing the first matrix to a third matrix, the third matrix comprising a matrix representation of a stored string of characters representative of a prior non-correlated data entry, wherein the prior non-correlated data entry comprises a previously-received input string of characters previously determined to not correlate to the at least one predefined string of characters; and store, based at least partially on the at least one second classification parameter, the at least one data entry in a database in association with the prior non-correlated data entry, such that the input string of characters and/or the stored string of characters form a new predefined string of characters for comparison to subsequent data entries.
8. The system of claim 7, wherein determining that the input string of characters does not correlate to the predefined string of characters comprises the at least one processor being configured to: generate the predictive model for determining correlations between matrices, the predictive model based at least partially on matrices generated from historic data entries; and determine that the input string of characters does not correspond to the predefined string of characters based at least partially on application of the predictive model to the first matrix.
9. The system of claim 7, wherein the first matrix, the second matrix, and the third matrix each comprise: (i) columns corresponding to a character position within a respective string of characters and (ii) rows corresponding to a character index in a character set, the columns and rows defining the matrices, wherein each character represented by the first matrix, the second matrix, and the third matrix is represented by a location in the matrix associated with a column and a row.
10. The system of claim 7, wherein a length of the input string of characters is different from a length of the predefined string of characters, and wherein the first matrix and the second matrix are generated to have a same dimension corresponding to either the length of the input string of characters or the length of the predefined string of characters.
11. The system of claim 7, wherein the first matrix, the second matrix, and the third matrix are compressed prior to generating the at least one first classification parameter and/or the at least one second classification parameter.
12. The system of claim 7, wherein the at least one data entry is manually inputted by a user into a freeform input field.
13. A computer program product for automatically detecting errors in at least one data entry in a database, the at least one data entry comprising text data comprising an input string of characters that do not match text data comprising at least one predefined string of characters, the computer program product comprising at least one non-transitory computer-readable medium including one or more instructions that, when executed by at least one processor, cause the at least one processor to: generate, based at least partially on the input string of characters of the at least one data entry, a first matrix comprising a matrix representation of the input string of characters; generate at least one first classification parameter by comparing the first matrix to a second matrix, the second matrix comprising a matrix representation of the predefined string of characters, wherein the at least one first classification parameter is generated based on a predictive model to determine a correlation between the first matrix and the second matrix; based on the at least one first classification parameter, determine that the input string of characters does not correlate to the predefined string of characters; in response to determining that the input string of characters does not correlate to the predefined string of characters, generate at least one second classification parameter by comparing the first matrix to a third matrix, the third matrix comprising a matrix representation of a stored string of characters representative of a prior non-correlated data entry, wherein the prior non-correlated data entry comprises a previously-received input string of characters previously determined to not correlate to the at least one predefined string of characters; and store, based at least partially on the at least one second classification parameter, the at least one data entry in a database in association with the prior non-correlated data entry, such that the input string of characters and/or the stored string of characters form a new predefined string of characters for comparison to subsequent data entries.
14. The computer program product of claim 13, wherein determining that the input string of characters does not correlate to the predefined string of characters comprises the one or more instructions causing the at least one processor to: generate the predictive model for determining correlations between matrices, the predictive model based at least partially on matrices generated from historic data entries; and determine that the input string of characters does not correspond to the predefined string of characters based at least partially on application of the predictive model to the first matrix.
15. The computer program product of claim 13, wherein the first matrix, the second matrix, and the third matrix each comprise: (i) columns corresponding to a character position within a respective string of characters and (ii) rows corresponding to a character index in a character set, the columns and rows defining the matrices, wherein each character represented by the first matrix, the second matrix, and the third matrix is represented by a location in the matrix associated with a column and a row.
16. The computer program product of claim 13, wherein a length of the input string of characters is different from a length of the predefined string of characters, and wherein the first matrix and the second matrix are generated to have a same dimension corresponding to either the length of the input string of characters or the length of the predefined string of characters.
17. The computer program product of claim 13, wherein the first matrix, the second matrix, and the third matrix are compressed prior to generating the at least one first classification parameter and/or the at least one second classification parameter.
18. The computer program product of claim 13, wherein the at least one data entry is manually inputted by a user into a freeform input field.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Additional advantages and details of the invention are explained in greater detail below with reference to the exemplary embodiments that are illustrated in the accompanying schematic figures. [Figure descriptions (2) through (10) are not present in this text.]
DESCRIPTION OF THE INVENTION
(11) For purposes of the description hereinafter, the terms “end”, “upper”, “lower”, “right”, “left”, “vertical”, “horizontal”, “top”, “bottom”, “lateral”, “longitudinal”, and derivatives thereof, shall relate to the invention as it is oriented in the drawing figures. However, it is to be understood that the invention may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary embodiments or aspects of the invention. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting.
(12) As used herein, the terms “communication” and “communicate” may refer to the reception, receipt, transmission, transfer, provision, and/or the like, of information (e.g., data, signals, messages, instructions, commands, and/or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and/or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit (e.g., a third unit located between the first unit and the second unit) processes information received from the first unit and communicates the processed information to the second unit. In some non-limiting embodiments, a message may refer to a network packet (e.g., a data packet, and/or the like) that includes data. It will be appreciated that numerous other arrangements are possible.
(13) As used herein, the term “account identifier” may include one or more PANs, tokens, or other identifiers associated with a customer account. The term “token” may refer to an identifier that is used as a substitute or replacement identifier for an original account identifier, such as a PAN. Account identifiers may be alphanumeric or any combination of characters and/or symbols. Tokens may be associated with a PAN or other original account identifier in one or more data structures (e.g., one or more databases, and/or the like) such that they may be used to conduct a transaction without directly using the original account identifier. In some examples, an original account identifier, such as a PAN, may be associated with a plurality of tokens for different individuals or purposes.
(14) As used herein, the term “merchant” may refer to an individual or entity that provides goods and/or services, or access to goods and/or services, to customers based on a transaction, such as a payment transaction. The term “merchant” or “merchant system” may also refer to one or more computer systems operated by or on behalf of a merchant, such as a server computer executing one or more software applications. A “point-of-sale (POS) system,” as used herein, may refer to one or more computers and/or peripheral devices used by a merchant to engage in payment transactions with customers, including one or more card readers, near-field communication (NFC) receivers, RFID receivers, and/or other contactless transceivers or receivers, contact-based receivers, payment terminals, computers, servers, input devices, and/or other like devices that can be used to initiate a payment transaction.
(15) As used herein, the term “mobile device” may refer to one or more portable electronic devices configured to communicate with one or more networks. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer (e.g., a tablet computer, a laptop computer, etc.), a wearable device (e.g., a watch, pair of glasses, lens, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices. The term “client device,” as used herein, refers to any electronic device that is configured to communicate with one or more servers or remote devices and/or systems. A client device may include a mobile device, a network-enabled appliance (e.g., a network-enabled television, refrigerator, thermostat, and/or the like), a computer, a POS system, and/or any other device or system capable of communicating with a network.
(16) As used herein, the term “portable financial device” may refer to a payment card (e.g., a credit or debit card), a gift card, a smartcard, smart media, a payroll card, a healthcare card, a wrist band, a machine-readable medium containing account information, a keychain device or fob, an RFID transponder, a retailer discount or loyalty card, a cellular phone, an electronic wallet mobile application, a personal digital assistant (PDA), a pager, a security card, a computer, an access card, a wireless terminal, a transponder, and/or the like. In some non-limiting embodiments, the portable financial device may include volatile or non-volatile memory to store information (e.g., an account identifier, a name of the account holder, and/or the like).
(17) The term “selectable option,” as used herein, refers to one or more buttons, radio buttons, checkboxes, links, drop-down menus, text boxes, icons, and/or other like options that are selectable by a user through any type of input.
(18) As used herein, the term “server” may refer to or include one or more processors or computers, storage devices, or similar computer arrangements that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computers, e.g., servers, or other computerized devices, e.g., point-of-sale devices, directly or indirectly communicating in the network environment may constitute a “system”, such as a merchant's point-of-sale system. Reference to “a server” or “a processor”, as used herein, may refer to a previously-recited server and/or processor that is recited as performing a previous step or function, a different server and/or processor, and/or a combination of servers and/or processors. For example, as used in the specification and the claims, a first server and/or a first processor that is recited as performing a first step or function may refer to the same or different server and/or a processor recited as performing a second step or function.
(19) As used herein, the term “computing device” may refer to one or more electronic devices that are configured to directly or indirectly communicate with or over one or more networks. The computing device may be a mobile device. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices. In other non-limiting embodiments, the computing device may be a desktop computer or other non-mobile computer. Furthermore, the term “computer” may refer to any computing device that includes the necessary components to receive, process, and output data, and normally includes a display, a processor, a memory, an input device, and a network interface. An “application” or “application program interface” (API) refers to computer code or other data stored on a computer-readable medium that may be executed by a processor to facilitate the interaction between software components, such as a client-side front-end and/or server-side back-end for receiving data from the client. An “interface” refers to a generated display, such as one or more graphical user interfaces (GUIs) with which a user may interact, either directly or indirectly (e.g., through a keyboard, mouse, etc.).
(20) Non-limiting embodiments or aspects of the invention are directed to methods and systems for automatically detecting errors in at least one data entry in a database. Non-limiting embodiments or aspects of the methods and systems allow for the modification of data entries that include errors, where modifying includes replacing, amending, or otherwise altering the incorrect data entry (such as a misspelling, abbreviation, or entry with junk text) with the intended, correct data entry to result in a more accurate database. Non-limiting embodiments or aspects of the invention include predefined strings of characters that may be compared to the data entry including an error in the image domain so as to identify the intended and accurate data entry. Non-limiting embodiments or aspects provide for a modified database that contains fewer incorrect data entries because data entries containing errors can be automatically identified and updated. Non-limiting embodiments or aspects of the invention convert the data entry from a text domain to an image domain, generating specific patterns from the data entry, which may then be compressed. The compressed patterns make data entries containing errors appear more similar to images of similar text entries without the errors, leading to a more effective correlation determination compared to correlation determinations conducted only in the text domain. Non-limiting embodiments or aspects of the invention also provide for the reduction of errors of downstream processes (transaction processing or analytics, for example) that use data entries in databases because the data entries used have been modified to remove errors associated with faulty user input.
Non-limiting embodiments or aspects of the invention also allow for entries not included in the predefined strings database, but which have been added to the database by multiple different users (suggesting that the entry actually does not contain an error), to be added to the predefined strings database according to certain rules. In this way, the system can self-learn new, correct data entries, rather than identifying them as potential misinformation, and can improve itself over time.
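The effect of compression described in paragraph (20) can be illustrated with a minimal sketch. The text does not specify a compression scheme, so everything here is an assumption made for illustration: the lowercase character set, the `image_map`, `compress`, and `overlap` helper names, the OR-pooling of adjacent columns, and the Jaccard-style overlap score.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz"  # assumed character set, for illustration only


def image_map(text, width):
    """Binary matrix: rows = index in CHARSET, columns = character position."""
    matrix = [[0] * width for _ in CHARSET]
    for col, ch in enumerate(text[:width]):
        row = CHARSET.find(ch)
        if row >= 0:  # characters outside the assumed set leave the column empty
            matrix[row][col] = 1
    return matrix


def compress(matrix, window=2):
    """OR-pool each run of `window` adjacent columns into one column (assumed scheme)."""
    width = len(matrix[0])
    return [[max(row[c:c + window]) for c in range(0, width, window)]
            for row in matrix]


def overlap(a, b):
    """Jaccard overlap of the set cells of two equally sized binary matrices."""
    both = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    either = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return both / either if either else 1.0


# A dropped letter shifts every later character by one column.
raw = overlap(image_map("cofee", 6), image_map("coffee", 6))
packed = overlap(compress(image_map("cofee", 6)), compress(image_map("coffee", 6)))
```

In this sketch the uncompressed maps of "cofee" and "coffee" share 4 of 7 set cells, while the pooled maps share 4 of 5, so the misspelled entry looks more similar to the intended string after compression. Other pooling or downsampling schemes would behave differently.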
(21) Referring to
(22) Referring to
(23) Referring to
(24) Referring to
(25) Referring to
(26) With continued reference to
(27) With continued reference to
(28) With continued reference to
(29) The result of the system 2000 shown in
(30) Referring to
(31)
(32) It will be appreciated that any conceivable string of characters can be represented in matrix form based on the previously-described rules-based protocol 302 to form an image representation (image map 304) of the string of characters. It will also be appreciated that alternative rules-based protocols that differently represent the characters in the string of characters to form an image representation of the string of characters may be used. For example, the image map may include an image of the data entry 300 “abf”.
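As a minimal sketch of such a matrix representation, the following assumes the layout recited in claim 3 (columns for character position, rows for character index in a character set). The details of rules-based protocol 302 are not available here, so the lowercase-only character set and the `image_map` name are assumptions for illustration.

```python
import string

# Assumed character set; the actual protocol may use a different or larger set.
CHARSET = string.ascii_lowercase


def image_map(text, charset=CHARSET):
    """Return a binary matrix with one row per charset character and one column per position."""
    matrix = [[0] * len(text) for _ in charset]
    for col, ch in enumerate(text):
        row = charset.find(ch)
        if row >= 0:  # characters outside the assumed set leave the column empty
            matrix[row][col] = 1
    return matrix


m = image_map("abf")
# Set cells are (row 0, col 0) for "a", (row 1, col 1) for "b", (row 5, col 2) for "f".
```

Each character is thus identified by a single matrix location, which is what allows the string to be treated as an image for comparison.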
(33) In some examples, a length of the data entry (e.g., the number of characters in the input string) may be the same as a length of a predefined string. However, in other examples, a length of the data entry may be different from a length of a predefined string. In such examples, the modifying processor 206 may generate image maps for the data entry and the predefined string having a same dimension corresponding to either the length of the data entry's string of characters or a length of the predefined string of characters, or some other length of characters different from both.
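One way to obtain image maps of a same dimension for strings of different lengths is sketched below. The helper names `image_map_fixed` and `paired_maps`, the lowercase character set, and the default of padding to the longer length are assumptions; the text allows either length or some other length to be used.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz"  # assumed character set


def image_map_fixed(text, width, charset=CHARSET):
    """Binary matrix with exactly `width` columns; text is padded with empty columns or truncated."""
    matrix = [[0] * width for _ in charset]
    for col, ch in enumerate(text[:width]):
        row = charset.find(ch)
        if row >= 0:
            matrix[row][col] = 1
    return matrix


def paired_maps(entry, predefined, width=None):
    """Build two image maps that share one width (default: the longer string's length)."""
    if width is None:
        width = max(len(entry), len(predefined))
    return image_map_fixed(entry, width), image_map_fixed(predefined, width)


a, b = paired_maps("cofee", "coffee")            # both maps are 6 columns wide
c, d = paired_maps("cofee", "coffee", width=5)   # both maps are 5 columns wide
```

Padding to the longer length keeps every character of both strings, whereas choosing the shorter length truncates the longer string, so the choice affects which differences remain visible to the comparison.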
(34) Referring to
(35) With continued reference to
(36) With continued reference to
(37) The data entry may be determined to correlate with one of the predefined strings using any suitable method. In one example, the data entry may be determined to correlate with one of the predefined strings using a rules-based protocol. For instance, if the classification parameter between the data entry and a predefined string is above a certain threshold, the rules-based protocol may determine that the data entry correlates with that predefined string. In other embodiments, the rules-based protocol may specify that the data entry automatically correlates with the predefined string with which it has the highest classification parameter (e.g., the predefined string to which it is most similar).
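The two rules-based variants above can be sketched as follows. The classification parameter is not defined in this text, so a Jaccard overlap of set image-map cells stands in for it here; that metric, the function names, and the threshold value of 0.5 are all assumptions for illustration.

```python
def cells(text, width):
    """Set of (character, position) pairs, i.e. the cells that would be set in an image map."""
    return {(ch, i) for i, ch in enumerate(text[:width])}


def classification_parameter(entry, predefined):
    """Assumed similarity score: Jaccard overlap of the two strings' image-map cells."""
    width = max(len(entry), len(predefined))
    a, b = cells(entry, width), cells(predefined, width)
    return len(a & b) / len(a | b) if a | b else 1.0


def correlate(entry, predefined_strings, threshold=0.5):
    """Return the best-scoring predefined string.

    With a numeric threshold, the best score must exceed it (threshold rule).
    With threshold=None, the highest-scoring string is always returned.
    """
    scores = {p: classification_parameter(entry, p) for p in predefined_strings}
    best = max(scores, key=scores.get)
    if threshold is None or scores[best] > threshold:
        return best
    return None


match = correlate("vjsa", ["visa", "mastercard"])    # scores 0.6 against "visa"
no_match = correlate("zzzz", ["visa", "mastercard"])  # no score exceeds the threshold
```

The threshold rule can leave an entry uncorrelated, which is the case that the later paragraphs on previously non-matching strings address; the highest-score rule always picks a predefined string.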
(38) In another non-limiting embodiment, to determine whether there is a correlation, the modifying processor 206 may generate a predictive model. For example, the predictive model may be based at least in part on images generated from historic data entries. In this way, the system may learn from previous image maps found to correlate with one another. For example, classifiers (e.g., SVM, Random Forest, and the like) may be used for learning whether a correlation exists. The modifying processor 206 may determine that the data entry correlates with the predefined string based at least partially on application of the predictive model to the image map of the data entry.
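A learned decision of this kind can be sketched with a deliberately small model. The sketch below uses a one-feature logistic regression in place of the SVM or Random Forest classifiers mentioned above, and it learns from a single similarity score per image-map pair rather than from the image maps themselves. The historic scores and labels are made-up illustration data, and all names are hypothetical.

```python
import math


def train_logistic(features, labels, lr=1.0, epochs=2000):
    """Fit p(correlated) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predicts_correlation(model, score):
    """Apply the trained model: True when the predicted probability exceeds 0.5."""
    w, b = model
    return w * score + b > 0.0


# Hypothetical history: similarity scores of past image-map pairs and whether
# each pair was found to correlate (1) or not (0).
historic_scores = [0.9, 0.8, 0.75, 0.3, 0.2, 0.1]
historic_labels = [1, 1, 1, 0, 0, 0]
model = train_logistic(historic_scores, historic_labels)
```

Compared with a fixed threshold, the decision boundary here is derived from the historic data, so it shifts as more labeled pairs are collected. A classifier operating on the full image maps would use many more features than this single score.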
(39) In response to determining that the data entry correlates with the predefined string, the modifying processor 206 may communicate with the entry database 204, such that the modified data entry (matching the predefined string) is stored in the entry database 204.
(40) Referring to
(41) The data entry 500 may be compared against the previously non-matching strings 504a-c using the same methods previously described. For example, the modifying processor 206 may generate a classification parameter by comparing the image map for the data entry 500 to each image map of the previously non-matching strings 504a-c. In some non-limiting embodiments, the data entry 500 may need to have a classification parameter that corresponds to the data entry 500 exactly matching one of the previously non-matching strings 504a-c.
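The exact-match variant of this comparison can be sketched as an equality test between image maps. The character set, the `image_map` and `find_prior_match` names, and the sample strings are assumptions for illustration, and entries are assumed to be normalized to the character set beforehand.

```python
import string

# Assumed character set; characters outside it would map to empty columns,
# so entries are assumed to be normalized to this set before comparison.
CHARSET = string.ascii_lowercase + string.digits + " "


def image_map(text, width):
    """Binary matrix: rows = index in CHARSET, columns = character position."""
    matrix = [[0] * width for _ in CHARSET]
    for col, ch in enumerate(text[:width]):
        row = CHARSET.find(ch)
        if row >= 0:
            matrix[row][col] = 1
    return matrix


def find_prior_match(entry, prior_non_matching):
    """Return the stored non-matching string whose image map equals the entry's, else None."""
    for stored in prior_non_matching:
        width = max(len(entry), len(stored))
        if image_map(entry, width) == image_map(stored, width):
            return stored
    return None


prior = ["venmo", "zelle", "paypa"]      # hypothetical previously non-matching strings
hit = find_prior_match("zelle", prior)   # identical image maps
miss = find_prior_match("zele", prior)   # differs in at least one cell
```

Requiring identical image maps is the strictest setting; a looser embodiment could reuse a similarity threshold as in the comparison against predefined strings.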
(42) With continued reference to
(43) In some non-limiting embodiments, a data entry 500 that matches a non-matching string a single time may trigger the data entry 500 and/or the non-matching string being stored as a predefined string. However, in other embodiments, a predetermined number of identical input strings that match the non-matching string must first be entered before the data entry 500 and/or the non-matching string is stored as a predefined string. For instance, various users may need to enter the same non-matching data entry at least 10 times before the system determines that the non-matching data entry is a correct data entry and should be classified as a predefined data entry. Various rules may be implemented as to when a repeated non-matching string becomes a predefined string. In this way, the system may be self-learning by determining that data entries previously determined to be non-matching and considered to contain an error may be subsequently considered a predefined, correct data entry based on the data entry being entered by multiple users.
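The promotion rule described in paragraph (43) can be sketched with a counter. The class name, method name, and in-memory storage are hypothetical, and the sketch counts repeated entries without tracking which user entered them, which is a simplification of the multiple-user condition.

```python
from collections import Counter


class SelfLearningStore:
    """Tracks non-matching entries and promotes frequent ones to predefined strings."""

    def __init__(self, predefined, promote_after=10):
        self.predefined = set(predefined)
        self.non_matching = Counter()
        self.promote_after = promote_after  # e.g. 10 identical entries, per the example above

    def record_non_matching(self, entry):
        """Count one occurrence of a non-correlated entry.

        Returns True if the entry is (or has just become) a predefined string.
        """
        if entry in self.predefined:
            return True
        self.non_matching[entry] += 1
        if self.non_matching[entry] >= self.promote_after:
            self.predefined.add(entry)
            del self.non_matching[entry]
            return True
        return False


store = SelfLearningStore({"visa", "mastercard"})
results = [store.record_non_matching("zelle") for _ in range(10)]
```

Setting `promote_after=1` gives the embodiment in which a single match triggers storage as a predefined string. Counting distinct users instead of raw occurrences would require storing a user identifier with each entry.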
(44) Referring to
(45) Referring to
(46) Although the invention has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred embodiments, it is to be understood that such detail is solely for that purpose and that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present invention contemplates that, to the extent possible, one or more features of any embodiment can be combined with one or more features of any other embodiment.