Systems and methods for analyzing and storing network relationships
10394761 · 2019-08-27
Assignee
Inventors
Cpc classification
G06F21/6245
PHYSICS
H04L67/63
ELECTRICITY
H04L67/1072
ELECTRICITY
International classification
Abstract
Systems and methods for augmenting, inferring, extrapolating, deducing, reasoning, and storing logical networks of information represented in digest form are disclosed. The disclosed systems and methods enable the combined inclusion of both sensitive information and non-sensitive information into a compressed, universal, indiscernible format. Relationships can be derived or inferred between unintelligible digests of information by associating them with a common underlying entity. A universal digest format described herein may become comprehensible by further utilizing additional a priori or a posteriori knowledge of the information such that recognized digests can be utilized fully while unknown and potentially sensitive digests remain undecipherable.
Claims
1. A computer-implemented method, the method comprising: receiving, by a computing device comprising at least one processor, a resource retrieval request from a requesting device, wherein the resource retrieval request is associated with a plurality of attributes of content; identifying, by the at least one processor, a plurality of attribute digests that is obtained by applying a one-way function to each of the plurality of attributes, each attribute digest of the plurality of attribute digests corresponding to a respective attribute of the plurality of attributes, the plurality of attribute digests forming a composite digest; matching, by the at least one processor, the plurality of attribute digests with a respective set of attribute digests associated with each of a plurality of network resources stored on storage accessible by the computing device, computing, by the at least one processor, a respective ranking score for each of the plurality of network resources, wherein computing the respective ranking score for a particular network resource comprises: determining a total number of individual digests concatenated together in the composite digest; determining a number of individual matching digests between the composite digest and a particular respective set of attribute digests associated with the particular network resource; and computing the respective ranking score for the particular network resource based at least in part on the total number of individual digests in the composite digest and the number of individual matching digests; ordering, by the at least one processor, the plurality of network resources based at least in part on the respective ranking score for each network resource; and returning, by the computing device to the requesting device, a respective identifier enabling access to each of the plurality of network resources and the ordering of the plurality of network resources in response to receiving the resource retrieval request.
2. The computer-implemented method of claim 1, wherein a user device applies the one-way function to each of the plurality of attributes to obtain the plurality of attribute digests and transmits the plurality of attribute digests to the computing device via one or more networks.
3. The computer-implemented method of claim 1, further comprising: applying, by the at least one processor, the one-way function to each of the plurality of attributes to obtain the plurality of attribute digests.
4. The computer-implemented method of claim 1, wherein each attribute digest comprises a respective type based at least in part on which the respective ranking score is calculated.
5. The computer-implemented method of claim 1, wherein the respective ranking score for the particular network resource is further computed based at least in part on one or more factors comprising at least one of: a number of contributors of the respective set of attribute digests associated with the particular network resource, a credibility factor associated with the contributors, a number of data sources contributing to the respective set of attribute digests, a credibility factor associated with the data sources, a number of attribute digests not in common between the resource retrieval request and the particular network resource, and a frequency of occurrence of each attribute digest in common between the resource retrieval request and the particular network resource.
6. The computer-implemented method of claim 5, wherein the resource retrieval request comprises a unique identifier of a user associated with the resource retrieval request, the method further comprising storing, by the at least one processor, the unique identifier in association with the plurality of attribute digests.
7. The computer-implemented method of claim 1, further comprising storing, by the at least one processor, the plurality of attribute digests in association with the resource retrieval resource request.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
Overview
(23) This document describes methods and systems for augmenting, inferring, extrapolating, deducing, reasoning and storing logical networks of information represented in digest form, thereby enabling the combined inclusion of both sensitive information and non-sensitive information into a compressed, universal, indiscernible format. The inventive techniques disclosed herein allow relationships to be derived or inferred between the aforementioned unintelligible digests of information by, for instance, associating them with a common underlying entity: a person, place, or business, for example. Further, the inventive universal digest format described herein only becomes comprehensible by further utilizing additional a priori or a posteriori knowledge of said information; via this property, recognized digests can be utilized fully, whereas unknown and potentially sensitive digests remain undecipherable.
(24) In the interest of clarity, the inventive techniques described herein build upon conventional digest representation to effectively enable any one such party to provide an arbitrary set of information, with no limitation regarding size or content and possibly including potentially or absolutely sensitive information, and have it be transformed into an unintelligible digest form. The extent to which information is recoverable from the furnished digest(s) is limited to information already known at that instant or information discovered at a future point in time; it follows that, most importantly, no unknown information is ever revealed, thus keeping unknown and potentially sensitive information undecipherable. Conversely, in the event that information is able to be recovered from the unintelligible digest, by very definition, the information was already known or was discovered separately and, consequently, no new information has been revealed.
(25) A digest representation is achieved by making use of one or more one-way functions, whereby a given input is transformed into a deterministic output digest. Notably, for the output digest, the corresponding provided input cannot be recovered (assuming reasonable constraints around time and/or resources). In one embodiment detailed herein, the well-known one-way SHA-256 hash function is utilized.
(26) Building upon this opaque, irreversible representation of underlying information, additional inventive techniques are disclosed herein for: systematically structuring and organizing networks of digests; reasoning about relationships between digests that may have come from a plurality of disparate sources; inferring that individual information records, such as rows in a database, represent the same underlying entity, such as a person, place, or business; and augmenting stores of information, often external (e.g. a customer's) and/or otherwise considered private, with information obtained from a network Internationalized Resource Identifier (IRI). The aforementioned inventive techniques may be collectively referred to as analysis or intelligence in relation to digest representations.
(27) Such an approach may advantageously provide one or more features. For example, in the scenario outlined in the Background section whereby an individual wishes to augment his or her contact information contained in, for instance, a digital address book, he or she may safely utilize an external system that uses the inventive techniques disclosed herein to augment said information without concern, as nothing sensitive has been or will be revealed. Similarly, in the case of the previously mentioned scenario in which a business employing a CRM system wishes to supplement its customer records, it may do so by, for example, engaging peripheral applications or systems that use the inventive techniques disclosed herein, despite the fact that the aforementioned customer records almost certainly contain sensitive information.
(28) In service of providing an additional level of specificity, a concrete scenario follows herein. Consider an individual that has accumulated a large collection of business cards that he or she has collected by attending various events or conferences over the course of many years. If that individual were to transition the set of business cards into an electronic format, whether manually or with software that utilizes optical character recognition (OCR), the end result would be some form of an electronic repository of information representing the underlying originating people. In the interest of alleviating ambiguity, let's say that this information is input into a spreadsheet.
(29) Business cards generally contain a subset of the information associated with a person, such as a name, phone number, email address, company, or job title; however, because this information reflects a specific point in time, the chance that it is out of date grows as more time elapses. For instance, attributes associated with a person such as the company they work for and their corresponding job title commonly change. Further, business cards typically omit other pieces of information that the receiver may be interested in knowing, like where that person was employed previously, academic institutions the person attended, or real-world connections the two have in common. It is also plausible that the receiving individual may have captured additional notes about the original conversation that he or she might want to remain private, such as a detail revealed in confidence.
(30) Given the preceding context of an electronic repository of business cards contained within a spreadsheet, an explanation will now be given as to how the inventive techniques described herein can provide value to the owner. Henceforth, the individual possessing this information, previously referred to as both the owner and the receiver, shall be referred to as the user.
(31) First, the user may be provided with a user interface with which he or she may input information. In this scenario, the user provides input by furnishing the previously established spreadsheet as a file. In other embodiments, information may be exported from an external system using, for example but without restriction, a programmatic integration over a network. Collectively, all potential sources of information are referred to herein as a plurality of information stores; most commonly but without limitation, the plurality of information stores with which the inventive techniques disclosed herein interact may be considered external, defined as having been provided by a user and/or customer. For the remainder of this example, the originating spreadsheet may be referred to instead as an instance of one such information store (e.g. the information store).
(32) Next, a user interface may give the user the option to both select the information store attributes to export and, of the exported attributes, which should have a one-way function(s) applied to them. In this scenario, the SHA-256 hash function is used as the one-way function of choice. As one of ordinary skill in the art would attest, SHA-256 is a well-known hash function with implementations readily available in most programming languages. Let's suppose that the user chooses to export the following fields: first name, last name, company name, job title, phone number, and email address. Further, let's assume that, wanting to respect the privacy of the individuals represented in the information store, the user has indicated that he or she would like a one-way function(s) applied to every attribute.
(33) Accordingly, a resultant payload would be formulated with the contents of the originating information store, in this case a spreadsheet, in an opaque digest form. Standardization and/or canonicalization functions may be used in conjunction with the inventive techniques disclosed herein to increase textual similarity between other occurrences of the same attribute. For instance, if the company name attribute for an individual appeared as International Business Machines, even if in digest form, the inventive techniques herein may instead replace it with the digest representation for ibm. By doing so, the likelihood of matching attributes that have an association with IBM has been increased. In one embodiment, a key-value reference table of common synonyms is maintained to facilitate the respective substitution.
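The canonicalization and synonym substitution described above can be sketched roughly as follows. This is a minimal illustration and not the disclosed implementation: the `SYNONYMS` table contents, the function names, and the punctuation rule are assumptions, and the rule shown only suits free-text attributes such as names, companies, and job titles (an email address would need its own handling).

```python
import hashlib
import re

# Hypothetical key-value reference table of common synonyms (illustrative entries only).
SYNONYMS = {
    "international business machines": "ibm",
    "chief executive officer": "ceo",
}

def canonicalize(value: str) -> str:
    """Lower-case, drop punctuation, collapse whitespace, then substitute synonyms."""
    text = re.sub(r"[^\w\s]", "", value.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return SYNONYMS.get(text, text)

def attribute_digest(value: str) -> str:
    """Apply SHA-256 to the canonical form and return the hex digest."""
    return hashlib.sha256(canonicalize(value).encode("utf-8")).hexdigest()

# Two spellings of the same company yield the same opaque digest.
same = attribute_digest("International Business Machines") == attribute_digest("IBM")
print(same)
```

Because substitution happens before hashing, two information stores that spell the same company differently can still produce matching digests without either one revealing the plaintext.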
(34) Separately, independent of the user's input, the inventive techniques disclosed herein may obtain information from network IRIs, transform this obtained information into a digest representation, then organize, structure, index, and finally store the digest representation. In the interest of simplicity, this example will consider information located on the professional social networking website LINKEDIN; in practice, information may be indexed in digest form from a plurality of IRI domains without restriction. Continuing, the present embodiment assumes that individual profiles representing each person located in the user-provided information store, where such a profile exists, have already been obtained from their respective IRIs and indexed in digest representation prior to the user's interaction.
(35) Subsequently, the inventive techniques disclosed herein take the digest representation of the user-provided spreadsheet and, although the digests are completely unintelligible, make use of derived and/or inferred intelligence in an attempt to find candidate matches pertaining to the same underlying entities within the aforementioned indexed IRI resource information. For instance, suppose one individual record in the user-provided information store, prior to having any transformation functions applied, appears as follows:
(36) {
(37) first_name: Steve,
(38) last_name: Jobs,
(39) company: Apple, Inc.,
(40) job_title: Chief Executive Officer,
(41) phone_number: 123-456-7890,
(42) email_address: steve@apple.com
(43) }
(44) Here, the record is represented using the JavaScript Object Notation for Linked Data (JSON-LD) data-interchange format. As one versed in the art would attest, JSON is commonly used to marshal data between processes or systems, and JSON-LD is a JSON-based format for expressing the relationships (linked data) contained within that JSON. In other embodiments, a plurality of encodings may additionally be utilized. Once transformed into digest representation using the process previously described, the very same record may appear as the following:
(45) {
(46) first_name:
(47) f148389d080cfe85952998a8a367e2f7eaf35f2d72d2599a5b0412fe4094d65c,
(48) last_name:
(49) 5d9a17cb70b9733aadc073a44c21889d33325874c51f9c0c461de3e61a2425eb,
(50) company:
(51) 1804504fae2bb2f3d4551393af4f9fc7a9c229ea7cd17d3837400c1653634132,
(52) job_title:
(53) 31d123108dba5fd36075ab4f89b130a495b15278b38e8a1a756bc43926c43e2e,
(54) phone_number:
(55) c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646,
(56) email_address:
(57) 10933d29dc2b65cb44d496a4cfce184e82e37f7e6687f61beadd3a6b5e8afef6
(58) }
(59) So that each attribute spans only one line, the font size has been decreased as necessary. In the interest of disambiguation, steve, jobs, apple inc, ceo, 1234567890, and steve@apple.com were used as input values, respectively. Transformation functions in the way of normalization, standardization, and/or canonicalization have been utilized; for instance, attributes have been lower-cased, special characters such as punctuation have been removed, and synonymous diction has been substituted, exemplified by ceo having replaced Chief Executive Officer. Finally, the SHA-256 one-way hash function was applied to the resultant transformed values. Although semantically equivalent language was substituted prior to a one-way function(s) being applied in the present embodiment, it may also occur after a digest representation is obtained for known digests (e.g. first names given by a census).
(60) The inventive techniques disclosed herein would then take the digest representation of the individual record exemplified, along with the digest representations of every other such record, and deduce candidate matches contained within the digest store. Contemplating further the individual record cited above, the inventive techniques would use a logical entity referred to herein as the digest engine to look for digest store information with overlapping logical relationships between digests. For instance, one such relationship the digest engine may consider would be all people named steve that are associated with apple inc. It may also solicit people named steve that have a job title ceo. Yet another relationship the digest engine may seek is all people with the last name jobs, an email address of steve@apple.com, and a phone number 1234567890. Although each of the preceding examples has been described semantically in plaintext form, the actual comparison would be made using their respective unintelligible digest representations. Moreover, while some attributes included in the present embodiment, such as an email address or a phone number, may be considered personally identifiable information in that they uniquely identify an individual, the inventive techniques disclosed herein don't require any such attributes as a prerequisite; specifically, and to reiterate explicitly, had the information store only included attributes of first name, last name, and job title, the inventive techniques herein would still function as described.
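As a rough illustration of the relationship queries described above, the sketch below compares only digests and never plaintext. The in-memory `STORE`, the resource identifiers, and the function names are hypothetical; the pairwise enumeration is the naive approach, not the digest engine's actual strategy.

```python
import hashlib
from itertools import combinations

def digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical digest store: resource identifier -> set of attribute digests.
STORE = {
    "iri:profile/1": {digest("steve"), digest("jobs"), digest("apple inc"), digest("ceo")},
    "iri:profile/2": {digest("steve"), digest("wozniak"), digest("apple inc")},
    "iri:profile/3": {digest("tim"), digest("cook"), digest("apple inc"), digest("ceo")},
}

def resources_sharing(digests, store):
    """Return every resource whose digest set contains all of the given digests."""
    wanted = set(digests)
    return {rid for rid, have in store.items() if wanted <= have}

def naive_relationships(record_digests, store, size=2):
    """Visit each combination of `size` digests and keep those with at least one hit."""
    results = {}
    for combo in combinations(sorted(record_digests), size):
        hits = resources_sharing(combo, store)
        if hits:
            results[combo] = hits
    return results

# "All people named steve that are associated with apple inc", expressed in digest form.
steve_at_apple = resources_sharing({digest("steve"), digest("apple inc")}, STORE)
print(sorted(steve_at_apple))
```

Note that the query works the same way when the record holds only a first name, last name, and job title; no uniquely identifying attribute is needed.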
(61) A naive implementation may visit every combination of attributes; the inventive techniques, by contrast, may return a result set as soon as enough relationships have been contemplated to reach a conclusion. In either case, the digest engine may employ, without limitation, probabilistic, statistical, or other alternative methods to determine an ordered result set. In this example and as mentioned heretofore, the present disclosure considers only previously indexed LINKEDIN user profiles within the digest store; for each record contained in the user-provided input, LINKEDIN profile candidate matches would be transmitted back to the user.
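One simple way to order a result set, in the spirit of the ranking recited in claim 1, is to score each resource by the share of the request's digests that it matches. The sketch below is an assumption-laden illustration: the ratio formula, the tie-breaking rule, and the short placeholder strings standing in for real digests are not taken from the disclosure.

```python
def rank(query_digests, store):
    """Score each resource by (matching digests) / (total digests in the query).

    query_digests: set of attribute digests from the request.
    store: mapping of resource identifier -> set of attribute digests.
    Returns (resource, score) pairs, best first; resources with no match are omitted.
    """
    total = len(query_digests)
    scored = []
    for rid, have in store.items():
        matching = len(query_digests & have)
        if matching:
            scored.append((rid, matching / total))
    # Sort by descending score, then by identifier for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Placeholder strings stand in for 64-character hex digests.
example = rank(
    {"a", "b", "c", "d"},
    {"r1": {"a", "b", "c"}, "r2": {"a"}, "r3": {"x"}},
)
print(example)
```

A production ranking function would likely weigh further factors, such as digest type or how common a digest is, but the two counts above are the minimum inputs.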
(62) In some embodiments, the user then accepts, rejects, or ignores each candidate match returned, as indicated by an input selection or lack thereof. Assuming a candidate match is correct, the user may then augment their representation of the underlying person with information contained in the digest store for the very same person. Additionally, the user may opt to correct or otherwise mutate existing information with, for example, a more up-to-date or more standardized version. While the digests remain opaque from a privacy perspective, the inventive techniques disclosed herein may derive intelligence through a plurality of information stores or network IRIs in the way of consensus for a particular attribute. For instance, if a user has an individual record with a job title attribute reading chief, whereas it has been determined that a plurality of (e.g. 100) information stores have indicated that the underlying entity should instead have job title chief executive officer, then a suggestion may be presented to the user. For certain well-known attributes, the plaintext version may be offered back to the user as a suggestion; however, by default and by the very nature of a one-way digest representation, the only aspect known is that there has been consensus around a different value for the same attribute.
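The consensus idea above can be sketched as a simple vote over the digests that different information stores report for the same attribute of the same entity. The function names, the `min_support` threshold, and the `known_plaintext` lookup for well-known digests are all hypothetical.

```python
from collections import Counter

def consensus_digest(observations, min_support=2):
    """Return the most frequently reported digest if enough stores agree, else None."""
    if not observations:
        return None
    value, count = Counter(observations).most_common(1)[0]
    return value if count >= min_support else None

def suggest(user_digest, observations, known_plaintext=None, min_support=2):
    """Suggest a replacement when consensus differs from the user's digest.

    The plaintext is included only for digests that are already well known;
    otherwise the caller learns just that a different value has consensus.
    """
    winner = consensus_digest(observations, min_support)
    if winner is None or winner == user_digest:
        return None
    plaintext = (known_plaintext or {}).get(winner)
    return {"digest": winner, "plaintext": plaintext}

# Placeholder strings stand in for real digests.
reports = ["ceo_digest"] * 3 + ["chief_digest"]
print(suggest("chief_digest", reports))
```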
(63) In another embodiment, the inventive techniques disclosed herein are combined with other information retrieval techniques in order to balance the tradeoffs between latency and storage and memory requirements. For example, an inverted index may be utilized to store and index the attributes associated with a resource below a certain level in the digest store. Utilizing the inventive techniques of the digest store in combination with an inverted index alleviates some of the constraints associated with deploying a large-scale inverted index while still achieving low latency goals.
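For readers unfamiliar with inverted indexes, a minimal sketch over attribute digests follows. It shows only the generic data structure, under the assumption that postings are kept as in-memory sets; how the index is combined with the levels of the digest store is not shown, and all names are illustrative.

```python
from collections import defaultdict

def build_inverted_index(store):
    """Map each attribute digest to the set of resources that carry it."""
    index = defaultdict(set)
    for rid, digests in store.items():
        for d in digests:
            index[d].add(rid)
    return index

def lookup_all(index, digests):
    """Return the resources that carry every one of the given digests."""
    postings = [index.get(d, set()) for d in digests]
    if not postings:
        return set()
    return set.intersection(*postings)

# Placeholder strings stand in for real digests.
idx = build_inverted_index({"r1": {"a", "b"}, "r2": {"a", "c"}, "r3": {"b", "c"}})
print(sorted(lookup_all(idx, ["a"])))
```

Each lookup touches only the postings for the requested digests rather than scanning every resource, which is the latency benefit the paragraph refers to; the cost is the storage needed for the postings themselves.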
(64) Finally, inventive techniques disclosed herein make use of both user selection and non-selection of candidate results to improve future results. As one skilled in the art would attest, ranking functions are commonly employed to determine the relevance of results returned. For instance, the proportion representing the number of times a candidate match is returned versus how often the same match is selected may affect how often it is returned in the future. Similarly, a user indicating that an information store record is related to multiple differing network IRI resources, such as online identities like those found on social networks, may inform future results.
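The proportion mentioned above (how often a candidate is selected versus how often it is returned) can be sketched as a smoothed selection rate. The additive smoothing and its default prior values are assumptions added so that rarely shown candidates are not judged on one or two impressions; the disclosure does not specify a formula.

```python
def selection_rate(times_selected, times_returned, prior_selected=1, prior_returned=2):
    """Smoothed fraction of impressions in which the candidate was selected."""
    return (times_selected + prior_selected) / (times_returned + prior_returned)

def adjusted_score(base_score, times_selected, times_returned):
    """Scale a ranking score by the candidate's historical selection rate."""
    return base_score * selection_rate(times_selected, times_returned)

# A candidate never shown before starts at the neutral prior of 0.5.
print(selection_rate(0, 0))
print(selection_rate(3, 8))
```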
(65) The inventive techniques disclosed herein additionally embody embeddable sets of instructions (e.g. third-party), such as those able to be invoked by a web browser (henceforth referred to as a browser extension), for interpreting and subsequently interacting with rendered documents as information stores. As one of ordinary skill in the art would substantiate, third-party sets of invokable instructions are commonly packaged into a format referred to by terminology such as a plugin or an extension. Although the embodiment contemplated is executed within a web browser environment, other embodiments may include a plurality of environments as described heretofore without limitation.
(66) The previously mentioned browser-based example works in a similar fashion to the anterior embodiment, differing only in the way in which the information store is furnished and subsequently provided reciprocal intelligence. Instead of, for instance, providing input by uploading a file or establishing a connection with an external system, the user installs a browser extension and subsequently navigates to a document representing an information store (e.g. a CRM web application or resource on a social network). Although the present embodiment utilizes a set of well-known domains for which the installed browser extension is able to determine how to interpret the document, other embodiments may employ methods such as natural language processing (NLP) to automatically infer the structure and/or meaning of a document. As appreciated by a person of ordinary skill in the art, the present embodiment utilizes a priori knowledge of the well-known domain to parse the document and structure the payload before transmission, thereby substantially reducing the data transmission size and explicitly not collecting sensitive information that may be contained within the HTML. Although the current disclosure does not require type information to accompany each of the attributes (e.g. f148389d080cfe85952998a8a367e2f7eaf35f2d72d2599a5b0412fe4094d65c is a digest of type https://schema.org/givenName), such additional information may accompany the attributes in order to improve the results returned from the digest store.
(67) Once a document representing an information store has been rendered, the installed browser extension interprets the document, extracts pertinent information, and transmits it to the digest engine over the network for further processing. Like the anterior embodiment, the data-interchange format may be, but isn't limited to, JavaScript Object Notation for Linked Data (JSON-LD); the transmitted payload may contain one or more individual records exemplified above as JSON-LD objects. Given the ephemeral nature of rendered documents, ancillary metadata such as an anonymous identifier may be included to uniquely identify respective information stores. Further, any of the previously described transformation functions, including the one-way function(s), may occur in the user's environment (client-side) prior to being transmitted, or instead after having been received (server-side) by the systems and devices detailed herein.
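A client-side payload of the kind described above might be assembled as in the following sketch. The field-to-property mapping, the `storeId` and `records` keys, and the use of a random UUID as the anonymous identifier are assumptions made for illustration; only `givenName`, `familyName`, and `jobTitle` are taken from the schema.org vocabulary mentioned earlier.

```python
import hashlib
import json
import uuid

# Hypothetical mapping from local field names to schema.org property names.
FIELD_TYPES = {
    "first_name": "givenName",
    "last_name": "familyName",
    "job_title": "jobTitle",
}

def digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def build_payload(records, store_id=None):
    """Hash mapped fields client-side and serialize them as JSON-LD style objects."""
    store_id = store_id or uuid.uuid4().hex  # anonymous identifier for the information store
    items = []
    for record in records:
        item = {"@context": "https://schema.org", "@type": "Person"}
        for field, value in record.items():
            prop = FIELD_TYPES.get(field)
            if prop is not None:  # unmapped fields are never collected
                item[prop] = digest(value.strip().lower())
        items.append(item)
    return json.dumps({"storeId": store_id, "records": items})

payload = build_payload(
    [{"first_name": "Steve", "last_name": "Jobs", "notes": "met at a conference"}],
    store_id="example-store",
)
print(payload)
```

Only digests and type names leave the user's environment in this sketch; the free-text `notes` field is dropped because it has no mapping.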
(68) After resulting candidate matches are returned to the browser extension over the network by the digest engine, visual elements are injected into the rendered document so as to indicate corresponding potential matches to the user. Using these visual elements, the user is provided a mechanism for accepting, rejecting, or otherwise ignoring candidate matches, as in the original embodiment. Similarly, the sequence of steps that takes place after a user makes or opts not to make a selection, disclosed herein for the present inventive techniques, remains the same.
(69) The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and descriptions below. Other features or advantages of the disclosure will be apparent from the description and drawings.
(70) The subject matter of the present disclosure is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms step, block and/or area may be used herein to denote different elements of methods employed, these terms should not be interpreted as implying any particular order among or between various steps herein disclosed.
(71) A representative environment for implementing example embodiments of the disclosure is first described with respect to
Example Embodiments
(72)
(73) It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and other components (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) may be used in addition to or instead of those shown, and some may be omitted altogether. Further, many of the components described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and in any suitable location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.
(74) Among other components not shown, the system 0100 includes a plurality of user devices (0101 and 0115), a plurality of information storage devices 0103, application programming interface (API) services 0106, a network transport API device 0120, a digest engine device 0108, a caching device 0111, and a digest storage device 0113. The components may communicate with each other via a network 0107.
(75) The components represented in
(76) The user 0119, employing a web browser (browser) running on a user device 0101, launches the user interface 0102. In some embodiments, the user device 0101 is a workstation computer with GOOGLE CHROME (CHROME) version 42 installed, a connected keyboard and mouse, 1 GB of memory, a 128 GB disk drive, and connected to a network 0107. The interface 0102 presented on user device 0101 next establishes a connection to the network transport API server (API server) 0109 residing on the network transport API device 0120 by communicating over the network 0107. In some embodiments the API server 0109 is NGINX: version v1.8.0 (commit a3055d1ce9febb3fa0c8d25ba406c7c473efba43) and the network transport API device 0120 is a cluster of GOOGLE COMPUTE ENGINE n1-standard-4 instances, each of which has 4 virtual CPUs and 15 GB of main memory. The network transport API server 0109 returns a response to the interface 0102 running on the user device 0101. As illustrated in further detail in
(77) A different user 0118 interacts with a separate user device 0115, possessing similar specifications to the user device 0101 described above. The user 0118 has previously installed a browser extension 0117 from the GOOGLE CHROME Web Store (CHROME Store) into their browser 0116. The CHROME Store provides an online marketplace for the discovery and installation of modules of executable computer code invoked within the CHROME browser; as one skilled in the art would recognize, the installation and subsequent usage of browser extensions from the CHROME Store is common practice. In another embodiment of the present disclosure, the user device 0115 has the FIREFOX browser installed and the browser extension 0117 is installed via the add-ons capability within FIREFOX. In another embodiment of the present disclosure, the user device 0115 has the SAFARI browser installed and the browser extension 0117 is installed via the APPLE SAFARI Extensions website. Additionally, other web browsers not listed that can be extended to invoke custom executable code 0116 may also be included within the system 0100.
(78) Having described a representative environment in
(79)
(80) Having obtained a resource referenced by an IRI at block 0201, named-entities are extracted 0202. As one skilled in the art will recognize, there are standard industry programs used to extract named-entities from text; by way of example only and not limitation, MIT Information Extraction (MITIE) is used herein.
(81) As shown in block 0203, transformation functions are applied to the named-entities extracted by the previous step 0202. As one skilled in the art will recognize, transformation functions may improve the likelihood of matching like information, such as that exported 0105 by an information store 0104 in
(82) The output of applying the aforementioned transformation functions at block 0203 is then used as input to a one-way hash function 0204, the outcome of which is an unintelligible digest representation of the underlying information. As one skilled in the art would understand, with many hashing functions, the slightest variation of input parameters will result in an entirely different digest representation. The actual hash function utilized may be one or more of a plurality of hash functions that one skilled in the art would be familiar with, including but not limited to: BLAKE-256, BLAKE-512, ECOH, FSB, GOST, Grøstl, HAS-160, HAVAL, JH, MD2, MD4, MD5, MD6, RadioGatún, RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-320, SHA-0, SHA-1, SHA-2, SHA-3, SHA-224, SHA-256, SHA-384, SHA-512, Skein, SipHash, Snefru, Spectral Hash, SWIFFT, Tiger, or Whirlpool. In other embodiments of the present disclosure, alternative one-way functions may be used.
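The sensitivity to small input changes, and the interchangeability of hash functions, can be demonstrated with Python's standard `hashlib` module. The helper names below are illustrative, and only three of the listed algorithms (those that `hashlib` guarantees on every platform) are exercised.

```python
import hashlib

def digest_with(algorithm: str, value: str) -> str:
    """Hash a UTF-8 string with any algorithm name that hashlib supports."""
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()

def differing_hex_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))

# A single trailing period produces an unrelated digest.
with_period = digest_with("sha256", "apple inc.")
without_period = digest_with("sha256", "apple inc")
print(differing_hex_positions(with_period, without_period), "of 64 hex characters differ")

# The same input under different one-way functions.
for name in ("sha256", "sha512", "sha3_256"):
    print(name, digest_with(name, "apple inc"))
```

This sensitivity is why the transformation functions are applied before hashing: once two inputs differ by even one character, their digests can no longer be matched.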
(83) The resultant digest representation produced at block 0204 is subsequently requested from the digest store 0205; if such a record exists, the associated entry is updated 0206. The process of updating the digest store with an existing digest representation is provided in
(84) Method 0200 handles both inserts of new records and updates of existing records. New record insertion will be described in connection with
(85)
(86) At step 0301, WriteDigest is invoked with three parameters: the current set of component digests to be persisted in their composite form; the set of component digests remaining, consisting of all component digests for the given IRI minus the component digests of the current digest; and the IRI. In step 0302, the composite form of digest d is labelled so as to avoid duplicate operations for the same digest. In step 0303, the auxiliary information for the composite form of digest d is retrieved from the digest store and assigned to v. In one embodiment of the present disclosure, the previous auxiliary information is detailed by
(87) In the embodiment as described by method 0300, a recursive technique for using WriteDigest is employed. However, as one skilled in the art will understand, method 0300 may alternatively be implemented using an iterative sequence of instructions. Method 0300 as described above is synchronous to the caller. In another embodiment of the present disclosure, step 0309 is performed asynchronously. In addition, method 0300 may be further optimized due to, for instance, constraints or service level requirements. Further, digests in common between IRIs for a given digest d may instead be stored with auxiliary information in the digest store in lieu of the loop represented by 0305-0309.
(88) As mentioned above, method 0200 handles both inserts of new records and updates of existing records. Having discussed insertions in connection with
(89) In step 0401, two parameters are provided as input: a set of digests d from the hashing output of
(90)
(91) In this embodiment of the present disclosure, digest representations have been created by applying the SHA-256 hashing function to the underlying associated information. For example, block 0501 was computed by applying the SHA-256 hash function to the text georgia institute of technology, resulting in digest c2dab080fa8567162a918a229286f5d2d63479d989a0d98c6a25841694745898.
(92) Blocks 0502, 0503, 0504, 0505 and 0506 illustrate logical relationships between the digest for georgia institute of technology 0501 and digests for houston, atlanta, chicago, boston, and richmond, respectively. All resources referenced by an IRI that contain the attributes georgia institute of technology and houston, for example, will have the digest at block 0502 in common. Block 0507 illustrates a logical relationship additionally with the attribute female; therefore, all resources referenced by an IRI that contain the attributes georgia institute of technology, houston and female, for example, will have block 0507 in common. Similarly, block 0508 illustrates an additional logical relationship between the combination of georgia institute of technology, houston, female and a new attribute. In the interest of brevity, further logical relationships of digests may continue 0509 without limitation within the digest store 0500.
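The relationships among these blocks might be modelled as composite keys built from component digests, as in the following minimal sketch. It assumes that a composite digest is the concatenation of its component digests in sorted order; that ordering is an assumption made here so that the same attribute combination always yields the same key, and the helper names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a single attribute value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def composite_key(component_digests) -> str:
    """Concatenate component digests in sorted order (assumed canonical form)."""
    return "".join(sorted(set(component_digests)))


school = sha256_hex("georgia institute of technology")
city = sha256_hex("houston")
gender = sha256_hex("female")

# Analogous to block 0502: the school and city attributes together.
pair = composite_key([school, city])
# Analogous to block 0507: the same pair plus the attribute female.
triple = composite_key([gender, school, city])
```

Because the key is independent of the order in which attributes are supplied, any resource containing the same combination of attributes resolves to the same entry.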
(93) In one embodiment of the present disclosure, auxiliary information associated with a particular digest is combined into a file, named in accordance with the very same digest, and persisted to the digest store. For instance, the file pertaining to block 0504 would be named a2470c9d137c1c5d3567d1180a64cb43a9269c4d6f1ff13ac8cdbaf6fc5df3b7c2dab080fa8567162a918a229286f5d2d63479d989a0d98c6a25841694745898, and its auxiliary information is further described in
(94)
(95) Area 0602 of
(96) Area 0603 of
(97) Having described methods of persisting resources obtained from an IRI to the digest store in digest representation, what is presented next in
(98) Flow diagram
(99) calling application programming interfaces that read from the information store; or in response to collected events occurring within the information store. An information store may be, but is not limited to, one or more of the following: a database, spreadsheet, file, customer relationship management (CRM) system, contact list, or any other format containing information with logical relationships between its component attributes. As one skilled in the art will recognize, an information store may be physically located within the premises of the owner or in a different location altogether, in which case it may be accessed over a network such as the Internet.
(100) As shown at block 0701, input containing logical relationships between component attributes is imported into the user interface corresponding to
(101)
(102) Moving on now to
(103) By way of illustration only and without limitation,
(104)
(105) In
(106)
(107) Method 1100 receives as input 1101 the output of
(108) For any individual attribute contained within the input payload that exists in raw plaintext form, steps 1102-1103 may apply a prerequisite set of tasks before continuing. In step 1102, transformations are applied to the input and named-entities are extracted. In 1103, a one-way function is applied to the output of 1102 to create a digest representation. Moving on to step 1104, in one embodiment of the present disclosure, a request is formulated using the set of digests d from 1101 in order to locate related digests within a similarity-based reference table consisting of a source digest, canonical digest, and similarity score.
(109) In step 1105, the outputs of 1103 and 1104 are unioned to create a set of digests d. Step 1106 creates two sets: a set of digests S that have been visited and sorted based on their match score, and a result set T. Step 1107 invokes method 1121 with parameters of null and d respectively; upon completion, the result set T is returned to the caller.
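One way to picture steps 1104 and 1105 is the sketch below, which looks up related digests in a similarity-based reference table and unions them with the input digests. The row layout follows the three fields named above; the min_score cutoff and the short placeholder digest strings are assumptions introduced for illustration only.

```python
from typing import NamedTuple


class SimilarityRow(NamedTuple):
    source: str       # source digest
    canonical: str    # canonical digest it is considered similar to
    score: float      # similarity score


def expand_digests(digests, table, min_score=0.8):
    """Union the input digests with canonical digests of sufficiently similar rows."""
    related = {
        row.canonical
        for row in table
        if row.source in digests and row.score >= min_score
    }
    return set(digests) | related


table = [
    SimilarityRow("d1", "d9", 0.95),  # kept: score meets the assumed cutoff
    SimilarityRow("d2", "d8", 0.40),  # dropped: score below the assumed cutoff
]
d = expand_digests({"d1", "d2"}, table)
```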
(110) Step 1122 determines if, during the recursive invocation of 1121 from step 1136, a threshold on the number of results in set T has been reached in order to ensure sufficient match results are returned. In one embodiment of the present disclosure, this threshold is established based on the cardinality of T. In another embodiment, the threshold is determined by the number of entries in T with a match score above a certain value. Other embodiments may apply statistical formulas across a subset of T's elements to govern whether or not a threshold has been reached. In any situation, once a threshold has been reached, method 1120 returns.
(111) When method 1121 is initially invoked (by 1107), parameter c is the null set and parameter r contains the digests unioned in step 1105. Step 1123 determines if method 1121 was invoked by 1107 or recursively by 1136; if c is the null set, method 1121 has been invoked by 1107 and steps 1124-1131 should be skipped after r is overwritten by f r. If, however, c is not the null set, method 1121 has been invoked by step 1136 recursively and steps 1124-1131 should be executed. After the composite digest form of c is marked as visited in step 1124, it is used to query the digest store, the results of which are saved in v. If no records are found, it effectively indicates that there are no IRIs associated with digest c in composite form within the digest store; accordingly, the method returns.
(112) Having populated digest set v, in step 1127, a score is calculated indicating the degree to which there is a match between digests contained within c and v; one embodiment of the formula used to calculate this score is described in
(113) Step 1130 determines if a threshold has been reached; if so, it returns. If not, the next instruction 1131 retrieves the highest score from S and assigns digests c and r from S to c and r, respectively.
(114) Steps 1132-1136 loop over each x of the digests in set r, assigning new values of m and n in each iteration. Step 1133 appends x to the completed set c to create set m. Then step 1134 removes digest x from r to create n. If m has not yet been visited 1135, then step 1136 recursively invokes 1121 with m and n. Upon completion, the method returns.
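The recursion over completed and remaining digests might be sketched as follows. This is a simplified illustration, not the method as claimed: it omits the sorted set S and the selection of the highest-scoring entry, it assumes composite keys are sorted concatenations, and it uses the fraction of query digests matched as a stand-in score. The store layout (a mapping from composite key to a list of IRIs) and all names are hypothetical.

```python
def composite_key(digests):
    """Assumed canonical composite form: sorted concatenation."""
    return "".join(sorted(digests))


def find_matches(query_digests, store, max_results=10):
    """Return (IRI, score) pairs, best first, for composites found in the store."""
    total = len(query_digests)
    visited, results = set(), {}

    def visit(c, r):
        if len(results) >= max_results:      # threshold on the size of the result set
            return
        if c:                                # skipped on the initial invocation
            key = composite_key(c)
            visited.add(key)                 # mark the composite as visited
            iris = store.get(key, [])
            if not iris:                     # no IRIs for this composite: return
                return
            score = len(c) / total           # stand-in match score
            for iri in iris:
                results[iri] = max(results.get(iri, 0.0), score)
        for x in sorted(r):                  # loop over each remaining digest
            m, n = c | {x}, r - {x}          # extend c with x, remove x from r
            if composite_key(m) not in visited:
                visit(m, n)

    visit(frozenset(), frozenset(query_digests))
    return sorted(results.items(), key=lambda kv: (-kv[1], kv[0]))


store = {"a": ["iri1", "iri2"], "b": ["iri2"], "ab": ["iri2"]}
ranked = find_matches({"a", "b", "c"}, store)
```

In this example iri2 ranks above iri1 because it is associated with the two-digest composite, while iri1 matches only a single digest.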
(115) Method 1100, as embodied, employs a recursive technique. As one skilled in the art would understand, it may alternatively be implemented using an iterative sequence of instructions. In another embodiment of the present disclosure, step 1136 is performed asynchronously. In addition, method 1100 may be optimized based on, for instance, constraints or particular requirements; by way of example and without limitation, multiple invocations of method 1120 may be executed concurrently.
(116)
(117)
(118) Area 1201 corresponds to information exported from a store like that of
(119) In this illustration, the user may select zero or more of the results returned in area 1203, thus designating candidate matches as recognized matches. In one embodiment of the present disclosure, as selections occur, the corresponding records in area 1202 are highlighted. Here, the user has selected returned result 1204, as denoted by the checkbox; accordingly, the user has indicated that result 1204 is in fact related to information store record 1202, confirming the proposed candidate match as a reference between the two. This reference is in turn transmitted to the digest engine device, via the network transport API server, to be written to the digest store as shown in
(120) In another embodiment, the user does not make any selections. The duration of time spent on each record of area 1201, along with the set of candidate matches, both selected and not selected, is relayed back to the digest engine device to improve future match recommendations.
(121) In yet another embodiment of the present disclosure, the user may be presented with an option to explicitly reject a candidate match presented in area 1203 as an indication of an incorrect result. This information is transmitted to the digest engine device to be written to the digest store, to improve future match recommendations.
(122)
(123) Line 1301, the declaration of WriteInfoStore, receives input resulting from a user making zero or more selections as described in
(124) Step 1302 checks the value of c, as the initial invocation of WriteInfoStore is made with c equal to the null set and r containing every digest associated with a given IRI. Subsequent invocations of WriteInfoStore are performed recursively beginning with step 1312, described further below. If c is in fact the null set, the procedure skips to step 1308; if not, the next step executed is 1303, which loops through each element of c and checks whether it is contained in d. If any element is not, it is not possible to create a reference between the exported information store record and information located in the digest store for digest c, so the function terminates. The function may return a value indicating the reason for termination.
(125) Step 1304 retrieves auxiliary information from the digest store using the composite form of c. Step 1305 adds this composite digest to set S, which maintains all references between an id and its composite digest entries in the digest store. Step 1306 appends id to the set of identifiers associated with v and re-persists it to the digest store in step 1307.
(126) Steps 1308-1312 loop through each element x of set r. In step 1309, x is appended to c and then stored in m. Step 1310 checks if the composite form of m is labeled complete and, if not, executes steps 1311 and 1312. Step 1311 removes x from set r to create set n. Then, step 1312 recursively invokes step 1301 (WriteInfoStore) with parameters of m, n, d and id. Step 1313 checks if c is the null set, indicating the initial invocation of 1301; if so, id and set S are persisted to an information store reference structure, as described in
(127) In the embodiment described by method 1300, a recursive technique is employed. As one skilled in the art would understand, this method may alternatively be implemented using an iterative sequence of instructions. In another embodiment of the present disclosure, the step 1312 is performed asynchronously.
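An iterative alternative of the kind mentioned above might look like the following sketch. It assumes that composite keys are sorted concatenations of component digests and that each digest store entry is a mapping holding an InputIds list; both the layout and the function name are hypothetical. Rather than recursing, it enumerates every combination of the digests shared by the information store record and the IRI.

```python
from itertools import combinations


def composite_key(digests):
    """Assumed canonical composite form: sorted concatenation."""
    return "".join(sorted(digests))


def write_info_store(record_digests, iri_digests, record_id, digest_store):
    """Append record_id to each existing composite entry built only from shared digests."""
    shared = sorted(set(record_digests) & set(iri_digests))
    touched = []
    for size in range(1, len(shared) + 1):
        for combo in combinations(shared, size):
            key = composite_key(combo)
            entry = digest_store.get(key)
            if entry is None:                 # no such composite in the store
                continue
            if record_id not in entry["InputIds"]:
                entry["InputIds"].append(record_id)
            touched.append(key)
    return touched


digest_store = {
    "a": {"InputIds": []},
    "ab": {"InputIds": [7]},
    "c": {"InputIds": []},
}
touched = write_info_store({"a", "b", "x"}, {"a", "b", "c"}, 3024323232, digest_store)
```

The returned list plays the role of set S, recording which composite entries now reference the identifier. The number of combinations grows exponentially with the number of shared digests, so a practical implementation would likely bound the enumeration.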
(128)
(129) Area 1401 is representative of the input to process 1300 in one embodiment of the present disclosure. Area 1402 illustrates the output in another embodiment of the present disclosure. It should be noted that, in the interest of clarity, information not needed for the demonstration has been omitted. As shown, two digests overlap between the user-selected records from the information store and digests contained by the digest store: c3d24dae34dc01c8baa600dd8d9810148b3741eef00f959e3a7186492b8fba72 and a6c2221569febb76282953aabf85f648511cbe327a8b20580e6029998184cdeb. As a result, in addition to other mutations that may be made to the digest store, the identifier 3024323232 will be appended to the array of InputIds for the digest store entry with the key a6c2221569febb76282953aabf85f648511cbe327a8b20580e6029998184cdebc3d24dae34dc01c8baa600dd8d9810148b3741eef00f959e3a7186492b8fba72, as previously outlined in
(130)
(131) Step 1510 loops through each digest y in set d, checking to see if it exists in S in step 1511; for each digest that is new, steps 1512 and 1513 are executed. Step 1512 creates r from the set d with y removed. Step 1513 invokes method 1301 from
(132)
(133) Flow diagram
(134) In block 1701, every information store that possesses references to the digest store is retrieved; these are created and persisted as described in
(135) The next step retrieves all digests associated with the information store record 1704. The aforementioned digests are iterated through in blocks 1705-1708. If the return value of block 1706 is negative, block 1709 is executed. In one embodiment of the present disclosure, the credibility calculation is performed by averaging the values from block 1707. In a second embodiment of the present disclosure, the median value is used instead. In yet another embodiment, the values from block 1707 are binned, weights are placed on each bin, and a weighted average is computed based on the number of records in and the weights placed on each bin.
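The three credibility embodiments might be sketched as follows. The mean and median follow directly from the description; the binned variant is one possible reading, in which each score is weighted by the weight of the bin it falls into, and the bin edges and weights shown are illustrative assumptions.

```python
from bisect import bisect_right
from statistics import mean, median


def binned_weighted_average(scores, edges, weights):
    """Weight each score by its bin's weight, then divide by the total weight applied."""
    total = 0.0
    weight_sum = 0.0
    for s in scores:
        # Bin i covers edges[i] <= s < edges[i + 1]; out-of-range scores are clamped.
        i = max(0, min(bisect_right(edges, s) - 1, len(weights) - 1))
        total += weights[i] * s
        weight_sum += weights[i]
    return total / weight_sum if weight_sum else 0.0


record_scores = [0.2, 0.4, 0.9]           # per-record scores, as from block 1707
cred_mean = mean(record_scores)           # first embodiment
cred_median = median(record_scores)       # second embodiment
cred_binned = binned_weighted_average(    # third embodiment (assumed interpretation)
    record_scores, edges=[0.0, 0.5, 1.0], weights=[1.0, 2.0]
)
```

With these inputs the binned result is higher than the plain mean because the single score in the upper bin carries twice the weight of each score in the lower bin.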
(136) If block 1706 returns a positive assertion, indicating that there are more records to be processed, blocks 1707 and 1708 are invoked in respective order.
(137) For block 1707, a score is computed based on the reference between an information store and information contained within the digest store for a given individual record. The derivation of said score in one embodiment of the present disclosure is as follows:
(138) Let X be defined as a set of digests exported from an information store.
(139) Let Y be defined as a set of digests created from information obtained from an IRI and residing in the digest store.
(140) Let Z be defined as the set resulting from the logical union of X and Y.
$Z = X \cup Y$
For members of set Z, we perform the following calculation:
(141)
where $|Z|$ = cardinality of $Z$
and $U_t$ = system-wide number when $Z_i \in X$ and $Z_i \in Y$
and $U_m$ = system-wide number when $Z_i \in X$ and $Z_i \notin Y$
and $U_e$ = system-wide number when $Z_i \notin X$ and $Z_i \in Y$
and $t_{Z_i}$ = count of component digests $c$ in $Z_i$ where $c \in X$ and $c \in Y$
and $m_{Z_i}$ = count of component digests $c$ in $Z_i$ where $c \in X$ and $c \notin Y$
and $e_{Z_i}$ = count of component digests $c$ in $Z_i$ where $c \notin X$ and $c \in Y$
and $w_{Z_i}$ = digest weight from digest store for $Z_i$.
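The three component counts defined above might be computed as in the sketch below. It reads the conditions as set membership (in X and in Y; in X but not in Y; in Y but not in X), which is an interpretation of the definitions rather than a quotation, and it covers only the counts, not the scoring formula itself. The function name and the short placeholder digests are hypothetical.

```python
def component_counts(z_components, x, y):
    """Return (t, m, e) for the component digests of one composite digest Z_i."""
    t = sum(1 for c in z_components if c in x and c in y)       # in both X and Y
    m = sum(1 for c in z_components if c in x and c not in y)   # in X only
    e = sum(1 for c in z_components if c not in x and c in y)   # in Y only
    return t, m, e


x = {"d1", "d2", "d3"}          # digests exported from an information store
y = {"d2", "d3", "d4"}          # digests residing in the digest store
counts = component_counts(["d1", "d2", "d3", "d4"], x, y)
```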
(142) In one embodiment of the present disclosure, the digest weight $w_{Z_i}$ is calculated as follows:
(143)
where $s$ = set of information stores that have references to digest $Z_i$
and $|s|$ = cardinality of set $s$
and $w_e$ = system-wide number assigned when an information store has a reference to a digest; this number is independent of the credibility of the information store
and $w_f$ = system-wide number applied to each $c_{s_i}$ when an information store has a reference to a digest; this number is dependent on the credibility of the information store
and $c_{s_i}$ = credibility of information store $s_i$.
(144) In one embodiment of the present disclosure, the sequence of steps outlined above is executed. In another embodiment of the present disclosure, matrices are instead used to perform the calculations.
(145) Block 1708 adds the result of block 1707 to a running total and increments a count of the total number of records processed; both values are maintained for the present information store and are used in block 1709.
(146) Having described methods of exporting and augmenting information from an external information store with related information from the digest store, what follows in the way of
(147) Flow diagram
(148) For block 1801, a document is navigated to by a user 0118 from
(149) After the browser renders 1802 in response to step 1801, the resulting document is parsed by a browser extension 1803; subsequently, raw information is extracted and transmitted to the digest engine device 1804.
(150) Transformation and one-way functions are applied to the information extracted from the document 1805 to create corresponding digest representations, described in further detail in
(151) In block 1808, the browser extension takes the resulting payload transmitted in block 1807 and visually injects it into the previously rendered document. The user can then interact with a plurality of said visual elements injected into the document; pertinent input, such as user selections, may be captured by the browser extension and later transmitted to the digest engine device 1809 in order to inform future intelligence and/or decisions.
(152) In one embodiment of the present disclosure, method 1300 is invoked with the transmission from 1809.
(153) An illustration of
(154)
(155) In another embodiment of the browser extension,
(156) As can be understood, embodiments of the present disclosure identify methods and systems for analyzing and storing logical networks of information in a digest representation by augmenting, inferring, extrapolating, deducing, and/or reasoning. The present disclosure has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present disclosure pertains without departing from its scope.
(157) From the foregoing, it can be seen that this disclosure is one well adapted to attain various ends and objects set forth above, together with other advantages, as a person of ordinary skill in the art would appreciate. It should be understood that certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations.