SYSTEM FOR PROTECTING AND ANONYMIZING PERSONAL DATA

20220391537 · 2022-12-08

Abstract

The computer system includes a control computer system, a provisioning computer system and at least one user computer system. The control computer system includes control software. The user computer system includes a data store in which personal data is stored and an anonymization software. The anonymization software is configured for receiving at least one anonymization protocol; for each of said at least one anonymization protocol selecting and anonymizing a subset of the personal data in accordance with said anonymizing protocol; and transferring the anonymized subset and an identifier of the anonymization protocol to the control software. The control software is configured for receiving the at least one anonymized subset and the at least one identifier from said anonymizing software; and providing the subset and the identifier to the analysis software for performing those analysis functions to which the anonymization protocol identified by the identifier is associated, on the subset.

Claims

1.-19. (canceled)

20. A computer system for the anonymization of personal data, comprising: a control computer system comprising a control software for providing anonymized personal data to at least one analysis software, the at least one analysis software comprising a plurality of different analysis functions for analyzing personal data; a provisioning computer system comprising a plurality of anonymization protocols each associated with one of said plurality of different analysis functions, each of the anonymization protocols being configured to select and anonymize personal data in a manner adapted to the one of the analysis functions associated with said anonymization protocol, the protocols being configured to selectively select and anonymize only those personal data that are necessary for the respective analysis function; at least one user computer system connected to the control computer system and the provisioning computer system via a network, the at least one user computer system comprising, a data store in which personal data is stored in a protected non-anonymized form; an anonymization software; wherein the user computer system is the source of the personal data, and wherein the personal data is stored in the user computer system such that it can only be accessed by the anonymization software and optionally also by a database management program and/or a personal data management program; wherein the anonymization software is configured for, receiving at least one anonymization protocol of the plurality of anonymization protocols from the provisioning computer system; for each of said at least one anonymization protocol, selecting and anonymizing a subset of the personal data, said selecting and anonymizing being performed in accordance with said anonymizing protocol; and transferring the anonymized subset and an identifier of the anonymization protocol used for anonymization to the control software; wherein the control software is configured for, receiving the at least one anonymized subset and the at least one identifier from said anonymizing software; and providing the at least one anonymized subset and the at least one received identifier to the analysis software for performing those analysis functions to which the anonymization protocol identified by the identifier is associated, on the subset; the control computer system further comprising, the analysis software, the analysis software being adapted to perform the one of the analysis functions identified by the identifier provided by the control software.

21. The computer system according to claim 20, wherein the control computer system serves as the provisioning computer system; or wherein the control computer system and the provisioning computer systems are different computer systems.

22. The computer system according to claim 20, further comprising: a personal data management software, wherein the personal data management software is configured to interoperate with the anonymization software during editing of the personal data and/or during input of new personal data by a user via a GUI to compare the data currently input via the GUI and/or the input fields currently present in the GUI with the at least one anonymization protocol and to output a result of the comparison.

23. The computer system according to claim 22, wherein the comparison of the data currently entered via the GUI with the anonymization protocol comprises: determining if and which of at least one anonymization protocol has been activated for the person whose personal data is currently being entered or edited; analyzing the one or more anonymization protocols activated for this person in order to determine the totality of all the attributes specified as a “necessary attribute” in all the anonymization protocols activated for this person, a “necessary attribute” being a data field of a personal file which is necessary for the execution of the analysis function assigned to the anonymization protocol; comparing the determined “necessary attributes” with the entered data; if the entered data does not contain at least one of the necessary attributes: automatically outputting a warning message to the user; and/or automatically modifying the GUI so that the modified GUI contains input fields for at least the at least one missing necessary attribute.

24. The computer system according to claim 22, wherein the comparing of the input fields currently present in the GUI with the anonymization protocols comprises: determining if and which of the anonymization protocols have been activated for the person whose personal data is currently being entered or edited; analyzing of the one or more anonymization protocols activated for this person in order to determine the totality of all the data fields specified as a “necessary data field” in all the anonymization protocols activated for this person, a “necessary data field” being a data field of a personal file used for storing an attribute that is necessary for the execution of the analysis function associated with the anonymization protocol; comparing the determined necessary data fields with the data fields of the GUI; if the GUI does not contain at least one of the necessary data fields, automatically outputting a warning message to the user; and/or automatically modifying the GUI so that the modified GUI contains input fields at least for each of the missing necessary data fields.

25. The computer system according to claim 20, the anonymization protocols each comprising a validity period, the validity period indicating a time of validity and usability of the respective protocol within the anonymization software; and the anonymization software being configured to automatically collect the personal data anonymized in accordance with this protocol in the form of a subset of the personal data in response to the end of the validity period and to transmit them to the control software in collected form.

26. The computer system according to claim 20, wherein the anonymization software for one or more of the at least one anonymization protocol respectively comprises and continually updates a counter, wherein the one or more counters each indicate how many personal data records have already been anonymized with the anonymization protocol to which the counter is assigned, wherein the anonymization software is adapted to: check whether one of the counters exceeds a predefined minimum value; if the minimum value is exceeded, automatically collecting all personal data already anonymized by the anonymization protocol assigned to this counter and transmitting the collected anonymized personal data in the form of a batch to the control software.

27. The computer system according to claim 20, wherein one or more of the anonymization protocols each include: a specification of one or more “sensitive data fields”, wherein a “sensitive data field” is a data field of a personal file whose original content is deleted or anonymized by the anonymization protocol in the course of anonymization; and/or a specification of one or more “range data fields” and at least one respectively associated value range, wherein a “range data field” is a data field of a personal file whose original content is replaced in the course of anonymization by the anonymization protocol by the one of the value ranges defined in the anonymization protocol which comprises this data value; and/or a specification of one or more “necessary data fields”, where a “necessary data field” is a data field of a personal file that is necessary to perform the analysis function associated with the anonymization protocol; and/or a specification of one or more “selection data fields” and at least one respective associated selection value, wherein a “selection data field” is a data field whose content determines whether or not a data field of a personal file is extracted and anonymized in the course of anonymization; and/or a mapping list comprising one or more synonyms mapped to a normalized term representing basically the same semantic content as the synonyms mapped to the normalized term, wherein all synonyms contained in a personal file are replaced with the normalized term to which the synonym is mapped in the protocol in the course of anonymization; and/or a whitelist comprising a list of allowed data values which are to be maintained in the course of anonymization; a blacklist comprising a list of forbidden data values which are to be deleted or replaced in the course of anonymization; and/or a time period indicating the granularity of an absolute-to-relative time conversion operation performed in the course of anonymization; the time period can be specified in the protocol on a per-field basis or globally for two or more different fields; and/or an identifier of the analysis function assigned to the anonymization protocol.

28. The computer system according to claim 20, the personal data consisting of a plurality of personal files, wherein the anonymization software is configured for: receiving a request for personal data from the control software, the request comprising an identifier of one of the anonymization protocols; performing said one anonymization protocol in response to receipt of said request, said one anonymization protocol comprising a specification of one or more “selection data fields” and at least one respective associated selection value, wherein performing said one anonymization protocol comprises comparing the content of said “selection data field” of all personal files with said at least one selection value, wherein said one anonymization protocol is configured to anonymize only those personal files for which said comparison provides sufficient similarity to said at least one selection value; and transferring the anonymized personal files as the subset of the personal data to the control software, each together with an identifier of the one anonymization protocol.

29. The computer system according to claim 20, further comprising: a proxy computer system connected via the network to the control computer system and to a plurality of user computer systems respectively comprising an instance of the anonymization software, the plurality of user computer systems including the at least one user computer system, wherein each of the plurality of user computer systems is connected to the control computer system only indirectly via the proxy computer system, wherein the anonymized subsets and protocol identifiers are transferred from each of the anonymization software instances to the control software via the proxy computer, and wherein the proxy computer is configured to perform the transfer such that the identity of the one of the user computers having provided any one of the anonymized subsets and protocol identifiers is hidden from the control computer; and/or wherein the anonymization software instantiated on each of the user computer systems is configured to encrypt the anonymized subset of the personal data such that the control software but not the proxy computer can decrypt the transferred anonymized subset of the personal data.

30. The computer system according to claim 20, wherein the anonymization software is configured to perform the selection and anonymization of the subset of the personal data for the data of a plurality of persons, to collect the anonymized sub-sets and identifiers in a batch and to transfer the anonymized subsets and identifiers contained in the batch only in case the number of persons whose data is collected in the batch exceeds a predefined minimum threshold value.

31. The computer system according to claim 20, wherein the anonymization software is configured to automatically determine the degree of anonymization achieved by the execution of the at least one anonymization protocol and to transfer the anonymized subset and identifier to the control software only in case the anonymized data guarantees a predefined minimum degree of anonymity; and/or wherein the control software is configured to automatically determine the degree of anonymization of the transferred anonymized subset and is configured to provide the at least one anonymized subset and the at least one received identifier to the analysis software only in case the anonymized data guarantees a predefined minimum degree of anonymity.

32. The computer system according to claim 20, wherein the provisioning computer system comprises a private cryptographic signing key; wherein each of the plurality of anonymization protocols comprises a signature generated with the private cryptographic signing key; and wherein the anonymization software comprises a public signature verification key that forms an asymmetric cryptographic key pair with the private cryptographic signing key, wherein the anonymization software is configured to verify the signature of each received protocol and for using any of the received anonymization protocols for selecting and anonymizing a subset of the personal data only in case the signature is valid.

33. The computer system according to claim 32, wherein the degree of anonymization is measured as k-anonymity and/or l-diversity.

34. The computer system according to claim 20, wherein the user computer system comprises security means which prohibit installation of analysis programs and/or any other type of software program on the user computer system; and/or wherein at least some of the multiple analysis programs are instantiated on two or more remote analysis computers operatively coupled to the control computer system via the network.

35. A computer-implemented method for anonymizing personal data, the method being performed by: a control computer system comprising control software for providing anonymized personal data to at least one analysis software, said at least one analysis software comprising a plurality of different analysis functions for analyzing personal data; a provisioning computer system comprising a plurality of anonymization protocols each associated with one of said plurality of different analysis functions, said anonymization protocols each configured to select and anonymize personal data in a manner adapted to said associated analysis function; the method comprising, providing, by the provisioning computer system, at least one anonymization protocol of the plurality of anonymization protocols to an anonymization software of a user computer system connected to the control computer system and the provisioning computer system via a network; for each of said at least one anonymization protocols provided, receiving, by the control software of the control computer system, an anonymized subset of personal data of one or more persons and an identifier of the one anonymization protocol used by the anonymization software for selecting and anonymizing the subset, whereby the selection and anonymization was performed in accordance with said one anonymization protocol; and providing, by the control software, the at least one anonymized subset and the at least one received identifier to the analysis software for performing the one of the analysis functions which is associated with the anonymization protocol identified by the identifier on the subset.

36. A computer-implemented method for anonymizing personal data, the method being performed by: at least one user computer system connected to a control computer system and a provisioning computer system via a network, the at least one user computer system comprising a data store in which personal data is stored in a protected, non-anonymized form, the at least one user computer system further comprising anonymization software, the control computer system comprising control software for providing anonymized personal data to at least one analysis software, said at least one analysis software comprising a plurality of different analysis functions for analyzing personal data, the provisioning computer system comprising a plurality of anonymization protocols each associated with one of said plurality of different analysis functions, said anonymization protocols each configured to select and anonymize personal data in a manner adapted to said associated analysis function, the protocols being configured to selectively select and anonymize only those personal data that are necessary for the respective analysis function, wherein the user computer system is the source of the personal data, and wherein the personal data is stored in the user computer system such that it can only be accessed by the anonymization software and optionally also by a database management program and/or a personal data management program; and the control computer system; the method comprising, receiving, by the anonymization software, the at least one anonymization protocol of the plurality of anonymization protocols from the provisioning computer system; for each of said at least one anonymization protocol: selecting and anonymizing, by the anonymization software, a subset of said personal data, said selecting and anonymizing being performed according to said at least one anonymizing protocol; and transmitting, by the anonymization software, the anonymized subset and an identifier of the anonymization protocol used for anonymization to the control software for enabling the control software to provide the at least one anonymized subset and the at least one received identifier to the analysis software for performing the one of the analysis functions which is associated with the anonymization protocol identified by the identifier on the subset; and performing the one of the analysis functions identified by the identifier provided by the control software by the analysis software of the control computer system.

37. A computer-readable non-transitory storage medium having embedded therein a set of instructions which, when executed by one or more processors, cause said processors to execute a computer-implemented method according to claim 34.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0150] In the following, exemplary embodiments of the invention are explained in more detail with reference to the drawings. They show:

[0151] FIG. 1 a block diagram of an embodiment of an inventive computer system having a user computer system and a control computer system that also serves as a provisioning computer system;

[0152] FIG. 2 a block diagram of another computer system according to the invention with three user computer systems, one control computer system and one provisioning computer system;

[0153] FIG. 3 a user computer system with a personal data management program and an anonymization plugin;

[0154] FIG. 4 a flowchart of a method for providing and using an anonymization protocol according to an embodiment of the invention;

[0155] FIG. 5 a flowchart of a method for collecting and anonymizing personal data in the course of opening a personal file;

[0156] FIG. 6 a flowchart of a method for providing and using an anonymization protocol, and

[0157] FIG. 7 a block diagram of a distributed system comprising multiple user computer systems, a control computer system and a proxy.

DETAILED DESCRIPTION

[0158] The following exemplary embodiments all refer to the medical field. However, the invention may also be used in other areas in which personal data is collected, stored and, under certain conditions, made available to third parties for external analysis. This applies in particular to the administration of clients, customers and members of an organization. When talking about “patients” here, “persons” are implicitly included and meant as well.

[0159] FIG. 1 shows a block diagram of an embodiment of an inventive computer system 100 with a user computer system 160 and a control computer system 128 that also serves as a provisioning computer system.

[0160] For example, the user computer system may be a computer of a physician or of a medical institution, such as a single practice, a group practice, a clinic or a medical research facility. The computer system may include one or more processors and may be implemented as a notebook, smartphone, tablet computer system, terminal, server computer system, or distributed cloud computer system.

[0161] The user computer system contains a data store 102, on which a large number of patient files 104-110 are stored in non-anonymized but protected form. The data store can be any data storage, for example a file, a set of files, a file directory, or a database. Preferably it is a database, especially a relational database such as MySQL or PostgreSQL. For example, the data store 102 can contain personal data from a large number of people (in this case, patients). In addition, the user computer system 160 contains an anonymization software 114 which can access the personal data of the data store 102 with at least read access via an interface 112.

[0162] The anonymization software contains a multitude of functionalities. On the one hand, it contains an interface 122.2 to a provisioning computer system via which it can receive one or more anonymization protocols 120 via a network. Typically, the received anonymization protocols 120 represent only a small selection of the anonymization protocols 121 contained in the provisioning computer system 128. For example, the one or more anonymization protocols can be requested via interface 122.2 at any time during the runtime of the anonymization software and received via the network.

[0163] Each of the anonymization protocols 120, 121 is uniquely assigned to one analysis function 130 out of a multitude of analysis functions. This means that the anonymization protocol of a particular analysis function determines which personal data records are to be selected for a particular analysis and how this data is to be anonymized. The type of data selected and the type of anonymization are specific to the associated analysis function, i.e., data collected and processed by a different protocol that is not assigned to an analysis function may not be processed, or may not be processed correctly, by that analysis function.

[0164] For example, each protocol can have a unique ID, a version (“revision”) corresponding to a particular validity period, a start date and end date as indicated in the JSON example given below:

TABLE-US-00001
{
  ...
  "protocolID": 123,
  "protocolRevision": 3,
  "valid": ["2019-08-01", "2019-09-30"]
  ...
}

[0165] According to some embodiments, each protocol comprises one or more filter criteria that can be specified as filter rules. A filter rule is a function that specifies which attributes of a person file are to be evaluated and how, and how to decide from the result of this evaluation whether a person and his or her personal data are relevant for the analysis function to which the protocol is assigned. A pseudocode example of a filter rule is given below (the original JSON code would be less comprehensible):

[0166] (PatientFile.DiagnosisRecord containsHistoricalOrCurrent(24 months, [K75.8;K75.9;K76.0]) and Patient.Age < 85)

[0167] Each filter rule may comprise an arbitrarily complex combination of Boolean operators.
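The pseudocode filter rule above could be evaluated roughly as in the following Python sketch. It is an illustration only: the `PatientFile` and `Diagnosis` classes, the helper names and the reading of "24 months" as "fewer than 24 whole calendar months elapsed" are assumptions that do not appear in the protocol format itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Diagnosis:
    code: str            # e.g. a diagnosis code such as "K75.8"
    recorded_on: date


@dataclass
class PatientFile:       # hypothetical, simplified person file
    age: int
    diagnoses: list[Diagnosis]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def contains_historical_or_current(patient, months, codes, today):
    """True if any listed diagnosis was recorded fewer than `months` months ago."""
    return any(
        d.code in codes and 0 <= months_between(d.recorded_on, today) < months
        for d in patient.diagnoses
    )


def filter_rule(patient: PatientFile, today: date) -> bool:
    # Boolean combination from the pseudocode: diagnosis criterion AND age criterion.
    return (
        contains_historical_or_current(
            patient, 24, {"K75.8", "K75.9", "K76.0"}, today
        )
        and patient.age < 85
    )
```

More complex rules would combine such predicates with further `and`, `or` and `not` operators in the same way.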

[0168] According to some embodiments, one or more of the protocols respectively comprise a specification of a set of “quasi-identifiers”.

[0169] A “quasi identifier” as used herein is an attribute of a person (or field name of a person file) which alone or in combination with other quasi identifiers bears the risk of making a person identifiable. For example, a diagnosis and/or medication of a patient can be a quasi identifier.

[0170] According to embodiments, to ensure a sufficient degree of anonymization, all quasi identifiers of a person together must be k-anonymous, otherwise in combination they can identify individual persons. For example, k can be 55, meaning that in a set of anonymized person data comprising e.g. the data of 10,000 persons, each person-specific combination of quasi-identifiers must be observed in at least 55 persons.
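The k-anonymity condition can be checked by counting how often each combination of quasi-identifier values occurs. The following Python sketch assumes that anonymized records are plain dictionaries; the function name and record layout are illustrative and not part of the described system.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every observed quasi-identifier combination occurs in at least k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())
```

With k = 55, for instance, a data set would pass only if no combination of quasi-identifier values is shared by fewer than 55 persons.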

[0171] According to embodiments, the anonymization software is configured to automatically and dynamically repeat the anonymization procedure based on modified anonymization parameters, e.g. based on extended value ranges: in case the degree of anonymization obtained by replacing individual data values with respective data ranges is not sufficient, the replacing is repeated using larger value ranges, thereby increasing the number of persons in a person data set to which the attribute value ranges assigned to an individual anonymized person can be mapped. Examples for the dynamic computation of anonymization parameters can be found in the literature, e.g. in Aggarwal, Gagan, et al. “Approximation algorithms for k-anonymity.” Journal of Privacy Technology (JOPT) (2005).

TABLE-US-00002
{
  ...
  "quasiIdentifiers": [
    {
      "object": "PatientFile.Age",
      "anonymizer": {
        "name": "mapToRange",
        "args": {
          "ranges": [ [20, 25], [25, 30], [30, 40], [40, 60] ]
        }
      }
    }
  ],
  ...
}
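A minimal Python sketch of a "mapToRange" anonymizer, together with the repetition using larger value ranges described in paragraph [0171], might look as follows. Several points are assumptions made for illustration: ranges are treated as half-open intervals so that a boundary value such as 25 falls into exactly one range, the ranges are assumed to be sorted and contiguous, and widening is done by simply merging neighbouring ranges pairwise. The cited literature describes more elaborate strategies.

```python
from collections import Counter


def map_to_range(value, ranges):
    """Return the first range (low, high) with low <= value < high, else None."""
    for low, high in ranges:
        if low <= value < high:
            return (low, high)
    return None


def widen(ranges):
    """Merge neighbouring ranges pairwise to obtain coarser value ranges."""
    last = len(ranges) - 1
    return [
        [ranges[i][0], ranges[min(i + 1, last)][1]]
        for i in range(0, len(ranges), 2)
    ]


def generalize(values, ranges, k):
    """Map values to ranges, widening the ranges until each range holds >= k values.

    Stops at a single range at the latest; the caller should still verify the
    result, since even one range may contain fewer than k values.
    """
    while True:
        mapped = [map_to_range(v, ranges) for v in values]
        if all(n >= k for n in Counter(mapped).values()) or len(ranges) <= 1:
            return mapped, ranges
        ranges = widen(ranges)
```

For the ranges of the protocol sample, one widening step yields the ranges [20, 30] and [30, 60], and a second step yields the single range [20, 60].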

[0172] According to some embodiments, one or more of the protocols respectively comprise a specification of a set of “sensitive data” elements, i.e., “sensitive attributes” of a person comprising sensitive personal data which, in contrast to a “quasi identifier”, do not make a person identifiable, either alone or in combination with other “sensitive data” elements. However, these attributes may allow conclusions to be drawn about a group of persons. According to embodiments, the protocol can require a numerical value for the parameter “L”, wherein “L” denotes the required degree of l-diversity. All “sensitive data” elements of an anonymized patient record need to be l-diverse to prevent conclusions from being drawn from the anonymized data records about an entire group of users.

[0173] A “sensitive attribute” or “sensitive data element” is an attribute of a person whose value for any particular individual must be kept secret.

[0174] A set of non-sensitive attributes can be or can act as a “quasi-identifier” if these attributes can be used to uniquely identify at least one individual in the data set.

[0175] For example, let S denote the set of all sensitive attributes. An example of a sensitive attribute can be “medical condition”. The association between individuals and “medical condition” hence needs to be kept secret and the anonymization process needs to ensure that the anonymized data does not allow linking a medical condition to an individual person or vice versa. Thus the sensitive data “medical condition=cancer” must not be disclosed in association with a particular patient but it may be permissible to disclose the information that cancer patients exist in a particular hospital.

[0176] A set of nonsensitive attributes of a table is called a quasi-identifier if these attributes can be linked with external data to uniquely identify at least one individual in the general population. One example of a quasi-identifier is a primary key, like social security number. Another example is the set {gender, age, zip code} in a data set comprising only a small number of persons per zip code. A zip code per se does not disclose sensitive data of a person, but in combination with other attributes it may reveal the identity of a person, thereby also disclosing the medical condition of this person.

[0177] For example, a data set to be anonymized may consist of all people in a small village. The data set comprises only 10 different 54-year-old men, who all suffer from disease X, and it is known that there are only 10 54-year-old men in this village. In this case, it is immediately known that “Max Mustermann” suffers from disease X as soon as it is known that he is 54 years old. To prevent this, l-diversity is computed: if l=2, the group (54, man) should have at least two different entries for “has disease X”. Embodiments of the invention use anonymization protocols configured to create anonymized data sets comprising as few “sensitive fields” as possible to handle data as sparingly as possible (e.g. to read only “Patient has diagnosis X” instead of a complete list of all diagnoses of a patient).

[0178] Accordingly, in order to ensure l-diversity, only a subset of diagnoses from all existing diagnoses of a patient will be extracted and included in the anonymized patient record. For example, the subset of diagnoses can be the ones of the diagnoses of a patient mentioned on a whitelist specified in the protocol and/or can be the diagnoses observed within the last 12 months.
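The l-diversity condition from the village example can be sketched in Python as follows. The record layout (dictionaries with the keys `age`, `sex` and `has_disease_x`) and the function name are assumptions chosen for illustration.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive_field, min_distinct):
    """True if each quasi-identifier group has >= min_distinct sensitive values."""
    values_per_group = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        values_per_group[key].add(record[sensitive_field])
    return all(len(values) >= min_distinct for values in values_per_group.values())
```

In the village example, the group (54, man) contains only the value "has disease X = true" and therefore fails the check for l = 2. It would pass only if at least one record in this group carried a different value.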

[0179] A protocol code sample for the sensitive data in JSON format is presented below:

TABLE-US-00003
{
  ...
  "sensitiveData": [
    {
      "object": "PatientFile.DiagnosisRecord",
      "anonymizer": {
        "name": "whitelist",
        "args": {
          "allowedValues": [ "K75.8", "K75.9" ],
          "rangeMonths": 12
        }
      }
    }
  ],
  ...
}
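One possible reading of the "whitelist" anonymizer in this sample is sketched below in Python. It assumes that "rangeMonths" restricts the result to diagnoses recorded within the given number of calendar months before the current date, and that only the diagnosis codes, not their dates, are passed on. Both points are interpretations, not statements of the protocol format.

```python
from datetime import date


def whitelist_diagnoses(diagnoses, allowed_values, range_months, today):
    """Keep whitelisted diagnosis codes recorded within the last `range_months` months.

    `diagnoses` is a list of (code, recorded_on) pairs; dates are dropped in the result.
    """
    kept = set()
    for code, recorded_on in diagnoses:
        age_months = (today.year - recorded_on.year) * 12 + (
            today.month - recorded_on.month
        )
        if code in allowed_values and 0 <= age_months <= range_months:
            kept.add(code)
    return sorted(kept)
```

All diagnoses that are not on the whitelist, or that are older than the configured period, are thereby omitted from the anonymized record.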

[0180] The received anonymization protocols 120 can be read by the anonymization software in order to select a certain subset of the existing patient files that are to be evaluated with regard to a certain analysis function. The anonymization software can, for example, use the anonymization module 116 to read in a certain anonymization protocol 120 and transfer it to a filter module 118, which uses the information in the protocol to select a subset of the patient files 104-110 that are suitable for the analysis functions assigned to the protocol. This selection of patient files is transferred from the filter module 118 to the anonymization module 116, which anonymizes the patient files of this selection according to the specific anonymization protocol. In the course of anonymization, data values in sensitive data fields in particular are completely removed, data values in range data fields are replaced by range information, necessary data fields specified in the protocol are checked to see whether the required information is available, and the anonymized patient data thus obtained is stored locally in a structured form that the control software 140 can process.

[0181] According to embodiments, the filter criteria are protocol-specific and are comprised in the protocols. In some examples, the filter criteria are automatically evaluated against the personal data when a person file is opened in a personal data management program 300 that is interoperable with the anonymization software. This may usually happen when a person (e.g. a patient) visits the operator of the anonymization software (e.g. a physician). If the patient is suitable for a protocol, the patient and the physician respectively have the possibility to object to the anonymization of this data. This objection will be saved and the data of this person will not be processed further by the anonymization software. Otherwise, a sub-set of the personal data selected in accordance with this protocol is read out, processed and stored anonymously in a local database by the anonymization software.

[0182] Preferably, a large amount of patient data is anonymized and stored locally as long as the validity period of the protocol 120 used has not expired. The expiry of the validity period of a protocol can be interpreted by the anonymization software 114 as a trigger signal to send all anonymized patient files having been generated by this protocol and having been stored locally in the form of a batch of anonymized patient records to the control software 140.

[0183] In addition or alternatively, it is also possible that the control software 140 triggers the transmission of the collected anonymized data records. This can be done, for example, by transmitting a command 152 from the control software to the anonymization software 114, whereby the command contains an identifier 150 of the analysis functions 132-138 to be performed and/or an identifier of the anonymization protocol assigned to these analysis functions. In response to the receipt of command 152, the anonymization software identifies an anonymization protocol which is assigned to identifier 150 directly or indirectly via the identifier of the analysis functions, executes the identified anonymization protocol(s) and provides a protocol-specific, anonymized subset of the patient data.

[0184] For example, the filter module 118 and anonymization module 116 can be used for selectively anonymizing and providing those patient data records which match some filter criteria (selection values) specified in the identified anonymization protocols. This subset 154 is returned to control software 140 in response to command 152.
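The direct or indirect resolution of identifier 150 to an anonymization protocol could, for example, look as follows. The registry layout and the identifier strings are hypothetical and serve only to illustrate the lookup.

```python
# Hypothetical local registry: protocol identifier -> analysis functions it serves.
LOCAL_PROTOCOLS = {
    "P-001": {"analysis_functions": ["AF-132"]},
    "P-002": {"analysis_functions": ["AF-134", "AF-136"]},
}


def resolve_protocols(identifier: str, protocols: dict) -> list[str]:
    """Map the identifier carried by a command to the protocol(s) to execute."""
    # Direct case: the command names an anonymization protocol.
    if identifier in protocols:
        return [identifier]
    # Indirect case: the command names an analysis function.
    return sorted(
        pid for pid, spec in protocols.items()
        if identifier in spec["analysis_functions"]
    )
```

An empty result indicates that no locally installed protocol is assigned to the identifier, in which case no data would be returned in response to the command.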

[0185] According to embodiments, one or more of the protocols comprised in the anonymization software respectively comprise a specification of a data structure, e.g. of a database table, to be used for storing the anonymized data generated in accordance with this protocol. The data structure can be created dynamically by the anonymization software when or before performing the selection and anonymization based on the protocol. For example, the data structure can be created in a local database. Then, the anonymized subset of the sensitive data of one or more persons generated in accordance with this protocol is stored in this data structure in the local database.
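A minimal sketch of the dynamic creation of such a data structure is given below, assuming an SQLite database as the local database and a protocol that lists its column names. The table naming scheme (`anon_` prefix) and both function names are assumptions of this illustration.

```python
import sqlite3


def _check_identifiers(*names: str) -> None:
    # SQL identifiers cannot be bound as parameters, so they are validated instead.
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")


def create_protocol_table(conn: sqlite3.Connection, protocol_id: str, columns: list[str]) -> str:
    """Create the table specified by a protocol (if absent) and return its name."""
    table = "anon_" + protocol_id
    _check_identifiers(table, *columns)
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    return table


def store_record(conn: sqlite3.Connection, table: str, record: dict) -> None:
    """Store one anonymized record in the protocol-specific table."""
    columns = list(record)
    _check_identifiers(table, *columns)
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
        [record[c] for c in columns],
    )
```

For example, `create_protocol_table(conn, "P001", ["age_range", "medication"])` creates a table `anon_P001` into which the anonymized records of that protocol can be written.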

[0186] If there is either enough data available (e.g. if more than a predefined minimum number of persons are represented in the anonymized data generated in accordance with a particular anonymization protocol) or if the defined validity period of one of the anonymization protocols has expired, the data is transferred to the control software via the network. For example, the data can be transferred via a REST API in JSON format. Before transmission, the anonymization software optionally checks whether the quasi-identifiers contained in the anonymized subset of the data that is to be transferred are k-anonymous and/or whether the “sensitive data” of this subset is l-diverse. If this is not the case, the anonymization software transmits only an error message to the control software.
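The optional check before transmission can be sketched as follows. The sketch uses the simplest variant of l-diversity (at least l distinct sensitive values per group of identical quasi-identifiers); the function names, the payload keys and the error text are assumptions of this illustration.

```python
from collections import defaultdict


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """Every combination of quasi-identifier values must occur at least k times."""
    counts: dict[tuple, int] = defaultdict(int)
    for record in records:
        counts[tuple(record[q] for q in quasi_identifiers)] += 1
    return all(count >= k for count in counts.values())


def is_l_diverse(records: list[dict], quasi_identifiers: list[str], sensitive: str, l: int) -> bool:
    """Every quasi-identifier group must contain at least l distinct sensitive values."""
    values: dict[tuple, set] = defaultdict(set)
    for record in records:
        values[tuple(record[q] for q in quasi_identifiers)].add(record[sensitive])
    return all(len(group) >= l for group in values.values())


def build_transfer(records, protocol_id, quasi_identifiers, sensitive, k, l) -> dict:
    """Return the payload to transmit, or only an error message if a check fails."""
    if is_k_anonymous(records, quasi_identifiers, k) and is_l_diverse(
        records, quasi_identifiers, sensitive, l
    ):
        return {"protocol_id": protocol_id, "records": records}
    return {"protocol_id": protocol_id, "error": "privacy check failed"}
```

The returned dictionary could then be serialized to JSON for transfer; when a check fails, no record leaves the user computer system.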

[0187] The control software 140 may include modules and functions 142 for managing the anonymized patient data received from one or more user computer systems, for storing this anonymized patient data 146, 148 in a database 144, and for providing the anonymized patient data specifically to selected analysis functions 132-138. The anonymized patient data is provided to selected analysis functions in such a way that an anonymized patient data subset 146, 148 received from the anonymization software 114 is only provided to the analysis functions that are assigned to the anonymization protocol used to create the subset. For example, the control computer system may have a corresponding allocation table or allocation file that assigns a corresponding anonymization protocol to each of the analysis functions. The allocation table or allocation file may also contain address data of multiple analysis computer systems, if the analysis functions 132-138 are distributed among multiple analysis computer systems. In this case, the control software selectively provides the subset received for a particular analysis function (there may also be multiple subsets provided for an analysis function by a variety of user computer systems) to the address of the analysis computer system containing that analysis function.
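The allocation table could, for example, be represented as in the following sketch. The identifiers and addresses are placeholders invented for this illustration.

```python
# Hypothetical allocation table: analysis function -> assigned protocol and address.
ALLOCATION_TABLE = {
    "AF-132": {"protocol_id": "P-001", "address": "https://analysis-a.example/af132"},
    "AF-134": {"protocol_id": "P-002", "address": "https://analysis-b.example/af134"},
    "AF-136": {"protocol_id": "P-002", "address": "https://analysis-b.example/af136"},
}


def targets_for(protocol_id: str, table: dict) -> list[tuple[str, str]]:
    """Return (analysis function, address) pairs entitled to data of this protocol."""
    return sorted(
        (function_id, entry["address"])
        for function_id, entry in table.items()
        if entry["protocol_id"] == protocol_id
    )
```

A subset created with protocol `P-002` would thus be provided only to `AF-134` and `AF-136`, and to no other analysis function.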

[0188] According to some embodiments, the anonymized patient data subsets are transferred from the control software to the individual analysis functions of one or more analysis software programs 130 by means of push procedures. According to other embodiments, the anonymized patient data subsets are transferred from the control software to the individual analysis functions of one or more analysis software programs 130 by means of pull procedures.

[0189] After receiving one or more anonymous subsets of patient data, the analysis software 130 executes the corresponding analysis functions on this subset. The analysis functions can be performed in response to receipt of the subset, or after a sufficiently large data set has been received from one or more user computer systems for a particular analysis function. The result 156 returned by the analysis functions is output. The output is made to at least one user of the analysis computer system (which is identical to the control computer system here), for example via a screen, printer, or other user interface. These users may be, for example, the leader of a medical survey, researchers who have developed a particular complex statistical analysis, or anyone else who is in charge of implementing and/or performing an analysis.

[0190] In some cases, even if only for certain analysis functions, the result is also returned to the control software and issued to a user of the control computer system. The user of the control computer system may also be a leader of a medical survey, a person who has developed a particular analysis or integrated its use into the control software, or another person in charge of implementing and/or performing or integrating an analysis into the control software.

[0191] In some cases, in particular for analysis functions that evaluate a large amount of anonymized personal data within the framework of a scientific study, e.g. a medical survey, the result is also returned by the control software to the anonymization software 114, which has provided at least part of the anonymized personal data on the basis of which the results were obtained. Due to anonymization, the result cannot be assigned to an individual patient, but the user can still benefit from receiving the result, for example by being informed that a certain proportion of his patients have a particularly high or low chance of responding to a certain therapy and/or have a particularly high or low risk with regard to a certain diagnosis due to a specific diet, for example, or due to other characteristics that a doctor may observe in a patient.

[0192] According to some embodiments, all communication between the control computer and each of the user computers is performed via an SSL/TLS connection. Preferably, the anonymization software and/or the person management software requires a user, e.g. a healthcare professional, to authenticate at the anonymization software and/or at the person management software (e.g. by providing a password, biometric data or other form of user credential).

[0193] The anonymization software can be configured to regularly synchronize its protocols with the protocols stored in the provisioning computer system and/or the control computer system to ensure that the anonymization software always comprises the latest version of the protocols it already contains. According to some implementation variants, the synchronization comprises repeatedly (e.g. once a day) sending a request from the anonymization software to the control software via REST API to get a list of the most current version numbers of all currently active, locally available protocols. The synchronization can comprise receiving, by the anonymization software, a list of protocol identifiers from a remote computer (the provisioning computer system or the control computer system) indicating a number of protocols or protocol versions having been deleted on the remote computer. If an identifier of one of the anonymization protocols stored locally in the anonymization software is comprised in the list, the anonymization software automatically deletes this protocol and all locally stored anonymized data generated in accordance with this protocol. In case a newer version of one of the deleted protocols is available, the anonymization software automatically downloads this new version and verifies the signature of the downloaded protocol before the protocol is stored locally. For example, the anonymization software can comprise a public signature verification key that corresponds to a public root key of the organization that operates the control computer system and that typically also provides the anonymization protocols. The signature verification comprises checking a chain of signatures belonging to the Public Key Infrastructure of this organization, similar to e.g. an SSL/TLS PKI. If the signature is invalid or cannot be assigned to the root key, the protocol is not imported into the anonymization software and is discarded. Otherwise, the new and verified protocol is used for evaluating and anonymizing personal data.
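The synchronization logic can be sketched as follows. The signature check is represented by a caller-supplied function `verify`, which stands in for the verification of the signature chain against the root key; this function, the data layout and the assumption that a newer version reuses the identifier of the deleted protocol are all assumptions of this illustration.

```python
from typing import Callable


def synchronize(
    local_protocols: dict[str, dict],
    local_data: dict[str, list],
    deleted_ids: list[str],
    downloaded: list[dict],
    verify: Callable[[dict], bool],
) -> None:
    """Apply a deletion list and import newer, verified protocol versions in place."""
    for protocol_id in deleted_ids:
        if protocol_id in local_protocols:
            # Delete the protocol and all anonymized data generated with it.
            del local_protocols[protocol_id]
            local_data.pop(protocol_id, None)

    for protocol in downloaded:
        # Stand-in for checking the signature chain up to the root key.
        if verify(protocol):
            local_protocols[protocol["id"]] = protocol
        # Protocols failing verification are discarded and never stored.
```

Deletion is performed before import so that data generated with an outdated version is never associated with its successor.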

[0194] FIG. 2 shows a block diagram of another computer system 200 according to the invention with three user computer systems 160, 210, 260, a control computer system 128 and a provisioning computer system 262. It is a distributed computer system whose components are operatively connected to each other via a network, e.g. the Internet. Each of the computer systems 160, 210, 260, 128 and 262 can also be implemented as a monolithic or distributed computer system, e.g. as a computer network and/or as a cloud computer architecture. The user computer systems 160, 210, 260 each contain an instance of the anonymization software 114, which can exchange data with the control computer system via an interface, as described, for example, with regard to the embodiment shown in FIG. 1. Each of the user computer systems contains a data store 102, 202, 212, e.g. a relational database in which personal data records are stored. Typically, the personal data records 104-108, 204-208, 214-218 of the different computer systems 160, 210, 260 originate from different persons and/or contain at least different contents. For example, user computer system 160 can be a computer of a general medical practitioner in Cologne, user computer system 260 can be a computer of a group practice in Berlin and user computer system 210 can be a computer in an oncology department of a hospital. Typically, the patient files therefore originate from different patients and/or differ at least with regard to parts of the contents of the patient files.

[0195] If, for example, the users of the user computer systems wish to participate in a study, e.g. a specific medical survey concerning the interaction of two drugs M1, M2, the users can obtain a corresponding anonymization protocol from the provisioning computer system, e.g. via a download link activated after conclusion of the contract, and import the protocol into the respective instance of the anonymization software 114.

[0196] According to some embodiments, the physician obtains the consent for the transfer of anonymous patient data from the respective patient when opening or creating a patient file. For example, creating or editing a patient file can automatically activate the protocol for this patient at least partly before the patient was asked to agree to the anonymization and forwarding of his or her data. This may have the advantage that the select value specified in the protocol can be evaluated and compared with the data content of the respective select field of the patient record before the user is asked for consent. For example, if the patient does not match the select value and does not “fit” in the survey, embodiments of the invention do not ask the patient for his or her consent to provide his or her data in anonymized form.

[0197] Often, an analysis function only refers to a certain group of people, e.g. people of a certain sex, age group, people with a certain pre-existing condition or long-term medication, etc. The analysis function is often used to determine whether the patient is a suitable candidate for the survey or the analysis function. In this case, the patient is only asked by the physician to agree to the data transfer if the patient belongs to the said group of persons.

[0198] If the patient does not agree to the anonymization and forwarding of his or her data to the control software/analysis software, the protocol will neither anonymize this person's personal data nor transmit it to the control software. If the partial execution of the protocol shows that the patient belongs to the group of people whose data can be used for the analysis function, the anonymization software instructs the physician to request all attributes relevant to this survey and specified in the protocol from the patient, e.g. by automatically modifying the fields of a GUI and/or outputting a visual, acoustic or other signal. After closing the patient file, the data of the patient that are relevant for the analysis function according to the protocol are selected and first stored anonymously locally in the respective user computer system. In this way, each of the multiple instances of the anonymization software collects patient data and stores it locally until, for example, a minimum number of data sets has been collected and/or the validity period of the protocol has ended. Once one of these termination criteria has been met, the collected anonymized patient data is sent asynchronously from the individual instances of the anonymization software to the control software.
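The evaluation of the termination criteria may be sketched as follows. The protocol keys `min_records` and `valid_until`, as well as the function name, are assumptions of this illustration.

```python
from datetime import date


def due_batches(protocols: dict[str, dict], store: dict[str, list], today: date) -> dict[str, list]:
    """Collect the batches whose termination criterion is met and remove them from the store."""
    batches: dict[str, list] = {}
    for protocol_id, protocol in protocols.items():
        records = store.get(protocol_id, [])
        enough = len(records) >= protocol["min_records"]
        expired = today > protocol["valid_until"]
        if records and (enough or expired):
            # The batch leaves the local store once it is handed over for sending.
            batches[protocol_id] = store.pop(protocol_id)
    return batches
```

Each instance of the anonymization software would evaluate these criteria independently, which is why the batches of different user computer systems arrive at the control software at different times.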

[0199] According to embodiments, the control software is configured to receive from a plurality of user computer systems 210, 260, 160 a set (“subset”) of anonymized patient records obtained by executing a particular anonymization protocol, and to merge those records on a protocol-specific basis and provide them as a whole to the analysis function associated with that anonymization protocol.
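The protocol-specific merging can be illustrated by the following sketch, in which each submission is assumed to carry a `protocol_id` and a list of `records`; these key names are assumptions of this illustration.

```python
from collections import defaultdict


def merge_by_protocol(submissions: list[dict]) -> dict[str, list]:
    """Combine the subsets received from several user computer systems per protocol."""
    merged: dict[str, list] = defaultdict(list)
    for submission in submissions:
        merged[submission["protocol_id"]].extend(submission["records"])
    return dict(merged)
```

The merged list for a protocol can then be provided as a whole to the analysis function associated with that protocol.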

[0200] In some versions, several thousand or even several tens of thousands of user computer systems can be operatively connected to the control software and transmit anonymized patient data together with an identifier of the anonymization protocol used for anonymization to the control software. One or more different anonymization protocols can be installed and active in each of the user computer systems. The administration of the anonymized data of the individual user computer systems and the protocol-specific collection and combination of the anonymized patient data of several user computer systems can therefore be quite complex and require a sufficiently powerful computer architecture.

[0201] The type and number of anonymization protocols provided by the provisioning computer system may change over time and must be synchronized with the type and number of analysis functions supported by the analysis software.

[0202] FIG. 3 shows a user computer system 160 with a personal data management program 300, e.g. a patient data management program, and anonymization software 114 designed as a plugin for this patient data management program. The patient data management program may include a standard input mask (graphical user interface, “GUI”) that includes multiple input fields for personal attributes such as first and last name, address, gender and/or birthday, long-term medication, and current symptoms. The question of whether the patient is taking a particular medicine X is too specific to require a separate field in the standard input mask. Accordingly, in daily practice it is to be expected that the physician will not explicitly ask for this medication, and even if the physician asks the patient for current or previous medications, it is possible that the patient does not remember the medication. Many patients are older and take a large number of drugs, so that it is quite possible that the existing database of patient records of a physician does not provide a reliable basis for determining whether a patient is taking drug X or not. If, however, an anonymization protocol of the anonymization software is executed and recognizes that the currently processed patient file lacks explicit information on the taking of the drug X, then the anonymization software alone or in interoperation with the patient data management software automatically modifies the input mask 302 in such a way that the required attributes are explicitly queried, as shown here in the form of the data field 306. In addition or alternatively, the anonymization software alone or in interoperation with the patient data management software can also generate a message, e.g. a pop-up window 308, which reminds the user to retrieve the required data or to collect them in another way (e.g. blood sampling to determine required blood values, etc.).
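How the anonymization software might determine which fields to add to the input mask is sketched below. The attribute names and both function names are assumptions of this illustration; the actual modification of a GUI depends on the patient data management program and is not shown.

```python
def missing_attributes(patient_file: dict, required: list[str]) -> list[str]:
    """Attributes required by the protocol for which the file holds no explicit value."""
    return [name for name in required if patient_file.get(name) in (None, "")]


def extend_input_mask(standard_fields: list[str], patient_file: dict, required: list[str]) -> list[str]:
    """Return the input mask fields, extended by required attributes still to be queried."""
    extra = [
        name for name in missing_attributes(patient_file, required)
        if name not in standard_fields
    ]
    return list(standard_fields) + extra
```

An explicit negative answer (e.g. `False` for a hypothetical attribute `takes_drug_x`) counts as available information in this sketch, so the field is only added while the question is still unanswered.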

[0203] FIG. 4 shows a flowchart of a method for providing and using an anonymization protocol according to an embodiment of the invention.

[0204] The operator of the user computer system 160, e.g. a physician or a hospital manager, can contract with an operator of the control computer system, e.g. the creator of a multitude of analysis functions, for what duration and period of time anonymized patient data should be made available for which types of analysis functions and under which conditions. In the event of an agreement, one or more anonymization protocols are made available to the operator of the user computer system 160 in step 404, e.g. in the form of a download link, via which the anonymization software can download and import the one or more selected anonymization protocols from the provisioning computer system.

[0205] According to some embodiments, for each of the anonymization protocols imported into the anonymization software, which, for example, are sequentially processed in a program loop 406 on certain occasions, a part of the locally available personal files is selected in step 408 which fulfil certain criteria defined in the protocol (e.g. age, sex, medication, etc.). The occasion can be e.g. start of the anonymization software, opening of a personal file, closing of a personal file, etc. If a protocol does not specify such selection criteria, all locally available personal files are selected for further analysis by this anonymization protocol.
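Program loop 406 and selection step 408 can be sketched as follows, assuming that each protocol optionally carries a `criteria` mapping from attribute names to predicates. This representation of the selection criteria is an assumption of this illustration.

```python
from typing import Any, Callable

Criteria = dict[str, Callable[[Any], bool]]


def select_files(files: list[dict], criteria: Criteria) -> list[dict]:
    """Select the files fulfilling all criteria; empty criteria select every file."""
    return [
        f for f in files
        if all(attr in f and predicate(f[attr]) for attr, predicate in criteria.items())
    ]


def run_protocol_loop(protocols: list[dict], files: list[dict]) -> dict[str, list[dict]]:
    """Process the imported protocols sequentially and select files for each of them."""
    return {p["id"]: select_files(files, p.get("criteria", {})) for p in protocols}
```

A file that lacks an attribute named in the criteria is not selected in this sketch, since the criterion cannot be evaluated for it.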

[0206] According to embodiments, only patient records of patients having agreed to the anonymization of their data are selected. The selected personal files are analyzed to read (capture) those attributes that are specified in the anonymization protocol as necessary for performing an analysis function. The patient data recorded according to the anonymization protocol (e.g. health status and postal code, but not X-ray images) are stored locally in anonymized form.

[0207] The anonymization software repeatedly checks all anonymization protocols it contains to see whether they have reached the end of their validity period. If the expiry date of the validity period of one of the anonymization protocols contained in the anonymization software has been reached, in step 410 all personal data records anonymized by the said anonymization protocol are collected and transmitted to the control software.

[0208] FIG. 5 shows a flowchart of a method for collecting and anonymizing personal data according to an embodiment of the invention.

[0209] The method is initialized by opening 502 or creating a new personal file, e.g. in the course of a visit of the person, e.g. a patient, to the user of the user computer system, e.g. a physician. For example, a personal file can be opened by the anonymization software or by a personal data management software that is interoperable with the anonymization software.

[0210] In step 504, the physician obtains permission from the patient to transmit the patient's data anonymously. Step 506 is only executed if the patient permits the anonymization and transmission for the specific purpose of performing a particular analysis function. In this step, the anonymization program checks whether the patient fulfills the selection criteria (“filter criteria”) of the analysis function at all, i.e. belongs to a certain age group, to which the analysis function should be limited. Only if this condition is also fulfilled, the anonymization software alone or in interoperation with the patient data management software in step 508 performs the acquisition of parts of the patient's data according to the protocol. “According to protocol” here means that the protocol can optionally influence the data acquisition process, e.g. by automatically modifying the fields of a GUI and/or by informing the user that data for certain attributes are still missing. If the patient refuses to consent and/or the patient does not meet the filter criteria, the patient data can still be collected or changed, but the patient data will not be anonymized or transmitted to the control software, but only stored and used locally.

[0211] In other forms, step 506 can also be performed before step 504 and step 506 can also be completely missing or missing for some of the anonymization protocols.
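The decision logic of steps 504 to 508 can be condensed into the following sketch. Because both conditions must hold, the order in which consent and filter criteria are evaluated does not change the outcome; a protocol without filter criteria is modelled by passing no filter. The function name and the returned labels are assumptions of this illustration.

```python
from typing import Callable, Optional


def decide(
    patient: dict,
    consent_given: bool,
    matches_filter: Optional[Callable[[dict], bool]] = None,
) -> str:
    """Decide whether a patient's data is acquired and anonymized or only kept locally."""
    if not consent_given:
        return "store_locally_only"
    # Step 506 may be missing for some protocols; then every consenting patient qualifies.
    if matches_filter is not None and not matches_filter(patient):
        return "store_locally_only"
    return "acquire_and_anonymize"
```

For example, `decide({"age": 70}, True, lambda p: p["age"] >= 65)` yields `"acquire_and_anonymize"`, whereas the same call without consent yields `"store_locally_only"`.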

[0212] For the purpose of data economy, the anonymization software selectively anonymizes the attribute values of the patient file currently being processed selected according to the anonymization protocol in step 510 and saves the anonymized part of the patient file locally in step 512. The anonymization can comprise replacing concrete data values stored in a particular data field (identified e.g. as “range field” in the anonymization protocol) by a value range specified in the anonymization protocol and/or removing data values stored in a data field identified as “sensitive field” in the anonymization protocol.

[0213] FIG. 6 shows a flowchart of a method for providing and using an anonymization protocol.

[0214] In step 602, the anonymization software receives one or more anonymization protocols from the provisioning computer system. For example, the anonymization software may be a plug-in of a patient administration program at a doctor's office and the doctor may want to participate in a particular demographic study for which the initiator of that study provides a corresponding anonymization protocol via the provisioning computer system for download. The provision can take place without any restriction in the form of a publicly accessible download link or may be access-restricted (e.g. password-protected) only to certain persons.

[0215] In some embodiments, the received protocols comprise a signature. The anonymization software performs a signature verification and integrates and locally stores selectively those protocols comprising a valid signature.

[0216] In the following steps 606-612, the anonymization protocols integrated in the anonymization software are applied to the patient data. This can be done, for example, in the form of program loops 604.

[0217] For example, when the anonymization software and/or the patient administration software is started, a program loop 604 is executed over all available anonymization protocols, regardless of whether and which patient file is currently being processed. In this embodiment or operating mode, a large number of protocols can be executed and a large number of patient files can be processed and anonymized. This operating mode is preferably executed, for example, at times when the computer on which the anonymization software is running is not used for other purposes, such as at night or on weekends.

[0218] In a different operating mode or according to different embodiments, the following steps 606-612 are performed when the physician is working in a particular patient file. In this case, the program loop 604 is only run selectively for those anonymization protocols which are stored in association with and are activated for the currently processed patient.

[0219] In step 606, a first anonymization protocol of program loop 604 is selected and executed. The execution of the anonymization protocol involves the selection and anonymization of a subset of personal data of one or more patients (for example, the personal data of a patient whose patient file is currently being processed or the personal data of several patients for whom this anonymization protocol has been activated). For example, the address information for the patient file currently being processed is only included in the anonymized data record if the address information is relevant for the analysis functions assigned to the protocol. Other possibly relevant information is at least partially anonymized by transferring concrete numerical values to numerical value ranges. Irrelevant information is omitted. The question of which attributes are relevant or irrelevant and how to make them anonymous is specified in the protocol.

[0220] In step 608, the anonymization software sends the anonymized data of one or more patients via the network to the control software and the control software receives this data. In addition to the anonymized data, an identifier of the protocol (or protocols) used for anonymization will also be transmitted or received. Depending on the mode or form of execution, the anonymized data can be transferred per patient or as a totality of anonymized data from a large number of patients. Preferably, the transmission of the patient data is separated in time from the patient's visit to the doctor, as this may allow achieving a higher degree of security for the personal data.

[0221] In step 610, the control software forwards the anonymized data received and the identifier to an analysis software that can identify the analysis functions assigned to this protocol using the protocol identifier and apply them to the anonymized data. In other embodiments, the control software can also use the protocol identifier to identify, from a plurality of analysis programs, the one that implements the analysis function associated with the anonymization protocol. This can be advantageous, for example, if the control software is interoperable with many different analysis programs offered on different servers.

[0222] According to some embodiments, the method further comprises a step 612 of executing the analysis function on the anonymized data provided by the control program. For example, the analysis function can be a statistical program configured to identify correlations between zip-codes and particular illnesses.

[0223] FIG. 7 depicts a block diagram of a distributed system comprising multiple user computer systems 160, 260, 210, a control computer system 128 and a proxy computer system 702. In each of the user computer systems 160, 260, 210, personal data is collected and anonymized by an anonymization software installed in the respective user computer system. For example, user computer system 160 can be a computer in a GP's practice, user computer system 260 can be a computer in an oncology clinic and user computer system 210 can belong to a cardiologist. The anonymized data are encrypted by the user computer systems 160, 260, 210 with a public cryptographic key of the control computer system 128. The encrypted anonymous data is not sent directly to the control computer system 128, but exclusively via the proxy computer system 702. The proxy computer system cannot decrypt the encrypted data because the private decryption key is only accessible to the control computer system, in particular to the control software. Since the control computer system 128 receives the anonymized data from the proxy computer system 702, the control software cannot assign the received anonymized records to the user computer system where they were collected. The implementation variant shown in FIG. 7 is particularly advantageous, since a particularly high degree of anonymization is achieved by concealing the data source.
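The role of the proxy computer system can be illustrated by the following sketch. The encryption itself is outside the scope of the sketch: the payload is treated as opaque bytes that were already encrypted with the public key of the control computer system. The class and function names are assumptions of this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundMessage:
    """Message as seen by the proxy: it knows the sender but cannot read the payload."""
    source_address: str
    ciphertext: bytes


@dataclass(frozen=True)
class ForwardedMessage:
    """Message as seen by the control computer system: payload only, no sender."""
    ciphertext: bytes


def proxy_forward(message: InboundMessage) -> ForwardedMessage:
    # The proxy drops the source address and passes the ciphertext on unchanged.
    return ForwardedMessage(ciphertext=message.ciphertext)
```

The proxy thus knows the source but not the content, while the control computer system can decrypt the content but does not learn the source.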

LIST OF REFERENCE NUMERALS

[0224] 100 distributed computer system
[0225] 102 database
[0226] 104-110 personal file
[0227] 112 database interface
[0228] 114 anonymization software
[0229] 116 anonymization module
[0230] 118 filter module
[0231] 120 one or more anonymization protocols
[0232] 121 variety of anonymization protocols
[0233] 122 controller interface
[0234] 124 processor(s)
[0235] 126 users
[0236] 128 control computer system
[0237] 130 analysis software
[0238] 132-138 analysis functions
[0239] 140 control software
[0240] 142 data management module
[0241] 144 database
[0242] 146 anonymized patient data
[0243] 148 anonymized patient data
[0244] 150 analysis type
[0245] 152 command
[0246] 154 anonymized patient data
[0247] 156 result of analysis functions
[0248] 160 user computer system
[0249] 200 distributed computer system
[0250] 202 database
[0251] 204-208 patient file
[0252] 210 user computer system
[0253] 212 database
[0254] 214-218 patient file
[0255] 260 user computer system
[0256] 262 provisioning computer system
[0257] 300 personal data management program
[0258] 302 graphical user interface
[0259] 304 dialog box for entering personal data
[0260] 306 required data field
[0261] 308 pop-up window
[0262] 404-410 steps
[0263] 502-512 steps
[0264] 602-612 steps
[0265] 702 proxy computer system