Data Segmentation for Privacy Wiki Post

Written by: Scott Weinstein & Ioana Singureanu

This post summarizes the components needed to segment data for privacy purposes, based on the workgroup's discussions so far. Some of our word choices may read as favoring one architecture over another; any such slips are inadvertent and should not be given weight. The post is a draft meant to promote discussion about the elements of a data segmentation solution and should not be read as a final determination of what that solution would consist of.

What is Data Segmentation for Privacy?

Data Segmentation means that no matter where policies are stored (locally or centrally), a policy decision point/rules engine must be able to determine which data or documents in a patient's record are protected by a policy. If some data/documents are protected by a policy and some are not, then there must be a method in place to segment the protected documents. This is a core functional requirement for systems and solutions capable of data segmentation for the purpose of ensuring the privacy of patient records.

Another key aspect of data segmentation is communicating to a receiving entity the protections that apply to the data/documents. Even if a system can identify what data is protected by policy and treat it differently, the effect of the original segmentation would be severely limited if this identification and the handling instructions do not persist when the data is sent to another system.

Potential System Components of a Data Segmentation for Privacy Solution

The following is a set of abstract system roles that may enable some implementations of data segmentation. These roles may be played by EHRs or HIE gateways in some instances.

Privacy Policy and Consent Repository/Directory – The EHR, HIO, or another 3rd party maintains a database or directory of patient privacy consents, organizational policies, and jurisdictional policies. The actual electronic files that express the patient preference or jurisdictional/organizational policies could be stored locally in a provider system, or they could be stored in a centralized database. If they are stored remotely, then there should be a reference (URL, URI, UUID, OID) in the local system, the rules engine, or individual consents to the jurisdictional/organizational policies.

This system is also responsible for managing a policy or privacy consent over time (e.g. time-limited consent, revocation).

Figure 1: The life cycle of a policy or privacy consent

  • Some in the workgroup have agreed with the Standards Committee in pointing out that jurisdictional and organizational policies are always subject to change. As a result, it may be preferable for organizational/jurisdictional policies to be expressed in a centralized way (either on a website or in a database), so that when policies change the local systems do not have to correct every policy for every patient in their system.
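The life cycle and reference ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: the class `ConsentRecord`, its field names (`policy_ref`, `effective_on`, `expires_on`, `revoked_on`), and the identifier value are hypothetical and have not been defined by the workgroup. The record holds a reference to the governing policy rather than a copy of it, so a policy change would not require touching each patient's entry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical repository entry; field names are illustrative only."""
    patient_id: str
    policy_ref: str                      # URL/URI/UUID/OID of the governing policy
    effective_on: date
    expires_on: Optional[date] = None    # None means the consent is not time-limited
    revoked_on: Optional[date] = None    # set when the patient revokes the consent

    def is_active(self, on: date) -> bool:
        """Return True if the consent is in force on the given date."""
        if on < self.effective_on:
            return False
        if self.revoked_on is not None and on >= self.revoked_on:
            return False
        if self.expires_on is not None and on > self.expires_on:
            return False
        return True


consent = ConsentRecord(
    patient_id="patient-123",
    policy_ref="urn:example:policy:42-cfr-part-2",  # made-up identifier
    effective_on=date(2012, 1, 1),
    expires_on=date(2012, 12, 31),
)
```

A rules engine could call `is_active` before honoring a consent, which covers both time-limited consent and revocation.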

Policy Decision Point/Rules Engine – This component allows the EHR/HIO to take the information within the policies and consent and determine which patient data should be protected. For example, if in a hospital only a certain wing is covered by 42 CFR Part 2, the rules engine would be able to determine from some standardized attribute of the data that the default policy is not to share it without an existing patient privacy consent.
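As a rough sketch of the hospital-wing example, the rule below derives the default sharing policy from a standardized origin attribute of the data. The unit names, the `PART2_UNITS` set, and the return values are assumptions made for illustration; no such vocabulary has been agreed on.

```python
# Hypothetical set of originating units covered by 42 CFR Part 2.
PART2_UNITS = {"substance-abuse-wing"}


def default_sharing(origin_unit: str) -> str:
    """Return the default policy for data created by the given unit."""
    if origin_unit in PART2_UNITS:
        return "deny-without-consent"
    return "share"
```

Because the rule lives in one place, a change to which units are covered would be made here rather than on each piece of data.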

Potential Data Components

Jurisdictional Privacy Policies – These policies apply to patient records in that jurisdiction whether a patient has expressed a sharing preference or not. They are separate from the patient’s privacy consent and their default restrictions on sharing of a record (such as “don’t share without consent” when it comes to 42 CFR Part 2 or 38 CFR Part 1) must be recognized regardless of whether a privacy consent is present.

Figure 2: Mind Map identifying the privacy policy and the related policy

  • Segmenting data affected by the jurisdictional privacy policy based on “intrinsic metadata” (such as origin of the data): One method for segmenting based on Policy is for the policy decision point/rules engine to recognize what is protected based on some existing characteristic of the data. For example, the rules engine could be programmed with a rule that all data/documents created by a certain wing of a hospital or certain providers (i.e. those covered by 42 CFR Part 2) cannot be disclosed without a patient privacy consent indicating that they can be disclosed. This method of segmentation would require a standardized way for EHRs and/or HIOs to tag where the data was created, down to the provider/provider group level. If the policy were to change, so that the default was to share the data regardless of patient preference, a change would only have to be made at the policy decision point, as the system would no longer distinguish the data as required by the former policy.
  • Segmenting based on data-specific criteria (e.g. condition)

Another method of segmenting data based on a Policy is the creation of a rule that identifies specific standardized data elements that constitute the protected class of data. For example, if a policy protects information related to the diagnosis of HIV, the rule would contain a directory of all standardized data elements that would reveal that the patient had been diagnosed with HIV. A rules engine could then apply this rule in order to tag this data, or documents containing the data, as requiring special protection in accordance with the Policy. The tag could be expansive, in that it would not only distinguish the data but also provide instructions to the policy decision point about the policy itself, who the data can be shared with, for what purposes, etc. Alternatively, the metadata could simply provide a reference that the policy decision point would be programmed to recognize, i.e. a pointer to a policy expressed in a centralized policy database at the organizational or jurisdictional level. If the latter method were used, the metadata would not have to be modified if the Policy were to change, since the metadata is only a reference/pointer and the policy decision point would recognize the updated policy based on the reference. If the policy explanation were to travel with the data/document and the policy decision point did not have a mechanism to check this metadata to make sure it was still accurate, a change in policy would have to be addressed at the metadata level.
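The pointer-style tagging described above might look like the following sketch. The code values are placeholders rather than real terminology codes, and `PROTECTED_CODES`, `POLICY_REF`, and `tag_entries` are hypothetical names introduced only for this example.

```python
from typing import Dict, List

# Hypothetical rule: coded data elements that would reveal the protected condition.
# These values are placeholders, not real terminology codes.
PROTECTED_CODES = {"EXAMPLE-CODE-1", "EXAMPLE-CODE-2"}
POLICY_REF = "urn:example:policy:hiv-protection"  # made-up pointer to a central policy


def tag_entries(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the entries, adding a policy pointer to protected ones."""
    tagged = []
    for entry in entries:
        copy = dict(entry)  # leave the caller's data untouched
        if copy["code"] in PROTECTED_CODES:
            copy["policy_ref"] = POLICY_REF
        tagged.append(copy)
    return tagged


record = [
    {"code": "EXAMPLE-CODE-1", "text": "protected diagnosis"},
    {"code": "OTHER-CODE", "text": "unrelated finding"},
]
tagged = tag_entries(record)
```

Only the pointer travels with the data here, so the tag itself would not need to change if the referenced policy were updated.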

Figure 3: Privacy Metadata

Privacy Consent – Privacy consents are patient preferences about sharing information that an organization has agreed to, or is required by law, to honor. These consents override default organizational/jurisdictional sharing policies (share/don’t share) by allowing information protected by policy to be shared or by forbidding the sharing of information that is not withheld by policy alone. (A good visual of this concept is to picture the electrical blueprint for a building. In this blueprint, there are markers for where the different light switches will go and what lights they will control. Some of these switches will automatically be in the “on” position, while others will be in the “off” position. The blueprint represents policy. A tenant in the building can then elect to turn some of the switches off and others on. Other switches must stay on or off. The switches that are operated by the building tenant represent privacy consents.) Privacy consents can be managed similarly to policies, and must be adjudicated in accordance with the policy (i.e. the policy decision point recognizes that certain data is 42 CFR Part 2 data and initially restricts requests to share until it locates a compliant consent authorizing the sharing of that data).
  • Using the policy decision point/rules engine to segment data based on privacy consents
    • With this method, the patient privacy consent would be adjudicated by the policy decision point/rules engine after the policy has been adjudicated. The privacy consent could exist as an electronic file either within the EHR/HIO or in a 3rd party database. Upon a request to share information, the EHR would pass the request through the policy decision point. The request would have to include information such as: 1) the data/document being requested (patient id, date of encounter, a specific item such as “med list” or labs, etc.); 2) who is making the request; 3) the purpose for which the request is being made. The policy decision point would then determine if there is any Policy associated with the data/document (see above). With knowledge of the default Policy, the decision point would then look for a privacy consent based on the patient’s id. The privacy consent would inform the policy decision point whether the default applies or whether an alternate preference has been indicated and whether it applies based on the request information. This method relies on the rules engine to use existing data/document standards to adjudicate the preferences. For example, if a patient were to express the preference: “share information relating to my 42 CFR Part 2 substance abuse treatment with Dr. Bob”, the rules engine would identify data that is protected by 42 CFR Part 2, recognize that the default Policy is to not share the data, but recognize that the patient has consented to sharing with Dr. Bob. As a result, if the request to share indicated that the recipient was Dr. Bob, the policy decision point should allow the information to flow.
  • Using “privacy metadata” to help the policy decision point/rules engine adjudicate privacy consents
    • Metadata can be used to assist the policy decision point with adjudicating patient privacy consents. An EHR or HIO could apply metadata to all information to which the patient has issued a privacy consent that goes against the default policy for sharing. The metadata could be simple enough to indicate “don’t send” or “protected.” It could include some type of reference to an electronic privacy consent housed locally within the EHR/HIO or by a 3rd party. This way, if the patient consent said “send all 42 CFR Part 2 data to Dr. Bob”, the 42 CFR Part 2 data would have privacy metadata tags indicating that it can be sent to Dr. Bob. The metadata could contain detailed specifics about the patient privacy consent or it could just contain a code that acts as a reference to a separate patient privacy consent document which could be sent with the record or kept by a 3rd party and referenced by the EHR/HIO.
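The Dr. Bob example can be sketched as a two-step decision: adjudicate the policy first, then look for a consent that overrides the default. This sketch covers only the case where a consent permits sharing of protected data; the opposite case (a consent that forbids sharing of otherwise shareable data) is left out. The `Consent` class, the `decide` function, and the string identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Consent:
    """Hypothetical consent: the patient lets one recipient see one class of data."""
    patient_id: str
    data_class: str   # e.g. "42-cfr-part-2"
    recipient: str


def decide(patient_id: str, data_class: str, recipient: str,
           protected_classes: Set[str], consents: List[Consent]) -> str:
    """Adjudicate the default policy first, then look for an overriding consent."""
    if data_class not in protected_classes:
        return "permit"            # default policy is to share
    for c in consents:
        if (c.patient_id == patient_id
                and c.data_class == data_class
                and c.recipient == recipient):
            return "permit"        # a matching consent overrides the default
    return "deny"                  # protected data and no matching consent


consents = [Consent("patient-123", "42-cfr-part-2", "dr-bob")]
protected = {"42-cfr-part-2"}
```

With these inputs, a request naming `dr-bob` as the recipient of the patient's Part 2 data would be permitted, while the same request from any other recipient would be denied.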

Information Requests – The trigger for exchanging data or documents can come from the provider who holds the data/document or from a provider who wishes to receive the data/document. Regardless of the form of request, certain information must be included in order for the policy decision point to appropriately adjudicate the request. This information includes, but is not limited to: 1) the patient to whom the information request pertains (without it, the system will not be able to find the appropriate data OR privacy consent); 2) what is being requested (i.e. discharge summary, CCD, lab report, med list, etc.); 3) the purpose of the request; 4) who is making the request (provider, organization, etc.). The policy decision point will use this information to decide what to return in response to the request.
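One way to picture the minimum request content is a small record with the four items listed above, plus a check that none is empty. The class and field names are assumptions for illustration and do not correspond to any existing message standard.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class InformationRequest:
    """Hypothetical request shape; the four fields mirror the list above."""
    patient_id: str      # 1) the patient the request pertains to
    content_type: str    # 2) e.g. "discharge summary", "CCD", "med list"
    purpose: str         # 3) the purpose of the request
    requester: str       # 4) the provider or organization making the request


def missing_fields(request: InformationRequest) -> List[str]:
    """Return the names of required fields that were left empty."""
    return [f.name for f in fields(request)
            if not getattr(request, f.name).strip()]


complete = InformationRequest("patient-123", "CCD", "treatment", "dr-bob")
incomplete = InformationRequest("patient-123", "CCD", "", "dr-bob")
```

A policy decision point could refuse to adjudicate any request for which `missing_fields` returns a non-empty list.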

Figure 4a: Example codes to standardize information response and obligation

Information Response (with Privacy Metadata) – This is the metadata sent along with the personal health information to the organization/provider intended to receive and use it. It represents the envelope and metadata for the health information disclosed to a third-party organization.
Figure 4b

Figure 4c

Patient data/documents – Patient data is currently exchanged in multiple ways. Sometimes the sending EHR will create a document based on certain data fields within the EHR. Depending on how the document is formed and sent, the receiving EHR may or may not be able to take the different data elements within the document and incorporate the data into its own system. In some systems, information is sent at the data level, in which case the receiving system can incorporate the patient information as long as it uses or can recognize the data standards used by the sending system.
  • If data is being shared at the document level, there may be an “envelope” that addresses the document, instructs its transport, secures the document, and describes to the receiving system when it opens the envelope what to do with it. Metadata could be applied at the envelope level which could provide reference to a policy and/or consent so that the receiving organization knows its obligations in terms of handling the document.
  • The workgroup has rightly recognized that there is likely a need for multiple envelopes, so that intermediary handlers of a document cannot see information on the envelope that could expose the existence of the very sensitive information the envelope is meant to protect.
  • Privacy metadata could also be applied to specific atoms of information within a document or a message exchanged between EHRs or HIOs.
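The multiple-envelope idea can be illustrated with a nested structure in which the outer layer carries routing information only. Everything here is hypothetical: the `Envelope` class, its attributes, and the identifiers are invented for the sketch and do not reflect any proposed envelope standard.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Envelope:
    """Hypothetical envelope; attribute names are illustrative only."""
    routing: str                       # where this layer should be delivered
    metadata: dict                     # privacy metadata readable at this layer
    payload: Union["Envelope", str]    # an inner envelope or the document itself


inner = Envelope(
    routing="dr-bob",
    metadata={"policy_ref": "urn:example:policy:42-cfr-part-2"},  # made-up pointer
    payload="<clinical document>",
)
# The outer layer carries routing only, so an intermediary handling it
# learns nothing about which policy protects the contents.
outer = Envelope(routing="receiving-hio", metadata={}, payload=inner)


def visible_without_opening(envelope: Envelope) -> dict:
    """Return what a handler can read on a layer without opening its payload."""
    return {"routing": envelope.routing, **envelope.metadata}
```

In a real exchange the inner layer would presumably also be encrypted for the final recipient; that step is outside the scope of this sketch.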

Figure 5: Information is submitted using an envelope complete with policy-derived "confidentiality"

Figure 6: Privacy Metadata available in information submitted using Continuity of Care Documents (CCD)

Notice of Obligation – In some circumstances, it is sufficient for the sending system to pass a reference to the policies and privacy consents that apply to data/documents (or the actual privacy consent itself) for the receiving system to recognize and honor the obligations. The privacy consent would either get incorporated into the receiving system’s repository of privacy consents, or a 3rd party repository of privacy consents would help maintain restrictions on the use of protected data, even though it is now present in another EHR. However, laws such as 42 CFR Part 2 and 38 CFR Part 1 require that a notice of prohibition on redisclosure be sent with the data. As a result, an EHR should be able to send this notice automatically when it sends 42 CFR Part 2 data and include it with the document payload. If not, the provider would have to send the notice manually, which is presumably workflow-prohibitive.
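Automatically attaching the notice could be as simple as the sketch below. The policy identifiers, `NOTICE_REQUIRED`, and `build_payload` are hypothetical, and `NOTICE_TEXT` is placeholder wording, not the regulatory text.

```python
from typing import Dict, Set

# Hypothetical identifiers for policies that require a redisclosure notice.
NOTICE_REQUIRED = {"42-cfr-part-2", "38-cfr-part-1"}
# Placeholder wording only; this is not the text required by regulation.
NOTICE_TEXT = "NOTICE: redisclosure of this information is prohibited."


def build_payload(document: str, policies: Set[str]) -> Dict[str, str]:
    """Bundle the document with a notice when any applicable policy requires one."""
    payload = {"document": document}
    if policies & NOTICE_REQUIRED:   # non-empty intersection means a notice is due
        payload["notice"] = NOTICE_TEXT
    return payload
```

The sending system would call this with the set of policies the policy decision point found to apply, so the provider never has to add the notice by hand.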

Items for discussion by the Data Segmentation for Privacy WG

1) Do the HITSC proposed metadata standards only concern sharing at the document level and support only the use of privacy metadata on the “envelope”?
a. If so, what would data segmentation at this level look like? Would it be possible to complete one of our user stories using these standards? If a provider wanted to send some information that was protected along with other information that was not, would the provider have to send two transitions of care documents to the receiving provider, one with the privacy-protective metadata and one without it?
b. If not, how can the privacy metadata be applied at the data level and is that desirable?
2) Based on current EHR technology and capability, at what data level could a rules engine adjudicate a policy?
a. Is data standardized enough to handle a rule that data coming from a certain provider or encounter should not be shared without consent? Would this be enough to meet our current use cases?
b. If data is not standardized enough at this point, would it be technically feasible for current EHRs or 3rd parties to apply privacy metadata appropriately so that a rules engine/policy decision point could adjudicate a Policy appropriately?
c. Would it be onerous for EHRs to add this “intrinsic metadata” to their systems?
3) Are there any current standards for expressing and exchanging policies, and/or privacy consents?
a. If so, are the standards adequate for meeting the needs of our use cases?
i. Refer to the HL7 Privacy Consent CDA Implementation Guide
4) Are there any existing standards for how a policy decision point/rules engine adjudicates privacy policies?
a. If yes, are they adequate for meeting the needs of our use cases?
b. Could they be adapted to distinguish data based on its metadata?
i. Refer to NwHIN Security and Privacy Framework use of XACML and SAML