Identifiability

In general

Being able to sufficiently identify the subject within a set of subjects (i.e. the anonymity set).

Not being able to hide the link between the identity and the item of interest (IOI), i.e. an action or a piece of information [1].

Examples are: identifying the reader of a web page, the sender of an email, the person to whom an entry in a database relates, etc.

Consequences

  • Can lead to severe privacy violations (especially when the subject assumes he is anonymous)

Impacted by

  • Data minimization: the less info is available, the better (see L_DS for more info) (+)

  • Linkability: the more information is linked, the higher the chance the combined data are identifiable (the more attributes are known, the smaller the anonymity set) (-)
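To make the relation between linked attributes and the anonymity set concrete, the following Python sketch counts how many subjects in a small population still match as more attributes become known. The population records, attribute names and the function name anonymity_set are made-up assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical toy population; every record and attribute name here is made up.
POPULATION: List[Dict[str, str]] = [
    {"name": "Alice", "city": "Leuven", "gender": "F", "birth_year": "1990"},
    {"name": "Bob",   "city": "Leuven", "gender": "M", "birth_year": "1990"},
    {"name": "Carol", "city": "Leuven", "gender": "F", "birth_year": "1985"},
    {"name": "Dave",  "city": "Ghent",  "gender": "M", "birth_year": "1990"},
    {"name": "Eve",   "city": "Ghent",  "gender": "F", "birth_year": "1990"},
]


def anonymity_set(known: Dict[str, str], population: List[Dict[str, str]]) -> List[str]:
    """Return the names of all subjects consistent with the known (linked) attributes."""
    return [p["name"] for p in population
            if all(p.get(attr) == value for attr, value in known.items())]


# Each newly linked attribute can only keep or shrink the anonymity set.
known: Dict[str, str] = {}
for attr, value in [("city", "Leuven"), ("gender", "F"), ("birth_year", "1990")]:
    known[attr] = value
    print(sorted(known), "->", anonymity_set(known, POPULATION))
```

In this toy population, the anonymity set shrinks from three subjects (city only) to two (city and gender) to a single subject once the birth year is linked as well.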

 

[1] More information on anonymity and pseudonymity is available in: A. Pfitzmann and M. Hansen, A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, v0.34, Technical Report, 2010.

 

Identifiability of entity

[Figure: LINDDUN threat tree for identifiability of entity]

Tree in general

These threats mainly concern a subject (the entity) who wants to hide as much of his identifiable information as possible. This can occur when the subject authenticates himself to a certain service (the tree shows multiple authentication principles), but also when he browses (anonymously) and can be identified by means of the contextual information used for the communication.

Leaf nodes explanation

When an identifiable login is used and is communicated in an untrustworthy manner, the entity can become identifiable (I_e1).

Identifiable login used (I_e2)

Several types of identifiable logins exist. The most obvious is the e-id (I_e7), which means using your real identity (I_e4).

Alternatively, a pseudo-identity (I_e5) can be used. The most common pseudo-identity is a pseudonym (I_e8), i.e. a username-password combination. Although in theory this can provide anonymity (you can choose an unrelated username and password, or they can be assigned to you), in practice such a pseudonym is often not very anonymous. Either the username can be easily linked (I_e18) to the user's identity (e.g. it contains the user's first and/or last name), or even the password (I_e17) can contain identifiable information (people tend to use easy-to-remember passwords such as their birthday). Another pseudo-identifier is a token (I_e9), which can be either a hardware (I_e15) or a software (I_e14) token [1] (e.g. a smartcard, a USB token, a disconnected token, a file, etc.). When the token is badly designed (either physically or in its implementation), the entity can be identified. A final type of pseudo-identity is biometrics (I_e10) (e.g. fingerprints), which are identifiable when the biometrics themselves can be linked back to the actual identity (I_e16).
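As an illustration of I_e17 and I_e18, the following Python sketch flags a username-password pseudonym that embeds obvious identifying attributes. The function name identifying_leaks, the attributes checked and the plain substring matching are assumptions for illustration; a realistic check would also need to handle nicknames, reversed dates, character substitutions and similar variations.

```python
from datetime import date
from typing import List


def identifying_leaks(username: str, password: str, first_name: str,
                      last_name: str, birthday: date) -> List[str]:
    """Hypothetical check: report which credential parts contain identifying data."""
    # Candidate identifying fragments, lower-cased for case-insensitive matching.
    fragments = {
        "first name": first_name.lower(),
        "last name": last_name.lower(),
        "birth year": str(birthday.year),
        "birth date (ddmm)": birthday.strftime("%d%m"),
    }
    leaks = []
    for label, secret in (("username", username), ("password", password)):
        for kind, fragment in fragments.items():
            if fragment and fragment in secret.lower():
                leaks.append(f"{label} contains {kind}")
    return leaks


print(identifying_leaks("jan.peeters", "Jan0304!", "Jan", "Peeters", date(1990, 4, 3)))
```

Here both the username (first and last name) and the password (first name and day/month of birth) would be flagged, whereas an unrelated pseudonym such as "blue-kite-42" with a random password would not.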

Another type of identifier is the certificate (I_e6). Certificates are the most privacy-friendly authentication type, as they only aim at proving certain properties of the entity (e.g. older than 18, living in the US, female, etc.). The entity can, however, still become identifiable when the certificate contains too many (precise) properties (L_e11). The more specific (and thus unique) a certificate is, the more identifiable it becomes.
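The following Python sketch contrasts a certificate that only proves coarse properties (older than 18, living in the US, female) with one that replaces "older than 18" by an exact birth date. The holder records, attribute names, the fixed reference date and the helper names age and matching are made-up assumptions; the sketch only shows that more precise certified properties match fewer holders.

```python
from datetime import date
from typing import Any, Callable, Dict, List

Subject = Dict[str, Any]

# Hypothetical toy population of certificate holders (all values made up).
HOLDERS: List[Subject] = [
    {"id": 1, "birthday": date(1990, 4, 3), "country": "US", "gender": "F"},
    {"id": 2, "birthday": date(1985, 7, 21), "country": "US", "gender": "F"},
    {"id": 3, "birthday": date(2001, 1, 9), "country": "US", "gender": "M"},
    {"id": 4, "birthday": date(1990, 4, 3), "country": "BE", "gender": "F"},
]
TODAY = date(2020, 1, 1)  # assumed fixed reference date, so the example is deterministic


def age(born: date, today: date = TODAY) -> int:
    """Age in completed years on the reference date."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def matching(claims: List[Callable[[Subject], bool]]) -> List[int]:
    """IDs of all holders for whom every certified claim is true."""
    return [s["id"] for s in HOLDERS if all(claim(s) for claim in claims)]


# Coarse properties: "older than 18", "living in the US", "female".
coarse = [lambda s: age(s["birthday"]) >= 18,
          lambda s: s["country"] == "US",
          lambda s: s["gender"] == "F"]
# More precise certificate: the exact birth date instead of "older than 18".
precise = coarse[1:] + [lambda s: s["birthday"] == date(1990, 4, 3)]

print(matching(coarse), matching(precise))
```

In this toy setting the coarse certificate matches two holders, while the one carrying the exact birth date singles out a single holder.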

Untrusted communication (I_e3)

An identifiable login can lead to identifiability of an entity when this login is used over untrusted communication (I_e3). This subtree is actually very similar to the untrusted communication subtree of linkability (L_e3) as the threats are closely related.

When the communication is not secure, the identifiable login can easily be disclosed (ID_DF). This information disclosure threat applies to all communication that transfers the entity's identifiable credentials. Such communication is not necessarily limited to the information flow between the user and the service, as the service can decide to transfer the credentials to another service (e.g. a third-party authentication service like Facebook Login). Obviously, this communication should also remain confidential.
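One simple safeguard against disclosing an identifiable login on an unprotected flow is to refuse to send credentials to any endpoint that is not reached over TLS. The Python sketch below shows such a guard; the function name assert_confidential_channel, the example URLs and the HTTPS-only policy are illustrative assumptions. It inspects only the URL scheme, so it says nothing about certificate validation or about whether the receiver itself can be trusted.

```python
from urllib.parse import urlsplit


def assert_confidential_channel(url: str) -> None:
    """Hypothetical guard: raise before credentials would travel over a non-HTTPS URL."""
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing to send credentials over {scheme or 'unknown'}: {url}")


# The same check applies to the user-to-service flow and to any service-to-third-party flow.
for endpoint in ["https://service.example/login", "http://idp.example/verify"]:
    try:
        assert_confidential_channel(endpoint)
        print("ok:", endpoint)
    except ValueError as err:
        print("blocked:", err)
```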

Similarly, when the receiver of the data is untrustworthy (I_e6) (e.g. he fails to anonymize the data during processing or shares the information with other parties), the subject, and all the data he has communicated, become identifiable. Note that this receiver can be the service the subject is directly communicating with, but also additional (third-party) services used by the intended receiver, of which the subject might be unaware.

Also, when the user is careless with his credentials (I_e20) (e.g. writing his username and password on a paper near the computer, or failing to install security measures, which leaves room for keyloggers or other kinds of eavesdropping), the identifiable login can easily be intercepted and hence the entity becomes identifiable.

Finally, even when the receiver is trustworthy and stores the identifiable user credentials in a privacy-friendly fashion by anonymizing them, the entity can become identifiable when the identity management (IDM) database (I_DS) that links the anonymized logins to the credentials is not properly secured (e.g. an IDM database stores the full credentials together with an internal (anonymized) identifier that is used for the main data, but the IDM database itself is not encrypted).
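The IDM scenario can be sketched in Python: the main data store only holds an internal identifier, while a separate IDM table maps the full credentials to that identifier. The table layouts, values and the function name reidentify are hypothetical. Anyone who can read the unencrypted IDM table can undo the pseudonymization with a simple join.

```python
from typing import Dict, List, Tuple

# Hypothetical IDM database: full credentials -> internal (pseudonymous) identifier.
IDM_TABLE: Dict[str, str] = {
    "jan.peeters@example.org": "u-7f3a",
    "an.janssens@example.org": "u-91c2",
}

# Main data store: contains only the internal identifier, no credentials.
MAIN_DATA: List[Tuple[str, str]] = [
    ("u-7f3a", "visited cardiology ward"),
    ("u-91c2", "ordered prescription refill"),
]


def reidentify(idm: Dict[str, str], records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join the main data with a readable IDM table, restoring the link to each login."""
    internal_to_login = {internal: login for login, internal in idm.items()}
    return [(internal_to_login.get(internal, "<unknown>"), event)
            for internal, event in records]


for login, event in reidentify(IDM_TABLE, MAIN_DATA):
    print(login, "->", event)
```

Without access to the IDM table (an empty mapping in this sketch), the same records stay attached only to their internal identifiers.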

 

Identifiability based on metadata of entity communication (identifiability of contextual data at I_DF)

Even when no identifiable credentials are used, the user can still be identified based on the contextual information of his communication (e.g. based on his IP address or his online behavior) (identifiability of contextual data at I_DF). This threat only applies to the communication that is directly connected to the entity, meaning the entity is the sender or receiver of the communication.

[1] Wikipedia provides a good overview of the different kinds of security tokens.

 

Identifiability of data flow

[Figure: LINDDUN threat tree for identifiability of data flow]
 

Tree in general

A distinction should be made between identifiability of the contextual data (e.g. IP address necessary for communication) and the transactional data (the actual data that are being communicated).

Contextual data identifiability can be resolved by, for example, anonymous routing solutions, which should be applied by the sender and/or receiver to protect their own anonymity.

Transactional data, however, do not necessarily have the sender or receiver as data subject, but can involve a third-party subject as well (e.g. when two doctors share information about a patient, this patient is the data subject of the transactional data while he is not involved in the actual communication). Transactional data identifiability can occur when the flow is unprotected, so that information disclosure threats apply, and the disclosed information itself is identifiable. Another threat to transactional data identifiability exists when the data are sent to an untrustworthy receiver (or future receiver). Transactional data identifiability should be resolved at the origin of the data, or at least before the data cross a trust boundary (which can thus be much earlier in the flow than the current sender).
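One common way to address transactional data identifiability at the origin, before the data cross a trust boundary, is to replace direct identifiers with a keyed pseudonym. The following Python sketch does this for the doctor/patient example using HMAC-SHA256 from the standard library. The record fields, the hard-coded key and the function name pseudonymize_record are assumptions for illustration (a real key would be randomly generated and kept secret), and pseudonymization alone does not remove identifiability through the remaining attributes.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical secret that stays inside the sender's trust boundary.
PSEUDONYM_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize_record(record: Dict[str, str], key: bytes = PSEUDONYM_KEY) -> Dict[str, str]:
    """Replace the patient's direct identifiers with a keyed pseudonym before sending."""
    digest = hmac.new(key, record["national_id"].encode("utf-8"), hashlib.sha256)
    outgoing = {k: v for k, v in record.items() if k not in ("national_id", "name")}
    outgoing["patient_pseudonym"] = digest.hexdigest()[:16]
    return outgoing


record = {"national_id": "90.04.03-123.45", "name": "Jan Peeters", "diagnosis": "hypertension"}
print(pseudonymize_record(record))
```

Because the same key always yields the same pseudonym, the receiver can still relate records about one patient without learning who the patient is; anyone who obtains the key, however, can recompute pseudonyms for known national IDs.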

Leaf nodes explanation

Identifiability of transactional data (I_DF1)

Transactional data identifiability (I_DF1) can occur when the transmitted data become available to an untrusted party (I_DF3). This means either that the communication is unprotected (I_DF4) and hence information disclosure threats (ID_DF) apply, or that the receiver cannot be trusted (I_DF5). This receiver can be the direct receiver of the information, or a "future" receiver when the receiving party passes the transactional data on to other parties (e.g. when the accounting is outsourced to a third party).

Sharing data with untrusted parties will, however, only lead to identifiability when the information that is made available is actually identifiable (insufficient minimization of I_DS). This mainly occurs when the data being sent are not sufficiently minimized. To protect the user's privacy, ideally only the minimal set of information should be provided.
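Providing only the minimal set of information can be enforced with an explicit allowlist per purpose, so that any attribute the receiver does not need is dropped before the data leave the sender. In the Python sketch below, the purposes, field names and the function name minimize are hypothetical, loosely following the outsourced accounting example.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping from processing purpose to the fields that purpose needs.
ALLOWED_FIELDS: Dict[str, FrozenSet[str]] = {
    "accounting": frozenset({"invoice_id", "amount", "date"}),
    "shipping": frozenset({"order_id", "street", "postal_code", "city"}),
}


def minimize(record: Dict[str, str], purpose: str) -> Dict[str, str]:
    """Keep only the fields allowed for the purpose; unknown purposes get nothing."""
    allowed = ALLOWED_FIELDS.get(purpose, frozenset())
    return {field: value for field, value in record.items() if field in allowed}


invoice = {"invoice_id": "2020-0042", "amount": "120.00", "date": "2020-03-01",
           "customer_name": "Jan Peeters", "email": "jan@example.org"}
print(minimize(invoice, "accounting"))
```

Defaulting to an empty result for purposes that are not listed keeps the filter fail-closed: forgetting to configure a purpose leaks nothing rather than everything.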

Identifiability of contextual data (I_DF2)

This subtree is identical to the corresponding subtree of linkability (L_DF2), so its summary is similar as well. Note, however, that it is generally easier to link data flows based on their contextual information than to actually identify the subject (e.g. knowing that a certain user visits website X every day at 8 PM makes his actions linkable based on this time pattern, but does not provide sufficient information to actually identify him).

Contextual data become identifiable (I_DF2) when non-anonymous communication (I_DF6) is used. The data flow can, for example, be linked on IP address (I_DF8), computer ID (I_DF9), or session ID (I_DF10), or even on certain patterns (I_DF11), such as time, frequency, location, or browser settings.
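The following Python sketch illustrates the difference between linking and identifying based on contextual data. Grouping flow records by the source IP address links them, but identification additionally requires a mapping from that address to a person, modelled here as a hypothetical address-assignment log (for instance, one held by an access provider). All addresses, resources, names and function names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Observed flows: (source IP, requested resource). Illustrative data only.
FLOWS: List[Tuple[str, str]] = [
    ("203.0.113.7", "/news"),
    ("198.51.100.4", "/weather"),
    ("203.0.113.7", "/clinic/appointments"),
]

# Hypothetical assignment log mapping an IP address to a subscriber.
ASSIGNMENTS: Dict[str, str] = {"203.0.113.7": "J. Peeters"}


def link_by_ip(flows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Linkability: group resources by source IP, without knowing who is behind it."""
    linked: Dict[str, List[str]] = defaultdict(list)
    for ip, resource in flows:
        linked[ip].append(resource)
    return dict(linked)


def identify(linked: Dict[str, List[str]], assignments: Dict[str, str]) -> Dict[str, List[str]]:
    """Identifiability: only IPs that can be mapped to a person become identified."""
    return {assignments[ip]: resources for ip, resources in linked.items() if ip in assignments}


linked = link_by_ip(FLOWS)
print(linked)
print(identify(linked, ASSIGNMENTS))
```

Both IP addresses yield linked flows, but only the address that appears in the assignment log leads to an identified person.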

Alternatively, the anonymity system that is being used may be insecure (I_DF7). This can enable traffic analysis (I_DF12), which extracts information from patterns of traffic; passive attacks (I_DF13), such as long-term intersection attacks, traffic correlation and confirmation, fingerprinting, epistemic attacks (on route selection), and predecessor attacks; or active attacks (I_DF14), such as N-1 attacks, Sybil attacks, traffic watermarking, tagging attacks, replay attacks, and DoS attacks. More information about these attacks can be found in the work of Danezis, Diaz, and Syverson [1].
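Of the passive attacks listed, the long-term intersection attack is easy to illustrate. In the simplified Python sketch below, an observer records, for each round in which a target sender is active, the set of recipients receiving messages through the anonymity system; intersecting these sets over time narrows down the target's correspondent. The round data and the function name intersection_attack are invented, and the model assumes the target contacts the same recipient in every observed round (statistical variants, such as statistical disclosure attacks, relax this assumption).

```python
from typing import List, Set

# Recipients observed in each round in which the target sender was active (made-up data).
ROUNDS: List[Set[str]] = [
    {"bob", "carol", "dave", "erin"},
    {"bob", "dave", "frank"},
    {"alice", "bob", "dave"},
    {"bob", "grace"},
]


def intersection_attack(rounds: List[Set[str]]) -> List[Set[str]]:
    """Return the candidate-recipient set after each observed round."""
    candidates: List[Set[str]] = []
    for observed in rounds:
        # Intersect with the previous candidates; the first round starts the set.
        current = observed if not candidates else candidates[-1] & observed
        candidates.append(set(current))
    return candidates


for i, cand in enumerate(intersection_attack(ROUNDS), start=1):
    print(f"after round {i}: {sorted(cand)}")
```

With this data, four candidates remain after the first round, two after the second and third, and only one after the fourth.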

[1] G. Danezis, C. Diaz, and P. Syverson, Systems for Anonymous Communication, in CRC Handbook of Financial Cryptography and Security, p. 61. Chapman & Hall, 2009.

Identifiability of data store

[Figure: LINDDUN threat tree for identifiability of data store]

Tree in general

Identifiability at a data store can only occur when the data store itself is not sufficiently protected.

 

When the data store can be accessed and the data are insufficiently anonymized, the data can become identifiable, either because they are linked to (identifiable) login data, or because they are re-identified due to a lack of (sufficient) data minimization.

Leaf nodes explanation

Weak access control (I_DS1)

In order to identify subjects at the data store, one needs access to it. Weak access control (I_DS1), made possible by information disclosure at the data store (ID_DS), is thus a prerequisite.

Weak anonymization/inference (I_DS2)

When access to the data store is provided, data become identifiable when there is weak data anonymization (and/or when inference techniques are applied) (I_DS2). This can occur in two ways. Either the data are identified by means of login data (I_DS3) when the data are linkable to login data (I_DS5) [1] and these login data are identifiable (identifiable login used of I_E).

Or, the data can be re-identified (I_DS4) because they are insufficiently minimized (insufficient minimization at L_DS) and, given the abundance of linked data, become identifiable (I_DS6). The more information is available, the more unique the combination becomes and hence the smaller the anonymity set.
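Re-identification through insufficient minimization is commonly illustrated with a linkage attack: a dataset stripped of names still contains quasi-identifiers (here postal code, birth date and gender) that can be joined with another, identified dataset. The Python sketch below uses invented records for both datasets; the dataset layouts and the function names key and linkage_attack are assumptions. A released record is only reported as re-identified when its quasi-identifier combination matches exactly one registered person.

```python
from typing import Dict, List, Tuple

QuasiId = Tuple[str, str, str]  # (postal code, birth date, gender)

# "Anonymized" data store: names removed, quasi-identifiers kept (made-up records).
RELEASED: List[Dict[str, str]] = [
    {"postal_code": "3001", "birth_date": "1990-04-03", "gender": "F", "diagnosis": "asthma"},
    {"postal_code": "3000", "birth_date": "1985-07-21", "gender": "M", "diagnosis": "diabetes"},
]

# Another, identified dataset (e.g. a public register); also made up.
REGISTER: List[Dict[str, str]] = [
    {"name": "An Janssens", "postal_code": "3001", "birth_date": "1990-04-03", "gender": "F"},
    {"name": "Piet Maes", "postal_code": "3000", "birth_date": "1985-07-21", "gender": "M"},
    {"name": "Tom Claes", "postal_code": "3000", "birth_date": "1985-07-21", "gender": "M"},
]


def key(row: Dict[str, str]) -> QuasiId:
    return (row["postal_code"], row["birth_date"], row["gender"])


def linkage_attack(released: List[Dict[str, str]],
                   register: List[Dict[str, str]]) -> Dict[str, str]:
    """Map names to diagnoses for released rows whose quasi-identifiers are unique."""
    names_by_key: Dict[QuasiId, List[str]] = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])
    result: Dict[str, str] = {}
    for row in released:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # anonymity set of size one: the record is re-identified
            result[names[0]] = row["diagnosis"]
    return result


print(linkage_attack(RELEASED, REGISTER))
```

The second released record matches two registered people, so its anonymity set still has size two in this toy example; one more linked attribute could be enough to single it out as well.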

[1] This node is actually a specific case of the L_DS3 leaf node (data linkable to another database). When the other database contains identifiable data, as an identity management database does, the entire dataset becomes identifiable.

 

Identifiability of process

[Figure: LINDDUN threat tree for identifiability of process]

Tree in general

The threat tree of identifiability of process suggests that the only way to identify the subject of a process is by gaining access to the process. Note that this threat is very rare.

Leaf nodes explanation

Identifiability of a process can only occur after information disclosure of this process (ID_P). We therefore refer to that tree.

 

DistriNet Research Group 

KU Leuven

Dept. Computer Science 

Celestijnenlaan 200A (postbox 2402) 

B-3001 Heverlee BELGIUM


© 2020  DistriNet KU Leuven