Being able to sufficiently distinguish whether two items of interest (IOIs) are linked or not, even without knowing the actual identity of the subject of the linkable IOIs.
Not being able to hide the link between two or more actions/identities/pieces of information.
Examples are: anonymous letters written by the same person, web page visits by the same user, entries in two databases related to the same person, people related by a friendship link, etc.
Note that linkability often involves IOIs that are linked because they belong to the same subject. However, IOIs can also be linked based on shared properties (e.g. people visiting the same restaurant, people with a similar disease, etc.).
Can lead to identifiability (see Identifiability trees) when too much linkable information is combined.
Can lead to inference: when “group data” is linkable, this can lead to societal harm, such as discrimination (e.g. if an insurance company knows that people who live in a certain area get sick more often, it might increase the insurance cost for that target group).
Data minimization: the less info is available, the better (see L_DS for more info) (+)
Identifiability: if the subject’s identity is known, all related data can obviously be linked (-)
More information on (un)linkability is available in Pfitzmann, Andreas and Hansen, Marit, A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, v0.34, Technical Report, 2010.
Linkability of entity
Tree in general
These threats mainly focus on a subject (the entity) who wants to hide as much of his identifiable information as possible (or at least make it as unlinkable as possible). Linkability can occur when the subject authenticates himself to a certain service (multiple authentication principles are shown in the tree), but also during regular communication (browsing, client-server requests, etc.) by means of the contextual information used for the communication.
Leaf nodes explanation
When using a linkable login in combination with untrusted communication, the entity can become linkable (L_e1).
Linkable login (L_e2)
A linkable login (L_e2) can be a "fixed" login (L_e4), like an e-id or a username-password combination, which is being used more than once. As it is being reused, its corresponding IOIs are also linkable based on this login.
Alternatively, using very detailed certificates (L_e5) as a means of authentication can lead to linkable logins. For example, certain online services are location-dependent and require users to reside in a certain country. This can be checked in multiple ways. Ideally, the certificate (issued by, for example, the government or another trusted certificate authority) only proves that the user is indeed a citizen of the required country. However, a more detailed certificate can provide the entire address of the user to prove his residence. As an address is unique (disregarding the fellow residents), the certificate becomes a linkable login.
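The difference between a minimal and a detailed certificate can be illustrated with a small sketch. It assumes a hypothetical set of certificates and a hypothetical `disclose` helper (neither is part of the threat tree), and counts how many distinct values a service would observe across users for each level of detail.

```python
# Hypothetical certificate contents for three users (invented data).
certificates = [
    {"country": "BE", "city": "Leuven", "street": "Example Street 1"},
    {"country": "BE", "city": "Leuven", "street": "Example Street 2"},
    {"country": "BE", "city": "Ghent", "street": "Sample Road 5"},
]


def disclose(cert, fields):
    """Return only the requested attributes as a hashable tuple."""
    return tuple(cert[f] for f in fields)


def distinct_presentations(certs, fields):
    """Count how many different values the service sees across all users."""
    return len({disclose(c, fields) for c in certs})


# Country only: every user presents the same value, so logins do not stand out.
minimal = distinct_presentations(certificates, ["country"])

# Full address: every user presents a unique value, which acts as a linkable login.
detailed = distinct_presentations(certificates, ["country", "city", "street"])
```

With this invented data, the country-only presentation is shared by all users, while the full address is different for each of them and can therefore be used to link repeated logins.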
Untrusted communication (L_e3)
A linkable login will however only lead to linkability of the entity when used over untrusted communication (L_e3).
Clearly, when the communication is not secure, the linkable login can be easily disclosed (ID_DF). Note that this information disclosure threat is applicable to all communication that transfers the entity's linkable credentials. Usually, this communication is limited to the information flow between the user and the service, but when the service transfers the credentials to another service (e.g. a third party authentication service like Facebook Login), this communication should remain confidential as well.
Similarly, when the receiver of the data is untrustworthy (L_e6) (e.g. he fails to anonymize the data during processing or shares the information with other parties), the subject, and all the data he has communicated, become linkable. Note that this receiver can be the service the subject is directly communicating with, but also additional (third party) services that are used by the intended receiver of which the subject might be unaware.
Finally, it is also possible that the receiver is trustworthy and handles the linkable logins in a privacy-friendly fashion by anonymizing them, but that the anonymized logins are easily linkable in the identity management database (the database where internal identifiers are managed and linked to their user accounts) (L_DS). Of course, this is only an issue when this identity management database is not secure.
Linkability based on metadata of entity communication (metadata linkability at L_DF)
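The role of the identity management database can be sketched as follows. This is a minimal illustration under stated assumptions: the `IdentityManagementDB` class, the logins, and the events are all hypothetical, and random tokens stand in for whatever internal identifiers a real service would use.

```python
import secrets


class IdentityManagementDB:
    """Hypothetical store mapping each login to a random internal identifier."""

    def __init__(self):
        self._pseudonym_by_login = {}

    def pseudonym_for(self, login):
        # Reuse the pseudonym of a returning login, create a new one otherwise.
        if login not in self._pseudonym_by_login:
            self._pseudonym_by_login[login] = secrets.token_hex(8)
        return self._pseudonym_by_login[login]

    def dump(self):
        # What an attacker obtains if the database is not secured:
        # a table from pseudonym back to login.
        return {p: login for login, p in self._pseudonym_by_login.items()}


idm = IdentityManagementDB()
raw_events = [("alice01", "order #1"), ("bob_77", "order #2"), ("alice01", "order #3")]

# The service stores only pseudonymized events.
stored_events = [(idm.pseudonym_for(login), action) for login, action in raw_events]

# With the leaked table, every stored event can be mapped back to its login.
leaked = idm.dump()
relinked = [(leaked[p], action) for p, action in stored_events]
```

The stored events no longer contain the logins, but they stay linkable to them for anyone who can read the mapping table, which is why the security of that database matters.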
Even when no linkable credentials are used, the user can still be linked based on the contextual information of his communication (e.g. based on his IP address or his online behavior) (linkability of contextual data at L_DF). This threat only applies to the communication that is directly connected to the entity, meaning the entity is the sender or receiver of the communication.
Linkability of data flow
Tree in general
A distinction should be made between linkability of the contextual data (e.g. IP address necessary for communication) and the transactional data (the actual data that are being communicated).
Contextual data linkability can be resolved by, for example, anonymous routing solutions and should be applied by the sender and/or receiver to protect their own unlinkability.
Transactional data, however, does not necessarily have the sender or receiver as data subject, but can involve a third party subject as well (e.g. when two doctors share information about a patient, this patient is the data subject of the transactional data while he is not involved in the actual communication).
Leaf nodes explanation
Linkability of transactional data (L_DF1)
Transactional data linkability (L_DF1) can occur when the transmitted data becomes available to an untrusted party (L_DF3). This either means that the communication is unprotected (L_DF6) and hence information disclosure threats (ID_DF) apply, or that a receiver cannot be trusted (L_DF7). This receiver can be the direct receiver of the information, or can be a “future” receiver when the receiving party communicates the transactional data to other parties (e.g. when the accounting is being outsourced to a third party).
Sharing data with untrusted parties will, however, only lead to linkability when the information that is made available is actually linkable (insufficient minimization/inference at L_DS). This mainly occurs when the data that are being sent are not minimized sufficiently. To protect a user's privacy, ideally only the minimal set of information should be provided.
Linkability of contextual data (L_DF2)
Contextual data becomes linkable (L_DF2) when non-anonymous communication (L_DF4) is used. The data flow can, for example, be linked based on IP address (L_DF8), computer ID (L_DF9), session ID (L_DF10), or even on certain patterns (L_DF11) (like time, frequency, location, or browser settings).
Alternatively, it is possible that the anonymity system that is being used is insecure (L_DF5). This can enable traffic analysis (L_DF12) to extract information out of patterns of traffic; passive attacks (L_DF14), like long-term intersection attacks, traffic correlation and confirmation, fingerprinting, epistemic attacks (route selection), and predecessor attacks; or active attacks (L_DF13), like N-1 attacks, Sybil attacks, traffic watermarking, tagging attacks, replay, and DoS attacks. More information about these attacks can be found in .
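Linking a data flow on a contextual attribute such as the IP address (L_DF8) or session ID (L_DF10) amounts to grouping observed requests by that attribute. The sketch below assumes a hypothetical request log and a hypothetical `link_flows` helper; the addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict

# Hypothetical request log as seen by an observer (invented values).
requests = [
    {"ip": "192.0.2.10", "session_id": "s1", "path": "/news"},
    {"ip": "198.51.100.7", "session_id": "s2", "path": "/shop"},
    {"ip": "192.0.2.10", "session_id": "s3", "path": "/forum"},
    {"ip": "192.0.2.10", "session_id": "s3", "path": "/profile"},
]


def link_flows(requests, attribute):
    """Group the requested paths by the value of one contextual attribute."""
    groups = defaultdict(list)
    for request in requests:
        groups[request[attribute]].append(request["path"])
    return dict(groups)


by_ip = link_flows(requests, "ip")
by_session = link_flows(requests, "session_id")
```

In this invented log, the session ID only links the two requests made within session `s3`, while the IP address also links them to the earlier request made under a different session.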
Linkability of data store
Tree in general
Linkability in a data store occurs when one has access to the data store and when insufficient data minimization is applied. This means that too much data are being stored, which yields a large set of information that can be used (e.g. by data miners) to look for links.
The most obvious consequence of linking lots of information is that more pseudo-identifiers are linked, which can result in identifiability. For example, knowing one's city, gender, age, or even first name does not by itself reveal an identity, but when combined, the anonymity set suddenly becomes a lot smaller and can already lead to identification, depending on the city's population size and the uniqueness of the person's first name. Thus, the more data are available and linkable (based on (pseudo-)identifiers), the higher the chance of identification.
Another result of linkability is inference. Instead of linking data that belong to the same person, data are linked based on certain properties to deduce relationships between them and generalize them. This can be used in a rather innocent fashion to determine the best way to organize groceries in a grocery store (e.g. people who buy hamburgers usually buy buns at the same time, hence they are stored close to each other). This inference can, however, also have a more judgmental nature if it is used to discriminate against a certain population (e.g. people living in a certain neighborhood have a higher chance of cancer, hence their health insurance fee is higher than in the surrounding areas). Inference can thus lead to societal harm.
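The shrinking anonymity set can be made concrete with a small sketch. The population records, the target, and the `anonymity_set` helper below are hypothetical and only serve to show how each additional linked attribute narrows down the set of candidates.

```python
# Hypothetical population records; all values are invented for illustration.
population = [
    {"city": "Leuven", "gender": "F", "age": 34, "first_name": "An"},
    {"city": "Leuven", "gender": "F", "age": 34, "first_name": "Els"},
    {"city": "Leuven", "gender": "M", "age": 34, "first_name": "Jan"},
    {"city": "Leuven", "gender": "F", "age": 51, "first_name": "An"},
    {"city": "Ghent", "gender": "F", "age": 34, "first_name": "An"},
]


def anonymity_set(records, known):
    """Return all records that match every attribute the observer knows."""
    return [r for r in records if all(r[k] == v for k, v in known.items())]


target = {"city": "Leuven", "gender": "F", "age": 34, "first_name": "An"}

# Reveal one attribute at a time and record the size of the anonymity set.
sizes = []
known = {}
for attribute in ["city", "gender", "age", "first_name"]:
    known[attribute] = target[attribute]
    sizes.append(len(anonymity_set(population, known)))
```

For this invented population the set shrinks from 4 candidates (city only) to 3, 2, and finally 1 once all four attributes are combined, at which point the target is identified.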
Leaf nodes explanation
Weak access control (L_DS1)
A requirement for linkability in the data store is weak access control to the data store (L_DS1). This can occur when the data are not stored in a confidential fashion and hence information disclosure threats exist (ID_DS).
Insufficient minimization/Inference (L_DS2)
When having access to the data store, linkability becomes a threat when data is insufficiently minimized (L_DS2). The more information is available the higher the chance of inference, which is a key element of data mining that derives ideas and conclusions by combining (linkable) information.
The data can be linked to data in another database (L_DS3). This other database can be both internal to the system that is being analyzed or external (data from a partner company, public data online, like Facebook data, etc.).
Or, simply too much data are available (L_DS4) in the data store that is being analyzed. This can occur when data are stored too long (L_DS5) instead of being removed when no longer needed (thus resulting in too much information), or when more information is stored than required (L_DS6) for the purpose of collection (e.g. storing a subject's entire address when only his city is required). Both data retention and data minimization threats originate from the data protection legislation principles.
Note that this minimization branch can also be considered from a wider perspective, when one does not (only) focus on data minimization, but also on minimization of central storage, of risk, of trust, etc. This tree does, however, only discuss minimization of data, as it is the main concern regarding privacy.
When the other database (L_DS3) is an identity management database that stores account data (and the login is identifiable), linking the IOIs to this database will result in identifiability.
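Linking a data store to another database (L_DS3) can be sketched as a join on attributes that both databases share. The two tables, the attribute names, and the `link_records` helper are hypothetical; the point is only that records without any name become identifiable once they are matched against a database that contains names.

```python
# Hypothetical internal data store without names (invented values).
internal = [
    {"record_id": 1, "zip": "3000", "birth_year": 1985, "diagnosis": "asthma"},
    {"record_id": 2, "zip": "9000", "birth_year": 1990, "diagnosis": "diabetes"},
]

# Hypothetical external database that does contain names.
external = [
    {"name": "Person A", "zip": "3000", "birth_year": 1985},
    {"name": "Person B", "zip": "1000", "birth_year": 1970},
]


def link_records(left, right, keys):
    """Join two lists of records on the given shared attributes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in left:
        for other in index.get(tuple(row[k] for k in keys), []):
            matches.append({**other, **row})
    return matches


linked = link_records(internal, external, ["zip", "birth_year"])
```

Here one internal record matches an external entry on postal code and birth year, which attaches a name to a diagnosis that was stored without one.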
Linkability of process
Tree in general
The threat tree of linkability of process suggests that the only way different actions can be linked to the same subject is by gaining access to the process. Note that this threat is very rare.
Leaf nodes explanation
Linkability of a process can only occur after information disclosure of this process (ID_P). We therefore refer to that tree.