While security and privacy issues are highly interdependent, in recent years the privacy research field has increasingly developed its own identity and thematic objectives within the overall PACs spectrum. Some privacy challenges and solutions are addressed within established security-related domains such as Access Control and Cryptology, whereas others sit firmly as an extension of the data management and analytics field, particularly as developments around Big Data become a key influence on data privacy. Encryption and obfuscation techniques in particular are increasingly used both for privacy assurance and to comply with relevant privacy legislation.
Regular consensus initiatives such as the Annual Privacy Forum (APF) and CPDP (Computers, Privacy and Data Protection) recognise that privacy research issues are complex, and are increasingly easily manipulated in favour of the privacy invader [APF12]. As a less mature research field within the overall security domain, privacy handling is still believed to be rather siloed and tactical in nature, with a need for overarching flagship initiatives of greater completeness, driving large-scale solution deployments that are adopted by a large user base. Some key privacy research themes and challenges within the overall PACs spectrum are highlighted below.
- Privacy issues around data growth and the emergence of Big Data
- Advanced data protection techniques
- Privacy issues in Authorisation, Authentication and Access Control (AAA)
- Privacy issues in performing forensics investigations
- Privacy issues in IoT-based monitoring and information exchange
- Improving data privacy co-ordination among technical and legal stakeholders
- Privacy issues in software development and process improvement
Privacy issues around data growth and the emergence of Big Data
As the amount of data that needs to be processed grows, the increasingly advanced correlation and intelligence capabilities of data analytics solutions create a mounting challenge for privacy preservation, and for ensuring that data collection is fit for purpose. Recent high-profile examples of over-collection of user data beyond an application's core purpose include the radio application TuneIn's over-collection of birthday and ZIP code information in the US, as well as Google's unauthorised collection of passwords, emails and other personal data from accessible consumer Wi-Fi networks.
Traditional privacy preservation techniques such as k-anonymity (or variants such as l-diversity) are no longer deemed effective in the era of Big Data, as powerful re-identification algorithms can now easily infer "identifying" data from whatever data is available, even when certain "key" attributes have already been anonymised or "de-identified". A wide range of online behavioural characteristics, such as consumption preferences, commercial transactions and search histories, are viewed as highly effective at supporting re-identification; combined, they can provide a powerful means of re-identifying users even when key "identifying" attributes such as names, addresses and emails have already been anonymised [NAR10].
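To make the k-anonymity notion above concrete, the following minimal sketch (using a small hypothetical dataset; attribute names are illustrative assumptions) computes the k value of a table over a chosen set of quasi-identifier attributes: the size of the smallest group of records sharing the same quasi-identifier values.

```python
# Illustrative sketch: measuring k-anonymity over quasi-identifiers.
# A dataset is k-anonymous if every combination of quasi-identifier
# values is shared by at least k records.
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class formed
    by the chosen quasi-identifier attributes."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already-generalised records ("537**" is a masked ZIP).
records = [
    {"zip": "537**", "age": "20-29", "condition": "flu"},
    {"zip": "537**", "age": "20-29", "condition": "asthma"},
    {"zip": "537**", "age": "30-39", "condition": "flu"},
    {"zip": "537**", "age": "30-39", "condition": "cancer"},
]
print(k_anonymity(records, ["zip", "age"]))  # 2: each (zip, age) class has 2 rows
```

The re-identification point in the text is precisely that k measured over a *fixed* attribute set gives no protection once behavioural attributes outside that set act as additional quasi-identifiers.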
In order to increase assurance that emerging Big Data analytics strikes a balance between privacy preservation and the usefulness of tools and applications, new techniques are needed. Emerging developments such as work around differential privacy are seen as a positive step towards reducing the problem, as its emphasis is not on de-identifying key data, but rather on formally defining in a structured manner what it means for a data computation to be privacy-preserving [DWO11].
However, such methods need to be strongly integrated with established security mitigations, for example leveraging encryption, combined with additional measures that add transparency and accountability, such as efficient logging and auditing tools and AAA (Authorisation, Authentication and Access Control).
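As a minimal sketch of the differential privacy idea cited above [DWO11], the classic Laplace mechanism answers a counting query with calibrated noise rather than by de-identifying the data: a count has sensitivity 1 (one individual changes it by at most 1), so Laplace noise of scale 1/epsilon gives epsilon-differential privacy. The dataset and field names below are illustrative assumptions.

```python
# Sketch of the Laplace mechanism for an epsilon-differentially-private count.
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def dp_count(records, predicate, epsilon):
    """Counting queries have sensitivity 1, so noise of scale
    1/epsilon yields epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical medical records; smaller epsilon = more noise, more privacy.
patients = [{"condition": "flu"}, {"condition": "flu"}, {"condition": "asthma"}]
noisy = dp_count(patients, lambda r: r["condition"] == "flu", epsilon=0.5)
print(noisy)  # true count 2 plus random Laplace noise
```

The structured guarantee is what the text highlights: privacy is a property of the *computation* (any individual's presence barely shifts the output distribution), independent of what auxiliary data an attacker holds.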
Advanced data protection techniques
Protection of sensitive data is a key concern in the operation of today's digital infrastructures, which are increasingly distributed and involve outsourcing of user data to one or more third parties. Such techniques should be able to satisfy generic privacy constraints corresponding to different privacy needs, for example that values assumed by some attributes (e.g. phone numbers or email addresses) are considered sensitive and therefore cannot be stored in the clear, or that given attribute values are sensitive and cannot be released. They should also aim to be robust against inferences that may be drawn by exploiting data dependencies. Another element involves allowing users to verify that data has not been improperly modified or tampered with, and that providers comply with any availability constraints specified by the data owners.
As encryption techniques are often leveraged to ensure that third-party data processors do not have access to plaintext data, query execution over such data is made much more difficult or even impossible. Work on searchable encryption algorithms aims to strike a balance between protecting core information and making sufficient metadata index information available to support important kinds of query-based analysis. Techniques for distributed querying over data emanating from multiple parties, typically with different data protection requirements and policies, are also an open research area. Preserving the privacy of the queries themselves is a further objective, particularly if the subject matter of the query is sensitive (e.g. a user searching for information related to a specific illness affecting them or a relative).
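The core index idea behind basic symmetric searchable encryption can be sketched as follows: the server stores only keyed hashes (HMAC tokens) of keywords, so it can match a client-supplied query token against the index without ever seeing keywords in the clear. This is an illustrative toy, not a real scheme (the key name and structure are assumptions, and real constructions must also hide access and search patterns).

```python
# Toy searchable-index sketch: the server holds only opaque HMAC tokens.
import hashlib
import hmac

def token(key, keyword):
    # Keyed, deterministic token; the server cannot invert it.
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(key, docs):
    """docs: {doc_id: [keywords]} -> server-side index {token: [doc_ids]}."""
    index = {}
    for doc_id, words in docs.items():
        for w in words:
            index.setdefault(token(key, w), []).append(doc_id)
    return index

def search(index, key, keyword):
    # Client derives the query token ("trapdoor"); server matches it blindly.
    return index.get(token(key, keyword), [])

key = b"client-secret-key"  # held by the data owner only
index = build_index(key, {"doc1": ["privacy", "health"], "doc2": ["privacy"]})
print(search(index, key, "privacy"))  # ['doc1', 'doc2']
```

Even in this toy form, the trade-off the text describes is visible: the server learns which (opaque) tokens repeat across documents and queries, which is exactly the leakage that more advanced schemes try to reduce.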
Privacy issues in Authorisation, Authentication and Access Control (AAA)
There is an increasing need to handle user identity, credentials and the privacy of related data more effectively in the forward development of such solutions (sometimes referred to as privacy-enhanced attribute-based credentials, or "Privacy-ABCs") [CSP14]. In the consumer context, the single sign-on services most widely used by the mainstream public are managed by large IT vendors such as Facebook and Google, giving such vendors access to large amounts of user behavioural data that can sit beyond the core function of those companies.
Hence it would be preferable if authentication systems limited the information about a user's activity disclosed to such third-party services, or at least gave users the ability to control such disclosures if desired. Present approaches are often symmetric to the ones used by servers for the disclosure of resources to known users. However, such approaches do not allow users to select the specific data categories they wish to disclose, thereby limiting the ability to implement the "Own Your Own Data" (OYOD) principle. A relevant research challenge is therefore to develop expressive and flexible approaches for regulating the release of user personal data depending on past interactions/usage, or on the context and purpose of the interaction. The anticipated emergence of personal clouds, and the ability for users to become the hub of their own data, may accentuate this trend.
In the corporate context, a huge amount of private information is circulated and stored, often without the direct control of its owner, thereby greatly increasing privacy risks. Privacy of owner data in such contexts will rely on appropriate access control frameworks that regulate information exchange and access in the interactions among parties, a challenging problem with many open research issues in today's system context.
Privacy issues in performing forensics investigations
Ongoing tension exists between emerging forensics methods and privacy rights, and there is a need to develop tools that can support such investigations in a manner compliant with privacy legislation.
Conversely, advancements in encryption technology increase the difficulty of applying many existing forensics techniques, hindering the investigative process; malicious criminal behaviour perpetrated via privacy-preserving tools such as Tor provides one such example.
Privacy issues in IoT-Based Monitoring and information exchange
Privacy and user-driven regulation of information exchange will become even more prevalent in existing and emerging Internet of Things contexts involving large-scale environmental monitoring and information collection. Approaches deemed invaluable across many industry sectors, such as construction, transportation, retail, healthcare and automotive, are vulnerable, especially as many such monitoring devices were initially produced with no security protection in mind.
Similar issues apply to security-specific monitoring, for example around selectively anonymising and de-anonymising information in SIEM contexts, where deployments are increasingly likely to be shared across customers and infrastructures. Hence there is a need to apply novel privacy techniques to pre-correlation at the sensor level, to collection and parsing, to the generation of alarms, and so on.
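One common pattern for the selective anonymisation/de-anonymisation mentioned above can be sketched as sensor-level pseudonymisation: identifiers such as source IPs are replaced by consistent keyed pseudonyms before events leave the customer, so cross-event correlation still works in the shared SIEM, while the mapping held only by the data owner allows selective re-identification when an investigation warrants it. Class, field and key names below are illustrative assumptions, not a specific SIEM's API.

```python
# Sketch: consistent, owner-reversible pseudonyms for identifiers in
# security events shared with a multi-tenant SIEM.
import hashlib
import hmac

class Pseudonymiser:
    def __init__(self, key):
        self.key = key
        self.mapping = {}  # pseudonym -> original; retained only by the owner

    def pseudonymise(self, value):
        # Keyed hash: deterministic (preserves correlation) but opaque
        # to anyone without the key or the mapping table.
        digest = hmac.new(self.key, value.encode(), hashlib.sha256).hexdigest()
        p = "ip-" + digest[:12]
        self.mapping[p] = value
        return p

    def reveal(self, pseudonym):
        # Selective de-anonymisation by the data owner.
        return self.mapping.get(pseudonym)

ps = Pseudonymiser(b"site-secret")
event = {"src_ip": ps.pseudonymise("10.0.0.7"), "alert": "port-scan"}
# The shared SIEM sees only the pseudonym; the owner can reverse it.
assert ps.reveal(event["src_ip"]) == "10.0.0.7"
# The same input always yields the same pseudonym, so alarms still correlate.
assert ps.pseudonymise("10.0.0.7") == event["src_ip"]
```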
Improving data privacy co-ordination among technical and legal stakeholders
Many legal aspects are strongly related to privacy management within PACs issues as a whole, for example aspects around privacy, data protection legislation, and contracts around service level agreements (SLAs). As systems increasingly become cross-organisational and cross-national, cross-cutting legal requirements become more complex to understand, including which jurisdictions may be relevant. Hence methods to improve the interleaving of legal risk assessment with privacy assessment are needed within PACs assessment as a whole.
Privacy issues in software development and process improvement
It is widely acknowledged that improved procedures and tool support for integrating privacy-by-design into the software development lifecycle are needed, and this will remain an ongoing research challenge as software development approaches and frameworks continue to evolve. Emerging techniques should focus on privacy protection across several key contexts [FOR14], for example:
- Temporal – defining WHEN applications can collect and use data about users
- Spatial – applications should know about user locations only when necessary, for example within 100 metres of a given store when offering in-store promotions
- Functional – specifying the technical and procedural ways in which companies can collect data about users
- Identity context – which identity/persona a customer is allowed to use when interacting with an application/brand
- Social – with whom an application/brand should/can share a user's data
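The "spatial" constraint above can be sketched as a simple enforcement check: location-dependent functionality is released only while the user is within a given radius of the store. The coordinates, function names and the 100-metre radius are illustrative values matching the example in the list.

```python
# Sketch: gate location-dependent features on proximity to a store.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def may_use_location(user, store, radius_m=100.0):
    # The application learns only this boolean, not the raw coordinates.
    return haversine_m(*user, *store) <= radius_m

store = (48.8584, 2.2945)
print(may_use_location((48.8585, 2.2946), store))  # True: roughly 13 m away
print(may_use_location((48.8600, 2.3000), store))  # False: several hundred metres away
```

Exposing only the boolean outcome of such a predicate, rather than the raw coordinates, is one concrete way an application can "know about user locations only when necessary".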
Related issues focus on developing improved Privacy Markup Languages and improving the ability to integrate existing Privacy Enhancing Technologies (PETs) with contemporary application development contexts.
Other open issues involve extending research on secure composition of software services (such as that focussed on in the Aniketos FP7 project and similar initiatives), broadening such models to include more explicit support for privacy parameters.
[APF12] ENISA. Report on the Annual Privacy Forum 2012.
[NAR10] Narayanan, A., & Shmatikov, V. (2010). Myths and fallacies of personally identifiable information. Communications of the ACM, 53(6), 24-26.
[DWO11] Dwork, C. (2011). A firm foundation for private data analysis. Communications of the ACM, 54(1), 86-95.
[CSP14] Neven, G. (2014). Privacy technologies: From research to the real world. IBM Research - Zurich, Cyber Security & Privacy Forum, Athens, 21-22 May 2014. http://www.cspforum.eu/uploads/Csp2014Presentations/Track_1/Dr_G.Neven-Privacy%20technologies-1.pdf
[FOR14] Forrester (2014). Contextual Privacy: Making Trust a Market Differentiator. Webinar, 28 January 2014. https://www.forrester.com/The+New+Privacy+Is+All+About+Context/-/E-WEB16543