How do they differ and what do they protect us from?

One of the possible definitions of privacy is the right that all people have to control information about themselves, and particularly who can access personal information, under what conditions and with what guarantees. In many cases privacy is a concept that is intertwined with security. However, security is a much broader concept that encompasses different mechanisms.

Security provides us with tools to help protect privacy. One of the most widely used security techniques to protect information is data encryption. Encryption allows us to protect our information from unauthorized access. So, if by encrypting I am protecting my data and access to it, isn’t that enough?

Encryption is not enough for Anonymization because…

in many cases, the metadata is left unprotected. For example, the content of an email can be encrypted, which gives us a false sense of protection. The message still carries a destination address when it is sent. If the email is addressed, for example, to a political party, that fact alone reveals sensitive information even though the content of the message is protected.
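The email case can be sketched in a few lines of Python. This is only an illustration: `toy_encrypt` is a hypothetical stand-in for a real encryption scheme, and the addresses are made up. The point is that the headers remain readable even when the body is not.

```python
import secrets
from email.message import EmailMessage


def toy_encrypt(plaintext: str) -> bytes:
    """Stand-in for real encryption: XOR with a random one-time key.

    Illustration only; the key is thrown away because we never decrypt here.
    """
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))
    return bytes(a ^ b for a, b in zip(data, key))


def build_message(sender: str, recipient: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Membership request"
    # Only the body is protected; the headers above travel in the clear.
    msg.set_content(toy_encrypt(body).hex())
    return msg


msg = build_message(
    "alice@example.org",
    "membership@some-party.example",
    "I would like to join.",
)

# What an observer can still read without any key.
visible_metadata = {name: msg[name] for name in ("From", "To", "Subject")}
print(visible_metadata)
```

Anyone who sees the message in transit learns who wrote to whom and about what, without ever breaking the encryption of the body.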

On the other hand, there are many scenarios in which we cannot encrypt the information, for example when we want to outsource the processing of a database or release it so that third parties can carry out analyses or statistical studies. In these scenarios the database often contains a large amount of personal or sensitive information, and even removing personal identifiers (e.g., name or passport number) may not be sufficient to protect the privacy of individuals, because the remaining attributes can still single a person out when combined.
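A minimal sketch of why removing identifiers may not be enough, using invented records and an assumed choice of "quasi-identifier" columns (postcode, birth year, sex). After the names are dropped, any row whose combination of these values is unique in the dataset still points to one person.

```python
from collections import Counter

# Invented example data; not taken from any real dataset.
records = [
    {"name": "Ana", "zip": "46001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"name": "Berta", "zip": "46001", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"name": "Carlos", "zip": "46002", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
    {"name": "David", "zip": "46001", "birth_year": 1991, "sex": "M", "diagnosis": "asthma"},
]

DIRECT_IDENTIFIERS = {"name"}
# Assumption: these columns could be matched against an external source.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def drop_direct_identifiers(rows):
    """Remove the obvious identifiers, keeping every other column."""
    return [{k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS} for row in rows]


def unique_combinations(rows):
    """Return rows whose quasi-identifier combination appears only once."""
    def key(row):
        return tuple(row[q] for q in QUASI_IDENTIFIERS)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


released = drop_direct_identifiers(records)
at_risk = unique_combinations(released)
print(f"{len(at_risk)} of {len(released)} released rows are unique on {QUASI_IDENTIFIERS}")
```

Someone who knows a person's postcode, birth year and sex from another source could link a unique row back to that person and read the diagnosis.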

Anonymization: protecting our privacy

Anonymization (also known as “data masking”) is a set of techniques that protect the privacy of documents or information by modifying the data. The main options are anonymization with gaps (deletion), anonymization with placeholders (substitution) and pseudonymization.

In general, anonymization aims to alter the data in such a way that, even if it is subsequently processed by a third party, the identity or sensitive attributes of the persons whose data is being processed cannot be revealed.
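The three approaches mentioned above (deletion, substitution and pseudonymization) can be sketched for one kind of identifier, email addresses. This is an illustrative sketch, not the method used by any particular product: the regular expression is deliberately simple, and the function names and the `user_` prefix are assumptions.

```python
import hashlib
import hmac
import re

# Simplified pattern for illustration; real email detection is more involved.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def delete(text: str) -> str:
    """Anonymization with gaps: remove the identifier entirely."""
    return EMAIL_RE.sub("", text)


def substitute(text: str) -> str:
    """Anonymization with placeholders: replace the identifier with a label."""
    return EMAIL_RE.sub("[EMAIL]", text)


def pseudonymize(text: str, key: bytes) -> str:
    """Replace each identifier with a stable pseudonym derived from a secret key.

    The same address always maps to the same pseudonym, so records can still
    be linked, but whoever holds the key can recompute the mapping.
    """
    def replacement(match: re.Match) -> str:
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        return f"user_{digest[:8]}"

    return EMAIL_RE.sub(replacement, text)


text = "Contact ana@example.org or ana@example.org, not luis@example.org."
print(delete(text))
print(substitute(text))
print(pseudonymize(text, key=b"keep-this-secret"))
```

Deletion and substitution discard the link between occurrences of the same person, while pseudonymization keeps it. That makes pseudonymized data more useful for analysis but also easier to re-identify, which is why it is usually treated as weaker than full anonymization.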

Privacy is regulated along similar lines across legal jurisdictions worldwide. In Europe, the applicable law is the GDPR (General Data Protection Regulation), which was approved in 2016 and became enforceable in 2018. In the US, the California Consumer Privacy Act (CCPA) was signed into law in June 2018, took effect in January 2020, and applies to businesses that

  • have annual gross revenues in excess of $25 million;
  • buy, receive, or sell the personal information of 50,000 or more consumers or households; or
  • earn more than half of their annual revenue from selling consumers’ personal information.

Most other states are expected to follow the spirit of California’s CCPA soon. This will affect the way organizations collect, hold, release, buy and sell personal data.

In Japan, the reformed privacy law came into full force on May 30, 2017 and is known as the Japanese Act on the Protection of Personal Information (APPI). The main difference from the European GDPR lies in the clauses defining personally identifiable information: the GDPR gives a general definition (“Personal data means any information relating to an identified or identifiable natural person”), whereas the APPI itemizes what it covers.

In general, these privacy laws aim to give citizens the right to:

  1. Know what personal data is being collected about them.
  2. Know whether their personal data is sold or disclosed and to whom.
  3. Say no to the sale of personal data.
  4. Access their personal data.
  5. Request that a business delete any personal information it has collected from them.
  6. Not be discriminated against for exercising their privacy rights.

The new regulations seek to regulate the processing of our personal data. Each of them establishes that data must be subject to adequate safeguards and that the personal data processed must be kept to a minimum.

What is PangeaMT doing about Anonymization?

PangeaMT is Pangeanic’s R&D arm. We lead the MAPA Project, the first multilingual anonymization effort making deep use of bilingual encoders for transformers in order to identify actors, personal identifiers such as names and surnames, addresses, job titles and functions, and a deep taxonomy.

Together with our partners (Centre National de la Recherche Scientifique in Paris, Vicomtech, etc.) we are developing the first truly multilingual anonymization software. The project will release a fully customizable, open-source solution that can be adopted by public administrations to start their journey in de-identification and anonymization. Corporations will also be able to benefit from MAPA, as the commercial version will be released on 01.01.2021.

PangeaMT will showcase its anonymization software for the Japanese market at the forthcoming AI EXPO in Tokyo from 28th to 30th October, following the successful introduction of its Japanese machine translation at AI EXPO.

About Alexandre

Alexandre joined Pangeanic in 2011 while still finishing his Master’s degree in applied machine translation. Alex attended the University of Alicante, where he studied Technical Engineering in Computer Science with a major in pattern recognition. He majored in Machine Translation, Artificial Intelligence, Neural Networks, Pattern Recognition and Digital Imaging during his Master’s degree at the Polytechnic University of Valencia, where he was also involved in the development of the first version of PangeaMT back in 2010. He is a specialist in Machine Translation for distant language pairs such as Japanese-English and Chinese-English-Spanish. His daily duties include research and development and programming. He is also Pangeanic’s system administrator. In our team, Alex has been responsible for the technical aspects of the research implementation of the EU’s EXPERT project at Pangeanic, including search-engine techniques based on Elasticsearch in a massive database and hybrid MT + TM approaches.