Arvato Financial Solutions Tech Center

How can data survive in a GDPR world?

GDPR. DPO. Compliance.

You may think of these as a nightmare, but they are essential topics in the modern data-driven world. In Europe, where we do most of our business, GDPR compliance has received the highest attention for several years now. Rules are rules and you have to respect them; they are there for a reason. So how can a company use data to offer a better service without breaching the privacy of its consumers?

At AFS we have many different companies (legal entities) located in different countries, all subject to the same EU regulation and additionally regulated by local authorities. It would be useful if our legal entities could share information with each other, because they all support the same service offered to our clients. To address this, we started a project codenamed DnA, whose aim was to find a legally accepted way to share information between the different entities serving the same value and business chain.

Today we want to share some insights we learned from this project.

Insight nr 1: you CAN’T exchange data on the personal level. Period.

There is nothing you can do about it. You can’t tell another company that John Doe, with the email john@doe.com, bought 1 pie from you at 09:00 AM on the 2nd of October. However, you can pass on anonymized and aggregated information.

What does anonymization really mean in this case? You need to make sure that the information you provide to the other party does not allow them to trace back the person, even if they get access to your database. For example, suppose you sell 1 pie per hour and you tell an external party that you made a sale at 10:04 AM. The other party (let’s say your loyalty card provider, who knows that John used the loyalty card issued to him in your shop at 10:03 AM) can combine this with their own data to trace back your customer.
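To make the risk concrete, here is a minimal Python sketch of such a linkage. All names and records are invented for illustration (`shared_sales`, `loyalty_events`, `link_by_time`, the 5-minute tolerance); it is not code from the DnA project, just one way the other party could match a name-free sale to a person by timestamp alone.

```python
from datetime import datetime, timedelta

# Hypothetical data the shop passes on: no names, but exact timestamps.
shared_sales = [
    {"sold_at": datetime(2020, 10, 2, 10, 4), "amount": 6.78},
]

# Hypothetical data the other party already holds (loyalty card usage).
loyalty_events = [
    {"customer": "John Doe", "used_at": datetime(2020, 10, 2, 10, 3)},
    {"customer": "Jane Roe", "used_at": datetime(2020, 10, 2, 14, 30)},
]


def link_by_time(sales, events, tolerance=timedelta(minutes=5)):
    """Return (customer, sale) pairs where exactly one card event lies
    within `tolerance` of the sale timestamp."""
    matches = []
    for sale in sales:
        nearby = [
            event for event in events
            if abs(event["used_at"] - sale["sold_at"]) <= tolerance
        ]
        # A single candidate means the "anonymous" sale is re-identified.
        if len(nearby) == 1:
            matches.append((nearby[0]["customer"], sale))
    return matches


reidentified = link_by_time(shared_sales, loyalty_events)
print(reidentified)
```

With only one sale per hour, a single card event falls inside the tolerance window, so removing the name and email did not protect the customer.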

How do you make sure this won’t happen under any circumstances?

Insight nr 2: data aggregation, applied at the right granularity level, will help you anonymize data.

So you should aggregate data at a suitable granularity level to ensure that your customer cannot be traced back. Make sure you select a valid granularity level. For example, if you have 2 sales per hour, aggregating at the hour level might not be enough, because if you pass on a dataset similar to the following one:

Hour        Average check   Female share   Male share
10:00 AM    $6.78           0.5            0.5

It will be quite easy for a sophisticated data scientist to later find the exact transactions in your database and link them to physical persons. In such cases a longer period has to be selected, to make sure that the features are general enough to hide the actual person behind them.
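The idea of choosing a longer period can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `aggregate_by_window`, the record fields, and the minimum group size of 5 are hypothetical and not taken from the project. Windows that contain too few sales are simply not published.

```python
from collections import defaultdict
from datetime import datetime

MIN_GROUP_SIZE = 5  # assumed threshold, chosen only for this example


def aggregate_by_window(sales, window_hours, min_group_size=MIN_GROUP_SIZE):
    """Group sales into fixed windows of `window_hours` hours and
    drop every window that holds fewer than `min_group_size` sales."""
    buckets = defaultdict(list)
    for sale in sales:
        ts = sale["sold_at"]
        start_hour = (ts.hour // window_hours) * window_hours
        key = ts.replace(hour=start_hour, minute=0, second=0, microsecond=0)
        buckets[key].append(sale)

    result = {}
    for key, group in buckets.items():
        n = len(group)
        if n < min_group_size:
            continue  # too few sales: the aggregate could expose individuals
        result[key] = {
            "sales": n,
            "average_check": round(sum(s["amount"] for s in group) / n, 2),
            "female_share": sum(s["gender"] == "F" for s in group) / n,
        }
    return result


# Hypothetical shop with 2 sales per hour between 08:00 and 12:00.
sales = []
for hour in range(8, 12):
    sales.append({"sold_at": datetime(2020, 10, 2, hour, 4), "amount": 6.0, "gender": "F"})
    sales.append({"sold_at": datetime(2020, 10, 2, hour, 40), "amount": 7.0, "gender": "M"})

hourly = aggregate_by_window(sales, window_hours=1)
four_hourly = aggregate_by_window(sales, window_hours=4)
print(hourly)
print(four_hourly)
```

In this toy data set every hourly window holds only 2 sales and is suppressed, while the single 4-hour window holds all 8 sales and can be shared. Which threshold and window are acceptable is a question for your DPO, not something the code can decide.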

There are several aggregation techniques available. We chose aggregation based on geographical position:

The map is divided into squares whose sides are wide enough to contain many data points, but not so wide that they become too general. There is no point in selecting a grid in which a city like Cologne is divided into only 1-2 cells. You can then also derive features based on neighborhood characteristics, such as median education level, higher education ratio, etc.
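A grid like this could look roughly as follows in Python. The sketch makes several simplifying assumptions of its own: cells are defined in degrees (`CELL_SIZE_DEG`, about 1 km in latitude) rather than in a projected coordinate system, so they are not true squares on the ground, and the field names, the sample records, and the `min_points` threshold are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

CELL_SIZE_DEG = 0.01  # assumed cell size in degrees (roughly 1 km north-south)


def cell_of(lat, lon, cell_size=CELL_SIZE_DEG):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def aggregate_by_cell(records, min_points=3):
    """Compute neighborhood-level features per grid cell and skip
    cells with fewer than `min_points` data points."""
    cells = defaultdict(list)
    for record in records:
        cells[cell_of(record["lat"], record["lon"])].append(record)

    features = {}
    for cell, group in cells.items():
        if len(group) < min_points:
            continue  # sparse cell: too close to describing single persons
        features[cell] = {
            "points": len(group),
            "median_education_years": median(r["education_years"] for r in group),
            "higher_education_ratio": sum(r["higher_education"] for r in group) / len(group),
        }
    return features


# Hypothetical data points in Cologne; the last one sits alone in its cell.
records = [
    {"lat": 50.9375, "lon": 6.9603, "education_years": 12, "higher_education": False},
    {"lat": 50.9381, "lon": 6.9612, "education_years": 16, "higher_education": True},
    {"lat": 50.9333, "lon": 6.9655, "education_years": 13, "higher_education": False},
    {"lat": 50.9510, "lon": 6.9205, "education_years": 18, "higher_education": True},
]

features = aggregate_by_cell(records)
print(features)
```

Only the cell with three data points produces features; the lone point in the other cell is dropped instead of being published as a "group" of one.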

But with such an approach you will lose accuracy, you’ll say. That is true.

Insight nr 3: General numbers are general numbers. They don’t describe the person; they describe an abstract, grouped feature set. And the accuracy of your algorithms and models will suffer.

Of course, any data-driven person would prefer to know that:

  1. John eats 3 pies per week
  2. mostly in the period from 8:05 till 8:15
  3. and spent $15.67 on them in the first week of September 2020

rather than to know that John belongs to a group of people who tend to:

  1. eat 6.78% more pie than average
  2. prefer morning hours
  3. and spend slightly less than average

But… it’s still better than nothing, and you WILL get some increase in accuracy with combined data.

So respect the privacy of your customers and try to get everything you can in a legally accepted manner!

Dmitri Oništšik


Data Architect