Undeleteable data

Daniel Stone

13 Apr 2018

12:07 p.m.

Hi,

We've been looking into GDPR compliance for fd.o, which has been ... fun.

The biggest stumbling block for us is probably Bugzilla and Mailman. Deleting messages and profiles from those just isn't practical for us, especially at any kind of scale. We could write a script to censor those, but once it has been posted to either, then it's all over the public internet anyway.

We don't control distribution once messages hit Mailman - it's forwarded raw to a potentially unlimited distribution list - and deleting messages from Mailman is also a manual nightmare. Rebuilding the archives is out since it breaks URLs. Hand-editing it all sucks beyond belief. And then people have quoted it in replies anyway ...

Does anyone know if there's some kind of GDPR 'out' for, 'by posting here you agree that everything is going to be made public, so as there's nothing we can do about its distribution, it's not useful or practical for us to undo that'? And are there any kind of credible Bugzilla/Mailman deletion tools?

Cheers,
Daniel


Moritz Bartl

13 Apr

12:17 p.m.

On 13.04.2018 13:07, Daniel Stone wrote:

...

> The biggest stumbling block for us is probably Bugzilla and Mailman. Deleting messages and profiles from those just isn't practical for us, especially at any kind of scale. We could write a script to censor those, but once it has been posted to either, then it's all over the public internet anyway.

In many countries it has already been the case that if someone requests personal data to be deleted, you have to make that happen. This does not mean you have to delete the data from all the _other_ places it already went out to, so the only thing we're talking about in the Mailman case is the archives: posts themselves and potentially quotes, yes, as long as they contain personally identifiable data. My understanding there is that it would be enough in most cases to remove the sender information and the name in the attribution line above quotes, not the quoted statements themselves.

In almost all larger projects that I've been involved in, we had such cases already: people mistakenly posting sensitive information to a list, or asking for removal later because they didn't understand their mail would be publicly archived. Few, yes, but still. Which meant exactly what you mentioned: the manual hacky way of censoring the archived post.

I don't see how the GDPR changes that. You cannot argue your way out of it; the obligation exists that you do need to remove such personal content on request. But how often will it happen, really? There is no obligation to fully and cleanly automate it.

-- moritz
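Moritz's suggestion (strip the sender details and the name in each attribution line, but leave the quoted statements alone) can be sketched in Python. This is a minimal illustration, not a Mailman tool: it assumes attribution lines end in "wrote:", and the function name `redact_attributions` and the `[redacted]` placeholder are hypothetical.

```python
import re

# A reply-attribution line: optional quote markers, then text ending in "wrote:".
# Example: "> On 13 April 2018 at 13:17, Jane Doe <jane@example.org> wrote:"
ATTRIBUTION = re.compile(r"^(?P<prefix>[> ]*)(?P<rest>.*\bwrote:\s*)$")


def redact_attributions(body: str, name: str, address: str,
                        placeholder: str = "[redacted]") -> str:
    """Remove one person's name and address from attribution lines only.

    The quoted statements themselves are left untouched.
    """
    out = []
    for line in body.splitlines():
        m = ATTRIBUTION.match(line)
        if m and (name in line or address in line):
            rest = m.group("rest")
            # Drop a bracketed "<address>" entirely; a bare address becomes the placeholder.
            rest = rest.replace(f"<{address}>", "").replace(address, placeholder)
            rest = rest.replace(name, placeholder)
            # Collapse the double space left where "<address>" was removed.
            rest = re.sub(r"\s{2,}", " ", rest)
            out.append(m.group("prefix") + rest)
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "On 13 April 2018 at 13:17, Jane Doe <jane@example.org> wrote:\n"
        "> Jane Doe thinks the archive should stay public.\n"
        "I agree."
    )
    print(redact_attributions(sample, "Jane Doe", "jane@example.org"))
```

Only lines that look like attributions are rewritten, so a quoted sentence that happens to mention the person keeps its wording; whether that is enough for a given request is exactly the judgement call discussed in the thread.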

Daniel Stone

2:02 p.m.

Hi Moritz,

On 13 April 2018 at 13:17, Moritz Bartl <moritz@techcultivation.org> wrote:

...

> On 13.04.2018 13:07, Daniel Stone wrote:

...
>> The biggest stumbling block for us is probably Bugzilla and Mailman. Deleting messages and profiles from those just isn't practical for us, especially at any kind of scale. We could write a script to censor those, but once it has been posted to either, then it's all over the public internet anyway.

> In many countries it has already been the case that if someone requests personal data to be deleted, you have to make that happen. This does not mean you have to delete the data from all the _other_ places it already went out to, so the only thing we're talking about in the Mailman case is the archives: posts themselves and potentially quotes, yes, as long as they contain personally identifiable data. My understanding there is that it would be enough in most cases to remove the sender information and the name in the attribution line above quotes, not the quoted statements themselves.

> In almost all larger projects that I've been involved in, we had such cases already: people mistakenly posting sensitive information to a list, or asking for removal later because they didn't understand their mail would be publicly archived. Few, yes, but still. Which meant exactly what you mentioned: the manual hacky way of censoring the archived post.

True. We have done it a couple of times, but those were quite extreme: copyright violation (posting proprietary code), and extreme content that would have been legally actionable for us.

...

> I don't see how the GDPR changes that. You cannot argue your way out of it; the obligation exists that you do need to remove such personal content on request. But how often will it happen, really? There is no obligation to fully and cleanly automate it.

OK, it's good to have your opinion that we cannot route around this. That is a very real change for us though, because of the claimed universal jurisdiction regardless of the location of the servers/processors (fd.o is not hosted in the EU, nor is it part of a European legal entity).

My worry isn't being obligated to have the process automated; quite the reverse: doing it by hand is _extremely_ time-consuming. For Mailman archives, this means hand-editing mboxes, HTML mails, and all of the author/date/thread indices. Even in advance of the GDPR, we get more requests for this than we can practically service given our limited admin time. Given a deluge of requests, we would be forced to either not comply and face any legal consequences, or just stop offering these services.

Cheers,
Daniel
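For the mbox part of that hand-editing, a minimal sketch using Python's standard `mailbox` module follows. It assumes the archive's web pages are later regenerated from the edited mbox and that the archiver numbers messages by their position in the file, which is why each matching post is overwritten with a stub rather than deleted (an assumption about the archiver, not something verified here). `redact_mbox`, the stub text and the `redacted@invalid` address are hypothetical, and the HTML pages and author/date/thread indices are not touched.

```python
import mailbox
import os
import tempfile
from email.utils import getaddresses

# Hypothetical replacement text for a removed post.
REDACTION_NOTICE = "This message was removed from the archive at the author's request.\n"


def redact_mbox(path: str, address: str) -> int:
    """Replace every message sent from `address` with a stub, in place.

    Messages are overwritten rather than deleted, so every other message keeps
    its position in the mbox. Returns the number of messages replaced.
    """
    target = address.lower()
    box = mailbox.mbox(path)
    box.lock()
    try:
        replaced = 0
        for key in box.keys():
            msg = box[key]
            senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
            if target not in senders:
                continue
            stub = mailbox.mboxMessage()
            # The envelope "From " line can name the sender too; keep only its timestamp.
            timestamp = msg.get_from().partition(" ")[2]
            stub.set_from(f"redacted@invalid {timestamp}".strip())
            # Keep Date and Message-ID so the stub still sorts and threads like the original.
            for header in ("Date", "Message-ID"):
                if msg[header] is not None:
                    stub[header] = msg[header]
            stub["From"] = "redacted@invalid"
            stub["Subject"] = "[removed]"
            stub.set_payload(REDACTION_NOTICE)
            box[key] = stub
            replaced += 1
        box.flush()
    finally:
        box.close()  # close() flushes again and releases the lock
    return replaced


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "list.mbox")
    demo = mailbox.mbox(path)
    demo.add("From: Alice <alice@example.org>\nSubject: hello\n\nfirst post\n")
    demo.add("From: Bob <bob@example.org>\nSubject: Re: hello\n\nreply\n")
    demo.close()
    print(redact_mbox(path, "alice@example.org"), "message(s) replaced")
```

Keeping `Message-ID` preserves threading but may itself carry PII; masking it consistently is a separate step.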

TJ

2 p.m.

On 13/04/18 12:07, Daniel Stone wrote:

...

> Does anyone know if there's some kind of GDPR 'out' for, 'by posting here you agree that everything is going to be made public, so as there's nothing we can do about its distribution, it's not useful or practical for us to undo that'? And are there any kind of credible Bugzilla/Mailman deletion tools?

...

From reading the regulation and various interpretations of it, it seems that PII required to operate the service is exempt from the requirement to get specific consent, and from what I've read, may also exempt (some of) that data from the deletion requirement.

The regulation is designed to protect non-essential collected PII.

I'd also wonder about the difference between 'collected' and 'volunteered' data in respect of bug reports, emails to mailing lists, etc., since in most cases the service isn't asking for PII.

On the contract side, if the processing is necessary for the performance of the contract, then it is a lawful use not requiring explicit consent.

The data subject is giving consent by subscribing or sending to a mailing list, or creating or adding to a bug report. In this case I'd suspect ensuring there is an explicit notice that the action is giving consent would be sufficient (although it's not clear these uses require consent).

Corner-cases are where a child is the data-subject and verifiable parental consent is required.

Daniel Stone

2:05 p.m.

Hi TJ,

On 13 April 2018 at 15:00, TJ <0.gdpr-discuss@iam.tj> wrote:

...

> On 13/04/18 12:07, Daniel Stone wrote:

...
>> Does anyone know if there's some kind of GDPR 'out' for, 'by posting here you agree that everything is going to be made public, so as there's nothing we can do about its distribution, it's not useful or practical for us to undo that'? And are there any kind of credible Bugzilla/Mailman deletion tools?

> From reading the regulation and various interpretations of it, it seems that PII required to operate the service is exempt from the requirement to get specific consent, and from what I've read, may also exempt (some of) that data from the deletion requirement.

> The regulation is designed to protect non-essential collected PII.

> I'd also wonder about the difference between 'collected' and 'volunteered' data in respect of bug reports, emails to mailing lists, etc., since in most cases the service isn't asking for PII.

> On the contract side, if the processing is necessary for the performance of the contract, then it is a lawful use not requiring explicit consent.

> The data subject is giving consent by subscribing or sending to a mailing list, or creating or adding to a bug report. In this case I'd suspect ensuring there is an explicit notice that the action is giving consent would be sufficient (although it's not clear these uses require consent).

This is quite a different viewpoint from Moritz's, and was also my reading of it. This is what our current privacy policies and notices express, so people are at least fully aware of the consequences of volunteering information. As it comes from Mailman/Bugzilla, it is not exactly passive: you are voluntarily providing data to be posted for public consumption, and we make people aware of the consequences of doing so when registering/subscribing.

...

> Corner-cases are where a child is the data-subject and verifiable parental consent is required.

That one is far more difficult. I suppose there is another corner case, if someone were to e.g. forward a mail from someone else to a list. In that case, the person whose PII is available has not necessarily directly consented to our processing of that information. I'm not at all sure what regulations apply to this third-party case.

Cheers,
Daniel

Gregor Jehle

2:54 p.m.

Hi list,

On 04/13/2018 03:05 PM, Daniel Stone wrote:

...

> On 13 April 2018 at 15:00, TJ <0.gdpr-discuss@iam.tj> wrote:

...
>> The data subject is giving consent by subscribing or sending to a mailing list, or creating or adding to a bug report. In this case I'd suspect ensuring there is an explicit notice that the action is giving consent would be sufficient (although it's not clear these uses require consent).

> This is quite a different viewpoint from Moritz's, and was also my reading of it. This is what our current privacy policies and notices express, so people are at least fully aware of the consequences of volunteering information. As it comes from Mailman/Bugzilla, it is not exactly passive: you are voluntarily providing data to be posted for public consumption, and we make people aware of the consequences of doing so when registering/subscribing.

as I understand the GDPR, a key point is that consent once given is not forever. You're able, at any point in time, to decide otherwise and then request deletion of your PII.

Cheers,
Gregor

Moritz Bartl

3:37 p.m.

On 13.04.2018 15:54, Gregor Jehle wrote:

...

...
...
>>> The data subject is giving consent by subscribing or sending to a mailing list, or creating or adding to a bug report. In this case I'd suspect ensuring there is an explicit notice that the action is giving consent would be sufficient (although it's not clear these uses require consent).
>>
>> This is quite a different viewpoint from Moritz's, [..]
>
> as I understand the GDPR, a key point is that consent once given is not forever. You're able, at any point in time, to decide otherwise and then request deletion of your PII.

Exactly. My understanding is that both are correct, and that this is not at all a different viewpoint from what I talked about, which is requested deletion. I merely implied earlier consent, because without that it would not have been (legally) possible to collect the data in the first place. ;-)

The most important thing about the GDPR is that it is actually very readable. OK, it is a couple of pages long (~50), but you do not need to be a lawyer to understand what it asks for. Of course, anyone's reading might differ quite a bit from how lawyers will over time engineer courts into interpreting it, but the best approach here is to take some time to fully read it and make up your own mind, rather than (or before) reading 50+ pages of random interpretations.

OK OK, maybe I've grown to actually enjoy reading laws, yes, that too.

-- moritz

TJ

4:50 p.m.

On 13/04/18 14:54, Gregor Jehle wrote:

...

> Hi list,

> On 04/13/2018 03:05 PM, Daniel Stone wrote:

...
>> On 13 April 2018 at 15:00, TJ <0.gdpr-discuss@iam.tj> wrote:

...
>>> The data subject is giving consent by subscribing or sending to a mailing list, or creating or adding to a bug report. In this case I'd suspect ensuring there is an explicit notice that the action is giving consent would be sufficient (although it's not clear these uses require consent).

>> This is quite a different viewpoint from Moritz's, and was also my reading of it. This is what our current privacy policies and notices express, so people are at least fully aware of the consequences of volunteering information. As it comes from Mailman/Bugzilla, it is not exactly passive: you are voluntarily providing data to be posted for public consumption, and we make people aware of the consequences of doing so when registering/subscribing.

> as I understand the GDPR, a key point is that consent once given is not forever. You're able, at any point in time, to decide otherwise and then request deletion of your PII.

My point was that there is a different requirement for the data needed to provide and operate the service than for the additional, voluntary PII a subscriber might provide.

Also, for a mailing-list, there are two aspects:

1. operating the SMTP relay
2. operating an HTTP archive

For (1), in most cases unsubscribing should remove the data subject's email address and (optional) (nick)name from the database.

For (2), the challenge is to mask the user's PII without damaging the thread context. To me that means masking/replacing the To:, From: and reply context like:

...

...
On 13 April 2018 at 15:00, TJ <0.gdpr-discuss@iam.tj> wrote:
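The header part of TJ's proposal could look roughly like the following sketch using Python's standard `email` package. It only rewrites the address headers (From:, To:, Cc:, Reply-To:, Sender:) for one address and keeps every other recipient as it was; the function name and the `redacted@invalid` placeholder are assumptions, and the reply context in the body would need separate handling.

```python
import email
from email.message import Message
from email.utils import formataddr, getaddresses

# Headers that carry names and addresses in a typical list post.
ADDRESS_HEADERS = ("From", "To", "Cc", "Reply-To", "Sender")


def mask_address_headers(msg: Message, address: str,
                         placeholder: str = "redacted@invalid") -> None:
    """Replace one person's name and address in the address headers, in place.

    Other people listed in the same To:/Cc: header are kept as they were.
    Rewritten headers move to the end of the header block.
    """
    target = address.lower()
    for header in ADDRESS_HEADERS:
        values = msg.get_all(header)
        if not values:
            continue
        pairs = getaddresses(values)
        if not any(addr.lower() == target for _, addr in pairs):
            continue
        masked = [("", placeholder) if addr.lower() == target else (name, addr)
                  for name, addr in pairs]
        del msg[header]  # deletes every occurrence of this header
        msg[header] = ", ".join(formataddr(pair) for pair in masked)


if __name__ == "__main__":
    raw = ("From: Jane Doe <jane@example.org>\n"
           "To: gdpr-discuss@example.org\n"
           "Cc: Jane Doe <jane@example.org>, bob@example.org\n"
           "Subject: Undeleteable data\n\nbody\n")
    demo = email.message_from_string(raw)
    mask_address_headers(demo, "jane@example.org")
    print(demo["From"], "|", demo["Cc"])
```

Replacing the whole entry (name and address) with a single placeholder keeps the header syntactically valid, so archivers that parse it later should not choke on an empty field.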

Ralph Corderoy

11 May

12:09 p.m.

Hi TJ,

...

> Also, for a mailing-list, there are two aspects: ... 2. operating an HTTP archive ... For (2) the challenge is to mask the user's PII without damaging the thread context. To me that means masking/replacing the To: From: and reply context like:

...
...
On 13 April 2018 at 15:00, TJ <0.gdpr-discuss at iam.tj> wrote:

Looking at https://www.earth.li/pipermail/gdpr-discuss/2018-April.txt.gz, which I've just used to read this list, there are also the `Message-ID', `In-Reply-To', and `References' headers to mask, since they can include IP addresses or other PII. And then the bodies need inspecting for PII to satisfy Huge Grunt's request?

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
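One way to act on Ralph's point without breaking threading is to replace every Message-ID consistently, so that a reply's `In-Reply-To` and `References` still match its (renamed) parent. The sketch below assumes a keyed hash (HMAC-SHA256) with a secret kept by the list admins; a plain unkeyed hash would be weaker, because Message-IDs are often guessable and could be confirmed by hashing candidates. The function names and the `archive.invalid` domain are hypothetical.

```python
import email
import hashlib
import hmac
import re
from email.message import Message

# Anything in angle brackets, e.g. <20180413.1207.1234@host.example.org>.
MSGID = re.compile(r"<[^<>\s]+>")
THREAD_HEADERS = ("Message-ID", "In-Reply-To", "References")


def pseudonymise_id(msgid: str, secret: bytes, domain: str = "archive.invalid") -> str:
    """Map one Message-ID to an opaque, stable replacement."""
    digest = hmac.new(secret, msgid.encode("utf-8"), hashlib.sha256).hexdigest()[:32]
    return f"<{digest}@{domain}>"


def pseudonymise_thread_headers(msg: Message, secret: bytes) -> None:
    """Rewrite the threading headers in place.

    The same original ID always maps to the same replacement (for one secret),
    so a reply's In-Reply-To still matches its parent's Message-ID.
    """
    for name in THREAD_HEADERS:
        value = msg.get(name)
        if value is not None:
            new = MSGID.sub(lambda m: pseudonymise_id(m.group(0), secret), str(value))
            msg.replace_header(name, new)


if __name__ == "__main__":
    post = email.message_from_string("Message-ID: <123.host@198.51.100.7>\n\nhello\n")
    pseudonymise_thread_headers(post, b"list-admin-secret")
    print(post["Message-ID"])
```

Every message in an archive has to be processed with the same secret; rotating it would break the links between old and new messages.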

Holger Levsen

13 Apr

2:50 p.m.

On Fri, Apr 13, 2018 at 02:00:55PM +0100, TJ wrote:

...

> From reading the regulation and various interpretations of it, it seems that PII

what does PII stand for? -- cheers, Holger

Daniel Stone

2:54 p.m.

On 13 April 2018 at 15:50, Holger Levsen <holger@layer-acht.org> wrote:

...

> On Fri, Apr 13, 2018 at 02:00:55PM +0100, TJ wrote:

...
>> From reading the regulation and various interpretations of it, it seems that PII

> what does PII stand for?

Personally Identifiable Information

Jonathan McDowell

2:03 p.m.

On Fri, Apr 13, 2018 at 01:07:21PM +0200, Daniel Stone wrote:

...

> We've been looking into GDPR compliance for fd.o, which has been ... fun.

Yeah. I've been involved with looking at it for Debian. Fun isn't the word I'd use; I've ended up with a lot of questions and no real answers at this stage.

...

> The biggest stumbling block for us is probably Bugzilla and Mailman. Deleting messages and profiles from those just isn't practical for us, especially at any kind of scale. We could write a script to censor those, but once it has been posted to either, then it's all over the public internet anyway.

> We don't control distribution once messages hit Mailman - it's forwarded raw to a potentially unlimited distribution list - and deleting messages from Mailman is also a manual nightmare. Rebuilding the archives is out since it breaks URLs. Hand-editing it all sucks beyond belief. And then people have quoted it in replies anyway ...

> Does anyone know if there's some kind of GDPR 'out' for, 'by posting here you agree that everything is going to be made public, so as there's nothing we can do about its distribution, it's not useful or practical for us to undo that'? And are there any kind of credible Bugzilla/Mailman deletion tools?

For posting and distributing I think the "You posted to a list, therefore it's going to be sent out to anyone on the list" is reasonable - it's a point in time thing, it's the way lists work and there's no retention.

For archives, if you rely on "you posted it, therefore we'll archive it and display it" you're using consent as the basis. GDPR says consent must be as easy to withdraw as to give, so you have to act on any deletion request. Which means it's much better to have an alternative basis for processing.

In a commercial environment I'd argue a bug tracking system is potentially part of a contractual obligation to fix bugs (or at least take some sort of notice of them), but I'm not sure that can apply to a Free software project in the general case. However there's potentially a public interest case to be made (we make the world a better place through Free software and it's in the interest of the public to see what is going on / historical information about why things are the way they are / interesting and informative technical discussions - Debian's Social Contract argues strongly that this applies) or just generally legitimate interests of the organisation; it's in the interest of fd.o to provide a bug tracking system that is public so that others with the same bug can come along and provide extra information to help solve it, or interested people can try to come up with fixes, or patterns across bugs that don't look related can be seen. Having to close access, or delete old bugs, removes those advantages.

Even assuming those are valid reasons (and no one I've spoken to has been able to tell me they definitely are or definitely aren't), you'll still need the ability to delete things; it's just that deletion won't be automatic the way it would be if consent was the only justification for public archives / bug tracking systems.

J.

--
Beware of programmers carrying   |  .''`.  Debian GNU/Linux Developer
screwdrivers.                    | : :' :  Happy to accept PGP signed
                                 | `. `'   or encrypted mail - RSA
                                 |   `-    key on the keyservers.

TJ

5:09 p.m.

On 13/04/18 14:03, Jonathan McDowell wrote:

...

> On Fri, Apr 13, 2018 at 01:07:21PM +0200, Daniel Stone wrote:

...
>> We've been looking into GDPR compliance for fd.o, which has been ... fun.

> Yeah. I've been involved with looking at it for Debian. Fun isn't the word I'd use; I've ended up with a lot of questions and no real answers at this stage.

...
>> The biggest stumbling block for us is probably Bugzilla and Mailman. Deleting messages and profiles from those just isn't practical for us, especially at any kind of scale. We could write a script to censor those, but once it has been posted to either, then it's all over the public internet anyway.

>> We don't control distribution once messages hit Mailman - it's forwarded raw to a potentially unlimited distribution list - and deleting messages from Mailman is also a manual nightmare. Rebuilding the archives is out since it breaks URLs. Hand-editing it all sucks beyond belief. And then people have quoted it in replies anyway ...

>> Does anyone know if there's some kind of GDPR 'out' for, 'by posting here you agree that everything is going to be made public, so as there's nothing we can do about its distribution, it's not useful or practical for us to undo that'? And are there any kind of credible Bugzilla/Mailman deletion tools?

> For posting and distributing I think the "You posted to a list, therefore it's going to be sent out to anyone on the list" is reasonable - it's a point in time thing, it's the way lists work and there's no retention.

I agree; I don't think anything needs to change, because the user takes a "clear affirmative action" to subscribe. GDPR Rec.32; Art.4(11):

"The consent of the data subject" means any freely given, specific, informed and unambiguous indication of his or her wishes by which the data subject, either by a statement or by a clear affirmative action, signifies agreement to personal data relating to them being processed.

But I think it needs to be stated that any emails they send to the list can and likely will be retained by every other subscriber, and that there is also a public archive of those emails, which is an essential part of the service (to retain historic technical and other data about the topic in the community/public interest). The subscriber should also be reminded that other services (search engines, public and private archives) may well make copies, and the data controller has no (contractual) relationship with those services.

The primary requirement from the point of view of the data controller/processor is having an efficient automated way to receive and handle deletion requests - bugzilla to track bugzilla anyone?!

In summary, it needs 'small print' and sensible interpretation.
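A bare-bones version of the "track deletion requests" idea, sketched in Python rather than in Bugzilla itself. It assumes a one-month response period as set out in GDPR Art. 12(3) (as we read it; the extensions the regulation allows are not modelled), and every name here is hypothetical rather than part of Bugzilla or Mailman.

```python
import calendar
from dataclasses import dataclass, field
from datetime import date
from typing import List


def add_one_month(day: date) -> date:
    """Same day next month, clamped to that month's length (31 Jan -> 28/29 Feb)."""
    year, month = (day.year + 1, 1) if day.month == 12 else (day.year, day.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


@dataclass
class DeletionRequest:
    requester: str                  # address the request came from
    received: date
    targets: List[str] = field(default_factory=list)  # e.g. archive URLs, bug numbers
    done: bool = False

    @property
    def due(self) -> date:
        # Assumption: a one-month response window, as in GDPR Art. 12(3);
        # extensions the regulation allows are not modelled here.
        return add_one_month(self.received)


def overdue(requests: List[DeletionRequest], today: date) -> List[DeletionRequest]:
    """Open requests whose response window has passed, oldest first."""
    late = [r for r in requests if not r.done and r.due < today]
    return sorted(late, key=lambda r: r.received)
```

Clamping to the end of the month is one reasonable reading of "one month" for dates like 31 January; check the interpretation that applies to you before relying on it.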

Holger Levsen

5:35 p.m.

On Fri, Apr 13, 2018 at 05:09:48PM +0100, TJ wrote:

...

...
>> For posting and distributing I think the "You posted to a list, therefore it's going to be sent out to anyone on the list" is reasonable - it's a point in time thing, it's the way lists work and there's no retention.

i'm not sure knowledge of how lists work can be assumed.

...

> I agree; I don't think anything needs to change because the user takes a "clear affirmative action" to subscribe:

many lists allow posting without being subscribed... -- cheers, Holger

TJ

7:58 p.m.

On 13/04/18 17:35, Holger Levsen wrote:

...

> On Fri, Apr 13, 2018 at 05:09:48PM +0100, TJ wrote:

...
...
>>> For posting and distributing I think the "You posted to a list, therefore it's going to be sent out to anyone on the list" is reasonable - it's a point in time thing, it's the way lists work and there's no retention.

> i'm not sure knowledge of how lists work can be assumed.

...
>> I agree; I don't think anything needs to change because the user takes a "clear affirmative action" to subscribe:

> many lists allow posting without being subscribed...

In which case the "clear affirmative action" is the sending of the email to the mailing-list. If services feel the need to spell things out an auto-responder can reply with the necessary notices.
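TJ's auto-responder idea, sketched with Python's `email.message.EmailMessage`. The notice wording, the function name and the addresses are hypothetical. It follows RFC 3834 in not replying to messages whose `Auto-Submitted` header is anything other than `no`, and it marks its own reply `auto-replied`, which helps avoid loops between robots. It replies to From: rather than Reply-To:, since a list may set Reply-To: to the list itself.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical notice wording; a real list would point to its own policy.
NOTICE = (
    "Your message to {list_addr} has been sent to every subscriber and added\n"
    "to the public web archive. Subscribers, search engines and other mirrors\n"
    "may keep their own copies, which we cannot delete.\n"
)


def build_notice_reply(original: EmailMessage, list_addr: str,
                       robot_addr: str) -> Optional[EmailMessage]:
    """Build an auto-reply carrying the notice, or None if we should stay quiet."""
    # RFC 3834: never auto-reply to mail that is itself automatically generated.
    if str(original.get("Auto-Submitted", "no")).strip().lower() != "no":
        return None
    sender = original.get("From")  # not Reply-To: lists often point that at the list
    if not sender:
        return None
    subject = str(original.get("Subject", ""))
    reply = EmailMessage()
    reply["From"] = robot_addr
    reply["To"] = str(sender)
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    reply["Auto-Submitted"] = "auto-replied"
    if original.get("Message-ID"):
        reply["In-Reply-To"] = str(original["Message-ID"])
    reply.set_content(NOTICE.format(list_addr=list_addr))
    return reply
```

A list would probably send this only for posts from non-subscribers, since subscribers will already have seen the notice when signing up.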


14 comments

7 participants


Daniel Stone
Gregor Jehle
Holger Levsen
Jonathan McDowell
Moritz Bartl
Ralph Corderoy
TJ
