Git and the Right for Rectification


Jonas Wielicki

21 May 2018

5:22 p.m.

On Monday, 21 May 2018 17:57:42 CEST Winfried Tilanus wrote:

...

Git has the pressing need of maintaining code integrity and traceability. The final decision will be up to a judge, but my bets are on the need of maintaining the code. Something similar will be the case with Bugzilla.

So I was wondering about Git and the Right for Rectification. In contrast to the Right to be Forgotten, the Right for Rectification does not have any exceptions I am aware of.

Now what if somebody commits code to a Git repository (so the commit includes their name and email address) and they later change, for example, their email address? In that case, from my understanding, the Right for Rectification would trigger and the controller of the Git repository may be forced to rectify the information.

This would require re-writing all history since that commit, which is a huge issue.

Counter-arguments I have heard:

- The email address was valid at the time the commit was made and is thus an accurate representation of the history at the time the commit was made (which is timestamped) and thus doesn’t need to be rectified.

- It is expected that the user would provide accurate information and if they, for example, have a typo in e.g. their name in the commit metadata, it is kind of their fault and this does not need to be corrected.

Do these counter-arguments make sense or is this a real threat to Git?

kind regards,
Jonas


Jens Kubieziel

21 May

6:01 p.m.

* Jonas Wielicki wrote on 2018-05-21 at 18:22:

...

This would require re-writing all history since that commit, which is a huge issue.

So you could argue with Art. 12 (5) lit. b GDPR: »Where requests from a data subject are manifestly unfounded or excessive, […], the controller may either: […] refuse to act on the request.« However this is quite a weak argument and will probably not work for a git archive. If it is not possible to correctly identify the person, the request must be rejected (think of nicknames).

...

- The email address was valid at the time the commit was made and is thus an accurate representation of the history at the time the commit was made (which is timestamped) and thus doesn’t need to be rectified.

- It is expected that the user would provide accurate information and if they, for example, have a typo in e.g. their name in the commit metadata, it is kind of their fault and this does not need to be corrected.

I think both arguments are not valid, because the data subject has this right independent of what was correct or not. If the data is incorrect now, the data subject has the right to rectification and also the controller has a duty to process correct data.

In the case of git some other arguments are needed, IMHO.

-- Jens Kubieziel https://www.kubieziel.de
Whoever only ever dreams of happiness should not be surprised to sleep through it. Ernst Deutsch

Karen Reilly

6:18 p.m.

In the case of git, and the integrity of a code base that has implications for the security of many people, I would look into legitimate interest from Article 6(1)(f):

“processing is necessary for… …the purposes of the legitimate interests pursued by the controller or by a third party, … …except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.”

In short: purpose, necessity, and balancing.

Also consider what is PII and what isn’t. Make a decision and document it.

Cheers,
Karen Reilly

On Mon 21. May 2018 at 19:01, Jens Kubieziel <maillist@kubieziel.de> wrote:

...


Winfried Tilanus

8:47 p.m.

On 05/21/2018 07:18 PM, Karen Reilly wrote: Hi,

...

In the case of git, and the integrity of a code base that has implications for the security of many people, I would look into legitimate interest from Article 6(1)(f):

Article 6 defines the possible legal grounds for processing personal data. That is separate from the rights the data subject has while you are processing their data. Exceptions to the rights of data subjects are defined in the articles that grant those rights. For example, the right to data portability (Art. 20) only applies to data processed under Art. 6(1)(a) or Art. 6(1)(b), as stated in Art. 20(1)(a). Article 16 (rectification) doesn't have such a clause.

Winfried

-- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

Tapani Tarvainen

6:31 p.m.

On Mon, May 21, 2018 at 07:01:01PM +0200, Jens Kubieziel (maillist@kubieziel.de) wrote:

...

* Jonas Wielicki wrote on 2018-05-21 at 18:22:

...

...
- The email address was valid at the time the commit was made and is thus an accurate representation of the history at the time the commit was made (which is timestamped) and thus doesn’t need to be rectified.

- It is expected that the user would provide accurate information and if they, for example, have a typo in e.g. their name in the commit metadata, it is kind of their fault and this does not need to be corrected.

I think both arguments are not valid, because the data subject has this right independent of what was correct or not.

The latter is clearly invalid on that basis. The former, however, is not so obvious:

...

If the data is incorrect now, the data subject has the right to rectification and also the controller has a duty to process correct data.

If I expand Jonas' point a bit, I could argue the data is *not* incorrect now, as it is a historical record and as such *must* not be changed; indeed, changing it would be tantamount to forgery.

Of course that does not solve the problem, because then we'd need a justification for keeping such historical records. To that end it could be argued that the email was used as a means of authentication, and as such must be kept just like signatures in any old deeds or other legal paperwork - you can't get your name changed in or removed from such either.

The analogy, however, fails or at least isn't as strong when it comes to *publishing* the data, as with public git repositories, so it may not help much.

In any case this line of argument does not stand or fall on the basis of the technology used alone: git or not, what matters is what it's used for and how, and what its users have accepted or what can be justified by some of the other relevant clauses of GDPR.

-- Tapani Tarvainen

Winfried Tilanus

8:37 p.m.

On 05/21/2018 07:01 PM, Jens Kubieziel wrote: Hi,

...

So you could argue with Art. 12 (5) lit. b GDPR: »Where requests from a data subject are manifestly unfounded or excessive, […], the controller may either: […] refuse to act on the request.« However this is quite a weak argument and will probably not work for a git archive.

If it is not possible to correctly identify the person, the request must be rejected (think of nicknames).

Article 12 would also be my first place to look for exceptions, but the 'excessive' clause is not easily met: 'excessive' is about the frequency of requests, not about the administrative work needed to fulfil a request.

...

...
- The email address was valid at the time the commit was made and is thus an accurate representation of the history at the time the commit was made (which is timestamped) and thus doesn’t need to be rectified.

- It is expected that the user would provide accurate information and if they, for example, have a typo in e.g. their name in the commit metadata, it is kind of their fault and this does not need to be corrected.

I think both arguments are not valid, because the data subject has this right independent of what was correct or not. If the data is incorrect now, the data subject has the right to rectification and also the controller has a duty to process correct data.

An e-mail address that was valid at the time of the commit is not subject to the right of correction, because it is correct that somebody had that e-mail address at the time of the commit. The issue arises when somebody commits with a typo in the e-mail address or with somebody else's e-mail address.

Winfried

-- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

Winfried Tilanus

8:14 p.m.

On 05/21/2018 06:22 PM, Jonas Wielicki wrote: Hi,

...

So I was wondering about Git and the Right for Rectification. In contrast to the Right to be Forgotten, the Right for Rectification, does not have any exceptions I am aware of.

Correct, good point. But let's have a look at Article 16 (Right to rectification):

---
The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.
---

The words 'supplementary statement' are the most interesting here. Where it is technically impossible to change the data (e.g. in a paper archive or on non-changeable media), it is allowed to make the correction by adding a supplementary statement. This has been practice under the old directive and will stay like that under the GDPR. Git is an example of a non-changeable medium. So adding a statement (read: commit) with the correct information should be enough.

Winfried

-- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

gdpr＠sheogorath.shivering-isles.com

9:12 p.m.

On 05/21/2018 06:22 PM, Jonas Wielicki wrote:

...

On Monday, 21 May 2018 17:57:42 CEST Winfried Tilanus wrote:

...
Git has the pressing need of maintaining code integrity and traceability. The final decision will be up to a judge, but my bets are on the need of maintaining the code. Something similar will be the case with Bugzilla.

So I was wondering about Git and the Right for Rectification. In contrast to the Right to be Forgotten, the Right for Rectification, does not have any exceptions I am aware of.

Now what if somebody commits code to a Git repository (so the commit includes their name and email address) and they change for example email addresses. In that case, from my understanding, the Right for Rectification would trigger and the controller of the Git repository may be forced to rectify the information.

This would require re-writing all history since that commit, which is a huge issue.

Counter-arguments I have heard:

- The email address was valid at the time the commit was made and is thus an accurate representation of the history at the time the commit was made (which is timestamped) and thus doesn’t need to be rectified.

- It is expected that the user would provide accurate information and if they, for example, have a typo in e.g. their name in the commit metadata, it is kind of their fault and this does not need to be corrected.

Well, whose fault it is doesn't really matter to the GDPR.

But no matter what, for git repositories, the solution is as easy as it is old: mailmaps.

If someone wants to correct their mail address, just add a .mailmap file.

For details check: https://www.git-scm.com/docs/git-check-mailmap

I think this should be enough on top of HEAD

-- Signed Sheogorath
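To make the mailmap suggestion concrete, here is a minimal Python sketch of what such an entry looks like and how it maps a committed identity to a corrected one. It only handles the four-field line form `Proper Name <proper@email> Commit Name <commit@email>`; Git itself accepts further forms and does its own matching, so treat this as an illustration rather than a re-implementation. The helper names (`mailmap_entry`, `parse_mailmap`, `resolve`) and the example addresses are made up for this sketch.

```python
import re

# Four-field mailmap form only (an assumption of this sketch):
#   Proper Name <proper@email> Commit Name <commit@email>
ENTRY = re.compile(
    r"^(?P<new_name>[^<]*)<(?P<new_email>[^>]+)>\s*"
    r"(?P<old_name>[^<]*)<(?P<old_email>[^>]+)>\s*$"
)

def mailmap_entry(new_name, new_email, old_name, old_email):
    """Format one .mailmap line mapping the committed identity to the corrected one."""
    return f"{new_name} <{new_email}> {old_name} <{old_email}>"

def parse_mailmap(text):
    """Return {(old_name, old_email): (new_name, new_email)} for four-field lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        m = ENTRY.match(line)
        if m:
            key = (m["old_name"].strip(), m["old_email"].strip().lower())
            mapping[key] = (m["new_name"].strip(), m["new_email"].strip())
    return mapping

def resolve(mapping, name, email):
    """Look up the corrected identity; unknown identities are returned unchanged."""
    return mapping.get((name, email.lower()), (name, email))

if __name__ == "__main__":
    line = mailmap_entry("Jane Doe", "jane@new.example", "Jane Doe", "jane@old.example")
    print(line)
    print(resolve(parse_mailmap(line), "Jane Doe", "jane@old.example"))
```

The history itself stays untouched; only tools that consult the mailmap (for example `git shortlog`, or `git log --use-mailmap`) display the corrected identity.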

Karen Reilly

9:31 p.m.

https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-re...

In the case of git, the data (email addresses) were accurate at the time. It may also reflect an organization member’s code contributions - historical data that one has a reason for tracking (OpenStack does this).

The legitimate interest for processing is still relevant in later articles - data subject requests are influenced by whether the legitimate interest still holds.

If coder Ms. Foo used a company email to contribute, then changed email addresses, a correct email address for future contributions may suffice.

Cheers,
K. Reilly

On Mon 21. May 2018 at 22:13, <gdpr@sheogorath.shivering-isles.com> wrote:

...


Jonas Wielicki

22 May

10:52 a.m.

On Monday, 21 May 2018 22:12:49 CEST gdpr@sheogorath.shivering-isles.com wrote:

...

Well, whose fault it is doesn't really matter to the GDPR.

But no matter what, for git repositories, the solution is as easy as it is old: mailmaps.

If someone wants to correct their mail address, just add a .mailmap file.

For details check: https://www.git-scm.com/docs/git-check-mailmap

I think this should be enough on top of HEAD

Thank you, this indeed seems perfect. kind regards, Jonas

Ian Jackson

24 May

6:54 p.m.

(resending) gdpr@sheogorath.shivering-isles.com writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...

If someone wants to correct their mail address, just add a .mailmap file. For details check: https://www.git-scm.com/docs/git-check-mailmap

I think this should be enough on top of HEAD

I agree. Although it would be nice if the mailmap (or another similar file) could also contain the SHA-1s of the old values.

This is particularly relevant for trans contributors, and for anyone else for whom use of their previous name or email address is offensive somehow. It is undesirable and unnecessary that their actual deadname (or whatever) should be replicated in HEAD in the mailmap file.

Putting the hash in the file does not serve a *security* purpose (since obviously it is easy to recover the preimage), but it does serve the *human* purposes of avoiding the unnecessary promulgation and display of deadnames etc.

Would anyone care to try to get that feature added to git?

Ian.
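A rough Python sketch of the idea above, purely for illustration: the `sha1:` line format, the canonical `Name <email>` string that gets hashed, and all helper names are hypothetical, since the message describes a feature that would still have to be added to git. The last function shows why the hash serves no security purpose: anyone holding a list of candidate identities can recover the preimage by trying them.

```python
import hashlib

def identity_digest(name, email):
    """SHA-1 hex digest of 'Name <email>' (a hypothetical canonical form)."""
    ident = f"{name} <{email.lower()}>"
    return hashlib.sha1(ident.encode("utf-8")).hexdigest()

def hashed_entry(new_name, new_email, old_name, old_email):
    """Hypothetical mailmap-like line storing only a digest of the old identity."""
    return f"{new_name} <{new_email}> sha1:{identity_digest(old_name, old_email)}"

def recover_preimage(digest, candidates):
    """Try known (name, email) pairs until one hashes to the given digest."""
    for name, email in candidates:
        if identity_digest(name, email) == digest:
            return name, email
    return None

if __name__ == "__main__":
    # The old name is hashed, so it does not appear in the printed line.
    print(hashed_entry("New Name", "new@example.org", "Old Name", "old@example.org"))
```

A tool reading such a file would hash the identity found in each commit and compare digests, so the old name would not need to appear in HEAD in readable form.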

Roberto Polli

25 May

9:09 a.m.

About personal data in git: 2018-05-24 19:54 GMT+02:00 Ian Jackson <ijackson@chiark.greenend.org.uk>:

...

gdpr@sheogorath.shivering-isles.com writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...
If someone wants to correct their mail address, just add a .mailmap file. For details check: https://www.git-scm.com/docs/git-check-mailmap

The actual GitLab clauses include:

...

(For GitLab Contributors Only) ... I [...] agree that my name and email address will become embedded and part of the code, which may be publicly available. I understand the removal of this information would be impermissibly destructive [..] I hereby waive any right to request any erasure, removal, or rectification of this information under any applicable privacy or other law [..].

Could those clauses suffice? R.

Winfried Tilanus

9:31 a.m.

On 05/25/2018 10:09 AM, Roberto Polli wrote: Hi,

...

...
(For GitLab Contributors Only) ... I [...] agree that my name and email address will become embedded and part of the code, which may be publicly available. I understand the removal of this information would be impermissibly destructive [..] I hereby waive any right to request any erasure, removal, or rectification of this information under any applicable privacy or other law [..]. Could those clauses suffice?

No, you can't waive away the GDPR, though it is good to make clear beforehand that Git is an unchangeable system.

Winfried

-- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

Daniel Stone

9:35 a.m.

On Fri, 25 May 2018, 9:31 am Winfried Tilanus, <winfried@tilanus.com> wrote:

...

On 05/25/2018 10:09 AM, Roberto Polli wrote:

...
...
(For GitLab Contributors Only) ... I [...] agree that my name and email address will become embedded and part of the code, which may be publicly available. I understand the removal of this information would be impermissibly destructive [..] I hereby waive any right to request any erasure, removal, or rectification of this information under any applicable privacy or other law [..]. Could those clauses suffice?

No, you can't waive away the GDPR, though it is good to make clear beforehand that Git is an unchangeable system.

Git is certainly not an unchangeable record, on a technical level. It's just that certain repositories have a policy of not rebasing to remove PII and similar from history.

...

Winfried Tilanus

9:51 a.m.

On 05/25/2018 10:35 AM, Daniel Stone wrote: Hi Daniel,

...

Git is certainly not an unchangeable record, on a technical level. It's just that certain repositories have a policy of not rebasing to remove PII and similar from history.

Being only a basic Git user, I need your experience here: how practical is rebasing to remove a commit message or an e-mail address from a commit two years ago? Winfried -- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

Daniel Stone

9:57 a.m.

Hi, On Fri, 25 May 2018, 9:51 am Winfried Tilanus, <winfried@tilanus.com> wrote:

...

On 05/25/2018 10:35 AM, Daniel Stone wrote:

...
Git is certainly not an unchangeable record, on a technical level. It's just that certain repositories have a policy of not rebasing to remove PII and similar from history.

Being only a basic Git user, I need your experience here: how practical is rebasing to remove a commit message or an e-mail address from a commit two years ago?

Technically it's completely trivial and possible to automate. Everyone pulling the repository will have to deal with the result: they will need to manually reconcile the new and old state via a rebase or merge. This is something that's part of the workflow of some large repositories. The result is somewhat more painful to work with, but that's a workflow and policy issue rather than a technical one ...
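Why everyone downstream has to reconcile after such a rewrite can be seen from how Git names commits: a commit id is the SHA-1 of a short header plus the commit text, and that text includes the author line and the parent's id. The following Python sketch computes ids that way for a toy linear history. The fixed timestamp, the reuse of the empty tree for every commit, and the function names are simplifications assumed for this illustration.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree id

def commit_id(tree, parent, author, message):
    """Hash a commit the way Git does: SHA-1 over 'commit <size>\\0' plus the body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # Arbitrary fixed timestamp so the example is deterministic.
    lines.append(f"author {author} 1526918400 +0200")
    lines.append(f"committer {author} 1526918400 +0200")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def history_ids(authors):
    """Build a linear history with one commit per author string; return all ids."""
    ids, parent = [], None
    for n, author in enumerate(authors):
        parent = commit_id(EMPTY_TREE, parent, author, f"change {n}")
        ids.append(parent)
    return ids

if __name__ == "__main__":
    before = history_ids(["A <a@old.example>", "B <b@example.org>", "C <c@example.org>"])
    after = history_ids(["A <a@new.example>", "B <b@example.org>", "C <c@example.org>"])
    for old, new in zip(before, after):
        print(old[:12], "->", new[:12])
```

Changing the e-mail address in the first commit changes its id, and because each later commit records its parent's id, every descendant gets a new id as well. Clones that still hold the old ids therefore diverge from the rewritten history.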

...

Christoph Berg

10:04 a.m.

Re: Daniel Stone 2018-05-25 <CAPj87rOE9hkVkMknZoMNkp_zpKcKgXd868vULPV5vLV-MwT2wg@mail.gmail.com>

...

...
Being only a basic Git user, I need your experience here: how practical is rebasing to remove a commit message or an e-mail address from a commit two years ago?

Technically it's completely trivial and possible to automate.

Everyone pulling the repository will have to deal with the result: they will need to manually reconcile the new and old state via a rebase or merge. This is something that's part of the workflow of some large repositories.

The result is somewhat more painful to work with, but that's a workflow and policy issue rather than a technical one ...

This is totally impractical. Suggesting that rebasing larger git repositories is feasible in practise is nonsense. Christoph

Daniel Stone

10:11 a.m.

On Fri, 25 May 2018, 10:04 am Christoph Berg, <myon@debian.org> wrote:

...

Re: Daniel Stone 2018-05-25 < CAPj87rOE9hkVkMknZoMNkp_zpKcKgXd868vULPV5vLV-MwT2wg@mail.gmail.com>

...
...
Being only a basic Git user, I need your experience here: how practical is rebasing to remove a commit message or an e-mail address from a commit two years ago?

Technically it's completely trivial and possible to automate.

Everyone pulling the repository will have to deal with the result: they will need to manually reconcile the new and old state via a rebase or merge. This is something that's part of the workflow of some large repositories.

The result is somewhat more painful to work with, but that's a workflow and policy issue rather than a technical one ...

This is totally impractical. Suggesting that rebasing larger git repositories is feasible in practise is nonsense.

It causes a workflow issue. It's not technically impossible or an insurmountable limitation of the tool.

Kernel development involves quite a deal of rebasing. There are good tools to handle it. I do it constantly, and am happy to advise you if you're stuck or confused about how rebase works, or how to handle it on the client side.

It is possible to make an argument, based on the assurance given by the commit history etc., that a linear history is necessary for proper operation and rewriting it is not an option. But it is by no means certain that legal authorities will agree with your incredibly blunt opinion.

...

Elias Aarnio

10:38 a.m.

Hello, people!

I am Elias, a FOSS user & activist who turned into a privacy consultant. So far I've just been following this discussion silently, as I simply have been too busy.

I wish to point out something that I think is of importance in this discussion. When using Git in a FLOSS project (FLOSS here in a wide sense: using any copyleft license), an important part of using Git is the act of accepting the license and thus giving certain IPR that usually belong to the author to the project. This act of giving the IPR away must be documented, as this kind of usage is an exception to legislation everywhere that I know of.

Personal information like name and email address is needed for identifying the person who is contributing the IPR. This is a legitimate interest mentioned in recital 47 of the GDPR.

...

From this point of view IMHO one could argue that as the information has been accurate at the time of the act of committing (and simultaneously accepting the conditions mentioned in the copyleft license used), keeping the information intact is also possible.

-- Elias Aarnio

Ian Jackson

11:03 a.m.

Elias Aarnio writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...

Personal information like name and email address is needed for identifying the person who is contributing the IPR. This is a legitimate interest mentioned in recital 47 of the GDPR.

From this point of view IMHO one could argue that as the information has been accurate at the time of the act of committing (and simultaneously accepting the conditions mentioned in the copyleft license used), keeping the information intact is also possible.

I would avoid the word IPR, but I broadly agree with this analysis. There are also code integrity (assurance and security) reasons for not wanting to routinely rebase, and for wanting to permanently document the identities of contributors.

The application of these kinds of tests is a matter of judgement and balance, and we don't know what a court would say. Until we know the contrary I would rely on these justifications for git histories.

However, I do think we need to:

* Fix it so that you don't have to list all the old names in the .mailmap file. GDPR (and European privacy law in general) is context sensitive, so while it can be justifiable to retain the old address in the history, it is not justifiable to make it so prominent.

* Routinely accede to requests to pseudonymise contributions, by adding to the mailmap something like
  <hash of name and email> Past contributor #332

Ian.

Walter van Holst

12:59 p.m.

On 2018-05-25 12:03, Ian Jackson wrote:

...

* Routinely accede to requests to pseudonymise contributions, by adding to the mailmap something like <hash of name and email> Past contributor #332

I would advise against doing so unless the author requests it. In quite a few jurisdictions the removal of an author's name from any copyright notice counts as a copyright infringement in itself.

Which gets us to the main basis for processing this personal data: it ticks the boxes for a) execution of a contract (a license is a contract in most civil law jurisdictions), b) a legal obligation (copyright law does not allow for removal of author indications) and c) a legitimate interest in maintaining records of the provenance of the code insofar as not covered by a) and b).

The right of correction only comes into play when the data really is inaccurate and I would stick to the proposed solution of using mailmaps. This is not a good use case for pseudonymisation.

Regards,
Walter

Ian Jackson

2:49 p.m.

Walter van Holst writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...

On 2018-05-25 12:03, Ian Jackson wrote:

...
* Routinely accede to requests to pseudonymise contributions, by adding to the mailmap something like <hash of name and email> Past contributor #332

I would advise against doing so unless the author requests it.

By "requests" I meant "requests from the author", obviously.

...

The right of correction only comes into play when the data really is inaccurate and I would stick to the proposed solution of using mailmaps.

I don't understand how you could think that I am not proposing to use mailmaps when I said "adding to the mailmap".

...

This is not a good use case for pseudonymisation.

If a past contributor requests that their contributions be anonymised, that should clearly be done. Do you disagree?

If it is done, it will still be necessary to distinguish contributions by different now-anonymous people. Hence, pseudonymisation.

ian.

-- Ian Jackson <ijackson@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
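The difference between plain anonymisation and the pseudonymisation argued for here can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of any git feature: the function name, the log representation as `(author, subject)` pairs, and the numbering scheme are assumptions made for this sketch.

```python
def pseudonymise(log, requested):
    """Replace authors listed in `requested` with stable numbered labels.

    `log` is a list of (author, subject) pairs; authors who did not ask for
    pseudonymisation are left untouched.
    """
    labels = {}
    out = []
    for author, subject in log:
        if author in requested:
            if author not in labels:
                # Each distinct person gets their own number, assigned once.
                labels[author] = f"Past contributor #{len(labels) + 1}"
            author = labels[author]
        out.append((author, subject))
    return out

if __name__ == "__main__":
    log = [
        ("A <a@x.example>", "one"),
        ("B <b@x.example>", "two"),
        ("A <a@x.example>", "three"),
        ("C <c@x.example>", "four"),
    ]
    for entry in pseudonymise(log, {"A <a@x.example>", "C <c@x.example>"}):
        print(entry)
```

Because each requesting person keeps one label across all their commits, contributions by different now-anonymous people remain distinguishable, which a single shared "anonymous" label would not allow.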

Walter van Holst

3:35 p.m.

On 2018-05-25 15:49, Ian Jackson wrote:

...

If a past contributor requests that their contributions be anonymised, that should clearly be done. Do you disagree ?

Yes, I agree. Your mail was (mis)read by me to use pseudonymisation for every change in mail-address etc. Regards, Walter

Winfried Tilanus

10:58 a.m.

On 05/25/2018 11:11 AM, Daniel Stone wrote: Hi,

If we want to weigh the chances in a right-to-be-forgotten court case against a Git deployment, we should make an argumentation chart about the possibility of changing past commits and its consequences. That chart should include technical arguments, workflow arguments and arguments about traceability and integrity of the code.

Giving it a start; please improve/add/amend, I am probably saying things here that are bluntly incorrect!

Git is changeable because:

- it is technically possible to rebase branches
- with the right tools (which?), rebasing is feasible, as large projects like the Linux kernel show.
- the workflow needed for rebasing requires a decision by each maintainer of a local copy. This can be a bit of a hassle, but is doable, as large projects like the Linux kernel show.
- because each participant has to review the rebasing decision, code integrity is maintained

Git is unchangeable because:

- changing the history is against the design of Git; all technical routes to do so are emergency scenarios with heavy side effects, but it is possible to add notes, new commits or to map e-mail addresses.
- for options like rebasing, a quite disruptive additional workflow is needed. That workflow demands the cooperation of all developers
- code integrity can only be guaranteed when all commits are traceable; activities like rebasing result in untraceable code changes and endanger the security model behind open source software.

Winfried

ps. Besides the changeable / unchangeable discussion, there is also the element of expectation that is relevant for the legal discussion.

...


-- privacy consultant e-health +31.6.23303960 https://www.tilanus.com/

Thorsten Behrens

4:31 p.m.

Winfried Tilanus wrote:

...

Git is unchangeable because: - Code integrity can only be guaranteed when all commits are traceable; activities like rebasing result in untraceable code changes and endanger the security model behind open source software.

Amend:
- for repositories where cryptographically signing tags (or even commits) is the rule, git is by design unchangeable
- requirements to keep accurate records might put conflicting requirements on you not to tamper with history

And yep, the GitLab approach of putting that right in front of you when signing up is probably a good idea.

Cheers,

-- Thorsten

Peter Stuge

26 May

10:43 a.m.

Thorsten Behrens wrote:

...

...
Git is unchangeable because: - Code integrity can only be guaranteed when all commits are traceable; activities like rebasing result in untraceable code changes and endanger the security model behind open source software.

Amend: - for repositories where cryptographically signing tags (or even commits) is the rule, git is by design unchangeable

All signed commits can be replaced with other, newly signed commits. But then either all previous signers must re-sign the new versions of all their old commits which follow the first replaced commit, or their original signature and the original parent id are stored in a new field in the new commit, so that an automated system can re-sign it.

I agree with you about tags though: changing tags goes directly against the design of git, goes against the intent of projects using git for version control, and hinders traceability of the original tag and/or released version of the work.

Branches are by design temporary references to some commit, but tags are very much intended to be permanent references to some commit (and by extension also all commits before). It is of course technically possible to change tags, but tags are not expected to change.

Changing a tag creates a ripple effect just like changing a commit does. Not only does the already-published tag become invalid, but so does any use of that tag, most significantly also outside of the project.

//Peter

Peter Stuge

25 May

11:35 p.m.

Hi Winfried, thanks for your contributions. Winfried Tilanus wrote:

...

Git is changeable because: - it is technically possible to rebase branches

I suggest changing "rebase branches" to perhaps "rewrite branch history".

--8<-- Git data model explanation

A branch in git is a name that temporarily refers to some particular commit, and by extension to the development history before that commit. The id of the directly preceding commit is part of every commit. A branch is much more of a workflow concept than a technical one. When a branch is checked out (often a branch called "master") and a commit is created, the commit is said to be created "on the branch", and the reference for the branch is moved forward to refer to the newly created commit.

Every git commit is write-once and immutable, because commits are defined by their contents. If any of the contents (git log --pretty=raw) change, the result is *a different* commit. So no commit can be changed. But a commit can be replaced with a different commit, i.e. one having a different id. Ids change even if commits are identical except for the preceding commit id reference.

When a commit is replaced, all succeeding commits are also replaced, because the different id of the replaced commit is recorded as the parent commit in the directly following commits, resulting in a different commit id after the replaced one, and so on until the most recent commit on the rewritten branch. It is a fork of the branch. All collaborators must synchronize at this point, and replace their previous truth about rewritten branches with the new truths.

The old commits remain in the repository, and even if one branch is rewritten, those commits may still be part of any number of other branches, as well as tags. Tags after the replaced commit would also have to be changed. This may become a significant burden for consumers of the repository such as distribution packagers, who may have recorded version_tagname<->commitid mappings as a sort of opportunistic commit pinning for security purposes.

git didn't use to allow very fine-grained control over the deletion of objects not referenced anywhere, such as commits which have been replaced with others and are no longer part of any branch. Such objects used to be garbage-collected at an imprecise point in time, after not being referenced for 30 days or so IIRC, but this may have changed. During that time they are not publicly visible on a git server hosting the repo, but they are still available if you know which (old) commit id to ask for.
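The id chaining described above can be sketched in a few lines of Python. This is only an illustration, assuming the classic SHA-1 object format (the id is the SHA-1 of a `commit <size>\0` header plus the raw commit text); the helper names, example identities and the fixed timestamp are made up for the example and are not part of git.

```python
import hashlib
from typing import List, Optional


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 based git does: header + NUL + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Id of the empty tree, so the example needs no real files.
EMPTY_TREE = git_object_id("tree", b"")


def make_commit(parent: Optional[str], author: str, message: str) -> str:
    """Build raw commit text (hypothetical fixed timestamp) and return its id."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the parent id is part of the content
    lines.append(f"author {author} 1527000000 +0000")
    lines.append(f"committer {author} 1527000000 +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


def build_chain(authors: List[str]) -> List[str]:
    """Create a linear history, one commit per author entry."""
    ids: List[str] = []
    parent: Optional[str] = None
    for n, author in enumerate(authors):
        parent = make_commit(parent, author, f"commit {n}")
        ids.append(parent)
    return ids


original = build_chain(
    ["A <old@example.org>", "B <b@example.org>", "C <c@example.org>"]
)
# Only the first commit's e-mail address is "rectified" ...
rewritten = build_chain(
    ["A <new@example.org>", "B <b@example.org>", "C <c@example.org>"]
)
# ... yet every later commit gets a new id too, via the changed parent id.
changed = [old != new for old, new in zip(original, rewritten)]
print(changed)
```

Changing only the last commit in such a chain would leave the earlier ids untouched, which is why the cost of a rewrite grows with the age of the commit being replaced.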

...

- with the right tools (which?), rebasing is feasible, as large projects like the Linux kernel show.

All necessary tooling is within git itself, and has been for long/ever.

...

- the workflow needed for rebasing requires a decision by each maintainer of a local copy. This can be a bit of a hassle, but is doable, as large projects like the Linux kernel show.

Suggest perhaps s,rebasing,rewritten branches,

...

- because each participant has to review the rebasing decision, code integrity is maintained

I disagree strongly with that statement. If a repository rewrites branch history, then I argue that not only must the rebasing decision itself be reviewed: since every commit after the fork is by definition different from the corresponding commit before the replacement, *every commit* after the replacement should be reviewed again. (It's technically possible to verify that only the parent id has changed in all commits, but I haven't seen tooling doing that - yet.)
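A rough sketch of what such a check could look like, in Python. It assumes the raw text of both commits is already at hand (for instance as printed by `git cat-file commit <id>`); the function names and the example commits are hypothetical, and this is not an existing git tool.

```python
from typing import List, Tuple


def split_commit(raw: str) -> Tuple[List[str], str]:
    """Split raw commit text into (header lines, message)."""
    header, _, message = raw.partition("\n\n")
    return header.split("\n"), message


def differs_only_in_parents(old_raw: str, new_raw: str) -> bool:
    """True if two commits are identical apart from their parent ids."""
    old_header, old_message = split_commit(old_raw)
    new_header, new_message = split_commit(new_raw)

    def parents(header: List[str]) -> List[str]:
        return [line for line in header if line.startswith("parent ")]

    def rest(header: List[str]) -> List[str]:
        return [line for line in header if not line.startswith("parent ")]

    return (
        old_message == new_message
        and rest(old_header) == rest(new_header)
        # the number of parents must match, only their ids may differ
        and len(parents(old_header)) == len(parents(new_header))
    )


# Hypothetical example commits; ids and identities are made up.
OLD = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent " + "a" * 40 + "\n"
    "author B <b@example.org> 1527000000 +0000\n"
    "committer B <b@example.org> 1527000000 +0000\n"
    "\n"
    "Fix typo\n"
)
NEW = OLD.replace("a" * 40, "b" * 40)           # only the parent id moved
TAMPERED = NEW.replace("Fix typo", "Fix typo, plus something else")

print(differs_only_in_parents(OLD, NEW))
print(differs_only_in_parents(OLD, TAMPERED))
```

The check would have to be run for every pair of corresponding commits after the replaced one. Signed commits carry their signature in a `gpgsig` header, so they would be reported as different and would still need re-signing as discussed elsewhere in this thread.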

...

Git is unchangeable because: - changing the history is against the design of Git, all technical routes to do so are emergency scenarios with heavy side effects, but it is possible to add notes, new commits or to map e-mail addresses. - for options like rebasing a quite disruptive additional workflow is needed. That workflow demands the cooperation of all developers - Code integrity can only be guaranteed when all commits are traceable

Agree.

...

activities like rebasing result in untraceable code changes and endanger the security model behind open source software.

Disagree. The result is still traceable. But it is distinct from what was there before, so it would need to be fully traced again. Then there's deletion from already-published works, such as a contributor list in particular released versions of a project? //Peter

Ian Jackson

10:48 a.m.

Roberto Polli writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...

About personal data in git:

2018-05-24 19:54 GMT+02:00 Ian Jackson <ijackson@chiark.greenend.org.uk>:

...
gdpr@sheogorath.shivering-isles.com writes ("Re: [gdpr-discuss] Git and the Right for Rectification"):

...
If someone wants to correct their mail address, just add a .mailmap file. For details check: https://www.git-scm.com/docs/git-check-mailmap
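To illustrate what a .mailmap entry does, here is a minimal Python sketch. It is not git's implementation and handles only two of the documented entry forms, `Proper Name <proper@email> <commit@email>` and `<proper@email> <commit@email>`; the function names and addresses are invented for the example.

```python
import re
from typing import Dict, Optional, Tuple

# Matches "[Proper Name] <proper@email> <commit@email>" only.
ENTRY = re.compile(
    r"^(?P<name>[^<]*)<(?P<proper>[^>]+)>\s*<(?P<commit>[^>]+)>\s*$"
)

Mailmap = Dict[str, Tuple[Optional[str], str]]


def parse_mailmap(text: str) -> Mailmap:
    """Map each commit e-mail (lower-cased) to (proper name or None, proper e-mail)."""
    mapping: Mailmap = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match:  # other entry forms are ignored in this sketch
            name = match["name"].strip() or None
            mapping[match["commit"].lower()] = (name, match["proper"])
    return mapping


def canonical(mapping: Mailmap, name: str, email: str) -> Tuple[str, str]:
    """Return the identity to display for a commit's recorded name and e-mail."""
    new_name, new_email = mapping.get(email.lower(), (None, email))
    return (new_name or name, new_email)


MAILMAP = """\
# hypothetical entries
Jane Doe <jane@new.example> <jdoe@old.example>
<bob@new.example> <bob@old.example>
"""

mapping = parse_mailmap(MAILMAP)
print(canonical(mapping, "J. Doe", "jdoe@old.example"))
print(canonical(mapping, "Bob", "bob@old.example"))
print(canonical(mapping, "Carol", "carol@example.org"))
```

Note that the mapping only changes what is displayed: the old address stays in the commit objects and appears in plain text in the .mailmap file itself.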

The actual gitlab clauses include:

...
(For GitLab Contributors Only) ... I [...] agree that my name and email address will become embedded and part of the code, which may be publicly available. I understand the removal of this information would be impermissibly destructive [..] I hereby waive any right to request any erasure, removal, or rectification of this information under any applicable privacy or other law [..].

Could those clauses suffice?

I'm not a lawyer, but maybe.

I do think that we should improve mailmap along the lines I suggested in my previous message. Legally: "processing" someone's deadname, by putting it in plain view in the HEAD's mailmap file, is not *necessary* for the integrity of the repository. Obfuscating it is easy. So I don't think it is justified, ethically or legally.

Ian.

-- Ian Jackson <ijackson@chiark.greenend.org.uk> These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
