[gdpr-discuss] Git and the Right for Rectification

Peter Stuge peter at stuge.se
Fri May 25 23:35:29 BST 2018


Hi Winfried, thanks for your contributions.

Winfried Tilanus wrote:
> Git is changeable because:
> - it is technically possible to rebase branches

I suggest changing "rebase branches" to perhaps "rewrite branch history".

--8<-- Git data model explanation

A branch in git is a name that temporarily refers to some particular
commit, and by extension to the development history before that commit.
Every commit except the very first records the id of its directly
preceding commit (or commits, in the case of a merge).

A branch is much more of a workflow concept than a technical one.

When a branch is checked out (often a branch called "master" is)
and a commit is created, the commit is said to be created "on the
branch" and the reference for the branch is moved forward to refer
to the newly created commit.

Every git commit is write-once and immutable, because commits are
defined by their contents. If any of the contents (see
git log --pretty=raw) change, the result is *a different* commit.

So no commit can be changed.

But a commit can be replaced with a different commit, i.e. one having
a different id. Ids change even if commits are identical except for
the preceding commit id reference.

When a commit is replaced then all succeeding commits are also
replaced, because the different id of the replaced commit is recorded
as parent commit in directly following commits, resulting in a
different commit id after the replaced one, and so on until the most
recent commit on the rewritten branch.
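
To make the chain effect concrete, here is a minimal Python sketch (not
git's own code) that computes commit ids the way git does for the
default SHA-1 object format: a hash over a short header plus the raw
commit text. The names commit_id and build_chain, the author identity
and the timestamp are made up for illustration; the tree id is git's
empty tree, so no files are needed.

```python
import hashlib

# Id of git's empty tree, so the sketch does not need any file contents.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity and timestamp, for illustration only.
WHO = "A U Thor <author@example.com> 1527287729 +0100"


def commit_id(message, parent=None, tree=EMPTY_TREE, who=WHO):
    """Hash raw commit text as git does: SHA-1 over 'commit <size>\\0' + text."""
    lines = ["tree " + tree]
    if parent is not None:
        lines.append("parent " + parent)
    lines += ["author " + who, "committer " + who, "", message, ""]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def build_chain(messages):
    """Create a linear history; each commit records the previous commit's id."""
    ids, parent = [], None
    for message in messages:
        parent = commit_id(message, parent)
        ids.append(parent)
    return ids


original = build_chain(["add feature", "fix typo", "release"])
# Replace only the first commit; the later messages are untouched.
rewritten = build_chain(["add feature (name removed)", "fix typo", "release"])

for old, new in zip(original, rewritten):
    print(old[:12], "->", new[:12])
```

Only the first commit's contents were edited, yet every id in the
rewritten chain differs, because each later commit embeds the new id of
its parent.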

The result is a fork of the branch. All collaborators must synchronize
at this point, and replace their previous truth about rewritten
branches with the new truth.

The old commits also remain in the repository, and even if one branch
is rewritten, those commits may still be part of any number of other
branches, as well as tags.

Tags after the replaced commit would also have to be changed. This
may become a significant burden for consumers of the repository such
as distribution packagers, who may have recorded
version_tagname<->commitid mappings as a sort of opportunistic commit
pinning for security purposes.
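
As a rough illustration of why such pins break, the following Python
sketch compares a recorded tag-to-commit-id mapping with the ids a
repository reports later. The function name changed_pins and all ids
are hypothetical; how the current ids are obtained is left out.

```python
def changed_pins(recorded, current):
    """Return tags whose current commit id differs from the recorded pin."""
    return {
        tag: (pinned, current.get(tag))
        for tag, pinned in recorded.items()
        if current.get(tag) != pinned
    }


# Hypothetical pins recorded by a packager before the rewrite.
recorded = {"v1.0": "1" * 40, "v1.1": "2" * 40}
# Hypothetical ids after history before v1.1 was rewritten and re-tagged.
current = {"v1.0": "1" * 40, "v1.1": "3" * 40}

for tag, (pinned, found) in changed_pins(recorded, current).items():
    print(tag, "was pinned to", pinned[:12], "but is now", found[:12])
```

A packager cannot tell from the mismatch alone whether the tag was
moved for a legitimate rewrite or for a malicious one.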


git didn't use to allow very fine-grained control of deletion of
objects not referenced anywhere - such as commits which have been
replaced with others and are no longer part of any branch.

Such objects used to be garbage-collected at an imprecise point in
time, after not being referenced for 30 days or so IIRC, but this may
have changed. During that time they are not publicly visible on a
git server hosting the repo, but they are still available if you know
which (old) commit id to ask for.


> - with the right tools (which?), rebasing is feasible to do, like large 
> projects like the Linux kernel show.

All necessary tooling is within git itself, and has been for a long
time, if not always.


> - the workflow needed for rebasing needs a decision by each maintainer 
> of a local copy. This can be a bit of a hassle, but is doable like large 
> projects like the linux kernel show.

Suggest perhaps s,rebasing,rewritten branches,


> - because each participant has to review the rebasing decision code 
> integrity is maintained

I disagree strongly with that statement.

If a repository rewrites branch history, then I argue that not only
must the rebasing decision itself be reviewed, but, since every commit
after the fork is by definition different from the corresponding
commit before the replacement, *every commit* after the replacement
should be reviewed again.

(It's technically possible to verify that only the parent id has
changed in all commits, but I haven't seen tooling doing that - yet.)
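
A check of that kind might look roughly like the Python sketch below.
It assumes the raw text of each commit is already at hand (for example
as printed by git cat-file commit <id>) and that the old and new
commits are given in matching order, starting after the replaced
commit. The function names and sample data are made up. Note that any
other difference, including a changed committer line, counts as a
change here.

```python
def without_parents(raw_commit):
    """Return raw commit text with its 'parent' header lines removed."""
    headers, sep, message = raw_commit.partition("\n\n")
    kept = [line for line in headers.split("\n")
            if not line.startswith("parent ")]
    return "\n".join(kept) + sep + message


def only_parents_changed(old_commits, new_commits):
    """True if paired commits are identical apart from their parent ids."""
    if len(old_commits) != len(new_commits):
        return False
    return all(without_parents(old) == without_parents(new)
               for old, new in zip(old_commits, new_commits))


def raw(parent, message):
    """Build hypothetical raw commit text for the demonstration."""
    who = "A U Thor <author@example.com> 1527287729 +0100"
    return ("tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
            "parent " + parent + "\n"
            "author " + who + "\n"
            "committer " + who + "\n"
            "\n" + message + "\n")


old = [raw("a" * 40, "fix typo"), raw("b" * 40, "release")]
new = [raw("c" * 40, "fix typo"), raw("d" * 40, "release")]
tampered = [raw("c" * 40, "fix typo"), raw("d" * 40, "release, plus extras")]

print(only_parents_changed(old, new))
print(only_parents_changed(old, tampered))
```

Such a check would narrow a full re-review down to the commits that
differ in more than their parent reference; it does not replace review
of the replaced commit itself.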


> Git is unchangeable because:
> - changing the history is against the design of Git, all technical 
> routes to do so are emergency scenarios with heavy side effects, but it 
> is possible to add notes, new commits or to map e-mail addresses.
> - for options like rebasing a quite disruptive additional workflow is 
> needed. That workflow demands the cooperation of all developers
> - Code integrity can only be guaranteed when all commits are traceable 

Agree.


> activities like rebasing result in untraceable code changes and endanger 
> the security model behind open source software.

Disagree. The result is still traceable. But it is distinct from what
was there before, so it would need to be fully traced again.


Then there is the question of deletion from already-published works,
such as a contributor list in particular released versions of a
project.


//Peter


