Rich Freeman on 16 Nov 2017 13:50:43 -0800



Re: [PLUG] Revision Control for the Rest of Us


On Thu, Nov 16, 2017 at 3:43 PM, Clay Wells <clayw@sas.upenn.edu> wrote:
>
> For the record, Git commits are not hashes of every previous commit.
> This is simply incorrect. Someone mistakenly said that they were in a
> previous post.
>

Assuming you're referring to me, you might want to re-read my post.  I
never said that git commits were hashes of every previous commit.  I
said:

> If you want to go back and change a commit that happened a year ago, the hash of every subsequent commit in the repository will change.

And that is completely true.  The reason for this is that each commit
ID is a content hash that covers the hash of the tree and the hash
of each parent commit (zero parents for the root commit, one for a
typical commit, and more than one for a merge commit), as well as all
the information associated with the commit itself (author, date,
message, etc).

Here is an example (generated using git cat-file commit
4434f7ad872bfc995efa99d7f9b98171f8156aeb on the Gentoo official repo):
tree 244119e72a6290145bbd490464b28f7f56a7fe22
parent ff9509fe161530b1c6d5412c855054828efd0373
author Jeroen Roovers <jer@gentoo.org> 1510811306 +0100
committer Jeroen Roovers <jer@gentoo.org> 1510811306 +0100
gpgsig -----BEGIN PGP SIGNATURE-----

 iF0EABECAB0WIQTGNWQvsnjji9bvHBhVaZGyp5KmEwUCWg0mqgAKCRBVaZGyp5Km
 E7CSAJ96QrhGRYaHdKTyxpD573zs/T+jVgCdGg8ITgi6d2syYyexg1+HRISUK+Q=
 =Q8fY
 -----END PGP SIGNATURE-----

www-plugins/adobe-flash: Old.

Package-Manager: Portage-2.3.14, Repoman-2.3.6

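To make the "content hash" part concrete, here is a rough sketch in
Python of how an object ID is derived: SHA-1 over a small header
("<type> <size>" plus a NUL byte) followed by the object's raw bytes.
The function name git_object_id and the commit body below are made up
for illustration (the parent hash and author are placeholders, not a
real commit), so treat this as a sketch and not as git's actual code.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over b'<kind> <size>\\0' + body, the way git names objects."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body: every value here is a placeholder.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent 0000000000000000000000000000000000000000\n"
    b"author A U Thor <author@example.com> 1510811306 +0100\n"
    b"committer A U Thor <author@example.com> 1510811306 +0100\n"
    b"\n"
    b"Example message\n"
)

original_id = git_object_id(b"commit", commit_body)
# Changing a single word of the message gives a different commit ID.
edited_id = git_object_id(b"commit", commit_body.replace(b"Example", b"Changed"))

print(original_id)
print(edited_id)
```

Note that the ID never appears inside the body being hashed, which is
why the commit shown above doesn't contain its own ID.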

If you change anything about a commit, its hash will change (indeed, if
you look at the example above you'll note that the commit ID itself
isn't even stored in the commit).  The commit that came right after it
still contains the hash of the original, unmodified commit as its
parent, and if you change that parent hash to point to the modified
commit then its own content hash will change as well.

When you modify git history (such as when using an interactive
rebase), git re-writes all the subsequent commits, and they all get
new hashes.  Otherwise the commit you modified will be an orphan that
isn't in the history of any head, and it will eventually be garbage
collected.
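
The cascade is easy to simulate.  The following is only a toy model,
assuming a linear history where every commit uses the same (empty)
tree and a made-up author line; build_chain is a hypothetical helper,
not anything git provides.  It builds the same four-commit history
twice, with the second commit's message edited the second time around.

```python
import hashlib


def commit_id(body: bytes) -> str:
    """Hash a commit body the way git does: 'commit <size>\\0' + body."""
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_chain(messages):
    """Return the commit IDs of a linear history, oldest first."""
    ids = []
    parent = None
    for msg in messages:
        # Same (empty) tree everywhere; only parent and message vary.
        lines = ["tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904"]
        if parent is not None:
            lines.append("parent " + parent)
        lines.append("author Example <e@example.com> 0 +0000")
        lines.append("committer Example <e@example.com> 0 +0000")
        body = ("\n".join(lines) + "\n\n" + msg + "\n").encode()
        parent = commit_id(body)
        ids.append(parent)
    return ids


before = build_chain(["one", "two", "three", "four"])
after = build_chain(["one", "two (amended)", "three", "four"])

# Which commits ended up with a different ID?
changed = [a != b for a, b in zip(before, after)]
print(changed)  # [False, True, True, True]
```

The first commit is untouched, and everything from the edited commit
onward gets a new ID even though only one message was changed.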

Now, "subsequent" is relative to a branch.  Git commits form a linked
list that only points backwards in history (strictly a graph, since a
merge commit has more than one parent).  The only way to get to a
commit (other than by direct reference) is to start from one of the
heads and search backwards for it.  So, if you "modify" a commit, you
could do so in only one branch; the hashes in the other branches would
not change, but then again neither would the content of that commit
as seen from those branches.

If you could just modify a commit in place without touching all the
subsequent commit hashes then there would be no need for the "git
replace" command.  That works by storing a special kind of reference
in a separate location.  The original linked list of commits is
untouched, but the substitute commit is read in place of the replaced
commit whenever it is accessed.  But, as I pointed out before, there
are some caveats with that.

In case this isn't clear, here is an analogy using gpg which I think
most people will understand.

Imagine that I'm paranoid about security so I sign all my emails.
However, I go a step further.  In addition to signing each email,
before I sign it I go ahead and generate a hash of the last email I
sent and include it in the content of my current email, which gets
signed when I send it.  My emails would basically form a linked list
backwards in time.  Now imagine that an attacker manages to steal my
private key.  They could forge new emails using my identity, but if
they wanted to go tamper with a historical email that might be
archived on some list archive they would be faced with a problem.  If
they modify the email and issue a new signature, then its hash would no
longer match the one recorded in the next email I sent.  They would be
forced to go out and modify every record of every email I sent after
it; otherwise the alteration would be at risk of discovery.  While git doesn't really
use hashes for cryptographic purposes (ugh, sha1), it ends up working
the same way.  The emails don't actually contain a direct hash of
every previous email, but they contain a hash that is essentially
traceable to every previous email.

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug