Rich Freeman on 16 Nov 2017 13:50:43 -0800
Re: [PLUG] Revision Control for the Rest of Us
On Thu, Nov 16, 2017 at 3:43 PM, Clay Wells <clayw@sas.upenn.edu> wrote:
>
> For the record, Git commits are not hashes of every previous commit.
> This is simply incorrect. Someone mistakenly said that they were in a
> previous post.
>

Assuming you're referring to me, you might want to re-read my post. I never said that git commits were hashes of every previous commit. I said:

> If you want to go back and change a commit that happened a year ago,
> the hash of every subsequent commit in the repository will change.

And that is completely true. The reason is that each commit ID is a content hash that includes the hash of the tree and the hash of each parent commit (zero parents for the root commit at the tail of the history, one for a typical commit, and more than one for a merge commit), as well as all the information associated with the commit itself (author, date, message, etc).

Here is an example (generated using git cat-file commit 4434f7ad872bfc995efa99d7f9b98171f8156aeb on the official Gentoo repo):

tree 244119e72a6290145bbd490464b28f7f56a7fe22
parent ff9509fe161530b1c6d5412c855054828efd0373
author Jeroen Roovers <jer@gentoo.org> 1510811306 +0100
committer Jeroen Roovers <jer@gentoo.org> 1510811306 +0100
gpgsig -----BEGIN PGP SIGNATURE-----
 iF0EABECAB0WIQTGNWQvsnjji9bvHBhVaZGyp5KmEwUCWg0mqgAKCRBVaZGyp5Km
 E7CSAJ96QrhGRYaHdKTyxpD573zs/T+jVgCdGg8ITgi6d2syYyexg1+HRISUK+Q=
 =Q8fY
 -----END PGP SIGNATURE-----

www-plugins/adobe-flash: Old.

Package-Manager: Portage-2.3.14, Repoman-2.3.6

If you change anything about a commit, its hash will change (indeed, if you look at the example above you'll note that the commit ID itself isn't even stored in the commit). The commit that used to follow it still contains the hash of the original, unmodified commit, and if you change that hash to point to the modified commit then its own content hash will change too. When you modify git history (such as when using an interactive rebase), git re-writes all the subsequent commits, and they all get new hashes.
Otherwise the commit you modified will be an orphan that isn't in the history of any head, and it will eventually be garbage collected.

Now, "subsequent" is relative to a branch. Indeed, git commits form a linked list that only points backwards in history. The only way to get to a commit (other than by direct reference) is to start from one of the heads and search backwards for it. So, if you "modify" a commit you could do so in only one branch, and then the hashes in the other branches would not change, but then again neither would the content of that commit in those branches.

If you could just modify a commit in place without touching all the subsequent commit hashes then there would be no need for the "git replace" command. That works by storing a special kind of reference in a separate location. The original linked list of commits is untouched, but the substitute commit is read in place of the replaced commit whenever it is accessed. But, as I pointed out before, there are some caveats with that.

In case this isn't clear, here is an analogy using gpg which I think most people will understand. Imagine that I'm paranoid about security so I sign all my emails. However, I go a step further. Before I sign each email, I generate a hash of the last email I sent and include it in the content of my current email, so that hash gets signed along with everything else when I send it. My emails would basically form a linked list backwards in time.

Now imagine that an attacker manages to steal my private key. They could forge new emails using my identity, but if they wanted to tamper with a historical email that might be archived on some list archive they would be faced with a problem. If they modify the email and issue a new signature, its hash will no longer match the one embedded in the next email I sent in sequence. They would be forced to go out and modify every record of every email I sent from that point on, otherwise the alteration would be at risk of discovery.
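
To make the chaining concrete, here is a minimal Python sketch of the idea, offered as an illustration rather than anything git itself runs. It hashes a commit body the way git does (SHA-1 over a "commit <size>" header, a NUL byte, and the body), builds a three-commit chain twice, and edits only the first message the second time. The helper names commit_id and build_chain, the author line, and the use of git's empty-tree ID for every commit are assumptions made purely for the example.

```python
import hashlib

# ID of git's empty tree; reused for every commit so only the
# message and parent fields vary (an assumption for this sketch).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Hypothetical identity and timestamp, fixed so the output is reproducible.
WHO = "A U Thor <author@example.com> 1510811306 +0000"


def commit_id(tree, parents, message, who=WHO):
    """Return the SHA-1 git would assign to a commit with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {who}", f"committer {who}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    # Object format: b"commit <size>\0" followed by the body.
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def build_chain(messages):
    """Create a linear history; each commit records its parent's ID."""
    ids, parents = [], []
    for msg in messages:
        cid = commit_id(EMPTY_TREE, parents, msg)
        ids.append(cid)
        parents = [cid]
    return ids


original = build_chain(["first", "second", "third"])
rewritten = build_chain(["first (edited)", "second", "third"])

for old, new in zip(original, rewritten):
    print(old[:12], "->", new[:12])
```

Every line of the output shows a different ID on each side, even though only the first commit's message was touched: the second commit embeds the first commit's ID as its parent, so its own hash changes, and so on up the chain. Editing only the last message, by contrast, leaves the earlier IDs alone.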

While git doesn't really use hashes for cryptographic purposes (ugh, sha1), it ends up working the same way. The emails don't actually contain a direct hash of every previous email, but they contain a hash that is essentially traceable to every previous email.

--
Rich
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug