Fred Stluka on 21 Dec 2018 13:31:58 -0800



Re: [PLUG] Git: net time gain or loss?


Rich,

So, if two people attempt to push at similar times, the first to get
in will have a successful push, and the next will get an error.  That
second committer must do a pull and a rebase, and then push again.

If the pace of commits is large enough then this can become a
significant bottleneck, with lots of committers spending a lot of time
rebasing commits only to fail repeatedly to push them.

Yes, I have seen the situation where people are pushing so often
that my push is sometimes based on older commits, and I have
to pull again before pushing.  You are right -- I can see how this
could become a serious problem at scale.

But, I'm not sure what rebase has to do with this.  When I get this
error, I always just pull, which does a merge.  If the changes made
were in unrelated parts of the files, the merge is automatic.  If not,
there are "merge conflicts" that I have to resolve manually because,
for example, another developer and I made unrelated changes to
the same line of code.  If the merge is automatic, it's fast, and so
my next push works fine.  If not, I may take some time to manually
resolve the conflicts, and someone may meanwhile do another
push, so my next push fails, and I have to do another pull first.
But I've never needed rebase in this scenario.  What am I missing?
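
Roughly, the cycle I'm describing looks like this -- just a sketch,
assuming the branch is master and the remote is origin:

    git push origin master   # rejected: someone else pushed first
    git pull origin master   # fetch and merge; resolve conflicts if any
    git push origin master   # usually works now, unless someone beat me again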

For scalability, what about the workflow where folks fork (clone)
the repo, make their change, and issue a "pull request"?  Then a
smaller set of senior people are doing all the pulls from the forked
repos into the main repo.  And resolving all of the merge conflicts,
or rejecting the pull request so that the guy who did the fork has
to re-pull, resolve the conflicts, and issue a new pull request.  In
this scenario, no one ever pushes directly to the main repo; the
maintainers just pull the changes in from the forks.

Would you expect this to also not scale?  I see it used on large-scale
FOSS projects like Django.
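
For concreteness, the contributor side of that workflow is roughly
this (repo URL and branch names made up, just a sketch):

    git clone https://example.com/me/myfork.git   # my fork of the main repo
    cd myfork
    git checkout -b my-fix                        # topic branch for the change
    # ... edit files, git add, git commit ...
    git push origin my-fix                        # push to my fork only
    # then open a pull request; a maintainer reviews and pulls it into main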

--Fred
------------------------------------------------------------------------
Fred Stluka -- Bristle Software, Inc. -- http://bristle.com
#DontBeATrump -- Make America Honorable Again!
------------------------------------------------------------------------

On 12/20/18 6:06 PM, Rich Freeman wrote:
On Thu, Dec 20, 2018 at 5:22 PM Fred Stluka <fred@bristle.com> wrote:
Yeah, Git scales.  Linus wrote it to manage the huge number of
committers to Linux around the world.

Git sort-of scales.  Linus has a fairly unique workflow in the FOSS
world.  The official linux repo has but a single committer.  It might
grow large in size, and have many commits per day, but it never has
more than one person committing at the same time.

Git can handle any number of incoming commits at the same time, /as
long as those commits target different branches./

If more than one person attempts to commit to a single branch at the
same time, then only one can succeed without a merge commit, because
once one commit is accepted, the next is no longer parented on the
current head, and a fast-forward push is not possible.  It is rarely
desirable to have an automated repository accept non-fast-forward
pushes, because nobody will have actually looked at the resulting
merge commits prior to them being committed, and if there are conflicts
there is no possibility of manual review.
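
As far as I know, a shared bare repo can be locked down to refuse
non-fast-forward updates entirely (even forced ones) with something
like:

    # run inside the bare repository on the server
    git config receive.denyNonFastForwards true

A plain (non-forced) non-fast-forward push is already rejected by
default.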

So, if two people attempt to push at similar times, the first to get
in will have a successful push, and the next will get an error.  That
second committer must do a pull and a rebase, and then push again.

If the pace of commits is large enough then this can become a
significant bottleneck, with lots of committers spending a lot of time
rebasing commits only to fail repeatedly to push them.
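
In practice the recovery for that second committer is usually just
something like this (a sketch, assuming the branch is master):

    git pull --rebase origin master   # replay the local commits on top of the new head
    git push origin master            # try again; may lose the race once more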

Now, the fact that git is distributed does allow all those committers
to continue to accumulate work in their private repos and ignore the
bottleneck, and then push all their commits at once when there is less
contention.  These pushes are all-or-nothing, so they probably aren't
going to be penalized for having 100 commits to push all at once.  It
does delay the dissemination of work, however.
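
That is just the normal local workflow, something like:

    git commit -a -m "first change"    # work piles up in the private repo...
    git commit -a -m "second change"
    git push origin master             # ...and goes out later in one push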

Other VCS implementations are more file-based, and thus there isn't
the same kind of repository-level locking difficulty.

All that said, I think it is usually manageable in practice.  And
there are workarounds.  The Linux workflow of course works where
people cascade their commits up.  They basically sit queued up in
email inboxes until somebody applies their patches.  Another
workaround would be to have a collection of staging branches where
individuals can push their changes, and then a scheduled process checks
for merge conflicts and, if there are none, merges the branch.  That
approach would result in many merge commits, which some find
distasteful, unless you rebase them instead, but rebasing precludes
GPG-signing the commits.
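
A rough sketch of what that scheduled process might do for one
staging branch (branch names made up):

    #!/bin/sh
    # Try to merge one staging branch into master; back out on any conflict.
    git fetch origin
    git checkout master
    git merge --ff-only origin/master          # make sure local master is current
    if git merge --no-ff origin/staging/alice; then
        git push origin master                 # clean merge, publish it
    else
        git merge --abort                      # conflicts: leave them for a human
    fi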

I would not say that git is perfect.  However, in practice many of its
issues are the result of it not fitting in with preconceptions around
how a VCS ought to work, and potential users might do well to consider
if there is an opportunity to improve things by changing their
processes.

Now, going back to my earlier post, just as I have little hope of my
company ever managing 100 page requirement specifications in anything
other than Word, I also have little hope in them ever using git.  I'm
happy when I see people using subversion - they're more likely to just
stick everything on a shared drive, or maybe make occasional zip
snapshots and stick them in SharePoint.


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug