Re: [PLUG] Git: net time gain or loss?

JP Vossen on 20 Dec 2018 17:22:59 -0800

From: JP Vossen <jp@jpsdomain.org>
To: plug@lists.phillylinux.org
Subject: Re: [PLUG] Git: net time gain or loss?
Date: Thu, 20 Dec 2018 20:22:54 -0500
Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
Sender: "plug" <plug-bounces@lists.phillylinux.org>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

Clay, elsewhere you asked:
	"What are you trying to do that is causing you to waste so much time?"
	"What's been your biggest struggle?"

I deliberately didn't go into details because:

1. A bunch of people will say, "oh that's easy you just..." but the point is that it was NOT easy for me and wasted a lot of my time. And that happens to me a lot with Git, and virtually never with Bzr. OK, it happens a bit with SVN, but that wasted time is just how to recover from errors when it does something stupid, which is a totally different can of worms.
2. I didn't want to get bogged down in the details, because the point is the net time gain or loss, not how to do 'foo'.
3. That's my story and I'm sticking to it.
4. This is my Final Countdown (HA!  Charlie & Will, I beat you to it! :)

Pouring gasoline...

Note, anyone interested in the history of all this should go read http://www.catb.org/esr/writings/version-control/version-control.html. I personally find it fascinating, but most people...don't. :-)


On 12/20/18 4:34 PM, Wells, Clay A wrote:

Wow! My comments are below. I really don't mean to sound trollish
but IMHO someone needs to speak to some of your comments. Don't want
to start yet another war over Git but seriously.. if people took more
time to learn about Git and understand how it works and why it was
designed the way it was then threads like these would no longer exist.

Not always! Sometimes yes. If you are a Real Programmer(tm) for a living or for a serious hobby, then yes, a steep learning curve on a powerful but dangerous tool makes good sense.

But a LOT of people who are brainwashed into using Git (because Github?) aren't that! They just want to be able to use a DVCS for whatever and move on. You do not and SHOULD NOT have to know how your dishwasher works to use it. You should just be able to use it and move on. You can NOT do that with Git; you have to learn way too much about dishwasher guts--and that wastes time.

I should probably keep quiet and not comment but no good can come from
that.


Yeah, I obviously have that problem too.  :-)

So, before I get started. The point here is people really need to take
the time to learn about how Git works.. I mean REALLY learn and why Git
was designed the way it was.

Start here, https://www.youtube.com/watch?v=4XpnKHJAok8


No.  No they should NOT.  See above.

Cheers

On 12/20/18 2:01 PM, Lee H. Marzke wrote:

After using many SCM systems over the years I totally agree that git (command-line) is a complete
mess, but most developers don't really use Git directly but some local GUI tool to hide the complexity.


Totally disagree!

I agree that the CLI is a horrendous mess. I disagree that most Git users use a GUI. But then Lee & I seem to see very different segments of the business world. I can't think of anyone I know who uses Git who uses a GUI. Oh wait, maybe a Windows Java dev or two at $WORK.

As proof of the CLI being a mess I submit Fred's alias email! There are some great hacks in there. A lot of them. REALLY a lot! Why are they needed? Because Git is a really powerful, potentially dangerous tool with a really bad interface and no data abstraction. Do you really want to have to build the tools with which to assemble your dishwasher before you use it? Every time? I don't. I want my dishes done and that's it.

I see large Enterprise customers slowly uncovering Git's limitations, and realizing that significant downsides
exist with Git at scale, and are starting to look for alternatives that support large installations
with less effort.  Now for single developers or small business the following doesn't apply.


If scale is an issue then people clearly don't know what they are doing.

Microsoft and Rich's later post disagree with you there. I don't have an opinion.

https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/

I think Git users tend to lean towards git because it's the only thing that they have used, or have only used
other older tools like CVS, etc. but the arguments for using git over other SCM's have slowly been overcome.
Yes, there were some valid arguments about some Git features being better 5 years ago, but the problems
with Git were always left out.  It's not convenient to use SHA digests instead of linear version numbers or
be unable to check in large repositories or binary files.

Yeah, somehow Git got all the mindshare and Bzr & Hg just got sidelined. And I really wonder how much Gitlab had to do with that. Github seems to have gotten a LOT of things REALLY right, REALLY fast. So why do YOU use Git? Because of Git or because of Gitlab or because "everyone else uses it" (like Fred said)...probably because of Github!

I really, REALLY hate the digest thing! You can't tell by looking at two of them which is old and which is new, and THAT leads to all kinds of contortions in the interface to hack-around that problem.
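
To make the digest complaint concrete, here is a minimal Python sketch. It assumes Git's usual SHA-1 object format, where a blob's ID is the hash of a short header plus the file content. The three "revisions" are made up for illustration, and commit IDs are computed over different data than this, but they have the same property: the ID is derived from content, not from position in history.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git gives a blob: sha1(b"blob <size>\\0" + content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Three successive versions of one file, oldest first (hypothetical content).
revisions = [b"", b"hello\n", b"hello world\n"]
ids = [git_blob_id(r) for r in revisions]

# The linear number on the left shows age at a glance; the digest does not.
for number, object_id in enumerate(ids, start=1):
    print(number, object_id)
```

Nothing in the 40 hex characters says which version came first, which is why the interface needs things like relative references to get ordering back.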

lol

Git will happily take binary files. However, since they are not text
files you can't use text-based tools like diff.

Agreed. I'm not aware of anything that "supports" arbitrary binaries. I don't know how you could, even theoretically. What some tools can do is fork out to some other tool that does understand some specific binaries, for some use cases. But that's all I think I know about that.

Git rebase is great for private dev work, but badly breaks things if used on code that has been distributed
to others.    Since Git keeps change history locally,  a Git repo on your machine uses 2x to 3x the
local disk space of just the raw code,  and the initial sync time is longer since you have to
sync the code and past history ( but arguments for Git don't mention this fact ).   The need to split up repositories
into multiple small units, and lack of support for large binaries is always left out.  ( Binary support can be
added with LFS - but that is a central server solution - not native Git )


Why rebase. There are other ways to "fix" user generated mistakes. I
challenge you to prove this statement re: disk usage. I 100% don't
believe your claim of a Git repo using 2x to 3x of local storage.
Would love to see a side by side comparison with other SCMs.

I can't talk about rebase, because I have to admit that's one Git aspect that I do not fully understand and which seems like it's nothing but razor blades to me.

The fact that Git can re-write history is a catastrophic bug to me! I'd personally argue that something that can do that is no longer a version control system, it's just chaos. However, I do see some use cases for it and for a really large project it is probably required. :-(

For space use, Git keeps the entire repo on-disk, as everyone notes. I think the locally on-disk thing is a feature, not a bug, but it doesn't matter, that's how every single distributed VCS (DVCS) works. They *have* to work that way or they aren't distributed! SVN only keeps 1 copy (basically HEAD) on disk, plus your sandbox. But then anything that touches the repo is a (very slow) network operation.

Yeah, Git (or any other DVCS) will periodically garbage collect and repack, so sizes will fluctuate. But they don't keep diffs anymore, they all keep all the things and just compress them. I really did like the RCS/CVS reverse-chrono-diff thing, very clever.

I can't prove that Git will use 2-3x space, but it's certainly quite possible if you have a lot of churn in data that doesn't diff or compress well.

I'd love to see some numbers on this. I have these 2 use cases, but one is OT.

Somewhat OT, I keep my Zim wiki in Bzr, which is a DVCS so it keeps the entire repo like Git. I don't have a clue how close the actual data storage is to Git, my guess is "not very." But it's what I have. Right now the dir is 11M, 5.5 data and 5.5 .bzr. I check it every week or so and it's been up to 5.9 before it has re-packed. I have no binary files in there but do make a lot of changes to plain text files, so it'll eventually grow. Yes, I know Bzr != Git, but that's all I have there.

The opposite case is really interesting. I have 2.7G of config files from around 200 servers, which is checked into Git daily. There's only a couple of weeks of data in there and those servers change rarely. They are also very similar, so there is a huge amount of duplication. The .git dir? 209M. Git wins big on that one.
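
For anyone who wants to collect comparable numbers, here is a minimal Python sketch of one way to do it. The function names are made up for illustration, and it sums apparent file sizes rather than disk blocks, so the totals will not match du exactly.

```python
import os

def tree_size(root: str, skip: tuple = ()) -> int:
    """Sum apparent file sizes (bytes) under root, skipping top-level dirs in skip."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune the metadata dir in place so os.walk never descends into it.
            dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # ignore symlinks, including broken ones
                total += os.path.getsize(path)
    return total

def repo_overhead(root: str, meta_dir: str = ".git") -> tuple:
    """Return (working_bytes, metadata_bytes) for a checkout at root."""
    working = tree_size(root, skip=(meta_dir,))
    meta_path = os.path.join(root, meta_dir)
    metadata = tree_size(meta_path) if os.path.isdir(meta_path) else 0
    return working, metadata

# Measure the current directory; pass meta_dir=".bzr" for a Bzr branch.
working, metadata = repo_overhead(".")
print(f"working tree: {working} bytes, metadata: {metadata} bytes")
```

Running it before and after a repack would show the kind of fluctuation described above.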

But...simple operations on this repo, in part, are what prompted this email in the first place.

Developers really don't need 'distributed' SCM, because any Enterprise will force all the code into a
central repository anyway.   What they really need is:  1) off-line access, 2) ability to hide
intermediate commits (often mistakes ) and only check in the final result. and 3) easy branching and
good merge tools.


You clearly don't understand the concept of a distributed SCM. Just
want to point that out.

For this one I think Lee is arguing it's not a matter of what you individually think you want, it's what the business need is. I agree with him here.


I'm pretty sure Github, Gitlab and Bitbucket agree with Lee too...

Enterprise infrastructure admins don't like Git because of lack of central AD authentication, lack of
tools to remove mistakes, and lack of fine-grain access controls, lack of binary support, etc.


AD auth.. seriously!?! Those admins clearly don't understand Git's
architecture. Git will happily commit binaries. How in the world can
any SCM track changes in a binary?

Wait, are you saying that Gitlab and GitHub don't have AD and LDAP auth backends? Cause...I'm pretty sure they do, for just this reason! And I KNOW Bitbucket has that, because I use that at work...via Crowd and AD.

So perhaps taking the best feature of both Git and best of breed Enterprise SCM features is really the best solution.
Turns out that the 2018 Helix SCM tool now has features of both Perforce and Native Git included.

Boy Lee, you REALLY love Perforce... In all the years I've known you I think Perforce comes up in about 1/3 of our conversations. ;-)

Helix has two internal repositories ( one for native p4 ) and one for native Git ( graph depot ).  This means
that Git users can now connect natively with the Helix server,  but you also get Active directory
integration and global replication etc.   Note that the central Helix server has had additional
commands added to support ALL the git features on top of all the existing Perforce features such as
seamless replication.

The two ( P4 and Git ) repositories in Helix are not fully connected yet.  This is mainly designed to let Git teams checkin
to a Git native repo, and P4 users checkin to a P4 repo.  However a build system can sync the latest from both
p4 and git from a single client ( so larger projects can have legacy teams on p4, and new projects on Git )

Now this costs significant money, so it's mainly targeted at large Enterprises, but it seems from the amount of recent
inquiries that large organizations are finally hitting a wall with Git issues and Git training,  complexity of multiple
repos, lack of binary support, etc.


:face palm:

Lee & I often disagree on commercial offerings like this. I philosophically prefer F/OSS, even though it may cost something in terms of resources (get it going, care & feeding, etc.). Lee goes for the core competencies & bottom line. But as noted we see different parts of the world. :-)

For me personally at $WORK, getting budget for a tool is a nightmare that's 98% impossible, but people are "free" because they are already paid for in some other budget someplace else. I could never get Perforce at work in 1M years, but we've had Gitlab for years.

That said, I'm not familiar with Helix (or Perforce other than via Lee), but I can say that "Breezy" (the current fork of Canonical's Bazaar, which I really like), is doing similar "be a front-end to Git" stuff.

So as a developer ,  do you care what your Central Enterprise repository runs?  For example GitHub, GitLab
or Helix Team Hub (HTH),  as long as your Git tools connect ?  If you're an infrastructure admin, what issues
do you have with Git ?

Note that HTH is just a hub,  and all code is still put into Helix or other Git repos.  HTH provides pull-request support
interface to CI tools, central authentication, etc.

So, in summary to JP's question,  yes people are slowly figuring out that while Git is good, it is not all it is
hyped up to be and becomes a problem to support in larger organizations.


People simply don't take the time to learn how Git works. They give
up.

I disagree. Git has a LOT of warts. It's the right tool for Linux Kernel development, and perhaps other projects with a similar scale and similar (arguably odd, as Rich notes later) work-flows. I still say it's the wrong tool for most users.

That doesn't matter, because Git won the war and like it or not--everyone does use it. But is that a net time gain or loss to the world?

So far, big loss to me, big win for Fred, though it'd be interesting if Fred had used Bzr or Hg first. Some alternate time-line I guess...

That said, while Bzr, Brz or Hg are a HELL of a lot friendlier and easier to use, they suffer even more from Lee's corporate view, mostly lacking even Gitlab, Bitbucket, etc. And I can't speak to scale either, but my guess is: not as good as Git. I'd argue that very few users actually need that scale though.


Elsewhere Fred said:

'It did take a while to grok. Not so much Git, I suspect, as my first "distributed VCS". Perhaps Bazaar and Mercurial would have been as hard to grok?'

I found Bzr really easy, because it's very friendly and has great docs. In a way that hurt me with Git, because I expect the same and am always disappointed. Note, I have the Bat Book and have read it at least twice. But I shouldn't have had to!

I don't know how Bzr gets away with the integer revision numbers. I suspect they are per-repo only, which sounds like a show-stopper and might be. Since there is very often a canonical repo by convention, can they all get the integer-to-hash map from there? I dunno. Ditto Hg, which IIRC is per-repo.
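
Here is a toy Python sketch of the per-repo numbering guessed at above. It is an assumption-laden model, not how Bzr or Hg actually store anything: ToyRepo, its methods, and the short changeset IDs are all made up for illustration. Each repo simply numbers changesets in the order it first saw them.

```python
class ToyRepo:
    """Toy model: local revision number == order in which this repo saw a changeset."""

    def __init__(self):
        self.order = []  # changeset IDs; list index is the local revision number

    def add(self, changeset_id: str) -> int:
        """Record a changeset (if new) and return its local revision number."""
        if changeset_id not in self.order:
            self.order.append(changeset_id)
        return self.order.index(changeset_id)

    def revno(self, changeset_id: str) -> int:
        """Look up the local number for a global changeset ID."""
        return self.order.index(changeset_id)

    def changeset(self, revno: int) -> str:
        """Look up the global changeset ID for a local number."""
        return self.order[revno]

alice, bob = ToyRepo(), ToyRepo()
for cid in ["a1f3", "9c2e", "77b0"]:
    alice.add(cid)
for cid in ["a1f3", "77b0", "9c2e"]:  # bob received the same changes in another order
    bob.add(cid)

print(alice.revno("77b0"), bob.revno("77b0"))  # 2 1: same changeset, different numbers

# If everyone agrees alice is the canonical repo, "revision 2" can be resolved
# there to a global ID that means the same thing in every clone.
print(bob.revno(alice.changeset(2)))  # 1
```

In this model the integers are only meaningful relative to one repo, so a shared convention about which repo is canonical is what makes "revision 2" unambiguous.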


And Fred, thanks for adding the pointer to my PLUG Preso.

Flame on,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
