JP Vossen on 20 Dec 2018 17:22:59 -0800



Re: [PLUG] Git: net time gain or loss?


Clay, elsewhere you asked:
	"What are you trying to do that is causing you to waste so much time?
	What's been your biggest struggle?"

I deliberately didn't go into details because:

1. A bunch of people will say, "oh that's easy you just..." but the point is that it was NOT easy for me and wasted a lot of my time. And that happens to me a lot with Git, and virtually never with Bzr. OK, it happens a bit with SVN, but that wasted time is just how to recover from errors when it does something stupid, which is a totally different can of worms.
2. I didn't want to get bogged down in the details, because the point is the net time gain or loss, not how to do 'foo'.
3. That's my story and I'm sticking to it.
4. This is my Final Countdown (HA!  Charlie & Will, I beat you to it! :)

Pouring gasoline...

Note, anyone interested in the history of all this should go read http://www.catb.org/esr/writings/version-control/version-control.html. I personally find it fascinating, but most people...don't. :-)

On 12/20/18 4:34 PM, Wells, Clay A wrote:
Wow! My comments are below. I really don't mean to sound trollish
but IMHO someone needs to speak to some of your comments. Don't want
to start yet another war over Git but seriously.. if people took more
time to learn about Git and understand how it works and why it was
designed the way it was then threads like these would no longer exist.

Not always! Sometimes yes. If you are a Real Programmer(tm) for a living or for a serious hobby, then yes, a steep learning curve on a powerful but dangerous tool makes good sense.

But a LOT of people who are brainwashed into using Git (because Github?) aren't that! They just want to be able to use a DVCS for whatever and move on. You do not and SHOULD NOT have to know how your dishwasher works to use it. You should just be able to use it and move on. You can NOT do that with Git; you have to learn way too much about dishwasher guts--and that wastes time.


I should probably keep quiet and not comment but no good can come from
that.

Yeah, I obviously have that problem too.  :-)


So, before I get started. The point here is people really need to take
the time to learn about how Git works.. I mean REALLY learn and why Git
was designed the way it was.

Start here, https://www.youtube.com/watch?v=4XpnKHJAok8

No.  No they should NOT.  See above.


Cheers

On 12/20/18 2:01 PM, Lee H. Marzke wrote:
After using many SCM systems over the years I totally agree that git (command-line) is a complete
mess, but most developers don't really use Git directly but some local GUI tool to hide the complexity.

Totally disagree!

I agree that the CLI is a horrendous mess. I disagree that most Git users use a GUI. But then Lee & I seem to see very different segments of the business world. I can't think of anyone I know who uses Git who uses a GUI. Oh wait, maybe a Windows Java dev or two at $WORK.

As proof of the CLI being a mess I submit Fred's alias email! There are some great hacks in there. A lot of them. REALLY a lot! Why are they needed? Because Git is a really powerful, potentially dangerous tool with a really bad interface and no data abstraction. Do you really want to have to build the tools with which to assemble your dishwasher before you use it? Every time? I don't. I want my dishes done and that's it.


I see large Enterprise customers slowly uncovering Git's limitations, and realizing that significant downsides
exist with Git at scale, and are starting to look for alternatives that support large installations
with less effort.  Now for single developers or small business the following doesn't apply.

If scale is an issue then people clearly don't know what they are doing.

Microsoft and Rich's later post disagree with you there. I don't have an opinion.
https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/


I think Git users tend to lean towards git because it's the only thing that they have used, or have only used
other older tools like CVS, etc. but the arguments for using git over other SCMs have slowly been overcome.
Yes, there were some valid arguments about some Git features being better 5 years ago, but the problems
with Git were always left out.  It's not convenient to use SHA digests instead of linear version numbers or
be unable to check in large repositories or binary files.

Yeah, somehow Git got all the mindshare and Bzr & Hg just got sidelined. And I really wonder how much Gitlab had to do with that. Github seems to have gotten a LOT of things REALLY right, REALLY fast. So why do YOU use Git? Because of Git or because of Gitlab or because "everyone else uses it" (like Fred said)...probably because of Github!

I really, REALLY hate the digest thing! You can't tell by looking at two of them which is old and which is new, and THAT leads to all kinds of contortions in the interface to hack-around that problem.
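
To make the digest complaint concrete, here is a minimal Python sketch of how Git names a file's content (a "blob"): it takes the SHA-1 of a short "blob <size>" header, a NUL byte, and the content itself. Commit IDs are built the same way over a commit object. The function name git_blob_id and the sample strings are made up for illustration. Nothing in the result encodes time or sequence, so comparing two IDs by eye (or by sorting) says nothing about which came first.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>" plus a NUL byte) followed by the data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical revisions of the same file.
older = git_blob_id(b"version 1\n")
newer = git_blob_id(b"version 2\n")

# The IDs are just digests; neither one "looks" older or newer.
print("older:", older)
print("newer:", newer)
```

To actually order two commits you have to ask Git to walk the history graph, which is where a lot of the interface contortions come from.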

lol

Git will happily take binary files. However, since they are not text
files you can't use text-based tools like diff.

Agreed. I'm not aware of anything that "supports" arbitrary binaries. I don't know how you could, even theoretically. What some tools can do is fork out to some other tool that does understand some specific binaries, for some use cases. But that's all I think I know about that.

Git rebase is great for private dev work, but badly breaks things if used on code that has been distributed
to others.    Since Git keeps change history locally,  a Git repo on your machine uses 2x to 3x the
local disk space of just the raw code,  and the initial sync time is longer since you have to
sync the code and past history ( but arguments for Git don't mention this fact ).   The need to split up repositories
into multiple small units, and lack of support for large binaries is always left out.  ( Binary support can be
added with LFS - but that is a central server solution - not native Git )

Why rebase. There are other ways to "fix" user generated mistakes. I
challenge you to prove this statement re: disk usage. I 100% don't
believe your claim of a Git repo using 2x to 3x of local storage.
Would love to see a side by side comparison with other SCMs.

I can't talk about rebase, because I have to admit that's one Git aspect that I do not fully understand and which seems like it's nothing but razor blades to me.

The fact that Git can re-write history is a catastrophic bug to me! I'd personally argue that something that can do that is no longer a version control system, it's just chaos. However, I do see some use cases for it and for a really large project it is probably required. :-(

For space use, Git keeps the entire repo on-disk, as everyone notes. I think the locally on-disk thing is a feature, not a bug, but it doesn't matter, that's how every single distributed VCS (DVCS) works. They *have* to work that way or they aren't distributed! SVN only keeps 1 copy (basically HEAD) on disk, plus your sandbox. But then anything that touches the repo is a (very slow) network operation.

Yeah, Git (or any other DVCS) will periodically garbage collect and repack, so sizes will fluctuate. But conceptually they don't keep RCS-style diff chains anymore, they keep all the things and compress them (Git's packfiles do delta-compress similar objects under the hood when it repacks). I really did like the RCS/CVS reverse-chrono-diff thing, very clever.

I can't prove that Git will use 2-3x space, but it's certainly quite possible if you have a lot of churn in data that doesn't diff or compress well.
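
A rough illustration of the "doesn't compress well" part, as a Python sketch. Git zlib-compresses the objects it stores, so this just runs zlib over two made-up inputs: repetitive config-like text, and random bytes standing in for already-compressed binaries (images, archives, and so on). The sample data and the helper name compressed_ratio are assumptions for the demo, and this is not a measurement of any real repository.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means it shrank more)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical config-like text: the same two lines repeated many times.
config_like = b"ServerName example.invalid\nListen 80\n" * 2000

# Random bytes of the same length, standing in for incompressible binaries.
binary_like = os.urandom(len(config_like))

print(f"config-like ratio: {compressed_ratio(config_like):.3f}")
print(f"binary-like ratio: {compressed_ratio(binary_like):.3f}")
```

The repetitive text shrinks to a tiny fraction of its size while the random bytes do not shrink at all, so every new revision of data like that costs roughly its full size again.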

I'd love to see some numbers on this. I have these 2 use cases, but one is OT.

Somewhat OT I keep my Zim wiki in Bzr, which is a DVCS so it keeps the entire repo like Git. I don't have a clue how close the actual data storage is to Git, my guess is "not very." But it's what I have. Right now the dir is 11M, 5.5 data and 5.5 .bzr. I check it every week or so and it's been up to 5.9 before it has re-packed. I have no binary files in there but do make a lot of changes to plain text files, so it'll eventually grow. Yes, I know Bzr != Git, but that's all I have there.
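
For anyone who wants to produce their own numbers, a minimal Python sketch that splits a checkout into "working files" and "VCS metadata" bytes. The function name tree_sizes and its meta_dir parameter are hypothetical; pass ".bzr" or ".git" depending on the tool. It sums apparent file sizes, so it will not exactly match what du reports in disk blocks.

```python
import os


def tree_sizes(root, meta_dir=".git"):
    """Return (working_tree_bytes, metadata_bytes) for the checkout at root."""
    work = 0
    meta = 0
    meta_root = os.path.join(root, meta_dir)
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory counts as metadata if it is meta_root or lives below it.
        inside_meta = dirpath == meta_root or dirpath.startswith(meta_root + os.sep)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks rather than following them
            size = os.path.getsize(path)
            if inside_meta:
                meta += size
            else:
                work += size
    return work, meta


# Example call (the path is a placeholder):
#   work, meta = tree_sizes("/path/to/wiki", meta_dir=".bzr")
#   print(f"data: {work / 1e6:.1f} MB, metadata: {meta / 1e6:.1f} MB")
```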

The opposite case is really interesting. I have 2.7G of config files from around 200 servers, which is checked into Git daily. There's only a couple of weeks of data in there and those servers change rarely. They are also very similar, so there is a huge amount of duplication. The .git dir? 209M. Git wins big on that one.
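
Part of why duplication is so cheap: Git names each file's content by its hash, so byte-identical files are stored once no matter how many directories they appear in (compression and packing then shrink things further). A Python sketch with invented data: 200 hypothetical servers that share one identical resolv.conf and each have a unique hostname file. The server names, file contents, and the blob_id helper are all assumptions for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object ID Git would assign to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# 200 made-up servers: one shared file, one per-server file.
servers = {
    f"server{n:03d}": {
        "resolv.conf": b"nameserver 192.0.2.53\n",
        "hostname": f"server{n:03d}\n".encode("ascii"),
    }
    for n in range(200)
}

all_files = [content for files in servers.values() for content in files.values()]
unique_blobs = {blob_id(content) for content in all_files}

# 400 files on disk, but only 201 distinct contents to store:
# 200 hostnames plus a single shared resolv.conf.
print(len(all_files), "files,", len(unique_blobs), "unique blobs")
```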

But...simple operations on this repo, in part, are what prompted this email in the first place.


Developers really don't need 'distributed' SCM, because any Enterprise will force all the code into a
central repository anyway.   What they really need is:  1) off-line access, 2) ability to hide
intermediate commits (often mistakes ) and only check in the final result. and 3) easy branching and
good merge tools.

You clearly don't understand the concept of a distributed SCM. Just
want to point that out.

For this one I think Lee is arguing it's not a matter of what you individually think you want, it's what the business need is. I agree with him here.

I'm pretty sure Github, Gitlab and Bitbucket agree with Lee too...


Enterprise infrastructure admins don't like Git because of lack of central AD authentication, lack of
tools to remove mistakes, and lack of fine-grain access controls, lack of binary support, etc.

AD auth.. seriously!?! Those admins clearly don't understand Git's
architecture. Git will happily commit binaries. How in the world can
any SCM track changes in a binary?

Wait, are you saying that Gitlab and GitHub don't have AD and LDAP auth backends? Cause...I'm pretty sure they do, for just this reason! And I KNOW Bitbucket has that, because I use that at work...via Crowd and AD.


So perhaps taking the best feature of both Git and best of breed Enterprise SCM features is really the best solution.
Turns out that the 2018 Helix SCM tool now has features of both Perforce and Native Git included.

Boy Lee, you REALLY love Perforce... In all the years I've known you I think Perforce comes up in about 1/3 of our conversations. ;-)


Helix has two internal repositories ( one for native p4 ) and one for native Git ( graph depot ).  This means
that Git users can now connect natively with the Helix server,  but you also get Active directory
integration and global replication etc.   Note that the central Helix server has had additional
commands added to support ALL the git features on top of all the existing Perforce features such as
seamless replication.

The two ( P4 and Git ) repositories in Helix are not fully connected yet.  This is mainly designed to let Git teams checkin
to a Git native repo, and P4 users checkin to a P4 repo.  However a build system can sync the latest from both
p4 and git from a single client ( so larger projects can have legacy teams on p4, and new projects on Git )

Now this costs significant money, so it's mainly targeted at large Enterprises, but it seems from the amount of recent
inquiries that large organizations are finally hitting a wall with Git issues and Git training,  complexity of multiple
repos, lack of binary support, etc.

:face palm:

Lee & I often disagree on commercial offerings like this. I philosophically prefer F/OSS, even though it may cost something in terms of resources (get it going, care & feeding, etc.). Lee goes for the core competencies & bottom line. But as noted we see different parts of the world. :-)

For me personally at $WORK, getting budget for a tool is a nightmare that's 98% impossible, but people are "free" because they are already paid for in some other budget someplace else. I could never get Perforce at work in 1M years, but we've had Gitlab for years.

That said, I'm not familiar with Helix (or Perforce other than via Lee), but I can say that "Breezy" (the current fork of Canonical's Bazaar, which I really like), is doing similar "be a front-end to Git" stuff.


So as a developer ,  do you care what your Central Enterprise repository runs?  For example GitHub, GitLab
or Helix Team Hub (HTH),  as long as your Git tools connect ?  If you're an infrastructure admin, what issues
do you have with Git ?

Note that HTH is just a hub,  and all code is still put into Helix or other Git repos.  HTH provides pull-request support
interface to CI tools, central authentication, etc.

So, in summary to JP's question,  yes people are slowly figuring out that while Git is good, it is not all it is
hyped up to be and becomes a problem to support in larger organizations.

People simply don't take the time to learn how Git works. They give
up.

I disagree. Git has a LOT of warts. It's the right tool for Linux Kernel development, and perhaps other projects with a similar scale and similar (arguably odd, as Rich notes later) work-flows. I still say it's the wrong tool for most users.

That doesn't matter, because Git won the war and like it or not--everyone does use it. But is that a net time gain or loss to the world?

So far, big loss to me, big win for Fred, though it'd be interesting if Fred had used Bzr or Hg first. Some alternate time-line I guess...

That said, while Bzr, Brz or Hg are a HELL of a lot friendlier and easier to use, they suffer even more from Lee's corporate view, mostly lacking even Gitlab, Bitbucket, etc. And I can't speak to scale either, but my guess is: not as good as Git. I'd argue that very few users actually need that scale though.

Elsewhere Fred said:
'It did take a while to grok. Not so much Git, I suspect, as my first "distributed VCS". Perhaps Bazaar and Mercurial would have been
as hard to grok?'

I found Bzr really easy, because it's very friendly and has great docs. In a way that hurt me with Git, because I expect the same and am always disappointed. Note, I have the Bat Book and have read it at least twice. But I shouldn't have had to!

I don't know how Bzr gets away with the integer revision numbers. I suspect they are per-repo only, which sounds like a show-stopper and might be. Since there is very often a canonical repo by convention, can they all get the integer-to-hash map from there? I dunno. Ditto Hg, which IIRC is per-repo.

And Fred, thanks for adding the pointer to my PLUG Preso.

Flame on,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug