Michael Lazin via plug on 8 May 2023 09:30:46 -0700



Re: [PLUG] Please don't upload my code on GitHub


I am actually not certain that this is tilting at windmills, because GitHub does let you choose your own software license. For example, MIT-licensed code can be used in closed-source commercial products, while GPL code can only be used if the resulting product is itself distributed under the GPL. I think a bit of caution concerning this feature may be justified if you intend to use code that is MIT licensed in a commercial product, for example. If a bot is scraping code from all of GitHub and you do want to make a commercial product, this puts the onus of caution on you as the developer.

Thanks, 

Michael Lazin

.. τὸ γὰρ αὐτὸ νοεῖν ἐστίν τε καὶ εἶναι.


On Mon, May 8, 2023 at 12:18 PM Rich Freeman via plug <plug@lists.phillylinux.org> wrote:
On Mon, May 8, 2023 at 11:49 AM Ron Guilmet via plug
<plug@lists.phillylinux.org> wrote:
>
> If your code is GPL, it shouldn’t matter that Copilot looks at it, right? The author seems to worry that GPL code will end up in proprietary software, and asks that you label your code so that it is not placed on GitHub if forked. If the repo is open, you would think the same thing can happen.

This feels a bit like tilting at windmills.  Humans look at GPL code
and write non-GPL code all the time.  It isn't surprising that AI does
the same.

From the talks on generative AI that I've listened to by people who
seem to know what they're talking about, they basically work the same
way that humans do, with the caveat that we're talking about
non-expert humans.

Don't expect a generative AI to be as meticulous and careful as Linus
Torvalds.  However, it might write code comparably to the average
outsourced programmer who just grabs things off of Stack Overflow and
uses them uncritically.  That isn't intended as a knock really - the
reality is that 90% of the software that exists is written by this
sort of developer.

I don't think that having GPL code in the training dataset for an AI
somehow taints the AI, in the same way that a human looking at GPL
code doesn't forever taint anything they ever write with the GPL.

The other factor is that for every public-facing AI like Copilot that
people are paying attention to, there are probably 50-100 more being
trained on the same web-accessible data behind closed doors.  Even if
you only put your code on GitLab and put a bunch of keep out signs on
it, odds are somebody working for some company overseas is going to
have some bot discover and clone your repo and check it into their AI
training set, and then churn out code used in closed source products.

In any case, as the article points out, it isn't even something you
can legally require.  If somebody posted such a tag on their GPL
software, I'd just choose to follow their instructions to honor the
GPL and post it wherever I wanted in the spirit of the GPL.  I fork
repos onto GitHub from time to time, if only to take advantage of the
auto-generation of tarballs for tags, or to add release tags for
projects that lack them (which I can then use for packaging).

--
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug