Toby DiPasquale on 23 Jul 2007 04:07:23 -0000



Re: [PhillyOnRails] Somewhat OT, Statistics


On Sun, Jul 22, 2007 at 08:37:39PM -0400, Randy Schmidt wrote:
> Ok, I've heard Toby talk about statistics for two whole Philly
> Pubnites and then this article pops up:
> 
> http://www.zedshaw.com/rants/programmer_stats.html
> 
> I'm thinking I should brush up on statistics since there are now two
> people whom I watch and listen to when it comes to tech are talking
> about statistics.
> 
> Toby et al: any suggestions on references for "normal" statistics
> and/or whatever it is you talked about at the pubnite?

Just to comment on the Zed article: I read it a while ago and agreed at
the time. I still think programmers should know more about statistics, but
not for the reasons Zed gives. Having one guy around who had a voice and
who knew statistics would have been enough to settle the issues he was
having. I now think it is important because basically every advance in
software that has anything to do with data is, right now and for some time
to come, being born out of statistics in some way.

Anyway... just to start out: I am not an expert in statistics in any shape
or form. I'm simply a guy who very recently had to learn a whole lot more
about statistics Real Fast(tm).

Yeah, I'd grab a book, something tuned to business-people. I say this
because this kind of book will have the least amount of "high-tower"
academic stuff to wade through as you are (re-?)learning and you can
always go back and flesh out your knowledge with that stuff once you know
the terminology. When starting my path down this road, I went for pure
applicability, and thus swallowed some long-standing points of pride
regarding reading the highest-tower shit I could find first. This worked
out very well for me... so well, I will be doing it again in the future. I
feel dirtier, but it's more pragmatically useful ;-)

In terms of learning itself, knowing the concepts and applications is WAY 
more important than remembering any formulas or equations. My
cousin-in-law, currently a Statistics major at TCNJ, tells me that they
actually hand out a sheet with all of the necessary formulas on it before
the test; they don't give a shit if you remember any formulas, as you can
always look those up. Knowing when to use a t test versus a chi-square
test, though, is not something a formula sheet will tell you; you have to
understand what each test is actually for.
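
To make the t test versus chi-square distinction concrete, here is a
minimal sketch in Python (standard library only). The rough rule it
illustrates: a t test compares the means of a numeric measurement between
two groups, while a chi-square test compares counts across categories. The
function names and all of the data are made up for illustration, and the
code only computes the test statistics, not p-values.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t statistic (Welch's form) for two independent numeric samples."""
    va, vb = variance(a), variance(b)  # sample variances
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Numeric measurement in two groups (hypothetical page load times, ms):
# this is t test territory.
load_ms_old = [120, 135, 128, 140, 132]
load_ms_new = [110, 118, 125, 115, 121]
t = welch_t(load_ms_old, load_ms_new)

# Counts in categories (hypothetical signed-up / did-not-sign-up counts for
# two page designs): this is chi-square territory.
signups = [[30, 70], [45, 55]]
chi2 = chi_square(signups)

print(f"t = {t:.3f}, chi-square = {chi2:.3f}")
```

Turning either statistic into a p-value requires the matching distribution
(and degrees of freedom), which is exactly what Excel or R will do for you.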

<sidebar>
I have to disagree with Angel, in that I don't believe that Wikipedia is a
good source for learning statistics, or math in general. Refer to my
previous posting in this area for (slightly) more detail:

http://blog.cbcg.net/articles/2007/03/11/tobys-first-law-of-wikipedia

Basically, Wikipedia has some of the highest-tower shit around. (*)
</sidebar>

Back to the suggestion at hand: what's a good reference? Personally, I
like "Business Statistics" by Douglas Downing and Jeffrey Clark:

http://www.amazon.com/Business-Statistics-Barrons-Review/dp/0764119834/ref=pd_bbs_2/103-5198047-5610235?ie=UTF8&s=books&qid=1185160483&sr=8-2

It's fast, easy to read and has some exercises at the end of every chapter,
which I find to be necessary to really learn something. You forget way
less of something when you do the exercises at the end of the chapters to
reinforce what you've just learned (at least, I do, anyway). Also, it's got
some Excel-based stuff in there, too, which will help you navigate that
(this was nice for me, as I'd never used a spreadsheet program before in
my life).

Speaking of Excel, I'd definitely start with that, rather than R. I use
both, and I love R, but it's a significant mental burden to attempt to
learn statistics *AND* R at the same time. R is a programming language and
environment; Excel gives you far less to keep in your head. OpenOffice Calc
serves in this role, as well. You will find it necessary to step up to R
(or RSRuby or RPy or whatever bridge you like) at some point, as Excel is
pretty limited and is dog-ass slow on large datasets, but I'd caution
against trying to learn them both at the same time, for the sake of your
sanity. Also, once you get to R, make sure you go out and get that RAM
upgrade. You'll need it.

Chances are you are learning statistics to actually *use* it for something, 
so in the interest of getting to that point as quickly as possible, let 
the GUI do some of the work for once, at least while you're getting up to
speed ;) (oh yeah, and if anyone figures out how to get OpenOffice Calc to
actually show the function it determined when you do an XY chart with a 
regression, please let me know)
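
On the regression question: independent of what the chart will or won't
display, the fitted line itself is cheap to compute by hand. Below is a
minimal sketch in Python of ordinary least squares for a straight line,
assuming that is the kind of regression the chart is drawing. The function
name and the data are made up for illustration. Calc and Excel also have
SLOPE and INTERCEPT worksheet functions that should give the same two
numbers for a linear fit.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1, so the fit is easy to check
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```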

If you're dead set on R, or are free to spend as much time on this
endeavor as you like, you can start with an R-based book such as:

http://www.amazon.com/Statistics-Introduction-Michael-J-Crawley/dp/0470022981/ref=pd_rhf_p_5/103-5198047-5610235?ie=UTF8&qid=1185161406&sr=1-1

I have not read this book but there are several linked in the "also
bought" section that also use R as the tool for statistical learning so
you can probably find one that doesn't suck. As far as I know, there is no
canonical "master book for learning R", though the R project page lists
many:

http://www.r-project.org/doc/bib/R-books.html

In terms of prerequisites for your journey into statistics-land, make sure
you are up-to-speed on algebra and calculus, at least the single-variable
variety. I kid you not, I went out and got the Algebra II and Calculus for
Dummies books in order to brush back up on these subjects and they were
surprisingly good. They have no exercises after each chapter, which sucks,
but I supplemented the Calc one with "Calculus Refresher" by A.A. Klaf,
which is basically all exercises ;-) Note to women readers: the tagline of
that book is misogynistic, but it was written in 1944 when that was far
more acceptable. Not that I'm apologizing, just giving context...

http://www.amazon.com/Calculus-Refresher-Klaf/dp/0486203700/ref=pd_bbs_sr_1/103-5198047-5610235?ie=UTF8&s=books&qid=1185161406&sr=1-1

If you're super-hyper about it, you can try tackling "Calculus and
Statistics" by Michael C. Gemignani. This combines the two in one (kinda
short) text where the print is small. I own this text but have not read it
thoroughly, and I think it's somewhat weaker on the statistics side.

http://www.amazon.com/Calculus-Statistics-Michael-C-Gemignani/dp/0486449939/ref=pd_bbs_sr_1/103-5198047-5610235?ie=UTF8&s=books&qid=1185161385&sr=1-1

A pretty good little book is "Practical Statistics" by Russell Langley.
This one is a pretty broad look at all of the areas of statistics.

http://www.amazon.com/Practical-Statistics-Explained-Explaining-Science/dp/0486227294/ref=pd_bbs_sr_1/103-5198047-5610235?ie=UTF8&s=books&qid=1185161366&sr=1-1

Now, in your travels, you may see references to the terms "frequentist" and
"Bayesian". These refer to two schools of thought on the underpinnings of
statistical theory. Learn frequentist statistics first. If you're unsure
which one you're currently reading about, look for the word "Bayesian". If
you don't see it or you see R. A. Fisher referenced in a positive or
neutral way, chances are very good (p=0.95) that you're reading about
frequentist statistics. Bayesian is way hotter right now and worth 
learning, but is generally accepted to be much harder to grasp. (not that I 
correctly grasp it, of course)
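
For a rough feel of the difference, here is a minimal sketch in Python that
estimates a proportion both ways. The frequentist answer is just the
observed rate; the Bayesian answer starts from a prior belief (here a
Beta(1, 1) prior, i.e. "every rate is equally plausible", which is an
assumption chosen for simplicity) and updates it with the data. The
function names and the numbers are made up for illustration.

```python
def frequentist_estimate(successes, trials):
    """Maximum-likelihood estimate of a proportion: the observed rate."""
    return successes / trials

def bayesian_posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta posterior after updating a Beta(prior_a, prior_b) prior."""
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a / (post_a + post_b)

# Hypothetical data: 7 of 10 visitors clicked
clicks, visitors = 7, 10

mle = frequentist_estimate(clicks, visitors)
post_mean = bayesian_posterior_mean(clicks, visitors)

print(f"frequentist: {mle:.3f}, Bayesian posterior mean: {post_mean:.3f}")
```

With this little data the prior pulls the Bayesian estimate toward 0.5; as
the number of trials grows, the two estimates converge.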

Finally, should any of you actually undertake this experience and make it 
into the journeyman stage, look me up about a job.

P.S. I own no stock in, nor do I know anyone who works/worked for, Dover
Publications. I just like their books because they seem to spend money on
authors that actually know their shit rather than fancy paper or graphics.
Oh yeah, another thing: you won't be finding any fancy graphics in Dover
books.

-- 
Toby DiPasquale