Toby DiPasquale on 23 Jul 2007 04:07:23 0000 
On Sun, Jul 22, 2007 at 08:37:39PM 0400, Randy Schmidt wrote:
> Ok, I've heard Toby talk about statistics for two whole Philly
> Pubnites and then this article pops up:
>
> http://www.zedshaw.com/rants/programmer_stats.html
>
> I'm thinking I should brush up on statistics since there are now two
> people whom I watch and listen to when it comes to tech are talking
> about statistics.
>
> Toby et al: any suggestions on references for "normal" statistics
> and/or whatever it is you talked about at the pubnite?

Just to comment on the Zed article: I'd read this a while ago and agreed at the time. I think programmers should know more about statistics, but not for the reasons Zed thinks. Having one guy around who had a voice and who knew statistics would have been enough to settle the issues he was having. I now think it is important because basically every advance in software that has anything to do with data is right now, and will continue to be for some time, born out of statistics in some way.

Anyway... just to start out: I am not an expert in statistics in any shape or form. I'm simply a guy who very recently had to learn a whole lot more about statistics Real Fast(tm).

Yeah, I'd grab a book, something tuned to businesspeople. I say this because this kind of book will have the least amount of "high-tower" academic stuff to wade through as you are (re?)learning, and you can always go back and flesh out your knowledge with that stuff once you know the terminology. When starting my path down this road, I went for pure applicability, and thus swallowed some longstanding points of pride regarding reading the highest-tower shit I could find first. This worked out very well for me... so well, I will be doing it again in the future. I feel dirtier, but it's more pragmatically useful ;)

In terms of learning itself, knowing the concepts and applications is WAY more important than remembering any formulas or equations.
My cousin-in-law, currently a Statistics major at TCNJ, tells me that they actually hand out a sheet with all of the necessary formulas on it before the test; they don't give a shit if you remember any formulas, as you can always look those up. Knowing when to use a t test versus a Chi-square test, though, is not something you can look up on a sheet; you have to actually understand the concepts.

<sidebar>
I have to disagree with Angel, in that I don't believe that Wikipedia is a good source for learning statistics, or math in general. Refer to my previous posting in this area for (slightly) more detail:

http://blog.cbcg.net/articles/2007/03/11/tobysfirstlawofwikipedia

Basically, Wikipedia has some of the highest-tower shit around. (*)
</sidebar>

Back to the suggestion at hand: what's a good reference? Personally, I like "Business Statistics" by Douglas Downing and Jeffrey Clark:

http://www.amazon.com/BusinessStatisticsBarronsReview/dp/0764119834/ref=pd_bbs_2/10351980475610235?ie=UTF8&s=books&qid=1185160483&sr=82

It's fast, easy to read and has some exercises at the end of every chapter, which I find to be necessary for really learning something. You forget way less of something when you do the exercises at the end of the chapters to reinforce what you've just learned (at least, I do, anyway). Also, it's got some Excel-based stuff in there, too, which will help you navigate that (this was nice for me, as I'd never used a spreadsheet program before in my life).

Speaking of Excel, I'd definitely start with that, rather than R. I use both, and I love R, but it's a significant mental burden to attempt to learn statistics *AND* R at the same time. R is a programming language and environment; Excel has a way lower mnemonic load. OpenOffice Calc serves in this role as well.
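
To make the t test versus Chi-square point above a bit more concrete, here is a minimal Python sketch. The function names and the sample numbers are made up purely for illustration; it uses only the standard library and computes the test statistics, not p-values. The takeaway is about the shape of the data: a t test compares the means of measurements, while a Chi-square test compares counts in categories.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic: compares the means of two samples of measurements."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def chi_square(table):
    """Pearson Chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical measurements (say, response times in ms for two servers):
    # continuous data, so compare means with a t test.
    print(welch_t([120, 135, 128, 140], [110, 118, 115, 121]))

    # Hypothetical counts (users who clicked or not, for two page designs):
    # categorical data, so compare counts with a Chi-square test.
    print(chi_square([[30, 70], [45, 55]]))
```

Turning either statistic into a p-value still needs the matching distribution (and degrees of freedom), which is the part a stats package or a table in the back of the book handles for you.
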
You will find it necessary to step up to R (or RSRuby or RPy or whatever bridge you like) at some point, as Excel is pretty limited and is dog-ass slow on large datasets, but I'd caution against trying to learn statistics and R at the same time, for the sake of your sanity. Also, once you get to R, make sure you go out and get that RAM upgrade. You'll need it.

Chances are you are learning statistics to actually *use* it for something, so in the interest of getting to that point as quickly as possible, let the GUI do some of the work for once, at least while you're getting up to speed ;)

(oh yeah, and if anyone figures out how to get OpenOffice Calc to actually show the function it determined when you do an XY chart with a regression, please let me know)

If you're dead set on R, or are free to spend as much time on this endeavor as you like, you can start with an R-based book such as:

http://www.amazon.com/StatisticsIntroductionMichaelJCrawley/dp/0470022981/ref=pd_rhf_p_5/10351980475610235?ie=UTF8&qid=1185161406&sr=11

I have not read this book, but there are several linked in the "also bought" section that also use R as the tool for statistical learning, so you can probably find one that doesn't suck. As far as I know, there is no canonical "master book for learning R", though the R project page lists many:

http://www.rproject.org/doc/bib/Rbooks.html

In terms of prerequisites for your journey into statistics-land, make sure you are up to speed on algebra and calculus, at least the single-variable variety. I kid you not, I went out and got the Algebra II and Calculus for Dummies books in order to brush back up on these subjects and they were surprisingly good. They have no exercises after each chapter, which sucks, but I supplemented the Calc one with "Calculus Refresher" by A.A. Klaf, which is basically all exercises ;)

Note to women readers: the tagline of that book is misogynistic, but it was written in 1944 when that was far more acceptable.
Not that I'm apologizing, just giving context...

http://www.amazon.com/CalculusRefresherKlaf/dp/0486203700/ref=pd_bbs_sr_1/10351980475610235?ie=UTF8&s=books&qid=1185161406&sr=11

If you're super-hyper about it, you can try tackling "Calculus and Statistics" by Michael C. Gemignani. This combines the two in one (kinda short) text where the print is small. I own this text but have not read it thoroughly, and I think that it's somewhat weaker on the statistics side.

http://www.amazon.com/CalculusStatisticsMichaelCGemignani/dp/0486449939/ref=pd_bbs_sr_1/10351980475610235?ie=UTF8&s=books&qid=1185161385&sr=11

A pretty good little book is "Practical Statistics" by Russell Langley. This one is a pretty broad look at all of the areas of statistics.

http://www.amazon.com/PracticalStatisticsExplainedExplainingScience/dp/0486227294/ref=pd_bbs_sr_1/10351980475610235?ie=UTF8&s=books&qid=1185161366&sr=11

Now, in your travels, you may see reference to the terms "frequentist" and "Bayesian". These refer to two schools of thought on the underpinnings of statistical theory. Learn frequentist statistics first. If you're unsure which one you're currently reading about, look for the word "Bayesian". If you don't see it, or you see R. A. Fisher referenced in a positive or neutral way, chances are very good (p=0.95) that you're reading about frequentist statistics. Bayesian is way hotter right now and worth learning, but is generally accepted to be much harder to grasp. (not that I correctly grasp it, of course)

Finally, should any of you actually undertake this experience and make it into the journeyman stage, look me up about a job.

P.S. I own no stock in, nor do I know anyone who works/worked for, Dover Publications. I just like their books because they seem to spend money on authors that actually know their shit rather than on fancy paper or graphics. Oh yeah, another thing: you won't be finding any fancy graphics in Dover books.

Toby DiPasquale

