Kyle R. Burton on 10 Feb 2011 13:26:22 -0800


Re: [PLUG] Speakers needed for all chapters!

> I'd like to hear a talk on non-relational databases such as CouchDB or
> MongoDB.

Toby has given a talk in the past about Cassandra - might want to hit
him up and see if he'll talk about it again.

Someone from Philly Lambda gave a talk on MongoDB.

> Laying out data in terms of rows and columns seems really
> natural to me.  I'd love to have someone familiar with NoSQL concepts
> tell me why I'm wrong.

I am not an experienced user of any of the popular NoSQL databases, but I
used a non-relational datastore a few companies ago.  Some
aspects that are different (IMO) are:

- many nosql databases are effectively document stores
- many relax ACID (few if any support transactions beyond
create/update/delete of a single document)
- they do this (at least in spirit) to gain performance
- Cassandra bills itself as 'eventually consistent' for exactly this reason
- they have no fixed data model, documents are heterogeneous
- don't have to plan a strict schema up front
- don't have to migrate old documents to update the schema (somewhat
backward/forward compatible)
- you give up being able to use SQL to access the datastore
- 'indexes' are often just a function over the document store, stored
right back into the system
- since there is no equivalent to a join, sharding across multiple
systems can be easier

[...] uses a non-relational schema so that its many
customers can define their own data models and their app can still
have (somewhat) consistent access methods.  Some that I've seen are:
XPath, JavaScript (I think I remember that in CouchDB, indexes are
created with a map/reduce), or just a single key lookup (which may
return multiple documents).
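
To make the "index is just a function over the document store" idea a bit
more concrete, here is a minimal Python sketch.  It is not how CouchDB or
any other real product implements it; the in-memory store, the field names
and the helpers (map_by_city, build_index, lookup) are all made up for
illustration.

```python
from collections import defaultdict

# Hypothetical in-memory "document store": ids mapped to heterogeneous dicts.
# No fixed schema: doc2 has no "city", doc3 is a different kind of document.
store = {
    "doc1": {"type": "person", "name": "Ada", "city": "Philadelphia"},
    "doc2": {"type": "person", "name": "Bob"},
    "doc3": {"type": "talk", "title": "Cassandra intro", "city": "Philadelphia"},
}


def map_by_city(doc_id, doc):
    """Map function: emit (key, doc_id) for every document that has a city."""
    if "city" in doc:
        yield doc["city"], doc_id


def build_index(store, map_fn):
    """Run the map function over every document and group doc ids by key."""
    index = defaultdict(list)
    for doc_id, doc in sorted(store.items()):
        for key, value in map_fn(doc_id, doc):
            index[key].append(value)
    return dict(index)


# Build the index, then store it right back into the same system
# as just another document (the id "_index/by_city" is an assumption).
city_index = build_index(store, map_by_city)
store["_index/by_city"] = city_index


def lookup(store, index_id, key):
    """Single key lookup through a stored index; may return several documents."""
    return [store[doc_id] for doc_id in store[index_id].get(key, [])]


matches = lookup(store, "_index/by_city", "Philadelphia")
```

Here a lookup on "Philadelphia" returns two documents of different shapes
(a person and a talk), which is the "single key lookup may return multiple
documents" case; documents without a city are simply left out of the index
rather than forcing a schema change.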

At the company where we used a non-relational store we did it for the
same kind of reason: we created a data integration system that took
snapshots (data and schema) of relational databases and merged them
together (via matching) and could extract out of that consolidated
form a relational snapshot (model and data) of your choosing (even
multiples with different consolidation rules).  At the time I felt it
was a good approach because it kept us from applying non-reversible
changes to input data - at least up until we needed to export it to
send it back to a customer.

Today my impression is that many go after NoSQL because the barrier
to getting one up and running, even across a cluster, is lower than
with a relational database (e.g. to get clustering or replication set
up).  They're seen as simpler to administer too because, well, they're
simpler than a relational database.  I tend to agree with many of
these sentiments (esp. the ability of something like Cassandra to
scale), though I will still choose an RDBMS for most projects, if only
so that business users can use the tools they're accustomed to (SQL,
ODBC, etc.).

> I also second the call for an IPv6 talk.  It seems like something many
> of us don't know enough about.

Is there anyone on the list who could give one on converting a
corporate network to IPv6?


Twitter: @kyleburton