gabriel rosenkoetter on Wed, 9 Apr 2003 14:42:10 -0400



Re: [PLUG] Can Open Source Replace Oracle?


On Wed, Apr 09, 2003 at 10:46:27AM -0400, Edmund Goppelt wrote:
> I'd like Philadelphians to be able to look up land ownership records.

Is it, at present, possible for us to do this by USPS?

Is there a law saying we should be able to?

> It turns out the City already has a web site that offers this
> information.  For three years, a group of 50 private companies have
> been able to look at deed images and related info. at no cost, by
> going here:

At no cost? Are you sure they didn't front some cash for the city's
initial layout?

Why can't they just use one of those licenses they hopefully
preserved to do a SQL dump of their Oracle tablespace and hand that
to you on CD? (I presume there are no IP issues... this is all
public information, right?)
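
For anyone unsure what a "SQL dump" amounts to, here is a minimal
sketch of the dump-and-reload idea. It uses Python's built-in sqlite3
module purely as a stand-in (it is not Oracle, and Oracle's own export
tooling works differently); the table name, columns, and rows are all
hypothetical.

```python
import sqlite3

# Stand-in source database: SQLite in memory, NOT Oracle.
# The "deeds" table and its contents are invented for illustration.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE deeds (id INTEGER PRIMARY KEY, address TEXT, owner TEXT)"
)
source.executemany(
    "INSERT INTO deeds (address, owner) VALUES (?, ?)",
    [("123 Example St", "A. Owner"), ("456 Sample Ave", "B. Owner")],
)
source.commit()

# iterdump() yields the SQL statements needed to recreate the database;
# joined together they form a plain-text dump that fits on a CD.
dump_sql = "\n".join(source.iterdump())

# Replaying the dump into a fresh database reproduces the rows without
# the recipient ever connecting to the source server.
copy = sqlite3.connect(":memory:")
copy.executescript(dump_sql)
rows = copy.execute("SELECT address, owner FROM deeds ORDER BY id").fetchall()
```

The point is only that the recipient needs the text file, not a
login on the original server.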

> 1. Doing so would overburden the City's Internet connection.

Quite possible. Colocation doesn't solve this problem; if they only
pay for a certain amount of bandwidth per month and get cut off when
it's exceeded, they're removing the information from the public
domain again.
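
A back-of-envelope sketch of both limits (raw link speed and a monthly
transfer cap). The T-1 rate of 1.544 Mbit/s is standard; the image
size and the monthly cap below are made-up assumptions, since nobody
in this thread has stated the City's actual link, cap, or file sizes.

```python
T1_BITS_PER_SEC = 1_544_000  # nominal T-1 line rate


def images_per_hour(link_bits_per_sec, image_bytes, usable_fraction=1.0):
    """Upper bound on images served per hour over a link.

    usable_fraction is the share of the link available to this service
    (1.0 means nothing else is using it).
    """
    bytes_per_sec = link_bits_per_sec * usable_fraction / 8
    return bytes_per_sec * 3600 / image_bytes


def images_before_cap(monthly_cap_bytes, image_bytes):
    """Whole images that fit under a monthly transfer cap."""
    return monthly_cap_bytes // image_bytes


# ASSUMPTIONS, not facts from the thread:
IMAGE_BYTES = 100_000               # one scanned deed page, about 100 kB
MONTHLY_CAP_BYTES = 50 * 1000 ** 3  # a hypothetical 50 GB/month cap

hourly = images_per_hour(T1_BITS_PER_SEC, IMAGE_BYTES)
capped = images_before_cap(MONTHLY_CAP_BYTES, IMAGE_BYTES)
```

Plug in real numbers for the link, the cap, and the image size before
drawing any conclusion from it.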

> 2. Their license with Oracle Corporation restricted them to 50 named
> users (i.e., not simultaneous users, but the same 50 people).

Sounds true to me.

> I asked her recently why she didn't just ditch Oracle and use
> PostgreSQL or MySQL.

Because PostgreSQL can't come close to serving large databases at
the speed that Oracle can (we're talking orders of magnitude here),
because MySQL fails the ACID test, because neither is capable of
accessing raw disk in an even remotely sane way (Veritas quick I/O
is what they're almost definitely using with Oracle) and because
switching to *anything* is a HUGE development cost.

SQL may be a "standard", but a database in use within one DBMS
cannot just be magically transferred to another DBMS without some
significant work by a DBA skilled in both DBMSes (a rarity; no,
really!) and some (probably even more) significant work by a
developer on the external interfaces to the database.
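
A small sketch of why queries don't simply carry over. NVL and ROWNUM
are Oracle-specific; the rewritten query uses COALESCE and LIMIT
instead. SQLite (via Python's sqlite3) stands in for "some other DBMS"
here only because it runs anywhere; the table and data are
hypothetical, and a real migration involves far more than query
syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deeds (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO deeds (owner) VALUES (?)", [("A. Owner",), (None,)]
)

# Written the way an Oracle-only application might write it.
oracle_style = "SELECT NVL(owner, 'unknown') FROM deeds WHERE ROWNUM <= 10"

# The same intent rewritten for an engine without NVL or ROWNUM.
rewritten = "SELECT COALESCE(owner, 'unknown') FROM deeds ORDER BY id LIMIT 10"

try:
    conn.execute(oracle_style)
    oracle_style_ran = True
except sqlite3.OperationalError:
    # The non-Oracle engine rejects the Oracle-specific constructs.
    oracle_style_ran = False

rows = conn.execute(rewritten).fetchall()
```

Every query, stored procedure, and client library call in the
application needs this kind of review, which is where the development
cost comes from.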

> For the record, Hallwatch runs MySQL and Zope on an 800 Mhz
> Celeron, 512 MB RAM, 40 GB HD off of a shared T-1 connection.

Which would be totally insufficient to the task you'd like to ask of
it.

> In your opinion, what hardware configuration does this application
> require?

Well, you're out of your mind to use commodity PC hardware. If you
insist on doing this on an IA32 machine (why? No, really; why?)
using Linux (again, why?), you're looking at maybe an IBM xSeries 

> Four years' data is around 1 million rows.

You don't happen to know how much data's in a row, do you?

How many years do you intend to keep online?

What indexing do you intend to do across them?

You could *very* quickly be looking at terabyte quantities of disk.
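
A rough sketch of that arithmetic. The only number taken from this
thread is "around 1 million rows", and reading it as four years of
data (about 250,000 rows a year) is itself an assumption. Row size,
index overhead, image size, and whether the deed images count toward
the total are all guesses that need real answers.

```python
def estimated_bytes(rows_per_year, years, row_bytes,
                    image_bytes_per_row=0, index_overhead=0.0):
    """Crude disk estimate: table data + indexes + attached images.

    index_overhead is index size as a fraction of table size.
    """
    rows = rows_per_year * years
    table_bytes = rows * row_bytes
    index_bytes = table_bytes * index_overhead
    image_bytes = rows * image_bytes_per_row
    return table_bytes + index_bytes + image_bytes


# ASSUMPTIONS for illustration only:
ROWS_PER_YEAR = 250_000      # "around 1 million rows", read as four years
ROW_BYTES = 2_000            # guessed average row size
IMAGE_BYTES = 1_000_000      # guessed scanned deed, about 1 MB per row
INDEX_OVERHEAD = 0.5         # guessed: indexes half the size of the table

rows_only = estimated_bytes(ROWS_PER_YEAR, 10, ROW_BYTES,
                            index_overhead=INDEX_OVERHEAD)
with_images = estimated_bytes(ROWS_PER_YEAR, 10, ROW_BYTES,
                              image_bytes_per_row=IMAGE_BYTES,
                              index_overhead=INDEX_OVERHEAD)
```

Under these guesses, ten years of rows and indexes alone come to
gigabytes, while counting the images pushes the total into terabytes,
which is why the bytes-per-row and years-online questions matter.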

I really doubt Postgres will hold up under more than a couple of
years of data (MySQL isn't really an option even for one year of
data; sapdb might do better than Postgres, but not by much).

Before you suggest that you'd be content to get a new system every
couple of years to manage the next couple of years of data, consider
the utility of being able to compare between years, over a decade,
so forth.

> Do you know of any government entities that are using one of the Open
> Source databases?

Do you know of any companies providing the level of support that
companies like Oracle and Veritas do?

(And I mean actually *providing* it. I've been very disappointed
with Red Hat, for instance...)

> Unless I hear from you otherwise, I will assume that it is ok to show
> your comments to the Commissioner or other City officials.

I'm not sure how that could possibly make a difference. The city has
no reason to believe that we're not completely figments of your
imagination unless we show up and testify, do they?

On Wed, Apr 09, 2003 at 11:45:48AM -0400, Jeff Weisberg wrote:
> the largest postgres installation I know of is ".org"

How many rows is that?

How many bytes per row?

On what bases is it indexed?

On Wed, Apr 09, 2003 at 12:28:43PM -0400, Michael Leone wrote:
> Hardware is secondary - you can throw bunches of hardware at it, but if
> the software doesn't offer a needed feature, more hardware won't (always)
> help.

There's software and then there's software. A big part of Oracle's
being faster (above and beyond how much faster it just *is* than
open source DBMSes) is using Veritas Volume Manager with Quick I/O
to access backing disk as raw but through the file system, meaning
that adding tablespace is very easy (as opposed to very painful with
true raw partitions), and that operations like backups function
along normal FS lines.

In addition, Veritas permits of checkpoints, snapshots, and a
variety of mechanisms to assure data integrity over time, across
branch analysis, and so forth. These are all things that are
theoretically feasible under Linux and with a DBMS other than
Oracle, but they are significantly more difficult. Difficulty of
management matters.

On Wed, Apr 09, 2003 at 04:33:41PM -0000, greg@turnstep.com wrote:
> > 1. Doing so would overburden the City's Internet connection.
> The city exists to serve the people. If the bandwidth becomes an 
> issue (and I seriously doubt it will), then the city should upgrade 
> their connection. That's like arguing that new roads could overburden 
> the city's traffic, so they should not be built.

No, it's like arguing that running SEPTA lines like the R3 every ten
minutes rather than every hour would overburden the rails and the
infrastructure (architectural and human) to support the trains.

It's *possible* for the city to spend more on Internet service, but
where do you think it's written that it's their obligation to
provide this information to you at your convenience swallowing all
costs themselves? (And if your suggestion is that our taxes should
pay for this, then you need to take your argument to the people who
allocate tax funds.)

> > 2. Their license with Oracle Corporation restricted them to 50 named
> > users (i.e., not simultaneous users, but the same 50 people).
> Sounds like a very poor licensing decision.

It's not a decision, it's all Oracle will sell you these days.

> PostgreSQL would definitely be up to the job.

I strongly disagree.

How many rows are in the largest Postgres database you've dealt with?

How much data (byte-wise)?

What indexes?

> Support for PostgreSQL can be purchased from many companies. 

Name two.

(I know of exactly one.)

> Even after buying such support, the money saved from not using Oracle would 
> be quite substantial.

Could you compare the costs, including the labor time of development
to convert, please?

On Wed, Apr 09, 2003 at 01:00:13PM -0400, Chris Hedemark wrote:
> Sucks.  If it overburdens the city's internet connection, then so be 
> it.  

And what of non-public-informational uses of the Internet by the
city?

And what of the charge that, when their bandwidth is saturated,
they're again restricting access to the information?

> I think there are licensing options on Oracle for concurrent 
> connections rather than named users, so there should be some options 
> there too.

Not any more. We have such a license at work, Chris, but it is
IMPOSSIBLE to get those from Oracle now, and we're very careful to
pay Oracle bills on time to avoid their revoking it.

> Someone made some stupid decisions and fixing it might call 
> into question who & why the original bad decisions were made in the 
> first place.

Which decisions, precisely, do you think were stupid?

> If they can provide you with a temporary read-only account to their 
> Oracle server, and the database schema, you should be able to push that 
> to your own machine with no problem if you're sitting on their LAN.  

They can't do that, but I fail to see why they can't give him a SQL
dump.

> While there are some people here who would disagree, most of the 
> objections against PostgreSQL that I have heard are based off of flawed 
> evaluations.

I know precisely which evaluation you're referring to, Chris, and you
really are wrong, and Barry really is right.

He really did examine the ways in which Postgres manages its memory
usage (both on disk and in memory proper) and performs operations on
it, and it really does lose. The suggestion was never that Postgres
on a random Dell 1RU server should compete with Oracle on a
fully-populated E450, but that operations should happen at an
appropriate speed gauged by the relative processor, I/O, and memory
speed of the systems. They don't. (And Oracle running on identical
hardware under Linux wins too, btw. Well it used to... with the
stupid interactions we've seen between it and ext3 lately, who
knows.)

> Anyway, it's moot.  You just want the data, right?  They can give that 
> to you without burning an Oracle license and without burning their 
> bandwidth.

Agreed. And I think this is the route that's most likely to get Ed
the end result he wants.

-- 
gabriel rosenkoetter
gr@eclipsed.net
