Re: Complex database for testing, U.S. Census Tiger/UA - Mailing list pgsql-hackers

From cbbrowne@cbbrowne.com
Subject Re: Complex database for testing, U.S. Census Tiger/UA
Date
Msg-id 20030408185842.CC2013E65C@cbbrowne.com
In response to Re: Complex database for testing, U.S. Census Tiger/UA  (Dustin Sallings <dustin@spy.net>)
List pgsql-hackers
Dustin Sallings wrote:
>     I think it was my first application I wrote in python which parsed
> the zip files containing these data and shoved it into a postgres system.
> I had multiple clients on four or five computers running nonstop for about
> two weeks to get it all populated.
> 
>     By the time I was done, and got my first index created, I began to
> run out of disk space.  I think I only had about 70GB to work with on the
> RAID array.

But this does not establish that this data represents a meaningful
"transactional" load.

Based on the sources, which presumably involve unique data, the
"transactions" are all touching independent sets of data, and are likely
to be totally uninteresting from the perspective of seeing how the
system works under /TRANSACTION/ load.
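
To illustrate (the table and column names below are made up, just a
sketch): a bulk load like that has each loader session inserting its own
disjoint rows, so the sessions never block one another:

    -- hypothetical TIGER-ish table; every loader inserts distinct keys
    CREATE TABLE tiger_lines (
        tlid   bigint PRIMARY KEY,
        state  char(2),
        name   text
    );

    -- loader session 1:
    INSERT INTO tiger_lines VALUES (1001, 'CA', 'Main St');
    -- loader session 2, running concurrently, touches different keys,
    -- so there is no lock contention at all:
    INSERT INTO tiger_lines VALUES (2001, 'TX', 'Elm St');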

TRANSACTION loading will involve doing updates that actually have some
opportunity to trample on one another.  Multiple transactions
concurrently updating a single balance table.  Multiple transactions
concurrently trying to attach links to a table entry.  That sort of
thing.
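
Something like the following, say, with a made-up accounts table -- two
sessions genuinely colliding on the same row:

    -- hypothetical balance table
    CREATE TABLE accounts (
        account_id  integer PRIMARY KEY,
        balance     numeric NOT NULL
    );

    -- session 1:
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;

    -- session 2, at the same time, blocks on session 1's row lock:
    BEGIN;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 42;

    -- session 1:
    COMMIT;  -- only now does session 2's UPDATE proceed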

I remember a while back when MSFT did an "enterprise scalability day,"
where they were trumpeting SQL Server performance on "hundreds of
millions of transactions."  At the time, I was at Sabre, who actually do
tens of millions of transactions per day, for passenger reservations
across lots of airlines.  Microsoft was making loud noises to the effect
that NT Server was wonderful for "enterprise transaction" work; the guys
at work just laughed, because the kind of performance they got involved
considerable amounts of 370 assembler to tune vital bits of the
systems.

What happened in the "scalability tests" was that Microsoft did much the
same thing you did; they had hordes of transactions going through that
were, well, basically independent of one another.  They could "scale"
things up trivially by adding extra boxes.  Need to handle 10x the
transactions?  Well, since they don't actually modify any shared
resources, you just need to put in 10x as many servers.

And that's essentially what happens any time TPC-? benchmarks reach the
point of irrelevance; that happens every time someone figures out some
"hack" that is able to successfully partition the work load.  At that
point, they merely need to add a bit of extra hardware, and increasing
performance is as easy as adding extra processor boards.  The real world
doesn't scale so easily...
--
(concatenate 'string "cbbrowne" "@acm.org")
http://cbbrowne.com/info/emacs.html
Send  messages calling for fonts  not  available to the  recipient(s).
This can (in the case of Zmail) totally disable the user's machine and
mail system for up to a whole day in some circumstances.
-- from the Symbolics Guidelines for Sending Mail


