Home > mailing lists

Performance on large data transformations - Mailing list pgsql-performance

From	Josh Berkus
Subject	Performance on large data transformations
Date	March 17, 2003 15:38:41
Msg-id	200303170938.38365.josh@agliodbs.com Whole thread Raw
Responses	Re: Performance on large data transformations ("Ron Mayer" <ron@intervideo.com>)
List	pgsql-performance

Tree view

Folks,

On one database, I have an overnight data transformation procedure that goes
like:

TableA has about 125,000 records.

Begin Transaction:
1) Update 80% of records in TableA
2) Update 10% of records in TableA
3) Update 65% of records in TableA
4) Update 55% of records in TableA
5) Update 15% or records in TableA with references to other records in TableA
6) Flag what hasn't been updated.
Commit

I've found that, no matter what my FSM settings (I've gone as high as
1,000,000) by the time I reach step 4 execution has slowed down considerably,
and for step 5 it takes the server more than 10 minutes to complete the
update statement. During this period, CPU, RAM and disk I/O are almost idle
... the system seems to spend all of its time doing lengthy seeks.  There is,
for that matter, no kernel swap activity, but I'm not sure how to measure
Postgres temp file activity.

(FYI:  Dual Athalon 1600mhz/1gb/Hardware Raid 1 with xlog on seperate SCSI
drive/Red Hat Linux 8.0/PostgreSQL 7.2.4)

The only way around this I've found is to break up the above into seperate
transactions with VACUUMs in between, and "simulate" a transaction by making
a back-up copy of the table and restoring from it if something goes wrong.
I've tried enough different methods to be reasonably certain that there is no
way around this in 7.2.4.

The reason I bring this up is that PostgreSQL's dramatic plunge in performance
in large serial updates is really problematic for us in the OLAP database
market, where large data transformations, as well as extensive use of
calculated temporary tables, is common.    I was particularly distressed when
I had to tell a client considering switching from MSSQL to Postgres for an
OLAP database that they might just be trading one set of problems for
another.

Is there any way we can improve on this kind of operation in future versions
of PostgreSQL?

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

From: Chris Sutton
Date: 17 March 2003, 15:12:32
Subject: Re: postgresql meltdown on PlanetMath.org

From: Max Baker
Date: 17 March 2003, 16:33:38
Subject: Re: postgresql meltdown on PlanetMath.org

Performance on large data transformations - Mailing list pgsql-performance

Previous

Next