Re: Function won't complete - Mailing list pgsql-sql

From Josh Berkus
Subject Re: Function won't complete
Date
Msg-id web-1386712@davinci.ethosmedia.com
Whole thread Raw
In response to Function won't complete  ("Josh Berkus" <josh@agliodbs.com>)
Responses Re: Function won't complete  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom,

> Uh, what PG version are you running?  7.1.3 or later should not have
> a
> problem with WAL bloat because of long transactions.

7.2.1.  The problem is not specifically the WAL bloat ... it's
large/many WAL files combined with single-IDE-disk-access.

To recap, because I think this experience might benefit other folks on
this list:

The setup:
1. I had a large, data-heavy system that, because of problems at the
client's site, I had to move to my development machine.
2. Unlike the client's server (which has dual-controller ultra SCSI
drives), my development machine has a single 30gb IDE drive.
3. I was debugging the process of moving data from the client's
immense, not-normalized, accounting database to a smaller, normalized,
backend database for a web interface.  
4. The accounting data consists primarily of six tables with 110,000,
900,000, 110,000, 800, 250,000, and 300 records.
5. This data was fed into a function that tests for irregular data and
flags it, fills in missing defaults, aggregates the data, and loads it
into 7 tables in the new database with a better relational structure
and more normalization controls.

The Problem:
The function never completed.  Even when left to a running time of 3.5
hours, it was still "thinking."

Analysis:
By upping the debug level and tailing the logs, I could see what was
happening.  Because all of the activity was taking place inside a
single function, the database had to be prepared to roll it all back as
a single transaction.  As a result, with each succeeding operation
within the function, the calls to the transaction log got longer and
slower.  On a dual-SCSI or RAID system, this probably would have finished OK
anyway.  But, on a consumer-grade single IDE drive, the system kept
suffering longer and longer waits for reading back data from the drive
(both the database and the WAL files) before it could proceed to the
next step.A check of vmstat confirmed this.  The CPU was not swamped (60-92%
activity) and the system RAM was barely touched (about 128 of 512 mb in
use).  However, there was a constant stream of disk I/O which hovered
around the limit for the drive.

The Solution
Since my system was not to be the permanent home of the application, I
didn't want to run out and spend $700 on drives and a SCSI controller.What to do?
1. The first step was to break up the single function into 6 functions
that write their results to a log.  Each succeeding function would then
check the log for the previous function's result before proceeding, and
run VACUUM ANALYZE between functions.
2. The second step was to check Explain on each of the significant data
steps.  Through this, I discovered three missing indexes (including one
column where I was doing a SELECT DISTINCT ON, a real
perfromance-killer if you don't have an index).  I created those
indexes.
3. I wrote a shell script to launch all 6 functions in succession.
4. The re-built operation now completes in about 35 minutes.

-Josh Berkus




______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


pgsql-sql by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_shadow password
Next
From: Tom Lane
Date:
Subject: Re: Function won't complete