Re: Function won't complete - Mailing list pgsql-sql
From | Josh Berkus |
---|---|
Subject | Re: Function won't complete |
Date | |
Msg-id | web-1386712@davinci.ethosmedia.com Whole thread Raw |
In response to | Function won't complete ("Josh Berkus" <josh@agliodbs.com>) |
Responses |
Re: Function won't complete
|
List | pgsql-sql |
Tom, > Uh, what PG version are you running? 7.1.3 or later should not have > a > problem with WAL bloat because of long transactions. 7.2.1. The problem is not specifically the WAL bloat ... it's large/many WAL files combined with single-IDE-disk-access. To recap, because I think this experience might benefit other folks on this list: The setup: 1. I had a large, data-heavy system that, because of problems at the client's site, I had to move to my development machine. 2. Unlike the client's server (which has dual-controller ultra SCSI drives), my development machine has a single 30gb IDE drive. 3. I was debugging the process of moving data from the client's immense, not-normalized, accounting database to a smaller, normalized, backend database for a web interface. 4. The accounting data consists primarily of six tables with 110,000, 900,000, 110,000, 800, 250,000, and 300 records. 5. This data was fed into a function that tests for irregular data and flags it, fills in missing defaults, aggregates the data, and loads it into 7 tables in the new database with a better relational structure and more normalization controls. The Problem: The function never completed. Even when left to a running time of 3.5 hours, it was still "thinking." Analysis: By upping the debug level and tailing the logs, I could see what was happening. Because all of the activity was taking place inside a single function, the database had to be prepared to roll it all back as a single transaction. As a result, with each succeeding operation within the function, the calls to the transaction log got longer and slower. On a dual-SCSI or RAID system, this probably would have finished OK anyway. But, on a consumer-grade single IDE drive, the system kept suffering longer and longer waits for reading back data from the drive (both the database and the WAL files) before it could proceed to the next step.A check of vmstat confirmed this. The CPU was not swamped (60-92% activity) and the system RAM was barely touched (about 128 of 512 mb in use). However, there was a constant stream of disk I/O which hovered around the limit for the drive. The Solution Since my system was not to be the permanent home of the application, I didn't want to run out and spend $700 on drives and a SCSI controller.What to do? 1. The first step was to break up the single function into 6 functions that write their results to a log. Each succeeding function would then check the log for the previous function's result before proceeding, and run VACUUM ANALYZE between functions. 2. The second step was to check Explain on each of the significant data steps. Through this, I discovered three missing indexes (including one column where I was doing a SELECT DISTINCT ON, a real perfromance-killer if you don't have an index). I created those indexes. 3. I wrote a shell script to launch all 6 functions in succession. 4. The re-built operation now completes in about 35 minutes. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco