Thread: Function won't complete
Folks, I am currently testing a massive data update function, which takes a large amount of raw data (in 3 tables comprising over 2 million records total) and organizes and cleans that data in a series of successive steps for the final relational database.

The problem is, it doesn't complete. About 1/3 of the way into the data updates, postgres gets stuck in an endless cycle of recycling transaction logs and child processes and won't finish (current longest abort time is 37 minutes).

I've upped the various memory parameters significantly ... working with wal_files 32, wal_buffers 32, sort_mem 65536, shared_buffers 256, etc. The CPU (1400 MHz Athlon) and memory (512 MB DDR RAM) are hardly being taxed. The only hardware bottleneck I can see is that the system only has one IDE drive (though a fast, large one). I've added many indexes.

Does anyone have any suggestions on making this process complete, short of running out and buying a RAID SCSI array?

-Josh Berkus
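For context, the settings mentioned above would correspond roughly to the following postgresql.conf entries (parameter names as they existed in the 7.2 series). The values are simply the ones quoted in the message, shown for illustration rather than as recommendations; note that shared_buffers is counted in 8 kB buffers and sort_mem in kB:

    # approximate postgresql.conf excerpt matching the settings described above
    wal_files = 32          # pre-allocated WAL segment files
    wal_buffers = 32        # WAL disk-page buffers in shared memory
    sort_mem = 65536        # per-sort memory, in kB (64 MB)
    shared_buffers = 256    # shared buffer cache, in 8 kB buffers (2 MB)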
"Josh Berkus" <josh@agliodbs.com> writes: > I am currently testing a massive data update function, which takes a > large amount of raw data (in 3 tables comprising over 2 million records > total) and organizes and cleans that data in a series of successive > steps for the final relational database. > The problem is, it doesn't complete. About 1/3 of the way in to the > data updates, postgres gets stuck in an endless cycle of recycling > transaction logs and child processes and won't finish (current longest > abort time is 37 minutes). You sure it's not just a programming error (the proverbial unconstrained join, for example)? Without seeing the query(s) I doubt we can help you much. regards, tom lane
Tom,

> Uh, what PG version are you running? 7.1.3 or later should not have a
> problem with WAL bloat because of long transactions.

7.2.1. The problem is not specifically the WAL bloat ... it's large/many WAL files combined with single-IDE-disk access.

To recap, because I think this experience might benefit other folks on this list:

The setup:
1. I had a large, data-heavy system that, because of problems at the client's site, I had to move to my development machine.
2. Unlike the client's server (which has dual-controller Ultra SCSI drives), my development machine has a single 30 GB IDE drive.
3. I was debugging the process of moving data from the client's immense, non-normalized accounting database to a smaller, normalized backend database for a web interface.
4. The accounting data consists primarily of six tables with 110,000, 900,000, 110,000, 800, 250,000, and 300 records.
5. This data was fed into a function that tests for irregular data and flags it, fills in missing defaults, aggregates the data, and loads it into 7 tables in the new database with a better relational structure and more normalization controls.

The Problem:
The function never completed. Even when left to a running time of 3.5 hours, it was still "thinking."

Analysis:
By upping the debug level and tailing the logs, I could see what was happening. Because all of the activity was taking place inside a single function, the database had to be prepared to roll it all back as a single transaction. As a result, with each succeeding operation within the function, the calls to the transaction log got longer and slower. On a dual-SCSI or RAID system, this probably would have finished OK anyway. But, on a consumer-grade single IDE drive, the system kept suffering longer and longer waits for reading back data from the drive (both the database and the WAL files) before it could proceed to the next step.

A check of vmstat confirmed this. The CPU was not swamped (60-92% activity) and the system RAM was barely touched (about 128 of 512 MB in use). However, there was a constant stream of disk I/O which hovered around the limit for the drive.

The Solution:
Since my system was not to be the permanent home of the application, I didn't want to run out and spend $700 on drives and a SCSI controller. What to do?
1. The first step was to break up the single function into 6 functions that write their results to a log (sketched below). Each succeeding function would then check the log for the previous function's result before proceeding, and run VACUUM ANALYZE between functions.
2. The second step was to check EXPLAIN on each of the significant data steps. Through this, I discovered three missing indexes (including one column where I was doing a SELECT DISTINCT ON, a real performance-killer if you don't have an index). I created those indexes.
3. I wrote a shell script to launch all 6 functions in succession.
4. The re-built operation now completes in about 35 minutes.

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology      josh@agliodbs.com
and data management solutions        (415) 565-7293
for law firms, small businesses      fax 621-2533
and non-profit organizations.        San Francisco
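To make the shape of that fix concrete, here is a minimal sketch of the step-log pattern described above. It is illustrative only: the table, function, and column names (step_log, load_step_N, raw_transactions, account_id, trans_date) are hypothetical, the actual transformation logic is omitted, and modern dollar-quoting is used for readability (7.2 would require the function body as a quoted string). The driver is shown as plain SQL fed to psql from a shell script, since VACUUM cannot run inside a function.

    -- A log table recording the outcome of each step.
    CREATE TABLE step_log (
        step_no     integer PRIMARY KEY,
        status      text NOT NULL,              -- 'ok' or 'failed'
        finished_at timestamp DEFAULT now()
    );

    -- One of the six step functions: it refuses to run unless the
    -- previous step logged success, does its own chunk of work, and
    -- records its result.
    CREATE FUNCTION load_step_2() RETURNS integer AS $$
    BEGIN
        PERFORM 1 FROM step_log WHERE step_no = 1 AND status = 'ok';
        IF NOT FOUND THEN
            RAISE EXCEPTION 'step 1 has not completed successfully';
        END IF;

        -- ... the actual cleanup/aggregation work for this step ...

        INSERT INTO step_log (step_no, status) VALUES (2, 'ok');
        RETURN 1;
    END;
    $$ LANGUAGE plpgsql;

    -- Example of an index supporting a SELECT DISTINCT ON (...) step;
    -- without it the planner must sort the whole table on every call.
    CREATE INDEX idx_raw_acct ON raw_transactions (account_id, trans_date);

    -- Driver, run from a shell script via psql: each step commits on its
    -- own, and VACUUM ANALYZE clears dead tuples between steps.
    SELECT load_step_1();
    VACUUM ANALYZE;
    SELECT load_step_2();
    VACUUM ANALYZE;
    -- ... and so on through load_step_6()

Running each step as its own top-level statement is what makes the intermediate VACUUMs possible: while everything lives inside one function, it is one transaction, and the dead tuples it generates cannot be reclaimed until that transaction ends.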
"Josh Berkus" <josh@agliodbs.com> writes: > Analysis: > By upping the debug level and tailing the logs, I could see what was > happening. Because all of the activity was taking place inside a > single function, the database had to be prepared to roll it all back as > a single transaction. As a result, with each succeeding operation > within the function, the calls to the transaction log got longer and > slower. This analysis is nonsense, because PG does not rely on WAL for transaction rollback, and the amount of WAL activity is *not* proportional to transaction length. (At least not since 7.1.2.) It might be that the real issue is growth of the list of pending triggers, if you have deferred triggers (eg RI triggers) on the relations you are updating. Or it could be something else; since you say the thing remains I/O-bound, it seems like we must be talking about tuple access or updates somewhere. (If VACUUM between steps helps, it might just be accumulation of dead tuples.) However, until you drop your focus on the WAL we'll not find out what's really the bottleneck... regards, tom lane
Tom,

> This analysis is nonsense, because PG does not rely on WAL for
> transaction rollback, and the amount of WAL activity is *not*
> proportional to transaction length. (At least not since 7.1.2.)
<snip>
> However, until you drop your focus on the WAL we'll not find out
> what's really the bottleneck...

Sorry, Tom. I was used to the problems of 7.1.2, and didn't really "get it" when you told me things had changed.

I still say it's the disk I/O, and I think your explanation of dead tuples makes a lot of sense. The debug log is full of this:

DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 11933) exited with exit code 0
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 11939) exited with exit code 0

And each one of these cycles takes about 5-10 minutes.

I'm a little reluctant to dump everything to the list, as we're talking about a lot of data and code. Lemme do some judicious editing and I'll send you a gzip package this week.

-Josh Berkus
Anyone know of a utility to convert a Sybase .db file to something postgres can use? If not, one that converts it to any format whose layout I know, so I can convert that into postgres myself. All preferably open-source.

Thanks.
Frank-

Try this link:
http://pgadmin.postgresql.org/pgadmin2.php?ContentID=15

You might find some other clues in the techdocs:
http://techdocs.postgresql.org/

-Nick

--------------------------------------------------------------------------
Nick Fankhauser   nickf@ontko.com   Phone 1.765.935.4283   Fax 1.765.962.9788
Ray Ontko & Co.      Software Consulting Services      http://www.ontko.com/

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Frank Morton
> Sent: Monday, April 29, 2002 6:09 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] convert from sybase to postgresql
>
> Anyone know of a utility to convert a Sybase .db file to something
> postgres can use? If not, one that converts it to any format whose
> layout I know, so I can convert that into postgres myself. All
> preferably open-source.
>
> Thanks.
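If no ready-made converter fits, one common fallback is to export each Sybase table to delimited text (for example with Sybase's bcp utility) and load the files with PostgreSQL's COPY. A minimal sketch, where the table name, file path, and delimiter are placeholders for illustration:

    -- Load a pipe-delimited export into a matching PostgreSQL table.
    -- Assumes the target table has already been created with columns in
    -- the same order as the exported file.
    COPY customers FROM '/tmp/customers.dat' USING DELIMITERS '|';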