Re: New to the list; would this be an okay question? - Mailing list pgsql-general
From: Madison Kelly
Subject: Re: New to the list; would this be an okay question?
Date:
Msg-id: 40D85F11.6040104@alteeve.com
In response to: Re: New to the list; would this be an okay question? (Richard Huxton <dev@archonet.com>)
List: pgsql-general
Richard Huxton wrote:
> Madison Kelly wrote:
>> Richard Huxton wrote:
>>> Madison Kelly wrote:
>>>> Hi all,
>>>>
>>>>   I am new to the list and I didn't want to seem rude at all so I
>>>> wanted to ask if this was okay first.
>>>
>>> No problem. Reading your message below, you might want to try the
>>> performance list, but general is a good place to start.
>>>
>>>>   I have a program I have written in perl which uses a PostgreSQL
>>>> database as the backend. The program works but the performance is
>>>> really bad. I have been reading as much as I can on optimizing
>>>> performance but still it isn't very reasonable. At one point I had
>>>> my program able to process 175,000 records in 16min 10sec on a
>>>> Pentium3 650MHz, 448MB RAM test machine. Since then I got a Pentium3
>>>> 1GHz, 512MB system and I have tried a lot of things to get the
>>>> performance up but now it is substantially slower and I can't seem
>>>> to figure out what I am doing wrong.
>>>
>>> A few places to start:
>>> 1. VACUUM FULL
>>>    This will make sure any unused space is reclaimed.
>>> 2. ANALYZE
>>>    This will recalculate stats for the tables.
>>> 3. Basic performance tuning:
>>>    http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>>>    There's also a good guide to the postgresql.conf file on varlena.com.
>>>
>>>>   Would it be appropriate to ask for help on my program on this
>>>> list? Full disclosure: The program won't be initially GPL'ed because
>>>> it is for my company, but it will be released for free to home users
>>>> and the source code will be made available (similar to other
>>>> split-license programs), though once my company makes its money back
>>>> I think they will fully GPL it (I am on my boss's case about it :p ).
>>>
>>> No problem - what you licence your software under is your concern.
>>> Once you've taken the basic steps described above, try to pick out a
>>> specific query that you think is too slow and provide:
>>>
>>> 1. PostgreSQL version
>>> 2. Basic hardware info (as you have)
>>> 3. Sizes of tables
>>> 4. Output of EXPLAIN ANALYZE <query here>
>>>
>>> EXPLAIN ANALYZE runs the query and shows how much work PG thought it
>>> would be and how much it actually turned out to be.
>>>
>>> HTH
>>
>>   Thank you very much!! I am using PostgreSQL 7.4 on a stock install
>> of Fedora Core 2 on my IBM Thinkpad a22m (P3 1GHz, 512MB RAM, not the
>> fastest HDD). The drive carrier I am using is connected via USB2 and
>> uses a few different hard drives, with the fastest being a couple of
>> Barracuda 7200.7 drives (2MB cache, 7,200rpm). I described the program
>> in my reply to Martijn, so here is some of the code (code not related
>> to psql snipped; let me know if posting it would help - sorry for the
>> wrapping...):
>
> I'm not clear if the database is on the local disk or attached to the
> USB2. Not sure it's important, since neither will be that fast.
>
> If I understand, you scan thousands or millions of files for backup
> purposes and then issue a select + update/insert for each.
>
> Once a partition is scanned, a flag is cleared on all rows.
>
> Once all selected files have been dealt with, a vacuum/analyse is
> issued.
>
> Some things to look at:
> 1. How many files are you handling per second? Are the disks involved
>    in the backup as well as the database?
> 2. What does the output of "vmstat 10" show when the system is running?
>    Is your I/O saturated? CPU?
> 3. Is your main index (file_src_uuid, file_name, file_parent_dir) being
>    used? Your best bet is to select from "pg_stat_indexes" before and
>    after.
> 4. If you are updating several hundred thousand rows then you probably
>    don't have enough vacuum memory set aside - try a vacuum full after
>    each set of updates.
> 5. You might want to batch together queries into transactions of a few
>    hundred or even a few thousand updates.

Hi Richard,

  The database is on the system's local disk and the destination drives
are the USB-connected drives, though it is possible to include the
server's local disk in a backup job. Without the DB calls I was able to
process ~4,000 files in ~2 secs.

'vmstat 10' shows this with the system otherwise idle:

 r  b  swpd   free   buff  cache  si  so  bi  bo   in  cs us sy id wa
 0  0  3396 192356   7084 186508   0   0   0   5 1023  82  1  0 99  0

and this while the program is running:

 r  b  swpd   free   buff  cache  si  so  bi  bo   in  cs us sy id wa
 1  0   480 180252   8600 195412   0   0   6 163 1037 281 97  3  0  0

  I'm afraid that I am pretty new to postgres (and programming, period)
so I am not sure whether the indexes are being used. I assumed they
were, simply because I created them, but then again assumptions always
get me in trouble. :p When you say "select from "pg_stat_indexes"", do
you mean select those three fields from there instead of
'SELECT <blah> FROM file_dir'?
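  Just to make sure I understand, is something like this what you mean?
This is only a guess on my part - I couldn't find a view called
"pg_stat_indexes" on my 7.4 box, so I am assuming you meant
"pg_stat_user_indexes", and from the docs it looks like I may also need
"stats_start_collector" and "stats_row_level" turned on in
postgresql.conf before its counters move. Untested:

    # Guess at the index-usage check; $DB is my existing DBI handle and
    # 'file_dir' is my table. Run once before the program and once
    # after, to see whether idx_scan climbs.
    my $sth = $DB->prepare(q{
        SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
          FROM pg_stat_user_indexes
         WHERE relname = 'file_dir'
    });
    $sth->execute() || die $DBI::errstr;
    while ( my ($index, $scans, $read, $fetch) = $sth->fetchrow_array )
    {
        print "index: $index, idx_scan: $scans, ",
              "idx_tup_read: $read, idx_tup_fetch: $fetch\n";
    }

If idx_scan stays at zero while the program runs, does that mean my
queries are doing sequential scans instead?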
  I currently have "VACUUM ANALYZE" set to run after every big set of
UPDATEs/INSERTs, but that isn't enough? Would it also help if I kept a
count of how many records had been processed and ran
"VACUUM {ANALYZE|FULL}" every 'x' records?

  I have been trying to bunch jobs into a single transaction, but for
some reason whenever I try to do that Perl dies on me with an empty
"$DBI::errstr" error (it shows what line it died on, but it is blank
where it normally says what went wrong). I was trying to do it like
this:

    $DB->do("BEGIN TRANSACTION") || die $DBI::errstr;

    # Jump into the sub-routine:
    &list_files($real_dir, $exclude_list_num, $relative_dir,
                $file_backup, $file_restore, $file_display);

    $DB->do("COMMIT;") || die $DBI::errstr;

Inside the subroutine I keep track of how much time has gone by, commit
the current transaction every 250 secs, and start a new transaction
block:

    if ( $rec_num > $rec_disp )
    {
        $rec_disp    = ($rec_disp + 500);
        $nowtime     = time;
        $processtime = ($nowtime - $starttime);
        if ( $processtime >= $committime )
        {
            $committime = $committime + 250;
            $DB->do("COMMIT") || die $DBI::errstr;
            $DB->do("BEGIN TRANSACTION") || die $DBI::errstr;
            system 'echo " |- Committed processed records to the database, updating committime to \''.$committime.'\' and proceeding:" >> '.$log;
        }
        system 'echo " |- Processing record number \''.$rec_num.'\', \''.$processtime.'\' seconds into operation." >> '.$log;
    }

  Is what I am doing wrong obvious at all? When I enter "BEGIN
TRANSACTION" directly into 'psql' it seems to work fine, so I am fairly
sure the syntax is right. Is it a Perl problem maybe?

Thank you so much for your help!!

Madison
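P.S. - After writing the above I re-read the DBI perldoc, and I wonder
if the trouble is that I am issuing BEGIN/COMMIT myself while DBI is
still in AutoCommit mode. If I am reading the docs right, I should turn
AutoCommit off and let DBI send the BEGINs itself. This untested sketch
is what I would try (the connection values and the loop are stand-ins
for my real code):

    use DBI;

    # AutoCommit off: DBD::Pg sends the BEGIN for me, and after each
    # $DB->commit a new transaction starts with the next statement.
    # RaiseError on, so failures die with a real error message.
    my $DB = DBI->connect("dbi:Pg:dbname=my_database", $db_user, $db_pass,
                          { AutoCommit => 0, RaiseError => 1 })
        or die $DBI::errstr;

    my $count = 0;
    foreach my $file (@file_list)    # stand-in for my real scan loop
    {
        # ... the usual SELECT + UPDATE/INSERT for $file goes here ...

        $count++;
        $DB->commit if ( ($count % 1000) == 0 );   # batch of 1,000
    }
    $DB->commit;    # commit the final partial batch

Does that sound closer to what you meant by batching the updates?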