Re: New to the list; would this be an okay question? - Mailing list pgsql-general

From Madison Kelly
Subject Re: New to the list; would this be an okay question?
Msg-id 40D85F11.6040104@alteeve.com
In response to Re: New to the list; would this be an okay question?  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Richard Huxton wrote:
> Madison Kelly wrote:
>
>> Richard Huxton wrote:
>>
>>> Madison Kelly wrote:
>>>
>>>> Hi all,
>>>>
>>>>   I am new to the list and I didn't want to seem rude at all so I
>>>> wanted to ask if this was okay first.
>>>
>>> No problem. Reading your message below, you might want to try the
>>> performance list, but general is a good place to start.
>>>
>>>>   I have a program I have written in Perl which uses a PostgreSQL
>>>> database as the backend. The program works but the performance is
>>>> really bad. I have been reading as much as I can on optimizing
>>>> performance but still it isn't very reasonable. At one point I had
>>>> my program able to process 175,000 records in 16min 10sec on a
>>>> Pentium3 650MHz, 448MB RAM test machine. Since then I got a Pentium3
>>>> 1GHz, 512MB system and I have tried a lot of things to get the
>>>> performance up but now it is substantially slower and I can't seem
>>>> to figure out what I am doing wrong.
>>>
>>> A few places to start:
>>> 1. VACUUM FULL
>>>    This will make sure any unused space is reclaimed
>>> 2. ANALYZE
>>>    This will recalculate stats for the tables
>>> 3. Basic performance tuning:
>>>    http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>>>    There's also a good guide to the postgresql.conf file on varlena.com
>>>
>>>>   Would it be appropriate to ask for help on my program on this
>>>> list? Full disclosure: The program won't be initially GPL'ed because
>>>> it is for my company but it will be released for free to home users
>>>> and the source code will be made available (similar to other
>>>> split-license programs) though once my company makes its money back
>>>> I think they will fully GPL it (I am on my boss's case about it :p ).
>>>
>>> No problem - what you license your software under is your concern.
>>> Once you've taken the basic steps described above, try to pick out a
>>> specific query that you think is too slow and provide:
>>>
>>> 1. PostgreSQL version
>>> 2. Basic hardware info (as you have)
>>> 3. Sizes of tables.
>>> 4. Output of EXPLAIN ANALYZE <query here>
>>>
>>> The EXPLAIN ANALYZE runs the query and shows how much work PG thought
>>> it would be and how much it actually turned out to be.
>>>
>>> HTH
>>
>>     Thank you very much!! I am using PostgreSQL 7.4 on a stock install of
>> Fedora Core 2 on my IBM thinkpad a22m (P3 1GHz, 512MB RAM, not the
>> fastest HDD). The drive carrier I am using is connected via USB2 and
>> uses a few different hard drives with the fastest being a couple of
>> Barracuda 7200.7 drives (2MB cache, 7,200rpm). I described the program
>> in my reply to Martijn so here is some of the code (code not related
>> to psql snipped, let me know if posting it would help - sorry for the
>> wrapping...):
>
>
> I'm not clear if the database is on the local disk or attached to the
> USB2. Not sure it's important, since neither will be that fast.
>
> If I understand, you scan thousands or millions of files for backup
> purposes and then issue a select + update/insert for each.
>
> Once a partition is scanned, a flag is cleared on all rows.
>
> Once all selected files have been dealt with, a vacuum/analyze is issued.
>
>
> Some things to look at:
> 1. How many files are you handling per second? Are the disks involved in
> the backup as well as the database?
> 2. What does the output of "vmstat 10" show when the system is running?
> Is your I/O saturated? CPU?
> 3. Is your main index (file_src_uuid,file_name,file_parent_dir) being
> used? Your best bet is to select from "pg_stat_indexes" before and after.
> 4. If you are updating several hundred thousand rows then you probably
> don't have enough vacuum memory set aside - try a vacuum full after
> each set of updates.
> 5. You might want to batch together queries into transactions of a few
> hundred or even few thousand updates.

Hi Richard,

   The database is on the system's local disk and the backup destination
drives are connected via USB. It is possible to include the server's
local disk in a backup job. Without the DB calls I was able to process
~4,000 files in ~2 seconds.

   'vmstat 10' shows (system idle):
 r  b  swpd   free   buff  cache  si  so  bi   bo    in   cs  us  sy  id  wa
 0  0  3396 192356   7084 186508   0   0   0    5  1023   82   1   0  99   0

   'vmstat 10' shows (while the program is running):
 r  b  swpd   free   buff  cache  si  so  bi   bo    in   cs  us  sy  id  wa
 1  0   480 180252   8600 195412   0   0   6  163  1037  281  97   3   0   0

   I'm afraid I am pretty new to Postgres (and to programming, period),
so I am not sure whether the indexes are being used. I assumed they were,
simply because I created them, but then again assumptions always get me
in trouble. :p When you say "select from 'pg_stat_indexes'", do you mean
I should select those three fields from there instead of 'SELECT <blah>
FROM file_dir'?
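
Digging through the docs, the closest thing I could find is the
pg_stat_user_indexes view; is this the sort of query you mean? (The
table name is mine, but I am guessing at which columns matter.)

```sql
-- Show how often each index on file_dir has been used;
-- if idx_scan stays at 0, the index is never being touched.
SELECT indexrelname, idx_scan, idx_tup_read
  FROM pg_stat_user_indexes
 WHERE relname = 'file_dir';
```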

   I currently have "VACUUM ANALYZE" set to run after every big set of
UPDATEs/INSERTs, but that isn't enough? Would it also help if I kept a
count of how many records had been processed and ran "VACUUM
{ANALYZE|FULL}" every 'x' records?
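
Something like this is what I had in mind - though I just read that
VACUUM cannot run inside a transaction block, so I assume the COMMIT
would have to come first (the 10,000 is an arbitrary number):

```perl
# Rough sketch: commit the open transaction, vacuum, then begin a new one.
# VACUUM is not allowed inside a transaction block, so COMMIT must come first.
if ( ($rec_num % 10000) == 0 )
{
    $DB->do("COMMIT")                  || die $DBI::errstr;
    $DB->do("VACUUM ANALYZE file_dir") || die $DBI::errstr;
    $DB->do("BEGIN")                   || die $DBI::errstr;
}
```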

   I have been trying to batch jobs into a single transaction, but for
some reason whenever I try to do that Perl dies on me with an empty
"$DBI::errstr" error (it shows what line it died on, but it is blank
where it normally says what went wrong). I was trying to do it like this:

$DB->do("BEGIN TRANSACTION") || die $DBI::errstr;
# Jump into the sub-routine:
&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup,
            $file_restore, $file_display);
$DB->do("COMMIT") || die $DBI::errstr;

Inside the subroutine I keep a count of elapsed time so that every 250
seconds I can commit the current transaction and start a new transaction
block:

if ( $rec_num > $rec_disp )
{
    $rec_disp = ($rec_disp + 500);
    $nowtime = time;
    $processtime = ($nowtime - $starttime);
    if ( $processtime >= $committime )
    {
        $committime = $committime + 250;
        $DB->do("COMMIT") || die $DBI::errstr;
        $DB->do("BEGIN TRANSACTION") || die $DBI::errstr;
        system 'echo " |- Committed processed records to the database,
updating committime to \''.$committime.'\' and proceeding:" >> '.$log;
    }
    system 'echo " |- Processing record number \''.$rec_num.'\',
\''.$processtime.'\' seconds into operation." >> '.$log;
}
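
While reading the DBI docs I also wondered whether I should be turning
AutoCommit off when I connect and letting DBI manage the transactions,
instead of issuing BEGIN myself. Something like this (I am guessing at
the connect string for my setup):

```perl
use DBI;

# With AutoCommit off, DBI opens a transaction implicitly, and each
# commit() ends the current transaction and starts the next one.
my $DB = DBI->connect("dbi:Pg:dbname=backup", $db_user, $db_pass,
                      { AutoCommit => 0, RaiseError => 1 })
    or die $DBI::errstr;

&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup,
            $file_restore, $file_display);

$DB->commit;   # flush the whole batch of INSERT/UPDATEs in one transaction
```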


   Is what I am doing wrong obvious at all? When I enter "BEGIN
TRANSACTION" directly into 'psql' it seems to work fine, so I am fairly
sure the syntax is right. Is it a Perl problem, maybe?

   Thank you so much for your help!!

Madison
