Re: How can I make use of both CPUs in a dual processor - Mailing list pgsql-performance

From John A Meinel
Subject Re: How can I make use of both CPUs in a dual processor
Date
Msg-id 420A2933.6060100@arbash-meinel.com
Whole thread Raw
In response to Re: How can I make use of both CPUs in a dual processor  (Alex <alex@meerkatsoft.com>)
Responses Re: How can I make use of both CPUs in a dual processor
List pgsql-performance
Alex wrote:

> Thanks John.
>
> Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm
> SCSI Disks, 4GB of memory.
> Disks are pretty fast and memory should be more than enough. Currently
> we dont have many concurrent connections.
>
Well, you didn't mention Opteron before (it makes a difference against
Xeons).
How many disks and in what configuration?
Do you have pg_xlog on a separate set of disks?
Are your drives in RAID 10 (0+1) or RAID 5?

If you have enough disks the recommended configuration is at least a
RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest
of the drives) for the actual data.

If your dataset is read heavy, or you have more than 6 disks, you can
get away with RAID 5 for the actual data. But since you are talking
about loading 8million rows at once, it certainly sounds like you are
write heavy.

If you only have a few disks, it's still probably better to put pg_xlog
on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much append only,
so if you dedicate a disk set to it, you eliminate a lot of seek times.

> I run PG 8.0.1 on Fedora Core 3
>
> When I now run the batch job, one CPU runs in the 80-90% the other in
> 5-10% max.


Anyway, it doesn't completely sound like you are CPU limited, but you
might be able to get a little bit more if you spawn another process.
Have you tried dropping the index, doing the copy, and then recreating
the 4-indexes in separate processes?

The simple test for this is to open 3-4 psql connections, have one of
them drop the indexes and do the copy, in the other connections you can
already have typed "CREATE INDEX ..." so when the copy is done and
committed to the database, you just go to the other terminals and hit enter.

Unfortunately you'll have to use wall clock time to see if this is faster.

Though I think you could do the same thing with a bash script. The
authentication should be in "trust" mode so that you don't take the time
to type your password.

#!/bin/bash
psql -h <host> -c "DROP INDEX ...; COPY FROM ..."

psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..."


Now, I don't really know how to wait for all child processes in a bash
script (I could give you the python for it, but you're a perl guy). But
by not spawning the last INDEX, I'm hoping it takes longer than the
rest. Try to put the most difficult index there.

Then you could just run

time loadscript.sh

I'm sure you could do the equivalent in perl. Just open multiple
connections to the DB, and have them ready.

I'm guessing since you are on a dual processor machine, you won't get
much better performance above 2 connections.

You can also try doing 2 COPYs at the same time, but it seems like you
would have issues. Do you have any serial columns that you expect to be
in a certain order, or is all the information in the copy?

If the latter, try it, let us know what you get. I can't tell you the
perl for this, since I'm not a perl guy.

John
=:->


Attachment

pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: How can I make use of both CPUs in a dual processor
Next
From: Silke Trissl
Date:
Subject: Tell postgres which index to use?