Thread: How can I make use of both CPUs in a dual processor machine

How can I make use of both CPUs in a dual processor machine

From
Alex
Date:
Hi,
we just got a new dual processor machine and I wonder if there is a way
to utilize both processors.

Our DB server is basically fully dedicated to postgres. (its a dual amd
with 4gb mem.)

I have a batch job that periodically loads about 8 million records into
a table.
for this I drop the indices, truncate the table, use the copy to insert
the data, recreate the indices (4 indices), vacuum the table.

That is all done through a perl batch job.

While I am doing this, I noticed that only one CPU is really used.

So here are my questions:

Is there a way to utilize both CPUs

Is it possible to split up the import file and run 2 copy processes

Is it possible to create 2 indices at the same time

Would I actually gain anything from that, or is the bottleneck somewhere
else ?

(perl is a given here for the batch job)

If anyone has some experience or ideas... any hints or help on this
would be appreciated.

Thanks
Alex


Re: How can I make use of both CPUs in a dual processor

From
John A Meinel
Date:
Alex wrote:

> Hi,
> we just got a new dual processor machine and I wonder if there is a
> way to utilize both processors.
>
> Our DB server is basically fully dedicated to postgres. (its a dual
> amd with 4gb mem.)
>
> I have a batch job that periodically loads about 8 million records
> into a table.
> for this I drop the indices, truncate the table, use the copy to
> insert the data, recreate the indices (4 indices), vacuum the table.
>
> That is all done through a perl batch job.
>
> While I am doing this, I noticed that only one CPU is really used.
>
> So here are my questions:
>
> Is there a way to utilize both CPUs
>
For postgres, you get a max of 1 CPU per connection, so to use both, you
need 2 CPU's.

> Is it possible to split up the import file and run 2 copy processes
>
> Is it possible to create 2 indices at the same time
>
You'd want to be a little careful. Postgres uses work_mem for vacuum and
index creation, so if you have 2 processes doing it, just make sure you
aren't running out of RAM and going to swap.

> Would I actually gain anything from that, or is the bottleneck
> somewhere else ?
>
More likely, the bottleneck would be disk I/O. Simply because it is
almost always disk I/O. However, without knowing your configuration, how
much CPU is used during the operation, etc, it's hard to say.

> (perl is a given here for the batch job)
>
> If anyone has some experience or ideas... any hints or help on this
> would be appreciated.
>
> Thanks
> Alex
>
Sorry I wasn't a lot of help. You should probably post your postgres
version, and more information about how much CPU load there is while
your load is running.

John
=:->


Attachment

Re: How can I make use of both CPUs in a dual processor

From
Alex
Date:
Thanks John.

Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm SCSI
Disks, 4GB of memory.
Disks are pretty fast and memory should be more than enough. Currently
we dont have many concurrent connections.

I run PG 8.0.1 on Fedora Core 3

When I now run the batch job, one CPU runs in the 80-90% the other in
5-10% max.






John A Meinel wrote:

> Alex wrote:
>
>> Hi,
>> we just got a new dual processor machine and I wonder if there is a
>> way to utilize both processors.
>>
>> Our DB server is basically fully dedicated to postgres. (its a dual
>> amd with 4gb mem.)
>>
>> I have a batch job that periodically loads about 8 million records
>> into a table.
>> for this I drop the indices, truncate the table, use the copy to
>> insert the data, recreate the indices (4 indices), vacuum the table.
>>
>> That is all done through a perl batch job.
>>
>> While I am doing this, I noticed that only one CPU is really used.
>>
>> So here are my questions:
>>
>> Is there a way to utilize both CPUs
>>
> For postgres, you get a max of 1 CPU per connection, so to use both,
> you need 2 CPU's.
>
>> Is it possible to split up the import file and run 2 copy processes
>>
>> Is it possible to create 2 indices at the same time
>>
> You'd want to be a little careful. Postgres uses work_mem for vacuum
> and index creation, so if you have 2 processes doing it, just make
> sure you aren't running out of RAM and going to swap.
>
>> Would I actually gain anything from that, or is the bottleneck
>> somewhere else ?
>>
> More likely, the bottleneck would be disk I/O. Simply because it is
> almost always disk I/O. However, without knowing your configuration,
> how much CPU is used during the operation, etc, it's hard to say.
>
>> (perl is a given here for the batch job)
>>
>> If anyone has some experience or ideas... any hints or help on this
>> would be appreciated.
>>
>> Thanks
>> Alex
>>
> Sorry I wasn't a lot of help. You should probably post your postgres
> version, and more information about how much CPU load there is while
> your load is running.
>
> John
> =:->
>



Re: How can I make use of both CPUs in a dual processor

From
John A Meinel
Date:
Alex wrote:

> Thanks John.
>
> Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm
> SCSI Disks, 4GB of memory.
> Disks are pretty fast and memory should be more than enough. Currently
> we dont have many concurrent connections.
>
Well, you didn't mention Opteron before (it makes a difference against
Xeons).
How many disks and in what configuration?
Do you have pg_xlog on a separate set of disks?
Are your drives in RAID 10 (0+1) or RAID 5?

If you have enough disks the recommended configuration is at least a
RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest
of the drives) for the actual data.

If your dataset is read heavy, or you have more than 6 disks, you can
get away with RAID 5 for the actual data. But since you are talking
about loading 8million rows at once, it certainly sounds like you are
write heavy.

If you only have a few disks, it's still probably better to put pg_xlog
on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much append only,
so if you dedicate a disk set to it, you eliminate a lot of seek times.

> I run PG 8.0.1 on Fedora Core 3
>
> When I now run the batch job, one CPU runs in the 80-90% the other in
> 5-10% max.


Anyway, it doesn't completely sound like you are CPU limited, but you
might be able to get a little bit more if you spawn another process.
Have you tried dropping the index, doing the copy, and then recreating
the 4-indexes in separate processes?

The simple test for this is to open 3-4 psql connections, have one of
them drop the indexes and do the copy, in the other connections you can
already have typed "CREATE INDEX ..." so when the copy is done and
committed to the database, you just go to the other terminals and hit enter.

Unfortunately you'll have to use wall clock time to see if this is faster.

Though I think you could do the same thing with a bash script. The
authentication should be in "trust" mode so that you don't take the time
to type your password.

#!/bin/bash
psql -h <host> -c "DROP INDEX ...; COPY FROM ..."

psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..."


Now, I don't really know how to wait for all child processes in a bash
script (I could give you the python for it, but you're a perl guy). But
by not spawning the last INDEX, I'm hoping it takes longer than the
rest. Try to put the most difficult index there.

Then you could just run

time loadscript.sh

I'm sure you could do the equivalent in perl. Just open multiple
connections to the DB, and have them ready.

I'm guessing since you are on a dual processor machine, you won't get
much better performance above 2 connections.

You can also try doing 2 COPYs at the same time, but it seems like you
would have issues. Do you have any serial columns that you expect to be
in a certain order, or is all the information in the copy?

If the latter, try it, let us know what you get. I can't tell you the
perl for this, since I'm not a perl guy.

John
=:->


Attachment

Re: How can I make use of both CPUs in a dual processor

From
"Iain"
Date:
You can wait for processes to finish as follows:

#launch 3 processes
sh -c './build_indexes1.sh'  & PID1=$!
sh -c './build_indexes2.sh'  & PID2=$!
sh -c './build_indexes3.sh'  & PID3=$!
# then
wait $PID1
wait $PID2
wait $PID3
#continue

My feeling is that doing so should generally reduce the overall processing
time, but if there are contention problems then it could conceivably get
much worse.

regards
Iain
----- Original Message -----
From: "Alex" <alex@meerkatsoft.com>
To: "John A Meinel" <john@arbash-meinel.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Thursday, February 10, 2005 12:00 AM
Subject: Re: [PERFORM] How can I make use of both CPUs in a dual processor


> Thanks John.
>
> Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm SCSI
> Disks, 4GB of memory.
> Disks are pretty fast and memory should be more than enough. Currently we
> dont have many concurrent connections.
>
> I run PG 8.0.1 on Fedora Core 3
>
> When I now run the batch job, one CPU runs in the 80-90% the other in
> 5-10% max.
>
>
>
>
>
>
> John A Meinel wrote:
>
>> Alex wrote:
>>
>>> Hi,
>>> we just got a new dual processor machine and I wonder if there is a way
>>> to utilize both processors.
>>>
>>> Our DB server is basically fully dedicated to postgres. (its a dual amd
>>> with 4gb mem.)
>>>
>>> I have a batch job that periodically loads about 8 million records into
>>> a table.
>>> for this I drop the indices, truncate the table, use the copy to insert
>>> the data, recreate the indices (4 indices), vacuum the table.
>>>
>>> That is all done through a perl batch job.
>>>
>>> While I am doing this, I noticed that only one CPU is really used.
>>>
>>> So here are my questions:
>>>
>>> Is there a way to utilize both CPUs
>>>
>> For postgres, you get a max of 1 CPU per connection, so to use both, you
>> need 2 CPU's.
>>
>>> Is it possible to split up the import file and run 2 copy processes
>>>
>>> Is it possible to create 2 indices at the same time
>>>
>> You'd want to be a little careful. Postgres uses work_mem for vacuum and
>> index creation, so if you have 2 processes doing it, just make sure you
>> aren't running out of RAM and going to swap.
>>
>>> Would I actually gain anything from that, or is the bottleneck somewhere
>>> else ?
>>>
>> More likely, the bottleneck would be disk I/O. Simply because it is
>> almost always disk I/O. However, without knowing your configuration, how
>> much CPU is used during the operation, etc, it's hard to say.
>>
>>> (perl is a given here for the batch job)
>>>
>>> If anyone has some experience or ideas... any hints or help on this
>>> would be appreciated.
>>>
>>> Thanks
>>> Alex
>>>
>> Sorry I wasn't a lot of help. You should probably post your postgres
>> version, and more information about how much CPU load there is while your
>> load is running.
>>
>> John
>> =:->
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


Re: How can I make use of both CPUs in a dual processor

From
Alex
Date:
Thanks for all the suggestions. It seems that creating indices, or even
import data using a copy is easy to implement. I also have some jobs
that create reports and want to try if I gain anything if i work reports
in parallel.

will give it a try in the next week and let you know the resuls.

Alex

John A Meinel wrote:

> Alex wrote:
>
>> Thanks John.
>>
>> Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm
>> SCSI Disks, 4GB of memory.
>> Disks are pretty fast and memory should be more than enough.
>> Currently we dont have many concurrent connections.
>>
> Well, you didn't mention Opteron before (it makes a difference against
> Xeons).
> How many disks and in what configuration?
> Do you have pg_xlog on a separate set of disks?
> Are your drives in RAID 10 (0+1) or RAID 5?
>
> If you have enough disks the recommended configuration is at least a
> RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest
> of the drives) for the actual data.
>
> If your dataset is read heavy, or you have more than 6 disks, you can
> get away with RAID 5 for the actual data. But since you are talking
> about loading 8million rows at once, it certainly sounds like you are
> write heavy.
>
> If you only have a few disks, it's still probably better to put
> pg_xlog on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much
> append only, so if you dedicate a disk set to it, you eliminate a lot
> of seek times.
>
>> I run PG 8.0.1 on Fedora Core 3
>>
>> When I now run the batch job, one CPU runs in the 80-90% the other in
>> 5-10% max.
>
>
>
> Anyway, it doesn't completely sound like you are CPU limited, but you
> might be able to get a little bit more if you spawn another process.
> Have you tried dropping the index, doing the copy, and then recreating
> the 4-indexes in separate processes?
>
> The simple test for this is to open 3-4 psql connections, have one of
> them drop the indexes and do the copy, in the other connections you
> can already have typed "CREATE INDEX ..." so when the copy is done and
> committed to the database, you just go to the other terminals and hit
> enter.
>
> Unfortunately you'll have to use wall clock time to see if this is
> faster.
>
> Though I think you could do the same thing with a bash script. The
> authentication should be in "trust" mode so that you don't take the
> time to type your password.
>
> #!/bin/bash
> psql -h <host> -c "DROP INDEX ...; COPY FROM ..."
>
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..."
>
>
> Now, I don't really know how to wait for all child processes in a bash
> script (I could give you the python for it, but you're a perl guy).
> But by not spawning the last INDEX, I'm hoping it takes longer than
> the rest. Try to put the most difficult index there.
>
> Then you could just run
>
> time loadscript.sh
>
> I'm sure you could do the equivalent in perl. Just open multiple
> connections to the DB, and have them ready.
>
> I'm guessing since you are on a dual processor machine, you won't get
> much better performance above 2 connections.
>
> You can also try doing 2 COPYs at the same time, but it seems like you
> would have issues. Do you have any serial columns that you expect to
> be in a certain order, or is all the information in the copy?
>
> If the latter, try it, let us know what you get. I can't tell you the
> perl for this, since I'm not a perl guy.
>
> John
> =:->
>