Thread: How can I make use of both CPUs in a dual processor machine
Hi,

we just got a new dual processor machine and I wonder if there is a way to utilize both processors. Our DB server is basically fully dedicated to Postgres (it's a dual AMD with 4 GB mem).

I have a batch job that periodically loads about 8 million records into a table. For this I drop the indices, truncate the table, use COPY to insert the data, recreate the indices (4 indices), and vacuum the table. That is all done through a Perl batch job.

While I am doing this, I noticed that only one CPU is really used. So here are my questions:

Is there a way to utilize both CPUs?
Is it possible to split up the import file and run 2 COPY processes?
Is it possible to create 2 indices at the same time?
Would I actually gain anything from that, or is the bottleneck somewhere else? (Perl is a given here for the batch job.)

If anyone has some experience or ideas... any hints or help on this would be appreciated.

Thanks
Alex
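For illustration, the single-connection sequence Alex describes might look roughly like the following psql script. The table (bulk_data), column and index names, host, and file path are all invented here; COPY ... FROM 'file' also assumes the file is readable on the server and a superuser connection.

#!/bin/bash
# Hypothetical sketch of the current, serial load (all object names and paths invented).
psql -h dbhost mydb <<'SQL'
DROP INDEX idx_bulk_a;
DROP INDEX idx_bulk_b;
DROP INDEX idx_bulk_c;
DROP INDEX idx_bulk_d;
TRUNCATE bulk_data;
COPY bulk_data FROM '/data/import/bulk_data.txt';
CREATE INDEX idx_bulk_a ON bulk_data (col_a);
CREATE INDEX idx_bulk_b ON bulk_data (col_b);
CREATE INDEX idx_bulk_c ON bulk_data (col_c);
CREATE INDEX idx_bulk_d ON bulk_data (col_d);
VACUUM ANALYZE bulk_data;
SQL

Everything here runs in a single backend, which is why only one CPU stays busy.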
Alex wrote:

> Is there a way to utilize both CPUs?

For postgres, you get a max of 1 CPU per connection, so to use both you need 2 connections.

> Is it possible to split up the import file and run 2 copy processes?
>
> Is it possible to create 2 indices at the same time?

You'd want to be a little careful. Postgres uses maintenance_work_mem for vacuum and index creation, so if you have 2 processes doing it, just make sure you aren't running out of RAM and going to swap.

> Would I actually gain anything from that, or is the bottleneck somewhere else?

More likely, the bottleneck would be disk I/O, simply because it is almost always disk I/O. However, without knowing your configuration, how much CPU is used during the operation, etc., it's hard to say.

Sorry I wasn't a lot of help. You should probably post your postgres version, and more information about how much CPU load there is while your load is running.

John
=:->
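On the memory caution: maintenance_work_mem (the setting index creation and vacuum draw from in 8.0) can be set per session, so each parallel index build can be given an explicit budget whose sum stays under the available RAM. A minimal sketch, with invented host, table, and index names and a hypothetical 512 MB budget per session (8.0 takes the value in kilobytes):

#!/bin/bash
# Two index builds in parallel, each with its own maintenance_work_mem budget.
# Host, database, table, column, and index names are all invented for illustration.
psql -h dbhost mydb -c "SET maintenance_work_mem = 524288; CREATE INDEX idx_bulk_a ON bulk_data (col_a);" &
psql -h dbhost mydb -c "SET maintenance_work_mem = 524288; CREATE INDEX idx_bulk_b ON bulk_data (col_b);" &
wait    # block until both background psql processes have finished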
Thanks John.

Well as I mentioned, I have a dual AMD Opteron 64 2.4 GHz, 15k rpm SCSI disks, 4 GB of memory. Disks are pretty fast and memory should be more than enough. Currently we don't have many concurrent connections.

I run PG 8.0.1 on Fedora Core 3.

When I now run the batch job, one CPU runs at 80-90%, the other at 5-10% max.
Alex wrote:

> Well as I mentioned, I have a dual AMD Opteron 64 2.4 GHz, 15k rpm SCSI
> disks, 4 GB of memory. Disks are pretty fast and memory should be more
> than enough. Currently we don't have many concurrent connections.

Well, you didn't mention Opteron before (it makes a difference against Xeons).
How many disks and in what configuration?
Do you have pg_xlog on a separate set of disks?
Are your drives in RAID 10 (0+1) or RAID 5?

If you have enough disks, the recommended configuration is at least a RAID 1 for the OS, RAID 10 for pg_xlog (4 drives), and RAID 10 (the rest of the drives) for the actual data.

If your dataset is read heavy, or you have more than 6 disks, you can get away with RAID 5 for the actual data. But since you are talking about loading 8 million rows at once, it certainly sounds like you are write heavy.

If you only have a few disks, it's still probably better to put pg_xlog on its own RAID 1 (2-drive) mirror. pg_xlog is pretty much append only, so if you dedicate a disk set to it, you eliminate a lot of seek time.

> I run PG 8.0.1 on Fedora Core 3.
>
> When I now run the batch job, one CPU runs at 80-90%, the other at
> 5-10% max.

Anyway, it doesn't completely sound like you are CPU limited, but you might be able to get a little bit more if you spawn another process. Have you tried dropping the indexes, doing the copy, and then recreating the 4 indexes in separate processes?

The simple test for this is to open 3-4 psql connections, have one of them drop the indexes and do the copy; in the other connections you can already have typed "CREATE INDEX ..." so when the copy is done and committed to the database, you just go to the other terminals and hit enter.

Unfortunately you'll have to use wall clock time to see if this is faster.

Though I think you could do the same thing with a bash script. The authentication should be in "trust" mode so that you don't take the time to type your password.

#!/bin/bash
psql -h <host> -c "DROP INDEX ...; COPY FROM ..."

psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..."

Now, I don't really know how to wait for all child processes in a bash script (I could give you the python for it, but you're a perl guy). But by not spawning the last INDEX, I'm hoping it takes longer than the rest. Try to put the most difficult index there.

Then you could just run

time loadscript.sh

I'm sure you could do the equivalent in perl. Just open multiple connections to the DB, and have them ready.

I'm guessing since you are on a dual processor machine, you won't get much better performance above 2 connections.

You can also try doing 2 COPYs at the same time, but it seems like you would have issues. Do you have any serial columns that you expect to be in a certain order, or is all the information in the copy?

If the latter, try it, let us know what you get. I can't tell you the perl for this, since I'm not a perl guy.

John
=:->
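If Alex wants to try the two-COPY idea, one way to sketch it is to split the import file in the shell and feed each half to its own psql \copy. The file paths, table name, and host below are placeholders, and this assumes the rows carry all their data (no serial column whose order matters):

#!/bin/bash
# Split the import file roughly in half by line count, then load both halves in parallel.
# Paths, host, and table name are invented for illustration.
LINES=$(wc -l < /data/import/bulk_data.txt)
split -l $(( (LINES + 1) / 2 )) /data/import/bulk_data.txt /tmp/bulk_part_

psql -h dbhost mydb -c "\copy bulk_data from '/tmp/bulk_part_aa'" &
psql -h dbhost mydb -c "\copy bulk_data from '/tmp/bulk_part_ab'" &
wait    # both COPYs must finish before the indexes are rebuilt

Using \copy keeps the file reads on the client side; whether two writers actually beat one will depend on the disk layout discussed above.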
You can wait for processes to finish as follows:

#launch 3 processes
sh -c './build_indexes1.sh' &
PID1=$!
sh -c './build_indexes2.sh' &
PID2=$!
sh -c './build_indexes3.sh' &
PID3=$!

# then wait
wait $PID1
wait $PID2
wait $PID3

#continue

My feeling is that doing so should generally reduce the overall processing time, but if there are contention problems then it could conceivably get much worse.

regards
Iain
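Putting John's script and Iain's wait pattern together, the whole load could look something like the sketch below; a bare wait with no arguments also works and blocks until every background child has exited. All object names, the host, and the file path are invented:

#!/bin/bash
# Hypothetical end-to-end loadscript.sh: serial drop/truncate/COPY, parallel index builds.
PSQL="psql -h dbhost mydb"

# Serial part: drop the indexes and reload the table.
$PSQL -c "DROP INDEX idx_bulk_a; DROP INDEX idx_bulk_b; DROP INDEX idx_bulk_c; DROP INDEX idx_bulk_d;"
$PSQL -c "TRUNCATE bulk_data; COPY bulk_data FROM '/data/import/bulk_data.txt';"

# Parallel part: rebuild all four indexes at once, then wait for every child.
$PSQL -c "CREATE INDEX idx_bulk_a ON bulk_data (col_a);" &
$PSQL -c "CREATE INDEX idx_bulk_b ON bulk_data (col_b);" &
$PSQL -c "CREATE INDEX idx_bulk_c ON bulk_data (col_c);" &
$PSQL -c "CREATE INDEX idx_bulk_d ON bulk_data (col_d);" &
wait    # bare wait blocks until all background jobs are done

$PSQL -c "VACUUM ANALYZE bulk_data;"

Timing it with "time ./loadscript.sh" and comparing against the serial version is the easiest way to see whether the parallel index builds actually help on this box.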
Thanks for all the suggestions. It seems that creating the indices in parallel, or even importing the data with two COPY processes, is easy to implement. I also have some jobs that create reports, and I want to try whether I gain anything if I run the reports in parallel.

Will give it a try in the next week and let you know the results.

Alex