Re: How can I make use of both CPUs in a dual processor - Mailing list pgsql-performance
From | Alex
---|---
Subject | Re: How can I make use of both CPUs in a dual processor
Date |
Msg-id | 420B5F22.2070407@meerkatsoft.com
In response to | Re: How can I make use of both CPUs in a dual processor (John A Meinel <john@arbash-meinel.com>)
List | pgsql-performance
Thanks for all the suggestions. It seems that creating the indices in
parallel, or even importing the data using COPY, is easy to implement. I
also have some jobs that create reports, and I want to see whether I gain
anything if I run the reports in parallel. I will give it a try in the
next week and let you know the results.

Alex

John A Meinel wrote:

> Alex wrote:
>
>> Thanks John.
>>
>> Well, as I mentioned, I have a dual AMD Opteron 64 2.4GHz, 15k rpm
>> SCSI disks, 4GB of memory.
>> Disks are pretty fast and memory should be more than enough.
>> Currently we don't have many concurrent connections.
>
> Well, you didn't mention Opteron before (it makes a difference against
> Xeons).
> How many disks and in what configuration?
> Do you have pg_xlog on a separate set of disks?
> Are your drives in RAID 10 (0+1) or RAID 5?
>
> If you have enough disks, the recommended configuration is at least a
> RAID 1 for the OS, RAID 10 for pg_xlog (4 drives), and RAID 10 (the
> rest of the drives) for the actual data.
>
> If your dataset is read heavy, or you have more than 6 disks, you can
> get away with RAID 5 for the actual data. But since you are talking
> about loading 8 million rows at once, it certainly sounds like you are
> write heavy.
>
> If you only have a few disks, it's still probably better to put
> pg_xlog on its own RAID 1 (2-drive) mirror. pg_xlog is pretty much
> append only, so if you dedicate a disk set to it, you eliminate a lot
> of seek time.
>
>> I run PG 8.0.1 on Fedora Core 3.
>>
>> When I now run the batch job, one CPU runs at 80-90%, the other at
>> 5-10% max.
>
> Anyway, it doesn't completely sound like you are CPU limited, but you
> might be able to get a little bit more if you spawn another process.
> Have you tried dropping the indexes, doing the copy, and then
> recreating the 4 indexes in separate processes?
>
> The simple test for this is to open 3-4 psql connections, have one of
> them drop the indexes and do the copy; in the other connections you
> can already have typed "CREATE INDEX ...", so when the copy is done
> and committed to the database, you just go to the other terminals and
> hit enter.
>
> Unfortunately you'll have to use wall clock time to see if this is
> faster.
>
> Though I think you could do the same thing with a bash script. The
> authentication should be in "trust" mode so that you don't take the
> time to type your password.
>
> #!/bin/bash
> psql -h <host> -c "DROP INDEX ...; COPY FROM ..."
>
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..."
>
> Now, I don't really know how to wait for all child processes in a bash
> script (I could give you the python for it, but you're a perl guy).
> But by not spawning the last INDEX, I'm hoping it takes longer than
> the rest. Try to put the most difficult index there.
>
> Then you could just run
>
> time loadscript.sh
>
> I'm sure you could do the equivalent in perl. Just open multiple
> connections to the DB, and have them ready.
>
> I'm guessing since you are on a dual processor machine, you won't get
> much better performance above 2 connections.
>
> You can also try doing 2 COPYs at the same time, but it seems like you
> would have issues. Do you have any serial columns that you expect to
> be in a certain order, or is all the information in the copy?
>
> If the latter, try it, let us know what you get. I can't tell you the
> perl for this, since I'm not a perl guy.
>
> John
> =:->
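For what it's worth, bash's built-in "wait" blocks until every background
job has exited, so the script does not have to rely on the last CREATE
INDEX being the slowest. A minimal sketch along the lines of John's
script; the <host>, DROP INDEX, COPY, and CREATE INDEX statements remain
placeholders to be filled in:

#!/bin/bash
# Drop the indexes and load the data first (runs in the foreground).
psql -h <host> -c "DROP INDEX ...; COPY FROM ..."

# Rebuild all four indexes concurrently, each on its own connection.
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &

# Block until all backgrounded psql processes have finished.
wait

Running it as "time loadscript.sh" then measures the whole load,
including the slowest index build.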
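John's idea of two simultaneous COPYs could look something like the
following. This is only a sketch, assuming the input is a flat file with
no ordering requirements; "data.csv", "mytable", and the chunk file names
are placeholders, and the split point would need tuning to the real file:

#!/bin/bash
# Split the input into two halves by line count (4 million lines each is
# an assumption for an 8-million-row load); produces chunk_aa and chunk_ab.
split -l 4000000 data.csv chunk_

# Load both halves in parallel over two connections, using psql's
# client-side \copy so the files need not be readable by the server.
psql -h <host> -c "\copy mytable FROM 'chunk_aa'" &
psql -h <host> -c "\copy mytable FROM 'chunk_ab'" &

# Wait for both COPYs to finish before rebuilding any indexes.
wait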