Thread: Performance tuning in PostgreSQL
Hello everybody, Can someone tell me the various aspects of Hardware for performance tuning in PostgreSQL 7.3 on a RHL 8 server. How can I determine the required hardware configuration(CPUs,RAM,Shared Buffer Space,OS TCP/IP ports) for n number of connections or n number of users accessing the database? Please help. Thanks, Best Regards. Abhishek Sharma,
> Can someone tell me the various aspects of Hardware for performance tuning > in PostgreSQL 7.3 on a RHL 8 server. > > How can I determine the required hardware configuration(CPUs,RAM,Shared > Buffer Space,OS TCP/IP ports) for n number of connections or n number of > users accessing the database? The bad news is there's only one way and that's to test. Without knowing how much data you have, in what tables and how you are querying/updating it you really can't say much useful. A few points though. CPU: PostgreSQL will use multiple CPUs but one query runs on only one CPU. RAM: more is better, pretty much always Disk: fast and see the archives for details on placing the WAL on its own disk. Tuning: there's a performance mailing list, and some stuff on techdocs.postgresql.org Hope that's some use - Richard Huxton
Hi, Tx for your reply. Is there any way to determine the size of a process per connection. What is it dependent on ? What variables affect the size of a process by a user accessing a table in the database ? Postgresql can be configured with a --max-backends options which means the no. of connections which can be established at any given time to the server,which also means that there will be an equal no. of process. In my opinion there should be some computation with regards to the amount of RAM or shared buffer space and the no. of processes and size of these processes. I am using 256 MB Ram on a RHL 8 system. I have made the shared buffer space 190 MB i.e.. 75 % of my system. My database is going to be almost 80 GB. Advice needed. Tx, Best regards, Abhishek. -----Original Message----- From: dev@archonet.com [mailto:dev@archonet.com] Sent: Tuesday, March 25, 2003 11:54 PM To: Abhishek Sharma Cc: Pgsql-General Subject: Re: [GENERAL] Performance tuning in PostgreSQL > Can someone tell me the various aspects of Hardware for performance tuning > in PostgreSQL 7.3 on a RHL 8 server. > > How can I determine the required hardware configuration(CPUs,RAM,Shared > Buffer Space,OS TCP/IP ports) for n number of connections or n number of > users accessing the database? The bad news is there's only one way and that's to test. Without knowing how much data you have, in what tables and how you are querying/updating it you really can't say much useful. A few points though. CPU: PostgreSQL will use multiple CPUs but one query runs on only one CPU. RAM: more is better, pretty much always Disk: fast and see the archives for details on placing the WAL on its own disk. Tuning: there's a performance mailing list, and some stuff on techdocs.postgresql.org Hope that's some use - Richard Huxton
On Wed, 26 Mar 2003, Abhishek Sharma wrote: > Is there any way to determine the size of a process per connection. The delta of a new process is very small. Since most of the memory it will be accessing will be shared memory, and since most flavors of unix just run the code in the same place as the other copies of it, the only delta would be whatever small memory the individual process needs for stack and local vars. It's not much. 23234 postgres 12 0 66676 65M 62192 D 49.4 4.3 0:04 postmaster 23149 postgres 8 0 4216 4216 4136 S 0.0 0.2 1:15 postmaster 23150 postgres 9 0 5196 5196 4156 S 0.0 0.3 0:11 postmaster 23151 postgres 9 0 4780 4780 4148 S 0.0 0.3 0:08 postmaster 3665 postgres 9 0 10096 9.8M 9024 S 0.0 0.6 0:09 postmaster 3666 postgres 9 0 10764 10M 9700 S 0.0 0.6 0:21 postmaster Here we see my box's postmasters that are up and running, some are for psql, some are for web pages, one is my running an insane 'select * from bigtable order by random()' The first number after the 0 column is size, the third one is the shared memory it's using. Notice the delta on these is 1 to 4 megabytes or so. The query doing the heavy lifting is about 4 megs, the others are about 1 meg deltas. > What is it dependent on ? EVERYTHING. I.e. what the backend is doing will determine the amount of memory it is using. Plus the settings in postgresql.conf for things like sort_mem and buffers. > What variables affect the size of a process by a user accessing a table in > the database ? > > Postgresql can be configured with a --max-backends options which means the > no. of connections which can be established at any given time to the > server,which also means that there will be an equal no. of process. > > In my opinion there should be some computation with regards to the amount of > RAM or shared buffer space and the no. of processes and size of these > processes. Well, it's not that simple. After experimenting with postgresql, if you feel you have a handle on how to compute it, I'm sure it would gladly accepted as a useful tool by all of the folks who use postgresql. > I am using 256 MB Ram on a RHL 8 system. I have made the shared buffer space > 190 MB i.e.. 75 % of my system. My database is going to be almost 80 GB. Wow, that is seriously underpowered in the RAM department. My 3 year old server that handles web/database/ldap services has 1.5Gig and uses about 600 Megs of ram for programs with about 800+Megs for kernel cache and buffer. Also, it's not a good idea to just give postgresql huge amounts of buffer memory. Generally, the kernel is better at buffering the disk than postgresql is, and cranking up postgresql's buffers to >50% of the available RAM means that the kernel will always be playing catch up with it's buffers, and postgresql will be the only layer buffering. While we're at it, don't set sort_mem real high either, especially if you plan on handling lots of users at the same time, as each sort is limited to sort_mem, which means that a query with three sorts in it could use 3*sort_mem memory, and if that query gets run by a dozen people at once, then you'd be looking at 3*12*sort_mem usage. 8 meg is a good intermediate setting for sort_mem for most folks. Recommendations: Go to at least 1 Gig of ram. Give 256 Megs or so to postgresql buffers. Anything after that is likely to not make for any faster performance. If you can fit in more than 1 gig then do so. Memory is your biggest limitation right now. Use a fast RAID array setup. Lots of disks in a RAID 5 is a good compromise of performance and storage space. Large RAID 0 setups are the fastest, but a single drive failure can result in all your data being lost. >2 disks in a RAID 1 is a good setup for something that is mostly read (95% reads or more). Think data warehouse. The problem is that you're asking how to optimize postgresql but how you optimize any database is greatly affected by the type of load you're looking at. If you run batch files at night on 10G data files, then serve them read only during the day, your needs are going to be vastly different than if you are running a dynamic multi-user system with lots of writes going on. But you can't go wrong with more RAM. How much can your server hold? Memory bandwidth is more important than CPU speed for most postgresql applications, and the same is true for the drives, having lots of little fast drives is way better than one or two big slower ones. SCSI is almost always faster than IDE, all other things being equal (i.e. my 80 gig IDE "mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive from 6 years ago would be, but any modern SCSI drive will kick the butt on my IDE drives.
<snip> > SCSI is almost > always faster than IDE, all other things being equal (i.e. my 80 gig IDE > "mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive > from 6 years ago would be, but any modern SCSI drive will kick the butt on > my IDE drives. </snip> That's not /entirely/ true. There was an article on slashdot a while back about what exactly the differences between IDE and SCSI are. IDE has pretty much almost caught up to SCSI in terms of everything except testing -- i.e. one of the reasons SCSI drives cost so much more is that they are each run through extensive individual tests to make sure they're not gonna break down 5 minutes out of the box. The only other difference, if I remember correctly, was the amount of drives you could put on the same cable. I'm going out on a limb here, but while ATA133 or whatever you're running /needs/ a single cable and controller to itself SCSI can put several drives on the same cable while maintaining speed. So the good news is that if money is tight you could probably justify an IDE raid, but if you really need that extra reliability SCSI might be the answer. -- Daniel R. Anderson Chief Lab Rat and Helper Monkey Great Lakes Industries, Inc. 80 Pineview Ave. Buffalo, NY 14218 (716) 691-5900 x218 "Never let your schooling interfere with your education" -- Mark Twain
In General, the rotational speed is higher on SCSCI disks, and this increases the tranfer rate from the disc, which is the limitation for anything not in the disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50% faster in tranfer rate than a 10,000 IDE drive. Daniel R. Anderson wrote: > <snip> > >>SCSI is almost >>always faster than IDE, all other things being equal (i.e. my 80 gig IDE >>"mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive >>from 6 years ago would be, but any modern SCSI drive will kick the butt on >>my IDE drives. > > </snip> > > That's not /entirely/ true. There was an article on slashdot a while > back about what exactly the differences between IDE and SCSI are. IDE > has pretty much almost caught up to SCSI in terms of everything except > testing -- i.e. one of the reasons SCSI drives cost so much more is that > they are each run through extensive individual tests to make sure > they're not gonna break down 5 minutes out of the box. > > The only other difference, if I remember correctly, was the amount of > drives you could put on the same cable. I'm going out on a limb here, > but while ATA133 or whatever you're running /needs/ a single cable and > controller to itself SCSI can put several drives on the same cable while > maintaining speed. > > So the good news is that if money is tight you could probably justify an > IDE raid, but if you really need that extra reliability SCSI might be > the answer.
On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote: > In General, the rotational speed is higher on SCSCI disks, and this increases > the tranfer rate from the disc, which is the limitation for anything not in the > disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50% > faster in tranfer rate than a 10,000 IDE drive. For anybody interested I got the story off of slashdot: http://slashdot.org/article.pl?sid=03/02/21/0553249&mode=thread&tid=137 The claim is that these ATA drives have "SCSI-like specs at 30% less of the price". SCSI-LIKE != SCSI though. :-( -- Daniel R. Anderson Chief Lab Rat and Helper Monkey Great Lakes Industries, Inc. 80 Pineview Ave. Buffalo, NY 14218 (716) 691-5900 x218 "Never let your schooling interfere with your education" -- Mark Twain
In case you're interested in a do it yourself Fiber channel array check out: http://slashdot.org/article.pl?sid=03/03/16/0449201&mode=thread&tid=188&tid=137 (Fiber Channel means up to 10 Gbps!!!!!) -- Daniel R. Anderson Chief Lab Rat and Helper Monkey Great Lakes Industries, Inc. 80 Pineview Ave. Buffalo, NY 14218 (716) 691-5900 x218 "Never let your schooling interfere with your education" -- Mark Twain
I read a recent article on Tom's hardware that said, even given the same speed of data from the read heads and the same buffer size, a SCSI drive will work better for a server, and the IDE drive will work better for the desktop. The caching algorithms are optimised with the assumption that a SCSI drive will BE on a server and an IDE drive will BE on a desktop. Daniel R. Anderson wrote: > On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote: > >>In General, the rotational speed is higher on SCSCI disks, and this increases >>the tranfer rate from the disc, which is the limitation for anything not in the >>disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50% >>faster in tranfer rate than a 10,000 IDE drive. > > > For anybody interested I got the story off of slashdot: > > http://slashdot.org/article.pl?sid=03/02/21/0553249&mode=thread&tid=137 > > The claim is that these ATA drives have "SCSI-like specs at 30% less of > the price". SCSI-LIKE != SCSI though. :-( >
I've read something similar indeed. Aswell as most IDE disks really do have less quality components. Scsi disks have also smaller, thicker platters to enable them to run more stable and on higher speeds, larger magnets etc etc. For those of you who can read Dutch: http://www.tweakers.net/reviews/362/3 In the second-last paragraph is a summary of Maxtor's view on the differences between ATA and SCSI. Regards, Arjen > -----Oorspronkelijk bericht----- > Van: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] Namens Dennis Gearon > Verzonden: donderdag 27 maart 2003 18:10 > Aan: Daniel R. Anderson > CC: Pgsql-General > Onderwerp: Re: [GENERAL] Performance tuning in PostgreSQL > > > I read a recent article on Tom's hardware that said, even > given the same speed > of data from the read heads and the same buffer size, a SCSI > drive will work > better for a server, and the IDE drive will work better for > the desktop. The > caching algorithms are optimised with the assumption that a > SCSI drive will BE > on a server and an IDE drive will BE on a desktop. > > Daniel R. Anderson wrote: > > On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote: > > > >>In General, the rotational speed is higher on SCSCI disks, and this > >>increases > >>the tranfer rate from the disc, which is the limitation for > anything not in the > >>disk's cache. Given the same areal dinsity, a 15,000 SCSI > drive will be 50% > >>faster in tranfer rate than a 10,000 IDE drive. > > > > > > For anybody interested I got the story off of slashdot: > > > > > http://slashdot.org/article.pl?sid=> 03/02/21/0553249&mode=thread&tid=13 > > 7 > > > > The claim is that these ATA drives have "SCSI-like specs at > 30% less > > of the price". SCSI-LIKE != SCSI though. :-( > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to > majordomo@postgresql.org >