Re: Performance Question - Mailing list pgsql-performance
From: PFC
Subject: Re: Performance Question
Msg-id: op.ukpwvq1fcigqcu@soyouz
In response to: Performance Question (- - <themanatuf@yahoo.com>)
List: pgsql-performance
> I've been searching for performance metrics and tweaks for a few weeks
> now. I'm trying to determine if the length of time to process my queries
> is accurate or not and I'm having a difficult time determining that. I
> know postgres performance is very dependent on hardware and settings and
> I understand how difficult it is to tackle. However, I was wondering if
> I could get some feedback based on my results please.

Well, the simplest thing is to measure the time it takes to process a query, but:

- EXPLAIN ANALYZE will always report a longer time than reality, because instrumenting the query takes time. For instance, EXPLAIN ANALYZE on a count(*) query can spend more time counting how many times the "count" aggregate is called, and how much time is spent in it, than actually computing the aggregate... This is because measuring the time of each call to "count" (syscalls...) takes much longer than incrementing the count itself. This is not a problem as long as you are aware of it, and the information provided by EXPLAIN ANALYZE is very valuable.

- Using \timing in psql is also a good way to examine queries, but if your query returns lots of results, the time it takes for the client to process those results will mess with your measurements. In this case a simple:

    SELECT sum(1) FROM (your query)

can provide less polluted timings. Remember you are not that interested in client load: you can always add more webservers, but adding more database servers is a lot more difficult.

- You can add query logging to your application (always a good idea IMHO). For instance, the administrator (you) could see a list of queries at the bottom of the page, with the time each one took to run. In that case, keep in mind that any load will add randomness to these measurements.
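To make the \timing point concrete, here is a minimal psql sketch; the table and column names (`orders`, `status`) are hypothetical, used only for illustration:

```sql
-- \timing on makes psql print the elapsed time for each statement.
\timing on

-- Timing this directly also measures the client receiving,
-- formatting and printing every row:
SELECT * FROM orders WHERE status = 'shipped';

-- Wrapping the query collapses the result to a single row, so the
-- reported time is mostly server-side work:
SELECT sum(1) FROM (SELECT * FROM orders WHERE status = 'shipped') AS sub;
```

The difference between the two timings is roughly the client-side and transfer overhead.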
For instance, when you hit F5 in your browser, if the webserver and database run on the same machine as the browser, the browser's CPU usage can make one of your queries appear to take up to half a second... even if it takes, in reality, half a millisecond... So, average over many samples.

You could push the idea further. Sometimes I log the parameterized query (without its arguments), the arguments separately, and the query time, so I can get average timings for things like "SELECT stuff FROM table WHERE column=$1" instead of a zillion separate queries that differ only in their parameters. Such logging can destroy your performance, though, so use it with care.

OF COURSE YOU SHOULD MEASURE WHAT IS RELEVANT, that is, the queries your application actually uses.

> The database is running on a dual-core 2GHz Opteron processor with 8GB
> of RAM.

8GB. 64 bits, I presume?

> The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all).

RAID 5 is good for reads and large writes, but hell for small random writes. Whether it hurts depends on your load...

> shared_buffers = 16MB

That's a bit small IMHO (try 2 GB).

> work_mem = 64MB
> everything else is set to the default
>
> One of my tables has 660,000 records and doing a SELECT * from that
> table (without any joins or sorts) takes 72 seconds.

Well, sure, but why would you do such a thing? I don't know your row size, but say it is 2 KB: you just used about 1.5 GB of RAM on the client and on the server, plus of course the cost of transferring all this data over your network connection. If client and server are on the same machine, you just zapped 3 GB of RAM. I hope you don't do too many of those concurrently... This is never going to be fast, and it is never going to be a good performance metric.

If you need to pull 600,000 rows from a table, use a CURSOR and pull them in batches of, say, 1000. Then you will use 600 times less RAM. I hope you have gigabit ethernet, though.
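The cursor approach above can be sketched as follows; the table name `documents` and the batch size are hypothetical, and note that a non-holdable cursor only lives inside a transaction:

```sql
-- Batched fetching with a cursor keeps client memory bounded.
BEGIN;

DECLARE big_scan CURSOR FOR
    SELECT * FROM documents;

-- Fetch and process 1000 rows at a time; repeat the FETCH until it
-- returns no rows.
FETCH 1000 FROM big_scan;
FETCH 1000 FROM big_scan;

CLOSE big_scan;
COMMIT;
```

Each FETCH transfers only one batch over the network, so neither side ever materializes the full 600,000-row result set.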
Network and disk IO will be your main bottleneck.

If you don't need to pull 600,000 rows from a table, well then, don't do it. If you're using a client app to display the results, how long does it take to display 600,000 rows in a GUI box?...

> Ordering the table based on 3 columns almost doubles that time to an
> average of 123 seconds.

Same as above. If your rows are small, say 100 bytes, you're sorting 66 megabytes, which could easily be done in RAM, but you set work_mem too small, so it is done on disk in several passes. If your rows are large, you're facing a multi-gigabyte disk sort with only 64 MB of working memory, so it's really going to take lots of passes.

If you often need to pull 600,000 rows from a table in a specific order, create an index on those columns, use a CURSOR, and pull the rows in batches of, say, 1000. If you seldom need to, skip the index but still use a CURSOR and pull in batches. If you don't need to pull 600,000 rows in a specific order, well then, don't do it.

> To me, those numbers are crazy slow and I don't understand why the
> queries are taking so long. The tables are UTF-8 encoded and contain a
> mix of languages (English, Spanish, etc). I'm running the query from
> pgadmin3 on a remote host. The server has nothing else running on it
> except the database.

I presume you are sorting UNICODE strings (which is slower than a binary compare), so you should really try to minimize the number of string comparisons, which means using a much larger work_mem.

> I'm convinced something is wrong, I just can't pinpoint where it is. I
> can provide any other information necessary. If anyone has any
> suggestions it would be greatly appreciated.

Well, the big questions are:
- do you need to run this query often?
- what do you use it for?
- how many bytes does the result weigh?

Until you answer those, it is difficult to help...
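To check whether a sort is spilling to disk, you can raise work_mem for the session only and re-run the plan; this is a sketch, and the table and columns (`documents`, `lang`, `title`, `created_at`) are hypothetical:

```sql
-- Session-local override: does not touch postgresql.conf.
SET work_mem = '512MB';

EXPLAIN ANALYZE
SELECT * FROM documents ORDER BY lang, title, created_at;
-- In the output, look at the Sort node's "Sort Method" line:
-- "external merge" means the sort spilled to disk;
-- "quicksort" means it fit entirely in work_mem.
```

Comparing the EXPLAIN ANALYZE output before and after the SET shows directly whether work_mem was the limiting factor for the 123-second sort.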