Thread: Performance Question
I've been searching for performance metrics and tweaks for a few weeks now. I'm trying to determine whether the time it takes to process my queries is reasonable, and I'm having a difficult time judging that. I know Postgres performance is very dependent on hardware and settings, and I understand that makes it difficult to tackle. However, I was wondering if I could get some feedback based on my results, please.
The database is running on a dual-core 2GHz Opteron processor with 8GB of RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad for Postgres, but moving the database to another server didn't change performance at all). Some of the key parameters from postgresql.conf are:
max_connections = 100
shared_buffers = 16MB
work_mem = 64MB
everything else is set to the default
One of my tables has 660,000 records, and doing a SELECT * from that table (without any joins or sorts) takes 72 seconds. Ordering the table by 3 columns almost doubles that, to an average of 123 seconds. To me, those numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encoded and contain a mix of languages (English, Spanish, etc.). I'm running the query from pgadmin3 on a remote host. The server has nothing else running on it except the database.
As a test, I tried splitting the data across a number of other tables. I ran 10 queries (to correspond with the 10 tables) combined with UNION ALL. This was even slower, taking an average of 103 seconds to complete the generic select-all query.
I'm convinced something is wrong; I just can't pinpoint where. I can provide any other information necessary. If anyone has any suggestions, it would be greatly appreciated.
There are a few things you didn't mention...
First off, what is the context this database is being used in? Is it the backend for a web server? Data warehouse? Etc?
Second, you didn't mention the use of indexes. Do you have any indexes on the table in question, and if so, does EXPLAIN ANALYZE show the planner utilizing the index(es)?
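For example, something along these lines would show what the planner is doing (mytable and the column names are just placeholders for your actual table and sort columns):

EXPLAIN ANALYZE
SELECT * FROM mytable
ORDER BY col1, col2, col3;

The actual times in the output will also show whether the time goes into scanning the table or into the sort.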
Third, you have 8 GB of RAM on a dedicated machine. Consider upping the memory settings in postgresql.conf. For instance, on my data warehouse machines (8 GB RAM each) I have shared_buffers set to almost 2 GB and effective_cache_size set to nearly 5.5 GB. (This is dependent on how you're utilizing this database, so don't blindly set these values!)
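As a rough sketch only (the exact values depend on your workload, and changing shared_buffers requires a server restart), that might look something like this in postgresql.conf:

shared_buffers = 2GB            # was 16MB; a common starting point on a dedicated 8 GB box
effective_cache_size = 6GB      # planner hint for how much data the OS cache can be expected to hold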
Last, you didn't mention what RAID level the other server you tested this on was running.
--
Computers are like air conditioners...
They quit working when you open Windows.
> max_connections = 100
> shared_buffers = 16MB
> work_mem = 64MB
> everything else is set to the default

OK, but what about effective_cache_size, for example?

Anyway, we need more information about the table itself - the number of rows is nice, but it does not say how large the table is. The rows might be small (say 100B each) or large (say several kilobytes), affecting the amount of data to be read.

We need to know the structure of the table, and the output of the following commands:

ANALYZE table;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';
EXPLAIN SELECT * FROM table;

> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on
> 3 columns almost doubles that time to an average of 123 seconds. To me,
> those numbers are crazy slow and I don't understand why the queries are
> taking so long. The tables are UTF-8 encode and contain a mix of languages
> (English, Spanish, etc). I'm running the query from pgadmin3 on a remote
> host. The server has nothing else running on it except the database.
>
> As a test I tried splitting up the data across a number of other tables. I
> ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join
> the results together. This was even slower, taking an average of 103
> seconds to complete the generic select all query.

Well, splitting the tables just to read all of them won't help. It will make the problem even worse, due to the necessary processing (UNION ALL).

regards
Tomas
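For the table structure Tomas is asking about, psql can print it directly (mytable is a placeholder for the real table name; the output lists columns, types, and any indexes):

\d+ mytable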
Increase shared_buffers to 1024MB, set effective_cache_size to 6000MB, and test again.
To speed up sort operations, increase work_mem until you notice an improvement (see the example below).
Experiment with different values for these settings.
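For example, in a psql session (the table and columns are placeholders, and the value is just a starting point to experiment with):

\timing
SET work_mem = '256MB';    -- affects only this session; postgresql.conf is untouched
SELECT * FROM mytable ORDER BY col1, col2, col3;

If the sorted query gets noticeably faster, the sort was spilling to disk at the smaller setting.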
- - <themanatuf@yahoo.com> writes:
> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on
> 3 columns almost doubles that time to an average of 123 seconds. To me,
> those numbers are crazy slow and I don't understand why the queries are
> taking so long. The tables are UTF-8 encode and contain a mix of languages
> (English, Spanish, etc). I'm running the query from pgadmin3 on a remote
> host. The server has nothing else running on it except the database.

pgadmin has got its own performance issues with large select results. Are you sure the bulk of the time isn't being spent on the client side? Watching top or vmstat on both machines would probably tell much.

regards, tom lane
On Wed, Nov 12, 2008 at 9:27 AM, - - <themanatuf@yahoo.com> wrote:
> I've been searching for performance metrics and tweaks for a few weeks now.
> I'm trying to determine if the length of time to process my queries is
> accurate or not and I'm having a difficult time determining that. I know
> postgres performance is very dependent on hardware and settings and I
> understand how difficult it is to tackle. However, I was wondering if I
> could get some feedback based on my results please.
>
> The database is running on a dual-core 2GHz Opteron processor with 8GB of
> RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all). Some of the key parameters from postgresql.conf are:

I'm not sure what you mean. Did you move it to another server with a single drive? A 100-drive RAID-10 array with a battery-backed caching controller? There's a lot of possibility in "another server".

> max_connections = 100
> shared_buffers = 16MB

WAY low. Try 512M to 2G on a machine that big.

> work_mem = 64MB

Acceptable. For 100 clients, if each did a sort you'd need 6.4 gig of free RAM, but since the chances of all 100 clients doing a sort that big at the same time are small, you're probably safe.

> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on 3
> columns almost doubles that time to an average of 123 seconds. To me, those

How wide is this table? If it's got 300 columns, then it's gonna be a lot slower than if it has 10 columns.

Try running your query like this:

\timing
select count(*) from (my big query goes here) as a;

and see how long it takes. This will remove the network effect of transferring the data. If that runs fast enough, then the real problem is that your client is waiting till it gets all the data to display it.
On Wed, Nov 12, 2008 at 8:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> - - <themanatuf@yahoo.com> writes:
>> One of my tables has 660,000 records and doing a SELECT * from that table
>> (without any joins or sorts) takes 72 seconds. Ordering the table based on
>> 3 columns almost doubles that time to an average of 123 seconds. To me,
>> those numbers are crazy slow and I don't understand why the queries are
>> taking so long. The tables are UTF-8 encode and contain a mix of languages
>> (English, Spanish, etc). I'm running the query from pgadmin3 on a remote
>> host. The server has nothing else running on it except the database.
>
> pgadmin has got its own performance issues with large select results.

They were fixed a couple of years ago. We're essentially at the mercy of libpq now.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
On Wed, Nov 12, 2008 at 11:27 AM, - - <themanatuf@yahoo.com> wrote:
> I've been searching for performance metrics and tweaks for a few weeks now.
> I'm trying to determine if the length of time to process my queries is
> accurate or not and I'm having a difficult time determining that. I know
> postgres performance is very dependent on hardware and settings and I
> understand how difficult it is to tackle. However, I was wondering if I
> could get some feedback based on my results please.
>
> The database is running on a dual-core 2GHz Opteron processor with 8GB of
> RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all). Some of the key parameters from postgresql.conf are:
>
> max_connections = 100
> shared_buffers = 16MB
> work_mem = 64MB
> everything else is set to the default
>
> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on 3
> columns almost doubles that time to an average of 123 seconds. To me, those
> numbers are crazy slow and I don't understand why the queries are taking so
> long. The tables are UTF-8 encode and contain a mix of languages (English,
> Spanish, etc). I'm running the query from pgadmin3 on a remote host. The
> server has nothing else running on it except the database.
>
> As a test I tried splitting up the data across a number of other tables. I
> ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join
> the results together. This was even slower, taking an average of 103 seconds
> to complete the generic select all query.
>
> I'm convinced something is wrong, I just can't pinpoint where it is. I can
> provide any other information necessary. If anyone has any suggestions it
> would be greatly appreciated.

Maybe there are a lot of dead rows? Do a VACUUM VERBOSE;

That performance is quite slow unless the rows are really big (you have huge text or bytea columns). What is the average row size in bytes?

Try running the following command as a benchmark:

select generate_series(1,500000);

on my imac that takes about 600ms.

merlin
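A rough way to get the average row size Merlin is asking about, assuming the table has been ANALYZEd so reltuples is populated (mytable is a placeholder; pg_relation_size counts only the main heap, not TOAST or indexes):

SELECT pg_relation_size('mytable') / reltuples AS avg_row_bytes
FROM pg_class
WHERE relname = 'mytable';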
> I've been searching for performance metrics and tweaks for a few weeks
> now. I'm trying to determine if the length of time to process my queries
> is accurate or not and I'm having a difficult time determining that. I
> know postgres performance is very dependent on hardware and settings and
> I understand how difficult it is to tackle. However, I was wondering if
> I could get some feedback based on my results please.

Well, the simplest thing is to measure the time it takes to process a query, but:

- EXPLAIN ANALYZE will always report a longer time than the reality, because instrumenting the query takes time. For instance, EXPLAIN ANALYZE on a count(*) query could take more time counting how many times the "count" aggregate is called and how much time is spent in it than actually computing the aggregate... This is because it takes much longer to measure the time it takes to call "count" on a row (syscalls...) than it takes to increment the count. This is not a problem as long as you are aware of it, and the information provided by EXPLAIN ANALYZE is very valuable.

- Using \timing in psql is also a good way to examine queries, but if your query returns lots of results, the time it takes for the client to process those results will mess with your measurements. In this case a simple:

SELECT sum(1) FROM (your query)

can provide less polluted timings. Remember you are not that interested in client load: you can always add more webservers, but adding more database servers is a lot more difficult.

- You can add some query logging in your application (always a good idea IMHO). For instance, the administrator (you) could see a list of queries at the bottom of the page with the time it takes to run them. In that case, keep in mind that any load will add randomness to these measurements. For instance, when you hit F5 in your browser, if the webserver and database run on the same machine as the browser, the browser's CPU usage can make one of your queries appear to take up to half a second... even if it takes, in reality, half a millisecond... So, average your measurements.

You could push the idea further. Sometimes I log the parameterized query (without args), the args separately, and the query time, so I can get average timings for things like "SELECT stuff FROM table WHERE column=$1" rather than a zillion separate queries depending on the parameters. Such logging can destroy your performance, though, so use it with care.

OF COURSE YOU SHOULD MEASURE WHAT IS RELEVANT, that is, queries that your application uses.

> The database is running on a dual-core 2GHz Opteron processor with 8GB
> of RAM.

8GB. 64 bits, I presume?

> The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all).

RAID5 = good for reads, and large writes.
RAID5 = hell for small random writes.
Depends on your load...

> shared_buffers = 16MB

That's a bit small IMHO. (Try 2 GB.)

> work_mem = 64MB
> everything else is set to the default
>
> One of my tables has 660,000 records and doing a SELECT * from that
> table (without any joins or sorts) takes 72 seconds.

Well, sure, but why would you do such a thing? I mean, I don't know your row size, but say it is 2 KB: you just used 1.5 GB of RAM on the client and on the server, plus of course transferring all this data over your network connection. If client and server are on the same machine, you just zapped 3 GB of RAM. I hope you don't do too many of those concurrently...
This is never going to be fast, and it is never going to be a good performance metric. If you need to pull 600,000 rows from a table, use a CURSOR and pull them in batches of, say, 1000. Then you will use 600 times less RAM. I hope you have gigabit ethernet, though; network and disk IO will be your main bottleneck. If you don't need to pull 600,000 rows from a table, well then, don't do it. If you're using a client app to display the results, well, how long does it take to display 600,000 rows in a GUI box?...

> Ordering the table based on 3 columns almost doubles that time to an
> average of 123 seconds.

Same as above. If your rows are small, say 100 bytes, you're sorting 66 megabytes, which would easily be done in RAM, but you specified work_mem too small, so it is done on disk, with several passes. If your rows are large, well, you're facing a multi-gigabyte disk sort with only 64 MB of working memory, so it's really going to take lots of passes.

If you often need to pull 600,000 rows from a table in a specific order, create an index on the columns, use a CURSOR, and pull them in batches of, say, 1000. If you seldom need to, don't create an index, but do use a CURSOR and pull them in batches of, say, 1000. If you don't need to pull 600,000 rows from a table in a specific order, well then, don't do it.

> To me, those numbers are crazy slow and I don't understand why the
> queries are taking so long. The tables are UTF-8 encode and contain a
> mix of languages (English, Spanish, etc). I'm running the query from
> pgadmin3 on a remote host. The server has nothing else running on it
> except the database.

OK, I presume you are sorting UNICODE strings (which is also slower than a binary compare), so in this case you should really try to minimize the number of string comparisons, which means using a much larger work_mem.

> I'm convinced something is wrong, I just can't pinpoint where it is. I
> can provide any other information necessary. If anyone has any
> suggestions it would be greatly appreciated.

Well, the big questions are:
- do you need to run this query often?
- what do you use it for?
- how many bytes does it weigh?

Until you answer that, it is difficult to help...
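A minimal sketch of the CURSOR approach suggested above (mytable and the columns are placeholders; a cursor has to live inside a transaction):

BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT * FROM mytable ORDER BY col1, col2, col3;
FETCH 1000 FROM big_cur;   -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;

Each FETCH materializes only 1000 rows on the client, which is the point of the suggestion above.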