Re: [PERFORM] Need help in setting optimal configuration - Mailing list pgsql-admin
From | Scott Marlowe |
---|---|
Subject | Re: [PERFORM] Need help in setting optimal configuration |
Date | |
Msg-id | 1130167682.15546.114.camel@state.g2switchworks.com Whole thread Raw |
In response to | Re: [PERFORM] Need help in setting optimal configuration for a huge database. (Kishore B <kishorebh@gmail.com>) |
List | pgsql-admin |
(Please don't top reply... your response has been moved to the bottom) On Sun, 2005-10-23 at 16:55, Kishore B wrote: > > On 10/23/05, Craig A. James <cjames@modgraph-usa.com> wrote: > > We are facing a* critical situation because of the > performance of the > > **database** .* Even a basic query like select count(*) from > > bigger_table is taking about 4 minutes to return. > > Several other replies have mentioned that COUNT() requires a > full table scan, This isn't wholly correct. A query like this: select count(*) from locatorcodes where locatorcode like 'ZZZ%'; can use an index. However, since tuple visibility info is NOT stored in indexes, ALL these tuples must be looked up in the actual table. > but this point can't be emphasized enough: Don't do it! > People who are migrating from other environments (Oracle or > MySQL) are used to COUNT(), MIN() and MAX() returning almost > instantaneously, certainly on indexed columns. While I'll admit that min and max are a bit faster in Oracle than in postgresql, count doesn't seem much faster in my testing. Of course, on a wider table Oracle probably is faster, but I'm used to normalizing out my tables so that there's no advantage for Oracle there. > But for reasons that have something to do with transactions, > these operations are unbelievably slow in PostgreSQL. It's because of visibility in the MVCC system PostgreSQL uses. > MIN() and MAX() -- These are surprisingly slow, because they > seem to do a full table scan EVEN ON AN INDEXED COLUMN! I > don't understand why, but happily there is an effective > substitute: It's because aggregate in PostgreSQL are abstract things. To make these two faster would require short circuiting the query planner to use something other than the abstracted methods PostgreSQL was built around. On the other hand, select with limit and order by can use the indexes because they are not aggregates. > You should carefully examine your entire application for > COUNT, MIN, and MAX, and get rid of them EVERYWHERE. This may > be the entire source of your problem. It was in my case. You're right on here. The problem is that people often use aggregates where they shouldn't. Aggregates really are meant to operate across a whole set of data. An aggregate like sum() or avg() seems obviously designed to hit every tuple. Well, while min, max, and count may not look like they should, they, in fact, do hit every table covered by the where clause. > This is, in my humble opinion, the only serious flaw in > PostgreSQL. I've been totally happy with it in every other > way, and once I understood these shortcomings, my application > is runs faster than ever on PostgreSQL. I wouldn't fully qualify it as a flaw. It's a design quirk, caused by the extensible model PostgreSQL is built under. While it costs you in one way, like slower min / max / count in some circumstances, it benefits you others, like the ability make your own aggregate functions. > Hi Craig, > > Does the no of tables and the size of each table affect the > performance of a join operation? Of course they do. The more information your query has to process, the slower it will run. It's usually a pretty much a linear increase in time required, unless you go from everything fitting into buffers to spilling to disk. Then things will slow down noticeably. > > When we are trying to join the two big tables that I described above, > pg is taking so long to execute? Hard to say. There are many ways to tune PostgreSQL. I strongly suggest you take this thread to the performance list, and post your postgresql.conf file, and the output of "explain analyze <your query here>" and ask for help. That list is much better equipped to help with these things.
pgsql-admin by date: