Home > mailing lists

Re: [PERFORM] Need help in setting optimal configuration - Mailing list pgsql-admin

From	Scott Marlowe
Subject	Re: [PERFORM] Need help in setting optimal configuration
Date	October 24, 2005 12:28:07
Msg-id	1130167682.15546.114.camel@state.g2switchworks.com Whole thread Raw
In response to	Re: [PERFORM] Need help in setting optimal configuration for a huge database. (Kishore B <kishorebh@gmail.com>)
List	pgsql-admin

Tree view

(Please don't top reply... your response has been moved to the bottom)

On Sun, 2005-10-23 at 16:55, Kishore B wrote:
>
> On 10/23/05, Craig A. James <cjames@modgraph-usa.com> wrote:
>         > We are facing a* critical situation because of the
>         performance of the
>         > **database** .* Even a basic query like select count(*) from
>         > bigger_table is taking about 4 minutes to return.
>
>         Several other replies have mentioned that COUNT() requires a
>         full table scan,

This isn't wholly correct.  A query like this:

select count(*) from locatorcodes where locatorcode like 'ZZZ%';

can use an index.  However, since tuple visibility info is NOT stored in
indexes, ALL these tuples must be looked up in the actual table.

>          but this point can't be emphasized enough: Don't do it!
>         People who are migrating from other environments (Oracle or
>         MySQL) are used to COUNT(), MIN() and MAX() returning almost
>         instantaneously, certainly on indexed columns.

While I'll admit that min and max are a bit faster in Oracle than in
postgresql, count doesn't seem much faster in my testing.  Of course, on
a wider table Oracle probably is faster, but I'm used to normalizing out
my tables so that there's no advantage for Oracle there.

>           But for reasons that have something to do with transactions,
>         these operations are unbelievably slow in PostgreSQL.

It's because of visibility in the MVCC system PostgreSQL uses.

>         MIN() and MAX() -- These are surprisingly slow, because they
>         seem to do a full table scan EVEN ON AN INDEXED COLUMN!  I
>         don't understand why, but happily there is an effective
>         substitute:

It's because aggregate in PostgreSQL are abstract things.  To make these
two faster would require short circuiting the query planner to use
something other than the abstracted methods PostgreSQL was built around.

On the other hand, select with limit and order by can use the indexes
because they are not aggregates.

>         You should carefully examine your entire application for
>         COUNT, MIN, and MAX, and get rid of them EVERYWHERE.  This may
>         be the entire source of your problem.  It was in my case.

You're right on here.  The problem is that people often use aggregates
where they shouldn't.  Aggregates really are meant to operate across a
whole set of data.  An aggregate like sum() or avg() seems obviously
designed to hit every tuple.  Well, while min, max, and count may not
look like they should, they, in fact, do hit every table covered by the
where clause.

>         This is, in my humble opinion, the only serious flaw in
>         PostgreSQL.  I've been totally happy with it in every other
>         way, and once I understood these shortcomings, my application
>         is runs faster than ever on PostgreSQL.

I wouldn't fully qualify it as a flaw.  It's a design quirk, caused by
the extensible model PostgreSQL is built under.  While it costs you in
one way, like slower min / max / count in some circumstances, it
benefits you others, like the ability make your own aggregate functions.

> Hi Craig,
>
> Does the no of tables and the size of each table affect the
> performance of a join operation?

Of course they do.  The more information your query has to process, the
slower it will run.  It's usually a pretty much a linear increase in
time required, unless you go from everything fitting into buffers to
spilling to disk.  Then things will slow down noticeably.

>
> When we are trying to join the two big tables that I described above,
> pg is taking so long to execute?

Hard to say.  There are many ways to tune PostgreSQL.  I strongly
suggest you take this thread to the performance list, and post your
postgresql.conf file, and the output of "explain analyze <your query
here>" and ask for help.  That list is much better equipped to help with
these things.

pgsql-admin by date:

From: Tsirkin Evgeny
Date: 24 October 2005, 10:28:29
Subject: Re: Fwd: Question: 2 GB file limit on linux

From: Colton A Smith
Date: 24 October 2005, 23:14:19
Subject: excluding a table from pg_dump

Re: [PERFORM] Need help in setting optimal configuration - Mailing list pgsql-admin

Previous

Next