Re: Q on views and performance - Mailing list pgsql-performance

From Kynn Jones
Subject Re: Q on views and performance
Date
Msg-id c2350ba40802260849h5d37f3a3vc93f31e49a096ef5@mail.gmail.com
Whole thread Raw
In response to Re: Q on views and performance  (Matthew <matthew@flymine.org>)
List pgsql-performance
On Mon, Feb 25, 2008 at 11:56 AM, Matthew <matthew@flymine.org> wrote:
On Mon, 25 Feb 2008, Kynn Jones wrote:
> This is just GREAT!!!  It fits the problem to a tee.

It makes the queries quick then?

It is good that you ask.  Clearly you know the story: a brilliant-sounding optimization that in practice has only a small effect at best...

I'm totally puzzled.  It makes absolutely no sense to me...

For my analysis, in addition to creating the index on (type, zipk) that you suggested, I also added an extra column to T containing a random integer in the range 0..99, and created an index on this, so that I could produce a totally "shuffled clustering".  I compared the performance in going from a randomly-clustered table to a (type, zipk)-clustered table, and the output of EXPLAIN was encouraging, but when I ran the actual queries under EXPLAIN ANALYZE the difference in execution time was negligible.

Live and learn!

Actually, what's driving me absolutely insane is the documentation for EXPLAIN and for Pg's query planning in general.  I've read the docs (in particular the chapter on performance), but I still can't make any sense of EXPLAINs results, so I can't begin to understand why optimizations like the one you suggested turned out to be ineffective.  For example, the first lines of two recent EXPLAIN ANALYZE outputs are

Nested Loop Left Join  (cost=58.00..1154.22 rows=626 width=26) (actual time=1.462..26.494 rows=2240 loops=1)
Merge Left Join  (cost=33970.96..34887.69 rows=58739 width=26) (actual time=106.961..126.589 rows=7042 loops=1)

Actual runtimes are 27ms and 128ms.  The ratio 128/27 is much smaller than one would expect from the relative costs of the two queries.  It looks like there is no proportionality at all between the estimated costs and actual running time...  (BTW, all these runs of EXPLAIN were done after calls to VACUUM ANALYZE.)  This is one of the many things I don't understand about this case...

What I would like to be able to do is to at least make enough sense of query plans to determine whether they are reasonable or not.  This requires knowing the algorithms behind each type of query tree node, but I have not found this info...

On the positive side, in the course of all this analysis I must have done *something* to improve the performance, because now even the unoptimized queries are running pretty fast (e.g. queries that used to take about 1.5 seconds are now taking 130ms).  But unfortunately I don't know what was it that I did to bring this speed-up about!

Anyway, be that as it may, thank you very much for your suggestion.

Kynn

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: PG planning randomly ?
Next
From: "Laurent Raufaste"
Date:
Subject: Re: PG planning randomly ?