Re: [PERFORM] How to read query plan

From: Tom Lane
Subject: Re: [PERFORM] How to read query plan
Date: ,
(view: Whole thread, Raw)
Responses: Re: [PERFORM] How to read query plan  (Tom Lane)
Re: [PERFORM] How to read query plan  (Miroslav Šulc)
List: pgsql-hackers

Tree view

Re: [PERFORM] How to read query plan  (Tom Lane, )
 Re: [PERFORM] How to read query plan  (Tom Lane, )
  Re: [PERFORM] How to read query plan  (Miroslav Šulc, )
 Re: [PERFORM] How to read query plan  (Tom Lane, )
  Re: [PERFORM] How to read query plan  (Miroslav Šulc, )
 Re: [PERFORM] How to read query plan  (Tom Lane, )
 Re: [PERFORM] How to read query plan  (John Arbash Meinel, )
 Re: [PERFORM] How to read query plan  (Miroslav Šulc, )

=?windows-1250?Q?Miroslav_=8Aulc?= <> writes:
>> Is the data confidential?  If you'd be willing to send me a pg_dump
>> off-list, I'd like to replicate this test and try to see where the time
>> is going.
> Thank you very much for your offer. The data are partially confidental
> so I hashed some of the text information and changed some values (not
> the values for the JOINs) so I could send it to you. I've checked the
> EXPLAIN ANALYZE if anything changed and the result is merely the same
> (maybe cca 1 sec slower - maybe because the hash caused the text data to
> be longer).

No problem; thank you for supplying the test case.  What I find is
rather surprising: most of the runtime can be blamed on disassembling
and reassembling tuples during the join steps.  Here are the hot spots
according to gprof:

                1.27    7.38    8277/103737      ExecScan [16]
                2.93   17.02   19092/103737      ExecNestLoop <cycle 2> [14]
                3.91   22.70   25456/103737      ExecMergeJoin <cycle 2> [13]
                7.81   45.40   50912/103737      ExecHashJoin <cycle 2> [12]
[9]     86.3   15.92   92.50  103737         ExecProject [9]
                7.65   76.45 8809835/9143692     ExecEvalVar [10]
                3.42    4.57  103737/103775      heap_formtuple [17]
                0.03    0.24   12726/143737      ExecMakeFunctionResultNoSets [24]
                0.02    0.12  103737/290777      ExecStoreTuple [44]
                0.01    0.00       2/2           ExecEvalFunc [372]
                0.00    0.00       2/22          ExecMakeFunctionResult [166]
                0.00    0.00      42/9143692     ExecEvalFuncArgs [555]
                0.05    0.51   59067/9143692     ExecHashGetHashValue [32]
                0.24    2.38  274748/9143692     ExecMakeFunctionResultNoSets [24]
                7.65   76.45 8809835/9143692     ExecProject [9]
[10]    69.5    7.94   79.34 9143692         ExecEvalVar [10]
               79.34    0.00 8750101/9175517     nocachegetattr [11]

I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above).  Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns.  So the cost is
roughly O(N^2) in the number of columns.

As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps.  In other words make the
later joins add more columns than the earlier, as much as you can.

This is actually good news, because before 8.0 we had much worse
problems than this with extremely wide tuples --- there were O(N^2)
behaviors all over the place due to the old List algorithms.  Neil
Conway's rewrite of the List code got rid of those problems, and now
we can see the places that are left to optimize.  The fact that there
seems to be just one is very nice indeed.

Since ExecProject operations within a nest of joins are going to be
dealing entirely with Vars, I wonder if we couldn't speed matters up
by having a short-circuit case for a projection that is only Vars.
Essentially it would be a lot like execJunk.c, except able to cope
with two input tuples.  Using heap_deformtuple instead of retail
extraction of fields would eliminate the O(N^2) penalty for wide tuples.

            regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Subject: Re: Null Value Stored for Date e TimeStamp
From: Neil Conway
Subject: invalidating cached plans