Thread: Re: [PERFORM] How to read query plan

Re: [PERFORM] How to read query plan

From: Tom Lane

Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
>> Is the data confidential?  If you'd be willing to send me a pg_dump
>> off-list, I'd like to replicate this test and try to see where the time
>> is going.
>>
> Thank you very much for your offer. The data are partially confidential,
> so I hashed some of the text information and changed some values (not
> the values for the JOINs) so I could send it to you. I've checked the
> EXPLAIN ANALYZE to see if anything changed, and the result is nearly the
> same (maybe about 1 sec slower, perhaps because the hashing made the
> text data longer).

No problem; thank you for supplying the test case.  What I find is
rather surprising: most of the runtime can be blamed on disassembling
and reassembling tuples during the join steps.  Here are the hot spots
according to gprof:

-----------------------------------------------
                1.27    7.38    8277/103737      ExecScan [16]
                2.93   17.02   19092/103737      ExecNestLoop <cycle 2> [14]
                3.91   22.70   25456/103737      ExecMergeJoin <cycle 2> [13]
                7.81   45.40   50912/103737      ExecHashJoin <cycle 2> [12]
[9]     86.3   15.92   92.50  103737         ExecProject [9]
                7.65   76.45 8809835/9143692     ExecEvalVar [10]
                3.42    4.57  103737/103775      heap_formtuple [17]
                0.03    0.24   12726/143737      ExecMakeFunctionResultNoSets [24]
                0.02    0.12  103737/290777      ExecStoreTuple [44]
                0.01    0.00       2/2           ExecEvalFunc [372]
                0.00    0.00       2/22          ExecMakeFunctionResult [166]
-----------------------------------------------
                0.00    0.00      42/9143692     ExecEvalFuncArgs [555]
                0.05    0.51   59067/9143692     ExecHashGetHashValue [32]
                0.24    2.38  274748/9143692     ExecMakeFunctionResultNoSets [24]
                7.65   76.45 8809835/9143692     ExecProject [9]
[10]    69.5    7.94   79.34 9143692         ExecEvalVar [10]
               79.34    0.00 8750101/9175517     nocachegetattr [11]
-----------------------------------------------

I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above).  Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns.  So the cost is
roughly O(N^2) in the number of columns.
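Tom's O(N^2) claim can be sketched with a toy cost model (a hedged Python illustration, not PostgreSQL's actual C code): when variable-width columns prevent offset caching, locating column i requires walking over columns 0..i-1, so projecting every column of an N-column join tuple takes on the order of N^2 steps.

```python
# Toy cost model of attribute access in a tuple with variable-width
# columns (illustrative only, not PostgreSQL's real implementation).

def access_cost(i):
    """Steps to locate column i when offsets can't be cached:
    we must skip over columns 0..i-1 to find where i starts."""
    return i

def project_all(n_columns):
    """Total steps to extract every column of an n-column join tuple."""
    return sum(access_cost(i) for i in range(n_columns))

# Doubling the column count roughly quadruples the work: O(N^2).
print(project_all(85), project_all(170))
```

With the ~85-column join tuples from the profile above, doubling the width quadruples this toy cost, which is the quadratic behavior described.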

As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps.  In other words, arrange for
the later joins to be the ones that add the most columns, as much as you can.
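A toy calculation (with made-up column counts, not Miroslav's actual schema) shows why this reordering helps: in a left-deep chain of joins, columns added early are carried through every later join step, so joining the wide table last keeps the intermediate tuples narrow.

```python
# Hypothetical column counts for six tables in a left-deep chain of
# LEFT JOINs. Each intermediate result carries all columns joined so
# far, so per-step cost grows with the running tuple width.

def intermediate_widths(table_widths):
    """Tuple width after each join step in a left-deep plan."""
    widths, running = [], 0
    for w in table_widths:
        running += w
        widths.append(running)
    return widths

wide_first = [60, 5, 5, 5, 5, 5]   # wide main table joined first
wide_last = [5, 5, 5, 5, 5, 60]    # wide main table joined last

# Total columns materialized across all intermediate join steps.
print(sum(intermediate_widths(wide_first)))
print(sum(intermediate_widths(wide_last)))
```

The final tuple is the same width either way; only the intermediate work differs.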

This is actually good news, because before 8.0 we had much worse
problems than this with extremely wide tuples --- there were O(N^2)
behaviors all over the place due to the old List algorithms.  Neil
Conway's rewrite of the List code got rid of those problems, and now
we can see the places that are left to optimize.  The fact that there
seems to be just one is very nice indeed.

Since ExecProject operations within a nest of joins are going to be
dealing entirely with Vars, I wonder if we couldn't speed matters up
by having a short-circuit case for a projection that is only Vars.
Essentially it would be a lot like execJunk.c, except able to cope
with two input tuples.  Using heap_deformtuple instead of retail
extraction of fields would eliminate the O(N^2) penalty for wide tuples.
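The contrast between retail extraction and deforming the tuple once can be modeled with length-prefixed fields (an illustrative Python sketch under an assumed toy encoding; the actual fix would call heap_deformtuple in C):

```python
# Toy tuple format: each field is a 1-byte length followed by its bytes
# (a stand-in for varlena columns; not PostgreSQL's on-disk layout).

def pack(fields):
    return b"".join(bytes([len(f)]) + f for f in fields)

def get_field_retail(data, i):
    """Retail extraction: walk from the start on every access, O(i)."""
    pos = 0
    for _ in range(i):
        pos += 1 + data[pos]
    length = data[pos]
    return data[pos + 1 : pos + 1 + length]

def deform(data, n_fields):
    """Deform once: a single O(n) pass slices out all fields."""
    out, pos = [], 0
    for _ in range(n_fields):
        length = data[pos]
        out.append(data[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return out

tup = pack([b"a", b"bb", b"ccc"])
print(get_field_retail(tup, 2))   # walks fields 0 and 1 first
print(deform(tup, 3))             # one pass yields all three fields
```

Fetching all N fields retail costs 0+1+...+(N-1) walks, while deforming is one pass, which is exactly the O(N^2)-to-O(N) improvement described.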

            regards, tom lane

Re: [PERFORM] How to read query plan

From: Tom Lane

I wrote:
> Since ExecProject operations within a nest of joins are going to be
> dealing entirely with Vars, I wonder if we couldn't speed matters up
> by having a short-circuit case for a projection that is only Vars.
> Essentially it would be a lot like execJunk.c, except able to cope
> with two input tuples.  Using heap_deformtuple instead of retail
> extraction of fields would eliminate the O(N^2) penalty for wide tuples.

Actually, we already had a pending patch (from Atsushi Ogawa) that
eliminates that particular O(N^2) behavior in another way.  After
applying it, I get about a factor-of-4 reduction in the runtime for
Miroslav's example.

ExecEvalVar and associated routines are still a pretty good fraction of
the runtime, so it might still be worth doing something like the above,
but it'd probably be just a marginal win instead of a big win.

            regards, tom lane

Re: [PERFORM] How to read query plan

From: Tom Lane

Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
> As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't
> it be helpful to change them to char(1)? Would it solve the
> variable-width problem at least for some fields and speed the query up?

No, because char(1) isn't physically fixed-width (consider multibyte
encodings).  There's really no advantage to char(N) in Postgres.
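The multibyte point is easy to check outside the database (plain Python, using UTF-8 as one example of a multibyte server encoding):

```python
# A single character can occupy 1..4 bytes in UTF-8, so a char(1)
# column cannot be laid out at a fixed byte width.
for ch in ["a", "Š", "中"]:
    print(ch, len(ch.encode("utf-8")))
```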

I don't know what you're doing with those fields, but if they are
effectively booleans or small codes you might be able to convert them to
bool or int fields.  There is also the "char" datatype (not to be
confused with char(1)) which can hold single ASCII characters, but is
nonstandard and a bit impoverished as to functionality.

However, I doubt this is worth pursuing.  One of the things I tested
yesterday was a quick hack to organize the storage of intermediate join
tuples with fixed-width fields first and non-fixed ones later.  It
really didn't help much at all :-(.  I think the trouble with your
example is that in the existing code, the really fast path applies only
when the tuple contains no nulls --- and since you're doing all that
left joining, there's frequently at least one null lurking.

            regards, tom lane

Re: [PERFORM] How to read query plan

From: Tom Lane

Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
> Tom Lane wrote:
>> Actually, we already had a pending patch (from Atsushi Ogawa) that
>> eliminates that particular O(N^2) behavior in another way.  After
>> applying it, I get about a factor-of-4 reduction in the runtime for
>> Miroslav's example.
>>
> Is there a chance we will see this patch in the 8.0.2 release?

No.  We are not in the habit of making non-bug-fix changes in stable
branches.  Ogawa's patch is in CVS for 8.1.

            regards, tom lane

Re: [PERFORM] How to read query plan

From: John Arbash Meinel

Miroslav Šulc wrote:

> Tom Lane wrote:
>
>> ...
>> I think the reason this is popping to the top of the runtime is that the
>> joins are so wide (an average of ~85 columns in a join tuple according
>> to the numbers above).  Because there are lots of variable-width columns
>> involved, most of the time the fast path for field access doesn't apply
>> and we end up going to nocachegetattr --- which itself is going to be
>> slow because it has to scan over so many columns.  So the cost is
>> roughly O(N^2) in the number of columns.
>>
>>
> As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't
> it be helpful to change them to char(1)? Would it solve the
> variable-width problem at least for some fields and speed the query up?
>
I'm guessing there really wouldn't be a difference. I think varchar()
and char() are stored the same way, just one always has space padding. I
believe they are both varlena types, so they are still "variable" length.

>> As a short-term hack, you might be able to improve matters if you can
>> reorder your LEFT JOINs to have the minimum number of columns
>> propagating up from the earlier join steps.  In other words make the
>> later joins add more columns than the earlier, as much as you can.
>>
>>
> That will be hard, as the main table, which contains most of the
> fields, is LEFT JOINed with the others. I'll look at it and see if I
> can find some way to improve it.

One thing that you could try, is to select just the primary keys from
the main table, and then later on, join back to that table to get the
rest of the columns. It is a little bit hackish, but if it makes your
query faster, you might want to try it.

>
> I'm not sure whether I understand the process of performing the plan
> but I imagine that the data from AdDevicesSites are retrieved only
> once when they are loaded and maybe stored in memory. Are the columns
> stored in the order they are in the SQL command? If so, wouldn't it be
> useful to move all varchar fields to the end of the SELECT query? I'm
> just guessing because I don't know at all how a database server is
> implemented and what it really does.
>
I don't think they are stored in the order of the SELECT <> portion. I'm
guessing they are loaded and saved as you go. But the order of the LEFT
JOINs at the end is probably important.

>> ..
>>             regards, tom lane
>>
>>
> Miroslav


John
=:->


Re: [PERFORM] How to read query plan

From: Miroslav Šulc

Tom Lane wrote:

>Miroslav Šulc <miroslav.sulc@startnet.cz> writes:
>
>
>>As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't
>>it be helpful to change them to char(1)? Would it solve the variable-width
>>problem at least for some fields and speed the query up?
>>
>>
>
>No, because char(1) isn't physically fixed-width (consider multibyte
>encodings).  There's really no advantage to char(N) in Postgres.
>
>
I was aware of that :-(

>I don't know what you're doing with those fields, but if they are
>effectively booleans or small codes you might be able to convert them to
>bool or int fields.  There is also the "char" datatype (not to be
>confused with char(1)) which can hold single ASCII characters, but is
>nonstandard and a bit impoverished as to functionality.
>
>
The problem lies in the migration from MySQL to PostgreSQL. In MySQL we
(badly) chose enum for yes/no switches (there's nothing like a boolean
field type in MySQL as far as I know, though we could use tinyint). It
would be very time consuming to rewrite all such enums and check whether
the code still works.

>However, I doubt this is worth pursuing.  One of the things I tested
>yesterday was a quick hack to organize the storage of intermediate join
>tuples with fixed-width fields first and non-fixed ones later.  It
>really didn't help much at all :-(.  I think the trouble with your
>example is that in the existing code, the really fast path applies only
>when the tuple contains no nulls --- and since you're doing all that
>left joining, there's frequently at least one null lurking.
>
>
Unfortunately I don't see any other way than LEFT JOINing in this case.

>            regards, tom lane
>
>
Miroslav


Re: [PERFORM] How to read query plan

From: Miroslav Šulc

Tom Lane wrote:

>I wrote:
>
>
>>Since ExecProject operations within a nest of joins are going to be
>>dealing entirely with Vars, I wonder if we couldn't speed matters up
>>by having a short-circuit case for a projection that is only Vars.
>>Essentially it would be a lot like execJunk.c, except able to cope
>>with two input tuples.  Using heap_deformtuple instead of retail
>>extraction of fields would eliminate the O(N^2) penalty for wide tuples.
>>
>>
>
>Actually, we already had a pending patch (from Atsushi Ogawa) that
>eliminates that particular O(N^2) behavior in another way.  After
>applying it, I get about a factor-of-4 reduction in the runtime for
>Miroslav's example.
>
>
Is there a chance we will see this patch in the 8.0.2 release? And when
can we expect this release?

>ExecEvalVar and associated routines are still a pretty good fraction of
>the runtime, so it might still be worth doing something like the above,
>but it'd probably be just a marginal win instead of a big win.
>
>            regards, tom lane
>
>
Miroslav


Re: [PERFORM] How to read query plan

From: Miroslav Šulc

Tom Lane wrote:

>...
>I think the reason this is popping to the top of the runtime is that the
>joins are so wide (an average of ~85 columns in a join tuple according
>to the numbers above).  Because there are lots of variable-width columns
>involved, most of the time the fast path for field access doesn't apply
>and we end up going to nocachegetattr --- which itself is going to be
>slow because it has to scan over so many columns.  So the cost is
>roughly O(N^2) in the number of columns.
>
>
As there are a lot of varchar(1) in the AdDevicesSites table, wouldn't
it be helpful to change them to char(1)? Would it solve the variable-width
problem at least for some fields and speed the query up?

>As a short-term hack, you might be able to improve matters if you can
>reorder your LEFT JOINs to have the minimum number of columns
>propagating up from the earlier join steps.  In other words make the
>later joins add more columns than the earlier, as much as you can.
>
>
That will be hard, as the main table, which contains most of the fields,
is LEFT JOINed with the others. I'll look at it and see if I can find
some way to improve it.

I'm not sure whether I understand the process of performing the plan but
I imagine that the data from AdDevicesSites are retrieved only once when
they are loaded and maybe stored in memory. Are the columns stored in
the order they are in the SQL command? If so, wouldn't it be useful to
move all varchar fields to the end of the SELECT query? I'm just
guessing because I don't know at all how a database server is
implemented and what it really does.

>..
>            regards, tom lane
>
>
Miroslav
