Thread: Re: improve EXPLAIN for wide tables

Re: improve EXPLAIN for wide tables

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
> Looking further into this improvement, I started to question if it is
> necessary to make columns unique for EXPLAIN purposes?

Yes, otherwise references to them elsewhere in the plan will be
ambiguous.

It looks like your proposal tries to dodge that by unique-ifying
in some cases but not others, which strikes me as a totally
random and confusing thing to do.

Is there any reason to think that 52c707483 wasn't a sufficient
response to this issue?

            regards, tom lane



Re: improve EXPLAIN for wide tables

From
Sami Imseih
Date:
> > Looking further into this improvement, I started to question if it is
> > necessary to make columns unique for EXPLAIN purposes?

> Yes, otherwise references to them elsewhere in the plan will be
> ambiguous.

Explain will qualify the column name with the table name such as the simple
example below with the experimental patch. I have not been able to find cases,
except for the failed test case that involves "?column?", in which the
plan will result
in ambiguous column names.

postgres=# create table test (id int, id2 int);
CREATE TABLE
postgres=# create table test2 (id int, id2 int);
CREATE TABLE

postgres=# explain select * from test, test2 where test.id = test2.id
order by 1;
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join  (cost=317.01..711.38 rows=25538 width=16)
   Merge Cond: (test.id = test2.id)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
         Sort Key: test.id
         ->  Seq Scan on test  (cost=0.00..32.60 rows=2260 width=8)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
         Sort Key: test2.id
         ->  Seq Scan on test2  (cost=0.00..32.60 rows=2260 width=8)
(8 rows)

postgres=# explain verbose select * from test, test2 where test.id =
test2.id order by 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Merge Join  (cost=317.01..711.38 rows=25538 width=16)
   Output: test.id, test.id2, test2.id, test2.id2
   Merge Cond: (test.id = test2.id)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
         Output: test.id, test.id2
         Sort Key: test.id
         ->  Seq Scan on public.test  (cost=0.00..32.60 rows=2260 width=8)
               Output: test.id, test.id2
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
         Output: test2.id, test2.id2
         Sort Key: test2.id
         ->  Seq Scan on public.test2  (cost=0.00..32.60 rows=2260 width=8)
               Output: test2.id, test2.id2
 Query Identifier: 2523359249885908438
(14 rows)

> It looks like your proposal tries to dodge that by unique-ifying
> in some cases but not others, which strikes me as a totally
> random and confusing thing to do.

The work is required for things like deparsing views because Postgres
has to construct the underlying sql definition and part of that is it has to
make columns unique to generate a valid sql, such as the case here [1].
For planning purposes, the column name must already be unambiguous.
right?


> Is there any reason to think that 52c707483 wasn't a sufficient
> response to this issue?
>
>                         regards, tom lane

In the case reported, similar to the one in the earlier attached
experiment.sql, even with the hash table optimization, there is still
significant
overhead from the unique-ifying work. for such users, this can impact real-world
performance particularly if they have enabled extensions such as auto_explain.

[1] https://github.com/postgres/postgres/blob/master/src/test/regress/sql/create_view.sql#L550-L565

Regards,

Sami



Re: improve EXPLAIN for wide tables

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
>>> Looking further into this improvement, I started to question if it is
>>> necessary to make columns unique for EXPLAIN purposes?

>> Yes, otherwise references to them elsewhere in the plan will be
>> ambiguous.

> Explain will qualify the column name with the table name such as the simple
> example below with the experimental patch.

But if the column names are ambiguous within the same RTE, how does
table-qualification fix that?  And it's within-the-same-RTE that
we're concerned with here.

> I have not been able to find cases,
> except for the failed test case that involves "?column?", in which the
> plan will result in ambiguous column names.

It only takes one case to mean we have to deal with it ;-).  But I'm
fairly sure that there are many other cases, since the parser doesn't
restrict the output names of a sub-SELECT to be unique.

>> Is there any reason to think that 52c707483 wasn't a sufficient
>> response to this issue?

> In the case reported, similar to the one in the earlier attached
> experiment.sql, even with the hash table optimization, there is still
> significant
> overhead from the unique-ifying work. for such users, this can impact real-world
> performance particularly if they have enabled extensions such as auto_explain.

This detail-free claim doesn't persuade me that we have to do
something more.  (And I'm not buying the idea that people
should expect auto_explain to be free, anyway.)

            regards, tom lane



Re: improve EXPLAIN for wide tables

From
Tom Lane
Date:
I had a thought about this: I don't think EXPLAIN is ever required
to print the names of join alias variables (since the planner flattens
all join alias variables to some kind of expression over their
underlying columns).  So we could skip assigning column names to
join RTEs at all, if we know that it's EXPLAIN rather than view/rule
decompilation.  That might let us skip all the mess around
unique-ifying JOIN USING column names, too.

            regards, tom lane



Re: improve EXPLAIN for wide tables

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
>> It only takes one case to mean we have to deal with it ;-).  But I'm
>> fairly sure that there are many other cases, since the parser doesn't
>> restrict the output names of a sub-SELECT to be unique.

> good point. I see the error in my original line of thinking now.
> In fact, it's this simple to prove that we still need to unique-ify
> something like this subquery is valid:

> select * from (select 1 a, 2 a) as s

Actually, I was expecting you to cite that as a counterexample ;-)
because EXPLAIN doesn't show the sub-select's column names in
such cases:

=# explain verbose select * from (select 1 a, 2 a) as s;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=8)
   Output: 1, 2
(2 rows)

You need something where we don't elide the SubqueryScan node
to show there's an issue, for example:

=# explain verbose select * from (select 1 a, 2 b, 3 b limit 4) as s where a < 3;
                      QUERY PLAN
-------------------------------------------------------
 Subquery Scan on s  (cost=0.00..0.02 rows=1 width=12)
   Output: s.a, s.b, s.b_1
   Filter: (s.a < 3)
   ->  Limit  (cost=0.00..0.01 rows=1 width=12)
         Output: 1, 2, 3
         ->  Result  (cost=0.00..0.01 rows=1 width=12)
               Output: 1, 2, 3
(7 rows)

> I suspect that we can also skip RTE_RELATION, since columns must
> be unique, but I am not sure it's worth the extra effort. At least my test
> does not show any real benefit.

Yeah, I was thinking of that too.  Seems like it ought to be
a noticeable improvement if the join case is.

> I am attaching a patch that deals with the RTE_JOIN case.

I'll take a look.  Thanks for the test demonstrating that
this makes a visible performance difference.

            regards, tom lane



Re: improve EXPLAIN for wide tables

From
Tom Lane
Date:
I wrote:
> Sami Imseih <samimseih@gmail.com> writes:
>> I am attaching a patch that deals with the RTE_JOIN case.

> I'll take a look.  Thanks for the test demonstrating that
> this makes a visible performance difference.

Pushed with some simplification: we don't need a new flag,
because none of the callers of set_simple_column_names need it
to do anything with join RTEs.  This is better anyway because
set_relation_column_names' comment explicitly says it is not
for join RTEs, and now we don't use it on them ever.

I poked at the question of whether it's worth skipping
unique-ification for relation RTEs, and I came to the same
conclusion as you: it doesn't seem to be.  The related code
is down in the noise according to "perf" once we skip join
RTEs.  I think the reason the join RTEs are so expensive for
this is that the upper ones get very wide in join nests like
the example query.

            regards, tom lane