Thread: enable_incremental_sort changes query behavior

enable_incremental_sort changes query behavior

From
Jaime Casanova
Date:
Hi,

With sqlsmith I found a query that gives this error:
ERROR:  ORDER/GROUP BY expression not found in targetlist

I noted the query (sql query below, sorry it uses custom tables i
couldn't replicate with regression tables) because it doesn't include
an ORDER/GROUP BY clause.

--- 0 ----
select distinct
          subq_0.c1 as c0,
          ref_0.radi_usua_radi as c1,
          ref_0.radi_nume_asoc as c2,
          subq_0.c1 as c3,
          case when (cast(null as pg_lsn) >=
pg_catalog.pg_last_wal_receive_lsn())
              and (true = pg_catalog.pg_rotate_logfile_old()) then
ref_0.radi_usua_rem else ref_0.radi_usua_rem end
             as c4,
          cast(nullif((select hist_codi from public.hist_eventos_2
limit 1 offset 4)
              ,
            pg_catalog.pg_stat_get_buf_alloc()) as int8) as c5
        from
          public.radicado_2 as ref_0,
          lateral (select
                ref_0.radi_text_temp as c0,
                ref_0.radi_usua_actu as c1
              from
                public.hist_eventos_1 as ref_1
              where cast(nullif(cast(null as float4),
                  cast(null as float4)) as float4) >= pg_catalog.pi()) as subq_0
        where ref_0.radi_usua_dest is not NULL;
--- 0 ----

Attached the stack trace produced until de elog that produces the message.

But if I set enable_incremental_sort to off the query gets executed
without problems (attached the explain produced for that case)

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
>
> Hi,
>
> With sqlsmith I found a query that gives this error:
> ERROR:  ORDER/GROUP BY expression not found in targetlist
>
> I noted the query (sql query below, sorry it uses custom tables i
> couldn't replicate with regression tables) because it doesn't include
> an ORDER/GROUP BY clause.
>
> --- 0 ----
> select distinct
>           subq_0.c1 as c0,
>           ref_0.radi_usua_radi as c1,
>           ref_0.radi_nume_asoc as c2,
>           subq_0.c1 as c3,
>           case when (cast(null as pg_lsn) >=
> pg_catalog.pg_last_wal_receive_lsn())
>               and (true = pg_catalog.pg_rotate_logfile_old()) then
> ref_0.radi_usua_rem else ref_0.radi_usua_rem end
>              as c4,
>           cast(nullif((select hist_codi from public.hist_eventos_2
> limit 1 offset 4)
>               ,
>             pg_catalog.pg_stat_get_buf_alloc()) as int8) as c5
>         from
>           public.radicado_2 as ref_0,
>           lateral (select
>                 ref_0.radi_text_temp as c0,
>                 ref_0.radi_usua_actu as c1
>               from
>                 public.hist_eventos_1 as ref_1
>               where cast(nullif(cast(null as float4),
>                   cast(null as float4)) as float4) >= pg_catalog.pi()) as subq_0
>         where ref_0.radi_usua_dest is not NULL;
> --- 0 ----
>
> Attached the stack trace produced until de elog that produces the message.
>
> But if I set enable_incremental_sort to off the query gets executed
> without problems (attached the explain produced for that case)

Thanks for the report.

Is there by an chance an index on ref_0.radi_text_temp?

And if you set enable_hashagg = off what plan do you get (or error)?

Thanks,
James



Re: enable_incremental_sort changes query behavior

From
Jaime Casanova
Date:
On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
> >
> > Hi,
> >
> > With sqlsmith I found a query that gives this error:
> > ERROR:  ORDER/GROUP BY expression not found in targetlist
> >
[...]
> >
> > But if I set enable_incremental_sort to off the query gets executed
> > without problems (attached the explain produced for that case)
>
> Thanks for the report.
>

Hi,

by experiment I reduced the query to this

--- 0 ---
select distinct
        subq_0.c1 as c0,
        case when (true = pg_catalog.pg_rotate_logfile_old()) then
                ref_0.t else ref_0.t
        end
             as c4
        from
          public.ref_0,
          lateral (select

                ref_0.i as c1
              from
                generate_series(1, 100) as ref_1) as subq_0
--- 0 ---

the only custom table already needed can be created with this commands:

--- 0 ---
create table ref_0 as select repeat('abcde', (random() * 10)::integer)
t, random() * 1000 i from generate_series(1, 500000);
create index on ref_0 (i);
analyze ref_0 ;
--- 0 ---


> Is there by an chance an index on ref_0.radi_text_temp?
>

there is an index involved but not on that field, commands above
create the index in the right column... after that, ANALYZE the table

> And if you set enable_hashagg = off what plan do you get (or error)?
>

same error

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
>
> On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
> >
> > On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
> > <jaime.casanova@2ndquadrant.com> wrote:
> > >
> > > Hi,
> > >
> > > With sqlsmith I found a query that gives this error:
> > > ERROR:  ORDER/GROUP BY expression not found in targetlist
> > >
> [...]
> > >
> > > But if I set enable_incremental_sort to off the query gets executed
> > > without problems (attached the explain produced for that case)
> >
> > Thanks for the report.
> >
>
> Hi,
>
> by experiment I reduced the query to this
>
> --- 0 ---
> select distinct
>         subq_0.c1 as c0,
>         case when (true = pg_catalog.pg_rotate_logfile_old()) then
>                 ref_0.t else ref_0.t
>         end
>              as c4
>         from
>           public.ref_0,
>           lateral (select
>
>                 ref_0.i as c1
>               from
>                 generate_series(1, 100) as ref_1) as subq_0
> --- 0 ---
>
> the only custom table already needed can be created with this commands:
>
> --- 0 ---
> create table ref_0 as select repeat('abcde', (random() * 10)::integer)
> t, random() * 1000 i from generate_series(1, 500000);
> create index on ref_0 (i);
> analyze ref_0 ;
> --- 0 ---
>
>
> > Is there by an chance an index on ref_0.radi_text_temp?
> >
>
> there is an index involved but not on that field, commands above
> create the index in the right column... after that, ANALYZE the table
>
> > And if you set enable_hashagg = off what plan do you get (or error)?
> >
>
> same error

I was able to reproduce the error without incremental sort enabled
(i.e., it happens with a full sort also). The function call in the
SELECT doesn't have to be in a case expression; for example I was able
to reproduce changing that to `random()::text || ref_0.t`.

It looks like the issue happens when:
1. The sort happens within a parallel node.
2. One of the sort keys is an expression containing a volatile
function call and a column from the lateral join.

Here are the settings I used with your above repro case to show it
with regular sort:

 enable_hashagg=off
 enable_incremental_sort=off
 enable_seqscan=off
 parallel_setup_cost=10
 parallel_tuple_cost=0

The plan (obtained by replacing the volatile function with a stable one):

 Unique
   ->  Nested Loop
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: ref_0.i, (md5(ref_0.t))
                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
         ->  Function Scan on generate_series ref_1

Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.

I haven't been able to dig further than that yet, but my intuition is
to poke around in the parallel query machinery?

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
>On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
><jaime.casanova@2ndquadrant.com> wrote:
>>
>> On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
>> >
>> > On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
>> > <jaime.casanova@2ndquadrant.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > With sqlsmith I found a query that gives this error:
>> > > ERROR:  ORDER/GROUP BY expression not found in targetlist
>> > >
>> [...]
>> > >
>> > > But if I set enable_incremental_sort to off the query gets executed
>> > > without problems (attached the explain produced for that case)
>> >
>> > Thanks for the report.
>> >
>>
>> Hi,
>>
>> by experiment I reduced the query to this
>>
>> --- 0 ---
>> select distinct
>>         subq_0.c1 as c0,
>>         case when (true = pg_catalog.pg_rotate_logfile_old()) then
>>                 ref_0.t else ref_0.t
>>         end
>>              as c4
>>         from
>>           public.ref_0,
>>           lateral (select
>>
>>                 ref_0.i as c1
>>               from
>>                 generate_series(1, 100) as ref_1) as subq_0
>> --- 0 ---
>>
>> the only custom table already needed can be created with this commands:
>>
>> --- 0 ---
>> create table ref_0 as select repeat('abcde', (random() * 10)::integer)
>> t, random() * 1000 i from generate_series(1, 500000);
>> create index on ref_0 (i);
>> analyze ref_0 ;
>> --- 0 ---
>>
>>
>> > Is there by an chance an index on ref_0.radi_text_temp?
>> >
>>
>> there is an index involved but not on that field, commands above
>> create the index in the right column... after that, ANALYZE the table
>>
>> > And if you set enable_hashagg = off what plan do you get (or error)?
>> >
>>
>> same error
>
>I was able to reproduce the error without incremental sort enabled
>(i.e., it happens with a full sort also). The function call in the
>SELECT doesn't have to be in a case expression; for example I was able
>to reproduce changing that to `random()::text || ref_0.t`.
>
>It looks like the issue happens when:
>1. The sort happens within a parallel node.
>2. One of the sort keys is an expression containing a volatile
>function call and a column from the lateral join.
>
>Here are the settings I used with your above repro case to show it
>with regular sort:
>
> enable_hashagg=off
> enable_incremental_sort=off
> enable_seqscan=off
> parallel_setup_cost=10
> parallel_tuple_cost=0
>
>The plan (obtained by replacing the volatile function with a stable one):
>
> Unique
>   ->  Nested Loop
>         ->  Gather Merge
>               Workers Planned: 2
>               ->  Sort
>                     Sort Key: ref_0.i, (md5(ref_0.t))
>                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
>         ->  Function Scan on generate_series ref_1
>
>Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
>
>I haven't been able to dig further than that yet, but my intuition is
>to poke around in the parallel query machinery?
>

Nope. Bisect says this was introduced by this commit:

     ba3e76cc57 Consider Incremental Sort paths at additional places

Looking at the diff, I suspect generate_useful_gather_paths (or maybe
get_useful_pathkeys_for_relation) fails to do something with the
pathkeys.

Of course, that'd explain why it only happens for parallel plans, as
this builds gather nodes.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Thu, Oct 1, 2020 at 6:08 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
> >On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
> ><jaime.casanova@2ndquadrant.com> wrote:
> >>
> >> On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
> >> >
> >> > On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
> >> > <jaime.casanova@2ndquadrant.com> wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > With sqlsmith I found a query that gives this error:
> >> > > ERROR:  ORDER/GROUP BY expression not found in targetlist
> >> > >
> >> [...]
> >> > >
> >> > > But if I set enable_incremental_sort to off the query gets executed
> >> > > without problems (attached the explain produced for that case)
> >> >
> >> > Thanks for the report.
> >> >
> >>
> >> Hi,
> >>
> >> by experiment I reduced the query to this
> >>
> >> --- 0 ---
> >> select distinct
> >>         subq_0.c1 as c0,
> >>         case when (true = pg_catalog.pg_rotate_logfile_old()) then
> >>                 ref_0.t else ref_0.t
> >>         end
> >>              as c4
> >>         from
> >>           public.ref_0,
> >>           lateral (select
> >>
> >>                 ref_0.i as c1
> >>               from
> >>                 generate_series(1, 100) as ref_1) as subq_0
> >> --- 0 ---
> >>
> >> the only custom table already needed can be created with this commands:
> >>
> >> --- 0 ---
> >> create table ref_0 as select repeat('abcde', (random() * 10)::integer)
> >> t, random() * 1000 i from generate_series(1, 500000);
> >> create index on ref_0 (i);
> >> analyze ref_0 ;
> >> --- 0 ---
> >>
> >>
> >> > Is there by an chance an index on ref_0.radi_text_temp?
> >> >
> >>
> >> there is an index involved but not on that field, commands above
> >> create the index in the right column... after that, ANALYZE the table
> >>
> >> > And if you set enable_hashagg = off what plan do you get (or error)?
> >> >
> >>
> >> same error
> >
> >I was able to reproduce the error without incremental sort enabled
> >(i.e., it happens with a full sort also). The function call in the
> >SELECT doesn't have to be in a case expression; for example I was able
> >to reproduce changing that to `random()::text || ref_0.t`.
> >
> >It looks like the issue happens when:
> >1. The sort happens within a parallel node.
> >2. One of the sort keys is an expression containing a volatile
> >function call and a column from the lateral join.
> >
> >Here are the settings I used with your above repro case to show it
> >with regular sort:
> >
> > enable_hashagg=off
> > enable_incremental_sort=off
> > enable_seqscan=off
> > parallel_setup_cost=10
> > parallel_tuple_cost=0
> >
> >The plan (obtained by replacing the volatile function with a stable one):
> >
> > Unique
> >   ->  Nested Loop
> >         ->  Gather Merge
> >               Workers Planned: 2
> >               ->  Sort
> >                     Sort Key: ref_0.i, (md5(ref_0.t))
> >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> >         ->  Function Scan on generate_series ref_1
> >
> >Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
> >
> >I haven't been able to dig further than that yet, but my intuition is
> >to poke around in the parallel query machinery?
> >
>
> Nope. Bisect says this was introduced by this commit:
>
>      ba3e76cc57 Consider Incremental Sort paths at additional places
>
> Looking at the diff, I suspect generate_useful_gather_paths (or maybe
> get_useful_pathkeys_for_relation) fails to do something with the
> pathkeys.
>
> Of course, that'd explain why it only happens for parallel plans, as
> this builds gather nodes.

I should have known I'd regret making a wild guess. I was in the
middle of bisecting too, but had to set it aside for a bit and hadn't
finished.

I'll try to look into that commit (and I agree those functions are the
likely places to look) as I get a chance here.

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Thu, Oct 1, 2020 at 9:10 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 6:08 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
> > >On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
> > ><jaime.casanova@2ndquadrant.com> wrote:
> > >>
> > >> On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331@gmail.com> wrote:
> > >> >
> > >> > On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
> > >> > <jaime.casanova@2ndquadrant.com> wrote:
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > With sqlsmith I found a query that gives this error:
> > >> > > ERROR:  ORDER/GROUP BY expression not found in targetlist
> > >> > >
> > >> [...]
> > >> > >
> > >> > > But if I set enable_incremental_sort to off the query gets executed
> > >> > > without problems (attached the explain produced for that case)
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >>
> > >> Hi,
> > >>
> > >> by experiment I reduced the query to this
> > >>
> > >> --- 0 ---
> > >> select distinct
> > >>         subq_0.c1 as c0,
> > >>         case when (true = pg_catalog.pg_rotate_logfile_old()) then
> > >>                 ref_0.t else ref_0.t
> > >>         end
> > >>              as c4
> > >>         from
> > >>           public.ref_0,
> > >>           lateral (select
> > >>
> > >>                 ref_0.i as c1
> > >>               from
> > >>                 generate_series(1, 100) as ref_1) as subq_0
> > >> --- 0 ---
> > >>
> > >> the only custom table already needed can be created with this commands:
> > >>
> > >> --- 0 ---
> > >> create table ref_0 as select repeat('abcde', (random() * 10)::integer)
> > >> t, random() * 1000 i from generate_series(1, 500000);
> > >> create index on ref_0 (i);
> > >> analyze ref_0 ;
> > >> --- 0 ---
> > >>
> > >>
> > >> > Is there by an chance an index on ref_0.radi_text_temp?
> > >> >
> > >>
> > >> there is an index involved but not on that field, commands above
> > >> create the index in the right column... after that, ANALYZE the table
> > >>
> > >> > And if you set enable_hashagg = off what plan do you get (or error)?
> > >> >
> > >>
> > >> same error
> > >
> > >I was able to reproduce the error without incremental sort enabled
> > >(i.e., it happens with a full sort also). The function call in the
> > >SELECT doesn't have to be in a case expression; for example I was able
> > >to reproduce changing that to `random()::text || ref_0.t`.
> > >
> > >It looks like the issue happens when:
> > >1. The sort happens within a parallel node.
> > >2. One of the sort keys is an expression containing a volatile
> > >function call and a column from the lateral join.
> > >
> > >Here are the settings I used with your above repro case to show it
> > >with regular sort:
> > >
> > > enable_hashagg=off
> > > enable_incremental_sort=off
> > > enable_seqscan=off
> > > parallel_setup_cost=10
> > > parallel_tuple_cost=0
> > >
> > >The plan (obtained by replacing the volatile function with a stable one):
> > >
> > > Unique
> > >   ->  Nested Loop
> > >         ->  Gather Merge
> > >               Workers Planned: 2
> > >               ->  Sort
> > >                     Sort Key: ref_0.i, (md5(ref_0.t))
> > >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> > >         ->  Function Scan on generate_series ref_1
> > >
> > >Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
> > >
> > >I haven't been able to dig further than that yet, but my intuition is
> > >to poke around in the parallel query machinery?
> > >
> >
> > Nope. Bisect says this was introduced by this commit:
> >
> >      ba3e76cc57 Consider Incremental Sort paths at additional places
> >
> > Looking at the diff, I suspect generate_useful_gather_paths (or maybe
> > get_useful_pathkeys_for_relation) fails to do something with the
> > pathkeys.
> >
> > Of course, that'd explain why it only happens for parallel plans, as
> > this builds gather nodes.
>
> I should have known I'd regret making a wild guess. I was in the
> middle of bisecting too, but had to set it aside for a bit and hadn't
> finished.
>
> I'll try to look into that commit (and I agree those functions are the
> likely places to look) as I get a chance here.

I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().

I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
>
> ...
>
>I've been able to confirm that the problem goes away if we stop adding
>the gather merge paths in generate_useful_gather_paths().
>
>I'm not sure yet what conclusion that leads us to. It seems to be that
>the biggest clue remains that all of this works correctly unless one
>of the selected columns (which happens to be a pathkey at this point
>because it's a DISTINCT query) contains a volatile expression.
>

Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
which is calling find_em_expr_for_rel and is happy with anything it
returns. But this was copied from postgres_fdw, which however does a bit
more here:

     if (pathkey_ec->ec_has_volatile ||
         !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
         !is_foreign_expr(root, rel, em_expr))

So not only does it explicitly check volatility of the pathkey, it also
calls is_foreign_expr which checks the expression for mutable functions.

The attached patch seems to fix this, but it only adds the check for
mutable functions. Maybe it should check ec_has_volatile too ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
> >
> > ...
> >
> >I've been able to confirm that the problem goes away if we stop adding
> >the gather merge paths in generate_useful_gather_paths().
> >
> >I'm not sure yet what conclusion that leads us to. It seems to be that
> >the biggest clue remains that all of this works correctly unless one
> >of the selected columns (which happens to be a pathkey at this point
> >because it's a DISTINCT query) contains a volatile expression.
> >
>
> Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
> which is calling find_em_expr_for_rel and is happy with anything it
> returns. But this was copied from postgres_fdw, which however does a bit
> more here:
>
>      if (pathkey_ec->ec_has_volatile ||
>          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
>          !is_foreign_expr(root, rel, em_expr))
>
> So not only does it explicitly check volatility of the pathkey, it also
> calls is_foreign_expr which checks the expression for mutable functions.
>
> The attached patch seems to fix this, but it only adds the check for
> mutable functions. Maybe it should check ec_has_volatile too ...

We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.

My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.

James

1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
> > >
> > > ...
> > >
> > >I've been able to confirm that the problem goes away if we stop adding
> > >the gather merge paths in generate_useful_gather_paths().
> > >
> > >I'm not sure yet what conclusion that leads us to. It seems to be that
> > >the biggest clue remains that all of this works correctly unless one
> > >of the selected columns (which happens to be a pathkey at this point
> > >because it's a DISTINCT query) contains a volatile expression.
> > >
> >
> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
> > which is calling find_em_expr_for_rel and is happy with anything it
> > returns. But this was copied from postgres_fdw, which however does a bit
> > more here:
> >
> >      if (pathkey_ec->ec_has_volatile ||
> >          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
> >          !is_foreign_expr(root, rel, em_expr))
> >
> > So not only does it explicitly check volatility of the pathkey, it also
> > calls is_foreign_expr which checks the expression for mutable functions.
> >
> > The attached patch seems to fix this, but it only adds the check for
> > mutable functions. Maybe it should check ec_has_volatile too ...
>
> We actually discussed the volatility check in that function back on
> the original thread [1], and we'd concluded that was specifically
> necessary for the fdw code because the function would execute on two
> different servers (and thus produce different results), but that in a
> local server only scenario it should be fine.
>
> My understanding (correct me if I'm wrong) is that the volatile
> function should only be executed once (at the scan level?) to build
> the tuple and from then on the datum isn't going to change, so I'm not
> sure why the volatility would matter here.
>
> James
>
> 1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development

Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
>On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
>> > >
>> > > ...
>> > >
>> > >I've been able to confirm that the problem goes away if we stop adding
>> > >the gather merge paths in generate_useful_gather_paths().
>> > >
>> > >I'm not sure yet what conclusion that leads us to. It seems to be that
>> > >the biggest clue remains that all of this works correctly unless one
>> > >of the selected columns (which happens to be a pathkey at this point
>> > >because it's a DISTINCT query) contains a volatile expression.
>> > >
>> >
>> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
>> > which is calling find_em_expr_for_rel and is happy with anything it
>> > returns. But this was copied from postgres_fdw, which however does a bit
>> > more here:
>> >
>> >      if (pathkey_ec->ec_has_volatile ||
>> >          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
>> >          !is_foreign_expr(root, rel, em_expr))
>> >
>> > So not only does it explicitly check volatility of the pathkey, it also
>> > calls is_foreign_expr which checks the expression for mutable functions.
>> >
>> > The attached patch seems to fix this, but it only adds the check for
>> > mutable functions. Maybe it should check ec_has_volatile too ...
>>
>> We actually discussed the volatility check in that function back on
>> the original thread [1], and we'd concluded that was specifically
>> necessary for the fdw code because the function would execute on two
>> different servers (and thus produce different results), but that in a
>> local server only scenario it should be fine.
>>
>> My understanding (correct me if I'm wrong) is that the volatile
>> function should only be executed once (at the scan level?) to build
>> the tuple and from then on the datum isn't going to change, so I'm not
>> sure why the volatility would matter here.
>>
>> James
>>
>> 1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development
>
>Oh, hmm, could what I said all be true, but there still be some rule
>that you shouldn't compare datums generated from volatile expressions
>in different backends (i.e., parallel query)?
>

I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.

I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...

The backtrace looks like this:

     #0  get_sortgroupref_tle 
     #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
     #2  0x000000000080926c in make_sort_from_pathkeys
     #3  0x0000000000801032 in create_sort_plan
     #4  0x00000000007fe7e0 in create_plan_recurse
     #5  0x0000000000800b2c in create_gather_merge_plan
     #6  0x00000000007fe94d in create_plan_recurse
     #7  0x0000000000805328 in create_nestloop_plan
     #8  0x00000000007ff3c5 in create_join_plan
     #9  0x00000000007fe5f8 in create_plan_recurse
     #10 0x0000000000800d68 in create_projection_plan
     #11 0x00000000007fe662 in create_plan_recurse
     #12 0x0000000000801252 in create_upper_unique_plan
     #13 0x00000000007fe760 in create_plan_recurse
     #14 0x00000000007fe4f2 in create_plan
     #15 0x000000000081082f in standard_planner

and the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:

    Distinct
     -> Nestloop
         -> Gather Merge
        -> Sort
            -> Index Scan

and it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)

Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index can can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)


Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
> >On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
> >>
> >> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
> >> <tomas.vondra@2ndquadrant.com> wrote:
> >> >
> >> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
> >> > >
> >> > > ...
> >> > >
> >> > >I've been able to confirm that the problem goes away if we stop adding
> >> > >the gather merge paths in generate_useful_gather_paths().
> >> > >
> >> > >I'm not sure yet what conclusion that leads us to. It seems to be that
> >> > >the biggest clue remains that all of this works correctly unless one
> >> > >of the selected columns (which happens to be a pathkey at this point
> >> > >because it's a DISTINCT query) contains a volatile expression.
> >> > >
> >> >
> >> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
> >> > which is calling find_em_expr_for_rel and is happy with anything it
> >> > returns. But this was copied from postgres_fdw, which however does a bit
> >> > more here:
> >> >
> >> >      if (pathkey_ec->ec_has_volatile ||
> >> >          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
> >> >          !is_foreign_expr(root, rel, em_expr))
> >> >
> >> > So not only does it explicitly check volatility of the pathkey, it also
> >> > calls is_foreign_expr which checks the expression for mutable functions.
> >> >
> >> > The attached patch seems to fix this, but it only adds the check for
> >> > mutable functions. Maybe it should check ec_has_volatile too ...
> >>
> >> We actually discussed the volatility check in that function back on
> >> the original thread [1], and we'd concluded that was specifically
> >> necessary for the fdw code because the function would execute on two
> >> different servers (and thus produce different results), but that in a
> >> local server only scenario it should be fine.
> >>
> >> My understanding (correct me if I'm wrong) is that the volatile
> >> function should only be executed once (at the scan level?) to build
> >> the tuple and from then on the datum isn't going to change, so I'm not
> >> sure why the volatility would matter here.
> >>
> >> James
> >>
> >> 1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development
> >
> >Oh, hmm, could what I said all be true, but there still be some rule
> >that you shouldn't compare datums generated from volatile expressions
> >in different backends (i.e., parallel query)?
> >
>
> I'm not sure it's all that related to parallel query - it's more likely
> that when constructing the paths below the gather merge, this new code
> fails to do something important.
>
> I'm not 100% sure how the grouping and volatile functions work, so let
> me think aloud here ...
>
> The backtrace looks like this:
>
>      #0  get_sortgroupref_tle
>      #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
>      #2  0x000000000080926c in make_sort_from_pathkeys
>      #3  0x0000000000801032 in create_sort_plan
>      #4  0x00000000007fe7e0 in create_plan_recurse
>      #5  0x0000000000800b2c in create_gather_merge_plan
>      #6  0x00000000007fe94d in create_plan_recurse
>      #7  0x0000000000805328 in create_nestloop_plan
>      #8  0x00000000007ff3c5 in create_join_plan
>      #9  0x00000000007fe5f8 in create_plan_recurse
>      #10 0x0000000000800d68 in create_projection_plan
>      #11 0x00000000007fe662 in create_plan_recurse
>      #12 0x0000000000801252 in create_upper_unique_plan
>      #13 0x00000000007fe760 in create_plan_recurse
>      #14 0x00000000007fe4f2 in create_plan
>      #15 0x000000000081082f in standard_planner
>
> and the create_sort_plan works with lefttree that is IndexScan, so the
> query we're constructing looks like this:
>
>     Distinct
>      -> Nestloop
>          -> Gather Merge
>             -> Sort
>                 -> Index Scan
>
> and it's the sort that expects to find the expression in the Index Scan
> target list. Which seems rather bogus, because clearly the index scan
> does not include the expression. (I wonder if it's somehow related that
> indexes can't be built on volatile expressions ...)

The index scan does include values not found in the index though,
right? It returns the whole tuple -- though I'm not sure if it
includes calculated values (BTW: is this what is meant by a
"projection" being added? Or is that something else?)

> Anyway, the index scan clearly does not include the expression the sort
> references, hence the failure. And the index can can't compute it,
> because we probably need to compute it on top of the join I think
> (otherwise we might get duplicate values for volatile functions etc.)

In this particular query there's no reason why we'd have to wait to
compute it until after the join, right?

For example, if we take away the parallelism but still force a Unique
plan, we get this:

 Unique
   ->  Sort
         Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
ref_0.t ELSE ref_0.t END)
         ->  Nested Loop
               ->  Seq Scan on ref_0
               ->  Function Scan on generate_series ref_1

And I don't see any reason why the CASE statement couldn't in theory
(I don't know the internals enough to know when it actually happens)
be done as part of the base relation scan (in this case, the seq
scan). It's not dependent on any information from the join.

Furthermore the same plan works correctly for a non-volatile
expression, so that means (I think) that the relation scan is capable
of producing that expression in the tuple it generates (either that or
the sort is generating it?).

> Looking at this from a slightly different angle, the root cause here
> seems to be that generate_useful_gather_paths uses the pathkeys it gets
> from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
> But all other create_gather_merge_calls use root->sort_pathkeys, so
> maybe this is the actual problem and get_useful_pathkeys_for_relation
> should use root->sort_pathkeys instead. That does fix the issue for me
> too (and it passes all regression tests).

So I'm not convinced this is the correct way forward. It seems we want
to be able to apply this as broadly as possible, and using
sort_pathkeys here means we wouldn't use it for the DISTINCT case at
all, right?

So far I think excluding volatile expressions (or mutable functions?)
is the best way we've discussed so far, though I'm clear yet on why it
is that that's a necessary restriction. If the answer is "there are
cases where it would be unsafe for parallel query even though it isn't
here...", I'm fine with that, but if so, we should make sure that's in
the code comments/readmes somewhere.

I also verified that the functions I've been playing with
(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 02, 2020 at 04:12:11PM -0400, James Coleman wrote:
>On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
>> >On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
>> >>
>> >> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
>> >> <tomas.vondra@2ndquadrant.com> wrote:
>> >> >
>> >> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
>> >> > >
>> >> > > ...
>> >> > >
>> >> > >I've been able to confirm that the problem goes away if we stop adding
>> >> > >the gather merge paths in generate_useful_gather_paths().
>> >> > >
>> >> > >I'm not sure yet what conclusion that leads us to. It seems to be that
>> >> > >the biggest clue remains that all of this works correctly unless one
>> >> > >of the selected columns (which happens to be a pathkey at this point
>> >> > >because it's a DISTINCT query) contains a volatile expression.
>> >> > >
>> >> >
>> >> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
>> >> > which is calling find_em_expr_for_rel and is happy with anything it
>> >> > returns. But this was copied from postgres_fdw, which however does a bit
>> >> > more here:
>> >> >
>> >> >      if (pathkey_ec->ec_has_volatile ||
>> >> >          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
>> >> >          !is_foreign_expr(root, rel, em_expr))
>> >> >
>> >> > So not only does it explicitly check volatility of the pathkey, it also
>> >> > calls is_foreign_expr which checks the expression for mutable functions.
>> >> >
>> >> > The attached patch seems to fix this, but it only adds the check for
>> >> > mutable functions. Maybe it should check ec_has_volatile too ...
>> >>
>> >> We actually discussed the volatility check in that function back on
>> >> the original thread [1], and we'd concluded that was specifically
>> >> necessary for the fdw code because the function would execute on two
>> >> different servers (and thus produce different results), but that in a
>> >> local server only scenario it should be fine.
>> >>
>> >> My understanding (correct me if I'm wrong) is that the volatile
>> >> function should only be executed once (at the scan level?) to build
>> >> the tuple and from then on the datum isn't going to change, so I'm not
>> >> sure why the volatility would matter here.
>> >>
>> >> James
>> >>
>> >> 1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development
>> >
>> >Oh, hmm, could what I said all be true, but there still be some rule
>> >that you shouldn't compare datums generated from volatile expressions
>> >in different backends (i.e., parallel query)?
>> >
>>
>> I'm not sure it's all that related to parallel query - it's more likely
>> that when constructing the paths below the gather merge, this new code
>> fails to do something important.
>>
>> I'm not 100% sure how the grouping and volatile functions work, so let
>> me think aloud here ...
>>
>> The backtrace looks like this:
>>
>>      #0  get_sortgroupref_tle
>>      #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
>>      #2  0x000000000080926c in make_sort_from_pathkeys
>>      #3  0x0000000000801032 in create_sort_plan
>>      #4  0x00000000007fe7e0 in create_plan_recurse
>>      #5  0x0000000000800b2c in create_gather_merge_plan
>>      #6  0x00000000007fe94d in create_plan_recurse
>>      #7  0x0000000000805328 in create_nestloop_plan
>>      #8  0x00000000007ff3c5 in create_join_plan
>>      #9  0x00000000007fe5f8 in create_plan_recurse
>>      #10 0x0000000000800d68 in create_projection_plan
>>      #11 0x00000000007fe662 in create_plan_recurse
>>      #12 0x0000000000801252 in create_upper_unique_plan
>>      #13 0x00000000007fe760 in create_plan_recurse
>>      #14 0x00000000007fe4f2 in create_plan
>>      #15 0x000000000081082f in standard_planner
>>
>> and the create_sort_plan works with lefttree that is IndexScan, so the
>> query we're constructing looks like this:
>>
>>     Distinct
>>      -> Nestloop
>>          -> Gather Merge
>>             -> Sort
>>                 -> Index Scan
>>
>> and it's the sort that expects to find the expression in the Index Scan
>> target list. Which seems rather bogus, because clearly the index scan
>> does not include the expression. (I wonder if it's somehow related that
>> indexes can't be built on volatile expressions ...)
>
>The index scan does include values not found in the index though,
>right? It returns the whole tuple -- though I'm not sure if it
>includes calculated values (BTW: is this what is meant by a
>"projection" being added? Or is that something else?)
>

AFAIK index scans are projection-capable - at least it seems like that
from is_projection_capable_path. But the volatility blocks that, I think
(see below).

>> Anyway, the index scan clearly does not include the expression the sort
>> references, hence the failure. And the index can can't compute it,
>> because we probably need to compute it on top of the join I think
>> (otherwise we might get duplicate values for volatile functions etc.)
>
>In this particular query there's no reason why we'd have to wait to
>compute it until after the join, right?
>
>For example, if we take away the parallelism but still force a Unique
>plan, we get this:
>
> Unique
>   ->  Sort
>         Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
>ref_0.t ELSE ref_0.t END)
>         ->  Nested Loop
>               ->  Seq Scan on ref_0
>               ->  Function Scan on generate_series ref_1
>
>And I don't see any reason why the CASE statement couldn't in theory
>(I don't know the internals enough to know when it actually happens)
>be done as part of the base relation scan (in this case, the seq
>scan). It's not dependent on any information from the join.
>

Imagine a query like this:

    select t1.id, volatile_func() from t1 join t2 using (id);

and now assume t2 contains duplicate values. If the volatile_func gets
evaluated as part of the t1 scan, then we'll get multiple occurrences
in the results, due to the t2 duplicates. I belive volatile functions
are not expected to behave like that - the docs say:

    A query using a volatile function will re-evaluate the function at
    every row where its value is needed..

And I assume this references to rows at the level where the function is
used, i.e. after the join.

>Furthermore the same plan works correctly for a non-volatile
>expression, so that means (I think) that the relation scan is capable
>of producing that expression in the tuple it generates (either that or
>the sort is generating it?).
>

I do believe the reason is exactly that non-volatile expressions can be
pushed down, so that we actually find them in the target list of the
node below the sort.

See explains for the second set of queries, where I used a simple
non-volatile expression.

>> Looking at this from a slightly different angle, the root cause here
>> seems to be that generate_useful_gather_paths uses the pathkeys it gets
>> from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
>> But all other create_gather_merge_calls use root->sort_pathkeys, so
>> maybe this is the actual problem and get_useful_pathkeys_for_relation
>> should use root->sort_pathkeys instead. That does fix the issue for me
>> too (and it passes all regression tests).
>
>So I'm not convinced this is the correct way forward. It seems we want
>to be able to apply this as broadly as possible, and using
>sort_pathkeys here means we wouldn't use it for the DISTINCT case at
>all, right?
>
>So far I think excluding volatile expressions (or mutable functions?)
>is the best way we've discussed so far, though I'm clear yet on why it
>is that that's a necessary restriction. If the answer is "there are
>cases where it would be unsafe for parallel query even though it isn't
>here...", I'm fine with that, but if so, we should make sure that's in
>the code comments/readmes somewhere.
>
>I also verified that the functions I've been playing with
>(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.
>

More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(

Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 02, 2020 at 04:12:11PM -0400, James Coleman wrote:
> >On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
> >> >On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331@gmail.com> wrote:
> >> >>
> >> >> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
> >> >> <tomas.vondra@2ndquadrant.com> wrote:
> >> >> >
> >> >> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
> >> >> > >
> >> >> > > ...
> >> >> > >
> >> >> > >I've been able to confirm that the problem goes away if we stop adding
> >> >> > >the gather merge paths in generate_useful_gather_paths().
> >> >> > >
> >> >> > >I'm not sure yet what conclusion that leads us to. It seems to be that
> >> >> > >the biggest clue remains that all of this works correctly unless one
> >> >> > >of the selected columns (which happens to be a pathkey at this point
> >> >> > >because it's a DISTINCT query) contains a volatile expression.
> >> >> > >
> >> >> >
> >> >> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
> >> >> > which is calling find_em_expr_for_rel and is happy with anything it
> >> >> > returns. But this was copied from postgres_fdw, which however does a bit
> >> >> > more here:
> >> >> >
> >> >> >      if (pathkey_ec->ec_has_volatile ||
> >> >> >          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
> >> >> >          !is_foreign_expr(root, rel, em_expr))
> >> >> >
> >> >> > So not only does it explicitly check volatility of the pathkey, it also
> >> >> > calls is_foreign_expr which checks the expression for mutable functions.
> >> >> >
> >> >> > The attached patch seems to fix this, but it only adds the check for
> >> >> > mutable functions. Maybe it should check ec_has_volatile too ...
> >> >>
> >> >> We actually discussed the volatility check in that function back on
> >> >> the original thread [1], and we'd concluded that was specifically
> >> >> necessary for the fdw code because the function would execute on two
> >> >> different servers (and thus produce different results), but that in a
> >> >> local server only scenario it should be fine.
> >> >>
> >> >> My understanding (correct me if I'm wrong) is that the volatile
> >> >> function should only be executed once (at the scan level?) to build
> >> >> the tuple and from then on the datum isn't going to change, so I'm not
> >> >> sure why the volatility would matter here.
> >> >>
> >> >> James
> >> >>
> >> >> 1: https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development
> >> >
> >> >Oh, hmm, could what I said all be true, but there still be some rule
> >> >that you shouldn't compare datums generated from volatile expressions
> >> >in different backends (i.e., parallel query)?
> >> >
> >>
> >> I'm not sure it's all that related to parallel query - it's more likely
> >> that when constructing the paths below the gather merge, this new code
> >> fails to do something important.
> >>
> >> I'm not 100% sure how the grouping and volatile functions work, so let
> >> me think aloud here ...
> >>
> >> The backtrace looks like this:
> >>
> >>      #0  get_sortgroupref_tle
> >>      #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
> >>      #2  0x000000000080926c in make_sort_from_pathkeys
> >>      #3  0x0000000000801032 in create_sort_plan
> >>      #4  0x00000000007fe7e0 in create_plan_recurse
> >>      #5  0x0000000000800b2c in create_gather_merge_plan
> >>      #6  0x00000000007fe94d in create_plan_recurse
> >>      #7  0x0000000000805328 in create_nestloop_plan
> >>      #8  0x00000000007ff3c5 in create_join_plan
> >>      #9  0x00000000007fe5f8 in create_plan_recurse
> >>      #10 0x0000000000800d68 in create_projection_plan
> >>      #11 0x00000000007fe662 in create_plan_recurse
> >>      #12 0x0000000000801252 in create_upper_unique_plan
> >>      #13 0x00000000007fe760 in create_plan_recurse
> >>      #14 0x00000000007fe4f2 in create_plan
> >>      #15 0x000000000081082f in standard_planner
> >>
> >> and the create_sort_plan works with lefttree that is IndexScan, so the
> >> query we're constructing looks like this:
> >>
> >>     Distinct
> >>      -> Nestloop
> >>          -> Gather Merge
> >>             -> Sort
> >>                 -> Index Scan
> >>
> >> and it's the sort that expects to find the expression in the Index Scan
> >> target list. Which seems rather bogus, because clearly the index scan
> >> does not include the expression. (I wonder if it's somehow related that
> >> indexes can't be built on volatile expressions ...)
> >
> >The index scan does include values not found in the index though,
> >right? It returns the whole tuple -- though I'm not sure if it
> >includes calculated values (BTW: is this what is meant by a
> >"projection" being added? Or is that something else?)
> >
>
> AFAIK index scans are projection-capable - at least it seems like that
> from is_projection_capable_path. But the volatility blocks that, I think
> (see below).
>
> >> Anyway, the index scan clearly does not include the expression the sort
> >> references, hence the failure. And the index can can't compute it,
> >> because we probably need to compute it on top of the join I think
> >> (otherwise we might get duplicate values for volatile functions etc.)
> >
> >In this particular query there's no reason why we'd have to wait to
> >compute it until after the join, right?
> >
> >For example, if we take away the parallelism but still force a Unique
> >plan, we get this:
> >
> > Unique
> >   ->  Sort
> >         Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
> >ref_0.t ELSE ref_0.t END)
> >         ->  Nested Loop
> >               ->  Seq Scan on ref_0
> >               ->  Function Scan on generate_series ref_1
> >
> >And I don't see any reason why the CASE statement couldn't in theory
> >(I don't know the internals enough to know when it actually happens)
> >be done as part of the base relation scan (in this case, the seq
> >scan). It's not dependent on any information from the join.
> >
>
> Imagine a query like this:
>
>     select t1.id, volatile_func() from t1 join t2 using (id);
>
> and now assume t2 contains duplicate values. If the volatile_func gets
> evaluated as part of the t1 scan, then we'll get multiple occurrences
> in the results, due to the t2 duplicates. I belive volatile functions
> are not expected to behave like that - the docs say:
>
>     A query using a volatile function will re-evaluate the function at
>     every row where its value is needed..
>
> And I assume this references to rows at the level where the function is
> used, i.e. after the join.
>
> >Furthermore the same plan works correctly for a non-volatile
> >expression, so that means (I think) that the relation scan is capable
> >of producing that expression in the tuple it generates (either that or
> >the sort is generating it?).
> >
>
> I do believe the reason is exactly that non-volatile expressions can be
> pushed down, so that we actually find them in the target list of the
> node below the sort.
>
> See explains for the second set of queries, where I used a simple
> non-volatile expression.
>
> >> Looking at this from a slightly different angle, the root cause here
> >> seems to be that generate_useful_gather_paths uses the pathkeys it gets
> >> from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
> >> But all other create_gather_merge_calls use root->sort_pathkeys, so
> >> maybe this is the actual problem and get_useful_pathkeys_for_relation
> >> should use root->sort_pathkeys instead. That does fix the issue for me
> >> too (and it passes all regression tests).
> >
> >So I'm not convinced this is the correct way forward. It seems we want
> >to be able to apply this as broadly as possible, and using
> >sort_pathkeys here means we wouldn't use it for the DISTINCT case at
> >all, right?
> >
> >So far I think excluding volatile expressions (or mutable functions?)
> >is the best way we've discussed so far, though I'm clear yet on why it
> >is that that's a necessary restriction. If the answer is "there are
> >cases where it would be unsafe for parallel query even though it isn't
> >here...", I'm fine with that, but if so, we should make sure that's in
> >the code comments/readmes somewhere.
> >
> >I also verified that the functions I've been playing with
> >(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.
> >
>
> More importanly, it does not actually fix the issue - it does fix that
> particular query, but just replacing the DISTINCT with either ORDER BY
> or GROUP BY make it fail again :-(
>
> Attached is a simple script I used, demonstrating these issues for the
> three cases that expect to have ressortgroupref != 0 (per the comment
> before TargetEntry in plannodes.h).

So does checking for volatile expressions (if you happened to test
that) solve all the cases? If you haven't tested that yet, I can try
to do that this evening.

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> >And I don't see any reason why the CASE statement couldn't in theory
> >(I don't know the internals enough to know when it actually happens)
> >be done as part of the base relation scan (in this case, the seq
> >scan). It's not dependent on any information from the join.
> >
>
> Imagine a query like this:
>
>     select t1.id, volatile_func() from t1 join t2 using (id);
>
> and now assume t2 contains duplicate values. If the volatile_func gets
> evaluated as part of the t1 scan, then we'll get multiple occurrences
> in the results, due to the t2 duplicates. I belive volatile functions
> are not expected to behave like that - the docs say:
>
>     A query using a volatile function will re-evaluate the function at
>     every row where its value is needed..
>
> And I assume this references to rows at the level where the function is
> used, i.e. after the join.

Ah, this makes sense. I was thinking exactly the opposite: that you'd
want not to execute volatile functions multiple times for the same
base rel row, but now that you describe it it makes sense why they
shouldn't be.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
>On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> ...
>>
>> More importanly, it does not actually fix the issue - it does fix that
>> particular query, but just replacing the DISTINCT with either ORDER BY
>> or GROUP BY make it fail again :-(
>>
>> Attached is a simple script I used, demonstrating these issues for the
>> three cases that expect to have ressortgroupref != 0 (per the comment
>> before TargetEntry in plannodes.h).
>
>So does checking for volatile expressions (if you happened to test
>that) solve all the cases? If you haven't tested that yet, I can try
>to do that this evening.
>

Yes, it does fix all the three queries in the SQL script.

The question however is whether this is the root issue, and whether it's
the right way to fix it. For example - volatility is not the only reason
that may block qual pushdown. If you look at qual_is_pushdown_safe, it
also blocks pushdown of leaky functions in security_barrier views. So I
wonder if that could cause failures too, somehow. But I haven't managed
to create such example.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> ...
> >>
> >> More importanly, it does not actually fix the issue - it does fix that
> >> particular query, but just replacing the DISTINCT with either ORDER BY
> >> or GROUP BY make it fail again :-(
> >>
> >> Attached is a simple script I used, demonstrating these issues for the
> >> three cases that expect to have ressortgroupref != 0 (per the comment
> >> before TargetEntry in plannodes.h).
> >
> >So does checking for volatile expressions (if you happened to test
> >that) solve all the cases? If you haven't tested that yet, I can try
> >to do that this evening.
> >
>
> Yes, it does fix all the three queries in the SQL script.
>
> The question however is whether this is the root issue, and whether it's
> the right way to fix it. For example - volatility is not the only reason
> that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> also blocks pushdown of leaky functions in security_barrier views. So I
> wonder if that could cause failures too, somehow. But I haven't managed
> to create such example.

I was about to say that the issue here is slightly different from qual
etc. pushdown, since we're not concerned about quals here, and so I
wonder where we determine what target list entries to put in a given
scan path, but then I realized that implies (maybe!) a simpler
solution. Instead of duplicating checks on target list entries would
be safe, why not check directly in get_useful_pathkeys_for_relation()
whether or not the pathkey has a target list entry?

I haven't been able to try that out yet, and so maybe I'm missing
something, but I need to step out for a bit, so I'll have to look at
it later.

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > ><tomas.vondra@2ndquadrant.com> wrote:
> > >>
> > >> ...
> > >>
> > >> More importanly, it does not actually fix the issue - it does fix that
> > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > >> or GROUP BY make it fail again :-(
> > >>
> > >> Attached is a simple script I used, demonstrating these issues for the
> > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > >> before TargetEntry in plannodes.h).
> > >
> > >So does checking for volatile expressions (if you happened to test
> > >that) solve all the cases? If you haven't tested that yet, I can try
> > >to do that this evening.
> > >
> >
> > Yes, it does fix all the three queries in the SQL script.
> >
> > The question however is whether this is the root issue, and whether it's
> > the right way to fix it. For example - volatility is not the only reason
> > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > also blocks pushdown of leaky functions in security_barrier views. So I
> > wonder if that could cause failures too, somehow. But I haven't managed
> > to create such example.
>
> I was about to say that the issue here is slightly different from qual
> etc. pushdown, since we're not concerned about quals here, and so I
> wonder where we determine what target list entries to put in a given
> scan path, but then I realized that implies (maybe!) a simpler
> solution. Instead of duplicating checks on target list entries would
> be safe, why not check directly in get_useful_pathkeys_for_relation()
> whether or not the pathkey has a target list entry?
>
> I haven't been able to try that out yet, and so maybe I'm missing
> something, but I need to step out for a bit, so I'll have to look at
> it later.

I've started poking at this, but haven't yet found a workable
solution. See the attached patch which "solves" the problem by
breaking putting the sort under the gather merge, but it at least
demonstrates conceptually what I think we're interested in doing.

The issue currently is that the comparison of expressions fails -- in
our query, the first column selected shows up as a Var (with the same
contents) but is a different pointer in the em_expr and the reltarget
exprs list. I don't yet see a good way to get the equivalence class
for a PathTarget entry.

James

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> The backtrace looks like this:
>
>      #0  get_sortgroupref_tle
>      #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
>      #2  0x000000000080926c in make_sort_from_pathkeys
>      #3  0x0000000000801032 in create_sort_plan
>      #4  0x00000000007fe7e0 in create_plan_recurse
>      #5  0x0000000000800b2c in create_gather_merge_plan
>      #6  0x00000000007fe94d in create_plan_recurse
>      #7  0x0000000000805328 in create_nestloop_plan
>      #8  0x00000000007ff3c5 in create_join_plan
>      #9  0x00000000007fe5f8 in create_plan_recurse
>      #10 0x0000000000800d68 in create_projection_plan
>      #11 0x00000000007fe662 in create_plan_recurse
>      #12 0x0000000000801252 in create_upper_unique_plan
>      #13 0x00000000007fe760 in create_plan_recurse
>      #14 0x00000000007fe4f2 in create_plan
>      #15 0x000000000081082f in standard_planner
>
> and the create_sort_plan works with lefttree that is IndexScan, so the
> query we're constructing looks like this:
>
>     Distinct
>      -> Nestloop
>          -> Gather Merge
>             -> Sort
>                 -> Index Scan
>
> and it's the sort that expects to find the expression in the Index Scan
> target list. Which seems rather bogus, because clearly the index scan
> does not include the expression. (I wonder if it's somehow related that
> indexes can't be built on volatile expressions ...)
>
> Anyway, the index scan clearly does not include the expression the sort
> references, hence the failure. And the index can can't compute it,
> because we probably need to compute it on top of the join I think
> (otherwise we might get duplicate values for volatile functions etc.)
>
>
> Looking at this from a slightly different angle, the root cause here
> seems to be that generate_useful_gather_paths uses the pathkeys it gets
> from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
> But all other create_gather_merge_calls use root->sort_pathkeys, so
> maybe this is the actual problem and get_useful_pathkeys_for_relation
> should use root->sort_pathkeys instead. That does fix the issue for me
> too (and it passes all regression tests).

So I've been a bit confused how our error could come from working with
root->query_pathkeys when that's what's supposedly being set from the
make_pathkeys_for_sortclauses() call in the backtrace Jaime reported,
but I just realized that the trace I get when reproducing the error is
different -- and matches the one you shared above.

Jaime: was the backtrace in the original report by any chance record
from breakpointing in the first call to get_sortgroupref_tle() (and
one that successfully returned a sort group ref) rather than a call
that hit the elog error on line 379?

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > >>
> > > >> ...
> > > >>
> > > >> More importanly, it does not actually fix the issue - it does fix that
> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > > >> or GROUP BY make it fail again :-(
> > > >>
> > > >> Attached is a simple script I used, demonstrating these issues for the
> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > > >> before TargetEntry in plannodes.h).
> > > >
> > > >So does checking for volatile expressions (if you happened to test
> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > > >to do that this evening.
> > > >
> > >
> > > Yes, it does fix all the three queries in the SQL script.
> > >
> > > The question however is whether this is the root issue, and whether it's
> > > the right way to fix it. For example - volatility is not the only reason
> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > > wonder if that could cause failures too, somehow. But I haven't managed
> > > to create such example.
> >
> > I was about to say that the issue here is slightly different from qual
> > etc. pushdown, since we're not concerned about quals here, and so I
> > wonder where we determine what target list entries to put in a given
> > scan path, but then I realized that implies (maybe!) a simpler
> > solution. Instead of duplicating checks on target list entries would
> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > whether or not the pathkey has a target list entry?
> >
> > I haven't been able to try that out yet, and so maybe I'm missing
> > something, but I need to step out for a bit, so I'll have to look at
> > it later.
>
> I've started poking at this, but haven't yet found a workable
> solution. See the attached patch which "solves" the problem by
> breaking putting the sort under the gather merge, but it at least
> demonstrates conceptually what I think we're interested in doing.
>
> The issue currently is that the comparison of expressions fails -- in
> our query, the first column selected shows up as a Var (with the same
> contents) but is a different pointer in the em_expr and the reltarget
> exprs list. I don't yet see a good way to get the equivalence class
> for a PathTarget entry.

Hmm, I think I was looking to do is the attached patch. I didn't
realize until I did a lot of reading through source that we have an
equal() function that can compare exprs. That (plus the realization in
[1] the originally reported backtrace wasn't where the error actually
came from) convinced me that what we need is to confirm not only that
the all of the vars in the ec member come from the relids in the rel
but also that the expr is actually represented in the target of the
rel.

With the gucs changed as I mentioned earlier both of the plans (with
and without a volatile call in the 2nd select entry) now look like:

 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: ref_0.i, (md5(ref_0.t))
               ->  Nested Loop
                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
                     ->  Function Scan on generate_series ref_1

Without the gucs changed the minimal repro case now doesn't error, but
results in this plan:

 HashAggregate
   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
ELSE ref_0.t END
   ->  Nested Loop
         ->  Seq Scan on ref_0
         ->  Function Scan on generate_series ref_1

Similarly in your six queries I now only see parallel query showing up
in the last one.

I created an entirely new function because adding the target expr
lookup to the existing find_em_expr_for_rel() function broke a bunch
of postgres_fdw tests. That maybe raises questions about whether that
code also could have problems in theory/in the future, but I didn't
investigate further. In any case we already know it excludes
volatile...so maybe it's fine because in practice that's actually a
broader exclusion than what we're doing here.

This seems to fix the issue, but I'd like feedback on whether it's too
strict. We could of course just check em_has_volatile, but I'm
wondering if that's simultaneously too strict (by not allowing the
volatile expression to be computed in the gather merge supposing
there's no join) and too loose (maybe there are other cases we should
care about?). It also just strikes me as re-encoding rules that should
have already been applied (and thus we should be able to look up in
the data we have if it's safe to use the expr/pathkey). Those are
mostly intuitions though.

James

1: https://www.postgresql.org/message-id/CAAaqYe8zqDAv0Sfak5Riu%2BDKsm-i3ARPursn5v6qTwiCXmkXKQ%40mail.gmail.com

Attachment

Re: enable_incremental_sort changes query behavior

From
Jaime Casanova
Date:
On Sat, 3 Oct 2020 at 08:15, James Coleman <jtc331@gmail.com> wrote:
>
> Jaime: was the backtrace in the original report by any chance record
> from breakpointing in the first call to get_sortgroupref_tle() (and
> one that successfully returned a sort group ref) rather than a call
> that hit the elog error on line 379?
>

I remember the first backtrace a made for this was on the line of the
elog but then a second backtrace was with the function, now that
repeat it with line of the function for the elog (and the original
query) i see again create_incrementalsort_plan() which made me try
with enable_incremental_sort disabled...

sorry for that, attached a better backtrace

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
>On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
>> >
>> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
>> > <tomas.vondra@2ndquadrant.com> wrote:
>> > >
>> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
>> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
>> > > ><tomas.vondra@2ndquadrant.com> wrote:
>> > > >>
>> > > >> ...
>> > > >>
>> > > >> More importanly, it does not actually fix the issue - it does fix that
>> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
>> > > >> or GROUP BY make it fail again :-(
>> > > >>
>> > > >> Attached is a simple script I used, demonstrating these issues for the
>> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
>> > > >> before TargetEntry in plannodes.h).
>> > > >
>> > > >So does checking for volatile expressions (if you happened to test
>> > > >that) solve all the cases? If you haven't tested that yet, I can try
>> > > >to do that this evening.
>> > > >
>> > >
>> > > Yes, it does fix all the three queries in the SQL script.
>> > >
>> > > The question however is whether this is the root issue, and whether it's
>> > > the right way to fix it. For example - volatility is not the only reason
>> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
>> > > also blocks pushdown of leaky functions in security_barrier views. So I
>> > > wonder if that could cause failures too, somehow. But I haven't managed
>> > > to create such example.
>> >
>> > I was about to say that the issue here is slightly different from qual
>> > etc. pushdown, since we're not concerned about quals here, and so I
>> > wonder where we determine what target list entries to put in a given
>> > scan path, but then I realized that implies (maybe!) a simpler
>> > solution. Instead of duplicating checks on target list entries would
>> > be safe, why not check directly in get_useful_pathkeys_for_relation()
>> > whether or not the pathkey has a target list entry?
>> >
>> > I haven't been able to try that out yet, and so maybe I'm missing
>> > something, but I need to step out for a bit, so I'll have to look at
>> > it later.
>>
>> I've started poking at this, but haven't yet found a workable
>> solution. See the attached patch which "solves" the problem by
>> breaking putting the sort under the gather merge, but it at least
>> demonstrates conceptually what I think we're interested in doing.
>>
>> The issue currently is that the comparison of expressions fails -- in
>> our query, the first column selected shows up as a Var (with the same
>> contents) but is a different pointer in the em_expr and the reltarget
>> exprs list. I don't yet see a good way to get the equivalence class
>> for a PathTarget entry.
>
>Hmm, I think I was looking to do is the attached patch. I didn't
>realize until I did a lot of reading through source that we have an
>equal() function that can compare exprs. That (plus the realization in
>[1] the originally reported backtrace wasn't where the error actually
>came from) convinced me that what we need is to confirm not only that
>the all of the vars in the ec member come from the relids in the rel
>but also that the expr is actually represented in the target of the
>rel.
>
>With the gucs changed as I mentioned earlier both of the plans (with
>and without a volatile call in the 2nd select entry) now look like:
>
> Unique
>   ->  Gather Merge
>         Workers Planned: 2
>         ->  Sort
>               Sort Key: ref_0.i, (md5(ref_0.t))
>               ->  Nested Loop
>                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
>                     ->  Function Scan on generate_series ref_1
>
>Without the gucs changed the minimal repro case now doesn't error, but
>results in this plan:
>
> HashAggregate
>   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
>ELSE ref_0.t END
>   ->  Nested Loop
>         ->  Seq Scan on ref_0
>         ->  Function Scan on generate_series ref_1
>
>Similarly in your six queries I now only see parallel query showing up
>in the last one.
>

OK, that seems reasonable I think.

>I created an entirely new function because adding the target expr
>lookup to the existing find_em_expr_for_rel() function broke a bunch
>of postgres_fdw tests. That maybe raises questions about whether that
>code also could have problems in theory/in the future, but I didn't
>investigate further. In any case we already know it excludes
>volatile...so maybe it's fine because in practice that's actually a
>broader exclusion than what we're doing here.
>

I don't think postgres_fdw needs these new checks, because FDW scan's
are not allowed in parallel part of the plan - if that changes in the
future, I'm sure that'll require fixes in plenty other places.

OTOH it's interesting that it triggers those failures - I wonder if we
could learn something from them? AFAICS those paths can't be built by
generate_useful_gather_paths (because of the postgres_fdw vs. parallel
query restrictions), so how do these plans look?


>This seems to fix the issue, but I'd like feedback on whether it's too
>strict. We could of course just check em_has_volatile, but I'm
>wondering if that's simultaneously too strict (by not allowing the
>volatile expression to be computed in the gather merge supposing
>there's no join) and too loose (maybe there are other cases we should
>care about?). It also just strikes me as re-encoding rules that should
>have already been applied (and thus we should be able to look up in
>the data we have if it's safe to use the expr/pathkey). Those are
>mostly intuitions though.
>

I don't know :-( As I mentioned before, I suspect checking just the
volatility may not be enough in some cases (leakyness, etc.) and I think
you're right it may be too strict in other cases.

Not sure I understand which rules you think we're re-encoding, but I
have a nagging feeling there's a piece of code somewhere earlier in
the query planner meant to prevent such cases (sort on path without all
the pathkeys), and that fixing it here is just a band aid :-(

I'm sure we're building plans with Sort on top of Index Scan paths, so
how come those don't fail? How come the non-parallel version of these
queries don't fail?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
> >>
> >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> >> >
> >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> >> > <tomas.vondra@2ndquadrant.com> wrote:
> >> > >
> >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> >> > > ><tomas.vondra@2ndquadrant.com> wrote:
> >> > > >>
> >> > > >> ...
> >> > > >>
> >> > > >> More importanly, it does not actually fix the issue - it does fix that
> >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> >> > > >> or GROUP BY make it fail again :-(
> >> > > >>
> >> > > >> Attached is a simple script I used, demonstrating these issues for the
> >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> >> > > >> before TargetEntry in plannodes.h).
> >> > > >
> >> > > >So does checking for volatile expressions (if you happened to test
> >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> >> > > >to do that this evening.
> >> > > >
> >> > >
> >> > > Yes, it does fix all the three queries in the SQL script.
> >> > >
> >> > > The question however is whether this is the root issue, and whether it's
> >> > > the right way to fix it. For example - volatility is not the only reason
> >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> >> > > wonder if that could cause failures too, somehow. But I haven't managed
> >> > > to create such example.
> >> >
> >> > I was about to say that the issue here is slightly different from qual
> >> > etc. pushdown, since we're not concerned about quals here, and so I
> >> > wonder where we determine what target list entries to put in a given
> >> > scan path, but then I realized that implies (maybe!) a simpler
> >> > solution. Instead of duplicating checks on target list entries would
> >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> >> > whether or not the pathkey has a target list entry?
> >> >
> >> > I haven't been able to try that out yet, and so maybe I'm missing
> >> > something, but I need to step out for a bit, so I'll have to look at
> >> > it later.
> >>
> >> I've started poking at this, but haven't yet found a workable
> >> solution. See the attached patch which "solves" the problem by
> >> breaking putting the sort under the gather merge, but it at least
> >> demonstrates conceptually what I think we're interested in doing.
> >>
> >> The issue currently is that the comparison of expressions fails -- in
> >> our query, the first column selected shows up as a Var (with the same
> >> contents) but is a different pointer in the em_expr and the reltarget
> >> exprs list. I don't yet see a good way to get the equivalence class
> >> for a PathTarget entry.
> >
> >Hmm, I think I was looking to do is the attached patch. I didn't
> >realize until I did a lot of reading through source that we have an
> >equal() function that can compare exprs. That (plus the realization in
> >[1] the originally reported backtrace wasn't where the error actually
> >came from) convinced me that what we need is to confirm not only that
> >the all of the vars in the ec member come from the relids in the rel
> >but also that the expr is actually represented in the target of the
> >rel.
> >
> >With the gucs changed as I mentioned earlier both of the plans (with
> >and without a volatile call in the 2nd select entry) now look like:
> >
> > Unique
> >   ->  Gather Merge
> >         Workers Planned: 2
> >         ->  Sort
> >               Sort Key: ref_0.i, (md5(ref_0.t))
> >               ->  Nested Loop
> >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> >                     ->  Function Scan on generate_series ref_1
> >
> >Without the gucs changed the minimal repro case now doesn't error, but
> >results in this plan:
> >
> > HashAggregate
> >   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> >ELSE ref_0.t END
> >   ->  Nested Loop
> >         ->  Seq Scan on ref_0
> >         ->  Function Scan on generate_series ref_1
> >
> >Similarly in your six queries I now only see parallel query showing up
> >in the last one.
> >
>
> OK, that seems reasonable I think.

If we proceed with this patch I'd like to tweak it to check for the
relids subset first on the theory that it will be cheaper than the
expr equality check loop; that change is attached.

> >I created an entirely new function because adding the target expr
> >lookup to the existing find_em_expr_for_rel() function broke a bunch
> >of postgres_fdw tests. That maybe raises questions about whether that
> >code also could have problems in theory/in the future, but I didn't
> >investigate further. In any case we already know it excludes
> >volatile...so maybe it's fine because in practice that's actually a
> >broader exclusion than what we're doing here.
> >
>
> I don't think postgres_fdw needs these new checks, because FDW scan's
> are not allowed in parallel part of the plan - if that changes in the
> future, I'm sure that'll require fixes in plenty other places.
>
> OTOH it's interesting that it triggers those failures - I wonder if we
> could learn something from them? AFAICS those paths can't be built by
> generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> query restrictions), so how do these plans look?

Well the fdw code doesn't trigger the errors, but both
generate_useful_gather_paths() and the fdw code originally relied on
find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
also uses it for determining what path keys are valid -- in that case
which ones are safe to be sent to the foreign server (so all of the
vars have to be from rels on the foreign server). The fdw code has
additional checks too though, for example no volatile expressions may
be pushed to the remote.

> >This seems to fix the issue, but I'd like feedback on whether it's too
> >strict. We could of course just check em_has_volatile, but I'm
> >wondering if that's simultaneously too strict (by not allowing the
> >volatile expression to be computed in the gather merge supposing
> >there's no join) and too loose (maybe there are other cases we should
> >care about?). It also just strikes me as re-encoding rules that should
> >have already been applied (and thus we should be able to look up in
> >the data we have if it's safe to use the expr/pathkey). Those are
> >mostly intuitions though.
> >
>
> I don't know :-( As I mentioned before, I suspect checking just the
> volatility may not be enough in some cases (leakyness, etc.) and I think
> you're right it may be too strict in other cases.
>
> Not sure I understand which rules you think we're re-encoding, but I
> have a nagging feeling there's a piece of code somewhere earlier in
> the query planner meant to prevent such cases (sort on path without all
> the pathkeys), and that fixing it here is just a band aid :-(

I'm getting at exactly what you're saying below: there's got to be
some existing rules (and presumably code already enforcing it in some
places) at what level in the path tree a given pathkey is
safe/correct, and we don't want to reinvent those rules (or ideally
that code). Verifying the expr is in the rel seems to me to be
intuitively correct, but I wish there was another place in the code
that could validate that.

> I'm sure we're building plans with Sort on top of Index Scan paths, so
> how come those don't fail? How come the non-parallel version of these
> queries don't fail?

Yeah, exactly. Something has to be doing something similar -- I don't
think everywhere else using sort_pathkeys instead of query_pathkeys
really changes the problem. I've read through all of the places we use
generate_useful_gather_paths() as well as root->sort_pathkeys to try
to get a better understanding of this. Most of the places I convinced
myself it's already safe -- for example places like
create_orderd_paths are working with the upper rel, and the full
target list is available. A few other places I added comments (see the
"questions" patch in the attached series) where I'm not sure why we
apply projections *after* we create a sort path; given the issue we're
discussing here I would have thought you'd want to do exactly the
opposite.

I haven't yet tried to read through the code in other places where we
build an explicit sort to see if I can find any hints as to what
existing code does to ensure this situation doesn't occur, but so far
I haven't found anything else obvious.

Is there someone who might be able to point us in the right
direction/validate what we're thinking?

James

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
> > >>
> > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> > >> >
> > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > >> > <tomas.vondra@2ndquadrant.com> wrote:
> > >> > >
> > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > >> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > >> > > >>
> > >> > > >> ...
> > >> > > >>
> > >> > > >> More importanly, it does not actually fix the issue - it does fix that
> > >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > >> > > >> or GROUP BY make it fail again :-(
> > >> > > >>
> > >> > > >> Attached is a simple script I used, demonstrating these issues for the
> > >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > >> > > >> before TargetEntry in plannodes.h).
> > >> > > >
> > >> > > >So does checking for volatile expressions (if you happened to test
> > >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > >> > > >to do that this evening.
> > >> > > >
> > >> > >
> > >> > > Yes, it does fix all the three queries in the SQL script.
> > >> > >
> > >> > > The question however is whether this is the root issue, and whether it's
> > >> > > the right way to fix it. For example - volatility is not the only reason
> > >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > >> > > wonder if that could cause failures too, somehow. But I haven't managed
> > >> > > to create such example.
> > >> >
> > >> > I was about to say that the issue here is slightly different from qual
> > >> > etc. pushdown, since we're not concerned about quals here, and so I
> > >> > wonder where we determine what target list entries to put in a given
> > >> > scan path, but then I realized that implies (maybe!) a simpler
> > >> > solution. Instead of duplicating checks on target list entries would
> > >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > >> > whether or not the pathkey has a target list entry?
> > >> >
> > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > >> > something, but I need to step out for a bit, so I'll have to look at
> > >> > it later.
> > >>
> > >> I've started poking at this, but haven't yet found a workable
> > >> solution. See the attached patch which "solves" the problem by
> > >> breaking putting the sort under the gather merge, but it at least
> > >> demonstrates conceptually what I think we're interested in doing.
> > >>
> > >> The issue currently is that the comparison of expressions fails -- in
> > >> our query, the first column selected shows up as a Var (with the same
> > >> contents) but is a different pointer in the em_expr and the reltarget
> > >> exprs list. I don't yet see a good way to get the equivalence class
> > >> for a PathTarget entry.
> > >
> > >Hmm, I think I was looking to do is the attached patch. I didn't
> > >realize until I did a lot of reading through source that we have an
> > >equal() function that can compare exprs. That (plus the realization in
> > >[1] the originally reported backtrace wasn't where the error actually
> > >came from) convinced me that what we need is to confirm not only that
> > >the all of the vars in the ec member come from the relids in the rel
> > >but also that the expr is actually represented in the target of the
> > >rel.
> > >
> > >With the gucs changed as I mentioned earlier both of the plans (with
> > >and without a volatile call in the 2nd select entry) now look like:
> > >
> > > Unique
> > >   ->  Gather Merge
> > >         Workers Planned: 2
> > >         ->  Sort
> > >               Sort Key: ref_0.i, (md5(ref_0.t))
> > >               ->  Nested Loop
> > >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> > >                     ->  Function Scan on generate_series ref_1
> > >
> > >Without the gucs changed the minimal repro case now doesn't error, but
> > >results in this plan:
> > >
> > > HashAggregate
> > >   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> > >ELSE ref_0.t END
> > >   ->  Nested Loop
> > >         ->  Seq Scan on ref_0
> > >         ->  Function Scan on generate_series ref_1
> > >
> > >Similarly in your six queries I now only see parallel query showing up
> > >in the last one.
> > >
> >
> > OK, that seems reasonable I think.
>
> If we proceed with this patch I'd like to tweak it to check for the
> relids subset first on the theory that it will be cheaper than the
> expr equality check loop; that change is attached.
>
> > >I created an entirely new function because adding the target expr
> > >lookup to the existing find_em_expr_for_rel() function broke a bunch
> > >of postgres_fdw tests. That maybe raises questions about whether that
> > >code also could have problems in theory/in the future, but I didn't
> > >investigate further. In any case we already know it excludes
> > >volatile...so maybe it's fine because in practice that's actually a
> > >broader exclusion than what we're doing here.
> > >
> >
> > I don't think postgres_fdw needs these new checks, because FDW scan's
> > are not allowed in parallel part of the plan - if that changes in the
> > future, I'm sure that'll require fixes in plenty other places.
> >
> > OTOH it's interesting that it triggers those failures - I wonder if we
> > could learn something from them? AFAICS those paths can't be built by
> > generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> > query restrictions), so how do these plans look?
>
> Well the fdw code doesn't trigger the errors, but both
> generate_useful_gather_paths() and the fdw code originally relied on
> find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
> also uses it for determining what path keys are valid -- in that case
> which ones are safe to be sent to the foreign server (so all of the
> vars have to be from rels on the foreign server). The fdw code has
> additional checks too though, for example no volatile expressions may
> be pushed to the remote.
>
> > >This seems to fix the issue, but I'd like feedback on whether it's too
> > >strict. We could of course just check em_has_volatile, but I'm
> > >wondering if that's simultaneously too strict (by not allowing the
> > >volatile expression to be computed in the gather merge supposing
> > >there's no join) and too loose (maybe there are other cases we should
> > >care about?). It also just strikes me as re-encoding rules that should
> > >have already been applied (and thus we should be able to look up in
> > >the data we have if it's safe to use the expr/pathkey). Those are
> > >mostly intuitions though.
> > >
> >
> > I don't know :-( As I mentioned before, I suspect checking just the
> > volatility may not be enough in some cases (leakyness, etc.) and I think
> > you're right it may be too strict in other cases.
> >
> > Not sure I understand which rules you think we're re-encoding, but I
> > have a nagging feeling there's a piece of code somewhere earlier in
> > the query planner meant to prevent such cases (sort on path without all
> > the pathkeys), and that fixing it here is just a band aid :-(
>
> I'm getting at exactly what you're saying below: there's got to be
> some existing rules (and presumably code already enforcing it in some
> places) at what level in the path tree a given pathkey is
> safe/correct, and we don't want to reinvent those rules (or ideally
> that code). Verifying the expr is in the rel seems to me to be
> intuitively correct, but I wish there was another place in the code
> that could validate that.
>
> > I'm sure we're building plans with Sort on top of Index Scan paths, so
> > how come those don't fail? How come the non-parallel version of these
> > queries don't fail?
>
> Yeah, exactly. Something has to be doing something similar -- I don't
> think everywhere else using sort_pathkeys instead of query_pathkeys
> really changes the problem. I've read through all of the places we use
> generate_useful_gather_paths() as well as root->sort_pathkeys to try
> to get a better understanding of this. Most of the places I convinced
> myself it's already safe -- for example places like
> create_orderd_paths are working with the upper rel, and the full
> target list is available. A few other places I added comments (see the
> "questions" patch in the attached series) where I'm not sure why we
> apply projections *after* we create a sort path; given the issue we're
> discussing here I would have thought you'd want to do exactly the
> opposite.
>
> I haven't yet tried to read through the code in other places where we
> build an explicit sort to see if I can find any hints as to what
> existing code does to ensure this situation doesn't occur, but so far
> I haven't found anything else obvious.

I did a lot more reading through code, and I concluded that generally
(maybe exclusively?) create_sort_path() is only called on upper rels
(e.g., from create_ordered_paths()), so we already have the full
target list available. I also stepped through in the debugger and
confirmed that non-var exprs generally don't seem to show up in the
pathtarget of paths either on base rels or on join rels. Finally, the
last piece of relevant data is that we don't actually push down sorts
on non-parallel paths even when it would be beneficial. For example,
this query:

select i, md5(i::text) from t2, generate_series(1,2) order by i;

generates this plan:

 Sort
   Sort Key: t2.i
   ->  Nested Loop
         ->  Function Scan on generate_series
         ->  Materialize
               ->  Seq Scan on t2

but it seems obvious to me that it would be better to move the sort to
immediately above the seq scan.

So I conclude from all of this that the real reason we didn't see this
problem before was that we didn't have anything that was trying to
generate useful sort paths lower in the query, so it just worked. To
state it from the other direction, I don't see any indication that
there actually is any code trying to determine what's safe to put in
the target at various join levels; from what I can tell only vars are
distributed at those points. Thinking about it some more, this
actually makes sense, since get_useful_pathkeys() is new code, and
something like it would have been needed already if we wanted to push
sorts down further into the plan.

As a bit of a side note: combined with the above idea about pushing
sort down, it could also be beneficial to try to distribute
expressions lower too.

So I think the previously attached patch is probably correct; the
second patch with the comment questions shows I still don't fully
understand how/when projections are applied (and when they're needed),
though it might have something to do with what I said about only
operating on upper rels.

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
> > > >>
> > > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> > > >> >
> > > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > > >> > <tomas.vondra@2ndquadrant.com> wrote:
> > > >> > >
> > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > >> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > >> > > >>
> > > >> > > >> ...
> > > >> > > >>
> > > >> > > >> More importanly, it does not actually fix the issue - it does fix that
> > > >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > > >> > > >> or GROUP BY make it fail again :-(
> > > >> > > >>
> > > >> > > >> Attached is a simple script I used, demonstrating these issues for the
> > > >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > > >> > > >> before TargetEntry in plannodes.h).
> > > >> > > >
> > > >> > > >So does checking for volatile expressions (if you happened to test
> > > >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > > >> > > >to do that this evening.
> > > >> > > >
> > > >> > >
> > > >> > > Yes, it does fix all the three queries in the SQL script.
> > > >> > >
> > > >> > > The question however is whether this is the root issue, and whether it's
> > > >> > > the right way to fix it. For example - volatility is not the only reason
> > > >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > > >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > > >> > > wonder if that could cause failures too, somehow. But I haven't managed
> > > >> > > to create such example.
> > > >> >
> > > >> > I was about to say that the issue here is slightly different from qual
> > > >> > etc. pushdown, since we're not concerned about quals here, and so I
> > > >> > wonder where we determine what target list entries to put in a given
> > > >> > scan path, but then I realized that implies (maybe!) a simpler
> > > >> > solution. Instead of duplicating checks on target list entries would
> > > >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > > >> > whether or not the pathkey has a target list entry?
> > > >> >
> > > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > > >> > something, but I need to step out for a bit, so I'll have to look at
> > > >> > it later.
> > > >>
> > > >> I've started poking at this, but haven't yet found a workable
> > > >> solution. See the attached patch which "solves" the problem by
> > > >> breaking putting the sort under the gather merge, but it at least
> > > >> demonstrates conceptually what I think we're interested in doing.
> > > >>
> > > >> The issue currently is that the comparison of expressions fails -- in
> > > >> our query, the first column selected shows up as a Var (with the same
> > > >> contents) but is a different pointer in the em_expr and the reltarget
> > > >> exprs list. I don't yet see a good way to get the equivalence class
> > > >> for a PathTarget entry.
> > > >
> > > >Hmm, I think I was looking to do is the attached patch. I didn't
> > > >realize until I did a lot of reading through source that we have an
> > > >equal() function that can compare exprs. That (plus the realization in
> > > >[1] the originally reported backtrace wasn't where the error actually
> > > >came from) convinced me that what we need is to confirm not only that
> > > >the all of the vars in the ec member come from the relids in the rel
> > > >but also that the expr is actually represented in the target of the
> > > >rel.
> > > >
> > > >With the gucs changed as I mentioned earlier both of the plans (with
> > > >and without a volatile call in the 2nd select entry) now look like:
> > > >
> > > > Unique
> > > >   ->  Gather Merge
> > > >         Workers Planned: 2
> > > >         ->  Sort
> > > >               Sort Key: ref_0.i, (md5(ref_0.t))
> > > >               ->  Nested Loop
> > > >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> > > >                     ->  Function Scan on generate_series ref_1
> > > >
> > > >Without the gucs changed the minimal repro case now doesn't error, but
> > > >results in this plan:
> > > >
> > > > HashAggregate
> > > >   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> > > >ELSE ref_0.t END
> > > >   ->  Nested Loop
> > > >         ->  Seq Scan on ref_0
> > > >         ->  Function Scan on generate_series ref_1
> > > >
> > > >Similarly in your six queries I now only see parallel query showing up
> > > >in the last one.
> > > >
> > >
> > > OK, that seems reasonable I think.
> >
> > If we proceed with this patch I'd like to tweak it to check for the
> > relids subset first on the theory that it will be cheaper than the
> > expr equality check loop; that change is attached.
> >
> > > >I created an entirely new function because adding the target expr
> > > >lookup to the existing find_em_expr_for_rel() function broke a bunch
> > > >of postgres_fdw tests. That maybe raises questions about whether that
> > > >code also could have problems in theory/in the future, but I didn't
> > > >investigate further. In any case we already know it excludes
> > > >volatile...so maybe it's fine because in practice that's actually a
> > > >broader exclusion than what we're doing here.
> > > >
> > >
> > > I don't think postgres_fdw needs these new checks, because FDW scan's
> > > are not allowed in parallel part of the plan - if that changes in the
> > > future, I'm sure that'll require fixes in plenty other places.
> > >
> > > OTOH it's interesting that it triggers those failures - I wonder if we
> > > could learn something from them? AFAICS those paths can't be built by
> > > generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> > > query restrictions), so how do these plans look?
> >
> > Well the fdw code doesn't trigger the errors, but both
> > generate_useful_gather_paths() and the fdw code originally relied on
> > find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
> > also uses it for determining what path keys are valid -- in that case
> > which ones are safe to be sent to the foreign server (so all of the
> > vars have to be from rels on the foreign server). The fdw code has
> > additional checks too though, for example no volatile expressions may
> > be pushed to the remote.
> >
> > > >This seems to fix the issue, but I'd like feedback on whether it's too
> > > >strict. We could of course just check em_has_volatile, but I'm
> > > >wondering if that's simultaneously too strict (by not allowing the
> > > >volatile expression to be computed in the gather merge supposing
> > > >there's no join) and too loose (maybe there are other cases we should
> > > >care about?). It also just strikes me as re-encoding rules that should
> > > >have already been applied (and thus we should be able to look up in
> > > >the data we have if it's safe to use the expr/pathkey). Those are
> > > >mostly intuitions though.
> > > >
> > >
> > > I don't know :-( As I mentioned before, I suspect checking just the
> > > volatility may not be enough in some cases (leakyness, etc.) and I think
> > > you're right it may be too strict in other cases.
> > >
> > > Not sure I understand which rules you think we're re-encoding, but I
> > > have a nagging feeling there's a piece of code somewhere earlier in
> > > the query planner meant to prevent such cases (sort on path without all
> > > the pathkeys), and that fixing it here is just a band aid :-(
> >
> > I'm getting at exactly what you're saying below: there's got to be
> > some existing rules (and presumably code already enforcing it in some
> > places) at what level in the path tree a given pathkey is
> > safe/correct, and we don't want to reinvent those rules (or ideally
> > that code). Verifying the expr is in the rel seems to me to be
> > intuitively correct, but I wish there was another place in the code
> > that could validate that.
> >
> > > I'm sure we're building plans with Sort on top of Index Scan paths, so
> > > how come those don't fail? How come the non-parallel version of these
> > > queries don't fail?
> >
> > Yeah, exactly. Something has to be doing something similar -- I don't
> > think everywhere else using sort_pathkeys instead of query_pathkeys
> > really changes the problem. I've read through all of the places we use
> > generate_useful_gather_paths() as well as root->sort_pathkeys to try
> > to get a better understanding of this. Most of the places I convinced
> > myself it's already safe -- for example places like
> > create_orderd_paths are working with the upper rel, and the full
> > target list is available. A few other places I added comments (see the
> > "questions" patch in the attached series) where I'm not sure why we
> > apply projections *after* we create a sort path; given the issue we're
> > discussing here I would have thought you'd want to do exactly the
> > opposite.
> >
> > I haven't yet tried to read through the code in other places where we
> > build an explicit sort to see if I can find any hints as to what
> > existing code does to ensure this situation doesn't occur, but so far
> > I haven't found anything else obvious.
>
> I did a lot more reading through code, and I concluded that generally
> (maybe exclusively?) create_sort_path() is only called on upper rels
> (e.g., from create_ordered_paths()), so we already have the full
> target list available. I also stepped through in the debugger and
> confirmed that non-var exprs generally don't seem to show up in the
> pathtarget of paths either on base rels or on join rels. Finally, the
> last piece of relevant data is that we don't actually push down sorts
> on non-parallel paths even when it would be beneficial. For example,
> this query:
>
> select i, md5(i::text) from t2, generate_series(1,2) order by i;
>
> generates this plan:
>
>  Sort
>    Sort Key: t2.i
>    ->  Nested Loop
>          ->  Function Scan on generate_series
>          ->  Materialize
>                ->  Seq Scan on t2
>
> but it seems obvious to me that it would be better to move the sort to
> immediately above the seq scan.
>
> So I conclude from all of this that the real reason we didn't see this
> problem before was that we didn't have anything that was trying to
> generate useful sort paths lower in the query, so it just worked. To
> state it from the other direction, I don't see any indication that
> there actually is any code trying to determine what's safe to put in
> the target at various join levels; from what I can tell only vars are
> distributed at those points. Thinking about it some more, this
> actually makes sense, since get_useful_pathkeys() is new code, and
> something like it would have been needed already if we wanted to push
> sorts down further into the plan.
>
> As a bit of a side note: combined with the above idea about pushing
> sort down, it could also be beneficial to try to distribute
> expressions lower too.
>
> So I think the previously attached patch is probably correct; the
> second patch with the comment questions shows I still don't fully
> understand how/when projections are applied (and when they're needed),
> though it might have something to do with what I said about only
> operating on upper rels.

Just FYI in case the patch seems mergeable; I'm adding test cases now,
and I think I might make some minor tweaks to the patch.

I hope to have a new version out ~this evening.

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Mon, Oct 5, 2020 at 11:33 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> > > <tomas.vondra@2ndquadrant.com> wrote:
> > > >
> > > > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > > > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
> > > > >>
> > > > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> > > > >> >
> > > > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > > > >> > <tomas.vondra@2ndquadrant.com> wrote:
> > > > >> > >
> > > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > > >> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > > >> > > >>
> > > > >> > > >> ...
> > > > >> > > >>
> > > > >> > > >> More importanly, it does not actually fix the issue - it does fix that
> > > > >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > > > >> > > >> or GROUP BY make it fail again :-(
> > > > >> > > >>
> > > > >> > > >> Attached is a simple script I used, demonstrating these issues for the
> > > > >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > > > >> > > >> before TargetEntry in plannodes.h).
> > > > >> > > >
> > > > >> > > >So does checking for volatile expressions (if you happened to test
> > > > >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > > > >> > > >to do that this evening.
> > > > >> > > >
> > > > >> > >
> > > > >> > > Yes, it does fix all the three queries in the SQL script.
> > > > >> > >
> > > > >> > > The question however is whether this is the root issue, and whether it's
> > > > >> > > the right way to fix it. For example - volatility is not the only reason
> > > > >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > > > >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > > > >> > > wonder if that could cause failures too, somehow. But I haven't managed
> > > > >> > > to create such example.
> > > > >> >
> > > > >> > I was about to say that the issue here is slightly different from qual
> > > > >> > etc. pushdown, since we're not concerned about quals here, and so I
> > > > >> > wonder where we determine what target list entries to put in a given
> > > > >> > scan path, but then I realized that implies (maybe!) a simpler
> > > > >> > solution. Instead of duplicating checks on target list entries would
> > > > >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > > > >> > whether or not the pathkey has a target list entry?
> > > > >> >
> > > > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > > > >> > something, but I need to step out for a bit, so I'll have to look at
> > > > >> > it later.
> > > > >>
> > > > >> I've started poking at this, but haven't yet found a workable
> > > > >> solution. See the attached patch which "solves" the problem by
> > > > >> breaking putting the sort under the gather merge, but it at least
> > > > >> demonstrates conceptually what I think we're interested in doing.
> > > > >>
> > > > >> The issue currently is that the comparison of expressions fails -- in
> > > > >> our query, the first column selected shows up as a Var (with the same
> > > > >> contents) but is a different pointer in the em_expr and the reltarget
> > > > >> exprs list. I don't yet see a good way to get the equivalence class
> > > > >> for a PathTarget entry.
> > > > >
> > > > >Hmm, I think I was looking to do is the attached patch. I didn't
> > > > >realize until I did a lot of reading through source that we have an
> > > > >equal() function that can compare exprs. That (plus the realization in
> > > > >[1] the originally reported backtrace wasn't where the error actually
> > > > >came from) convinced me that what we need is to confirm not only that
> > > > >the all of the vars in the ec member come from the relids in the rel
> > > > >but also that the expr is actually represented in the target of the
> > > > >rel.
> > > > >
> > > > >With the gucs changed as I mentioned earlier both of the plans (with
> > > > >and without a volatile call in the 2nd select entry) now look like:
> > > > >
> > > > > Unique
> > > > >   ->  Gather Merge
> > > > >         Workers Planned: 2
> > > > >         ->  Sort
> > > > >               Sort Key: ref_0.i, (md5(ref_0.t))
> > > > >               ->  Nested Loop
> > > > >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> > > > >                     ->  Function Scan on generate_series ref_1
> > > > >
> > > > >Without the gucs changed the minimal repro case now doesn't error, but
> > > > >results in this plan:
> > > > >
> > > > > HashAggregate
> > > > >   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> > > > >ELSE ref_0.t END
> > > > >   ->  Nested Loop
> > > > >         ->  Seq Scan on ref_0
> > > > >         ->  Function Scan on generate_series ref_1
> > > > >
> > > > >Similarly in your six queries I now only see parallel query showing up
> > > > >in the last one.
> > > > >
> > > >
> > > > OK, that seems reasonable I think.
> > >
> > > If we proceed with this patch I'd like to tweak it to check for the
> > > relids subset first on the theory that it will be cheaper than the
> > > expr equality check loop; that change is attached.
> > >
> > > > >I created an entirely new function because adding the target expr
> > > > >lookup to the existing find_em_expr_for_rel() function broke a bunch
> > > > >of postgres_fdw tests. That maybe raises questions about whether that
> > > > >code also could have problems in theory/in the future, but I didn't
> > > > >investigate further. In any case we already know it excludes
> > > > >volatile...so maybe it's fine because in practice that's actually a
> > > > >broader exclusion than what we're doing here.
> > > > >
> > > >
> > > > I don't think postgres_fdw needs these new checks, because FDW scan's
> > > > are not allowed in parallel part of the plan - if that changes in the
> > > > future, I'm sure that'll require fixes in plenty other places.
> > > >
> > > > OTOH it's interesting that it triggers those failures - I wonder if we
> > > > could learn something from them? AFAICS those paths can't be built by
> > > > generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> > > > query restrictions), so how do these plans look?
> > >
> > > Well the fdw code doesn't trigger the errors, but both
> > > generate_useful_gather_paths() and the fdw code originally relied on
> > > find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
> > > also uses it for determining what path keys are valid -- in that case
> > > which ones are safe to be sent to the foreign server (so all of the
> > > vars have to be from rels on the foreign server). The fdw code has
> > > additional checks too though, for example no volatile expressions may
> > > be pushed to the remote.
> > >
> > > > >This seems to fix the issue, but I'd like feedback on whether it's too
> > > > >strict. We could of course just check em_has_volatile, but I'm
> > > > >wondering if that's simultaneously too strict (by not allowing the
> > > > >volatile expression to be computed in the gather merge supposing
> > > > >there's no join) and too loose (maybe there are other cases we should
> > > > >care about?). It also just strikes me as re-encoding rules that should
> > > > >have already been applied (and thus we should be able to look up in
> > > > >the data we have if it's safe to use the expr/pathkey). Those are
> > > > >mostly intuitions though.
> > > > >
> > > >
> > > > I don't know :-( As I mentioned before, I suspect checking just the
> > > > volatility may not be enough in some cases (leakyness, etc.) and I think
> > > > you're right it may be too strict in other cases.
> > > >
> > > > Not sure I understand which rules you think we're re-encoding, but I
> > > > have a nagging feeling there's a piece of code somewhere earlier in
> > > > the query planner meant to prevent such cases (sort on path without all
> > > > the pathkeys), and that fixing it here is just a band aid :-(
> > >
> > > I'm getting at exactly what you're saying below: there's got to be
> > > some existing rules (and presumably code already enforcing it in some
> > > places) at what level in the path tree a given pathkey is
> > > safe/correct, and we don't want to reinvent those rules (or ideally
> > > that code). Verifying the expr is in the rel seems to me to be
> > > intuitively correct, but I wish there was another place in the code
> > > that could validate that.
> > >
> > > > I'm sure we're building plans with Sort on top of Index Scan paths, so
> > > > how come those don't fail? How come the non-parallel version of these
> > > > queries don't fail?
> > >
> > > Yeah, exactly. Something has to be doing something similar -- I don't
> > > think everywhere else using sort_pathkeys instead of query_pathkeys
> > > really changes the problem. I've read through all of the places we use
> > > generate_useful_gather_paths() as well as root->sort_pathkeys to try
> > > to get a better understanding of this. Most of the places I convinced
> > > myself it's already safe -- for example places like
> > > create_orderd_paths are working with the upper rel, and the full
> > > target list is available. A few other places I added comments (see the
> > > "questions" patch in the attached series) where I'm not sure why we
> > > apply projections *after* we create a sort path; given the issue we're
> > > discussing here I would have thought you'd want to do exactly the
> > > opposite.
> > >
> > > I haven't yet tried to read through the code in other places where we
> > > build an explicit sort to see if I can find any hints as to what
> > > existing code does to ensure this situation doesn't occur, but so far
> > > I haven't found anything else obvious.
> >
> > I did a lot more reading through code, and I concluded that generally
> > (maybe exclusively?) create_sort_path() is only called on upper rels
> > (e.g., from create_ordered_paths()), so we already have the full
> > target list available. I also stepped through in the debugger and
> > confirmed that non-var exprs generally don't seem to show up in the
> > pathtarget of paths either on base rels or on join rels. Finally, the
> > last piece of relevant data is that we don't actually push down sorts
> > on non-parallel paths even when it would be beneficial. For example,
> > this query:
> >
> > select i, md5(i::text) from t2, generate_series(1,2) order by i;
> >
> > generates this plan:
> >
> >  Sort
> >    Sort Key: t2.i
> >    ->  Nested Loop
> >          ->  Function Scan on generate_series
> >          ->  Materialize
> >                ->  Seq Scan on t2
> >
> > but it seems obvious to me that it would be better to move the sort to
> > immediately above the seq scan.
> >
> > So I conclude from all of this that the real reason we didn't see this
> > problem before was that we didn't have anything that was trying to
> > generate useful sort paths lower in the query, so it just worked. To
> > state it from the other direction, I don't see any indication that
> > there actually is any code trying to determine what's safe to put in
> > the target at various join levels; from what I can tell only vars are
> > distributed at those points. Thinking about it some more, this
> > actually makes sense, since get_useful_pathkeys() is new code, and
> > something like it would have been needed already if we wanted to push
> > sorts down further into the plan.
> >
> > As a bit of a side note: combined with the above idea about pushing
> > sort down, it could also be beneficial to try to distribute
> > expressions lower too.
> >
> > So I think the previously attached patch is probably correct; the
> > second patch with the comment questions shows I still don't fully
> > understand how/when projections are applied (and when they're needed),
> > though it might have something to do with what I said about only
> > operating on upper rels.
>
> Just FYI in case the patch seems mergeable; I'm adding test cases now,
> and I think I might make some minor tweaks to the patch.
>
> I hope to have a new version out ~this evening.

All right, here's a modified patch. I did a bit of surgery: the
concept is the same, but I decided to explicitly not the parallels to
(and follow as possible) prepare_sort_from_pathkeys(). That means we
only have to do the expression equality check if the expr is volatile,
and the relids bms check doesn't have to bail if it's empty.

This properly allows us to push the sort all the way down to the base
rel when the only things in the base rel are referenced (including
stable expressions) but pushes the sort up to the upper rel when a
volatile expression is used (technically it would be safe for it to go
lower, but in the current planner that's the only time it appears in
the target list, so allowing that would be a larger functional
change).

I left the questions patch here, though obviously it doesn't need to
be committed. If someone has answers to the questions, I'd be
interested though, but I don't think it affects the correctness of
this patch.

James

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Mon, Oct 5, 2020 at 10:38 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 11:33 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> > > > <tomas.vondra@2ndquadrant.com> wrote:
> > > > >
> > > > > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > > > > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331@gmail.com> wrote:
> > > > > >>
> > > > > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331@gmail.com> wrote:
> > > > > >> >
> > > > > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > > > > >> > <tomas.vondra@2ndquadrant.com> wrote:
> > > > > >> > >
> > > > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > > > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > > > >> > > ><tomas.vondra@2ndquadrant.com> wrote:
> > > > > >> > > >>
> > > > > >> > > >> ...
> > > > > >> > > >>
> > > > > >> > > >> More importanly, it does not actually fix the issue - it does fix that
> > > > > >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > > > > >> > > >> or GROUP BY make it fail again :-(
> > > > > >> > > >>
> > > > > >> > > >> Attached is a simple script I used, demonstrating these issues for the
> > > > > >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > > > > >> > > >> before TargetEntry in plannodes.h).
> > > > > >> > > >
> > > > > >> > > >So does checking for volatile expressions (if you happened to test
> > > > > >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > > > > >> > > >to do that this evening.
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > > Yes, it does fix all the three queries in the SQL script.
> > > > > >> > >
> > > > > >> > > The question however is whether this is the root issue, and whether it's
> > > > > >> > > the right way to fix it. For example - volatility is not the only reason
> > > > > >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > > > > >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > > > > >> > > wonder if that could cause failures too, somehow. But I haven't managed
> > > > > >> > > to create such example.
> > > > > >> >
> > > > > >> > I was about to say that the issue here is slightly different from qual
> > > > > >> > etc. pushdown, since we're not concerned about quals here, and so I
> > > > > >> > wonder where we determine what target list entries to put in a given
> > > > > >> > scan path, but then I realized that implies (maybe!) a simpler
> > > > > >> > solution. Instead of duplicating checks on target list entries would
> > > > > >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > > > > >> > whether or not the pathkey has a target list entry?
> > > > > >> >
> > > > > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > > > > >> > something, but I need to step out for a bit, so I'll have to look at
> > > > > >> > it later.
> > > > > >>
> > > > > >> I've started poking at this, but haven't yet found a workable
> > > > > >> solution. See the attached patch which "solves" the problem by
> > > > > >> breaking putting the sort under the gather merge, but it at least
> > > > > >> demonstrates conceptually what I think we're interested in doing.
> > > > > >>
> > > > > >> The issue currently is that the comparison of expressions fails -- in
> > > > > >> our query, the first column selected shows up as a Var (with the same
> > > > > >> contents) but is a different pointer in the em_expr and the reltarget
> > > > > >> exprs list. I don't yet see a good way to get the equivalence class
> > > > > >> for a PathTarget entry.
> > > > > >
> > > > > >Hmm, I think I was looking to do is the attached patch. I didn't
> > > > > >realize until I did a lot of reading through source that we have an
> > > > > >equal() function that can compare exprs. That (plus the realization in
> > > > > >[1] the originally reported backtrace wasn't where the error actually
> > > > > >came from) convinced me that what we need is to confirm not only that
> > > > > >the all of the vars in the ec member come from the relids in the rel
> > > > > >but also that the expr is actually represented in the target of the
> > > > > >rel.
> > > > > >
> > > > > >With the gucs changed as I mentioned earlier both of the plans (with
> > > > > >and without a volatile call in the 2nd select entry) now look like:
> > > > > >
> > > > > > Unique
> > > > > >   ->  Gather Merge
> > > > > >         Workers Planned: 2
> > > > > >         ->  Sort
> > > > > >               Sort Key: ref_0.i, (md5(ref_0.t))
> > > > > >               ->  Nested Loop
> > > > > >                     ->  Parallel Index Scan using ref_0_i_idx on ref_0
> > > > > >                     ->  Function Scan on generate_series ref_1
> > > > > >
> > > > > >Without the gucs changed the minimal repro case now doesn't error, but
> > > > > >results in this plan:
> > > > > >
> > > > > > HashAggregate
> > > > > >   Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> > > > > >ELSE ref_0.t END
> > > > > >   ->  Nested Loop
> > > > > >         ->  Seq Scan on ref_0
> > > > > >         ->  Function Scan on generate_series ref_1
> > > > > >
> > > > > >Similarly in your six queries I now only see parallel query showing up
> > > > > >in the last one.
> > > > > >
> > > > >
> > > > > OK, that seems reasonable I think.
> > > >
> > > > If we proceed with this patch I'd like to tweak it to check for the
> > > > relids subset first on the theory that it will be cheaper than the
> > > > expr equality check loop; that change is attached.
> > > >
> > > > > >I created an entirely new function because adding the target expr
> > > > > >lookup to the existing find_em_expr_for_rel() function broke a bunch
> > > > > >of postgres_fdw tests. That maybe raises questions about whether that
> > > > > >code also could have problems in theory/in the future, but I didn't
> > > > > >investigate further. In any case we already know it excludes
> > > > > >volatile...so maybe it's fine because in practice that's actually a
> > > > > >broader exclusion than what we're doing here.
> > > > > >
> > > > >
> > > > > I don't think postgres_fdw needs these new checks, because FDW scan's
> > > > > are not allowed in parallel part of the plan - if that changes in the
> > > > > future, I'm sure that'll require fixes in plenty other places.
> > > > >
> > > > > OTOH it's interesting that it triggers those failures - I wonder if we
> > > > > could learn something from them? AFAICS those paths can't be built by
> > > > > generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> > > > > query restrictions), so how do these plans look?
> > > >
> > > > Well the fdw code doesn't trigger the errors, but both
> > > > generate_useful_gather_paths() and the fdw code originally relied on
> > > > find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
> > > > also uses it for determining what path keys are valid -- in that case
> > > > which ones are safe to be sent to the foreign server (so all of the
> > > > vars have to be from rels on the foreign server). The fdw code has
> > > > additional checks too though, for example no volatile expressions may
> > > > be pushed to the remote.
> > > >
> > > > > >This seems to fix the issue, but I'd like feedback on whether it's too
> > > > > >strict. We could of course just check em_has_volatile, but I'm
> > > > > >wondering if that's simultaneously too strict (by not allowing the
> > > > > >volatile expression to be computed in the gather merge supposing
> > > > > >there's no join) and too loose (maybe there are other cases we should
> > > > > >care about?). It also just strikes me as re-encoding rules that should
> > > > > >have already been applied (and thus we should be able to look up in
> > > > > >the data we have if it's safe to use the expr/pathkey). Those are
> > > > > >mostly intuitions though.
> > > > > >
> > > > >
> > > > > I don't know :-( As I mentioned before, I suspect checking just the
> > > > > volatility may not be enough in some cases (leakyness, etc.) and I think
> > > > > you're right it may be too strict in other cases.
> > > > >
> > > > > Not sure I understand which rules you think we're re-encoding, but I
> > > > > have a nagging feeling there's a piece of code somewhere earlier in
> > > > > the query planner meant to prevent such cases (sort on path without all
> > > > > the pathkeys), and that fixing it here is just a band aid :-(
> > > >
> > > > I'm getting at exactly what you're saying below: there's got to be
> > > > some existing rules (and presumably code already enforcing it in some
> > > > places) at what level in the path tree a given pathkey is
> > > > safe/correct, and we don't want to reinvent those rules (or ideally
> > > > that code). Verifying the expr is in the rel seems to me to be
> > > > intuitively correct, but I wish there was another place in the code
> > > > that could validate that.
> > > >
> > > > > I'm sure we're building plans with Sort on top of Index Scan paths, so
> > > > > how come those don't fail? How come the non-parallel version of these
> > > > > queries don't fail?
> > > >
> > > > Yeah, exactly. Something has to be doing something similar -- I don't
> > > > think everywhere else using sort_pathkeys instead of query_pathkeys
> > > > really changes the problem. I've read through all of the places we use
> > > > generate_useful_gather_paths() as well as root->sort_pathkeys to try
> > > > to get a better understanding of this. Most of the places I convinced
> > > > myself it's already safe -- for example places like
> > > > create_orderd_paths are working with the upper rel, and the full
> > > > target list is available. A few other places I added comments (see the
> > > > "questions" patch in the attached series) where I'm not sure why we
> > > > apply projections *after* we create a sort path; given the issue we're
> > > > discussing here I would have thought you'd want to do exactly the
> > > > opposite.
> > > >
> > > > I haven't yet tried to read through the code in other places where we
> > > > build an explicit sort to see if I can find any hints as to what
> > > > existing code does to ensure this situation doesn't occur, but so far
> > > > I haven't found anything else obvious.
> > >
> > > I did a lot more reading through code, and I concluded that generally
> > > (maybe exclusively?) create_sort_path() is only called on upper rels
> > > (e.g., from create_ordered_paths()), so we already have the full
> > > target list available. I also stepped through in the debugger and
> > > confirmed that non-var exprs generally don't seem to show up in the
> > > pathtarget of paths either on base rels or on join rels. Finally, the
> > > last piece of relevant data is that we don't actually push down sorts
> > > on non-parallel paths even when it would be beneficial. For example,
> > > this query:
> > >
> > > select i, md5(i::text) from t2, generate_series(1,2) order by i;
> > >
> > > generates this plan:
> > >
> > >  Sort
> > >    Sort Key: t2.i
> > >    ->  Nested Loop
> > >          ->  Function Scan on generate_series
> > >          ->  Materialize
> > >                ->  Seq Scan on t2
> > >
> > > but it seems obvious to me that it would be better to move the sort to
> > > immediately above the seq scan.
> > >
> > > So I conclude from all of this that the real reason we didn't see this
> > > problem before was that we didn't have anything that was trying to
> > > generate useful sort paths lower in the query, so it just worked. To
> > > state it from the other direction, I don't see any indication that
> > > there actually is any code trying to determine what's safe to put in
> > > the target at various join levels; from what I can tell only vars are
> > > distributed at those points. Thinking about it some more, this
> > > actually makes sense, since get_useful_pathkeys() is new code, and
> > > something like it would have been needed already if we wanted to push
> > > sorts down further into the plan.
> > >
> > > As a bit of a side note: combined with the above idea about pushing
> > > sort down, it could also be beneficial to try to distribute
> > > expressions lower too.
> > >
> > > So I think the previously attached patch is probably correct; the
> > > second patch with the comment questions shows I still don't fully
> > > understand how/when projections are applied (and when they're needed),
> > > though it might have something to do with what I said about only
> > > operating on upper rels.
> >
> > Just FYI in case the patch seems mergeable; I'm adding test cases now,
> > and I think I might make some minor tweaks to the patch.
> >
> > I hope to have a new version out ~this evening.
>
> All right, here's a modified patch. I did a bit of surgery: the
> concept is the same, but I decided to explicitly not the parallels to
> (and follow as possible) prepare_sort_from_pathkeys(). That means we
> only have to do the expression equality check if the expr is volatile,
> and the relids bms check doesn't have to bail if it's empty.
>
> This properly allows us to push the sort all the way down to the base
> rel when the only things in the base rel are referenced (including
> stable expressions) but pushes the sort up to the upper rel when a
> volatile expression is used (technically it would be safe for it to go
> lower, but in the current planner that's the only time it appears in
> the target list, so allowing that would be a larger functional
> change).
>
> I left the questions patch here, though obviously it doesn't need to
> be committed. If someone has answers to the questions, I'd be
> interested though, but I don't think it affects the correctness of
> this patch.

And with a typo fixed.

James

Attachment

Re: enable_incremental_sort changes query behavior

From
Jaime Casanova
Date:
On Tue, 6 Oct 2020 at 06:46, James Coleman <jtc331@gmail.com> wrote:
>
> > All right, here's a modified patch. I did a bit of surgery: the
> > concept is the same, but I decided to explicitly not the parallels to
> > (and follow as possible) prepare_sort_from_pathkeys(). That means we
> > only have to do the expression equality check if the expr is volatile,
> > and the relids bms check doesn't have to bail if it's empty.
> >
> > This properly allows us to push the sort all the way down to the base
> > rel when the only things in the base rel are referenced (including
> > stable expressions) but pushes the sort up to the upper rel when a
> > volatile expression is used (technically it would be safe for it to go
> > lower, but in the current planner that's the only time it appears in
> > the target list, so allowing that would be a larger functional
> > change).
> >
> > I left the questions patch here, though obviously it doesn't need to
> > be committed. If someone has answers to the questions, I'd be
> > interested though, but I don't think it affects the correctness of
> > this patch.
>
> And with a typo fixed.
>

Hi James,

Thanks for working on this, I haven't tried the latest patch but the
previous single patch worked fine in the cases you and Tomas found and
with my original example.

I haven't been able to reproduce the same problem with other queries
(I tried RLS and views with barriers, i also wanted to test with LEAK
functions but didn't have the time yet).

Can you please create an entry in the commitfest for this one so we
don't lose track of it?

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
> Can you please create an entry in the commitfest for this one so we
> don't lose track of it?

Done.



Re: enable_incremental_sort changes query behavior

From
Robert Haas
Date:
On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
> The plan (obtained by replacing the volatile function with a stable one):
>
>  Unique
>    ->  Nested Loop
>          ->  Gather Merge
>                Workers Planned: 2
>                ->  Sort
>                      Sort Key: ref_0.i, (md5(ref_0.t))
>                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
>          ->  Function Scan on generate_series ref_1
>
> Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.

As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.

I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
> > The plan (obtained by replacing the volatile function with a stable one):
> >
> >  Unique
> >    ->  Nested Loop
> >          ->  Gather Merge
> >                Workers Planned: 2
> >                ->  Sort
> >                      Sort Key: ref_0.i, (md5(ref_0.t))
> >                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
> >          ->  Function Scan on generate_series ref_1
> >
> > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
>
> As far as I remember this stuff, target lists for any nodes below the
> top of the join tree were previously always just Var nodes. Any
> expressions that needed to be computed were computed at the level of
> the topmost join, before adding upper planner steps like Agg,
> WindowAgg, Limit, etc. But, here you've got a complex expression --
> md5(ref_0.t) -- begin computed somewhere in the middle of the plan
> tree so that the sort can be performed at that point. However, that
> seems likely to break a bunch of assumptions all over the planner,
> because, unless a lot of work and surgery has been done since I looked
> at this last, that md5(ref_0.t) value is not going to "bubble up" to
> higher levels of the plan through the tlists of the individual nodes.
> That poses a risk that it will be recomputed multiple times, which is
> particularly bad for volatile functions because you might not always
> get the same answer, but it seems like it might be bad even for
> non-volatile functions, because it could be slow if the value is used
> at multiple places in the query. It feels like the sort of thing that
> might have broken some other assumptions, too, although I can't say
> exactly what.

The distinction between volatile and stable expressions here has been
discussed at length in this thread; I think it'd be worth reading
through the rest of the thread before offering sledgehammer ideas
below like reverting the entire piece.

In particular, most of the discussion in the thread has been about
what is actually safe to reference at lower levels in the plan. It
would be extremely easy tomake this only allow sorting by Vars at
lower levels in the plan, though the current revision of the patch
follows what prepare_sort_from_pathkeys allows and disallows.

Obviously it'd be nice to be able to sort by non-Var expressions at
lower levels too. But again, it'd be easy to change that if it's not
safe, so it'd be helpful if you could point to areas that don't bubble
non-Vars up so that we can actually comment about that in the source
and discuss specifics.

> I wonder if whatever part of the incremental sort patch allowed
> computation of tlist expressions below the top level of the join tree
> ought to just be reverted outright, but that might be overkill. I
> can't say that I really understand exactly what's going on here, but
> it seems like a pretty significant behavior change to make as part of
> another project, and I wonder if we're going to keep finding other
> problems with it.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Wed, Oct 07, 2020 at 04:01:27PM -0400, James Coleman wrote:
>On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
>> > The plan (obtained by replacing the volatile function with a stable one):
>> >
>> >  Unique
>> >    ->  Nested Loop
>> >          ->  Gather Merge
>> >                Workers Planned: 2
>> >                ->  Sort
>> >                      Sort Key: ref_0.i, (md5(ref_0.t))
>> >                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
>> >          ->  Function Scan on generate_series ref_1
>> >
>> > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
>>
>> As far as I remember this stuff, target lists for any nodes below the
>> top of the join tree were previously always just Var nodes. Any
>> expressions that needed to be computed were computed at the level of
>> the topmost join, before adding upper planner steps like Agg,
>> WindowAgg, Limit, etc. But, here you've got a complex expression --
>> md5(ref_0.t) -- begin computed somewhere in the middle of the plan
>> tree so that the sort can be performed at that point. However, that
>> seems likely to break a bunch of assumptions all over the planner,
>> because, unless a lot of work and surgery has been done since I looked
>> at this last, that md5(ref_0.t) value is not going to "bubble up" to
>> higher levels of the plan through the tlists of the individual nodes.
>> That poses a risk that it will be recomputed multiple times, which is
>> particularly bad for volatile functions because you might not always
>> get the same answer, but it seems like it might be bad even for
>> non-volatile functions, because it could be slow if the value is used
>> at multiple places in the query. It feels like the sort of thing that
>> might have broken some other assumptions, too, although I can't say
>> exactly what.
>
>The distinction between volatile and stable expressions here has been
>discussed at length in this thread; I think it'd be worth reading
>through the rest of the thread before offering sledgehammer ideas
>below like reverting the entire piece.
>

Um, that's a tad too harsh, I guess ...

We've discussed the volatile/stable expressions before, but AFAIK we
failed to notice the planner assumes such expressions are expected to be
computed only in the upper parts of the plan. If this assumption really
exists, it probably relies on simply calling create_sort_path just about
right, and the new code looking at useful pathkeys below gather merge
might be breaking that simply by doing it too early.

In that case we can either rip this code out, effectively reverting most
of ba3e76cc571, or consider only pathkeys that we can find in the tlist
of the child path.

I'm still not entirely sure I understand what's happening, or what the
exact rule is. Consider this query:

   explain (verbose) select distinct i, t, md5(t) from ref_0;

which on PG12 (i.e. before incremental sort) is planned like this:

                                  QUERY PLAN
----------------------------------------------------------------------------------
  Unique  (cost=78120.92..83120.92 rows=500000 width=65)
    Output: i, t, (md5(t))
    ->  Sort  (cost=78120.92..79370.92 rows=500000 width=65)
          Output: i, t, (md5(t))
          Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
          ->  Seq Scan on public.ref_0  (cost=0.00..10282.00 rows=500000 width=65)
                Output: i, t, md5(t)
(7 rows)

i.e. the (stable) function is pushed all the way to the scan node. And
even if we replace it with a volatile expression it gets pushed down:

explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;

                                  QUERY PLAN
----------------------------------------------------------------------------------
  Unique  (cost=83120.92..88120.92 rows=500000 width=65)
    Output: i, t, (md5(((random())::text || t)))
    ->  Sort  (cost=83120.92..84370.92 rows=500000 width=65)
          Output: i, t, (md5(((random())::text || t)))
          Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
          ->  Seq Scan on public.ref_0  (cost=0.00..15282.00 rows=500000 width=65)
                Output: i, t, md5(((random())::text || t))
(7 rows)


But perhaps I just don't understand the assumption correctly?

>In particular, most of the discussion in the thread has been about
>what is actually safe to reference at lower levels in the plan. It
>would be extremely easy tomake this only allow sorting by Vars at
>lower levels in the plan, though the current revision of the patch
>follows what prepare_sort_from_pathkeys allows and disallows.
>
>Obviously it'd be nice to be able to sort by non-Var expressions at
>lower levels too. But again, it'd be easy to change that if it's not
>safe, so it'd be helpful if you could point to areas that don't bubble
>non-Vars up so that we can actually comment about that in the source
>and discuss specifics.
>

I think the proble here might be "what we think is theoretically safe"
vs. "what the planner expects". I wouldn't be surprised if this was safe
in theoretical sense, but planner not expecting that.

>> I wonder if whatever part of the incremental sort patch allowed
>> computation of tlist expressions below the top level of the join tree
>> ought to just be reverted outright, but that might be overkill. I
>> can't say that I really understand exactly what's going on here, but
>> it seems like a pretty significant behavior change to make as part of
>> another project, and I wonder if we're going to keep finding other
>> problems with it.
>

It certainly was not my intention to make such change in this patch. I
don't think the earlier parts of the incremental sort have this issue
because all those places were already constructing regular sort before.

But this code is different because it considers adding sort node (both
regular and incremental) below a gather merge, which was not happening
before. I suspect this is the problem, somehow.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Wed, Oct 07, 2020 at 03:52:04PM -0400, Robert Haas wrote:
>On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
>> The plan (obtained by replacing the volatile function with a stable one):
>>
>>  Unique
>>    ->  Nested Loop
>>          ->  Gather Merge
>>                Workers Planned: 2
>>                ->  Sort
>>                      Sort Key: ref_0.i, (md5(ref_0.t))
>>                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
>>          ->  Function Scan on generate_series ref_1
>>
>> Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
>
>As far as I remember this stuff, target lists for any nodes below the
>top of the join tree were previously always just Var nodes. Any
>expressions that needed to be computed were computed at the level of
>the topmost join, before adding upper planner steps like Agg,
>WindowAgg, Limit, etc. But, here you've got a complex expression --
>md5(ref_0.t) -- begin computed somewhere in the middle of the plan
>tree so that the sort can be performed at that point. However, that
>seems likely to break a bunch of assumptions all over the planner,
>because, unless a lot of work and surgery has been done since I looked
>at this last, that md5(ref_0.t) value is not going to "bubble up" to
>higher levels of the plan through the tlists of the individual nodes.
>
>That poses a risk that it will be recomputed multiple times, which is
>particularly bad for volatile functions because you might not always
>get the same answer, but it seems like it might be bad even for
>non-volatile functions, because it could be slow if the value is used
>at multiple places in the query. It feels like the sort of thing that
>might have broken some other assumptions, too, although I can't say
>exactly what.

I agree this might be a clear correctness issue for volatile functions.
Not sure about stable functions, though - it's easy to construct cases
where pushing the function down results in much faster execution.

Consider this:

create table t (id int, val text);
insert into t select 1, repeat('x', 1000) from generate_series(1,100) s(i);

-- no pushdown
select id, md5(t1.val) from t t1 join t t2 using (id) join t t3 using (id);
Time: 5222.895 ms (00:05.223)

-- manual pushdown
with x as (select id, md5(t.val) from t)
select id, t1.md5 from x t1 join x t2 using (id) join x t3 using (id);
Time: 486.455 ms

Of course, this is independent of what the planner assumes correctly,
which is what's being discussed in this thread. And I'm sure there are
plenty counter-examples too, so we'd need to cost this properly.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Oct 07, 2020 at 04:01:27PM -0400, James Coleman wrote:
> >On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >> On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote:
> >> > The plan (obtained by replacing the volatile function with a stable one):
> >> >
> >> >  Unique
> >> >    ->  Nested Loop
> >> >          ->  Gather Merge
> >> >                Workers Planned: 2
> >> >                ->  Sort
> >> >                      Sort Key: ref_0.i, (md5(ref_0.t))
> >> >                      ->  Parallel Index Scan using ref_0_i_idx on ref_0
> >> >          ->  Function Scan on generate_series ref_1
> >> >
> >> > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
> >>
> >> As far as I remember this stuff, target lists for any nodes below the
> >> top of the join tree were previously always just Var nodes. Any
> >> expressions that needed to be computed were computed at the level of
> >> the topmost join, before adding upper planner steps like Agg,
> >> WindowAgg, Limit, etc. But, here you've got a complex expression --
> >> md5(ref_0.t) -- begin computed somewhere in the middle of the plan
> >> tree so that the sort can be performed at that point. However, that
> >> seems likely to break a bunch of assumptions all over the planner,
> >> because, unless a lot of work and surgery has been done since I looked
> >> at this last, that md5(ref_0.t) value is not going to "bubble up" to
> >> higher levels of the plan through the tlists of the individual nodes.
> >> That poses a risk that it will be recomputed multiple times, which is
> >> particularly bad for volatile functions because you might not always
> >> get the same answer, but it seems like it might be bad even for
> >> non-volatile functions, because it could be slow if the value is used
> >> at multiple places in the query. It feels like the sort of thing that
> >> might have broken some other assumptions, too, although I can't say
> >> exactly what.
> >
> >The distinction between volatile and stable expressions here has been
> >discussed at length in this thread; I think it'd be worth reading
> >through the rest of the thread before offering sledgehammer ideas
> >below like reverting the entire piece.
> >
>
> Um, that's a tad too harsh, I guess ...

Yes, it was too harsh. So, my apologies, Robert.

Nonetheless I do think that reverting would be a surprisingly strong
reaction for this because I don't think the problem scope is
particularly dire.

Two reasons why:

1. As noted earlier it'd be pretty easy to limit this sort pushdown
only to sorts on Vars. But I don't think that;s necessary, because:
2. From my time in the debugger on this issue, I don't see how the
planner can be assuming these things won't bubble up.

For one thing, your Tomas's example below shows the stable expression
being computed at the scan node level and bubbling up through.

For another thing, earlier in this thread [1] I have a patch that
compares expressions in the path target to the expression in the
pathkey, and even under a gather merge the stable expression (md5) is
already in the path target at that level of the path.

> We've discussed the volatile/stable expressions before, but AFAIK we
> failed to notice the planner assumes such expressions are expected to be
> computed only in the upper parts of the plan. If this assumption really
> exists, it probably relies on simply calling create_sort_path just about
> right, and the new code looking at useful pathkeys below gather merge
> might be breaking that simply by doing it too early.

I don't think such an assumption exists, as explained above, or at
least I can't find an indication of it, and it's already violated in
pre-incremental sort code.

> In that case we can either rip this code out, effectively reverting most
> of ba3e76cc571, or consider only pathkeys that we can find in the tlist
> of the child path.

This is exactly what the patch in [1] does. The current patch
effectively does it also, but I believe it does so in a more refined
way.

> I'm still not entirely sure I understand what's happening, or what the
> exact rule is. Consider this query:
>
>    explain (verbose) select distinct i, t, md5(t) from ref_0;
>
> which on PG12 (i.e. before incremental sort) is planned like this:
>
>                                   QUERY PLAN
> ----------------------------------------------------------------------------------
>   Unique  (cost=78120.92..83120.92 rows=500000 width=65)
>     Output: i, t, (md5(t))
>     ->  Sort  (cost=78120.92..79370.92 rows=500000 width=65)
>           Output: i, t, (md5(t))
>           Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
>           ->  Seq Scan on public.ref_0  (cost=0.00..10282.00 rows=500000 width=65)
>                 Output: i, t, md5(t)
> (7 rows)
>
> i.e. the (stable) function is pushed all the way to the scan node. And
> even if we replace it with a volatile expression it gets pushed down:
>
> explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
>
>                                   QUERY PLAN
> ----------------------------------------------------------------------------------
>   Unique  (cost=83120.92..88120.92 rows=500000 width=65)
>     Output: i, t, (md5(((random())::text || t)))
>     ->  Sort  (cost=83120.92..84370.92 rows=500000 width=65)
>           Output: i, t, (md5(((random())::text || t)))
>           Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
>           ->  Seq Scan on public.ref_0  (cost=0.00..15282.00 rows=500000 width=65)
>                 Output: i, t, md5(((random())::text || t))
> (7 rows)
>
>
> But perhaps I just don't understand the assumption correctly?
>
> >In particular, most of the discussion in the thread has been about
> >what is actually safe to reference at lower levels in the plan. It
> >would be extremely easy tomake this only allow sorting by Vars at
> >lower levels in the plan, though the current revision of the patch
> >follows what prepare_sort_from_pathkeys allows and disallows.
> >
> >Obviously it'd be nice to be able to sort by non-Var expressions at
> >lower levels too. But again, it'd be easy to change that if it's not
> >safe, so it'd be helpful if you could point to areas that don't bubble
> >non-Vars up so that we can actually comment about that in the source
> >and discuss specifics.
> >
>
> I think the proble here might be "what we think is theoretically safe"
> vs. "what the planner expects". I wouldn't be surprised if this was safe
> in theoretical sense, but planner not expecting that.

Agreed as a general rule, but as above I'm having a hard time matching
up pre-this-code behavior with the claim that the planner isn't
expecting it.

> >> I wonder if whatever part of the incremental sort patch allowed
> >> computation of tlist expressions below the top level of the join tree
> >> ought to just be reverted outright, but that might be overkill. I
> >> can't say that I really understand exactly what's going on here, but
> >> it seems like a pretty significant behavior change to make as part of
> >> another project, and I wonder if we're going to keep finding other
> >> problems with it.
> >
>
> It certainly was not my intention to make such change in this patch. I
> don't think the earlier parts of the incremental sort have this issue
> because all those places were already constructing regular sort before.
>
> But this code is different because it considers adding sort node (both
> regular and incremental) below a gather merge, which was not happening
> before. I suspect this is the problem, somehow.

If my above understanding holds, then it's not a significant behavior
change, but if it is, we have a simple workaround (looking only at
pathkeys referencing a Var in the path target list).

James

1: https://www.postgresql.org/message-id/CAAaqYe9Q5TL-d0Fcs1m9bes7YbfkrizmEVFF%2BLdTLE6ZEXCwhg%40mail.gmail.com



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
>On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
><jaime.casanova@2ndquadrant.com> wrote:
>> Can you please create an entry in the commitfest for this one so we
>> don't lose track of it?
>

We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:

(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.

(b) Fixes comment in expected output of incremental_sort test.

(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.


I do have two questions about find_em_expr_usable_for_sorting_rel.

(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.

(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:

     if (bms_is_subset(em->em_relids, rel->relids) &&
         !bms_is_empty(em->em_relids))
     {
         return em->em_expr;
     }

but here we're using only

     if (!bms_is_subset(em->em_relids, rel->relids))
         continue;

Isn't this missing the second bms_is_empty condition?


Of course, an alternative to this fix would be reverting ba3e76cc571
(completely or just the part introducing generate_useful_gather_paths).
If anyone thinks that's what we should do, please speak now.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> ><jaime.casanova@2ndquadrant.com> wrote:
> >> Can you please create an entry in the commitfest for this one so we
> >> don't lose track of it?
> >
>
> We're not too far from the next minor release, so I've been looking at
> this fix again and I intend to get it committed shortly (on Monday or
> so). Attached is a slightly modified version of the v4 patch that:
>
> (a) Removes comments about projections added to code that is not
> directly related to the fix. I'm not against adding such comments
> separately.

Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.

> (b) Fixes comment in expected output of incremental_sort test.

Thanks.

> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
> not quite needed thanks to the "return" in the "if" branch. IMO this
> makes it more elegant.

No objection.

> I do have two questions about find_em_expr_usable_for_sorting_rel.
>
> (a) Where does the em_is_const check come from? The comment claims we
> should not be trying to sort by equivalence class members, but I can't
> convince myself it's actually true and I don't see it discussed in this
> thread.

That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.

> (b) In find_em_expr_for_rel, which was what was used before, the
> condition on relids was this:
>
>      if (bms_is_subset(em->em_relids, rel->relids) &&
>          !bms_is_empty(em->em_relids))
>      {
>          return em->em_expr;
>      }
>
> but here we're using only
>
>      if (!bms_is_subset(em->em_relids, rel->relids))
>          continue;
>
> Isn't this missing the second bms_is_empty condition?

I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?

By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.

Both of these I can add if desired.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
>On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
>> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
>> ><jaime.casanova@2ndquadrant.com> wrote:
>> >> Can you please create an entry in the commitfest for this one so we
>> >> don't lose track of it?
>> >
>>
>> We're not too far from the next minor release, so I've been looking at
>> this fix again and I intend to get it committed shortly (on Monday or
>> so). Attached is a slightly modified version of the v4 patch that:
>>
>> (a) Removes comments about projections added to code that is not
>> directly related to the fix. I'm not against adding such comments
>> separately.
>
>Seems fine. Do you want to commit them at the same time (just a
>separate commit)? Or have a separate patch? It seems a bit overkill to
>start a new thread just for those.
>

Probably sometime later, as a separate patch. I haven't thought very
much about those comments, it just seemed unrelated to the fix so I've
removed that for now. Let's not bother with this until after the minor
release.

>> (b) Fixes comment in expected output of incremental_sort test.
>
>Thanks.
>
>> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
>> not quite needed thanks to the "return" in the "if" branch. IMO this
>> makes it more elegant.
>
>No objection.
>
>> I do have two questions about find_em_expr_usable_for_sorting_rel.
>>
>> (a) Where does the em_is_const check come from? The comment claims we
>> should not be trying to sort by equivalence class members, but I can't
>> convince myself it's actually true and I don't see it discussed in this
>> thread.
>
>That comes from find_ec_member_for_tle which is called by
>prepare_sort_from_pathkeys which we note in the comments contains the
>set of rules we need to mirror.
>

Thanks for the pointer. I'll take a look.

>> (b) In find_em_expr_for_rel, which was what was used before, the
>> condition on relids was this:
>>
>>      if (bms_is_subset(em->em_relids, rel->relids) &&
>>          !bms_is_empty(em->em_relids))
>>      {
>>          return em->em_expr;
>>      }
>>
>> but here we're using only
>>
>>      if (!bms_is_subset(em->em_relids, rel->relids))
>>          continue;
>>
>> Isn't this missing the second bms_is_empty condition?
>
>I definitely remember intentionally removing that condition. For one,
>it's not present in prepare_sort_from_pathkeys. My memory is a bit
>fuzzy on all of the whys, but isn't it the case that if we've already
>excluded const expressions, that we must have vars (and therefore
>rels) present, and therefore the relids bms must be non-empty?
>Probably more importantly, we're going to check that an actual em expr
>matches, which means the bms subset check is really just an
>optimization to avoid unnecessarily scanning the exprs for equality.
>Perhaps it's worth adding a comment saying as such?
>
>By the way, the fact that this is parallel to
>prepare_sort_from_pathkeys is the reason the XXX comment is still
>there about possibly needing to remove relabel types (since
>prepare_sort_from_pathkeys does that). I don't think it's a hard
>requirement: the worst case is that by not digging into relabel types
>we're being unnecessarily strict.
>
>Both of these I can add if desired.
>

Yeah, it'd be good to explain the reasoning why it's fine to have the
conditions like this. Thanks.


regards


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Oct 30, 2020 at 5:03 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
> >On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
> >> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> >> ><jaime.casanova@2ndquadrant.com> wrote:
> >> >> Can you please create an entry in the commitfest for this one so we
> >> >> don't lose track of it?
> >> >
> >>
> >> We're not too far from the next minor release, so I've been looking at
> >> this fix again and I intend to get it committed shortly (on Monday or
> >> so). Attached is a slightly modified version of the v4 patch that:
> >>
> >> (a) Removes comments about projections added to code that is not
> >> directly related to the fix. I'm not against adding such comments
> >> separately.
> >
> >Seems fine. Do you want to commit them at the same time (just a
> >separate commit)? Or have a separate patch? It seems a bit overkill to
> >start a new thread just for those.
> >
>
> Probably sometime later, as a separate patch. I haven't thought very
> much about those comments, it just seemed unrelated to the fix so I've
> removed that for now. Let's not bother with this until after the minor
> release.

For whatever it's worth, I didn't originally consider them unrelated;
the purpose was to explain why those places were safe relative to the
same kinds of issues under discussion here: whether an expr will be in
the target and when it might need to be added.

EIther way I've split it into a separate patch in this series.

> >> (b) Fixes comment in expected output of incremental_sort test.
> >
> >Thanks.
> >
> >> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
> >> not quite needed thanks to the "return" in the "if" branch. IMO this
> >> makes it more elegant.
> >
> >No objection.
> >
> >> I do have two questions about find_em_expr_usable_for_sorting_rel.
> >>
> >> (a) Where does the em_is_const check come from? The comment claims we
> >> should not be trying to sort by equivalence class members, but I can't
> >> convince myself it's actually true and I don't see it discussed in this
> >> thread.
> >
> >That comes from find_ec_member_for_tle which is called by
> >prepare_sort_from_pathkeys which we note in the comments contains the
> >set of rules we need to mirror.
> >
>
> Thanks for the pointer. I'll take a look.
>
> >> (b) In find_em_expr_for_rel, which was what was used before, the
> >> condition on relids was this:
> >>
> >>      if (bms_is_subset(em->em_relids, rel->relids) &&
> >>          !bms_is_empty(em->em_relids))
> >>      {
> >>          return em->em_expr;
> >>      }
> >>
> >> but here we're using only
> >>
> >>      if (!bms_is_subset(em->em_relids, rel->relids))
> >>          continue;
> >>
> >> Isn't this missing the second bms_is_empty condition?
> >
> >I definitely remember intentionally removing that condition. For one,
> >it's not present in prepare_sort_from_pathkeys. My memory is a bit
> >fuzzy on all of the whys, but isn't it the case that if we've already
> >excluded const expressions, that we must have vars (and therefore
> >rels) present, and therefore the relids bms must be non-empty?
> >Probably more importantly, we're going to check that an actual em expr
> >matches, which means the bms subset check is really just an
> >optimization to avoid unnecessarily scanning the exprs for equality.
> >Perhaps it's worth adding a comment saying as such?
> >
> >By the way, the fact that this is parallel to
> >prepare_sort_from_pathkeys is the reason the XXX comment is still
> >there about possibly needing to remove relabel types (since
> >prepare_sort_from_pathkeys does that). I don't think it's a hard
> >requirement: the worst case is that by not digging into relabel types
> >we're being unnecessarily strict.
> >
> >Both of these I can add if desired.
> >
>
> Yeah, it'd be good to explain the reasoning why it's fine to have the
> conditions like this. Thanks.

I was looking at this some more, and I'm still convinced that this is
correct, but I don't think a comment about it being an optimization
(though I suspect that that its major benefit), since it came from,
and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.

As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.

The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).

James

Attachment

Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Fri, Oct 30, 2020 at 07:37:33PM -0400, James Coleman wrote:
>
> ...
>
>I was looking at this some more, and I'm still convinced that this is
>correct, but I don't think a comment about it being an optimization
>(though I suspect that that its major benefit), since it came from,
>and must parallel, find_ec_member_for_tle, and there is no such
>explanation there. I don't want to backfill reasoning onto that
>decision, but I do think it's important to parallel it since it's
>ultimately what is going to cause errors if we're out of sync with it.
>
>As to why find_em_expr_for_rel is different, I think the answer is
>that it's used by the FDW code, and it seems to me to make sense that
>in that case we'd only send sort conditions to the foreign server if
>the em actually contains rels.
>
>The new series attached includes the primary fix for this issue, the
>additional comments that could either go in at the same time or not,
>and my patch with semi-related questions (which isn't intended to be
>committed, but to keep track of the questions).
>

Thanks. Attached are two patches that I plan to get committed

0001 is what you sent, with slightly reworded commit message. This is
the actual fix.

I'm still thinking about Robert's comment that perhaps we should be
considering only expressions already in the relation's target, which
would imply checking for volatile expressions is not enough. But I've
been unable to convince myself it's correct or incorrect. In any case
0001 is a clear improvement - in the worst case we'll have to make it
stricter in the future.


0002 partially reverts ba3e76cc571 by moving find_em_expr_for_rel back
to postgres_fdw.c.  We've moved it to equivclass.c so that it could be
called from both the FDW and allpaths.c, but now that the functions
diverged it's only called from the FDW again.  So maybe we should move
it back, but I guess that's not a good thing in a minor release.



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
On Tue, Nov 03, 2020 at 03:37:43AM +0100, Tomas Vondra wrote:
>On Fri, Oct 30, 2020 at 07:37:33PM -0400, James Coleman wrote:
>>
>>...
>>
>>I was looking at this some more, and I'm still convinced that this is
>>correct, but I don't think a comment about it being an optimization
>>(though I suspect that that its major benefit), since it came from,
>>and must parallel, find_ec_member_for_tle, and there is no such
>>explanation there. I don't want to backfill reasoning onto that
>>decision, but I do think it's important to parallel it since it's
>>ultimately what is going to cause errors if we're out of sync with it.
>>
>>As to why find_em_expr_for_rel is different, I think the answer is
>>that it's used by the FDW code, and it seems to me to make sense that
>>in that case we'd only send sort conditions to the foreign server if
>>the em actually contains rels.
>>
>>The new series attached includes the primary fix for this issue, the
>>additional comments that could either go in at the same time or not,
>>and my patch with semi-related questions (which isn't intended to be
>>committed, but to keep track of the questions).
>>
>
>Thanks. Attached are two patches that I plan to get committed
>
>0001 is what you sent, with slightly reworded commit message. This is
>the actual fix.
>
>I'm still thinking about Robert's comment that perhaps we should be
>considering only expressions already in the relation's target, which
>would imply checking for volatile expressions is not enough. But I've
>been unable to convince myself it's correct or incorrect. In any case
>0001 is a clear improvement - in the worst case we'll have to make it
>stricter in the future.
>
>
>0002 partially reverts ba3e76cc571 by moving find_em_expr_for_rel back
>to postgres_fdw.c.  We've moved it to equivclass.c so that it could be
>called from both the FDW and allpaths.c, but now that the functions
>diverged it's only called from the FDW again.  So maybe we should move
>it back, but I guess that's not a good thing in a minor release.
>

I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: enable_incremental_sort changes query behavior

From
luis.roberto@siscobra.com.br
Date:

I've pushed the 0001 part, i.e. the main fix. Not sure about the other
parts (comments and moving the code back to postgres_fdw) yet.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi, Tomas!

I'm still having trouble with this problem. Tom commited something [1] to avoid core dumps, but all I get now is: subplan "SubPlan 1"  was not initialized

I upgraded to 13.1 over the weekend but the problem still occurs. 

This is the topic I started: https://www.postgresql.org/message-id/flat/CAAaqYe--qpnBOUiqnHSAc71K2HpXetnGedRAX0Z04zx%3DY0ExHA%40mail.gmail.com#8cb6c7c5e787d2d3d2dc7aaf00d1cae8



[1] https://git.postgresql.org/pg/commitdiff/936043c9eacb9e9c7356a8190a410d2c4e4ea03a

Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:
Hmm, I missed that other thread. That indeed seems like a bug in the
same area already tweaked by ebb7ae839d033d0f2 for similar cases.

The attached patch fixes this simply by adding is_parallel_safe to
get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
not entirely sure that's the right place. Maybe it should be done in
find_em_expr_usable_for_sorting_rel (which might make a difference if an
EC clause can contain a mix of parallel safe and unsafe expressions). Or
maybe we should do it in the caller (which would allow using
get_useful_pathkeys_for_relation in contexts not requiring parallel
safety in the future).

Anyway, while this is not an "incremental sort" bug, it seems like the
root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
sort patches started considering sorting below gather nodes, not
realizing these possible consequences.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hmm, I missed that other thread. That indeed seems like a bug in the
> same area already tweaked by ebb7ae839d033d0f2 for similar cases.

I meant to bring it up in this thread before we got the other patch
committed, but I just ended up not having time to look into it.

> The attached patch fixes this simply by adding is_parallel_safe to
> get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
> not entirely sure that's the right place. Maybe it should be done in
> find_em_expr_usable_for_sorting_rel (which might make a difference if an
> EC clause can contain a mix of parallel safe and unsafe expressions). Or
> maybe we should do it in the caller (which would allow using
> get_useful_pathkeys_for_relation in contexts not requiring parallel
> safety in the future).
>
> Anyway, while this is not an "incremental sort" bug, it seems like the
> root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
> sort patches started considering sorting below gather nodes, not
> realizing these possible consequences.

Yeah. I'd like to investigate a bit if really we should also be adding
it to prepare_sort_from_pathkeys (which
find_em_expr_usable_for_sorting_rel parallels) or similar, since this
seems to be a broader concern that's been previously missed (even if
the bug was otherwise not yet exposed). In particular, as I understand
it, we did do sorts below gather nodes before, just in fewer places,
which meant that "in theory" the code was already broken (or at least
not complete) for parallel queries.

James



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:

On 11/17/20 3:03 PM, James Coleman wrote:
> On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hmm, I missed that other thread. That indeed seems like a bug in the
>> same area already tweaked by ebb7ae839d033d0f2 for similar cases.
> 
> I meant to bring it up in this thread before we got the other patch
> committed, but I just ended up not having time to look into it.
> 
>> The attached patch fixes this simply by adding is_parallel_safe to
>> get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
>> not entirely sure that's the right place. Maybe it should be done in
>> find_em_expr_usable_for_sorting_rel (which might make a difference if an
>> EC clause can contain a mix of parallel safe and unsafe expressions). Or
>> maybe we should do it in the caller (which would allow using
>> get_useful_pathkeys_for_relation in contexts not requiring parallel
>> safety in the future).
>>
>> Anyway, while this is not an "incremental sort" bug, it seems like the
>> root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
>> sort patches started considering sorting below gather nodes, not
>> realizing these possible consequences.
> 
> Yeah. I'd like to investigate a bit if really we should also be adding
> it to prepare_sort_from_pathkeys (which
> find_em_expr_usable_for_sorting_rel parallels) or similar, since this
> seems to be a broader concern that's been previously missed (even if
> the bug was otherwise not yet exposed). In particular, as I understand
> it, we did do sorts below gather nodes before, just in fewer places,
> which meant that "in theory" the code was already broken (or at least
> not complete) for parallel queries.
> 

True. It's possible the bug was there before, but the affected plans
were just very unlikely for some reason, and the new plans introduced by
incremental sort just made it easier to trigger.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Tue, Nov 17, 2020 at 11:20 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 11/17/20 3:03 PM, James Coleman wrote:
> > On Mon, Nov 16, 2020 at 11:23 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hmm, I missed that other thread. That indeed seems like a bug in the
> >> same area already tweaked by ebb7ae839d033d0f2 for similar cases.
> >
> > I meant to bring it up in this thread before we got the other patch
> > committed, but I just ended up not having time to look into it.
> >
> >> The attached patch fixes this simply by adding is_parallel_safe to
> >> get_useful_pathkeys_for_relation - that does fix the reproducer, but I'm
> >> not entirely sure that's the right place. Maybe it should be done in
> >> find_em_expr_usable_for_sorting_rel (which might make a difference if an
> >> EC clause can contain a mix of parallel safe and unsafe expressions). Or
> >> maybe we should do it in the caller (which would allow using
> >> get_useful_pathkeys_for_relation in contexts not requiring parallel
> >> safety in the future).
> >>
> >> Anyway, while this is not an "incremental sort" bug, it seems like the
> >> root cause is the same as for ebb7ae839d033d0f2 - one of the incremental
> >> sort patches started considering sorting below gather nodes, not
> >> realizing these possible consequences.
> >
> > Yeah. I'd like to investigate a bit if really we should also be adding
> > it to prepare_sort_from_pathkeys (which
> > find_em_expr_usable_for_sorting_rel parallels) or similar, since this
> > seems to be a broader concern that's been previously missed (even if
> > the bug was otherwise not yet exposed). In particular, as I understand
> > it, we did do sorts below gather nodes before, just in fewer places,
> > which meant that "in theory" the code was already broken (or at least
> > not complete) for parallel queries.
> >
>
> True. It's possible the bug was there before, but the affected plans
> were just very unlikely for some reason, and the new plans introduced by
> incremental sort just made it easier to trigger.

For additional patches on the broader topic (e.g., parallel safety) is
it preferable to keep them here or in the thread Luis started (or even
a new one besides that -- since that's on -bugs not -hackers)?

James



Re: enable_incremental_sort changes query behavior

From
Robert Haas
Date:
On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I'm still not entirely sure I understand what's happening, or what the
> exact rule is. Consider this query:
>
>    explain (verbose) select distinct i, t, md5(t) from ref_0;
>
> which on PG12 (i.e. before incremental sort) is planned like this:
>
>                                   QUERY PLAN
> ----------------------------------------------------------------------------------
>   Unique  (cost=78120.92..83120.92 rows=500000 width=65)
>     Output: i, t, (md5(t))
>     ->  Sort  (cost=78120.92..79370.92 rows=500000 width=65)
>           Output: i, t, (md5(t))
>           Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
>           ->  Seq Scan on public.ref_0  (cost=0.00..10282.00 rows=500000 width=65)
>                 Output: i, t, md5(t)
> (7 rows)
>
> i.e. the (stable) function is pushed all the way to the scan node. And
> even if we replace it with a volatile expression it gets pushed down:
>
> explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
>
>                                   QUERY PLAN
> ----------------------------------------------------------------------------------
>   Unique  (cost=83120.92..88120.92 rows=500000 width=65)
>     Output: i, t, (md5(((random())::text || t)))
>     ->  Sort  (cost=83120.92..84370.92 rows=500000 width=65)
>           Output: i, t, (md5(((random())::text || t)))
>           Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
>           ->  Seq Scan on public.ref_0  (cost=0.00..15282.00 rows=500000 width=65)
>                 Output: i, t, md5(((random())::text || t))
> (7 rows)
>
>
> But perhaps I just don't understand the assumption correctly?

This isn't a counterexample, because there's no join tree here -- or,
well, there is, but it's trivial, because there's only one relation
involved. You can't have a non-Var expression computed before you
finish all the joins, because there are no joins.

What I said was: "target lists for any nodes below the top of the join
tree were previously always just Var nodes." The topmost join allowed
non-Var nodes before, but not lower levels.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Fri, Nov 20, 2020 at 12:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > I'm still not entirely sure I understand what's happening, or what the
> > exact rule is. Consider this query:
> >
> >    explain (verbose) select distinct i, t, md5(t) from ref_0;
> >
> > which on PG12 (i.e. before incremental sort) is planned like this:
> >
> >                                   QUERY PLAN
> > ----------------------------------------------------------------------------------
> >   Unique  (cost=78120.92..83120.92 rows=500000 width=65)
> >     Output: i, t, (md5(t))
> >     ->  Sort  (cost=78120.92..79370.92 rows=500000 width=65)
> >           Output: i, t, (md5(t))
> >           Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
> >           ->  Seq Scan on public.ref_0  (cost=0.00..10282.00 rows=500000 width=65)
> >                 Output: i, t, md5(t)
> > (7 rows)
> >
> > i.e. the (stable) function is pushed all the way to the scan node. And
> > even if we replace it with a volatile expression it gets pushed down:
> >
> > explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
> >
> >                                   QUERY PLAN
> > ----------------------------------------------------------------------------------
> >   Unique  (cost=83120.92..88120.92 rows=500000 width=65)
> >     Output: i, t, (md5(((random())::text || t)))
> >     ->  Sort  (cost=83120.92..84370.92 rows=500000 width=65)
> >           Output: i, t, (md5(((random())::text || t)))
> >           Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
> >           ->  Seq Scan on public.ref_0  (cost=0.00..15282.00 rows=500000 width=65)
> >                 Output: i, t, md5(((random())::text || t))
> > (7 rows)
> >
> >
> > But perhaps I just don't understand the assumption correctly?
>
> This isn't a counterexample, because there's no join tree here -- or,
> well, there is, but it's trivial, because there's only one relation
> involved. You can't have a non-Var expression computed before you
> finish all the joins, because there are no joins.
>
> What I said was: "target lists for any nodes below the top of the join
> tree were previously always just Var nodes." The topmost join allowed
> non-Var nodes before, but not lower levels.

As I understand what you're saying, the attached (from the repro case
in [1]'s discussion about parallel safety here) is a counterexample.

Specifically we have a plan like:

Merge Right Join
  -> Unique
    -> Gather Merge
      -> Sort
        -> Nested Loop

The pathtarget of the nested loop contains non-var expressions (in
this case a CASE expression).

Am I misunderstanding what you're saying?

I've attached verbose output (and the query).

James

Attachment

Re: enable_incremental_sort changes query behavior

From
Robert Haas
Date:
On Fri, Nov 20, 2020 at 1:51 PM James Coleman <jtc331@gmail.com> wrote:
> > This isn't a counterexample, because there's no join tree here -- or,
> > well, there is, but it's trivial, because there's only one relation
> > involved. You can't have a non-Var expression computed before you
> > finish all the joins, because there are no joins.
> >
> > What I said was: "target lists for any nodes below the top of the join
> > tree were previously always just Var nodes." The topmost join allowed
> > non-Var nodes before, but not lower levels.
>
> As I understand what you're saying, the attached (from the repro case
> in [1]'s discussion about parallel safety here) is a counterexample.
>
> Specifically we have a plan like:
>
> Merge Right Join
>   -> Unique
>     -> Gather Merge
>       -> Sort
>         -> Nested Loop
>
> The pathtarget of the nested loop contains non-var expressions (in
> this case a CASE expression).

Well, in this case there are two join trees, one for the subquery and
one for the outer query. The expressions for the subquery are computed
at the top of the plan tree for that subquery.

I guess I didn't specify before that I meant "at the top of the join
tree for a subquery level," but I did mean that. On principle, the
planner can't very realistically postpone evaluating a subquery's
expressions until the top of the outer query's join tree, because, as
in your example here, the subquery might contain an ORDER BY or
DISTINCT clause. And anyway, if you look at the planner code, you'll
see that every subquery is basically planned separately, unless it
gets flattened into the parent query, so even if it were possible to
postpone expression evaluation to outer subquery levels in some case,
the current code structure definitely would fail to achieve that goal.

However, apart from incremental sort, you can postpone evaluating a
join tree's expressions until you reach the top of the join tree, and
I think that's what we have always done in the past. Each baserel in
the join tree gets a target list containing a list of vars which
corresponds to its column list, and the joinrels get lists of vars
which correspond to the columns from the inputs are needed at higher
levels of the plan tree. At the top of the join tree, we stick in the
expressions, replacing the target list of the top node in the plan
tree; the expressions have to be computable from the inputs because
the inputs contain all the columns that we figured out were needed at
this level at the beginning of planning. But, if there are scans or
joins below the top level of the plan tree, then do they bubble up to
higher levels of the plan? Should they? It's pretty complicated.
Theoretically if I've computed length(somelongvar) then perhaps I can
just bubble the computed value up the plan tree, rather than the
original column, which might be cheaper. But that only works if the
higher levels only need length(somelongvar) and not somelongvar
itself. I don't think there's any logic to work stuff like this out,
unless the incremental sort patch added some, because I don't think we
ever did it before. And it may be that it doesn't really matter, at
least when there aren't volatile functions: perhaps the worst case is
that we evaluate a non-volatile function twice, and maybe that's just
a performance consequence that we can live with. But on the other
hand, maybe it breaks stuff.

Or, on the third hand, maybe I'm wrong about what the rule was in the
first place. This is certainly a complicated topic. I don't think I'm
wrong, or I wouldn't be bothering to write emails about it, but that
doesn't mean I'm not wrong anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
Thanks much for the detailed followup; this is super helpful.

On Fri, Nov 20, 2020 at 2:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 1:51 PM James Coleman <jtc331@gmail.com> wrote:
> > > This isn't a counterexample, because there's no join tree here -- or,
> > > well, there is, but it's trivial, because there's only one relation
> > > involved. You can't have a non-Var expression computed before you
> > > finish all the joins, because there are no joins.
> > >
> > > What I said was: "target lists for any nodes below the top of the join
> > > tree were previously always just Var nodes." The topmost join allowed
> > > non-Var nodes before, but not lower levels.
> >
> > As I understand what you're saying, the attached (from the repro case
> > in [1]'s discussion about parallel safety here) is a counterexample.
> >
> > Specifically we have a plan like:
> >
> > Merge Right Join
> >   -> Unique
> >     -> Gather Merge
> >       -> Sort
> >         -> Nested Loop
> >
> > The pathtarget of the nested loop contains non-var expressions (in
> > this case a CASE expression).
>
> Well, in this case there are two join trees, one for the subquery and
> one for the outer query. The expressions for the subquery are computed
> at the top of the plan tree for that subquery.
>
> I guess I didn't specify before that I meant "at the top of the join
> tree for a subquery level," but I did mean that. On principle, the
> planner can't very realistically postpone evaluating a subquery's
> expressions until the top of the outer query's join tree, because, as
> in your example here, the subquery might contain an ORDER BY or
> DISTINCT clause. And anyway, if you look at the planner code, you'll
> see that every subquery is basically planned separately, unless it
> gets flattened into the parent query, so even if it were possible to
> postpone expression evaluation to outer subquery levels in some case,
> the current code structure definitely would fail to achieve that goal.

Ah, the "for a subquery level" clarification is helpful.

> However, apart from incremental sort, you can postpone evaluating a
> join tree's expressions until you reach the top of the join tree, and
> I think that's what we have always done in the past. Each baserel in
> the join tree gets a target list containing a list of vars which
> corresponds to its column list, and the joinrels get lists of vars
> which correspond to the columns from the inputs are needed at higher
> levels of the plan tree. At the top of the join tree, we stick in the
> expressions, replacing the target list of the top node in the plan
> tree; the expressions have to be computable from the inputs because
> the inputs contain all the columns that we figured out were needed at
> this level at the beginning of planning. But, if there are scans or
> joins below the top level of the plan tree, then do they bubble up to
> higher levels of the plan? Should they? It's pretty complicated.
> Theoretically if I've computed length(somelongvar) then perhaps I can
> just bubble the computed value up the plan tree, rather than the
> original column, which might be cheaper. But that only works if the
> higher levels only need length(somelongvar) and not somelongvar
> itself.

Apologies for some rambling/overlapping thoughts below; I'm try to
absorb all of this.

A join rel (at the top of a subquery join tree) would have to somehow
know that it needs to pass up both length(somelongvar) and somelongvar
right? Since both might be needed by a higher level?

If I understand correctly you're saying that:
1. Target list entries for join rels contain vars and non-var expressions
2. Target list entries for base rels contain only vars
3. By implication the planner knows how to propagate non-var
expression results up from join rels, but not from base rels.

Point 3 is the one that would surprise me, or at least, I'd be
interested in knowing what makes the two different in that case (is it
an artificial restriction, a result of how things are structured in
code, something else?). I feel like I'm still missing context on why
this is inherently a problem.

Does this mean a simple "SELECT bar * 2 FROM foo" ends up with a join
rel at the top? Or that a projection handles it (maybe this is the
wrong question -- perhaps the project results in a join rel? I don't
know enough here to tell if this ends up being an odd question)? And
if a projection, then would inserting a projection above a base rel
but before the top of the join tree solve the problem while allowing
us to push expression evaluation down?

Perhaps the key (to what I'm missing) is just saying something like:
"the work hasn't been down to push expression evaluation down and
ensure it remains in the target lists at every level of the join tree
so as to be propagated up, and so adding it to the base rel will
almost certainly mean duplicate evaluation". That still leaves my
question about how propagating it up from a subquery join tree works
(maybe that's an already handled "special" case).

> I don't think there's any logic to work stuff like this out,
> unless the incremental sort patch added some, because I don't think we
> ever did it before. And it may be that it doesn't really matter, at
> least when there aren't volatile functions: perhaps the worst case is
> that we evaluate a non-volatile function twice, and maybe that's just
> a performance consequence that we can live with. But on the other
> hand, maybe it breaks stuff.

It seems to me that (given the new restriction on volatile
expressions) it'd likely not be the end of the world from the
unnecessary execution perspective; after all, we already do lots of
duplicate expression evaluation in other areas like filter clauses.

> Or, on the third hand, maybe I'm wrong about what the rule was in the
> first place. This is certainly a complicated topic. I don't think I'm
> wrong, or I wouldn't be bothering to write emails about it, but that
> doesn't mean I'm not wrong anyway.

Well, I for one definitely didn't understand what you were trying to
point out earlier, so I appreciate your continuing the discussion.

It seems to me a (likely?) outcome of this is requiring that the
expressions be in the target list to be considered for sorting here.
And indeed that fixes the non-parallel subplan issue we're also
looking at.

But I think that still leaves something missing: after all,
prepare_sort_from_pathkeys does know how to insert new target list
entries, so something either there (or in the caller/how its called)
has to be enforcing an apparently implicit rule about what point in
the tree it's safe to do such. Or even just no path generation had
ever considered it before (that feels unsatisfactory as an explanation
to me, because it feels more implicit than I'd like, but...)

James



Re: enable_incremental_sort changes query behavior

From
Tom Lane
Date:
James Coleman <jtc331@gmail.com> writes:
> But I think that still leaves something missing: after all,
> prepare_sort_from_pathkeys does know how to insert new target list
> entries, so something either there (or in the caller/how its called)
> has to be enforcing an apparently implicit rule about what point in
> the tree it's safe to do such. Or even just no path generation had
> ever considered it before (that feels unsatisfactory as an explanation
> to me, because it feels more implicit than I'd like, but...)

Hi guys,

I hadn't been paying any attention to this thread, but Robert asked
me to take a look.  A few comments:

1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort.  I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort.  (There, the "sort" is envisioned as a
standalone Sort node atop all the scan/join steps; but its decisions
nonetheless constrain what will be considered for sorting that's
implemented further down.)  It has a *very* long introductory comment
laying out all the considerations.

2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars.  Any computations we have
to do are postponed to the final top-level targetlist, whether they are
volatile or not.  The fact that this policy is applied regardless of
volatility may explain why James isn't seeing volatility checks where he
expected them.  The core (and, I think, only) exception to this is that
an expression that is a sort or group key has to be evaluated earlier.
But even then, it won't be pushed down as far as the reltargets of any
scan or join relations; it'll be computed after the final join.

3. If we have a volatile sort/group key, that constrains the set of plans
we can validly generate, because the key expression mustn't be evaluated
redundantly.  With the addition of parallelism, a non-parallel-safe
sort/group key expression likewise constrains the set of valid plans,
even if it's not considered volatile.

4. In the past, the only way we'd consider non-Var sort keys in any way
during scan/join planning was if (a) a relation had an expression index
matching a requested sort key; of course, that's a-fortiori going to be
a non-volatile expression.  Or (b) we needed to sort in order to provide
suitable input for a merge join ... but again, volatile expressions
aren't considered candidates for mergejoin.  So there basically wasn't
any need to consider sorting by volatiles below the top level sort/group
processing, and that again contributes to why you don't see explicit
tests in that area.  I'm not sure offhand whether the parallel-query
patches added anything to guard against nonvolatile-but-parallel-unsafe
sort expressions.  If not, there might be live bugs there independently
of incremental sort; or that might be unreachable because of overall
limitations on parallelism.

5. While I've not dug far enough into the code to identify the exact
issue, the impression I have about this bug and the parallel-unsafe
subplan bug is that the incremental sort code is generating Paths
without due regard to point 3.  If it is making paths based on explicit
sorts for final-stage pathkeys, independently of either expression
indexes or mergejoin requirements, that could well explain why it's
got problems that weren't seen before.

6. I did look at ebb7ae839, and I suspect that the last part of
find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget
list, is dead code.  There is no case where a volatile expression would
appear there, per point 2.  The committed regression test cases
certainly do not provide a counterexample: a look at the code coverage
report will show you that that loop never finds a match in any
regression test case.  I'd be inclined to flush that code and just
say that we won't return volatile expressions here.

            regards, tom lane



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > But I think that still leaves something missing: after all,
> > prepare_sort_from_pathkeys does know how to insert new target list
> > entries, so something either there (or in the caller/how its called)
> > has to be enforcing an apparently implicit rule about what point in
> > the tree it's safe to do such. Or even just no path generation had
> > ever considered it before (that feels unsatisfactory as an explanation
> > to me, because it feels more implicit than I'd like, but...)
>
> Hi guys,
>
> I hadn't been paying any attention to this thread, but Robert asked
> me to take a look.  A few comments:

Thanks for jumping in (and thanks Robert for asking for Tom to take a
look); I appreciate the input.

> 1. James was wondering, far upthread, why we would do projections
> pre-sort or post-sort.  I think the answers can be found by studying
> planner.c's make_sort_input_target(), which separates out what we want
> to do pre-sort and post-sort.  (There, the "sort" is envisioned as a
> standalone Sort node atop all the scan/join steps; but its decisions
> nonetheless constrain what will be considered for sorting that's
> implemented further down.)  It has a *very* long introductory comment
> laying out all the considerations.

That comment is helpful; I wish I'd discovered it sooner.

Does it imply we need to intentionally avoid SRFs also?

> 2. Robert is correct that under normal circumstances, the targetlists of
> both baserels and join rels contain only Vars.  Any computations we have
> to do are postponed to the final top-level targetlist, whether they are
> volatile or not.  The fact that this policy is applied regardless of
> volatility may explain why James isn't seeing volatility checks where he
> expected them.  The core (and, I think, only) exception to this is that
> an expression that is a sort or group key has to be evaluated earlier.
> But even then, it won't be pushed down as far as the reltargets of any
> scan or join relations; it'll be computed after the final join.

Is that primarily a project policy? Or a limitation of our current
planner (just can't push things down/carry the results back up as much
as we'd like)? Or something else? In particular, it seems plausible
there are cases where pushing down the sort on a non-Var expression to
the base rel could improve plans, so I'm wondering if there's a reason
to intentionally avoid that in the long or short run (or both).

> 3. If we have a volatile sort/group key, that constrains the set of plans
> we can validly generate, because the key expression mustn't be evaluated
> redundantly.  With the addition of parallelism, a non-parallel-safe
> sort/group key expression likewise constrains the set of valid plans,
> even if it's not considered volatile.

This makes a lot of sense (just unfortunate we had to re-discover it).

> 4. In the past, the only way we'd consider non-Var sort keys in any way
> during scan/join planning was if (a) a relation had an expression index
> matching a requested sort key; of course, that's a-fortiori going to be
> a non-volatile expression.  Or (b) we needed to sort in order to provide
> suitable input for a merge join ... but again, volatile expressions
> aren't considered candidates for mergejoin.  So there basically wasn't
> any need to consider sorting by volatiles below the top level sort/group
> processing, and that again contributes to why you don't see explicit
> tests in that area.  I'm not sure offhand whether the parallel-query
> patches added anything to guard against nonvolatile-but-parallel-unsafe
> sort expressions.  If not, there might be live bugs there independently
> of incremental sort; or that might be unreachable because of overall
> limitations on parallelism.

Interesting: so merge joins are an example of us pushing down sorts,
which I assume means (part of) the answer to my question on (2) is
that there's nothing inherently wrong/broken with evaluating
expressions lower down the tree as long as we're careful about what is
safe/unsafe with respect to volatility and parallelism?

> 5. While I've not dug far enough into the code to identify the exact
> issue, the impression I have about this bug and the parallel-unsafe
> subplan bug is that the incremental sort code is generating Paths
> without due regard to point 3.  If it is making paths based on explicit
> sorts for final-stage pathkeys, independently of either expression
> indexes or mergejoin requirements, that could well explain why it's
> got problems that weren't seen before.

Yes, that's correct. Tomas already pushed the fix for the volatility
case, and there's a new thread [1] for the parallel safety concerns.

> 6. I did look at ebb7ae839, and I suspect that the last part of
> find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget
> list, is dead code.  There is no case where a volatile expression would
> appear there, per point 2.  The committed regression test cases
> certainly do not provide a counterexample: a look at the code coverage
> report will show you that that loop never finds a match in any
> regression test case.  I'd be inclined to flush that code and just
> say that we won't return volatile expressions here.

Hmm, I see what you mean. It's theoretically valuable if we ended up
with volatile expressions there, but...that seems at best unlikely.

I have wondered if we should strictly require the expression to be in
the target list even if nonvolatile, but prepare_sort_from_pathkeys
doesn't seem to think that's necessary, so I'm comfortable with that
unless there's something you know we haven't considered.

Alternatively, if we decide we need to be more strict (whether because
of discussion under (1) or because we otherwise decide only Vars can
be allowed here), then would we still potentially need/want the
relabel stripping?

Depending on what we conclude on that, I'll plan to address that in
the thread in [1].

James

1: https://www.postgresql.org/message-id/flat/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com



Re: enable_incremental_sort changes query behavior

From
Tom Lane
Date:
James Coleman <jtc331@gmail.com> writes:
> On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. James was wondering, far upthread, why we would do projections
>> pre-sort or post-sort.  I think the answers can be found by studying
>> planner.c's make_sort_input_target(), which separates out what we want
>> to do pre-sort and post-sort.

> Does it imply we need to intentionally avoid SRFs also?

It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that.  We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys).  So really there's no way that scan/join
processing should need to consider such cases.  Such a sort would
only ever be implemented via a final Sort atop ProjectSet.

>> 2. Robert is correct that under normal circumstances, the targetlists of
>> both baserels and join rels contain only Vars.

> Is that primarily a project policy? Or a limitation of our current
> planner (just can't push things down/carry the results back up as much
> as we'd like)? Or something else? In particular, it seems plausible
> there are cases where pushing down the sort on a non-Var expression to
> the base rel could improve plans, so I'm wondering if there's a reason
> to intentionally avoid that in the long or short run (or both).

I think you've just rediscovered Joe Hellerstein's thesis topic [1].
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad.  Maybe there'll be a case for
putting it back some day, but I'm dubious.  Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.

> Interesting: so merge joins are an example of us pushing down sorts,
> which I assume means (part of) the answer to my question on (2) is
> that there's nothing inherently wrong/broken with evaluating
> expressions lower down the tree as long as we're careful about what is
> safe/unsafe with respect to volatility and parallelism?

Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.

> I have wondered if we should strictly require the expression to be in
> the target list even if nonvolatile, but prepare_sort_from_pathkeys
> doesn't seem to think that's necessary, so I'm comfortable with that
> unless there's something you know we haven't considered.

No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input.  The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place.  It's fairly obvious to me how we deal with the
consideration for volatile sortkeys.  We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans.  Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys?  Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?

            regards, tom lane

[1] https://www.postgresql.org/message-id/28216.1023340706%40sss.pgh.pa.us



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Tue, Nov 24, 2020 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> 1. James was wondering, far upthread, why we would do projections
> >> pre-sort or post-sort.  I think the answers can be found by studying
> >> planner.c's make_sort_input_target(), which separates out what we want
> >> to do pre-sort and post-sort.
>
> > Does it imply we need to intentionally avoid SRFs also?
>
> It's sort of a wart that we allow SRFs in ORDER BY at all, but my
> expectation is that make_sort_input_target would prevent lower levels
> of the planner from needing to think about that.  We don't allow SRFs
> in index expressions, nor in WHERE clauses (so they'll never come up
> as mergejoin sort keys).  So really there's no way that scan/join
> processing should need to consider such cases.  Such a sort would
> only ever be implemented via a final Sort atop ProjectSet.

In this case though we're sorting based on "interesting" pathkeys,
which means we don't don't necessarily have index expressions or
mergejoin sort keys. For example, even with the bugfix patch (from the
parallel fix thread I've linked to previously) applied, I'm able to
generate this (with of course some GUCs "tweaked"):

select unique1 from tenk1 order by unnest('{1}'::int[]);

 Gather Merge
   Workers Planned: 2
   ->  Sort
         Sort Key: (unnest('{1}'::integer[]))
         ->  ProjectSet
               ->  Parallel Index Only Scan using tenk1_unique1 on tenk1

If I understand correctly, that's a violation of the spirit of what
the comments above make_sort_input_target(). If that's true, then it
should be pretty easy to disallow them from being considered. Given
the existing restrictions on where SRFs can be placed in a SELECT
(e.g., no CASE/COALESCE) and my assumption (without having thoroughly
verified this) that SRFs aren't allowed as arguments to functions or
as arguments to any other expression (I assume only scalars are
allowed), would it be sufficient to check the pathkey expression
(without recursion) to see if it's a FuncExpr that returns a set?

> >> 2. Robert is correct that under normal circumstances, the targetlists of
> >> both baserels and join rels contain only Vars.
>
> > Is that primarily a project policy? Or a limitation of our current
> > planner (just can't push things down/carry the results back up as much
> > as we'd like)? Or something else? In particular, it seems plausible
> > there are cases where pushing down the sort on a non-Var expression to
> > the base rel could improve plans, so I'm wondering if there's a reason
> > to intentionally avoid that in the long or short run (or both).
>
> I think you've just rediscovered Joe Hellerstein's thesis topic [1].
> We ripped out the remnants of that code ages ago (search very early
> git states for "JMH" if you're interested), because the complexity
> vs. benefit ratio seemed pretty bad.  Maybe there'll be a case for
> putting it back some day, but I'm dubious.  Note that we have the
> ability to push down sorts-on-expressions anyway; that's not constrained
> by what is in the relation targetlists.

I'm doing some grepping now to see what that work entailed, but I'm
not sure that what I'm describing is the same thing. By "pushing down
a non-Var expression to the base rel" I mean what (to my ears) sounds
like what you're saying by "we have the ability to push down
sorts-on-expressions anyway; that's not constrained by what is in the
relation targetlists". The target list entries I'm talking about for
these expressions are the ones inserted by
prepare_sort_from_pathkeys(), which in PG13 happens just a bit more
frequently since, at least in parallel query, consider sort paths
lower than we did before based on interesting pathkeys (for now just
query_pathkeys) in generate_useful_gather_paths().

> > Interesting: so merge joins are an example of us pushing down sorts,
> > which I assume means (part of) the answer to my question on (2) is
> > that there's nothing inherently wrong/broken with evaluating
> > expressions lower down the tree as long as we're careful about what is
> > safe/unsafe with respect to volatility and parallelism?
>
> Right, I don't see any fundamental problem with that, we just have
> to be careful about these constraints.

Great. That's exactly what I was looking for. To summarize: the
constraints are on volatility, parallel safety, and SRFs.

> > I have wondered if we should strictly require the expression to be in
> > the target list even if nonvolatile, but prepare_sort_from_pathkeys
> > doesn't seem to think that's necessary, so I'm comfortable with that
> > unless there's something you know we haven't considered.
>
> No, prepare_sort_from_pathkeys is happy to build a sort expression if it
> can't find it already computed in the input.  The secret here is that
> we should never get to that code with a "dangerous" sort expression,
> because we should never have made a Path that would request such a thing
> in the first place.  It's fairly obvious to me how we deal with the
> consideration for volatile sortkeys.  We cannot restrict parallel-unsafe
> sortkeys quite the same way, because the restriction only applies to
> parallel not non-parallel plans.  Maybe it's sufficient to mark Paths
> as parallel unsafe if they sort by parallel-unsafe sortkeys?  Is there
> anyplace that is Just Assuming that paths it builds will be
> parallel-safe?

It seems actually quite simple as I understand it: in my proposed
patch in [1] we know in find_em_expr_usable_for_sorting_rel() whether
or not we're working on parallel paths, and so we're able to rely
simply on is_parallel_safe() for the expr under consideration.

James

1: https://www.postgresql.org/message-id/flat/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Wed, Nov 25, 2020 at 2:53 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 2:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > James Coleman <jtc331@gmail.com> writes:
> > > On Mon, Nov 23, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> 1. James was wondering, far upthread, why we would do projections
> > >> pre-sort or post-sort.  I think the answers can be found by studying
> > >> planner.c's make_sort_input_target(), which separates out what we want
> > >> to do pre-sort and post-sort.
> >
> > > Does it imply we need to intentionally avoid SRFs also?
> >
> > It's sort of a wart that we allow SRFs in ORDER BY at all, but my
> > expectation is that make_sort_input_target would prevent lower levels
> > of the planner from needing to think about that.  We don't allow SRFs
> > in index expressions, nor in WHERE clauses (so they'll never come up
> > as mergejoin sort keys).  So really there's no way that scan/join
> > processing should need to consider such cases.  Such a sort would
> > only ever be implemented via a final Sort atop ProjectSet.
>
> In this case though we're sorting based on "interesting" pathkeys,
> which means we don't don't necessarily have index expressions or
> mergejoin sort keys. For example, even with the bugfix patch (from the
> parallel fix thread I've linked to previously) applied, I'm able to
> generate this (with of course some GUCs "tweaked"):
>
> select unique1 from tenk1 order by unnest('{1}'::int[]);
>
>  Gather Merge
>    Workers Planned: 2
>    ->  Sort
>          Sort Key: (unnest('{1}'::integer[]))
>          ->  ProjectSet
>                ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
>
> If I understand correctly, that's a violation of the spirit of what
> the comments above make_sort_input_target(). If that's true, then it
> should be pretty easy to disallow them from being considered. Given
> the existing restrictions on where SRFs can be placed in a SELECT
> (e.g., no CASE/COALESCE) and my assumption (without having thoroughly
> verified this) that SRFs aren't allowed as arguments to functions or
> as arguments to any other expression (I assume only scalars are
> allowed), would it be sufficient to check the pathkey expression
> (without recursion) to see if it's a FuncExpr that returns a set?

Here's the plan with a change to restrict SRFs here:

 Sort
   Sort Key: (unnest('{1,2}'::integer[]))
   ->  Gather
         Workers Planned: 2
         ->  ProjectSet
               ->  Parallel Index Only Scan using tenk1_unique1 on tenk1

I'm a bit surprised the ProjectSet is above the Index Scan rather than
above the Gather, but maybe this is still more correct?

James



Re: enable_incremental_sort changes query behavior

From
James Coleman
Date:
On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I've pushed the 0001 part, i.e. the main fix. Not sure about the other
> parts (comments and moving the code back to postgres_fdw) yet.

I noticed the CF entry [1] got moved to the next CF; I'm thinking this
entry should be marked as committed since the fix for the initial bug
reported on this thread has been pushed. We have the parallel safety
issue outstanding, but there's a separate thread and patch for that,
so I'll make a new CF entry for that.

I can mark it as committed, but I'm not sure how to "undo" (or if
that's desirable) the move to the next CF.

Thoughts?

James

1: https://commitfest.postgresql.org/30/2754/



Re: enable_incremental_sort changes query behavior

From
Anastasia Lubennikova
Date:
On 01.12.2020 03:08, James Coleman wrote:
> On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> I've pushed the 0001 part, i.e. the main fix. Not sure about the other
>> parts (comments and moving the code back to postgres_fdw) yet.
> I noticed the CF entry [1] got moved to the next CF; I'm thinking this
> entry should be marked as committed since the fix for the initial bug
> reported on this thread has been pushed. We have the parallel safety
> issue outstanding, but there's a separate thread and patch for that,
> so I'll make a new CF entry for that.
>
> I can mark it as committed, but I'm not sure how to "undo" (or if
> that's desirable) the move to the next CF.
>
> Thoughts?
>
> James
>
> 1: https://commitfest.postgresql.org/30/2754/
>
>
Oops...
I must have rushed with this one, thank you for noticing.
I don't see how to move it back either. I think it's fine to mark it as 
Committed where it is now.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: enable_incremental_sort changes query behavior

From
Jaime Casanova
Date:
On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
>
> On 01.12.2020 03:08, James Coleman wrote:
> > On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> >> I've pushed the 0001 part, i.e. the main fix. Not sure about the other
> >> parts (comments and moving the code back to postgres_fdw) yet.
> > I noticed the CF entry [1] got moved to the next CF; I'm thinking this
> > entry should be marked as committed since the fix for the initial bug
> > reported on this thread has been pushed. We have the parallel safety
> > issue outstanding, but there's a separate thread and patch for that,
> > so I'll make a new CF entry for that.
> >
> > I can mark it as committed, but I'm not sure how to "undo" (or if
> > that's desirable) the move to the next CF.
> >
> > Thoughts?
> >
> > James
> >
> > 1: https://commitfest.postgresql.org/30/2754/
> >
> >
> Oops...
> I must have rushed with this one, thank you for noticing.
> I don't see how to move it back either. I think it's fine to mark it as
> Committed where it is now.
>

BTW, I still see this one as needs review


--
Jaime Casanova
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: enable_incremental_sort changes query behavior

From
Tomas Vondra
Date:

On 12/16/20 1:51 AM, Jaime Casanova wrote:
> On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>>
>> On 01.12.2020 03:08, James Coleman wrote:
>>> On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>>> I've pushed the 0001 part, i.e. the main fix. Not sure about the other
>>>> parts (comments and moving the code back to postgres_fdw) yet.
>>> I noticed the CF entry [1] got moved to the next CF; I'm thinking this
>>> entry should be marked as committed since the fix for the initial bug
>>> reported on this thread has been pushed. We have the parallel safety
>>> issue outstanding, but there's a separate thread and patch for that,
>>> so I'll make a new CF entry for that.
>>>
>>> I can mark it as committed, but I'm not sure how to "undo" (or if
>>> that's desirable) the move to the next CF.
>>>
>>> Thoughts?
>>>
>>> James
>>>
>>> 1: https://commitfest.postgresql.org/30/2754/
>>>
>>>
>> Oops...
>> I must have rushed with this one, thank you for noticing.
>> I don't see how to move it back either. I think it's fine to mark it as
>> Committed where it is now.
>>
> 
> BTW, I still see this one as needs review
> 

Hmm, not sure what was the plan, but I'll mark it as committed once I
push the remaining patches (for parallel-safe checks, SRFs etc.) in a
couple days.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company