Thread: Partition prune with stable Expr

Partition prune with stable Expr

From
Andy Fan
Date:
Hi:

I find we can't prune partitions in the planner if the qual is a stable function.

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

postgres=# explain (costs off) select * from measurement
postgres-# where logdate = to_date('2006-03-02', 'yyyy-mm-dd');
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Append
   Subplans Removed: 1  <-- Here
   ->  Seq Scan on measurement_y2006m03 measurement_1
         Filter: (logdate = to_date('2006-03-02'::text, 'yyyy-mm-dd'::text))
(4 rows)

IMO, we should do it. Why not?  The attached is used to show the things
in my mind. 

Btw,  why the to_date function is declared as stable rather than immutable
since it always delivers the same result for the same inputs. 

--
Best Regards
Andy Fan
Attachment

Re: Partition prune with stable Expr

From
David Rowley
Date:
On Mon, 28 Sep 2020 at 08:59, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> I find we can't prune partitions in the planner if the qual is a stable function.

> IMO, we should do it. Why not?

Thanks for showing an interest in partition pruning. Unfortunately,
it's not possible to use stable functions to prune partitions during
planning.

NOW() is one example of a function that's stable, but the return value
will change over time. If we used the return value of that to perform
partition pruning then we'd end up with a plan that's wrong over time.

Here's an example:

create table rp (t timestamp) partition by range(t);
create table rp1 partition of rp for values from ('now'::timestamp) to
('now'::timestamp + '1 min'::interval);
create table rp2 partition of rp for values from ('now'::timestamp +
'1 min'::interval) to ('now'::timestamp + '2 min'::interval);
insert into rp select t from generate_Series('now'::timestamp,
'now'::timestamp + '1 min 59 sec'::interval, '1 sec'::interval) t;

prepare q1 as select count(*) from rp where t > now() and t < now() +
'10 sec'::interval;

Now, if you run the following command with your patch, it'll prune the
rp2 partition as it's not required for the WHERE clause (at the time
we planned). However, just wait 1 minute and execute the plan again.
Oops, my rows vanished!

execute q1; select pg_sleep(60); execute q1;

The 2nd execute should have returned 10 rows, the same as the first
(assuming you executed that directly after creating the tables)

Run-time partition pruning was invented just for this purpose.

David



Re: Partition prune with stable Expr

From
Andy Fan
Date:
Thank you David for coming:) 

On Mon, Sep 28, 2020 at 4:46 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 28 Sep 2020 at 08:59, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> I find we can't prune partitions in the planner if the qual is a stable function.

> IMO, we should do it. Why not?

Thanks for showing an interest in partition pruning. Unfortunately,
it's not possible to use stable functions to prune partitions during
planning.

NOW() is one example of a function that's stable, but the return value
will change over time. If we used the return value of that to perform
partition pruning then we'd end up with a plan that's wrong over time.


Sigh.. I understand you now, I ignored the plan can be cached for later use.
Without that,  we should be able to prune with stable function.  I know the
run-time partition prune can help with this, but it can't help with planning time. 
I run into some cases that SELECT * FROM p WHERE pkey = to_date(..);
p has 1500+ partitions and planning takes lots of time.  and users are not
willing to remove the to_date('2018-11-11', 'yyyy-mm-dd') style code since
too much and can't find out all of them at once.  Actually I think to_date should
be marked as immuable rather than stable. 


--
Best Regards
Andy Fan

Re: Partition prune with stable Expr

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Mon, Sep 28, 2020 at 4:46 AM David Rowley <dgrowleyml@gmail.com> wrote:
>> Thanks for showing an interest in partition pruning. Unfortunately,
>> it's not possible to use stable functions to prune partitions during
>> planning.

> Sigh.. I understand you now, I ignored the plan can be cached for later use.
> Without that,  we should be able to prune with stable function.

No, that's still wrong.  The contract for a stable function is that
its result won't change over execution of a single query; but that
says *execution*, not *planning and execution*.

In particular, the canonical example of a stable function is one
whose result depends on a database query.  The reason it can be
considered stable is that within a single outer query, the MVCC
snapshot it's used with won't change.  But we take a new snapshot
(later than the planner's snapshot) when beginning execution.

Somebody (Robert Haas, if memory serves, which it might not)
tried to change that a few years ago.  It blew up pretty well,
and was eventually reverted, because of undesirable side-effects
on user-visible query semantics.  You'd have to check the archives
for details.

It's possible that we could make that work differently in serializable
mode, thanks to the longer persistence of snapshots.  Not sure that
it'd be desirable for planning to work differently in serializable
mode, though.

            regards, tom lane



Re: Partition prune with stable Expr

From
Andy Fan
Date:


On Mon, Sep 28, 2020 at 7:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Mon, Sep 28, 2020 at 4:46 AM David Rowley <dgrowleyml@gmail.com> wrote:
>> Thanks for showing an interest in partition pruning. Unfortunately,
>> it's not possible to use stable functions to prune partitions during
>> planning.

> Sigh.. I understand you now, I ignored the plan can be cached for later use.
> Without that,  we should be able to prune with stable function.

No, that's still wrong.  The contract for a stable function is that
its result won't change over execution of a single query; but that
says *execution*, not *planning and execution*.

In particular, the canonical example of a stable function is one
whose result depends on a database query.  The reason it can be
considered stable is that within a single outer query, the MVCC
snapshot it's used with won't change.  But we take a new snapshot
(later than the planner's snapshot) when beginning execution.

Somebody (Robert Haas, if memory serves, which it might not)
tried to change that a few years ago.  It blew up pretty well,
and was eventually reverted, because of undesirable side-effects
on user-visible query semantics.  You'd have to check the archives
for details.

It's possible that we could make that work differently in serializable
mode, thanks to the longer persistence of snapshots.  Not sure that
it'd be desirable for planning to work differently in serializable
mode, though.

                        regards, tom lane

Well, that's very interesting.  Specific to my user case, 
SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)'; 
p has 1500+ partitions and planning takes lots of time, which is so same
with SELECT * FROM p WHERE pkey = '2018-12-13',  however the planning
time difference is so huge, that doesn't make sense in human view.  Can
we do something for that?  to_date(text, text) should be a "immutable" function
IMO.  Does that have a semantic issue or other issues?


--
Best Regards
Andy Fan

Re: Partition prune with stable Expr

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> Well, that's very interesting.  Specific to my user case,
> SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';
> p has 1500+ partitions and planning takes lots of time, which is so same
> with SELECT * FROM p WHERE pkey = '2018-12-13',  however the planning
> time difference is so huge, that doesn't make sense in human view.  Can
> we do something for that?  to_date(text, text) should be a "immutable"
> function IMO.  Does that have a semantic issue or other issues?

Yeah.  It depends on the lc_time setting, and possibly also the timezone
GUC.  (Admittedly, common values of the format string would not have
any lc_time dependency, but the immutability property is not fine-grained
enough to recognize that.)

            regards, tom lane



Re: Partition prune with stable Expr

From
Andy Fan
Date:

On Mon, Sep 28, 2020 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> Well, that's very interesting.  Specific to my user case,
> SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';
> p has 1500+ partitions and planning takes lots of time, which is so same
> with SELECT * FROM p WHERE pkey = '2018-12-13',  however the planning
> time difference is so huge, that doesn't make sense in human view.  Can
> we do something for that?  to_date(text, text) should be a "immutable"
> function IMO.  Does that have a semantic issue or other issues?

Yeah.  It depends on the lc_time setting, and possibly also the timezone
GUC.  (Admittedly, common values of the format string would not have
any lc_time dependency, but the immutability property is not fine-grained
enough to recognize that.)

                        regards, tom lane

Thanks for your reply. Even it has something on GUC or lc_time setting, suppose
it should be decided at planning time.  Do we have concerns about changes
between planning and execution?

The attached patch marked some common formatting function as immutable,  
only one partition prune test case needed fixing because of this. I only changed
to_char/to_date/to_timestamp,  however the whole list is below. I can change 
all of them if needed.

     proname     | count
-----------------+-------
 to_ascii        |     3
 to_char         |     8
 to_date         |     1
 to_hex          |     2
 to_json         |     1
 to_jsonb        |     1
 to_number       |     1
 to_regclass     |     1
 to_regcollation |     1
 to_regnamespace |     1
 to_regoper      |     1
 to_regoperator  |     1
 to_regproc      |     1
 to_regprocedure |     1
 to_regrole      |     1
 to_regtype      |     1
 to_timestamp    |     2
 to_tsquery      |     2
 to_tsvector     |     6
(19 rows)

With this change, the exact issue on the beginning of this thread can be fixed as
well with this patch. 

--
Best Regards
Andy Fan
Attachment

Re: Partition prune with stable Expr

From
Jesse Zhang
Date:
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
>
>
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Andy Fan writes:
>> > Well, that's very interesting.  Specific to my user case,
>> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';
>> > p has 1500+ partitions and planning takes lots of time, which is so same
>> > with SELECT * FROM p WHERE pkey = '2018-12-13',  however the planning
>> > time difference is so huge, that doesn't make sense in human view.  Can
>> > we do something for that?  to_date(text, text) should be a "immutable"
>> > function IMO.  Does that have a semantic issue or other issues?
>>
>> Yeah.  It depends on the lc_time setting, and possibly also the timezone
>> GUC.  (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>>
>>                         regards, tom lane
>
>
> Thanks for your reply. Even it has something on GUC or lc_time setting, suppose
> it should be decided at planning time.  Do we have concerns about changes
> between planning and execution?

Planner can be called at prepared statement creation time, like

PREPARE yolo() AS SELECT * FROM foo WHERE pk = to_date(...);

Here, there's an arbitrary gap between planning time, and execution.

>
> The attached patch marked some common formatting function as immutable,
> only one partition prune test case needed fixing because of this. I only changed
> to_char/to_date/to_timestamp,  however the whole list is below. I can change
> all of them if needed.
>
>      proname     | count
> -----------------+-------
>  to_ascii        |     3
>  to_char         |     8
>  to_date         |     1
>  to_hex          |     2
>  to_json         |     1
>  to_jsonb        |     1
>  to_number       |     1
>  to_regclass     |     1
>  to_regcollation |     1
>  to_regnamespace |     1
>  to_regoper      |     1
>  to_regoperator  |     1
>  to_regproc      |     1
>  to_regprocedure |     1
>  to_regrole      |     1
>  to_regtype      |     1
>  to_timestamp    |     2
>  to_tsquery      |     2
>  to_tsvector     |     6
> (19 rows)
>
This patch is ridiculous.

Immutable functions need to produce the same output for the same
argument values. None of the functions changed in the patch is
immutable: they are all stable because they all depend on GUC settings
(e.g. to_tsvector depends on default_text_search_config).



Re: Partition prune with stable Expr

From
Andy Fan
Date:


On Mon, Sep 28, 2020 at 2:44 PM Jesse Zhang <sbjesse@gmail.com> wrote:
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
>
>
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Andy Fan writes:
>> > Well, that's very interesting.  Specific to my user case,
>> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';
>> > p has 1500+ partitions and planning takes lots of time, which is so same
>> > with SELECT * FROM p WHERE pkey = '2018-12-13',  however the planning
>> > time difference is so huge, that doesn't make sense in human view.  Can
>> > we do something for that?  to_date(text, text) should be a "immutable"
>> > function IMO.  Does that have a semantic issue or other issues?
>>
>> Yeah.  It depends on the lc_time setting, and possibly also the timezone
>> GUC.  (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>>
>>                         regards, tom lane
>
>
> Thanks for your reply. Even it has something on GUC or lc_time setting, suppose
> it should be decided at planning time.  Do we have concerns about changes
> between planning and execution?

Planner can be called at prepared statement creation time, like

PREPARE yolo() AS SELECT * FROM foo WHERE pk = to_date(...);

Here, there's an arbitrary gap between planning time, and execution.

>
> The attached patch marked some common formatting function as immutable,
> only one partition prune test case needed fixing because of this. I only changed
> to_char/to_date/to_timestamp,  however the whole list is below. I can change
> all of them if needed.
>
>      proname     | count
> -----------------+-------
>  to_ascii        |     3
>  to_char         |     8
>  to_date         |     1
>  to_hex          |     2
>  to_json         |     1
>  to_jsonb        |     1
>  to_number       |     1
>  to_regclass     |     1
>  to_regcollation |     1
>  to_regnamespace |     1
>  to_regoper      |     1
>  to_regoperator  |     1
>  to_regproc      |     1
>  to_regprocedure |     1
>  to_regrole      |     1
>  to_regtype      |     1
>  to_timestamp    |     2
>  to_tsquery      |     2
>  to_tsvector     |     6
> (19 rows)
>
This patch is ridiculous.

Immutable functions need to produce the same output for the same
argument values. None of the functions changed in the patch is
immutable: they are all stable because they all depend on GUC settings
(e.g. to_tsvector depends on default_text_search_config).

Thanks,  how can I misunderstand Tom's comment above!!  

--
Best Regards
Andy Fan

Re: Partition prune with stable Expr

From
Juan José Santamaría Flecha
Date:

On Mon, Sep 28, 2020 at 9:23 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Mon, Sep 28, 2020 at 2:44 PM Jesse Zhang <sbjesse@gmail.com> wrote:
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Yeah.  It depends on the lc_time setting, and possibly also the timezone
>> GUC.  (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>
> Thanks for your reply. Even it has something on GUC or lc_time setting, suppose
> it should be decided at planning time.  Do we have concerns about changes
> between planning and execution?

Immutable functions need to produce the same output for the same
argument values. None of the functions changed in the patch is
immutable: they are all stable because they all depend on GUC settings
(e.g. to_tsvector depends on default_text_search_config).

Thanks,  how can I misunderstand Tom's comment above!!

To make data type formatting functions immutable you would need a third input argument with the locale used for that specific function call, as proposed in [1].

As for the original use case, in the documentation there is a tip about using to_date() to handle input formats that cannot be converted by simple casting, and that for most standard date/time formats a cast is the easier way to do so [2], I may also add that is better performing in Postgres.


Regards,

Juan José Santamaría Flecha
 

Re: Partition prune with stable Expr

From
Andy Fan
Date:


On Mon, Sep 28, 2020 at 7:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Mon, Sep 28, 2020 at 4:46 AM David Rowley <dgrowleyml@gmail.com> wrote:
>> Thanks for showing an interest in partition pruning. Unfortunately,
>> it's not possible to use stable functions to prune partitions during
>> planning.

> Sigh.. I understand you now, I ignored the plan can be cached for later use.
> Without that,  we should be able to prune with stable function.

No, that's still wrong.  The contract for a stable function is that
its result won't change over execution of a single query; but that
says *execution*, not *planning and execution*.

I have a slightly different opinion about the impact of "cached the plan
for later use will be wrong" now.  Generic plan will never be partition 
pruned plan since we don't know which partition to prune at plan time. 
So for any cached plan, it is not a  plan time partition pruned plan.  
Partition prune with stable expr is still  unacceptable even this is not
an issue but hope the snapshot issue will be the only one issue to
fix in future for this direction.   I'd like to know if I am wrong again. 

--
Best Regards
Andy Fan

Re: Partition prune with stable Expr

From
Thomas Kellerer
Date:
Andy Fan schrieb am 28.09.2020 um 02:54:
> Well, that's very interesting.  Specific to my user case, 
> SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';

Why use to_date() at all for a constant value?
Can't you use a standard ANSI date literal (date '2018-12-13')?

Alternatively, you could use make_date(2018,12,13) which is marked as immutable.

Thomas



Re: Partition prune with stable Expr

From
Andy Fan
Date:


On Mon, Sep 28, 2020 at 9:17 PM Thomas Kellerer <shammat@gmx.net> wrote:
Andy Fan schrieb am 28.09.2020 um 02:54:
> Well, that's very interesting.  Specific to my user case, 
> SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd)';

Why use to_date() at all for a constant value?
Can't you use a standard ANSI date literal (date '2018-12-13')?


This is from lots of legacy code from Oracle and the real case is 
to_date('20181213', 'yyyymmdd'). 

Alternatively, you could use make_date(2018,12,13) which is marked as immutable.


Thanks, Nice to know make_date function. 

--
Best Regards
Andy Fan

Re: Partition prune with stable Expr

From
Andy Fan
Date:


On Mon, Sep 28, 2020 at 8:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:


On Mon, Sep 28, 2020 at 7:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Mon, Sep 28, 2020 at 4:46 AM David Rowley <dgrowleyml@gmail.com> wrote:
>> Thanks for showing an interest in partition pruning. Unfortunately,
>> it's not possible to use stable functions to prune partitions during
>> planning.

> Sigh.. I understand you now, I ignored the plan can be cached for later use.
> Without that,  we should be able to prune with stable function.

No, that's still wrong.  The contract for a stable function is that
its result won't change over execution of a single query; but that
says *execution*, not *planning and execution*.

I have a slightly different opinion about the impact of "cached the plan
for later use will be wrong" now.  Generic plan will never be partition 
pruned plan since we don't know which partition to prune at plan time. 
So for any cached plan, it is not a  plan time partition pruned plan.  
Partition prune with stable expr is still  unacceptable even this is not
an issue but hope the snapshot issue will be the only one issue to
fix in future for this direction.   I'd like to know if I am wrong again. 
 
Indeed I was wrong again.  I'd like to end this thread with this
understanding fix.

prepare s as select * from measurement where logdate = $1 ; 
execute s(now());  

In this case, even if we run the planning time partition prune with a stable
function, we can still get the correct result (ignore the different snapshot case).
since the generic plan includes all the partitions and just do initial partition
prune case.

However if we create the prepared stmt like prepare s as select * from 
measurement where logdate = now();  Then the cached plan should be
wrong.  Actually this example is exactly the same as Daivd's example 
at the beginning..

--
Best Regards
Andy Fan