Re: On partitioning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: On partitioning
Date
Msg-id 001601d0180c$a212f550$e638dff0$@lab.ntt.co.jp
Whole thread Raw
In response to Re: On partitioning  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: On partitioning  (Claudio Freire <klaussfreire@gmail.com>)
Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Alvaro wrote:
> Claudio Freire wrote:
>
> > Fair enough, but that's not the same as not requiring easy proofs. The
> > planner might not the one doing the proofs, but you still need proofs.
> >
> > Even if the proving method is hardcoded into the partitioning method,
> > as in the case of list or range partitioning, it's still a proof. With
> > arbitrary functions (which is what prompted me to mention proofs) you
> > can't do that. A function works very well for inserting, but not for
> > selecting.
> >
> > I could be wrong though. Maybe there's a way to turn SQL functions
> > into analyzable things? But it would still be very easy to shoot
> > yourself in the foot by writing one that is too complex.
>
> Arbitrary SQL expressions (including functions) are not the thing to use
> for partitioning -- at least that's how I understand this whole
> discussion.  I don't think you want to do "proofs" as such -- they are
> expensive.
>

This means if a user puts arbitrary expressions in a partition definition, say,

... FOR VALUES  extract(month from current_date) TO extract(month from current_date + interval '3 months'),

we make sure that those expressions are pre-computed to literal values. The exact time when that happens is open for
discussionI guess. It could be either DDL time or, if feasible, during relation cache building when we compute the
valuefrom pg_node_tree of this expression which we may choose to store in the partition definition catalog. The former
entailsan obvious challenge of figuring out how we store the computed value into catalog (pg_node_tree of a Const?). 

> To make this discussion a bit clearer, there are two things to
> distinguish: one is routing tuples, when an INSERT or COPY command
> references the partitioned table, into the individual partitions
> (ingress); the other is deciding which partitions to read when a SELECT
> query wants to read tuples from the partitioned table (egress).
>
> On ingress, what you want is something like being able to do something
> on the tuple that tells you which partition it belongs into.  Ideally
> this is something much lighter than running an expression; if you can
> just apply an operator to the partitioning column values, that should be
> plenty fast.  This requires no proof.
>

And I am thinking this's all executor stuff.

> On egress you need some direct way to compare the scan quals with the
> partitioning values.  I would imagine this to be similar to how scan
> quals are compared to the values stored in a BRIN index: each scan qual
> has a corresponding operator strategy and a scan key, and you can say
> "aye" or "nay" based on a small set of operations that can be run
> cheaply, again without any proof or running arbitrary expressions.
>

My knowledge of this is far from being perfect, though to clear any confusions -

As far as planning is concerned, I could not imagine how index access method way of pruning partitions could be made to
work.Of course, I may be missing something.  

When you say "scan qual has a corresponding operator strategy", I'd think that is a part of scan key in executor, no?

Thanks,
Amit





pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Next
From: Michael Paquier
Date:
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)