Re: On partitioning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: On partitioning
Date
Msg-id 01e801d00f9c$5bfc98a0$13f5c9e0$@lab.ntt.co.jp
Whole thread Raw
In response to Re: On partitioning  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

> From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
> Amit Langote wrote:
> 
> > From: Robert Haas [mailto:robertmhaas@gmail.com]
> 
> > > What is an overflow partition and why do we want that?
> >
> > That would be a default partition. That is, where the tuples that
> > don't belong elsewhere (other defined partitions) go. VALUES clause of
> > the definition for such a partition would look like:
> >
> > (a range partition) ... VALUES LESS THAN MAXVALUE
> > (a list partition) ... VALUES DEFAULT
> >
> > There has been discussion about whether there shouldn't be such a
> > place for tuples to go. That is, it should generate an error if a
> > tuple can't go anywhere (or support auto-creating a new one like in
> > interval partitioning?)
> 
> In my design I initially had overflow partitions too, because I
> inherited the idea from Itagaki Takahiro's patch.  Eventually I realized
> that it's a useless concept, because you can always have leftmost and
> rightmost partitions, which are just regular partitions (except they
> don't have a "low key", resp. "high key").  If you don't define
> unbounded partitions at either side, it's fine, you just raise an error
> whenever the user tries to insert a value for which there is no
> partition.
> 

I think your mention of "low key" and "high key" of a partition has forced
me into rethinking how I was going about this. For example, in Itagaki-san's
patch, only upper bound for a range partition would go into the catalog
while the CHECK expression for that partition would use upper bound for
previous partition as lower bound for the partition (an expression of form
lower <= key AND key < upper). I'd think that's presumptuous to a certain
degree in that the arrangement does not allow holes in the range. That also
means range partitions on either end are unbounded on one side. In fact,
what I called overflow partition would get (last_partitions_upper <= key) as
its CHECK expression and vice versa.

You suggest such unbounded partitions be disallowed? Which would mean we do
not allow either of the partition bounds to be null in case of a range
partition or list of values to be non-empty in case of a LIST partition.

> Not real clear to me how this applies to list partitioning, but I have
> the hunch that it'd be better to deal with that without overflow
> partitions as well.
> 

Likewise, CHECK expression for a LIST overflow partition would look
something like NOT (key = ANY ( ARRAY[<values-of-all-other-partitions>])).

By the way, I am not saying the primary metadata of partitions is CHECK
expression anymore. I hope we can do away without them for partitioning
sooner than later.  I am looking to have bounds/values stored in the
partition definition catalog not as an expression but as something readily
amenable to use at places where it's useful. Suggestions are welcome!

> BTW I think auto-creating partitions is a bad idea in general, because
> you get into lock escalation mess and furthermore you have to waste time
> checking for existance beforehand, which lowers performance.  Just have
> a very easy command that users can run ahead of time (something like
> "CREATE PARTITION FOR VALUE now() + '30 days'", whatever), and
> preferrably one that doesn't fail if the partition already exist; that
> way, users can have (for instance) a daily create-30-partitions-ahead
> procedure which most days would only create one partition (the one for
> 30 days in the future) but whenever the odd case happens that the server
> is turned off just at that time someday, it creates two -- one belt, 29
> suspenders.
> 

Yeah, I mentioned auto-partitioning just to know if that's how people
usually prefer to have overflow cases dealt with. I'd much rather focus on
straightforward cases at this point. Having said that, I agree that users of
partitioning should have a mechanism you mention though not sure about the
details.

Thanks,
Amit





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Parallel Seq Scan
Next
From: Heikki Linnakangas
Date:
Subject: Re: libpq pipelining