Re: RangeType internal use - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RangeType internal use
Msg-id CA+Tgmoahntvx112+FWVFdy0w0VHosA-EqcSN9WN_5DPEj8qbcA@mail.gmail.com
In response to Re: RangeType internal use  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
On Mon, Feb 9, 2015 at 7:54 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Well, that's debatable IMO (especially your claim that variable-size
>> partitions would be needed by a majority of users).  But in any case,
>> partitioning behavior that is emergent from a bunch of independent pieces
>> of information scattered among N tables seems absolutely untenable from
>> where I sit.  Whatever we support, the behavior needs to be described by
>> *one* chunk of information --- a sorted list of bin bounding values,
>> perhaps.
>
> I'm a bit confused here. I got an impression that partitioning formula
> as you suggest would consist of two pieces of information - an origin
> point & a bin width. Then routing a tuple consists of using exactly
> these two values to tell a bin number and hence a partition in O(1) time
> assuming we've made all partitions be exactly bin-width wide.
>
> You mention here a sorted list of bin bounding values which we can very
> well put together for a partitioned table in its relation descriptor
> based on whatever information we stored in catalog. That is, we can
> always have a *one* chunk of partitioning information as *internal*
> representation irrespective of how generalized we make our on-disk
> representation. We can get O(log N) if not O(1) from that I'd hope. In
> fact, that's what I had in mind about this.

Sure, we can always assemble data into a relation descriptor from
across multiple catalog entries.  I think the question is whether
there is any good reason to split up the information across multiple
relations or whether it might not be better, as I have suggested
multiple times, to serialize it using nodeToString() and stuff it in a
single column in pg_class.  There may be such a reason, but if you
said what it was, I missed that.  This thread started as a discussion
about using range types, and I think it's pretty clear that's a bad
idea, because:

1. There's no guarantee that a range type for the datatype exists at all.
2. If it does, there's no guarantee that it uses the same opclass that
we want to use for partitioning, and I certainly think it would be
strange if we refused to let the user pick the opclass she wants to
use.
3. Even if there is a suitable range type available, it's a poor
representational choice here, because it will be considerably more
verbose than just storing a sorted list of partition bounds.  In the
common case where the ranges are adjacent, you'll end up storing two
copies of every bound but the first and last for no discernible
benefit.
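
To make point 3 concrete, here is a rough sketch of what I mean by a
sorted list of bounds. It is an illustration only, not proposed or
existing PostgreSQL code: the PartitionBounds struct and
route_by_bounds() are hypothetical names, and keys are plain ints so the
comparison is "<" rather than a call through the chosen opclass. N
adjacent partitions need N + 1 stored values, where N separate ranges
would need 2N, and a binary search routes a key in O(log N).

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not PostgreSQL code.
 *
 * N adjacent range partitions over int keys are described by N + 1
 * sorted bounds: partition i covers [bounds[i], bounds[i + 1]).
 */
typedef struct PartitionBounds
{
    const int  *bounds;     /* sorted ascending, nbounds entries */
    size_t      nbounds;    /* number of partitions + 1 */
} PartitionBounds;

/*
 * Return the index of the partition that should receive "key", or -1
 * if the key falls outside every partition.
 */
int
route_by_bounds(const PartitionBounds *pb, int key)
{
    size_t      lo;
    size_t      hi;

    if (pb->nbounds < 2 ||
        key < pb->bounds[0] ||
        key >= pb->bounds[pb->nbounds - 1])
        return -1;

    lo = 0;
    hi = pb->nbounds - 1;

    /* Loop invariant: bounds[lo] <= key < bounds[hi]. */
    while (hi - lo > 1)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (key < pb->bounds[mid])
            hi = mid;
        else
            lo = mid;
    }

    return (int) lo;
}
```

With bounds {0, 10, 25, 100} this describes three partitions using four
stored values. Storing the same three partitions as separate ranges
would take six values, with 10 and 25 each stored twice.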

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company