Re: On partitioning - Mailing list pgsql-hackers

From Robert Haas
Subject Re: On partitioning
Date
Msg-id CA+Tgmob801TUBRX5QWojwpAeuHyEqGMT-isPVOiNGun0uGeH4w@mail.gmail.com
Whole thread Raw
In response to Re: On partitioning  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Nov 13, 2014 at 9:12 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > I'm not a fan of using pg_class- there are a number of columns in there
>> > which I would *not* wish to be allowed to be different per partition
>> > (starting with relowner and relacl...).  Making those NULL would be just
>> > as bad (probably worse, really, since we'd also need to add new columns
>> > to pg_class to indicate the partitioning...) as having a sparsely
>> > populated new catalog table.
>>
>> I think you are, again, confused as to what we're discussing.  Nobody,
>> including Alvaro, has proposed a design where the individual
>> partitions don't have pg_class entries of some kind.  What we're
>> talking about is where to store the metadata for partition exclusion
>> and tuple routing.
>
> This discussion has gone a few rounds before and, yes, I was just
> jumping into the middle of this particular round, but I'm pretty sure
> I'm not the first to point out that storing the individual partition
> information into pg_class isn't ideal since there are pieces that we
> don't actually want to be different per partition, as I outlined
> previously.  Perhaps what that means is we should actually go the other
> way and move *those* columns into a new catalog instead.
>
> Consider this (totally off-the-cuff):
>
> pg_relation (pg_tables? pg_heaps?)
>   oid
>   relname
>   relnamespace
>   reltype
>   reloftype
>   relowner
>   relam (?)
>   relhas*
>   relisshared
>   relpersistence
>   relkind (?)
>   relnatts
>   relchecks
>   relacl
>   reloptions
>   relhowpartitioned (?)
>
> pg_class
>   pg_relation.oid
>   relfilenode
>   reltablespace
>   relpages
>   reltuples
>   reltoastrelid
>   reltoastidxid
>   relfrozenxid
>   relhasindexes (?)
>   relpartitioninfo (whatever this ends up being)
>
> The general idea being to seperate the user-facing notion of a "table"
> from the underlying implementation, with the implementation allowing
> multiple sets of files to be used for each table.  It's certainly not
> for the faint of heart, but we saw what happened with our inheiritance
> structure allowing different permissions on the child tables- we ended
> up creating a pretty grotty hack to deal with it (going through the
> parent bypasses the permissions).  That's the best solution for that
> situation, but it's far from ideal and it'd be nice to avoid that same
> risk with partitioning.  Additionally, if each partition is in pg_class,
> how are we handling name conflicts?  Why do individual partitions even
> need to have a name?  Do we allow queries against them directly?  etc..

There's certainly something to this, but "not for the faint of heart"
sounds like an understatement.

One of the good things about inheritance is that, if the system
doesn't automatically do the right thing, there's usually an escape
hatch.  If the INSERT trigger you use for tuple routing is too slow,
you can insert directly into the target partition.  If your query
doesn't realize that it can prune away all the partitions but one, or
takes too long to do it, you can query directly against that
partition.  These aren't beautiful things and I'm sure we're all
united in wanting a mechanism that will reduce the need to do them,
but we need to make sure that we are removing the need for the escape
hatch, and not just cementing it shut.

In other words, I don't think there is a problem with people querying
child tables directly; the problem is that people are forced to do so
in order to get good performance.  We shouldn't remove the ability for
people to do that unless we're extremely certain we've fixed the
problem that leads them to wish to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Wait free LW_SHARED acquisition - v0.2
Next
From: Robert Haas
Date:
Subject: Re: Wait free LW_SHARED acquisition - v0.2