Re: On partitioning - Mailing list pgsql-hackers

From Andres Freund
Subject Re: On partitioning
Date
Msg-id 20141208193902.GA30157@alap3.anarazel.de
In response to Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: On partitioning
List pgsql-hackers
On 2014-12-08 14:05:52 -0500, Robert Haas wrote:
> On Sat, Dec 6, 2014 at 3:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Sure, I don't feel we should not provide any way to take a dump
> > for an individual partition, but not at the level of an independent table.
> > Maybe something like --table <table_name>
> > --partition <partition_name>.
> >
> > In general, I think we should try to avoid exposing that partitions are
> > individual tables as that might hinder any future enhancement in that
> > area (for example, if someone finds a different and better way to
> > arrange the partition data, then due to the currently exposed syntax,
> > we might feel blocked).
> 
> I guess I'm in disagreement with you - and, perhaps - the majority on
> this point.  I think that ship has already sailed: partitions ARE
> tables.  We can try to make it less necessary for users to ever look
> at those tables as separate objects, and I think that's a good idea.
> But trying to go from a system where partitions are tables, which is
> what we have today, to a system where they are not seems like a bad
> idea to me.  If we make a major break from how things work today,
> we're going to end up having to reimplement stuff that already works.

I don't think this makes much sense. That'd severely restrict our
ability to do stuff for a long time. Unless we can absolutely rely on
the fact that partitions have the same schema and such, we'll rob
ourselves of significant optimization opportunities.

> Besides, I haven't really seen anyone propose something that sounds
> like a credible alternative.  If we could make partition objects
> things that the storage layer needs to know about but the query
> planner doesn't need to understand, that'd be maybe worth considering.
> But I don't see any way that that's remotely feasible.  There are lots
> of places that we assume that a heap consists of blocks number 0 up
> through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
> and pieces of the way index vacuuming is handled, which in turn bleeds
> into Hot Standby.  You can't just decide that now block numbers are
> going to be replaced by some more complex structure, or even that
> they're now going to be nonlinear, without breaking a huge amount of
> stuff.
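
For readers following the thread, the linear block assumption in the quoted paragraph can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the `Tid` type, `seq_scan`, and `fetch` are hypothetical names, and the heap is modelled as a plain list of blocks numbered 0 through N-1, with a tuple identifier being a (block number, offset) pair in the spirit of a CTID.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Tid(NamedTuple):
    """Toy tuple identifier: (block number, 1-based offset within the block)."""
    block: int
    offset: int


def seq_scan(heap: List[List[str]]) -> Iterator[Tuple[Tid, str]]:
    """Visit blocks 0..N-1 in order, yielding every row with its Tid."""
    for block_no, block in enumerate(heap):
        for offset, row in enumerate(block, start=1):
            yield Tid(block_no, offset), row


def fetch(heap: List[List[str]], tid: Tid) -> str:
    """Index-style lookup: the Tid is used directly as an address."""
    return heap[tid.block][tid.offset - 1]


# A two-block toy heap.
heap = [["a", "b"], ["c"]]
scanned = list(seq_scan(heap))
```

Both the scan and the lookup depend on block numbers being a dense range starting at 0, which is the kind of assumption the quoted text says is spread across many subsystems.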

I think you're making a wrong fundamental assumption here. Just because
we define partitions to not be full relations doesn't mean we have to
treat them entirely separately. I don't see why a pg_class.relkind = 'p'
entry would be something actually problematic. That'd easily allow us to
treat them differently in all the relevant places (all of ALTER TABLE,
DML et al) and still allow all of the current planner/executor
infrastructure. We can even allow direct SELECTs from individual
partitions if we want to - that's trivial to achieve.
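
To make the relkind idea concrete, here is a rough sketch of what dispatching on the catalog entry's kind could look like. It is a toy model under stated assumptions, not a proposed implementation: `RelEntry`, its `parent` field, and the two helper functions are hypothetical, and only the 'r' and 'p' values come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional

RELKIND_RELATION = "r"   # ordinary table (an existing pg_class.relkind value)
RELKIND_PARTITION = "p"  # the partition kind floated in this mail; hypothetical


@dataclass(frozen=True)
class RelEntry:
    """Toy stand-in for a pg_class row; not the real catalog layout."""
    relname: str
    relkind: str
    parent: Optional[str] = None  # hypothetical link to the partitioned parent


def can_scan_directly(rel: RelEntry) -> bool:
    # Both kinds have heap storage, so the scan path can be shared.
    return rel.relkind in (RELKIND_RELATION, RELKIND_PARTITION)


def can_alter_schema(rel: RelEntry) -> bool:
    # Assumption: schema changes on a partition go through its parent,
    # so that all partitions are guaranteed to keep the same schema.
    return rel.relkind == RELKIND_RELATION


parent = RelEntry("measurements", RELKIND_RELATION)
part = RelEntry("measurements_2014_12", RELKIND_PARTITION, parent="measurements")
```

The point of the sketch is that the same entry can be accepted on the scan path and rejected on the ALTER TABLE path, based on nothing more than its kind.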

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


