Re: On partitioning - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: On partitioning
Date
Msg-id 5486320D.5020700@agliodbs.com
Whole thread Raw
In response to On partitioning  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: On partitioning  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 12/08/2014 02:12 PM, Jim Nasby wrote:
> On 12/8/14, 12:26 PM, Josh Berkus wrote:
>> 4. Creation Locking Problem
>> high probability of lock pile-ups whenever a new partition is created on
>> demand due to multiple backends trying to create the partition at the
>> same time.
>> Not Addressed?
> 
> Do users actually try and create new partitions during DML? That sounds
> doomed to failure in pretty much any system...

There is no question that it would be easier for users to create
partitions on demand automatically.  Particularly if you're partitioning
by something other than time.  For a particular case, consider users on
RDS, which has no cron jobs for creating new partitons; it's on demand
or manually.

It's quite possible that there is no good way to work out the locking
for on-demand partitions though, but *if* we're going to have a 2nd
partition system, I think it's important to at least discuss the
problems with on-demand creation.

>> 11. Hash Partitioning
>> Some users would prefer to partition into a fixed number of
>> hash-allocated partitions.
>> Not Addressed.
> 
> Though, you should be able to do that in either system if you bother to
> define your own hash in a BEFORE trigger...

That doesn't do you any good with the SELECT query, unless you change
your middleware to add a hash(column) to every query.  Which would be
really hard to do for joins.

>> A. COPY/ETL then attach
>> In inheritance partitioning, you can easily build a partition outside
>> the master and then "attach" it, allowing for minimal disturbance of
>> concurrent users.  Could be addressed in the future.
> 
> How much of the desire for this is because our current "row routing"
> solutions are very slow? I suspect that's the biggest reason, and
> hopefully Alvaro's proposal mostly eliminates it.

That doesn't always work, though.  In some cases the partition is being
built using some fairly complex logic (think of partitions which are
based on matviews) and there's no fast way to create the new data.
Again, this is an acceptable casualty of an improved design, but if it
will be so, we should consciously decide that.

>> B. Catchall Partition
>> Many partitioning schemes currently contain a "catchall" partition which
>> accepts rows outside of the range of the partitioning scheme, due to bad
>> input data.  Probably not handled on purpose; Alvaro is proposing that
>> we reject these instead, or create the partitions on demand, which is a
>> legitimate approach.
>>
>> C. Asymmetric Partitioning / NULLs in partition column
>> This is the classic Active/Inactive By Month setup for partitions.
>> Could be addressed via special handling for NULL/infinity in the
>> partitioned column.
> 
> If we allowed for a "catchall partition" and supported normal
> inheritance/triggers on that partition then users could continue to do
> whatever they needed with data that didn't fit the "normal" partitioning
> pattern.

That sounds to me like it would fall under the heading of "impossible
levels of backwards-compatibility".


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Proposal: Log inability to lock pages during vacuum
Next
From: Michael Paquier
Date:
Subject: Re: Proposal : REINDEX SCHEMA