Re: On partitioning - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: On partitioning
Date
Msg-id 54877884.9010608@BlueTreble.com
Whole thread Raw
In response to Re: On partitioning  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On 12/8/14, 5:19 PM, Josh Berkus wrote:
> On 12/08/2014 02:12 PM, Jim Nasby wrote:
>> On 12/8/14, 12:26 PM, Josh Berkus wrote:
>>> 4. Creation Locking Problem
>>> high probability of lock pile-ups whenever a new partition is created on
>>> demand due to multiple backends trying to create the partition at the
>>> same time.
>>> Not Addressed?
>>
>> Do users actually try and create new partitions during DML? That sounds
>> doomed to failure in pretty much any system...
>
> There is no question that it would be easier for users to create
> partitions on demand automatically.  Particularly if you're partitioning
> by something other than time.  For a particular case, consider users on
> RDS, which has no cron jobs for creating new partitons; it's on demand
> or manually.
>
> It's quite possible that there is no good way to work out the locking
> for on-demand partitions though, but *if* we're going to have a 2nd
> partition system, I think it's important to at least discuss the
> problems with on-demand creation.

Yeah, we should discuss it. Perhaps the right answer here may be our own job scheduler, something a lot of folks want
anyway.

>>> 11. Hash Partitioning
>>> Some users would prefer to partition into a fixed number of
>>> hash-allocated partitions.
>>> Not Addressed.
>>
>> Though, you should be able to do that in either system if you bother to
>> define your own hash in a BEFORE trigger...
>
> That doesn't do you any good with the SELECT query, unless you change
> your middleware to add a hash(column) to every query.  Which would be
> really hard to do for joins.
>
>>> A. COPY/ETL then attach
>>> In inheritance partitioning, you can easily build a partition outside
>>> the master and then "attach" it, allowing for minimal disturbance of
>>> concurrent users.  Could be addressed in the future.
>>
>> How much of the desire for this is because our current "row routing"
>> solutions are very slow? I suspect that's the biggest reason, and
>> hopefully Alvaro's proposal mostly eliminates it.
>
> That doesn't always work, though.  In some cases the partition is being
> built using some fairly complex logic (think of partitions which are
> based on matviews) and there's no fast way to create the new data.
> Again, this is an acceptable casualty of an improved design, but if it
> will be so, we should consciously decide that.

Is there an example you can give here? If the scheme is that complicated I'm failing to see how you're supposed to do
thingslike partition elimination.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: operator does not exist: character varying[] <> character[]
Next
From: Stephen Frost
Date:
Subject: Re: GSSAPI, SSPI - include_realm default