Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Date:
Msg-id: 1199889270.4266.267.camel@ebony.site
In response to: Re: Dynamic Partitioning using Segment Visibility Maps (Csaba Nagy <nagy@ecircle-ag.com>)
List: pgsql-hackers
On Mon, 2008-01-07 at 12:14 +0100, Csaba Nagy wrote:
> On Wed, 2008-01-02 at 17:56 +0000, Simon Riggs wrote:
> > Like it?
>
> Very cool :-)

Thanks. As ever, a distillation of various thoughts, not all mine.

> One additional thought: what about a kind of "segment fill factor" ?
> Meaning: each segment has some free space reserved for future
> updates/inserts of records in the same range of it's partitioning
> constraint. And when inserting/updating you put the new record into the
> corresponding segment... just like a very coarse clustering. Then you
> could vacuum the segments separately to keep the free space not running
> out. For active segments you would then fix the partitioning constraint
> range once the fill factor is reached, to allow for keeping it's
> constraint even when heavily updating (heavily vacuuming it too as
> response to that), and create a new segment for the unbounded range for
> new inserts... this would work fine for tables where the constraint is
> based on ever increasing keys and accidental inserts in old ranges
> (which do happen occasionally in real life).

Lots of thoughts there, so I'll try to separate them out and analyse. The way I originally described it is a very simple mechanism and we could tweak that some more. All ideas welcome.

If we had dynamic segment constraints while the segment was not yet read-only, that would lead to changes in the segment constraint on each INSERT when we have increasing keys. It seems better to set the constraints once only, when the segment is full, and then prevent further INSERTs. The accidental inserts into old ranges seem like something that can be avoided if it is a real-world problem.

For UPDATEs on a segment with constraints, we might choose to apply the constraints to see what to do. You might want to allow the UPDATE and have it stretch the constraints outwards, or you might want to prevent it and throw an error, or you might want to allow the UPDATE yet migrate the new tuple to the appropriate segment. Dynamic partitioning works in multiple dimensions, so there isn't just one single valid location for any row. For example, if we update a row with OrderId, OrderDate, RequiredByDate, ShipDate and LastModifiedDate on it, we'll probably expand the constraints on at least one of those columns. If we were lucky enough to have changed only one of them, it might turn out that *in that case* there was a single more sensible location for the new tuple, but that probably isn't the common case. So the likely best behaviour for UPDATEs is to try to keep the new row version in the same segment, then change the constraints.

The segment constraints concept and the read-only concept were linked. You're right that we could separate them, though that turns out not to be that desirable. When we do a DELETE or an UPDATE we don't know whether the deleted row version was the last tuple with that particular boundary value. So we don't know whether the DELETE or UPDATE changes the constraints or not, and however we try to avoid it, we'll probably need to recalculate the constraints in some circumstances. So updating the constraints dynamically isn't anywhere near as easy as it sounds and ultimately probably isn't worth the effort. That's why constraints and read-only go together. HOT allows us to keep the new row version within the segment, in many cases.
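To make that behaviour concrete, here is a minimal standalone C sketch (not actual PostgreSQL code; the Segment struct and all function names are invented for illustration) of the two ideas above: the per-column constraints are computed once, when the segment fills and is marked read-only, and an UPDATE that keeps the new row version in the same segment can only ever stretch those ranges outwards.

/*
 * Standalone sketch, not backend code: a segment modelled as a set of
 * per-column min/max bounds that are frozen when the segment fills,
 * plus the proposed UPDATE behaviour of keeping the new row version in
 * the same segment and widening any bound the new values fall outside.
 */
#include <stdbool.h>
#include <stdio.h>

#define NCOLS 3                 /* e.g. OrderDate, RequiredByDate, ShipDate */

typedef struct Segment
{
    bool    sealed;             /* constraints set, no further INSERTs */
    long    min[NCOLS];         /* lower bound per partitioning column */
    long    max[NCOLS];         /* upper bound per partitioning column */
} Segment;

/* Called once, when the segment becomes full: record the observed bounds
 * and freeze them, refusing plain INSERTs from now on. */
static void
seal_segment(Segment *seg, const long observed_min[], const long observed_max[])
{
    for (int c = 0; c < NCOLS; c++)
    {
        seg->min[c] = observed_min[c];
        seg->max[c] = observed_max[c];
    }
    seg->sealed = true;
}

static bool
insert_allowed(const Segment *seg)
{
    return !seg->sealed;
}

/* Proposed UPDATE behaviour: keep the new row version in this segment and
 * stretch any constraint the new values fall outside of. */
static void
update_in_segment(Segment *seg, const long newvals[])
{
    for (int c = 0; c < NCOLS; c++)
    {
        if (newvals[c] < seg->min[c])
            seg->min[c] = newvals[c];
        if (newvals[c] > seg->max[c])
            seg->max[c] = newvals[c];
    }
}

int
main(void)
{
    Segment seg = {0};
    long    lo[NCOLS] = {100, 110, 120};
    long    hi[NCOLS] = {200, 210, 220};

    seal_segment(&seg, lo, hi);
    printf("INSERT allowed after seal: %s\n", insert_allowed(&seg) ? "yes" : "no");

    /* An UPDATE whose new value for column 2 lies beyond the sealed range
     * simply widens that one bound rather than relocating the tuple. */
    long newvals[NCOLS] = {150, 160, 230};
    update_in_segment(&seg, newvals);
    printf("col 2 range is now [%ld, %ld]\n", seg.min[2], seg.max[2]);
    return 0;
}

Note that the bounds only ever widen here; shrinking them again after a DELETE is exactly the recalculation problem described above, which is why the constraints are frozen together with the read-only state.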
What might be worth doing is marking the FSM space for that segment as update-only, to exclude inserts from using it, and then forcing UPDATEs to stay within the segment where possible by providing the current block number in each call to the FSM. That would also mean that every UPDATE that wants to do a multi-block update on a currently read-only segment would need to call the FSM. Sounds good, but that would only work for the first UPDATE on a segment after it is marked read-only, which isn't much use, or we would do it for *every* block-spanning UPDATE, which would cause contention in other use cases.

So although I'm willing to listen and tweak, that hasn't yet resulted in any additional design points, unless I missed something above?

> When the change rate of old segments get down, the segments could be
> reorganized to have a smaller fill factor, so that you still allow for
> accidental updates but keep space usage efficient. This would be some
> similar action as a clustering, but hopefully not blocking (which might
> be a hard thing to do)... and later again you could mark some of the
> really old things as read only and put them in special segments with no
> wasted space.

Yes, hopefully marking segments read-only, and compressed segments, are a possible future.

> One problem would be when the segment's free space runs out, so you must
> put records from the same constraint range in multiple segments - but
> that could still work, you just would have multiple segments covering
> the same range, but if the "segment fill factor" is chosen properly it
> should not be the case... you could normally maintain a set of
> non-overlapping segments in terms of the partitioning constraint.

Thanks very much for your detailed thoughts on the proposal.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com