Re: possible vacuum improvement? - Mailing list pgsql-hackers

From Barry Lind
Subject Re: possible vacuum improvement?
Date
Msg-id 3D74FC44.9010909@xythos.com
Whole thread Raw
In response to possible vacuum improvement?  (Mario Weilguni <mweilguni@sime.com>)
List pgsql-hackers
Wouldn't it make sense to implement autovacuum information in a struture 
like the FSM, a Dirty Space Map (DSM)?  As blocks are dirtied by 
transactions they can be added to the DSM.  Then vacuum can give 
priority processing to those blocks only.  The reason I suggest this is 
that in many usage senerios it will be more efficient to only vacuum 
part of a table than the entire table.  Given a large table that grows 
over time, it tends to be the case that older data in the table becomes 
more static as it ages (a lot of financial data is like this, when it is 
initially created it may get a lot of updates done early in it's life 
and may even be deleted, but once the data gets older (for example a 
year old), it is unlikely to change).  This would imply that over time 
the first blocks in a table will change less and most activity will 
occur towards the end of the table.  If you have a multigig table, where 
most of the activity occurs near the end, a lot of cpu cycles can be 
wasted going over the mostly static begining of the table.

thanks,
--Barry

Tom Lane wrote:

>"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
>  
>
>>1)Is this sounds like a workable solution?
>>    
>>
>
>Adding a trigger to every tuple update won't do at all.  Storing the
>counts in a table won't do either, as the updates on that table will
>generate a huge amount of wasted space themselves (not to mention
>enough contention to destroy concurrent performance).
>
>  
>
>>4)Is use of threads sounds portable enough?
>>    
>>
>
>Threads are completely out of the question, at least if you have any
>hope of seeing this code get accepted into the core distro.
>
>
>For vacuum's purposes all that we really care to know about is the
>number of obsoleted tuples in each table: committed deletes and updates,
>and aborted inserts and updates all count.  Furthermore, we do not need
>or want a 100% reliable solution; approximate counts would be plenty
>good enough.
>
>What I had in the back of my mind was: each backend counts attempted
>insertions and deletions in its relcache entries (an update adds to both
>counts).  At transaction commit or abort, we know which of these two
>counts represents the number of dead tuples added to each relation, so
>while we scan the relcache for post-xact cleanup (which we will be doing
>anyway) we can transfer the correct count into the shared FSM entry for
>the relation.  This gives us a reasonably accurate count in shared
>memory of all the tuple obsoletions since bootup, at least for
>heavily-used tables.  (The FSM might choose to forget about lightly-used
>tables.)  The auto vacuumer could look at the FSM numbers to decide
>which tables are highest priority to vacuum.
>
>This scheme would lose the count info on a database restart, but that
>doesn't bother me.  In typical scenarios the same tables will soon get
>enough new counts to be highly ranked for vacuuming.  In any case the
>auto vacuumer must be designed so that it vacuums every table every so
>often anyhow, so the possibility of forgetting that there were some dead
>tuples in a given table isn't catastrophic.
>
>I do not think we need or want a control table for this; certainly I see
>no need for per-table manual control over this process.  There should
>probably be a few knobs in the form of GUC parameters so that the admin
>can control how much overall work the auto-vacuumer does.  For instance
>you'd probably like to turn it off when under peak interactive load.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>  
>




pgsql-hackers by date:

Previous
From: Rod Taylor
Date:
Subject: 7.2.2 bug?
Next
From: Joe Conway
Date:
Subject: Re: Memory management question