Re: Some ideas about Vacuum - Mailing list pgsql-hackers

From Gokulakannan Somasundaram
Subject Re: Some ideas about Vacuum
Date
Msg-id 9362e74e0801111310k6266093au9fc53f2b1256dda5@mail.gmail.com
Whole thread Raw
In response to Re: Some ideas about Vacuum  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Some ideas about Vacuum
List pgsql-hackers

Sorry Greg , I missed to read this part before.

On Jan 9, 2008 8:40 PM, Gregory Stark <stark@enterprisedb.com> wrote:

"Markus Schiltknecht" <markus@bluegap.ch> writes:

> Hi,
>
> Gokulakannan Somasundaram wrote:
>> If we can ask the Vacuum process to scan the WAL log, it can get all the
>> relevant details on where it needs to go.

That's an interesting thought. I think your caveats are right but with some
more work it might be possible to work it out. For example if a background
process processed the WAL and accumulated an array of possibly-dead tuples to
process in batch. It would wait whenever it sees an xid which isn't yet past
globalxmin, and keep accumulating until it has enough to make it worthwhile
doing a pass.

I think a bigger issue with this approach is that it ties all your tables
together. You can't process one table frequently while some other table has
some long-lived deleted tuples. 

I am not able to clearly understand what you are saying here. It ties all the tables yes.  There are two options here
a) Do we really need to do Vacuum table by table? Say we read 'n' WAL segments and accumulate the data. We should try to sort the result with Relation name, Block num and we can go ahead with the Vacuum. In this way, Vacuum will only work at the database level. Why do we need to process one table frequently?
b) We can create DSMs for each table separately and Vacuum will use the WAL information to update it. In this way, we can Vacuum table wise.
 


I'm also not sure it really buys us anything over having a second
dead-space-map data structure. The WAL is much larger and serves other
purposes which would limit what we can do with it.
Ok. One obvious advantage is that it saves the contention over DSM for the DML operations and Vacuum process. Since Vacuum process is going to have much more information on what has happened in the database, it is possible for some new structures. For example i have been thinking of changing our current index structure in such a way, it won't hold any duplicate tuples for different versions of data. Whenever there is a update, only the indexes relevant to the columns changed will get updated. The Vacuum has to play the role of changing the tid, the index tuple points to, whenever it vacuums a older version.
 It would be possible to create  such structures, which can be synched asynchronously. Another example would be Asynchronous Materialized views.

But pushing those future plans aside, don't you think this would reduce the contention,  which otherwise would be faced by the DML operations?

Thanks,
Gokul.


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Transaction Snapshot Cloning
Next
From: Gavin Sherry
Date:
Subject: Declarative partitioning grammar