Re: Some ideas about Vacuum - Mailing list pgsql-hackers

From Gokulakannan Somasundaram
Subject Re: Some ideas about Vacuum
Date
Msg-id 9362e74e0801160025s5415caeeq9599d6fbaa7563f8@mail.gmail.com
In response to Re: Some ideas about Vacuum  (Markus Schiltknecht <markus@bluegap.ch>)
Responses Re: Some ideas about Vacuum  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
List pgsql-hackers

Hi,

Please find my answers inline

> Do you have evidence of that contention being so much worse that it
> justifies the additional WAL reading from disk? (Assuming no WAL archiving.)
On a broader note, the DSM is essentially a bitmap index with some optimizations to make updates more efficient. As you may know, the bitmap index design does not scale well under concurrency. If you pack more information into a small space, I feel it will hurt concurrency. Let us discuss this in detail.
The DSM, I believe, aims to achieve the following objectives:
a) to find the blocks which need to be vacuumed
b) to find the blocks where freezing is required
c) to find the blocks which are visible to everyone

The DSM might get split into multiple maps, like visibility maps (already proposed by Heikki), vacuum maps and freezing maps. When inserts happen, the map has to be extended, and the map block has to be locked for that extension. Say one DSM block corresponds to some 60K data blocks; then any updates/deletes touching those blocks have to wait for that duration. This is just an off-hand example; the people implementing it may throw more light on the synchronization points.
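To make the granularity concern concrete, here is a toy sketch (all names and the layout are made up for illustration; this is not PostgreSQL's actual DSM code) of how a single map page of bits covers tens of thousands of heap blocks, which is why a lock held on that one page while it is extended or rewritten stalls updates to every block it covers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy sketch: one DSM "page" holds a needs-vacuum bit per heap block.
 * With an 8 kB map page and 1 bit per heap block, one map page covers
 * 8192 * 8 = 65536 heap blocks, so an exclusive lock on the map page
 * serializes updates to all of those blocks.
 * Hypothetical layout, not PostgreSQL's real structures. */

#define MAP_PAGE_BYTES 8192
#define BLOCKS_PER_MAP_PAGE (MAP_PAGE_BYTES * 8)

typedef struct {
    uint8_t bits[MAP_PAGE_BYTES];
} dsm_page;

/* Mark a heap block as needing vacuum. */
static void dsm_set(dsm_page *p, uint32_t heap_block) {
    p->bits[heap_block / 8] |= (uint8_t)(1u << (heap_block % 8));
}

/* Check whether a heap block is marked. */
static int dsm_test(const dsm_page *p, uint32_t heap_block) {
    return (p->bits[heap_block / 8] >> (heap_block % 8)) & 1;
}
```

The finer the locking unit within such a page (per byte, per cache line, or per bit via atomics), the less contention — which I take to be what "about any granularity we want" refers to below.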

> IMO we can get about any granularity we want for DSM update locking,
> depending on how we arrange the DSM bits.
I can't quite understand this; could you elaborate?


> Since Vacuum process is going to
> have much more information on what has happened in the database,

> Why should that be? IMO, collecting the information at transaction time
> can give you exactly the same information, if not more or better
> information.

My argument is: if we have already collected that information in the WAL, why should we collect it again?
 
> it is
> possible for some new structures. For example i have been thinking of
> changing our current index structure in such a way, it won't hold any
> duplicate tuples for different versions of data. Whenever there is a
> update, only the indexes relevant to the columns changed will get
> updated. The Vacuum has to play the role of changing the tid, the index
> tuple points to, whenever it vacuums a older version.

> Huh? The index would then point to the old tuple only, until a VACUUM
> comes by, right. How are following transactions expected to find the new
> tuple before that VACUUMing?
You are right. We have already discussed this. In the vacuum approach, we travel forward in time: we catch the oldest version and reach the newer one by following the ctid in the old tuple. In the undo-log approach, it is the reverse: we go to the latest version and travel back in time. It's interesting to see how the theory of relativity has got applied in database science, right?

So say we have 'n' versions of the same data in the index. Right now we have 'n' index tuples pointing to 'n' blocks in the heap; we read all 'n' index tuples and reach all the versions of the data in the table. If this changes, there will be one index tuple, pointing to the oldest heap tuple, and from there we navigate to all the newer tuples. The advantage is obvious: the index will be smaller, and an update will not touch an index unless the data in its columns has changed.
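The proposed forward traversal can be sketched roughly as follows (a toy model under my own assumptions: the struct names and visibility rule are simplified stand-ins, not PostgreSQL's real HeapTupleHeader or snapshot logic):

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch: the index keeps ONE entry pointing at the oldest heap
 * tuple; a reader walks forward through newer versions by following
 * the on-tuple "next version" link (the role t_ctid plays in heap
 * update chains).  Illustrative only. */

typedef struct heap_tuple {
    unsigned xmin;            /* inserting transaction id */
    unsigned xmax;            /* deleting/updating xid, 0 if none */
    struct heap_tuple *next;  /* newer version; NULL at chain end */
} heap_tuple;

/* Simplified visibility: a snapshot sees effects of all xids below
 * snap_xid.  Walk from the oldest version toward the newest and return
 * the first version that is inserted but not yet deleted. */
static heap_tuple *fetch_visible(heap_tuple *oldest, unsigned snap_xid) {
    for (heap_tuple *t = oldest; t != NULL; t = t->next) {
        int inserted = t->xmin < snap_xid;
        int deleted  = t->xmax != 0 && t->xmax < snap_xid;
        if (inserted && !deleted)
            return t;
    }
    return NULL;
}
```

The cost shifts from the writer (no extra index entries per update) to the reader, who may chase a chain of dead versions until VACUUM re-points the index entry at a newer tuple.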

Hope I was clear. Please write back in case I am not.

Thanks,
Gokul.
