Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From PFC
Subject Re: vacuum, performance, and MVCC
Msg-id op.tbnthavhcigqcu@apollo13
In response to Re: vacuum, performance, and MVCC  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> What I see in this discussion is a huge amount of "the grass must be
> greener on the other side" syndrome, and hardly any recognition that
> every technique has its downsides and complications.
Sure ;)
MVCC generates dead rows by its very nature; however, I see two trends in this:
1* A large transaction that updates/deletes many rows. For instance, suppose you UPDATE an entire table whose size is larger than memory.
Old row versions have to be kept somewhere until commit, be it in the table itself or in some accessory undo log. So there will be a lot of hard-disk grinding anyway, be it MVCC, Oracle-style, or whatever. MVCC will bloat the table and indexes, then VACUUM will shrink them; update-in-place systems will bloat an undo log instead.
It seems to me the current MVCC+VACUUM is the right tool for this job, requiring about the same amount of I/O as the others. VACUUM scans sequentially, so it's the best way to process large volumes of data.
2* Many short transactions update rows in a table, like the sessions problem, for instance.
Current VACUUM sucks for this case; I guess that's known.
-------
So, we have two different problems, and one tool which is adapted to one problem only. Should the tool (VACUUM) be fixed to handle both problems, making it more complex and difficult to maintain, or should another tool be created specifically for the second problem?
Problem 2 is very different from problem 1. The only case when they meet is when there is a continuous stream of small updates running concurrently with a long transaction.
So, what is the ideal tool for case 2?
We'd want a vacuuming machine that can be run very often, or even better, continuously.
The table can be large, larger than the disk cache, so scanning it is not an option. The updates are probably randomly distributed across the table. Therefore, VACUUMing a long time after these transactions are committed, when the pages are no longer in cache, would require a lot of random seeks, which is also bad. Besides, we want this vacuum to be continuous and transparent.
The best time to vacuum pages is thus when they are still in the background writer's memory, or in the disk cache, waiting to be flushed to disk. There, they can be re-read, vacuumed and re-written with no seek penalty, only additional WAL traffic. However, the amount of WAL traffic in bytes/s matters less than the frequency of WAL syncs. Emitting more WAL data shouldn't be a problem if those sync writes are coalesced with the sync writes of current transactions.
So, I guess the best solution for case 2 is to have the background writer perform on-the-fly VACUUM:
An UPDATE or DELETE transaction hands over its dirty pages to the bgwriter to be written. It also tells the bgwriter the ID of the current transaction, and flags specifying whether the pages contain candidate dead rows. The bgwriter should have a sufficiently large buffer, in order to be able to keep these pages in memory until all the transactions that can see the dead rows in these pages have finished. Then, the pages are vacuumed and written.
The key is the size of the buffer. It should be large enough to hold enough pages that it is actually possible to vacuum something out of them before writing them. However, if the buffer capacity is exceeded (for instance, because there is a long-running transaction), this is not a problem: the pages are simply written to disk normally; they will still contain dead rows, which will need to be handled later by the standard VACUUM.
I think this would maximize code reuse by using the current bgwriter's architecture... did I miss something?
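
To make the buffering idea a bit more concrete, here is a tiny, self-contained C sketch of the retention logic described above. Everything in it (the names, the structures, the naive xid comparison, the slot count) is made up purely for illustration; it is not the real bgwriter or heap code, just the "hold a page until its dead rows are invisible to everyone, otherwise fall back to a normal write" behaviour:

/*
 * Toy sketch of the proposed "on-the-fly vacuum in the bgwriter".
 * All names, structures and the plain xid comparison are hypothetical;
 * this is not PostgreSQL code, only an illustration of the buffering idea.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define BGW_HOLD_SLOTS 4            /* how many dirty pages we can hold back */

typedef uint32_t Xid;

typedef struct
{
    int  page_no;
    Xid  deleter_xid;               /* xid that created the dead row versions */
    bool in_use;
} HeldPage;

static HeldPage held[BGW_HOLD_SLOTS];

/* pretend value; in reality this would come from shared state */
static Xid oldest_running_xid = 100;

static void write_page(int page_no, bool vacuumed)
{
    printf("writing page %d (%s)\n", page_no,
           vacuumed ? "vacuumed on the fly" : "written as-is");
}

/* Vacuum and write any held page whose dead rows are no longer visible
 * to any running transaction (ignoring xid wraparound for simplicity). */
static void sweep_held_pages(void)
{
    for (int i = 0; i < BGW_HOLD_SLOTS; i++)
        if (held[i].in_use && held[i].deleter_xid < oldest_running_xid)
        {
            write_page(held[i].page_no, true);
            held[i].in_use = false;
        }
}

/* A backend hands a dirty page over to the bgwriter. */
static void hand_over_page(int page_no, bool has_dead_rows, Xid deleter_xid)
{
    if (!has_dead_rows)
    {
        write_page(page_no, false);     /* nothing to reclaim, write normally */
        return;
    }

    sweep_held_pages();                 /* free slots whose waiters are gone */

    for (int i = 0; i < BGW_HOLD_SLOTS; i++)
        if (!held[i].in_use)
        {
            held[i] = (HeldPage){ page_no, deleter_xid, true };
            return;                     /* keep the page in memory for now */
        }

    /* Buffer full (e.g. a long-running transaction keeps dead rows visible):
     * write the page as-is; the standard VACUUM will deal with it later. */
    write_page(page_no, false);
}

int main(void)
{
    hand_over_page(1, true, 90);    /* dead rows invisible to all: reclaimed */
    hand_over_page(2, true, 150);   /* still visible to someone: held back */
    hand_over_page(3, false, 0);    /* clean page, written immediately */

    oldest_running_xid = 200;       /* the blocking transactions finished */
    sweep_held_pages();             /* page 2 is now vacuumed and written */
    return 0;
}

The interesting knob is the hold-buffer size: if it is realistic, most dead rows produced by short transactions die while their pages are still in memory, and the fallback path means a long-running transaction only degrades us to today's behaviour instead of blocking writes.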