Some ideas about Vacuum - Mailing list pgsql-hackers
From | Gokulakannan Somasundaram |
---|---|
Subject | Some ideas about Vacuum |
Date | |
Msg-id | 9362e74e0801090239y4fde9d93tac044907df842c39@mail.gmail.com Whole thread Raw |
Responses |
Re: Some ideas about Vacuum
|
List | pgsql-hackers |
Hi,<br /> May be i am reposting something which has been discussed to end in this forum. I have made a search in the archivesand i couldn't find any immediately. <br /> With my relatively small experience in Performance Testing and Tuning,one of the rules of thumb for getting Performance is "Don't do it, if you don't need to do it". When we look at clearingthe older versions of tuples from our tables in PostgreSQL, we can't stop thinking about how it is done in otherdatabases. When we compare the Oracle Undo Log approach with the Postgresql Vacuum approach, the pattern looks verysimilar to C++ memory de-allocation and Java garbage collection. <br /> So, as you may all know, the thing whichworries us about Vacuum is that it is going to places where it need not goto. That's when we are thinking about Deadspace Map. This dead space map is a map, if implemented correctly, would guide Vacuum to go and only look at places wherethere was some activity of Delete/Update/Insert after the last Vacuum. This is accomplished at the cost of some verysmall overhead to Inserts/Deletes/Updates. <br /> Dead space Map is like an undo-log, if we think its role is toget rid of the older versions of data. Instead of moving the tuples to separate location, it guides the Vacuum processto do the cleanup task. May be we can even think of something like Dead space log, which may not be a bitmap. In thislog, transactions might enter their transaction ids and ctids, which can be scanned by the Vacuum process. While thismight take more space, it is with lesser contention, while compared to Dead space Map. To me, as far as i can think of,the only advantage of Dead space Map over Dead space log is the disk space. <br /> It just strikes me that WAL logis already doing just that. I think you can follow my thought-line. If we can ask the Vacuum process to scan the WAL log,it can get all the relevant details on where it needs to go. One optimization, that can be placed here is to somehowmake the archiver do a double-job of helping the Vacuum, while doing the archiving. For people, who have switchedoff archiving, this might not be a benefit. <br /> One main restriction it places on the WAL Logs is that theWAL Log needs to be archived only after all the transactions in it completes. In other words, WAL logs need to be givenenough space, to survive the longest transaction of the database. It is possible to avoid this situation by asking theVacuum process to take the necessary information out of WAL log and store it somewhere and wait for the long running transactionto complete. <br /> The information of interest in WAL is only the table inserts/updates/deletes. So if everyoneaccepts that this is a good idea, till this point, there is a point in reading further.<br /> Ultimately, whathas been achieved till now is that we have made the sequential scans made by the Vacuum process on each table into afew random i/os. Of course there are optimizations possible to group the random i/os and find some sequential i/o out ofit. But still we need to do a full index scan for all those indexes out there. HOT might have saved some work over there.But i am pessimistic here and wondering how it could have been improved. So it just strikes me, we can do the samething which we did just with the tables. Convert a seq scan of the entire table into a random scan of few blocks. Wecan read the necessary tuple information from the tuples, group them and hit at the index in just those blocks and cleanit up. <br /> I can already hear people, saying that it is not always possible to go back to index from table. Thereis this culprit called unstable function based indexes. The structure stops us from going back to index from table.So currently we should restrict the above said approach to only normal indexes(not the function based ones). I hopeit would still give a good benefit. <br /> Of course Vacuum can convert the few random scans into a seq scan, if requiredby referring to table statistics.<br /><br /> Thoughts about the idea????<br /><br />Thanks,<br />Gokul.<br /><br/>P.S.: Let the objections/opposing views have a subtle reduction in its harshness. <br />
pgsql-hackers by date: