Thread: Multi-Versions and Vacuum
Hello All,

Does anyone really know how this multi-version concurrency really works?
The basic idea is simple: never update anything in place, so that you can
always get read-consistent queries by reading behind the updates. But the
details are missing in the docs. In particular:

1. How long do previous versions hang around? As long as the oldest
transaction, or until a VACUUM?

2. How expensive are they? Do the multi-version copies clutter the main
database blocks, or are they stored elsewhere? And if I update one field of
a row, is the entire row copied, or just the field?

3. What does VACUUM really do? Will it interfere with open transactions, or
will it know to leave multi-version copies for existing transactions?

PS. I have done a few experiments on how the locking really works; see
http://www.SimpleORM.org/DBNotes.html. Note the difference with Oracle,
which I think is better, and easy to implement given the basic MV approach.

PPS. Design issues:

1. Why not just use the transaction log to implement MV? Each record only
needs to store a pointer into the logs to the previous update. And the logs
would log that pointer, forming a backward-pointing linked list. Most of
the time this will not be needed, or it will refer to a very recently
written tail of the log that will already be cached in memory. This
approach would put an end to Oracle's issues with running out of "Rollback
Segments". It would also put an end to Postgres's performance problems on
inserts and updates -- why be slower than MySQL? You need a transaction log
anyway, so no extra overhead should be added for MV unless a query actually
needs to read behind updated data.

2. If you are going to go to the trouble of keeping multiple versions for
locking, why not make them available to the application as well? One of the
painful features to implement is an audit trail of how records got to be
the way they are. The database could do that automatically.
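For reference, the read-consistency behavior being asked about can be seen
from two concurrent psql sessions. This is a minimal sketch, assuming a
hypothetical "accounts" table and the default READ COMMITTED isolation
level:

    -- Session 1:
    BEGIN;
    UPDATE accounts SET balance = 0 WHERE id = 1;  -- writes a new row version

    -- Session 2, concurrently:
    SELECT balance FROM accounts WHERE id = 1;     -- still sees the old,
                                                   -- committed version

    -- Session 1:
    COMMIT;

    -- Session 2:
    SELECT balance FROM accounts WHERE id = 1;     -- now sees the new version

Session 2 never blocks and never sees uncommitted data: it simply reads the
older tuple version until the newer one commits.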
On Thu, Jul 18, 2002 at 06:43:03PM -0700, Anthony Berglas wrote:
> Does anyone really know how this multi-version concurrency really works?

Yes, lots of people -- namely, those that read the documentation.

> 1. How long do previous versions hang around? As long as the oldest
> transaction, or until a VACUUM?

I may as well quote the docs:

    In normal PostgreSQL operation, an UPDATE or DELETE of a row does not
    immediately remove the old tuple (version of the row). This approach is
    necessary to gain the benefits of multiversion concurrency control (see
    the User's Guide): the tuple must not be deleted while it is still
    potentially visible to other transactions. But eventually, an outdated
    or deleted tuple is no longer of interest to any transaction. The space
    it occupies must be reclaimed for reuse by new tuples, to avoid
    infinite growth of disk space requirements. This is done by running
    VACUUM.

(from http://developer.postgresql.org/docs/postgres/routine-vacuuming.html)

> 2. How expensive are they? Do the multi-version copies clutter the main
> database blocks, or are they stored elsewhere? And if I update one field
> of a row, is the entire row copied, or just the field?

Sequential scans will need to read through old tuples. In 7.2 and earlier
versions, dead tuples remain in indexes; to determine whether a tuple found
from an index scan is visible, the backend needs to check the heap tuple
(since the index entry doesn't store transaction visibility info). In 7.3,
Tom Lane wrote the code to allow index scans to ignore dead tuples
dynamically (without the need for a VACUUM) -- the dead index tuple still
exists, but the index scan ignores it. Search for "tuple killing" in the
pgsql-hackers archives for more information.

> 3. What does VACUUM really do? Will it interfere with open transactions,
> or will it know to leave multi-version copies for existing transactions?

This is from the same page I quoted above:

    Beginning in PostgreSQL 7.2, the standard form of VACUUM can run in
    parallel with normal database operations (selects, inserts, updates,
    deletes, but not changes to table schemas).

> PS. I have done a few experiments on how the locking really works; see
> http://www.SimpleORM.org/DBNotes.html.

On that page, you note "We do not see any of the changes made by Session
One, including the phantoms. This is in contradiction to the PostgreSQL
documentation, but good." Where is the contradiction between the behavior
you observed and the documentation? For example:

    Read Committed is the default isolation level in PostgreSQL. When a
    transaction runs on this isolation level, a SELECT query sees only data
    committed before the query began; it never sees either uncommitted data
    or changes committed during query execution by concurrent transactions.
    (However, the SELECT does see the effects of previous updates executed
    within its own transaction, even though they are not yet committed.) In
    effect, a SELECT query sees a snapshot of the database as of the
    instant that that query begins to run. Notice that two successive
    SELECTs can see different data, even though they are within a single
    transaction, if other transactions commit changes during execution of
    the first SELECT.

(from http://developer.postgresql.org/docs/postgres/xact-read-committed.html)

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
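That last doc passage is easy to reproduce. A minimal sketch (the
"accounts" table is again hypothetical):

    -- Session 2:
    BEGIN;  -- READ COMMITTED, the default
    SELECT balance FROM accounts WHERE id = 1;  -- returns, say, 100

    -- Session 1, concurrently:
    UPDATE accounts SET balance = 200 WHERE id = 1;
    COMMIT;

    -- Session 2, still inside the same transaction:
    SELECT balance FROM accounts WHERE id = 1;  -- now returns 200
    COMMIT;

Under SERIALIZABLE, the second SELECT would keep returning 100, since the
whole transaction runs against a single snapshot.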
On Thu, Jul 18, 2002 at 06:43:03PM -0700, Anthony Berglas wrote:
> Hello All,
>
> Does anyone really know how this multi-version concurrency really works?

Somewhat.

> The basic idea is simple: never update anything in place, so that you can
> always get read-consistent queries by reading behind the updates. But the
> details are missing in the docs. In particular:
>
> 1. How long do previous versions hang around? As long as the oldest
> transaction, or until a VACUUM?

Tuples are valid for as long as a transaction can see them. Vacuum cleans
out any tuples that are no longer valid.

> 2. How expensive are they? Do the multi-version copies clutter the main
> database blocks, or are they stored elsewhere? And if I update one field
> of a row, is the entire row copied, or just the field?

To store them elsewhere would imply copying them to a new place when you
update them, which could cause performance issues. The entire row is
copied, and usually appended to the end of the table.

> 3. What does VACUUM really do? Will it interfere with open transactions,
> or will it know to leave multi-version copies for existing transactions?

Old vacuum used to interfere. There is the new non-blocking vacuum, which
can clean up a lot of stuff but not everything. VACUUM FULL is required to
free up disk space.

> PS. I have done a few experiments on how the locking really works; see
> http://www.SimpleORM.org/DBNotes.html. Note the difference with Oracle,
> which I think is better, and easy to implement given the basic MV
> approach.

If you want serializable, then you should ask for it (SET TRANSACTION
ISOLATION LEVEL SERIALIZABLE, or something like that). The default is not
serializable (as you saw).

> 1. Why not just use the transaction log to implement MV? Each record only
> needs to store a pointer into the logs to the previous update. And the
> logs would log that pointer, forming a backward-pointing linked list.
> Most of the time this will not be needed, or it will refer to a very
> recently written tail of the log that will already be cached in memory.
> This approach would put an end to Oracle's issues with running out of
> "Rollback Segments". It would also put an end to Postgres's performance
> problems on inserts and updates -- why be slower than MySQL? You need a
> transaction log anyway, so no extra overhead should be added for MV
> unless a query actually needs to read behind updated data.

This bit I'm less sure about. Each tuple does contain a pointer to the
previous incarnation, but I'm not exactly sure what it's used for. The
problem is that only some tuples are usually updated; most are left alone,
and they are never moved (except maybe by vacuum).

As for insert speed, I can always saturate the disk subsystem doing
straight inserts, so making it faster doesn't seem too useful. The problem
with updates is that you have to read the old tuple first, which ruins the
disk performance. Though WAL helps a lot here, IIRC.

> 2. If you are going to go to the trouble of keeping multiple versions for
> locking, why not make them available to the application as well? One of
> the painful features to implement is an audit trail of how records got to
> be the way they are. The database could do that automatically.

Look in the docs for Time Travel. It used to be supported, but was tossed a
while ago since it could be more efficiently implemented using triggers
(a minimal sketch follows below), and it had a tendency to use a lot of
disk space.
Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
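For what it's worth, here is a minimal sketch of the trigger-based audit
trail Martijn mentions; all table, column, and function names here are
illustrative, not an existing feature:

    -- Main table, plus a history table that receives superseded versions.
    CREATE TABLE accounts (
        id      integer PRIMARY KEY,
        balance numeric
    );

    CREATE TABLE accounts_history (
        id         integer,
        balance    numeric,
        operation  text,
        changed_at timestamp DEFAULT now()
    );

    -- Copy the old row version into the history table on every change.
    CREATE FUNCTION accounts_audit() RETURNS trigger AS '
    BEGIN
        INSERT INTO accounts_history (id, balance, operation)
            VALUES (OLD.id, OLD.balance, TG_OP);
        IF TG_OP = ''DELETE'' THEN
            RETURN OLD;   -- returning NULL here would cancel the DELETE
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER accounts_audit_trg
        BEFORE UPDATE OR DELETE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE accounts_audit();

Unlike Time Travel, this keeps history only for the tables you ask for,
which is where the disk-space saving comes from.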
Martijn van Oosterhout <kleptog@svana.org> writes:
>> 3. What does VACUUM really do? Will it interfere with open transactions,
>> or will it know to leave multi-version copies for existing transactions?

> Old vacuum used to interfere. There is the new non-blocking vacuum, which
> can clean up a lot of stuff but not everything. VACUUM FULL is required
> to free up disk space.

VACUUM has never removed any tuples that could be seen by open
transactions.

> This bit I'm less sure about. Each tuple does contain a pointer to the
> previous incarnation, but I'm not exactly sure what it's used for.

Actually, the links run from tuple versions to newer tuple versions, not
older ones. The purpose is to let UPDATE in READ COMMITTED mode easily find
the latest version of a tuple --- you just chain through the forward links
till you get to the end of the chain. There isn't any common operation that
would want to chain backwards.

> Look in the docs for Time Travel. It used to be supported, but was tossed
> a while ago since it could be more efficiently implemented using
> triggers, and it had a tendency to use a lot of disk space.

There's been some discussion of re-allowing it as an optional feature...
but I'm not clear on how that would work.

			regards, tom lane
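The effect of those forward links is visible in READ COMMITTED mode when
two sessions update the same row. A sketch, again using a hypothetical
"accounts" table:

    -- Session 1:
    BEGIN;
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;

    -- Session 2, concurrently:
    UPDATE accounts SET balance = balance + 5 WHERE id = 1;
    -- blocks: the target row has an uncommitted newer version

    -- Session 1:
    COMMIT;

    -- Session 2's UPDATE now resumes: it follows the forward link from the
    -- version its snapshot saw to the newly committed version and applies
    -- its change there, so both increments survive (net +15).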