Thread: MVCC performance issue
This is my first post to this mailing list and I would like to raise an
issue that in my opinion is causing performance problems for PostgreSQL,
especially in a transaction processing environment. In my company we have
been using PostgreSQL for the last 8 years for our in-house developed
billing system (telecom). The last few months we started considering
moving to another RDBMS just because of this issue.

After all these years, I believe that the biggest improvement that could
be made, and that would boost overall performance especially for
enterprise applications, would be to improve the Multiversion Concurrency
Control (MVCC) mechanism. In theory this seems to improve performance for
SELECT queries, but on tables with very intensive and frequent updates
even that is not fully true, because of the fragmentation of data caused
by MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table
(used as a buffer) took more than 40min to return a result! VACUUM is not
a solution in my opinion, even though after the introduction of the
autovacuum daemon the situation got much better.

PROBLEM DESCRIPTION
-------------------
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
new copy of the row in a new location. Any SELECT queries within the same
session are accessing the new version of the row, while all queries from
other users are still accessing the old version. When the transaction is
COMMITTED, PostgreSQL marks the new version of the row as the "active"
row and expires the old row, which remains "dead"; it is then up to the
VACUUM procedure to recover the space of the "dead" rows and make it
available to the database engine again. If the transaction is ROLLED
BACK, the space reserved for the new version of the row is released.

The result is to have huge fragmentation of the table space, unnecessary
updates in all affected indexes, unnecessary costly I/O operations, poor
performance on SELECTs that retrieve big record sets (i.e. reports etc),
and slower updates. As an example, consider updating the "live" balance
of a customer for each phone call, where the entire customer record has
to be duplicated again and again on each call just for modifying a
numeric value!

SUGGESTION
----------
1) When a row UPDATE is performed, store all "new row versions" either in
   a separate temporary table space or in a reserved space at the end of
   each table (which can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again be accessing the
   new version of the row.
3) Any SELECT queries from other users will still be accessing the old
   version.
4) When an UPDATE transaction is ROLLED BACK, just release the space used
   in the new temporary location.
5) When an UPDATE transaction is COMMITTED, try to LOCK the old version
   and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) If the transaction was COMMITTED, the temporary location can either be
   released or archived/cleaned on a pre-scheduled basis. This could also
   allow the introduction of a TRANSACTION LOG backup mechanism as a next
   step.
8) After that, VACUUM would only have to deal with deletions!!!

I understand that my suggestion may seem too simplified and also that
there are many implementation details and difficulties that I am not
aware of. I strongly believe that the outcome of the discussion regarding
this issue will be helpful.

Best Regards,

Kyriacos Kyriacou
Senior Developer/DBA
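PS: For illustration, a query along these lines (the table name is made
up) shows the kind of thing I mean - the table reports almost no live
rows, yet the heap file behind it is still huge:

  -- an "empty" buffer table can still drag a large heap file behind it
  -- if the dead row versions left by UPDATEs/DELETEs pile up
  SELECT relname,
         n_live_tup,
         n_dead_tup,
         pg_size_pretty(pg_relation_size(relid)) AS heap_size
    FROM pg_stat_user_tables
   WHERE relname = 'call_buffer';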
On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote:

> The result is to have huge fragmentation of the table space, unnecessary
> updates in all affected indexes, unnecessary costly I/O operations, poor
> performance on SELECTs that retrieve big record sets (i.e. reports etc),
> and slower updates.

Yep. It's all about trade-offs. For some workloads the in-table MVCC
storage setup works pretty darn poorly, but for most it seems to work
quite well.

There are various other methods of implementing relational storage with
ACID properties. You can exclude all other transactions while making a
change to a table, ensuring that nobody else can see "old" or "new" rows
so there's no need to keep them around. You can use an out-of-line redo
log (a-la Oracle). Many other methods exist, too.

They all have advantages and disadvantages for different workloads. It's
far from trivial to mix multiple schemes within a single database, so
mixing and matching schemes for different parts of your DB isn't
generally practical.

> 1) When a row UPDATE is performed, store all "new row versions" either
> in a separate temporary table space or in a reserved space at the end
> of each table (which can be allocated dynamically), etc.

OK, so you want a redo log a-la Oracle?

> 2) Any SELECT queries within the same session will again be accessing
> the new version of the row.
> 3) Any SELECT queries from other users will still be accessing the old
> version.

... and incurring horrible random I/O penalties if the redo log doesn't
fit in RAM. Again, a-la Oracle.

Even read-only transactions have to hit the undo log if there's an
update in progress, because rows they need may have been moved out to
the undo log as they're updated in the main table storage.

[snip description]

> I understand that my suggestion may seem too simplified and also that
> there are many implementation details and difficulties that I am not
> aware of.

It sounds like you're describing Oracle-style MVCC, using redo logs.

http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
http://en.wikipedia.org/wiki/Multiversion_concurrency_control

Oracle's MVCC approach has its own costs. Like Pg's, those costs
increase with update/delete frequency. Instead of table bloat, Oracle
suffers from redo log growth (or redo log size management issues).
Instead of increased table scan costs from dead rows, Oracle suffers
from random I/O costs as it looks up the out-of-line redo log for old
rows. Instead of long-running writer transactions causing table bloat,
Oracle can have problems with long-running reader transactions aborting
when the redo log runs out of space.

Personally, I don't know enough to know which is "better". I suspect
they're just different, with different trade-offs. If redo logs allow
you to do without write-ahead logging, that'd be interesting - but then,
the WAL is useful for all sorts of replication options, and the use of
linear WALs means that write ordering in the tables doesn't need to be
as strict, which has performance advantages.

--
Craig Ringer
On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> [snip]

Reading what you are describing, don't you think PG 9 goes a long way
toward helping you out?
Craig Ringer wrote:

> It sounds like you're describing Oracle-style MVCC, using redo logs.
>
> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/

Craig, this is an interesting blog page, making some valid points about
multiversioning vs. locking. The ATM example, however, is unrealistic
and couldn't have happened the way the author describes. Oracle has the
same write consistency mechanism as Postgres and it restarts the
transaction if the transaction's blocks were updated while the
transaction was waiting. In other words, the wife's transaction would
have been restarted before committing, the transaction would get the
balance accurately, and there wouldn't be a loss of $250.

Such an example is naive, sheer FUD. If that were the case, no bank in
the whole wide world would be using Oracle, and many of them do; I dare
say many more are using Oracle than Sybase. That means they're not
losing money if two spouses decide to withdraw money from the joint
account simultaneously. Given the number of people in the world, I
imagine that to be a rather common and ordinary situation for the banks.
The example is plain silly.

Here is what I have in mind as "write consistency":
http://www.postgresql.org/docs/9.0/static/transaction-iso.html#XACT-READ-COMMITTED:

"If the first updater rolls back, then its effects are negated and the
second updater can proceed with updating the originally found row. If
the first updater commits, the second updater will ignore the row if the
first updater deleted it, otherwise it will attempt to apply its
operation to the updated version of the row. The search condition of the
command (the WHERE clause) is re-evaluated to see if the updated version
of the row still matches the search condition."

Essentially the same behavior is described here, for Oracle:
http://tkyte.blogspot.com/2005/08/something-different-part-i-of-iii.html

"Obviously, we cannot modify an old version of a block—when we go to
modify a row, we must modify the current version of that block.
Additionally, Oracle cannot just simply skip this row, as that would be
an inconsistent read and unpredictable. What we’ll discover is that in
such cases, Oracle will restart the write modification from scratch."

Postgres re-evaluates the WHERE condition, Oracle restarts the entire
write operation, but neither MVCC mechanism would allow the silly ATM
example described in the blog. Both databases would have noticed the
change in the balance, and both would have ended up with the proper
balance in the account.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
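P.S. To make the read-committed behavior concrete, here is a rough
two-session sketch (the table and column names are made up, not taken
from either article):

  -- session A
  BEGIN;
  UPDATE accounts SET balance = balance - 250
   WHERE acct_id = 1 AND balance >= 250;
  -- the row is now locked by session A

  -- session B, at the same time
  BEGIN;
  UPDATE accounts SET balance = balance - 250
   WHERE acct_id = 1 AND balance >= 250;
  -- blocks, waiting for session A to commit or roll back

  -- session A
  COMMIT;

  -- session B wakes up and re-evaluates the WHERE clause against the
  -- updated row: if the new balance is still >= 250 the withdrawal is
  -- applied, otherwise the row no longer qualifies and is skipped.
  -- Either way, no $250 silently disappears.
  COMMIT;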
On 11/14/2010 02:38 AM, Mladen Gogala wrote:

> Craig Ringer wrote:
>> It sounds like you're describing Oracle-style MVCC, using redo logs.
>>
>> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
>>
> Craig, this is an interesting blog page, making some valid points about
> multiversioning vs. locking. The ATM example, however, is unrealistic
> and couldn't have happened the way the author describes.

Yep, you're quite right. I was using it for its explanation of some of
the costs of MVCC as Oracle implements it, because it's surprisingly
hard to find explanations/analysis of that with some quick Google
searching. I hadn't read beyond that part.

I'd be really interested in some *good* writeups of the costs/benefits
of the various common MVCC and locking based RDBMS implementations.

Thanks for posting a breakdown of the issues with that article, lest
others be misled. Appreciated.

--
Craig Ringer
On Sat, Nov 13, 2010 at 07:53, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Oracle's MVCC approach has its own costs. Like Pg's, those costs
> increase with update/delete frequency. Instead of table bloat, Oracle
> suffers from redo log growth (or redo log size management issues).
> Instead of increased table scan costs from dead rows, Oracle suffers
> from random I/O costs as it looks up the out-of-line redo log for old
> rows. Instead of long-running writer transactions causing table bloat,
> Oracle can have problems with long-running reader transactions aborting
> when the redo log runs out of space.

Another advantage of Oracle's approach seems to be that it needs much
less tuple-level overhead. IMO the 23-byte tuple overhead is a much
bigger drawback in Postgres than table fragmentation.

Regards,
Marti
On Thu, Nov 11, 2010 at 20:25, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> By definition of MVCC, when an UPDATE is performed, PostgreSQL creates
> a new copy of the row in a new location.
> The result is to have huge fragmentation of the table space,
> unnecessary updates in all affected indexes, unnecessary costly I/O
> operations, poor performance on SELECTs that retrieve big record sets
> (i.e. reports etc), and slower updates.

Have you tried reducing the table fillfactor and seeing if the HOT
update ratio increases?

PostgreSQL 8.3 introduced HOT updates as kind of a middle ground: if the
update doesn't affect indexed columns and there's enough free space in
the page being updated, then the new version is written into the same
page and the indexes don't need to be touched at all.

Regards,
Marti
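PS: Something along these lines (the table name is hypothetical):

  -- leave ~30% of every page free so updated rows can stay on the same
  -- page and qualify as HOT updates
  ALTER TABLE customer_balance SET (fillfactor = 70);
  -- note: existing pages are only repacked when the table is rewritten,
  -- e.g. by VACUUM FULL or CLUSTER

  -- then keep an eye on the share of updates that were HOT
  SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
   WHERE relname = 'customer_balance';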
Marti Raudsepp wrote:

> Another advantage of Oracle's approach seems to be that it needs much
> less tuple-level overhead. IMO the 23-byte tuple overhead is a much
> bigger drawback in Postgres than table fragmentation.

Oracle, however, does have a problem with "ORA-1555 snapshot too old",
precisely because of its implementation of MVCC. In other words, if your
query runs long and Oracle is not able to reconstruct the old rows from
the UNDO segments, you're out of luck and your query will die.

The greatest burden of the Postgres implementation is the fact that
there is no row id, so the table header and the indexes need to be
updated much more frequently than is the case with Oracle.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
On Thu, Nov 11, 2010 at 1:25 PM, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> This is my first post to this mailing list and I would like to raise an
> issue that in my opinion is causing performance problems for
> PostgreSQL, especially in a transaction processing environment. In my
> company we have been using PostgreSQL for the last 8 years for our
> in-house developed billing system (telecom). The last few months we
> started considering moving to another RDBMS just because of this issue.
>
> After all these years, I believe that the biggest improvement that
> could be made, and that would boost overall performance especially for
> enterprise applications, would be to improve the Multiversion
> Concurrency Control (MVCC) mechanism. In theory this seems to improve
> performance for SELECT queries, but on tables with very intensive and
> frequent updates even that is not fully true, because of the
> fragmentation of data caused by MVCC. I saw cases where a SELECT
> COUNT(*) on an empty (!!!) table (used as a buffer) took more than
> 40min to return a result! VACUUM is not a solution in my opinion, even
> though after the introduction of the autovacuum daemon the situation
> got much better.

There are probably a number of ways that the behavior you're seeing
could be improved without switching databases or rewriting PostgreSQL,
but you haven't provided enough information here for anyone to help you
in a meaningful way - such as the version of PostgreSQL you're running.

One obvious suggestion would be to empty your table using TRUNCATE
rather than DELETE, which will avoid the particular problem you're
describing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
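P.S. That is, something along these lines (the table name is made up):

  -- DELETE leaves every removed row behind as a dead tuple until VACUUM
  -- gets to it, and the heap file keeps its size
  DELETE FROM call_buffer;

  -- TRUNCATE instead gives the table a brand-new, empty heap file
  -- immediately, so a subsequent SELECT COUNT(*) has nothing to scan
  TRUNCATE TABLE call_buffer;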