Thread: MVCC performance issue
This is my first post to this mailing list and I would like to raise an
issue that in my opinion is causing performance problems for PostgreSQL,
especially in a transaction processing environment. In my company we have
been using PostgreSQL for the last 8 years for our in-house developed
billing system (telecom). The last few months we started considering
moving to another RDBMS just because of this issue.

After all these years, I believe that the biggest improvement that could
be made, and that would boost overall performance especially for
enterprise applications, would be to improve the Multiversion Concurrency
Control (MVCC) mechanism. In theory this seems to improve performance for
SELECT queries, but on tables with very intensive and frequent updates
even that is not fully true, because of the fragmentation of data caused
by MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table
(used as a buffer) took more than 40min to return a result! VACUUM is not
a solution in my opinion, even though after the introduction of the
autovacuum daemon the situation got much better.

PROBLEM DESCRIPTION
-------------------
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
new copy of the row in a new location. Any SELECT queries within the same
session are accessing the new version of the row, while all queries from
other users are still accessing the old version. When the transaction is
COMMITTED, PostgreSQL marks the new version of the row as the "active"
row and expires the old row, which remains "dead"; it is then up to the
VACUUM procedure to recover the space of the "dead" rows and make it
available to the database engine again. If the transaction is ROLLED
BACK, the space reserved for the new version of the row is released.

The result is to have huge fragmentation of the table space, unnecessary
updates in all affected indexes, unnecessary costly I/O operations, poor
performance on SELECTs that retrieve big record sets (i.e. reports etc),
and slower updates. As an example, consider updating the "live" balance
of a customer for each phone call, where the entire customer record has
to be duplicated again and again on each call just for modifying a
numeric value!

SUGGESTION
----------
1) When a row UPDATE is performed, store all "new row versions" either in
   a separate temporary table space or in a reserved space at the end of
   each table (which can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again be accessing the
   new version of the row.
3) Any SELECT queries from other users will still be accessing the old
   version.
4) When an UPDATE transaction is ROLLED BACK, just release the space used
   in the new temporary location.
5) When an UPDATE transaction is COMMITTED, try to LOCK the old version
   and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) If the transaction was COMMITTED, the temporary location can either be
   released or archived/cleaned on a pre-scheduled basis. This could also
   allow the introduction of a TRANSACTION LOG backup mechanism as a next
   step.
8) After that, VACUUM would only have to deal with deletions!!!

I understand that my suggestion may seem too simplified and also that
there are many implementation details and difficulties that I am not
aware of. I strongly believe that the outcome of the discussion regarding
this issue will be helpful.

Best Regards,

Kyriacos Kyriacou
Senior Developer/DBA
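PS: For illustration, a query along these lines (the table name is made
up) shows the kind of thing I mean - the table reports almost no live
rows, yet the heap file behind it is still huge:

  -- an "empty" buffer table can still drag a large heap file behind it
  -- if the dead row versions left by UPDATEs/DELETEs pile up
  SELECT relname,
         n_live_tup,
         n_dead_tup,
         pg_size_pretty(pg_relation_size(relid)) AS heap_size
    FROM pg_stat_user_tables
   WHERE relname = 'call_buffer';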
On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote:

> The result is to have huge fragmentation of the table space, unnecessary
> updates in all affected indexes, unnecessary costly I/O operations, poor
> performance on SELECTs that retrieve big record sets (i.e. reports etc),
> and slower updates.

Yep. It's all about trade-offs. For some workloads the in-table MVCC
storage setup works pretty darn poorly, but for most it seems to work
quite well.

There are various other methods of implementing relational storage with
ACID properties. You can exclude all other transactions while making a
change to a table, ensuring that nobody else can see "old" or "new" rows
so there's no need to keep them around. You can use an out-of-line redo
log (a-la Oracle). Many other methods exist, too.

They all have advantages and disadvantages for different workloads. It's
far from trivial to mix multiple schemes within a single database, so
mixing and matching schemes for different parts of your DB isn't
generally practical.

> 1) When a row UPDATE is performed, store all "new row versions" either
> in a separate temporary table space or in a reserved space at the end
> of each table (which can be allocated dynamically), etc.

OK, so you want a redo log a-la Oracle?

> 2) Any SELECT queries within the same session will again be accessing
> the new version of the row.
> 3) Any SELECT queries from other users will still be accessing the old
> version.

... and incurring horrible random I/O penalties if the redo log doesn't
fit in RAM. Again, a-la Oracle.

Even read-only transactions have to hit the undo log if there's an
update in progress, because rows they need may have been moved out to
the undo log as they're updated in the main table storage.

[snip description]

> I understand that my suggestion may seem too simplified and also that
> there are many implementation details and difficulties that I am not
> aware of.

It sounds like you're describing Oracle-style MVCC, using redo logs.

http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
http://en.wikipedia.org/wiki/Multiversion_concurrency_control

Oracle's MVCC approach has its own costs. Like Pg's, those costs
increase with update/delete frequency. Instead of table bloat, Oracle
suffers from redo log growth (or redo log size management issues).
Instead of increased table scan costs from dead rows, Oracle suffers
from random I/O costs as it looks up the out-of-line redo log for old
rows. Instead of long-running writer transactions causing table bloat,
Oracle can have problems with long-running reader transactions aborting
when the redo log runs out of space.

Personally, I don't know enough to know which is "better". I suspect
they're just different, with different trade-offs. If redo logs allow
you to do without write-ahead logging, that'd be interesting - but then,
the WAL is useful for all sorts of replication options, and the use of
linear WALs means that write ordering in the tables doesn't need to be
as strict, which has performance advantages.

--
Craig Ringer
On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> [snip]

Reading what you are describing, don't you think PG 9 goes a long way
toward helping you out?
Craig Ringer wrote:

> It sounds like you're describing Oracle-style MVCC, using redo logs.
>
> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/

Craig, this is an interesting blog page, making some valid points about
multiversioning vs. locking. The ATM example, however, is unrealistic
and couldn't have happened the way the author describes. Oracle has the
same write consistency mechanism as Postgres and it restarts the
transaction if the transaction's blocks were updated while the
transaction was waiting. In other words, the wife's transaction would
have been restarted before committing, the transaction would get the
balance accurately, and there wouldn't be a loss of $250.

Such an example is naive, sheer FUD. If that were the case, no bank in
the whole wide world would be using Oracle, and many of them do; I dare
say many more are using Oracle than Sybase. That means they're not
losing money if two spouses decide to withdraw money from the joint
account simultaneously. Given the number of people in the world, I
imagine that to be a rather common and ordinary situation for the banks.
The example is plain silly.

Here is what I have in mind as "write consistency":
http://www.postgresql.org/docs/9.0/static/transaction-iso.html#XACT-READ-COMMITTED:

"If the first updater rolls back, then its effects are negated and the
second updater can proceed with updating the originally found row. If
the first updater commits, the second updater will ignore the row if the
first updater deleted it, otherwise it will attempt to apply its
operation to the updated version of the row. The search condition of the
command (the WHERE clause) is re-evaluated to see if the updated version
of the row still matches the search condition."

Essentially the same behavior is described here, for Oracle:
http://tkyte.blogspot.com/2005/08/something-different-part-i-of-iii.html

"Obviously, we cannot modify an old version of a block—when we go to
modify a row, we must modify the current version of that block.
Additionally, Oracle cannot just simply skip this row, as that would be
an inconsistent read and unpredictable. What we’ll discover is that in
such cases, Oracle will restart the write modification from scratch."

Postgres re-evaluates the WHERE condition, Oracle restarts the entire
write operation, but neither MVCC mechanism would allow the silly ATM
example described in the blog. Both databases would have noticed the
change in the balance, and both would have ended up with the proper
balance in the account.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
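P.S. To make the read-committed behavior concrete, here is a rough
two-session sketch (the table and column names are made up, not taken
from either article):

  -- session A
  BEGIN;
  UPDATE accounts SET balance = balance - 250
   WHERE acct_id = 1 AND balance >= 250;
  -- the row is now locked by session A

  -- session B, at the same time
  BEGIN;
  UPDATE accounts SET balance = balance - 250
   WHERE acct_id = 1 AND balance >= 250;
  -- blocks, waiting for session A to commit or roll back

  -- session A
  COMMIT;

  -- session B wakes up and re-evaluates the WHERE clause against the
  -- updated row: if the new balance is still >= 250 the withdrawal is
  -- applied, otherwise the row no longer qualifies and is skipped.
  -- Either way, no $250 silently disappears.
  COMMIT;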
On 11/14/2010 02:38 AM, Mladen Gogala wrote:

> Craig Ringer wrote:
>> It sounds like you're describing Oracle-style MVCC, using redo logs.
>>
>> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
>>
> Craig, this is an interesting blog page, making some valid points about
> multiversioning vs. locking. The ATM example, however, is unrealistic
> and couldn't have happened the way the author describes.

Yep, you're quite right. I was using it for its explanation of some of
the costs of MVCC as Oracle implements it, because it's surprisingly
hard to find explanations/analysis of that with some quick Google
searching. I hadn't read beyond that part.

I'd be really interested in some *good* writeups of the costs/benefits
of the various common MVCC and locking based RDBMS implementations.

Thanks for posting a breakdown of the issues with that article, lest
others be misled. Appreciated.

--
Craig Ringer
On Sat, Nov 13, 2010 at 07:53, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Oracle's MVCC approach has its own costs. Like Pg's, those costs
> increase with update/delete frequency. Instead of table bloat, Oracle
> suffers from redo log growth (or redo log size management issues).
> Instead of increased table scan costs from dead rows, Oracle suffers
> from random I/O costs as it looks up the out-of-line redo log for old
> rows. Instead of long-running writer transactions causing table bloat,
> Oracle can have problems with long-running reader transactions aborting
> when the redo log runs out of space.

Another advantage of Oracle's approach seems to be that it needs much
less tuple-level overhead. IMO the 23-byte tuple overhead is a much
bigger drawback in Postgres than table fragmentation.

Regards,
Marti
On Thu, Nov 11, 2010 at 20:25, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> By definition of MVCC, when an UPDATE is performed, PostgreSQL creates
> a new copy of the row in a new location.
> The result is to have huge fragmentation of the table space,
> unnecessary updates in all affected indexes, unnecessary costly I/O
> operations, poor performance on SELECTs that retrieve big record sets
> (i.e. reports etc), and slower updates.

Have you tried reducing the table fillfactor and seeing if the HOT
update ratio increases?

PostgreSQL 8.3 introduced HOT updates as kind of a middle ground: if the
update doesn't affect indexed columns and there's enough free space in
the page being updated, then the new version is written into the same
page and the indexes don't need to be touched at all.

Regards,
Marti
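PS: Something along these lines (the table name is hypothetical):

  -- leave ~30% of every page free so updated rows can stay on the same
  -- page and qualify as HOT updates
  ALTER TABLE customer_balance SET (fillfactor = 70);
  -- note: existing pages are only repacked when the table is rewritten,
  -- e.g. by VACUUM FULL or CLUSTER

  -- then keep an eye on the share of updates that were HOT
  SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
   WHERE relname = 'customer_balance';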
Marti Raudsepp wrote:

> Another advantage of Oracle's approach seems to be that it needs much
> less tuple-level overhead. IMO the 23-byte tuple overhead is a much
> bigger drawback in Postgres than table fragmentation.

Oracle, however, does have a problem with "ORA-1555 snapshot too old",
precisely because of its implementation of MVCC. In other words, if your
query runs long and Oracle is not able to reconstruct the old rows from
the UNDO segments, you're out of luck and your query will die.

The greatest burden of the Postgres implementation is the fact that
there is no row id, so the table header and the indexes need to be
updated much more frequently than is the case with Oracle.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
On Thu, Nov 11, 2010 at 1:25 PM, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> This is my first post to this mailing list and I would like to raise an
> issue that in my opinion is causing performance problems for
> PostgreSQL, especially in a transaction processing environment. In my
> company we have been using PostgreSQL for the last 8 years for our
> in-house developed billing system (telecom). The last few months we
> started considering moving to another RDBMS just because of this issue.
>
> After all these years, I believe that the biggest improvement that
> could be made, and that would boost overall performance especially for
> enterprise applications, would be to improve the Multiversion
> Concurrency Control (MVCC) mechanism. In theory this seems to improve
> performance for SELECT queries, but on tables with very intensive and
> frequent updates even that is not fully true, because of the
> fragmentation of data caused by MVCC. I saw cases where a SELECT
> COUNT(*) on an empty (!!!) table (used as a buffer) took more than
> 40min to return a result! VACUUM is not a solution in my opinion, even
> though after the introduction of the autovacuum daemon the situation
> got much better.

There are probably a number of ways that the behavior you're seeing
could be improved without switching databases or rewriting PostgreSQL,
but you haven't provided enough information here for anyone to help you
in a meaningful way - such as the version of PostgreSQL you're running.

One obvious suggestion would be to empty your table using TRUNCATE
rather than DELETE, which will avoid the particular problem you're
describing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
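P.S. That is, something along these lines (the table name is made up):

  -- DELETE leaves every removed row behind as a dead tuple until VACUUM
  -- gets to it, and the heap file keeps its size
  DELETE FROM call_buffer;

  -- TRUNCATE instead gives the table a brand-new, empty heap file
  -- immediately, so a subsequent SELECT COUNT(*) has nothing to scan
  TRUNCATE TABLE call_buffer;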