Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Gavin Sherry
Subject Re: vacuum, performance, and MVCC
Msg-id Pine.LNX.4.58.0606231205470.32171@linuxworld.com.au
In response to Re: vacuum, performance, and MVCC  (Agent M <agentm@themactionfaction.com>)
Responses Re: vacuum, performance, and MVCC  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Re: vacuum, performance, and MVCC  (Florian Weimer <fw@deneb.enyo.de>)
List pgsql-hackers
On Thu, 22 Jun 2006, Agent M wrote:

>
> On Jun 22, 2006, at 9:56 PM, Christopher Kings-Lynne wrote:
>
> >> The example is a very active web site, the flow is this:
> >> query for session information
> >> process HTTP request
> >> update session information
> >> This happens for EVERY http request. Chances are that you won't have
> >> concurrent requests for the same row, but you may have well over 100
> >> HTTP
> >> server processes/threads answering queries in your web server farm.
> >
> > You're crazy :)  Use memcache, not the DB :)
>
> Still, the database is the one central location that the apaches can
> connect too- postgres already has a lot of application platform
> features- locking synchronization, asynchronous notifications,
> arbitrary pl code.
>
> Personally, I think that a special non-MVCC table type could be
> created- the catalogs are similarly flat. What I envision is a table
> type that can only be accessed "outside" transactions (like AutoCommit
> mode)- this is already possible to implement in plperl for a single
> session. It would be more efficient to have something like a global
> temp table hanging around...
>
> Just some random ideas...

Unfortunately, it's not so simple.

What if a user enters a transaction block, modifies a normal table,
modifies this 'special table'... then rolls back? This is the problem
MySQL has with InnoDB and MyISAM tables.
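
To make that concrete, here's a minimal sketch (the table names, and the
non-transactional 'special' table itself, are purely hypothetical):

  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;   -- normal MVCC table
  UPDATE session_info SET hits = hits + 1 WHERE sid = 'abc';  -- 'special' table
  ROLLBACK;

The accounts change vanishes as expected, but the session_info change
would stick -- the transaction is no longer atomic.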

Well... we could just document that. If only.

What happens if, as part of the update to the special table, we encounter
an error? MVCC currently guarantees that this modification will be
invisible. Without MVCC, we have no such capability.
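
Again, a minimal sketch with hypothetical names:

  UPDATE session_info SET hits = hits + 1, last_seen = now();
  -- Suppose this hits an error partway through -- a constraint
  -- violation, out of disk space, whatever. With MVCC the rows already
  -- touched stay invisible and the statement has no effect; a non-MVCC
  -- table would need UNDO of its own to avoid being left half-updated.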

There seems to be a bit of confusion about what MVCC is. PostgreSQL is not
the only MVCC database. InnoDB is MVCC. Oracle is MVCC. As far as I know,
PostgreSQL is the only MVCC database with a 'non-overwriting storage
manager'. The other MVCC databases maintain UNDO logs outside of the
table's data files. When an update occurs, the existing row version is
copied to the UNDO file, the new data replaces the old, and a backward
pointer from the table row to the UNDO log is created. Concurrent reads
must go into the UNDO log and find the version visible to them. This
implements the principle of MVCC and uses snapshot isolation, as we do,
to handle read/write concurrency.

Overwriting MVCC comes with its own baggage. Ask any Oracle user about
error ORA-01555[1]. There's also the added cost of managing the UNDO logs,
the cost of jumping around between files to get row versions and so on.

Also, it leads to inefficiency with variable size data types. The new
version of a row might be longer or shorter than the previous version, and
this must cause them headaches and performance penalties.

As for single version databases -- they have a tonne of problems
themselves. They need to maintain UNDO functionality so as to deal with
the error handling I detailed above but they also cannot implement MVCC
rules for all cases: readers do not block writers, writers do not block
readers. Instead, they implement a large lock matrix and certain types of
queries use certain types of granular locks.

Some potentially interesting reading material on these ideas:

J. Gray & A. Reuter, Transaction Processing: Concepts and Techniques
US Patent Number 5,870,758 -- (one of?) Oracle's snapshot isolation patent
Tom Lane's MVCC talk:
http://www.postgresql.org/files/developer/transactions.pdf

Thanks,

Gavin


---
[1] Basically, the UNDO logs are a circular buffer and, remarkably, Oracle
doesn't seem to let the buffer expand if there is a long running
transaction. (We do let WAL grow in that situation.)

In this case Oracle finds itself in a compromising position: if any more
data is written to the UNDO log, the affected transaction will not be able
to roll back and/or concurrent queries will not be able to find the
correct row version for their snapshot.

UNDO log size can be adjusted but this situation bites Oracle users
constantly.

