Re: MVCC overheads - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: MVCC overheads
Msg-id CAHyXU0wc-s22-GAwBjn7L6zRDk_Q=D7_rbDALujB4JSEZJbY2A@mail.gmail.com
In response to MVCC overheads  (Pete Stevenson <etep.nosnevets@gmail.com>)
List pgsql-hackers
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
<etep.nosnevets@gmail.com> wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the overheads affiliated with the guarantees
> provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO) what is the
> CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old
> and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of
> this work can be offloaded.

That's going to be hard to measure. First, what you didn't say is,
'with respect to what?' You mention WAL, for example. WAL is more of
a crash safety mechanism than anything and it's not really fair to
include it in an analysis of 'MVCC overhead', or at least not
completely.  One thing that MVCC *does* objectively cause is bloat,
although you can still get bloat without MVCC if you (for example)
delete rows or rewrite rows such that they can't fit in their old
slot.
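
To make the bloat point concrete, here is a toy sketch (not Postgres
code; ToyHeap and its methods are made-up names for illustration). It
assumes every update appends a new row version and only marks the old
one dead, and that a later cleanup pass reclaims dead versions once
nothing needs them:

```python
class ToyHeap:
    """Toy append-only heap: an update never overwrites in place."""

    def __init__(self):
        # Each entry is one stored row version.
        self.versions = []

    def insert(self, key, value):
        self.versions.append({"key": key, "value": value, "dead": False})

    def update(self, key, value):
        # Mark the current live version dead, then append the new one.
        for v in self.versions:
            if v["key"] == key and not v["dead"]:
                v["dead"] = True
        self.insert(key, value)

    def live_count(self):
        return sum(1 for v in self.versions if not v["dead"])

    def vacuum(self):
        # Assumption: no running transaction still needs the dead versions.
        before = len(self.versions)
        self.versions = [v for v in self.versions if not v["dead"]]
        return before - len(self.versions)


heap = ToyHeap()
heap.insert("a", 0)
for i in range(1, 6):
    heap.update("a", i)

# One logical row, but six stored versions until cleanup runs.
print(len(heap.versions), heap.live_count())
```

The gap between stored versions and live versions is the bloat; how
much it costs depends on how quickly cleanup can catch up.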

MVCC definitely incurs some runtime overhead to check visibility but
the amount of overhead is highly dependent on the specific workload.
Postgres 'hint bits' reduce the cost to near zero for many workloads
but in other workloads they are expensive to maintain and cause a lot
of extra write traffic. One nice feature of not having to worry about
visibility is that you can read data directly out of the index. We
have a workaround for that (the 'all visible' bit in the visibility
map, which makes index-only scans possible) but again the amount of
benefit from that strategy is going to be very situation specific.
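
As a rough illustration of why hint bits help, here is a heavily
simplified sketch. It is not how Postgres implements visibility: it
ignores snapshots, deletes and the hint for aborted inserters, and
RowVersion, ToyClog and inserted_row_visible are hypothetical names.
The only idea it shows is that the first reader pays for a commit-status
lookup and caches the answer on the row, so later readers skip it:

```python
from dataclasses import dataclass

COMMITTED, ABORTED = "committed", "aborted"


@dataclass
class RowVersion:
    xmin: int                          # id of the inserting transaction
    xmin_committed_hint: bool = False  # cached "inserter committed" flag


class ToyClog:
    """Stand-in for the commit log; counts how often it is consulted."""

    def __init__(self, status):
        self.status = status
        self.lookups = 0

    def get(self, xid):
        self.lookups += 1
        return self.status[xid]


def inserted_row_visible(row, clog):
    # Fast path: a previous reader already recorded the commit.
    if row.xmin_committed_hint:
        return True
    # Slow path: ask the commit log, then cache a positive answer.
    # In a real system setting the hint dirties the page, which is
    # where the extra traffic comes from.
    if clog.get(row.xmin) == COMMITTED:
        row.xmin_committed_hint = True
        return True
    return False


clog = ToyClog({100: COMMITTED, 101: ABORTED})
rows = [RowVersion(100), RowVersion(100), RowVersion(101)]

first_pass = [inserted_row_visible(r, clog) for r in rows]
lookups_after_first = clog.lookups
second_pass = [inserted_row_visible(r, clog) for r in rows]
lookups_after_second = clog.lookups
```

In this toy the second pass only consults the commit log for the row
whose inserter aborted, since this sketch caches nothing for that case.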

Stepping back, the overhead of MVCC in postgres (and probably other
systems too) has been continually reduced over the years -- the really
nasty parts have been relegated to background cleanup processing.
That processing is pretty sequential, and the 'i/o bottleneck' is
finally getting solved on cheap storage, pushing things back into the
cpu space.

In summary, I think the future of MVCC and transactional systems is
very bright, and the future of data management systems that discard
transactional safety in order to get some short term performance gains
is, uh, not so bright. Transactions are essential in systems where
data integrity matters.

merlin