Re: MVCC overheads - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: MVCC overheads
Date
Msg-id CAMsr+YFgQ9kXFtgR=FJwD=GYfTmhgdaqL9tcZdudKs4eFjMLOg@mail.gmail.com
Whole thread Raw
In response to Re: MVCC overheads  (Pete Stevenson <etep.nosnevets@gmail.com>)
List pgsql-hackers
On 8 July 2016 at 03:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

Please reply in-line in posts to make it easier to follow conversations with multiple people.
  
It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start.

Not necessarily transaction start; for REPEATABLE READ isolation, statement start is sufficient, or even weaker than that.
 
I'm not an expert on postgresql internals, but this must have some cost.

Sure it does. Disk space, efficiency of use of RAM for disk cache, CPU cost of scanning over not-visible tuples, etc.
 
I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in)

The way PostgreSQL does that (read the manual) is pretty lightweight. You will have already found the old tuple so setting its xmax is cheap. Writing the new tuple costs much the same as an insert.
 
version checking on read

Yep. In particular, index scans because PostgreSQL doesn't maintain visibility information in indexes. Read up on PostgreSQL's mvcc implementation, index scans, index-only scans, visibility map, etc.
 
garbage collecting old versions

As implemented in PostgreSQL by VACUUM
 
and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

The write-ahead log is orthogonal to MVCC. You can have MVCC without WAL (or other write durability). You can have write durability without MVCC. The two are almost entirely unrelated.
 
Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

MVCC logic is woven deeply thoughout PostgreSQL. I'm not sure how you'd even begin to offload it in any meaningful way, nor if it'd be useful to do so. Presumably you're thinking of some way to tell the storage layer "show me the table as if it has only rows visible to [this xact]" so Pg doesn't have to do any checking at all. But it's not always that simple. See:

- Logical decoding (time travel)
- VACUUM
- EvalPlanQual, re-checks of updated rows
- ...
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Showing parallel status in \df+
Next
From: Simon Riggs
Date:
Subject: Re: A Modest Upgrade Proposal