Thread: MVCC overheads

MVCC overheads

From: Pete Stevenson
Hi postgresql hackers -

I would like to find some analysis (published work, blog posts) on the
overheads associated with the guarantees provided by MVCC isolation. More
specifically, assuming the current workload is CPU bound (as opposed to
IO), what is the CPU overhead of generating the WAL, the overhead of
version checking and version creation, and of garbage collecting old and
unnecessary versions? For what it’s worth, I am working on a research
project where it is envisioned that some of this work can be offloaded.

Thank you,
Pete Stevenson




Re: MVCC overheads

From: Simon Riggs
On 7 July 2016 at 17:45, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi postgresql hackers -

I would like to find some analysis (published work, blog posts) on the overheads associated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO), what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.

MVCC is a benefit, not an overhead. To understand that, you should compare MVCC with a system that performs S2PL.

If you're thinking that somehow consistency isn't important, I'd hope that you also consider some way to evaluate the costs associated with inconsistent and incorrect results in applications, or other architectural restrictions imposed to make that possible. It's easy to make assumptions in the lab that don't work in the real world.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MVCC overheads

From: Pete Stevenson
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

I agree, MVCC is a benefit. The research angle here is about enabling MVCC with hardware offload. Since I didn't explicitly mention it, the offload I refer to will respect all consistency guarantees also.

It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start. I'm not an expert on postgresql internals, but this must have some cost. I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in), version checking on read, garbage collecting old versions, and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

Thank you,
Pete



On Thu, Jul 7, 2016 at 11:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 7 July 2016 at 17:45, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi postgresql hackers -

I would like to find some analysis (published work, blog posts) on the overheads associated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO), what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.

MVCC is a benefit, not an overhead. To understand that, you should compare MVCC with a system that performs S2PL.

If you're thinking that somehow consistency isn't important, I'd hope that you also consider some way to evaluate the costs associated with inconsistent and incorrect results in applications, or other architectural restrictions imposed to make that possible. It's easy to make assumptions in the lab that don't work in the real world.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MVCC overheads

From: Simon Riggs
On 7 July 2016 at 20:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

I agree, MVCC is a benefit. The research angle here is about enabling MVCC with hardware offload. Since I didn't explicitly mention it, the offload I refer to will respect all consistency guarantees also.

It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start. I'm not an expert on postgresql internals, but this must have some cost. I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in), version checking on read, garbage collecting old versions, and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

PostgreSQL uses a no overwrite storage mechanism, so any additional row versions are maintained in the same table alongside other rows. The MVCC actions are mostly mixed in with other aspects of the storage, so not isolated much for offload.

Oracle has a different mechanism that does isolate changed row versions into a separate data structure, so would be much more amenable to offload than PostgreSQL, in its current form.

Maybe look at SLRUs (clog etc) as a place to offload something?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MVCC overheads

From: Craig Ringer
On 8 July 2016 at 03:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

Please reply in-line in posts to make it easier to follow conversations with multiple people.
  
It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start.

Not necessarily transaction start; under READ COMMITTED isolation, a snapshot taken at statement start is sufficient, or even weaker than that.
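
A quick way to see the difference from psql (a sketch with a hypothetical table t; the comments describe what a concurrent writer would make visible):

    -- READ COMMITTED: a fresh snapshot per statement
    BEGIN ISOLATION LEVEL READ COMMITTED;
    SELECT count(*) FROM t;  -- sees rows committed before this statement
    -- (a concurrent session commits an INSERT into t here)
    SELECT count(*) FROM t;  -- sees the new row: new statement, new snapshot
    COMMIT;

    -- REPEATABLE READ: one snapshot for the whole transaction,
    -- taken at the first statement rather than at BEGIN
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM t;  -- snapshot is taken now
    -- (a concurrent session commits an INSERT into t here)
    SELECT count(*) FROM t;  -- still the old count: same snapshot
    COMMIT;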
 
I'm not an expert on postgresql internals, but this must have some cost.

Sure it does. Disk space, efficiency of use of RAM for disk cache, CPU cost of scanning over not-visible tuples, etc.
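
The disk-space part is easy to demonstrate for yourself (a toy table; all names are made up):

    CREATE TABLE bloat_demo AS
      SELECT g AS id, 0 AS n FROM generate_series(1, 100000) g;
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- baseline
    UPDATE bloat_demo SET n = n + 1;  -- every row gets a new version
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));
    -- roughly double: the old versions still occupy space until vacuumed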
 
I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in)

The way PostgreSQL does that (read the manual) is pretty lightweight. You will have already found the old tuple, so setting its xmax is cheap. Writing the new tuple costs much the same as an insert.
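
You can watch those mechanics from SQL via the hidden system columns (another toy table):

    CREATE TABLE v(id int, val text);
    INSERT INTO v VALUES (1, 'a');
    SELECT ctid, xmin, xmax, * FROM v;  -- xmax = 0: no updater/deleter yet
    UPDATE v SET val = 'b' WHERE id = 1;
    SELECT ctid, xmin, xmax, * FROM v;
    -- the visible tuple now has a new ctid and a new xmin; on disk, the
    -- old version's xmax was stamped with the updating transaction's xid,
    -- which is exactly the cheap "linking in" step described above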
 
version checking on read

Yep. In particular for index scans, because PostgreSQL doesn't maintain visibility information in indexes. Read up on PostgreSQL's MVCC implementation, index scans, index-only scans, the visibility map, etc.
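
The index case is easy to see with EXPLAIN (a sketch; exact plans and numbers will vary):

    CREATE TABLE ios_demo AS SELECT g AS id FROM generate_series(1, 100000) g;
    CREATE INDEX ON ios_demo(id);
    EXPLAIN (ANALYZE) SELECT count(*) FROM ios_demo WHERE id < 1000;
    -- expect "Index Only Scan ... Heap Fetches: ~999": the visibility map
    -- is not set yet, so every index entry still needs a heap visit
    VACUUM ios_demo;  -- sets all-visible bits in the visibility map
    EXPLAIN (ANALYZE) SELECT count(*) FROM ios_demo WHERE id < 1000;
    -- now "Heap Fetches: 0": per-tuple visibility checks skipped entirely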
 
garbage collecting old versions

As implemented in PostgreSQL by VACUUM.
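
You can watch it reclaim the dead versions left by the bloat_demo sketch above:

    DELETE FROM bloat_demo WHERE id <= 50000;  -- even more dead versions
    VACUUM (VERBOSE) bloat_demo;
    -- reports how many dead row versions were removed and from how many
    -- pages; the space becomes reusable, though the file is rarely truncated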
 
and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

The write-ahead log is orthogonal to MVCC. You can have MVCC without WAL (or other write durability). You can have write durability without MVCC. The two are almost entirely unrelated.
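
PostgreSQL even has a direct knob for one half of that: an unlogged table keeps full MVCC semantics but writes no WAL for its data, at the price of being truncated after a crash:

    CREATE UNLOGGED TABLE no_wal_demo(id int, val text);
    INSERT INTO no_wal_demo VALUES (1, 'still fully MVCC');
    -- concurrent transactions still get snapshot isolation on this table;
    -- what has been given up is crash durability, not MVCC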
 
Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

MVCC logic is woven deeply throughout PostgreSQL. I'm not sure how you'd even begin to offload it in any meaningful way, nor if it'd be useful to do so. Presumably you're thinking of some way to tell the storage layer "show me the table as if it has only rows visible to [this xact]" so Pg doesn't have to do any checking at all. But it's not always that simple. See:

- Logical decoding (time travel)
- VACUUM
- EvalPlanQual, re-checks of updated rows
- ...
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: MVCC overheads

From: Pete Stevenson
Good info, thanks for the note. Agreed that it is difficult to pull things apart to isolate these features for offload — so actually running experiments with offload is not possible, as you point out (and for other reasons).

Maybe I could figure out the lines of code that add versions into a table and then those that collect old versions (they do get collected, right?). Anyway, thought being I could profile while running TPC-C or similar. I was hoping that someone might be able to jump on this with a response that they already did something similar. I know that Stonebraker has done some analysis along these lines, but I’m looking for an independent result that confirms (or not) his work.

Thank you,
Pete Stevenson


On Jul 7, 2016, at 3:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 7 July 2016 at 20:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

I agree, MVCC is a benefit. The research angle here is about enabling MVCC with hardware offload. Since I didn't explicitly mention it, the offload I refer to will respect all consistency guarantees also.

It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start. I'm not an expert on postgresql internals, but this must have some cost. I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in), version checking on read, garbage collecting old versions, and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

PostgreSQL uses a no overwrite storage mechanism, so any additional row versions are maintained in the same table alongside other rows. The MVCC actions are mostly mixed in with other aspects of the storage, so not isolated much for offload.

Oracle has a different mechanism that does isolate changed row versions into a separate data structure, so would be much more amenable to offload than PostgreSQL, in its current form.

Maybe look at SLRUs (clog etc) as a place to offload something?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MVCC overheads

From: Alvaro Herrera
Pete Stevenson wrote:

> Maybe I could figure out the lines of code that add versions into a
> table and then those that collect old versions (they do get collected,
> right?). Anyway, thought being I could profile while running TPC-C or
> similar. I was hoping that someone might be able to jump on this with
> a response that they already did something similar.

Old tuple versions are "collected" (removed) by either vacuum (see
vacuumlazy.c) or heap_page_prune.  The latter is one thing that could
perhaps somehow be offloaded, as it's quite independent of the other
stuff.  You can prune removable tuples at no additional cost from an
unlocked dirty page, which is a useful optimization because then
client-connected backends don't need to prune them later.
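
If you want to watch pruning happen, the pageinspect extension exposes the
line pointers (a rough sketch; column details vary by version):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    CREATE TABLE prune_demo(id int, val text);
    INSERT INTO prune_demo VALUES (1, 'a');
    UPDATE prune_demo SET val = 'b';  -- leaves a dead version on page 0
    SELECT lp, lp_flags, t_xmin, t_xmax
      FROM heap_page_items(get_raw_page('prune_demo', 0));
    -- once the old version is dead to all snapshots and the page is
    -- revisited, pruning collapses it to a redirect/dead line pointer
    -- (lp_flags 2 or 3) without needing a full VACUUM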

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MVCC overheads

From: Tom Lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pete Stevenson wrote:
>> Maybe I could figure out the lines of code that add versions into a
>> table and then those that collect old versions (they do get collected,
>> right?). Anyway, thought being I could profile while running TPC-C or
>> similar. I was hoping that someone might be able to jump on this with
>> a response that they already did something similar.

> Old tuple versions are "collected" (removed) by either vacuum (see
> vacuumlazy.c) or heap_page_prune.  The latter is one thing that could
> perhaps somehow be offloaded, as it's quite independent of the other
> stuff.  You can prune removable tuples at no additional cost from an
> unlocked dirty page, which is a useful optimization because then
> client-connected backends don't need to prune them later.

VACUUM in itself is an offloading optimization; the whole point of it
is to do maintenance in a background process, not in foreground queries.
AFAIR, heap_page_prune is just a small subset of VACUUM work that
we decided we could afford to do in foreground.
        regards, tom lane



Re: MVCC overheads

From: Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Pete Stevenson wrote:
> >> Maybe I could figure out the lines of code that add versions into a
> >> table and then those that collect old versions (they do get collected,
> >> right?). Anyway, thought being I could profile while running TPC-C or
> >> similar. I was hoping that someone might be able to jump on this with
> >> a response that they already did something similar.
> 
> > Old tuple versions are "collected" (removed) by either vacuum (see
> > vacuumlazy.c) or heap_page_prune.  The latter is one thing that could
> > perhaps somehow be offloaded, as it's quite independent of the other
> > stuff.  You can prune removable tuples at no additional cost from an
> > unlocked dirty page, which is a useful optimization because then
> > client-connected backends don't need to prune them later.
> 
> VACUUM in itself is an offloading optimization; the whole point of it
> is to do maintenance in a background process, not in foreground queries.

Well, if VACUUM worked so great, we wouldn't get so many trouble reports
with it.  There's substantial improvement we could make in that area.

> AFAIR, heap_page_prune is just a small subset of VACUUM work that
> we decided we could afford to do in foreground.

Sure, but we could *also* do it separately, splitting VACUUM's tasks of
tuple freezing, page compaction, and index entry removal each into
separate tasks.

Currently, vacuuming a 4TB table can take weeks; meanwhile dead tuples
accumulate in already-scanned pages, leading to further bloat, to Xid
wraparound danger later, and to emergency vacuuming that leaves
applications blocked on DDL.
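
The wraparound pressure, at least, is easy to keep an eye on:

    -- distance of each database from the freeze ceiling
    SELECT datname, age(datfrozenxid) AS xid_age
      FROM pg_database
     ORDER BY xid_age DESC;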

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MVCC overheads

From: Tom Lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> VACUUM in itself is an offloading optimization; the whole point of it
>> is to do maintenance in a background process, not in foreground queries.

> Well, if VACUUM worked so great, we wouldn't get so many trouble reports
> with it.  There's substantial improvement we could make in that area.

Sure, and we've been chipping away at that problem over time, including
some significant improvement in 9.6.  My point is just that it's a good
idea to understand VACUUM as being some pre-existing work that's related
to this offloading idea.

> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
> tuple freezing, page compaction, and index entry removal each into
> separate tasks.

Uh ... wouldn't that tend to make things worse?  The knocks on VACUUM are
too much I/O and too much latency for cleanup, and I can't see how
splitting it does anything good on either score.
        regards, tom lane



Re: MVCC overheads

From: Kevin Grittner
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
<etep.nosnevets@gmail.com> wrote:

> I would like to find some analysis (published work, blog posts)
> on the overheads associated with the guarantees provided by MVCC
> isolation.

There are three levels of isolation implemented[1]; the incremental
cost of SERIALIZABLE isolation over REPEATABLE READ for several
standard benchmarking loads is available in section 8 of a paper
presented at a VLDB conference[2].
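
If you want to reproduce that kind of measurement, one simple approach is to run the same benchmark (pgbench, or your own driver) under each level by flipping the server-wide default (a sketch):

    ALTER SYSTEM SET default_transaction_isolation = 'repeatable read';
    SELECT pg_reload_conf();
    -- ... run the benchmark, note throughput/latency ...
    ALTER SYSTEM SET default_transaction_isolation = 'serializable';
    SELECT pg_reload_conf();
    -- ... run it again; the difference is the incremental cost of SSI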

Hopefully that helps some.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]  PostgreSQL current online documentation.  Transaction Isolation.
https://www.postgresql.org/docs/current/static/transaction-iso.html

[2]  Dan R. K. Ports and Kevin Grittner. 2012. Serializable Snapshot
Isolation in PostgreSQL. Proceedings of the VLDB Endowment, Vol. 5,
No. 12. The 38th International Conference on Very Large Data Bases,
August 27th - 31st 2012, Istanbul, Turkey.
http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf
 



Re: MVCC overheads

From: Peter Geoghegan
On Fri, Jul 8, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
>> tuple freezing, page compaction, and index entry removal each into
>> separate tasks.
>
> Uh ... wouldn't that tend to make things worse?  The knocks on VACUUM are
> too much I/O and too much latency for cleanup, and I can't see how
> splitting it does anything good on either score.

Has anyone ever done any kind of write-up of the "TED" design that was
discussed during FOSDEM (I hope I recall the name it was given
correctly)? Apparently that's something that's been discussed a few
times among senior community members, and I think it has promise.

-- 
Peter Geoghegan



Re: MVCC overheads

From: Merlin Moncure
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
<etep.nosnevets@gmail.com> wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the
> overheads associated with the guarantees provided by MVCC isolation. More
> specifically, assuming the current workload is CPU bound (as opposed to
> IO), what is the CPU overhead of generating the WAL, the overhead of
> version checking and version creation, and of garbage collecting old and
> unnecessary versions? For what it’s worth, I am working on a research
> project where it is envisioned that some of this work can be offloaded.

That's going to be hard to measure. First, what you didn't say is:
with respect to what? You mention WAL, for example. WAL is more of
a crash-safety mechanism than anything else, and it's not really fair
to include it in an analysis of 'MVCC overhead', or at least not
completely. One thing that MVCC *does* objectively cause is bloat,
although you can still get bloat without MVCC if you (for example)
delete rows or rewrite rows such that they can't fit in their old
slot.
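
You can at least account for the WAL separately by measuring how much of it a given workload generates (a psql sketch; on pre-10 releases the functions are named pg_current_xlog_location and pg_xlog_location_diff):

    SELECT pg_current_wal_lsn() AS before \gset
    -- ... run the workload under test ...
    SELECT pg_size_pretty(
             pg_wal_lsn_diff(pg_current_wal_lsn(), :'before'));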

MVCC definitely incurs some runtime overhead to check visibility, but
the amount of overhead is highly dependent on the specific workload.
Postgres 'hint bits' reduce the cost to near zero for many workloads,
but in other workloads they are expensive to maintain and cause a lot
of extra traffic. One nice thing about not having to worry about
visibility is that you can read data directly out of the index. We
have some workarounds to get part of that benefit (the 'all visible'
bit), but again the amount of benefit from that strategy is going to
be very situation-specific.
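
That maintenance traffic is visible with EXPLAIN's BUFFERS option (a sketch; numbers will vary):

    CREATE TABLE hint_demo AS SELECT g FROM generate_series(1, 1000000) g;
    CHECKPOINT;  -- flush the freshly loaded pages so they start out clean
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM hint_demo;
    -- first scan: expect "Buffers: ... dirtied=N" -- a read-only query is
    -- dirtying pages just to record that the inserting xact committed
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM hint_demo;
    -- second scan: nothing dirtied; the visibility check is now nearly free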

Stepping back, the overhead of MVCC in Postgres (and probably other
systems too) has been continually reduced over the years -- the really
nasty parts have been relegated to background cleanup processing.
That processing is pretty sequential, and with the I/O bottleneck
finally getting solved by cheap storage, the pressure is moving back
into the CPU space.

In summary, I think the future of MVCC and transactional systems is
very bright, and the future of data management systems that discard
transactional safety in order to get some short-term performance gains
is, uh, not so bright. Transactions are essential in systems where
data integrity matters.

merlin



Re: MVCC overheads

From: Alvaro Herrera
Peter Geoghegan wrote:

> Has anyone ever done any kind of write-up of the "TED" design that was
> discussed during FOSDEM (I hope I recall the name it was given
> correctly)? Apparently that's something that's been discussed a few
> times among senior community members, and I think it has promise.

https://www.postgresql.org/message-id/55511D1F.7050902@iki.fi

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MVCC overheads

From: Gavin Flower
Please see comment at the bottom of this post.

On 08/07/16 10:48, Pete Stevenson wrote:
> Good info, thanks for the note. Agreed that it is difficult to pull 
> things apart to isolate these features for offload — so actually 
> running experiments with offload is not possible, as you point out 
> (and for other reasons).
>
> Maybe I could figure out the lines of code that add versions into a 
> table and then those that collect old versions (they do get collected, 
> right?). Anyway, thought being I could profile while running TPC-C or 
> similar. I was hoping that someone might be able to jump on this with 
> a response that they already did something similar. I know that 
> Stonebraker has done some analysis along these lines, but I’m looking 
> for an independent result that confirms (or not) his work.
>
> Thank you,
> Pete Stevenson
>
>
>> On Jul 7, 2016, at 3:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 7 July 2016 at 20:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
>>
>>     Hi Simon -
>>
>>     Thanks for the note. I think it's fair to say that I didn't
>>     provide enough context, so let me try and elaborate on my question.
>>
>>     I agree, MVCC is a benefit. The research angle here is about
>>     enabling MVCC with hardware offload. Since I didn't explicitly
>>     mention it, the offload I refer to will respect all consistency
>>     guarantees also.
>>
>>     It is the case that for the database to implement MVCC it must
>>     provide consistent read to multiple different versions of data,
>>     i.e. depending on the version used at transaction start. I'm not
>>     an expert on postgresql internals, but this must have some cost.
>>     I think the cost related to MVCC guarantees can roughly be
>>     categorized as: creating new versions (linking them in), version
>>     checking on read, garbage collecting old versions, and then there
>>     is an additional cost that I am interested in (again not claiming
>>     it is unnecessary in any sense) but there is a cost to generating
>>     the log.
>>
>>     Thanks, by the way, for the warning about lab vs. reality. That's
>>     why I'm asking this question here. I want to keep the
>>     hypothetical tagged as such, but find defensible and realistic
>>     metrics where those exist, i.e. in this instance, we do have a
>>     database that can use MVCC. It should be possible to figure out
>>     how much work goes into maintaining that property.
>>
>>
>> PostgreSQL uses a no overwrite storage mechanism, so any additional 
>> row versions are maintained in the same table alongside other rows. 
>> The MVCC actions are mostly mixed in with other aspects of the 
>> storage, so not isolated much for offload.
>>
>> Oracle has a different mechanism that does isolate changed row 
>> versions into a separate data structure, so would be much more 
>> amenable to offload than PostgreSQL, in its current form.
>>
>> Maybe look at SLRUs (clog etc) as a place to offload something?
>>
>> -- 
>> Simon Riggs http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
In this list, the convention is to post replies at the end (with some 
rare exceptions), or interspersed when appropriate, and to omit parts no 
longer relevant.

The motivation for bottom-posting like this is that people get to see 
the context before the reply, AND emails don't end up getting longer & 
longer as people reply at the top and forget to trim the now-irrelevant 
stuff at the end.


Cheers,
Gavin