Thread: MVCC overheads
Hi postgresql hackers -

I would like to find some analysis (published work, blog posts) on the overheads affiliated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO) what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.

Thank you,
Pete Stevenson
On 7 July 2016 at 17:45, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the overheads affiliated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO) what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.
MVCC is a benefit, not an overhead. To understand that you should compare MVCC with a system that performs S2PL.
If you're thinking that somehow consistency isn't important, I'd hope that you also consider some way to evaluate the costs associated with inconsistent and incorrect results in applications, or other architectural restrictions imposed to make that possible. It's easy to make assumptions in the lab that don't work in the real world.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Simon -
Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.
I agree, MVCC is a benefit. The research angle here is about enabling MVCC with hardware offload. Since I didn't explicitly mention it, the offload I refer to will respect all consistency guarantees also.
It is the case that for the database to implement MVCC it must provide consistent reads of multiple different versions of data, i.e. depending on the version used at transaction start. I'm not an expert on postgresql internals, but this must have some cost. I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in), version checking on read, garbage collecting old versions, and then there is an additional cost that I am interested in (again, not claiming it is unnecessary in any sense): the cost of generating the log.
Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.
Thank you,
Pete
On 7 July 2016 at 20:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
> I agree, MVCC is a benefit. The research angle here is about enabling MVCC with hardware offload. Since I didn't explicitly mention it, the offload I refer to will respect all consistency guarantees also. [...]
PostgreSQL uses a no-overwrite storage mechanism, so any additional row versions are maintained in the same table, alongside other rows. The MVCC actions are mostly mixed in with other aspects of the storage, so they are not well isolated for offload.
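You can see the extra versions from plain SQL; a minimal illustration (table and values are throwaway, just for demonstration):

    CREATE TABLE mvcc_demo (id int PRIMARY KEY, val text);
    INSERT INTO mvcc_demo VALUES (1, 'v1');
    SELECT ctid, xmin, xmax, val FROM mvcc_demo;  -- e.g. ctid (0,1), xmin = inserting xid
    UPDATE mvcc_demo SET val = 'v2' WHERE id = 1;
    SELECT ctid, xmin, xmax, val FROM mvcc_demo;  -- e.g. ctid (0,2): the new row version
    -- the old version still occupies (0,1) in the same heap until it is pruned or vacuumed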
Oracle has a different mechanism that does isolate changed row versions into a separate data structure, so would be much more amenable to offload than PostgreSQL, in its current form.
Maybe look at SLRUs (clog etc) as a place to offload something?
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8 July 2016 at 03:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
> Hi Simon - Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.
Please reply in-line in posts to make it easier to follow conversations with multiple people.
> It is the case that for the database to implement MVCC it must provide consistent reads of multiple different versions of data, i.e. depending on the version used at transaction start.
Not necessarily transaction start; under REPEATABLE READ the snapshot is taken at the start of the transaction's first statement (not at BEGIN), and READ COMMITTED needs even less, taking a fresh snapshot per statement.
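A two-session illustration of the snapshot timing (sketch only; table t and its row are hypothetical, and "session B" is a second psql connection):

    BEGIN ISOLATION LEVEL READ COMMITTED;
    SELECT val FROM t WHERE id = 1;   -- sees the current committed value
    -- (session B updates the row and commits here)
    SELECT val FROM t WHERE id = 1;   -- sees B's change: a new snapshot per statement
    COMMIT;

    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT val FROM t WHERE id = 1;   -- snapshot taken now, at the first statement
    -- (session B updates the row again and commits here)
    SELECT val FROM t WHERE id = 1;   -- unchanged result: same snapshot for the whole xact
    COMMIT;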
> I'm not an expert on postgresql internals, but this must have some cost.
Sure it does. Disk space, efficiency of use of RAM for disk cache, CPU cost of scanning over not-visible tuples, etc.
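If you want to put a number on the space side, contrib/pgstattuple can report it; a rough sketch (table name is purely illustrative):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT tuple_count, tuple_len, dead_tuple_count, dead_tuple_len, dead_tuple_percent
    FROM pgstattuple('some_table');  -- dead_* columns are the not-yet-reclaimed old versions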
> I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in)
The way PostgreSQL does that (read the manual) is pretty lightweight. You will have already found the old tuple so setting its xmax is cheap. Writing the new tuple costs much the same as an insert.
> version checking on read
Yep. Index scans in particular, because PostgreSQL doesn't maintain visibility information in indexes. Read up on PostgreSQL's MVCC implementation, index scans, index-only scans, the visibility map, etc.
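A quick way to see the effect, assuming some table t with an index on id (names are illustrative):

    VACUUM t;  -- sets all-visible bits in the visibility map
    EXPLAIN (ANALYZE) SELECT id FROM t WHERE id < 100;
    -- "Index Only Scan ... Heap Fetches: 0" while the pages stay all-visible;
    -- after fresh updates the same query reports nonzero Heap Fetches, i.e.
    -- per-tuple visibility checks against the heap.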
> garbage collecting old versions
As implemented in PostgreSQL by VACUUM.
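To watch it happen, a rough illustration (the table name is hypothetical):

    VACUUM (VERBOSE) some_table;  -- reports how many dead row versions were removed
    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'some_table';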
> and then there is an additional cost that I am interested in (again, not claiming it is unnecessary in any sense): the cost of generating the log.
The write-ahead log is orthogonal to MVCC. You can have MVCC without WAL (or other write durability). You can have write durability without MVCC. The two are almost entirely unrelated.
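One rough way to convince yourself of the separation (sketch only; the table name is made up):

    -- an unlogged table generates almost no WAL, yet MVCC still applies to it:
    CREATE UNLOGGED TABLE scratch (id int, val text);
    -- concurrent transactions still get snapshot-consistent reads of scratch,
    -- but its contents are truncated after a crash because nothing was logged.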
> Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.
MVCC logic is woven deeply throughout PostgreSQL. I'm not sure how you'd even begin to offload it in any meaningful way, nor whether it'd be useful to do so. Presumably you're thinking of some way to tell the storage layer "show me the table as if it has only rows visible to [this xact]" so Pg doesn't have to do any checking at all. But it's not always that simple. See:
- Logical decoding (time travel)
- VACUUM
- EvalPlanQual, re-checks of updated rows
- ...
Good info, thanks for the note. Agreed that it is difficult to pull things apart to isolate these features for offload — so actually running experiments with offload is not possible, as you point out (and for other reasons).
Maybe I could figure out the lines of code that add versions into a table and then those that collect old versions (they do get collected, right?). Anyway, thought being I could profile while running TPC-C or similar. I was hoping that someone might be able to jump on this with a response that they already did something similar. I know that Stonebraker has done some analysis along these lines, but I’m looking for an independent result that confirms (or not) his work.
Thank you,
Pete Stevenson
Pete Stevenson wrote:
> Maybe I could figure out the lines of code that add versions into a
> table and then those that collect old versions (they do get collected,
> right?). Anyway, thought being I could profile while running TPC-C or
> similar. I was hoping that someone might be able to jump on this with
> a response that they already did something similar.

Old tuple versions are "collected" (removed) by either vacuum (see vacuumlazy.c) or heap_page_prune. The latter is one thing that could perhaps somehow be offloaded, as it's quite independent from the other stuff. You can prune removable tuples at no additional cost from an unlocked dirty page, which is a useful optimization because then client-connected backends don't need to prune them later.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Old tuple versions are "collected" (removed) by either vacuum (see
> vacuumlazy.c) or heap_page_prune. The latter is one thing that could
> perhaps somehow be offloaded, as it's quite independent from the other
> stuff. You can prune removable tuples at no additional cost from an
> unlocked dirty page, which is a useful optimization because then
> client-connected backends don't need to prune them later.

VACUUM in itself is an offloading optimization; the whole point of it is to do maintenance in a background process, not foreground queries. AFAIR, heap_page_prune is just a small subset of VACUUM work that we decided we could afford to do in foreground.

regards, tom lane
Tom Lane wrote:
> VACUUM in itself is an offloading optimization; the whole point of it
> is to do maintenance in a background process, not foreground queries.

Well, if VACUUM worked so great, we wouldn't get so many trouble reports about it. There's substantial improvement we could make in that area.

> AFAIR, heap_page_prune is just a small subset of VACUUM work that
> we decided we could afford to do in foreground.

Sure, but we could *also* do it separately, splitting VACUUM's tasks of tuple freezing, page compaction, and index entry removal each into separate tasks. Currently, vacuuming a 4TB table can take weeks; meanwhile dead tuples accumulate in already-scanned pages, leading to further bloat, to Xid wraparound danger later, and to emergency vacuuming with applications blocking on DDL.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
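For a sense of the pressure involved, the usual catalog queries (nothing here is specific to the splitting idea):

    -- transaction ID age per database; autovacuum forces anti-wraparound
    -- vacuums as this approaches autovacuum_freeze_max_age (default 200 million)
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database ORDER BY xid_age DESC;

    -- tables with the most dead tuples still awaiting vacuum
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;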
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> VACUUM in itself is an offloading optimization; the whole point of it
>> is to do maintenance in a background process, not foreground queries.

> Well, if VACUUM worked so great, we wouldn't get so many trouble reports
> about it. There's substantial improvement we could make in that area.

Sure, and we've been chipping away at that problem over time, including some significant improvement in 9.6. My point is just that it's a good idea to understand VACUUM as being some pre-existing work that's related to this offloading idea.

> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
> tuple freezing, page compaction, and index entry removal each into
> separate tasks.

Uh ... wouldn't that tend to make things worse? The knocks on VACUUM are too much I/O and too much latency for cleanup, and I can't see how splitting it does anything good on either score.

regards, tom lane
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
> I would like to find some analysis (published work, blog posts)
> on the overheads affiliated with the guarantees provided by MVCC
> isolation.

There are three levels of isolation implemented[1]; the incremental cost of SERIALIZABLE isolation over REPEATABLE READ for several standard benchmarking loads is available in section 8 of a paper presented at a VLDB conference[2]. Hopefully that helps some.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] PostgreSQL current online documentation. Transaction Isolation. https://www.postgresql.org/docs/current/static/transaction-iso.html
[2] Dan R. K. Ports and Kevin Grittner. 2012. Serializable Snapshot Isolation in PostgreSQL. Proceedings of the VLDB Endowment, Vol. 5, No. 12. The 38th International Conference on Very Large Data Bases, August 27th-31st 2012, Istanbul, Turkey. http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf
On Fri, Jul 8, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
>> tuple freezing, page compaction, and index entry removal each into
>> separate tasks.
>
> Uh ... wouldn't that tend to make things worse? The knocks on VACUUM are
> too much I/O and too much latency for cleanup, and I can't see how
> splitting it does anything good on either score.

Has anyone ever done any kind of write-up of the "TED" design that was discussed during FOSDEM (I hope I recall the name it was given correctly)? Apparently that's something that's been discussed a few times among senior community members, and I think it has promise.

--
Peter Geoghegan
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the overheads affiliated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO) what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.

That's going to be hard to measure. First, what you didn't say is, 'with respect to what?'. You mention WAL for example. WAL is more of a crash safety mechanism than anything, and it's not really fair to include it in an analysis of 'MVCC overhead', or at least not completely.

One thing that MVCC *does* objectively cause is bloat, although you can still get bloat without MVCC if you (for example) delete rows or rewrite rows such that they can't fit in their old slot. MVCC definitely incurs some runtime overhead to check visibility, but the amount of overhead is highly dependent on the specific workload. Postgres 'hint bits' reduce the cost to near zero for many workloads, but in other workloads they are expensive to maintain and cause a lot of extra traffic. One nice feature about not having to worry about visibility is that you can read data directly out of the index. We have some workarounds to deal with that (the 'all visible' bit), but again the amount of benefit from that strategy is going to be very situation specific.

Stepping back, the overhead of MVCC in postgres (and probably other systems too) has been continually reduced over the years -- the really nasty parts have been relegated to background cleanup processing. That processing is pretty sequential, and the 'i/o bottleneck' is finally getting solved on cheap storage, pushing things back into the cpu space.

In summary, I think the future of MVCC and transactional systems is very bright, and the future of data management systems that discard transactional safety in order to get some short-term performance gains is, uh, not so bright. Transactions are essential in systems where data integrity matters.

merlin
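If you want to see the hint bits directly, contrib/pageinspect exposes them; a rough illustration (table name and page number are made up):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT lp, t_xmin, t_xmax,
           (t_infomask & 256) <> 0 AS xmin_committed_hint  -- HEAP_XMIN_COMMITTED = 0x0100
    FROM heap_page_items(get_raw_page('some_table', 0));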
Peter Geoghegan wrote:
> Has anyone ever done any kind of write-up of the "TED" design that was
> discussed during FOSDEM (I hope I recall the name it was given
> correctly)? Apparently that's something that's been discussed a few
> times among senior community members, and I think it has promise.

https://www.postgresql.org/message-id/55511D1F.7050902@iki.fi

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Please see comment at the bottom of this post.

On 08/07/16 10:48, Pete Stevenson wrote:
> Good info, thanks for the note. Agreed that it is difficult to pull
> things apart to isolate these features for offload — so actually
> running experiments with offload is not possible, as you point out
> (and for other reasons).
> [...]

In this list, the convention is to post replies at the end (with some rare exceptions), or interspersed when appropriate, and to omit the parts that are no longer relevant. The motivation for bottom posting like this is that people get to see the context before the reply, AND emails don't end up getting longer & longer as people reply at the beginning, forgetting to trim the now-irrelevant stuff at the end.

Cheers,
Gavin