Thread: A not so good comparison of MVCC implementations

A not so good comparison of MVCC implementations

From

Thomas Kellerer

Date:

26 January 2018, 12:53:40

https://dzone.com/articles/database-design-decisions-for-multi-version-concur

That doesn't make Postgres look particularly good

Re: A not so good comparison of MVCC implementations

From

Stephen Frost

Date:

26 January 2018, 15:22:51

Greetings,

* Thomas Kellerer (spam_eater@gmx.net) wrote:
> https://dzone.com/articles/database-design-decisions-for-multi-version-concur
>
> That doesn't make Postgres look particularly good

While interesting, if I'm following the paper correctly, they didn't
actually test *Postgres*, they tested their own implementation of how PG
works using "Peloton".  They also, apparently, discounted latency pretty
heavily given that their graph shows their "PG" implementation having
the lowest latency of all of the options.  If my reading is correct and
they didn't actually test these systems but just their own
implementation then it strikes me that this paper and those graphs are
particularly disingenuous and throw around these product names
specifically to try and garner attention.  The findings in the paper may
still be useful, of course, but it's unclear how much real-world
implication they have for users of the different products and if one
product would work better for a given user or workload than another.

One thing mentioned is the idea, again, of having indexes which include
the primary key of the table (a logical ID instead of the physical tuple
location), which has been discussed before and for which patches have
been proposed.  That seemed to be combined with the idea of flipping HOT
chains to have the latest version first instead of last in the chain.
Using logical IDs instead of physical ones can reduce the updates
required for indexes on tables which have more than just the primary key
and where the primary key only rarely changes, since changing the
primary key is what ends up becoming more expensive.  Of course, that
also means that double-lookups are required when using those
non-primary-key indexes, which may explain the higher latency seen in
the approaches tested which use that.
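
As a rough illustration of the trade-off described above, here is a toy
Python model.  It is not PostgreSQL or Peloton code, and it ignores HOT,
vacuum, and visibility checks.  The class names, the "email" column, and
the write/probe counters are all made up for the example.  It assumes
that every update writes a new row version at a new location and never
changes the primary key.

```python
# Toy model only: NOT PostgreSQL or Peloton code. It ignores HOT, vacuum,
# and visibility rules. All class, column, and counter names are hypothetical.


class PhysicalPointerTable:
    """Every index maps a key to the physical slot of the newest version."""

    def __init__(self):
        self.heap = []           # append-only list of row versions
        self.pk_index = {}       # id -> heap slot
        self.email_index = {}    # email -> heap slot
        self.index_writes = 0    # number of index entries written

    def insert(self, row):
        self.heap.append(dict(row))
        slot = len(self.heap) - 1
        self.pk_index[row["id"]] = slot
        self.email_index[row["email"]] = slot
        self.index_writes += 2

    def update(self, pk, **changes):
        # Assumes `changes` never includes the primary key.
        old = self.heap[self.pk_index[pk]]
        new = {**old, **changes}
        self.heap.append(new)
        slot = len(self.heap) - 1
        # The row moved, so both indexes must be repointed.
        self.email_index.pop(old["email"], None)
        self.pk_index[pk] = slot
        self.email_index[new["email"]] = slot
        self.index_writes += 2

    def find_by_email(self, email):
        # One index probe leads straight to the row.
        return self.heap[self.email_index[email]], 1


class LogicalPointerTable:
    """The secondary index stores the primary key, not a heap slot."""

    def __init__(self):
        self.heap = []
        self.pk_index = {}       # id -> heap slot
        self.email_index = {}    # email -> id (logical pointer)
        self.index_writes = 0

    def insert(self, row):
        self.heap.append(dict(row))
        self.pk_index[row["id"]] = len(self.heap) - 1
        self.email_index[row["email"]] = row["id"]
        self.index_writes += 2

    def update(self, pk, **changes):
        # Assumes `changes` never includes the primary key.
        old = self.heap[self.pk_index[pk]]
        new = {**old, **changes}
        self.heap.append(new)
        self.pk_index[pk] = len(self.heap) - 1
        self.index_writes += 1
        # The secondary index is touched only if its own key changed.
        if new["email"] != old["email"]:
            del self.email_index[old["email"]]
            self.email_index[new["email"]] = pk
            self.index_writes += 1

    def find_by_email(self, email):
        # Double lookup: email -> id, then id -> heap slot.
        pk = self.email_index[email]
        return self.heap[self.pk_index[pk]], 2


phys, logi = PhysicalPointerTable(), LogicalPointerTable()
for table in (phys, logi):
    table.insert({"id": 1, "email": "a@example.com", "name": "v0"})
    for n in range(1, 4):
        table.update(1, name=f"v{n}")  # updates a non-indexed column only

print("index writes:", phys.index_writes, "vs", logi.index_writes)
print("probes per lookup:", phys.find_by_email("a@example.com")[1],
      "vs", logi.find_by_email("a@example.com")[1])
```

In this toy, one insert plus three updates to a non-indexed column costs
8 index writes with physical pointers and 5 with logical ones, while a
lookup through the secondary index takes 1 probe versus 2.  The gap in
writes would grow with each additional secondary index.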

This was just a quick review of the paper and article, just to be clear,
but it doesn't strike me as particularly concerning.  Unsurprisingly,
there are lots of trade-offs to be made and we continue to look at
ways to make PostgreSQL more flexible to allow users to choose which
trade-offs work best for their workload.

Thanks!

Stephen

Re: A not so good comparison of MVCC implementations

From

Robert Haas

Date:

26 January 2018, 21:08:35

On Fri, Jan 26, 2018 at 7:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Thomas Kellerer (spam_eater@gmx.net) wrote:
>> https://dzone.com/articles/database-design-decisions-for-multi-version-concur
>>
>> That doesn't make Postgres look particularly good
>
> While interesting, if I'm following the paper correctly, they didn't
> actually test *Postgres*, they tested their own implementation of how PG
> works using "Peloton".

Yeah, that's really deceptive.

> They also, apparently, discounted latency pretty
> heavily given that their graph shows their "PG" implementation having
> the lowest latency of all of the options.

Also, they seem to be comparing against PostgreSQL with SSI running
(transaction isolation level serializable) which is not actually the
way that people typically configure PostgreSQL.

The point of the article seems to be to say that NuoDB made some good
design decisions, rather than to objectively compare existing systems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: A not so good comparison of MVCC implementations

From

Christophe Pettus

Date:

26 January 2018, 21:27:00

> On Jan 26, 2018, at 10:08, Robert Haas <robertmhaas@gmail.com> wrote:
>
> The point of the article seems to be to say that NuoDB made some good
> design decisions, rather than to objectively compare existing systems.

It does remind me a bit of the Uber paper, in that they started with a
technical decision they had already made, and worked backwards from
there.

Re: A not so good comparison of MVCC implementations

From

"Jonathan S. Katz"

Date:

26 January 2018, 22:07:21

Hi,

> On Jan 26, 2018, at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 26, 2018 at 7:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Thomas Kellerer (spam_eater@gmx.net) wrote:
>>> https://dzone.com/articles/database-design-decisions-for-multi-version-concur
>>>
>>> That doesn't make Postgres look particularly good
>>
>> While interesting, if I'm following the paper correctly, they didn't
>> actually test *Postgres*, they tested their own implementation of how PG
>> works using "Peloton".
>
> Yeah, that's really deceptive.

Skimming the paper, it also does not mention which versions of the software
are being used.  Ideally how the DBs were configured on the hardware
would be great to see too, but that may be asking too much.

>> They also, apparently, discounted latency pretty
>> heavily given that their graph shows their "PG" implementation having
>> the lowest latency of all of the options.
>
> Also, they seem to be comparing against PostgreSQL with SSI running
> (transaction isolation level serializable) which is not actually the
> way that people typically configure PostgreSQL.
>
> The point of the article seems to be to say that NuoDB made some good
> design decisions, rather than to objectively compare existing systems.

So the question is if and how we respond.  From a scan of the Twittersphere
I do not see much talk about the paper, so I would not give it that much
thought at this point and would not advocate for proactively addressing it.

However, if anyone wants to independently benchmark it and provide some fair
comparisons, that is something that we’ve certainly promoted through Planet
PostgreSQL.

Additionally, if anyone wants to respond to others who are referencing that
paper, e.g. on Twitter, there are enough sound points in this thread alone to
help make the case for Postgres even without additional data.

> Unsurprisingly,
> there are lots of trade-offs to be made and we continue to look at
> ways to make PostgreSQL more flexible to allow users to choose which
> trade-offs work best for their workload.

+1

Jonathan

Re: A not so good comparison of MVCC implementations

From

Robert Haas

Date:

27 January 2018, 03:03:09

On Fri, Jan 26, 2018 at 2:07 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote:
>> Yeah, that's really deceptive.
>
> Skimming the paper, it also does not mention which versions of the software
> are being used.  Ideally how the DBs were configured on the hardware
> would be great to see too, but that may be asking too much.

That's because they didn't use *any* version of PostgreSQL.  They
tested something that they claim works *like* PostgreSQL but is
actually not the PostgreSQL code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: A not so good comparison of MVCC implementations

From

"Jonathan S. Katz"

Date:

29 January 2018, 01:07:43

Hi Robert,

> On Jan 26, 2018, at 7:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 26, 2018 at 2:07 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>> Yeah, that's really deceptive.
>>
>> Skimming the paper, it also does not mention which versions of the software
>> are being used.  Ideally how the DBs were configured on the hardware
>> would be great to see too, but that may be asking too much.
>
> That's because they didn't use *any* version of PostgreSQL.  They
> tested something that they claim works *like* PostgreSQL but is
> actually not the PostgreSQL code.

To clarify, that comment was based on all the databases they were using,
not just PostgreSQL.

Thanks,

Jonathan

Re: A not so good comparison of MVCC implementations

From

Michael Paquier

Date:

29 January 2018, 07:53:38

On Sun, Jan 28, 2018 at 05:07:43PM -0500, Jonathan S. Katz wrote:
>> On Jan 26, 2018, at 7:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Fri, Jan 26, 2018 at 2:07 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>>> Yeah, that's really deceptive.
>>>
>>> Skimming the paper, it also does not mention which versions of the software
>>> are being used.  Ideally how the DBs were configured on the hardware
>>> would be great to see too, but that may be asking too much.
>>
>> That's because they didn't use *any* version of PostgreSQL.  They
>> tested something that they claim works *like* PostgreSQL but is
>> actually not the PostgreSQL code.
>
> To clarify, that comment was based on all the databases they were using,
> not just PostgreSQL.

Their article never uses "configuration" or "configure", and makes no
mention of what kind of tuning they did for any of the systems.
--
Michael
