Thread: What limits Postgres performance when the whole database lives in cache?
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS SQL Server, Postgres, given enough memory that the entire database lives in cache, the server will spend 96% of its memory cycles on unproductive overhead. This includes buffer management, locking, latching (thread/CPU conflicts) and recovery (including log file reads and writes).
[Enough memory in this case assumes that for just about any business, 1TB is enough. The intent of his argument is that a server designed correctly for it would run 25x faster.]
I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Re: What limits Postgres performance when the whole database lives in cache?
On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
> Re this talk given by Michael Stonebraker:
>
> http://slideshot.epfl.ch/play/suri_stonebraker
>
> He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2,
> MS SQL Server, Postgres, given enough memory that the entire database
> lives in cache, the server will spend 96% of its memory cycles on
> unproductive overhead. This includes buffer management, locking, latching
> (thread/CPU conflicts) and recovery (including log file reads and writes).
>
> [Enough memory in this case assumes that for just about any business, 1TB
> is enough. The intent of his argument is that a server designed correctly
> for it would run 25x faster.]
>
> I wondered if there are any figures or measurements on Postgres
> performance in this ‘enough memory’ environment to support or contest
> this point of view?

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

Two totally different design concepts. It's apples and oranges to
compare them.
Re: What limits Postgres performance when the whole database lives in cache?
On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
> > Re this talk given by Michael Stonebraker:
> >
> > http://slideshot.epfl.ch/play/suri_stonebraker
> >
> >
> >
> > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
> > SQL Server, Postgres, given enough memory that the entire database lives in
> > cache, the server will spend 96% of its memory cycles on unproductive
> > overhead. This includes buffer management, locking, latching (thread/CPU
> > conflicts) and recovery (including log file reads and writes).
I think those numbers are overblown, and more PR than reality.
But there certainly are some things that can be made more efficient if
you don't care about durability and replication.
> > I wondered if there are any figures or measurements on Postgres performance
> > in this ‘enough memory’ environment to support or contest this point of
> > view?
I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ... workloads
is going to look different, and the overheads are elsewhere.
I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".
> What limits postgresql when everything fits in memory? The fact that
> it's designed to survive a power outage and not lose all your data.
>
> Stonebraker's new stuff is cool, but it is NOT designed to survive
> total power failure.
>
> Two totally different design concepts. It's apples and oranges to compare them.
I don't think they're that fundamentally different.
Greetings,
Andres Freund
Re: What limits Postgres performance when the whole database lives in cache?
On Fri, Sep 2, 2016 at 10:32:46AM -0700, Andres Freund wrote:
> On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
> > On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
> > > Re this talk given by Michael Stonebraker:
> > >
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> > > DB2, MS SQL Server, Postgres, given enough memory that the entire
> > > database lives in cache, the server will spend 96% of its memory
> > > cycles on unproductive overhead. This includes buffer management,
> > > locking, latching (thread/CPU conflicts) and recovery (including log
> > > file reads and writes).
>
> I think those numbers are overblown, and more PR than reality.
>
> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.

Agreed. Stonebraker measured Shore DBMS, which is an academic database:

	http://research.cs.wisc.edu/shore/

If he had measured a production-quality database that had been
optimized like Postgres, I would take more stock of his "overhead"
numbers.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
Re: What limits Postgres performance when the whole database lives in cache?
On Fri, Sep 2, 2016 at 10:32 AM, Andres Freund <andres@anarazel.de> wrote:
>> > I wondered if there are any figures or measurements on Postgres
>> > performance in this ‘enough memory’ environment to support or contest
>> > this point of view?
>
> I don't think that's really answerable without individual use-cases in
> mind. Answering that question for analytics, operational, ... workloads
> is going to look different, and the overheads are elsewhere.
>
> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".

+1

At one point, Stonebraker was regularly claiming that "crabbing" of
buffer locks in B-Trees was a fundamental overhead paid in systems
more or less based on System R. He did eventually start to acknowledge
that Lehman and Yao figured out a technique that made that untrue in
1981, if only barely [1], but the lesson for me was to take his claims
in this area with a generous pinch of salt.

[1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
    (See his citation 11)

--
Peter Geoghegan
> > Re this talk given by Michael Stonebraker:
> >
> > http://slideshot.epfl.ch/play/suri_stonebraker
> >
> > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> > DB2, MS SQL Server, Postgres, given enough memory that the entire
> > database lives in cache, the server will spend 96% of its memory
> > cycles on unproductive overhead. This includes buffer management,
> > locking, latching (thread/CPU conflicts) and recovery (including log
> > file reads and writes).
> >
> > [Enough memory in this case assumes that for just about any business,
> > 1TB is enough. The intent of his argument is that a server designed
> > correctly for it would run 25x faster.]
> >
> > I wondered if there are any figures or measurements on Postgres
> > performance in this ‘enough memory’ environment to support or contest
> > this point of view?
>
> What limits postgresql when everything fits in memory? The fact that
> it's designed to survive a power outage and not lose all your data.
>
> Stonebraker's new stuff is cool, but it is NOT designed to survive
> total power failure.

I don't think this is quite true. The mechanism he proposes has a small
window in which committed transactions can be lost, and this should be
addressed by replication or by a small amount of UPS (a few seconds).

But that isn't my question: I'm asking whether anyone *knows* any
comparable figures for Postgres. IOW, how much performance gain might be
available for different design choices.

> Two totally different design concepts. It's apples and oranges to
> compare them.

Not to an end user. A system that runs 10x faster on OLTP and provides
all the same functionality is a direct competitor.

Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
> > > http://slideshot.epfl.ch/play/suri_stonebraker
> > >
> > > He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
> > > DB2, MS SQL Server, Postgres, given enough memory that the entire
> > > database lives in cache, the server will spend 96% of its memory
> > > cycles on unproductive overhead. This includes buffer management,
> > > locking, latching (thread/CPU conflicts) and recovery (including log
> > > file reads and writes).
>
> I think those numbers are overblown, and more PR than reality.

Did you check out the presentation? He presents figures obtained by
experiment from instrumentation. Even if it's only 90% instead of 96%,
he has a point.

> But there certainly are some things that can be made more efficient if
> you don't care about durability and replication.

He cares plenty. Durability and high availability both rely on active
replication.

> > > I wondered if there are any figures or measurements on Postgres
> > > performance in this ‘enough memory’ environment to support or
> > > contest this point of view?
>
> I don't think that's really answerable without individual use-cases in
> mind. Answering that question for analytics, operational, ... workloads
> is going to look different, and the overheads are elsewhere.

That's like saying we don't have any figures for how fast your car will
go: it depends on who's driving and how many passengers. My answer is:
yes, of course, but you can still provide figures for some specific set
of conditions, and they'll be better than none at all.

> I personally think that each implementation's restrictions are more
> likely to be an issue than anything "fundamental".

Unlikely. But you can still obtain figures.

> > What limits postgresql when everything fits in memory? The fact that
> > it's designed to survive a power outage and not lose all your data.
> >
> > Stonebraker's new stuff is cool, but it is NOT designed to survive
> > total power failure.
> >
> > Two totally different design concepts. It's apples and oranges to
> > compare them.
>
> I don't think they're that fundamentally different.

Agreed.

Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
> Agreed. Stonebraker measured Shore DBMS, which is an academic
> database:
>
> http://research.cs.wisc.edu/shore/
>
> If he had measured a production-quality database that had been
> optimized like Postgres, I would take more stock of his "overhead"
> numbers.

Exactly! And that's what I'm asking: has anyone done, or does anyone
know of, any figures for Postgres to set against these?

Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Re: What limits Postgres performance when the whole database lives in cache?
On Sat, Sep 3, 2016 at 10:45:47AM +1000, dandl wrote:
> > Agreed. Stonebraker measured Shore DBMS, which is an academic
> > database:
> >
> > http://research.cs.wisc.edu/shore/
> >
> > If he had measured a production-quality database that had been
> > optimized like Postgres, I would take more stock of his "overhead"
> > numbers.
>
> Exactly! And that's what I'm asking: has anyone done, or does anyone
> know of, any figures for Postgres to set against these?

Uh, well, there are Postgres tools that measure the overhead of locking
on queries and stuff. I don't know any numbers myself.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
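For concreteness, one built-in place to look (in 9.6 and later) is the
wait_event instrumentation in pg_stat_activity. A crude sketch, assuming
a workload is running in other sessions: sample it repeatedly and see
whether active backends are mostly on CPU (NULL wait_event) or sitting
on locks and lwlocks.

    -- Sample what active backends are waiting on (run repeatedly while
    -- the workload is active; wait-event coverage has grown with each
    -- release, so older versions show fewer wait points).
    SELECT wait_event_type, wait_event, count(*)
    FROM   pg_stat_activity
    WHERE  state = 'active'
    GROUP  BY 1, 2
    ORDER  BY 3 DESC;

This only shows waiting, not where CPU time goes, so at best it bounds
the locking/latching part of Stonebraker's claim rather than the whole
96%.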
> >> > I wondered if there are any figures or measurements on Postgres
> >> > performance in this ‘enough memory’ environment to support or
> >> > contest this point of view?
> >
> > I don't think that's really answerable without individual use-cases in
> > mind. Answering that question for analytics, operational, ... workloads
> > is going to look different, and the overheads are elsewhere.
> >
> > I personally think that each implementation's restrictions are more
> > likely to be an issue than anything "fundamental".
>
> +1
>
> At one point, Stonebraker was regularly claiming that "crabbing" of
> buffer locks in B-Trees was a fundamental overhead paid in systems
> more or less based on System R. He did eventually start to acknowledge
> that Lehman and Yao figured out a technique that made that untrue in
> 1981, if only barely [1], but the lesson for me was to take his claims
> in this area with a generous pinch of salt.
>
> [1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
>     (See his citation 11)

The paper is substantially in agreement with the presentation I quoted.
If there are differences in detail, they certainly don't dominate his
argument.

IMO your claim is far weaker. What specifically do you say is wrong
about his current claims, and on what facts do you base it?

In any event, I am not backing his claims. I am simply asking: does
anyone have any facts to support or refute his 96% claim as applied to
Postgres?

Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Re: What limits Postgres performance when the whole database lives in cache?
On Fri, Sep 2, 2016 at 8:36 PM, dandl <david@andl.org> wrote:
> The paper is substantially in agreement with the presentation I quoted.
> If there are differences in detail, they certainly don't dominate his
> argument.

My point is that the paper is rather light on details of the kind that
are really important. And it's noteworthy that Stonebraker has in the
past, during presentations, emphasized the buffer lock crabbing/latch
coupling thing *at length*, even though it's a totally solved problem.

It's also true that Postgres has become vastly more scalable in the
past few years due to optimization that doesn't change the fundamental
nature of the system at all, so it's very easy to imagine individual
differences being more important than differences between major classes
of system.

Those are facts. You may take from them what you will.

> IMO your claim is far weaker. What specifically do you say is wrong
> about his current claims, and on what facts do you base it?

I'm not the one making overarching conclusions. I'm not trying to
convince you of anything.

--
Peter Geoghegan
On 3 September 2016 at 04:36, dandl <david@andl.org> wrote:
> In any event, I am not backing his claims. I am simply asking: does
> anyone have any facts to support or refute his 96% claim as applied to
> Postgres?

If that is scientific research he will publish evidence. If not, it's
just words and no refutation is required.

Anybody can come here and discuss new features. Anybody. They just need
to explain their thoughts and produce evidence for their assertions.

Come on in, database researchers, we're open to rational contributions.

--
Simon Riggs                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/2/16 7:39 PM, dandl wrote:
> I don't think this is quite true. The mechanism he proposes has a small
> window in which committed transactions can be lost, and this should be
> addressed by replication or by a small amount of UPS (a few seconds).

Except that's the entire point where all those kinds of solutions
*completely* depart ways from Postgres. Postgres is designed to *lose
absolutely no data after a COMMIT*, potentially including requiring
that data to be synchronized out to a second server. That is worlds
apart from "we might lose a few seconds", and there's a lot of stuff
Postgres has to worry about to accomplish that. Some of that stuff can
be short-circuited if you don't care (that's what SET
synchronous_commit = off does), but there's always going to be some
amount of extra work to support synchronous_commit = local or
remote_*.

Presumably there's more improvements that could be made to Postgres in
this area, but if you really don't care about losing seconds worth of
data and you need absolutely the best performance possible then maybe
Postgres isn't the right choice for you.

"All databases suck, each one just sucks in a different way." - Me,
circa 1999.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
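For reference, the knobs Jim mentions can be exercised directly. A
hedged sketch of the usual ways to trade durability for speed on stock
Postgres (the table definition below is made up for illustration):

    -- Per session or per transaction: COMMIT no longer waits for the WAL
    -- flush. A crash can lose roughly the last few hundred milliseconds
    -- of commits, but cannot corrupt the database.
    SET synchronous_commit = off;

    -- Per table: bypass WAL for this table entirely. Much faster for
    -- write-heavy data you can afford to lose; the table is truncated
    -- after a crash and is not carried by WAL-based replication.
    CREATE UNLOGGED TABLE traffic_sample (
        captured_at timestamptz NOT NULL,
        payload     jsonb
    );

These settings relax the commit-time disk wait (and, for unlogged
tables, the WAL traffic), but the buffer manager, locking and latching
that Stonebraker's argument is about are still there.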
> > I don't think this is quite true. The mechanism he proposes has a
> > small window in which committed transactions can be lost, and this
> > should be addressed by replication or by a small amount of UPS (a few
> > seconds).
>
> Except that's the entire point where all those kinds of solutions
> *completely* depart ways from Postgres. Postgres is designed to *lose
> absolutely no data after a COMMIT*, potentially including requiring
> that data to be synchronized out to a second server. That is worlds
> apart from "we might lose a few seconds", and there's a lot of stuff
> Postgres has to worry about to accomplish that. Some of that stuff can
> be short-circuited if you don't care (that's what SET
> synchronous_commit = off does), but there's always going to be some
> amount of extra work to support synchronous_commit = local or
> remote_*.

I understand that. What I'm trying to get a handle on is the magnitude
of that cost and how it influences other parts of the product,
specifically for Postgres. If the overhead for perfect durability were
(say) 10%, few people would care about the cost. But Stonebraker puts
the figure at 2500%! His presentation says that a pure relational
in-memory store can beat a row store with disk fully cached in memory
by 10x to 25x. [Ditto column stores beat row stores by 10x for complex
queries in non-updatable data.]

So my question is not to challenge the Postgres way. It's simply to ask
whether there are any known figures that would directly support or
refute his claims. Does Postgres really spend 96% of its time in
thumb-twiddling once the entire database resides in memory?

> Presumably there's more improvements that could be made to Postgres in
> this area, but if you really don't care about losing seconds worth of
> data and you need absolutely the best performance possible then maybe
> Postgres isn't the right choice for you.

Achieving durability for an in-memory database requires either UPS or
active replication or both, which is an additional cost that is not
needed for every application. My question precedes that one: is there a
big performance gain there for the taking, or is it smoke and mirrors?

Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Re: What limits Postgres performance when the whole database lives in cache?
> I understand that. What I'm trying to get a handle on is the magnitude
> of that cost and how it influences other parts of the product,
> specifically for Postgres. If the overhead for perfect durability were
> (say) 10%, few people would care about the cost. But Stonebraker puts
> the figure at 2500%! His presentation says that a pure relational
> in-memory store can beat a row store with disk fully cached in memory
> by 10x to 25x. [Ditto column stores beat row stores by 10x for complex
> queries in non-updatable data.]
>
> So my question is not to challenge the Postgres way. It's simply to ask
> whether there are any known figures that would directly support or
> refute his claims. Does Postgres really spend 96% of its time in
> thumb-twiddling once the entire database resides in memory?
Alas, I've been unable to find any relevant benchmark. I'm not motivated enough to install a PostgreSQL and VoltDB and try it for myself :-)
On 9/8/16 3:15 AM, Nicolas Grilly wrote:
> > So my question is not to challenge the Postgres way. It's simply to
> > ask whether there are any known figures that would directly support
> > or refute his claims. Does Postgres really spend 96% of its time in
> > thumb-twiddling once the entire database resides in memory?
>
> Alas, I've been unable to find any relevant benchmark. I'm not
> motivated enough to install a PostgreSQL and VoltDB and try it for
> myself :-)

My guess is this is a test scenario that completely favors VoltDB while
hamstringing Postgres, such as using no transaction durability at all
in VoltDB while using maximum durability in Postgres. Comparing the
cost of every COMMIT doing an fsync vs not could certainly produce a
25x difference. There could be other cases where you'd get a 25x
difference.

You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark it
created specifically for that purpose that had very little to do with
the real world. People eventually discovered that as soon as you had a
concurrent workload Postgres was actually faster.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
Re: What limits Postgres performance when the whole database lives in cache?
> From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
> My guess is this is a test scenario that completely favors VoltDB
> while hamstringing Postgres, such as using no transaction durability
> at all in VoltDB while using maximum durability in Postgres. Comparing
> the cost of every COMMIT doing an fsync vs not could certainly produce
> a 25x difference. There could be other cases where you'd get a 25x
> difference.
I guess my question then is: how much do you pay for that durability? If you benchmark Postgres configured for pure in-memory usage with absolutely no writes to disk (or SSD or network), where is it spending its time? Is there a lot of overhead in getting data in and out of cache buffers and conversions and in concurrency control?
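As an aside, here is a minimal sketch of how one could at least verify
the "everything in cache, no reads" premise on the Postgres side before
profiling; the table name t is hypothetical and the pg_prewarm contrib
module (9.4+) is assumed to be installed:

    -- Load the table into shared_buffers explicitly, rather than relying
    -- on the OS page cache.
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('t');

    -- With the data resident, BUFFERS should report only "shared hit"
    -- blocks and no "read" blocks, i.e. the read path does no I/O.
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t;

That only establishes the premise; it says nothing about where the
remaining cycles go, which is the question I'm asking.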
As a case study, assume an RDBMS is required to monitor and record Internet (or phone or VHF) traffic. If the power goes off, the traffic continues, and it really doesn’t matter whether you lose 60 seconds of downtime or 63 seconds; in any case another instance in another data centre will pick up the slack. So the requirement is atomicity yes, but not durability. Should you bid Postgres for the job, or look elsewhere? How much slower would Postgres be than a competitor? Do we care?
> You need to be careful of benchmarks from commercial companies. MySQL
> used to tout how fast it was compared to Postgres, using a benchmark
> it created specifically for that purpose that had very little to do
> with the real world. People eventually discovered that as soon as you
> had a concurrent workload Postgres was actually faster.
Of course; but at the same time insisting on including durability favours Postgres when I'm actually asking about alternatives.
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule

> > I guess my question then is: how much do you pay for that durability?
> > If you benchmark Postgres configured for pure in-memory usage with
> > absolutely no writes to disk (or SSD or network), where is it spending
> > its time? Is there a lot of overhead in getting data in and out of
> > cache buffers and conversions and in concurrency control?
>
> It is not about durability only.
>
> Postgres holds data in a format equal or similar to the data saved on
> persistent storage. There is repeated serialization and deserialization.
> Some structures are designed to be simply saved (like the B-tree), but
> performance is a secondary target.
>
> I believe new in-memory databases can be 10-100x faster, depending on
> the use case, because they hold data primarily in memory and use
> different data structures. The performance of these databases is great
> when all the data is well placed in memory all the time, but pretty bad
> when that is not the case. There is another issue: when you increase the
> speed of database write operations, you will probably hit file system
> limits and spin lock issues, which is one reason why big systems are
> based more and more on distributed systems.

That’s the point I’m making, exactly. The question is: does anyone have
a handle on how big that cost really is, as a guide to whether to try to
do anything about it? Is it really 25x as Stonebraker says?
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Re: What limits Postgres performance when the whole database lives in cache?
> That’s the point I’m making, exactly. The question is: does anyone have
> a handle on how big that cost really is, as a guide to whether to try
> to do anything about it? Is it really 25x as Stonebraker says?
I did some benchmarks of MonetDB and it is really pretty fast for OLAP.
You could try implementing the TPC-B benchmark in C (without SQL - the cost of SQL is not significant) and check it for yourself.
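For what it's worth, pgbench already ships a TPC-B-like workload, so a
first-order number doesn't require writing C. Its default transaction is
roughly the following (the :variables are filled in by pgbench for each
execution); running it against a fully cached scale factor once with
synchronous_commit = on and once with it off gives a crude bracket on
what commit durability alone costs, though not on the buffer-manager and
locking overheads in Stonebraker's 96% figure.

    -- Approximation of pgbench's default TPC-B-like transaction script.
    BEGIN;
    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    END;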