Re: Four issues why "old elephants" lack performance: Explanation sought - Mailing list pgsql-general
From | Chris Travers |
---|---|
Subject | Re: Four issues why "old elephants" lack performance: Explanation sought |
Date | |
Msg-id | CAKt_ZfsJ6aOj7nkrGUXOMSBrFHExR=Eey6tbYdqSzVhYJ_dC2g@mail.gmail.com |
In response to | Re: Four issues why "old elephants" lack performance: Explanation sought (Stefan Keller <sfkeller@gmail.com>) |
Responses |
Re: Four issues why "old elephants" lack performance: Explanation sought
Re: Four issues why "old elephants" lack performance: Explanation sought |
List | pgsql-general |
On Mon, Feb 27, 2012 at 3:46 AM, Stefan Keller <sfkeller@gmail.com> wrote:
> Hi,...
> 2012/2/27 Chris Travers <chris.travers@gmail.com> wrote:
>>> 1. Buffering Pool
>>>
>>> To get rid of I/O bounds Mike proposes in-memory database structures.
>>> Now I'm still wondering why PG couldn't realize that, probably in
>>> combination with unlogged tables? I don't have an overview of the
>>> respective code, but I think it's worthwhile to discuss even if
>>> implementation of memory-oriented structures would be too difficult.
>>
>> The reason is that the data structures assume disk-based storage, so
>> they are written to be efficient to look up on disk but not as efficient in
>> memory.
>
> That means that this could be enhanced in PG. Is there really no
> research or implementation project going on in the direction where all
> table content can be held in memory? This could be enhanced in many
> ways (besides optimized in-memory structures), including index-only scans.
So if I want to read a row in the middle of a table and it is currently on disk, I have to read the whole table into memory first? I see so many problems with trying to keep an in-memory structure backed on disk that I just don't see that happening.
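To make the disk-vs-memory point concrete, here's a toy sketch (mine, and nothing like the actual PostgreSQL code): a disk-oriented index keeps fixed-size pages and pays a page access through something like a buffer pool on every probe, while a purely memory-oriented structure doesn't carry that indirection.

```python
# Toy sketch (not PostgreSQL internals): a disk-oriented index keeps
# fixed-size pages and "reads" a page on every lookup, while a
# memory-oriented structure can just probe directly.

import bisect

PAGE_SIZE = 4  # keys per page; real pages are e.g. 8 kB

class PagedIndex:
    """B-tree-ish leaf level: sorted keys split into fixed-size pages."""
    def __init__(self, keys):
        keys = sorted(keys)
        self.pages = [keys[i:i + PAGE_SIZE]
                      for i in range(0, len(keys), PAGE_SIZE)]
        self.page_reads = 0  # stands in for buffer-pool traffic

    def contains(self, key):
        # Find the page that could hold the key, "read" it, then search it.
        lows = [p[0] for p in self.pages]
        i = bisect.bisect_right(lows, key) - 1
        if i < 0:
            return False
        self.page_reads += 1
        return key in self.pages[i]

keys = list(range(0, 100, 3))
paged = PagedIndex(keys)
in_memory = set(keys)  # a memory-oriented structure: no page indirection

assert paged.contains(27) and 27 in in_memory
assert not paged.contains(28) and 28 not in in_memory
print(paged.page_reads)  # 2 probes -> 2 page accesses
```

The layout that makes the paged version cheap to fetch from disk is exactly what makes it slower than the plain in-memory structure once everything is resident.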
>> Note that VoltDB is a niche product and Stonebraker makes this pretty
>> clear. However, the more interesting question is what the tradeoffs are
>> when looking at VoltDB vs Postgres-XC.
>
> Ok, that's interesting too. Postgres-XC (eXtensible Cluster) is a
> multi-master, write-scalable PostgreSQL cluster based on a
> shared-nothing architecture. But I'm thinking more about enhancing
> PostgreSQL core.
>
> VoltDB a niche product? Look at his USENIX conference speech
> (http://www.youtube.com/watch?v=uhDM4fcI2aI ) at minute 11:53: there he
> puts VoltDB into "high OLTP" and the "old elephants" into the "low"
> category, which only gets the crevices and which has "The Innovator's
> Dilemma". Strangely, he doesn't consider PostgreSQL to be an elephant...
> His point, as I understood it, is that open source databases like
> PostgreSQL will drive the proprietary equivalents out of the market
> because they cannot meet these specific challenges. That's why the
> crevices exist between open source and the niche corners of his triangle.
>
> My thesis is that PG doesn't necessarily have that problem, because
> it's open source and can be (and has been) refactored since then.
If you watch the USENIX speech closely you will notice something. Two of the major sources of overhead have to do with durability (you know, writing to the hard disk), and the other two have to do with concurrency. His solution is to get rid of intra-machine durability (durability only exists distributed throughout the cluster in VoltDB), and also to get rid of concurrency. As he puts it, there's no need to deschedule a process that's going to run for only 50 microseconds.

This is why it is a niche product: it requires a specialized environment to run with real ACID compliance. This isn't something like hydraulics in backhoes; it's a fundamental attribute of their innovative architecture.
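The durability half of that overhead is essentially the following pattern (a minimal sketch with a plain file standing in for a write-ahead log; the names are mine, not PostgreSQL's or VoltDB's): a commit isn't acknowledged until the log record has been flushed to stable storage, and that per-transaction flush is exactly the cost he removes.

```python
# Minimal durability sketch: a committed transaction is appended to a
# log and flushed to stable storage before we acknowledge it. The
# fsync() is the per-transaction disk cost under discussion.

import os, tempfile

def commit(log_path, record):
    with open(log_path, "a") as log:
        log.write(record + "\n")
        log.flush()
        os.fsync(log.fileno())  # don't return until the OS says it's on disk

def recover(log_path):
    # After a crash, replay whatever made it to the log.
    with open(log_path) as log:
        return [line.rstrip("\n") for line in log]

path = os.path.join(tempfile.mkdtemp(), "wal.log")
for rec in ["INSERT a", "INSERT b", "UPDATE a"]:
    commit(path, rec)

print(recover(path))  # ['INSERT a', 'INSERT b', 'UPDATE a']
```

Drop the fsync and commits get much cheaper, but a single-machine crash can lose acknowledged transactions; VoltDB's answer is to make the replicas in the cluster the stable storage instead.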
There are a heck of a lot of environments where this sort of approach really doesn't make sense. In fact, I would say that the majority of databases deployed in the future will be in areas where it doesn't make sense. Realistically, you have to be at least close to the edge of what you can reasonably do on modern hardware with new versions of PostgreSQL before it makes sense to consider that tradeoff. That means, what, 100,000 transactions per second?
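Back-of-the-envelope, using his own 50-microsecond figure (my arithmetic, not a benchmark): a core running such transactions back to back with zero scheduling or locking overhead tops out around 20,000 per second, so 100,000 TPS is only a handful of cores' worth of pure work.

```python
# Rough arithmetic on the 50-microsecond transaction figure: with zero
# concurrency overhead, one core runs 1 / 50e-6 transactions per second.

tx_time_s = 50e-6                     # his stated per-transaction runtime
per_core_tps = 1 / tx_time_s          # ~20,000 transactions/sec per core
cores_for_100k = 100_000 / per_core_tps   # ~5 cores of pure execution

print(per_core_tps, cores_for_100k)
```

Below that neighborhood, the concurrency machinery he strips out isn't your bottleneck in the first place.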
For the sorts of applications I write, the costs of going with something like VoltDB would easily eclipse the benefits in every single deployment, and moreover this is not due to the maturity of the technology but rather to fundamental design choices. There are a lot of really, really cool things VoltDB would be good at (from state handling in MMORPGs to handling huge numbers of inputs from a million sensors in real time, and providing reporting data on them).
But those are niche markets.
Best Wishes,
Chris Travers