Re: Scalability in postgres - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Scalability in postgres
Date
Msg-id f67928030908161455x620bbe21ue6e8f73d77f63d83@mail.gmail.com
Whole thread Raw
In response to Re: Scalability in postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, Aug 14, 2009 at 4:21 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I apologize if it is bad form to respond to a message that is two
>> months old, but I did not see this question answered elsewhere and
>> thought it would be helpful to have it answered.  This my rough
>> understanding.  Oracle never "takes" a snapshot, it computes one the
>> fly, if and when it is needed.  It maintains a structure of recently
>> committed transactions, with the XID for when they committed.  If a
>> process runs into a tuple that is neither from the future nor from the
>> deep past, it consults this structure to see if that transaction has
>> committed, and if so whether it did so before or after the current
>> query was started.  The structure is partionable so it does not have
>> one global lock to serialize on, and the lock is short as it only gets
>> the info it needs, not the entire set of global open transactions.
>
> Are you sure it's partitionable?

I don't have inside knowledge, but I'm pretty sure that that structure is
partionable.  Each data block has in its header a list of in-doubt
transactions touching that block, and a link to where in the rollback/UNDO
to find info on each one.  The UNDO header knows that transaction's status.

Of course there is still the global serialization on obtaining
the SCN, but the work involved in obtaining that (other than
fighting over the lock) is constant, it doesn't increase with the number of
backends. Real Applications Clusters must have solved that somehow,
I don't recall how.  But I think it does make compromises, like in read
committed mode a change made by another transaction might be invisible
to your simple select statements for up to 3 seconds or so.  I've never
had the opportunity to play with a RAC.

For all I know, the work of scanning ProcArray is trivial compared to the
work of obtaining the lock, even if the array is large.  If I had the talent
to write and run stand alone programs that could attach themselves to
the shared memory structure and then run my arbitrary code, I would
test that out.

> I've been told that Oracle's
> transaction log is a serious scalability bottleneck.  (But I think
> I first heard that in 2001, so maybe they've improved it in recent
> releases.)

Well, something always has to be the bottleneck.  Do you know at what
degree of scaling that became a major issue?  I don't think that there
is a point of global serialization, other than taking SCNs, but if
there is enough pair-wise fighting, it could still add up to a lot
of contention.

> We know that Postgres' WAL log is a bottleneck --- check
> for recent discussions involving XLogInsert.

Would these two be good places for me to start looking into that:

http://archives.postgresql.org/pgsql-hackers/2009-06/msg01205.php
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php

Or is bulk-copy (but with WAL logging) to specific to apply findings to the
general case?

> But the WAL log is
> only touched by read-write transactions, whereas in Oracle even
> read-only transactions often have to go to the transaction log.

That's true, but any given read only transaction shouldn't have to make heavy
use of the transaction log just to decide if a transaction has committed
or not.  It should be able to look that up once and cache it for the rest
of that subtran.  Of course if it actually has to construct a consistent
read from the UNDO on many different buffers due to the same interfering
transaction, that is more work and more contention.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jean-Max Reymond
Date:
Subject: Re: Less expensive proprietary or Open source ETL tools
Next
From: Matthew Wakeling
Date:
Subject: Re: Memory reporting on CentOS Linux