Thread: How is statement level read consistency implemented?

How is statement level read consistency implemented?

From
Thomas Kellerer
Date:
Hi,

in my office we were discussing the various ways statement-level read consistency is implemented in different
databases, namely Oracle and Postgres.

I am interested in the technical details of how PG determines that a block needs to be read from "some other place
than the data block" because another transaction has updated the data block.

I'm referring to the scenario that Tom Kyte describes here:
www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html
and how PG would detect that row 342,023 has been modified by a different transaction (and thus reads the correct
value, unlike "the other database")

Oracle assigns an SCN (system change number) to each transaction; each block contains the SCN for which it is valid, and any
number higher than the query's SCN indicates that the block has to be taken from the rollback segment.

How is this "test" implemented in Postgres?
Does it have a similar concept (SCN) or does WAL imply a completely different way?


Thanks in advance
Thomas


Re: How is statement level read consistency implemented?

From
"Pavan Deolasee"
Date:
On Tue, Apr 22, 2008 at 4:11 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
>
>  I am interested in the technical details of how PG determines that a block
> needs to be read from "some other place than the data block" because
> another transaction has updated the data block.
>

Postgres uses MVCC for transaction consistency. When a row is updated,
a new copy of the row is created and the old version is marked for
deletion (though it's not removed immediately). Old readers
continue to read from the old copy, whereas new transactions see the
new copy. This is all managed by XMIN and XMAX, which are the transaction
IDs of the inserting and deleting transactions respectively and
control the visibility of the different versions.
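
You can actually look at these fields from psql: xmin and xmax are hidden
system columns that can be selected explicitly. A minimal sketch (the
"accounts" table is made up for illustration):

  CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);
  INSERT INTO accounts VALUES (1, 100);

  -- xmin is the xid of the inserting transaction; xmax is 0 while the
  -- row version is live
  SELECT xmin, xmax, id, balance FROM accounts;

  -- after an update, the visible version has a fresh xmin; the old
  -- version (no longer returned here) has its xmax set to the updater's xid
  UPDATE accounts SET balance = 200 WHERE id = 1;
  SELECT xmin, xmax, id, balance FROM accounts;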

If you want to understand MVCC, please see this excellent
documentation by Tom Lane:
http://www.postgresql.org/files/developer/transactions.pdf

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: How is statement level read consistency implemented?

From
Martijn van Oosterhout
Date:
On Tue, Apr 22, 2008 at 12:41:08PM +0200, Thomas Kellerer wrote:
> I'm referring to the scenario that Tom Kyte describes here:
> www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html
> and how PG would detect that row 342,023 has been modified by a different
> transaction (and thus reads the correct value, unlike "the other database")

Check the docs, but the basic ideas are:
- Every transaction has a transaction ID and a snapshot which
identifies the transactions whose results it is allowed to see.
- Every row has a start and end transaction ID which identifies who
created it and who deleted it.
- Each row also has a ctid pointer which points to the newer version of
that row.

What will happen in this case is that the updating transaction will
duplicate the row, setting the end xid of the old version and the
start xid of the new version to its own xid. Any parallel scan will see
either the new version or the old version depending on its snapshot.

If you end up trying to update a row someone else updated in the
meantime, the ctid is followed to the new version, the tuple is
retested for matching and then updated.
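
As a rough illustration (table and values made up), the ctid system column
shows where the current version of a row lives, so you can watch an update
create a new version:

  SELECT ctid, xmin, xmax, balance FROM accounts WHERE id = 1;
  -- e.g. ctid = (0,1)
  UPDATE accounts SET balance = balance + 1 WHERE id = 1;
  SELECT ctid, xmin, xmax, balance FROM accounts WHERE id = 1;
  -- the visible version now sits at a new ctid, e.g. (0,2); the old tuple
  -- at (0,1) is dead and stays behind until vacuum reclaims it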

See http://www.postgresql.org/docs/8.3/interactive/mvcc.html

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


Re: How is statement level read consistency implemented?

From
Thomas Kellerer
Date:
Pavan Deolasee, 22.04.2008 12:57:
> On Tue, Apr 22, 2008 at 4:11 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>>
>>  I am interested in the technical details of how PG determines that a block
>> needs to be read from "some other place than the data block" because
>> another transaction has updated the data block.
>>
> Postgres uses MVCC for transaction consistency. When a row is updated,
> a new copy of the row is created and the old version is marked for
> deletion (though it's not removed immediately). Old readers
> continue to read from the old copy, whereas new transactions see the
> new copy. This is all managed by XMIN and XMAX, which are the transaction
> IDs of the inserting and deleting transactions respectively and
> control the visibility of the different versions.
>
> If you want to understand MVCC, please see this excellent
> documentation by Tom Lane:
> http://www.postgresql.org/files/developer/transactions.pdf

The xmin, xmax info is basically what I was looking for.

As far as I can tell (from the PDF and your quote), Postgres uses a very similar concept to Oracle's:
each transaction has a unique number, and each tuple carries the information that determines for which transactions it is visible.


Thanks for the quick answer.
Thomas



Re: How is statement level read consistency implemented?

From
"Roberts, Jon"
Date:
> Pavan Deolasee, 22.04.2008 12:57:
> > On Tue, Apr 22, 2008 at 4:11 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> >>
> >>  I am interested in the technical details of how PG determines that a block
> >> needs to be read from "some other place than the data block" because
> >> another transaction has updated the data block.
> >>
> > Postgres uses MVCC for transaction consistency. When a row is updated,
> > a new copy of the row is created and the old version is marked for
> > deletion (though it's not removed immediately). Old readers
> > continue to read from the old copy, whereas new transactions see the
> > new copy. This is all managed by XMIN and XMAX, which are the transaction
> > IDs of the inserting and deleting transactions respectively and
> > control the visibility of the different versions.
> >
> > If you want to understand MVCC, please see this excellent
> > documentation by Tom Lane:
> > http://www.postgresql.org/files/developer/transactions.pdf
>
> The xmin, xmax info is basically what I was looking for.
>
> As far as I can tell (from the PDF and your quote), Postgres uses a very
> similar concept to Oracle's: each transaction has a unique number, and each
> tuple carries the information that determines for which transactions it is visible.

Oracle moves the old row(s) to the rollback segment instead of putting a
new row in the table as PostgreSQL does.  The new value goes right in
place of the old row, and a pointer to the rollback segment is added.

A commit removes the pointer, and a rollback forces the database to move the
old row from the rollback segment back in place of the updated row.
Oracle's rollback segment can be tuned so that it retains data up to a
certain age and is then flushed.  If the rollback segment is too small, you can
get a "snapshot too old" error.

Back to PostgreSQL, when a vacuum is issued, the old deleted rows are
removed.  This can be done with the autovacuum feature or manually.
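
For example, a manual run against a single table looks roughly like this
(table name made up; VACUUM VERBOSE also reports the dead row versions it
could not remove yet):

  VACUUM VERBOSE orders;
  -- INFO:  "orders": removed 1523 row versions in 87 pages
  -- INFO:  "orders": found 1523 removable, 250000 nonremovable row versions ...
  -- DETAIL:  0 dead row versions cannot be removed yet.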

- Does vacuum make sure there are no long-running queries referencing the
deleted rows before it attempts to remove the old rows?
- With autovacuum, does it skip rows that are still being referenced by a
transaction, or does it wait?

I'm guessing that you will never get the "snapshot too old" error in
PostgreSQL, given this architecture.


Jon

Re: How is statement level read consistency implemented?

From
Thomas Kellerer
Date:
Roberts, Jon, 22.04.2008 14:56:
>> As far as I can tell (from the PDF and your quote), Postgres uses a very
>> similar concept to Oracle's: each transaction has a unique number, and each
>> tuple carries the information that determines for which transactions it is visible.
>
> Oracle moves the old row(s) to the rollback segment instead of putting a
> new row in the table as PostgreSQL does.  The new value goes right in
> place of the old row, and a pointer to the rollback segment is added.
>
> A commit removes the pointer, and a rollback forces the database to move the
> old row from the rollback segment back in place of the updated row.
> Oracle's rollback segment can be tuned so that it retains data up to a
> certain age and is then flushed.  If the rollback segment is too small, you can
> get a "snapshot too old" error.

Thanks, this was exactly the answer I was looking for!


Cheers
Thomas

Re: How is statement level read consistency implemented?

From
Alvaro Herrera
Date:
Roberts, Jon wrote:

> Back to PostgreSQL, when a vacuum is issued, the old deleted rows are
> removed.  This can be done with the autovacuum feature or manually.
>
> - Does vacuum make sure there are no long-running queries referencing the
> deleted rows before it attempts to remove the old rows?

It does.

> - With autovacuum, does it skip rows that are still being referenced by a
> transaction, or does it wait?

It skips them, the idea being that a future vacuum will remove them.
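
If you want to see what is holding vacuum back, the long-running
transactions show up in pg_stat_activity (a sketch; column names as of 8.3):

  -- oldest open transactions first; a very old xact_start here keeps
  -- vacuum from removing dead rows that those snapshots might still need
  SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
  ORDER BY xact_start;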

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: How is statement level read consistency implemented?

From
"Roberts, Jon"
Date:
> > Back to PostgreSQL, when a vacuum is issued, the old deleted rows are
> > removed.  This can be done with the autovacuum feature or manually.
> >
> > - Does vacuum make sure there are no long-running queries referencing the
> > deleted rows before it attempts to remove the old rows?
>
> It does.
>
> > - With autovacuum, does it skip rows that are still being referenced by a
> > transaction, or does it wait?
>
> It skips them, the idea being that a future vacuum will remove them.

Awesome.  In a large data warehouse, the snapshot too old error is very
annoying and I'm glad PostgreSQL is superior to Oracle in this regard.
:)



Jon

Re: How is statement level read consistency implemented?

From
Alvaro Herrera
Date:
Roberts, Jon wrote:

> > > -With autovacuum, does it skip these rows still being referenced
> > > in a transaction or does it wait?
> >
> > It skips them, the idea being that a future vacuum will remove them.
>
> Awesome.  In a large data warehouse, the snapshot too old error is very
> annoying and I'm glad PostgreSQL is superior to Oracle in this regard.
> :)

Well, the disadvantage of the PostgreSQL way is that it keeps dead rows
around for longer than they're actually needed, and so it causes some
problems in pathological conditions -- for example, while a large
replication set is being set up with Slony, or while a pg_dump is running,
no dead rows can be removed at all.  Since the Slony setup can take a very
long time, dead rows start to pile up in a way that can really harm performance.

We're currently working on it so that it is less of a problem for 8.4 ...
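
A rough way to watch the pile-up is via the row-level statistics
(n_dead_tup is available as of 8.3):

  -- tables with the most dead row versions waiting to be reclaimed
  SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;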

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: How is statement level read consistency implemented?

From
Erik Jones
Date:
On Apr 22, 2008, at 8:35 AM, Alvaro Herrera wrote:

> Roberts, Jon wrote:
>
>>>> -With autovacuum, does it skip these rows still being referenced
>>>> in a transaction or does it wait?
>>>
>>> It skips them, the idea being that a future vacuum will remove them.
>>
>> Awesome.  In a large data warehouse, the snapshot too old error is
>> very
>> annoying and I'm glad PostgreSQL is superior to Oracle in this
>> regard.
>> :)
>
> Well, the disadvantage of the PostgreSQL way is that it keeps dead
> rows
> around for longer than they're actually needed, and so it causes some
> problems in pathological conditions -- for example when setting up
> large
> replication sets with Slony, or during a pg_dump, no dead rows can be
> removed.  Since the Slony thing can take a very long time, dead rows
> start to pile up in a way that can really harm performance.

In addition, or rather as another potential issue: if you have a really
long-running transaction, you also risk transaction ID wraparound.
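
A quick way to keep an eye on that is to check how far each database is
from the wraparound horizon:

  -- the larger the age, the closer the database is to the ~2 billion
  -- transaction wraparound limit; routine vacuuming keeps this number down
  SELECT datname, age(datfrozenxid)
  FROM pg_database
  ORDER BY age(datfrozenxid) DESC;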

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com