Re: 64 bit transaction id - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: 64 bit transaction id
Date
Msg-id 20191102232043.ia6gu57uriucdtxd@development
Whole thread Raw
In response to Re: 64 bit transaction id  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Sat, Nov 02, 2019 at 11:35:09PM +0300, Павел Ерёмин wrote:
>   The proposed option is not much different from what it is now.
>   We are not trying to save some space - we will reuse the existing one. We
>   just work in 64 bit transaction counters. Correct me if I'm wrong - the
>   address of the next version of the line is stored in the 6 byte field
>   t_cid in the tuple header - which is not attached to the current page in
>   any way - and can be stored anywhere in the table. Nothing changes.

I think you mean t_ctid, not t_cid (which is a 4-byte CommandId, not any
sort of item pointer).

I think this comment from htup_details.h explains the issue:

 * ... Beware however that VACUUM might
 * erase the pointed-to (newer) tuple before erasing the pointing (older)
 * tuple.  Hence, when following a t_ctid link, it is necessary to check
 * to see if the referenced slot is empty or contains an unrelated tuple.
 * Check that the referenced tuple has XMIN equal to the referencing tuple's
 * XMAX to verify that it is actually the descendant version and not an
 * unrelated tuple stored into a slot recently freed by VACUUM.  If either
 * check fails, one may assume that there is no live descendant version.

Now, imagine you have a tuple that gets updated repeatedly (say, 3x) and
each version gets to a different page. Say, pages #1, #2, #3. And then
VACUUM happens on some of the "middle" page (this may happen when trying
to fit new row onto a page to allow HOT, but it might happen even during
regular VACUUM).

So we started with 3 tuples on pages #1, #2, #3, but now we have this

  #1 - tuple exists, points to tuple on page #2
  #2 - tuple no longer exists, cleaned up by vacuum
  #3 - tuple exists

The scheme you proposed requires existence of all the tuples in the
chain to determine visibility. When tuple #2 no longer exists, it's
impossible to decide whether tuple on page #1 is visible or not.

This also significantly increases the amount of random I/O, pretty much
by factor of 2, because whenever you look at a row, you also have to
look at the "next version" which may be on another page. That's pretty
bad, bot for I/O and cache hit ratio. I don't think that's a reasonable
trade-off (at least compared to simply making the XIDs 64bit).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Subject: [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition
Next
From: Vik Fearing
Date:
Subject: Re: Allow cluster_name in log_line_prefix