Re: 64 bit transaction id - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: 64 bit transaction id
Date 2019-11-03 19:15:22
Msg-id 20191103191522.jkrv6eik73r3ulzo@development
In response to Re: 64 bit transaction id  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Sun, Nov 03, 2019 at 02:17:15PM +0300, Павел Ерёмин wrote:
>   I completely agree with all of the above. Therefore, the proposed
>   mechanism may bring broader improvements (and not only to VACUUM).

I think the best thing you can do is try implementing this ...

I'm afraid the "improvements" essentially mean making various important
parts of the system much more complicated and expensive. There's a
trade-off between saving 8B per row and the additional overhead (during
vacuum etc.), and it does not seem like a winning strategy. What started
as "we can simply look at the next row version" is clearly way more
complicated and expensive.

The trouble here is that it adds a dependency between pages in the data
file. That means, for example, that cleanup of one page may require
modifying another page that would otherwise have stayed read-only in
that checkpoint interval. That's essentially write amplification, and it
may significantly increase the amount of WAL by generating full-page
writes (FPW) for the other page.
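
Just to give a rough idea of the magnitude - a back-of-the-envelope
calculation, where both the page count and the fraction of cross-page
prunes are numbers I made up, not measurements:

#include <stdio.h>

int main(void)
{
    const double blcksz = 8192;         /* default PostgreSQL block size */
    const double pruned_pages = 100000; /* assumed: pages pruned per checkpoint cycle */
    const double cross_frac = 0.5;      /* assumed: fraction touching a second page */

    /*
     * The first modification of the "other" page after a checkpoint
     * emits a full-page image instead of a small prune record.
     */
    double extra_wal = pruned_pages * cross_frac * blcksz;

    printf("extra WAL from FPWs: ~%.0f MB\n", extra_wal / (1024 * 1024));
    return 0;
}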

>   I can offer the following solution.
>   For VACUUM, create a hash table.
>   When VACUUM, scanning the table, sees that a version (tuple1) has
>   t_ctid filled and pointing at the address of tuple2, it creates a
>   structure recording tuple1's address, tuple1's xid, tuple1's length
>   (and whatever other information is needed), and puts that structure
>   into the hash table, keyed by tuple2's address.
>   When VACUUM reaches tuple2, it looks up tuple2's address in the hash
>   table - if it finds it, it evaluates the link between the two
>   versions and decides whether they can be cleaned up.
>
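
If I read the proposal right, it'd be something like this standalone
sketch (invented names and simplified types - the real thing would use
ItemPointerData, TransactionId, dynahash etc.):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TupleAddr
{
    uint32_t block;     /* page number */
    uint16_t offset;    /* line pointer within the page */
} TupleAddr;

typedef struct PendingLink
{
    bool      used;     /* slot occupied? */
    TupleAddr new_addr; /* key: where tuple1's t_ctid points (tuple2) */
    TupleAddr old_addr; /* address of tuple1 (the older version) */
    uint64_t  old_xid;  /* tuple1's xid */
    uint32_t  old_len;  /* tuple1's length (and whatever else) */
} PendingLink;

#define NSLOTS 1024
static PendingLink pending[NSLOTS]; /* toy open-addressing hash table */

static unsigned
addr_hash(TupleAddr a)
{
    return (a.block * 31u + a.offset) % NSLOTS;
}

static bool
addr_eq(TupleAddr a, TupleAddr b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* First pass: tuple1 has t_ctid set, pointing at tuple2 - remember it. */
static void
remember_link(TupleAddr t1, uint64_t xid, uint32_t len, TupleAddr t2)
{
    unsigned h = addr_hash(t2);

    while (pending[h].used)         /* linear probing, no resizing */
        h = (h + 1) % NSLOTS;
    pending[h] = (PendingLink){ true, t2, t1, xid, len };
}

/* Later, when VACUUM reaches tuple2: did anything point at it? */
static PendingLink *
lookup_link(TupleAddr t2)
{
    unsigned h = addr_hash(t2);

    while (pending[h].used)
    {
        if (addr_eq(pending[h].new_addr, t2))
            return &pending[h];
        h = (h + 1) % NSLOTS;
    }
    return NULL;                    /* no older version recorded */
}

int main(void)
{
    TupleAddr t1 = { 7, 2 };
    TupleAddr t2 = { 42, 5 };

    remember_link(t1, 1234, 60, t2);
    return lookup_link(t2) ? 0 : 1; /* 0 = link found, as expected */
}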

We know VACUUM is already pretty expensive, so making it even more
expensive seems pretty awful - and the proposed solution seems damn
expensive. We already do something similar for indexes - we track
pointers for removed rows so that we can remove the matching entries
from the indexes. And it's expensive, because we don't know where in
the index the tuples are - we have to scan each index in full.
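
Roughly, that index cleanup works like this toy model (invented names;
the actual entry point is the ambulkdelete AM callback):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { unsigned block; unsigned offset; } Tid;
typedef struct { Tid heap_tid; bool dead; } IndexEntry;

/* Is this heap TID in the list of rows VACUUM removed? */
static bool
tid_in(const Tid *dead, size_t ndead, Tid t)
{
    for (size_t i = 0; i < ndead; i++)  /* the real code binary-searches */
        if (dead[i].block == t.block && dead[i].offset == t.offset)
            return true;
    return false;
}

/* Scan EVERY entry - nothing tells us where the dead ones live. */
static size_t
bulk_delete(IndexEntry *entries, size_t nentries,
            const Tid *dead, size_t ndead)
{
    size_t removed = 0;

    for (size_t i = 0; i < nentries; i++)
    {
        if (!entries[i].dead && tid_in(dead, ndead, entries[i].heap_tid))
        {
            entries[i].dead = true;
            removed++;
        }
    }
    return removed;
}

int main(void)
{
    IndexEntry idx[] = { {{1, 1}, false}, {{1, 2}, false}, {{2, 1}, false} };
    Tid dead[] = { {1, 2} };

    printf("removed %zu entries\n", bulk_delete(idx, 3, dead, 1));
    return 0;
}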

This would mean doing the same thing for the table, because we don't
know where in the table the older versions of those rows are. That
seems mighty expensive.

Not to mention that this does nothing for page-level vacuum, which we
do when trying to fit another row on a page (e.g. for HOT). That has to
be absolutely cheap - we're certainly not going to look up other pages
or go looking for older versions of the row, and so on.

Being able to make visibility decisions based on the tuple alone (or
possibly page-level plus tuple information) has a lot of value, and I
don't think we want to make that more complicated.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services