Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CA+TgmoZy2ygb4AFq33g51BDkQpKZF32XQ_=+2mEANv_cQUzteg@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Re: [PoC] Improve dead tuple storage for lazy vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.

What I'm about to say might be a really stupid idea, especially since
I haven't looked at any of the code already posted, but what I'm
wondering about is whether we need a full radix tree or maybe just a
radix-like lookup aid. For example, suppose that for a relation <= 8MB
in size, we create an array of 1024 elements indexed by block number.
Each element of the array stores an offset into the dead TID array.
When you need to probe for a TID, you look up blkno and blkno + 1 in
the array and then bsearch only between those two offsets. For bigger
relations, a two or three level structure could be built, or it could
always be 3 levels. This could even be done on demand, so you
initialize all of the elements to some special value that means "not
computed yet" and then fill them the first time they're needed,
perhaps with another special value that means "no TIDs in that block".

I don't know if this is better, but I do kind of like the fact that
the basic representation is just an array. It makes it really easy to
predict how much memory will be needed for a given number of dead
TIDs, and it's very DSM-friendly as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: Re: pg_upgrade does not upgrade pg_stat_statements properly
Next
From: Alvaro Herrera
Date:
Subject: Re: pg_upgrade does not upgrade pg_stat_statements properly