Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id CAFBsxsF2e-e_m7CTouaGP6fBb2t726okhzq0kjC1+M3egujisw@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

On Mon, Jan 16, 2023 at 3:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jan 16, 2023 at 2:02 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:

> > + * Add Tids on a block to TidStore. The caller must ensure the offset numbers
> > + * in 'offsets' are ordered in ascending order.
> >
> > Must? What happens otherwise?
>
> It ends up missing TIDs by overwriting the same key with different
> values. Is it better to have a bool argument, say need_sort, to sort
> the given array if the caller wants?

Now that I've studied it some more, I see what's happening: We need all bits set in the "value" before we insert it, since it would be too expensive to retrieve the current value, add one bit, and put it back. Also, as a consequence of the encoding, part of the tid is in the key, and part in the value. It makes more sense now, but it needs more than zero comments.
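To make the key/value split concrete, here is a minimal sketch. The layout is an assumption, not necessarily the one in the patch: the TID is packed into one integer as (block << 9) | offset, the upper bits become the radix-tree key, and the low 6 bits pick a bit inside a 64-bit value. `encode_tid`, `OFFSET_BITS` and `VALUE_BITS` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 9 bits for the offset number (enough for
 * MaxHeapTuplesPerPage with 8kB pages), block number above that. */
#define OFFSET_BITS 9
#define VALUE_BITS  6           /* one uint64 value covers 64 TIDs */

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

/* Split a TID into the radix-tree key (upper bits of the packed TID)
 * and the bit position inside the value (lowest VALUE_BITS bits). */
static void
encode_tid(BlockNumber blk, OffsetNumber off, uint64_t *key, int *bit)
{
    uint64_t    tid_int;

    assert(off < (1 << OFFSET_BITS));
    tid_int = ((uint64_t) blk << OFFSET_BITS) | off;

    *key = tid_int >> VALUE_BITS;
    *bit = (int) (tid_int & ((1 << VALUE_BITS) - 1));
}
```

Under this layout, offsets 1..63 of block 10 all share key 80, while offset 64 moves on to key 81. Several offsets therefore contribute bits to one value, and that value must be complete before it is stored.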

As for the order, I don't think it's the responsibility of the caller to guess if it needs sorting -- if unordered offsets lead to data loss, this function needs to take care of it.
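A minimal sketch of the function handling the ordering itself, under the same hypothetical layout (9 offset bits, 64 TIDs per value). `ToyStore`, `toy_set`, `toy_lookup` and `add_block_offsets` are invented for illustration. The toy store mimics the radix tree's overwrite-on-set behavior. The function sorts a private copy of the offsets, accumulates the full value for each key, and writes it only when the key changes. It tracks the previous key with an explicit `have_prev` flag instead of a sentinel value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OFFSET_BITS 9           /* hypothetical TID layout */
#define VALUE_BITS  6
#define MAX_ENTRIES 1024

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

/* Toy stand-in for the radix tree: set overwrites an existing value. */
typedef struct ToyStore
{
    int         nentries;
    uint64_t    keys[MAX_ENTRIES];
    uint64_t    values[MAX_ENTRIES];
} ToyStore;

static void
toy_set(ToyStore *store, uint64_t key, uint64_t value)
{
    for (int i = 0; i < store->nentries; i++)
    {
        if (store->keys[i] == key)
        {
            store->values[i] = value;   /* overwrite, losing old bits */
            return;
        }
    }
    assert(store->nentries < MAX_ENTRIES);
    store->keys[store->nentries] = key;
    store->values[store->nentries] = value;
    store->nentries++;
}

static bool
toy_lookup(const ToyStore *store, BlockNumber blk, OffsetNumber off)
{
    uint64_t    tid_int = ((uint64_t) blk << OFFSET_BITS) | off;
    uint64_t    key = tid_int >> VALUE_BITS;
    int         bit = (int) (tid_int & ((1 << VALUE_BITS) - 1));

    for (int i = 0; i < store->nentries; i++)
        if (store->keys[i] == key)
            return (store->values[i] >> bit) & 1;
    return false;
}

static int
cmp_offset(const void *a, const void *b)
{
    OffsetNumber x = *(const OffsetNumber *) a;
    OffsetNumber y = *(const OffsetNumber *) b;

    return (x > y) - (x < y);
}

/* Add all offsets of one block; the caller's order does not matter. */
static void
add_block_offsets(ToyStore *store, BlockNumber blk,
                  const OffsetNumber *offsets, int n)
{
    OffsetNumber *sorted;
    uint64_t    prev_key = 0;
    uint64_t    value = 0;
    bool        have_prev = false;

    if (n <= 0)
        return;
    sorted = malloc((size_t) n * sizeof(OffsetNumber));
    assert(sorted != NULL);
    memcpy(sorted, offsets, (size_t) n * sizeof(OffsetNumber));
    qsort(sorted, (size_t) n, sizeof(OffsetNumber), cmp_offset);

    for (int i = 0; i < n; i++)
    {
        uint64_t    tid_int = ((uint64_t) blk << OFFSET_BITS) | sorted[i];
        uint64_t    key = tid_int >> VALUE_BITS;

        /* key changed: the previous value is complete, store it */
        if (have_prev && key != prev_key)
        {
            toy_set(store, prev_key, value);
            value = 0;
        }
        prev_key = key;
        have_prev = true;
        value |= UINT64_C(1) << (tid_int & ((1 << VALUE_BITS) - 1));
    }
    toy_set(store, prev_key, value);    /* flush the final key */
    free(sorted);
}
```

Consider the input {63, 1, 64, 2} for block 10. Without the `qsort` call, key 80 would be written twice. The second write holds only offset 2, so it would overwrite offsets 63 and 1. That is the kind of TID loss described upthread. Sorting inside the function makes each key's offsets contiguous, so each key is written exactly once.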

> > + uint64 last_key = PG_UINT64_MAX;
> >
> > I'm having some difficulty understanding this sentinel and how it's used.
>
> Will improve the logic.

Part of the problem is the English language: "last" can mean "previous" or "at the end", so maybe some name changes would help.

--
John Naylor
EDB: http://www.enterprisedb.com
