Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCCQ7T1XNCKtnNe+qiqsdgMsmZsugDpbYQdBVCAi5=zgg@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Fri, Sep 16, 2022 at 4:54 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Aug 15, 2022 at 10:39 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
> > >
> > > bool, buth = and <=. Should be pretty close. Also, i believe if you
> > > left this for last as a possible refactoring, it might save some work.
>
> v6 demonstrates why this should have been put off towards the end. (more below)
>
> > > In any case, I'll take a look at the latest patch next month.
>
> Since the CF entry said "Needs Review", I began looking at v5 again
> this week. Hopefully not too much has changed, but in the future I
> strongly recommend setting to "Waiting on Author" if a new version is
> forthcoming. I realize many here share updated patches at any time,
> but I'd like to discourage the practice especially for large patches.

Understood. Sorry for the inconveniences.

>
> > I've updated the radix tree patch. It's now separated into two patches.
> >
> > 0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find
> > better names) that are similar to the pg_lfind8() family but they
> > return the index of the key in the vector instead of true/false. The
> > patch includes regression tests.
>
> I don't want to do a full review of this just yet, but I'll just point
> out some problems from a quick glance.
>
> +/*
> + * Return the index of the first element in the vector that is greater than
> + * or eual to the given scalar. Return sizeof(Vector8) if there is no such
> + * element.
>
> That's a bizarre API to indicate non-existence.
>
> + *
> + * Note that this function assumes the elements in the vector are sorted.
> + */
>
> That is *completely* unacceptable for a general-purpose function.
>
> +#else /* USE_NO_SIMD */
> + Vector8 r = 0;
> + uint8 *rp = (uint8 *) &r;
> +
> + for (Size i = 0; i < sizeof(Vector8); i++)
> + rp[i] = (((const uint8 *) &v1)[i] == ((const uint8 *) &v2)[i]) ? 0xFF : 0;
>
> I don't think we should try to force the non-simd case to adopt the
> special semantics of vector comparisons. It's much easier to just use
> the same logic as the assert builds.
>
> +#ifdef USE_SSE2
> + return (uint32) _mm_movemask_epi8(v);
> +#elif defined(USE_NEON)
> + static const uint8 mask[16] = {
> +        1 << 0, 1 << 1, 1 << 2, 1 << 3,
> +        1 << 4, 1 << 5, 1 << 6, 1 << 7,
> +        1 << 0, 1 << 1, 1 << 2, 1 << 3,
> +        1 << 4, 1 << 5, 1 << 6, 1 << 7,
> +      };
> +
> +    uint8x16_t masked = vandq_u8(vld1q_u8(mask), (uint8x16_t)
> vshrq_n_s8(v, 7));
> +    uint8x16_t maskedhi = vextq_u8(masked, masked, 8);
> +
> +    return (uint32) vaddvq_u16((uint16x8_t) vzip1q_u8(masked, maskedhi));
>
> For Arm, we need to be careful here. This article goes into a lot of
> detail for this situation:
>
>
https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon
>
> Here again, I'd rather put this off and focus on getting the "large
> details" in good enough shape so we can got towards integrating with
> vacuum.

Thank you for the comments! These above comments are addressed by
Nathan in a newly derived thread. I'll work on the patch.

I'll consider how to integrate with vacuum as the next step. One
concern for me is how to limit the memory usage to
maintenance_work_mem. Unlike using a flat array, memory space for
adding one TID varies depending on the situation. If we want strictly
not to allow using memory more than maintenance_work_mem, probably we
need to estimate the memory consumption in a conservative way.


Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Zhang Mingli
Date:
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Next
From: Peter Smith
Date:
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher