Re: [PATCH] Speedup truncates of relation forks - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PATCH] Speedup truncates of relation forks
Date
Msg-id CAD21AoDjFW7vOYbrw7S5VOv6dKqdcwViCpBcqmzVKof7Pv1B6w@mail.gmail.com
In response to RE: [PATCH] Speedup truncates of relation forks  ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>)
Responses RE: [PATCH] Speedup truncates of relation forks
RE: [PATCH] Speedup truncates of relation forks
List pgsql-hackers
On Thu, Jun 13, 2019 at 6:30 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
>
> On Wednesday, June 12, 2019 4:29 PM (GMT+9), Masahiko Sawada wrote:
> > On Wed, Jun 12, 2019 at 12:25 PM Tsunakawa, Takayuki
> > <tsunakawa.takay@jp.fujitsu.com> wrote:
> > >
> > > From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
> > > > Years ago I've implemented an optimization for many DROP TABLE
> > > > commands in a single transaction - instead of scanning buffers for
> > > > each relation, the code now accumulates a small number of relations
> > > > into an array, and then does a bsearch for each buffer.
> > > >
> > > > Would something like that be applicable/useful here? That is, if we
> > > > do multiple TRUNCATE commands in a single transaction, can we
> > > > optimize it like this?
> > >
> > > Unfortunately not.  VACUUM and autovacuum handle each table in a
> > > different transaction.
> >
> > We also do RelationTruncate() when we truncate heaps that are created in
> > the current transaction or have new relfilenodes assigned in the current
> > transaction. So I think there is room for the optimization Tomas suggested,
> > although I'm not sure it's a popular use case.
>
> I couldn't think of a use case either.
>
> > I've not looked at this patch deeply, but in DropRelFileNodeBuffers I think
> > we can get the minimum value of all firstDelBlock[] entries and use it as
> > the lower bound of the block numbers we're interested in. That way we can
> > skip checking the array while scanning the buffer pool.
>
> I'll take note of this suggestion.
> Could you expound more on this idea of skipping the internal loop by
> comparing the minimum against the buffer descriptor (bufHdr)?
>

Yes. For example,

    BlockNumber minBlock = InvalidBlockNumber;
(snip)
    /* Get lower bound block number we're interested in */
    for (i = 0; i < nforks; i++)
    {
        if (!BlockNumberIsValid(minBlock) ||
            minBlock > firstDelBlock[i])
            minBlock = firstDelBlock[i];
    }

    for (i = 0; i < NBuffers; i++)
    {
(snip)
        buf_state = LockBufHdr(bufHdr);

        /* check with the lower bound and skip the loop */
        if (bufHdr->tag.blockNum < minBlock)
        {
            UnlockBufHdr(bufHdr, buf_state);
            continue;
        }

        for (k = 0; k < nforks; k++)
        {
            if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
                bufHdr->tag.forkNum == forkNum[k] &&
                bufHdr->tag.blockNum >= firstDelBlock[k])
            {
                InvalidateBuffer(bufHdr); /* releases spinlock */
                break;
            }
        }

But since we still acquire the buffer header lock for every buffer, and the
number of iterations of the internal loop is small (at most 3 for now), the
benefit will not be big.

> In the current patch, I've implemented the following in DropRelFileNodeBuffers:
>         for (i = 0; i < NBuffers; i++)
>         {
>                 ...
>                 buf_state = LockBufHdr(bufHdr);
>                 for (k = 0; k < nforks; k++)
>                 {
>                         if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
>                                 bufHdr->tag.forkNum == forkNum[k] &&
>                                 bufHdr->tag.blockNum >= firstDelBlock[k])
>                         {
>                                 InvalidateBuffer(bufHdr); /* releases spinlock */
>                                 break;
>                         }
>
> > Can't we use a separate element of nblocks for each fork? That is, each
> > fork uses the element at its fork number in the nblocks array, with
> > InvalidBlockNumber set for unused slots, instead of passing the number of
> > valid elements. That way the following code, which exists in many places,
> >
> >     blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
> >     if (BlockNumberIsValid(blocks[nforks]))
> >     {
> >         forks[nforks] = VISIBILITYMAP_FORKNUM;
> >         nforks++;
> >     }
> >
> > would become
> >
> >     blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel, nblocks);
>
> In the patch, we want to truncate all forks' blocks simultaneously, so
> we optimize the invalidation of buffers and reduce the number of loops
> using those values.
> The suggestion above would require removing the forks array and its
> size (nforks), is that correct? But I think we'd need the forks array
> and nforks to execute the truncation all at once.

I meant that each fork can use its forknumber'th element of
firstDelBlock[]. For example, if firstDelBlock = {1000,
InvalidBlockNumber, 20, InvalidBlockNumber}, we can invalidate the
buffers of the main fork from block 1000 onward and the buffers of the
vm fork from block 20 onward. Since firstDelBlock[FSM_FORKNUM] ==
InvalidBlockNumber, we don't invalidate any buffers of the fsm fork.
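
To illustrate (this is only a rough sketch of the idea, not code from your
patch), with a firstDelBlock[] of MAX_FORKNUM + 1 entries indexed by fork
number, the check inside the NBuffers loop of DropRelFileNodeBuffers could
become something like:

    /*
     * Sketch: firstDelBlock[] is assumed to have MAX_FORKNUM + 1 entries,
     * with InvalidBlockNumber in the slots of forks that are not truncated.
     */
    ForkNumber  bufForkNum = bufHdr->tag.forkNum;

    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
        BlockNumberIsValid(firstDelBlock[bufForkNum]) &&
        bufHdr->tag.blockNum >= firstDelBlock[bufForkNum])
        InvalidateBuffer(bufHdr); /* releases spinlock */
    else
        UnlockBufHdr(bufHdr, buf_state);

This removes the inner nforks loop entirely, at the cost of always passing
MAX_FORKNUM + 1 slots.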

As Tsunakawa-san mentioned, since your approach reduces the loop count,
your idea might be better than mine, which always loops 4 times (once
per fork).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


