Home > mailing lists

Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From	Kyotaro Horiguchi
Subject	Re: [Patch] Optimize dropping of relation buffers using dlist
Date	October 22, 2020 08:50:36
Msg-id	20201022.175036.529997219363894066.horikyota.ntt@gmail.com Whole thread Raw
In response to	RE: [Patch] Optimize dropping of relation buffers using dlist ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses	Re: [Patch] Optimize dropping of relation buffers using dlist Re: [Patch] Optimize dropping of relation buffers using dlist
List	pgsql-hackers

Tree view

At Thu, 22 Oct 2020 07:31:55 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in 
> From: Thomas Munro <thomas.munro@gmail.com>
> > On Thu, Oct 22, 2020 at 7:33 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > Mmm. Not exact. The requirement here is that we must be certain that
> > > the we don't have a buffuer for blocks after the file size known to
> > > the process.  While recoverying, If the first lseek() returned smaller
> > > size than actual, we cannot have a buffer for the blocks after the
> > > size. After we trncated or extended the file, we are certain that we
> > > don't have a buffer for unknown blocks.
> > 
> > Thanks, I understand now.  Something feels fragile about it, perhaps
> > because it's not really acting as a "cache" anymore despite its name,
> > but I see the logic now.  It becomes the authoritative source of
> > information, even if the kernel decides to make our file smaller
> > asynchronously.
> 
> Thank you Horiguchi-san, you are a savior!  I was worried like the end of the world has come.
> 
> 
> > I think a synchronised file size cache wouldn't be enough to use this
> > trick outside the recovery process, because the initial value would
> > come from a call to lseek(), but unlike recovery, that wouldn't happen
> > *before* we start putting pages in the buffer pool.  Also, if we one
> > day have a size-limited relcache, even recovery could get into
> > trouble, if it evicts the RelationData that holds the authoritative
> > nblocks value.
> 
> That's too bad, because we hoped we may be able to various operations during normal operation (TRUNCATE, DROP
TABLE/INDEX,DROP DATABASE,  etc.)  An honest man can't believe the system call, that's a hell.
 
> 
> I'm probably being silly, but can't we avoid the problem by using fstat() instead of lseek(SEEK_END)?  Would they
returnthe same value from the i-node?
 
> 
> Or, can't we just try to do BufTableLookup() one block after what smgrnblocks() returns?

Lossy smgrrelcache or relacache is not a hard obstacle. As the same
with the case of !accurate, we just give up the optimized dropping if
the relcache doesn't give the authoritative size.

By the way, heap scan finds the size of target relation using
smgrnblocks().  I'm not sure why we don't miss recently-extended pages
on a heap-scan?  It seems to be possible that concurrent checkpoint
fsyncs relation files inbetween the extension and scanning and the
scanning gets smaller size than it really is.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pgsql-hackers by date:

From: Amit Kapila
Date: 22 October 2020, 08:39:08
Subject: Re: Track statistics for streaming of in-progress transactions

From: Dilip Kumar
Date: 22 October 2020, 09:08:03
Subject: Re: parallel distinct union and aggregate support patch

Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

Previous

Next