Re: drop/truncate table sucks for large values of shared buffers - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: drop/truncate table sucks for large values of shared buffers
Date
Msg-id CAA4eK1LGmXe7ORg5K7E52=pLbSVJup-B5=ya_DWy2rRG3TdY5Q@mail.gmail.com
In response to Re: drop/truncate table sucks for large values of shared buffers  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: drop/truncate table sucks for large values of shared buffers
List pgsql-hackers
On Tue, Jun 30, 2015 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 30 June 2015 at 07:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Jun 30, 2015 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >
>> > On 30 June 2015 at 05:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >> On Mon, Jun 29, 2015 at 7:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> >
>> >> > On 28 June 2015 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> >>
>> >> > If lseek fails badly then SeqScans would give *silent* data loss, which in my view is worse. Just added pages aren't the only thing we might miss if lseek is badly wrong.
>> >> >
>> >>
>> >> So for the purpose of this patch, do we need to assume that
>> >> lseek can give us the wrong size of a file, and should we add preventive
>> >> checks and other handling for that?
>> >> I am okay with changing it that way if we are going to make that an
>> >> assumption in our code wherever we use lseek now or in the future;
>> >> otherwise we will end up with preventive checks that are not actually required.
>> >
>> >
>> > They're preventative checks. You always hope it is wasted effort.
>> >
>>
>> I am not sure preventative checks (without a real need) are okay if they
>> are not cheap, which could happen in this case.  Validating a buffer tag
>> would require a relcache or syscache lookup.
>
>
> True, so don't do that.
>
> Keep a list of dropped relations and have the checkpoint process scan the buffer pool every 64 tables, kinda like AbsorbFsync
>

Okay.  I think we can maintain the list in a similar way to how we handle
UNLINK_RELATION_REQUEST in RememberFsyncRequest(), but
why wait until 64 tables?  We already scan the whole buffer list in each
checkpoint cycle, so during that scan we can consult this dropped-relation
list and avoid syncing the contents of such buffers.  Also, for ENOENT error
handling in FileWrite, we can use this list to identify the relations for which
we need to ignore the error.  We already do something similar in
mdsync() to cope with dropped tables, so it seems okay to
do it in mdwrite() as well.
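
To make that concrete, here is a standalone C sketch of the skip-during-scan
idea.  The types and names (BufDesc, remember_dropped, and so on) are
simplified stand-ins I made up for illustration, not the actual PostgreSQL
structures; in the real buffer manager the comparison would be against the
RelFileNode in the buffer tag, with appropriate locking.

/* Standalone sketch (not PostgreSQL source): skip dirty buffers that
 * belong to relations on a dropped-relation list during the checkpoint
 * buffer scan. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;          /* simplified relfilenode identifier */

typedef struct BufDesc
{
    Oid  relfilenode;              /* which relation this page belongs to */
    bool dirty;
} BufDesc;

#define NBUFFERS      8
#define MAX_DROPPED  64            /* echoes Simon's "every 64 tables" */

static Oid    dropped[MAX_DROPPED];
static size_t ndropped;

/* Remember a dropped relation, analogous to queuing an unlink request. */
static void remember_dropped(Oid relfilenode)
{
    if (ndropped < MAX_DROPPED)
        dropped[ndropped++] = relfilenode;
}

static bool is_dropped(Oid relfilenode)
{
    for (size_t i = 0; i < ndropped; i++)
        if (dropped[i] == relfilenode)
            return true;
    return false;
}

/* Checkpoint-style scan: write dirty buffers, but discard (rather than
 * write) buffers whose relation is on the dropped list. */
static void checkpoint_scan(BufDesc *buffers, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (!buffers[i].dirty)
            continue;
        if (is_dropped(buffers[i].relfilenode))
        {
            buffers[i].dirty = false;   /* invalidate instead of writing */
            continue;
        }
        printf("writing buffer %zu (rel %u)\n", i, buffers[i].relfilenode);
        buffers[i].dirty = false;
    }
}

int main(void)
{
    BufDesc buffers[NBUFFERS] = {
        {100, true}, {101, true}, {100, true}, {102, false},
    };

    remember_dropped(100);          /* relation 100 was dropped */
    checkpoint_scan(buffers, NBUFFERS);
    return 0;
}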

The crucial thing to think about in this idea is avoiding reassignment of
the relfilenode (due to wrapped OIDs) before we have ensured that none of
the buffers contains a tag for that relfilenode.  Currently we avoid this for
the fsync case by retaining the first segment of the relation (which prevents
reassignment of the relfilenode) until the checkpoint ends.  If we just postpone
unlinking that segment until we have validated in shared_buffers that no
buffer still carries the tag, we can avoid this problem in the new scheme too,
and the delay before unlinking such a file would be at most one checkpoint
cycle, assuming we consult the dropped-relation list during the buffer scan
of each checkpoint.
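
Continuing the same illustrative sketch (again, invented names, not the
real md.c functions), the ordering I have in mind is:

/* Hypothetical ordering: unlink the retained first segment of each
 * dropped relation only after the buffer scan has confirmed no buffer
 * still carries its tag, so the relfilenode cannot be reassigned while
 * stale buffers exist. */
static void process_dropped_relations(BufDesc *buffers, size_t n)
{
    /* Step 1: scan the buffer pool; any buffer still tagged with a
     * dropped relfilenode is discarded rather than written out. */
    checkpoint_scan(buffers, n);

    /* Step 2: only now unlink the retained first segments.  While the
     * files existed, their relfilenodes could not be handed out again,
     * even if the OID counter wrapped around. */
    for (size_t i = 0; i < ndropped; i++)
        printf("unlinking first segment of rel %u\n", dropped[i]);
    ndropped = 0;
}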

Does that make sense?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
