Re: mdclose() does not cope w/ FileClose() failure - Mailing list pgsql-hackers

From Noah Misch
Subject Re: mdclose() does not cope w/ FileClose() failure
Date
Msg-id 20191224195739.GB1689008@rfd.leadboat.com
Whole thread Raw
In response to Re: mdclose() does not cope w/ FileClose() failure  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: mdclose() does not cope w/ FileClose() failure
List pgsql-hackers
On Mon, Dec 23, 2019 at 07:41:49PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 22 Dec 2019 12:21:00 -0800, Noah Misch <noah@leadboat.com> wrote in 
> > On Sun, Dec 22, 2019 at 01:19:30AM -0800, Noah Misch wrote:
> > > An alternative would be to call
> > > _fdvec_resize() after every FileClose(), like mdtruncate() does; however, the
> > > repalloc() overhead could be noticeable.  (mdclose() is called much more
> > > frequently than mdtruncate().)
> > 
> > I can skip repalloc() when the array length decreases, to assuage mdclose()'s
> > worry.  In the mdclose() case, the final _fdvec_resize(reln, fork, 0) will
> > still pfree() the array.  Array elements that mdtruncate() frees today will
> > instead persist to end of transaction.  That is okay, since mdtruncate()
> > crossing more than one segment boundary is fairly infrequent.  For it to
> > happen, you must either create a >2G relation and then TRUNCATE it in the same
> > transaction, or VACUUM must find >1-2G of unused space at the end of the
> > relation.  I'm now inclined to do it that way, attached.
> 
>          * It doesn't seem worthwhile complicating the code by having a more
>          * aggressive growth strategy here; the number of segments doesn't
>          * grow that fast, and the memory context internally will sometimes
> -         * avoid doing an actual reallocation.
> +         * avoid doing an actual reallocation.  Likewise, since the number of
> +         * segments doesn't shrink that fast, don't shrink at all.  During
> +         * mdclose(), we'll pfree the array at nseg==0.
> 
> If I understand it correctly, it is mentioning the number of the all
> segment files in a fork, not the length of md_seg_fds arrays at a
> certain moment. But actually _fdvec_resize is called for every segment
> opening during mdnblocks (just-after mdopen), and every segment
> closing during mdclose and mdtruncate as mentioned here. We are going
> to omit pallocs only in the decreasing case.

That is a good point.  How frequently one adds 1 GiB of data is not the main
issue.  mdclose() and subsequent re-opening of all segments will be more
relevant to overall performance.

> If we regard repalloc as far faster than FileOpen/FileClose or we care
> about only increase of segment number of mdopen'ed files and don't
> care the frequent resize that happens during the functions above, then
> the comment is right and we may resize the array in the
> segment-by-segment manner.

In most cases, the array will fit into a power-of-two chunk, so repalloc()
already does the right thing.  Once the table has more than ~1000 segments (~1
TiB table size), the allocation will get a single-chunk block, and every
subsequent repalloc() will call realloc().  Even then, repalloc() probably is
far faster than File operations.  Likely, I should just accept the extra
repalloc() calls and drop the "else if" change in _fdvec_resize().



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: Allow an alias to be attached directly to a JOIN ... USING
Next
From: Tatsuo Ishii
Date:
Subject: Re: Implementing Incremental View Maintenance