Re: mdclose() does not cope w/ FileClose() failure - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: mdclose() does not cope w/ FileClose() failure |
Date | |
Msg-id | 20191224195739.GB1689008@rfd.leadboat.com |
In response to | Re: mdclose() does not cope w/ FileClose() failure (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: mdclose() does not cope w/ FileClose() failure |
List | pgsql-hackers |
On Mon, Dec 23, 2019 at 07:41:49PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 22 Dec 2019 12:21:00 -0800, Noah Misch <noah@leadboat.com> wrote in
> > On Sun, Dec 22, 2019 at 01:19:30AM -0800, Noah Misch wrote:
> > > An alternative would be to call _fdvec_resize() after every FileClose(),
> > > like mdtruncate() does; however, the repalloc() overhead could be
> > > noticeable.  (mdclose() is called much more frequently than mdtruncate().)
> >
> > I can skip repalloc() when the array length decreases, to assuage mdclose()'s
> > worry.  In the mdclose() case, the final _fdvec_resize(reln, fork, 0) will
> > still pfree() the array.  Array elements that mdtruncate() frees today will
> > instead persist to end of transaction.  That is okay, since mdtruncate()
> > crossing more than one segment boundary is fairly infrequent.  For it to
> > happen, you must either create a >2G relation and then TRUNCATE it in the
> > same transaction, or VACUUM must find >1-2G of unused space at the end of
> > the relation.  I'm now inclined to do it that way, attached.
>
> 	 * It doesn't seem worthwhile complicating the code by having a more
> 	 * aggressive growth strategy here; the number of segments doesn't
> 	 * grow that fast, and the memory context internally will sometimes
> -	 * avoid doing an actual reallocation.
> +	 * avoid doing an actual reallocation.  Likewise, since the number of
> +	 * segments doesn't shrink that fast, don't shrink at all.  During
> +	 * mdclose(), we'll pfree the array at nseg==0.
>
> If I understand it correctly, that comment refers to the total number of
> segment files in a fork, not to the length of the md_seg_fds array at a
> given moment.  But _fdvec_resize is actually called for every segment
> opened during mdnblocks (just after mdopen), and for every segment closed
> during mdclose and mdtruncate, as mentioned here.  We are going to omit
> pallocs only in the decreasing case.

That is a good point.  How frequently one adds 1 GiB of data is not the main
issue.  mdclose() and subsequent re-opening of all segments will be more
relevant to overall performance.

> If we regard repalloc as far faster than FileOpen/FileClose, or if we care
> only about growth in the number of segments of mdopen'ed files and not
> about the frequent resizes that happen during the functions above, then
> the comment is right and we may resize the array in a segment-by-segment
> manner.

In most cases, the array will fit into a power-of-two chunk, so repalloc()
already does the right thing.  Once the table has more than ~1000 segments
(~1 TiB table size), the allocation will get a single-chunk block, and every
subsequent repalloc() will call realloc().  Even then, repalloc() probably is
far faster than File operations.  Likely, I should just accept the extra
repalloc() calls and drop the "else if" change in _fdvec_resize().
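[Editor's note] The grow-only strategy that the quoted comment change describes is easy to demonstrate outside the server. The sketch below is a minimal, self-contained C illustration, not the actual patch: `SegSlot`, `seg_vector`, `seg_count`, and `resize_segments()` are hypothetical stand-ins for md.c's MdfdVec/md_seg_fds machinery, and plain `realloc()`/`free()` stand in for `repalloc()`/`pfree()` in the MdCxt memory context.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one open-segment slot (similar in spirit to md.c's MdfdVec). */
typedef struct SegSlot
{
	int			vfd;			/* virtual file descriptor */
	unsigned	segno;			/* segment number */
} SegSlot;

static SegSlot *seg_vector = NULL;	/* open-segment array for one fork */
static int		seg_count = 0;		/* current logical length */

/*
 * Grow-only resize: enlarge the array when new_count increases, keep the
 * existing allocation when it decreases, and free it only at new_count == 0
 * (the mdclose()-style case).  Shrinking leaves stale trailing entries in
 * place, which is harmless because seg_count says how many are valid.
 */
static void
resize_segments(int new_count)
{
	if (new_count == 0)
	{
		free(seg_vector);
		seg_vector = NULL;
	}
	else if (new_count > seg_count)
	{
		SegSlot    *tmp = realloc(seg_vector, sizeof(SegSlot) * new_count);

		assert(tmp != NULL);	/* repalloc() reports allocation failure itself */
		seg_vector = tmp;
	}
	/* else: new_count < seg_count -- deliberately keep the allocation */

	seg_count = new_count;
}

int
main(void)
{
	/*
	 * Back-of-envelope check of the "~1000 segments, ~1 TiB" figure above,
	 * assuming 8-byte entries and an 8 kB power-of-two chunk limit:
	 * 8192 / 8 = 1024 entries, and 1024 segments * 1 GB/segment ~= 1 TiB.
	 */
	printf("entries per 8 kB chunk: %zu\n", 8192 / sizeof(SegSlot));

	resize_segments(3);			/* open three segments: grows the array */
	resize_segments(1);			/* truncate-style shrink: keeps the 3-slot allocation */
	resize_segments(0);			/* close: frees the array */
	return 0;
}
```

The arithmetic in main() is only a plausibility check of the "~1000 segments (~1 TiB table size)" remark, under the stated assumptions about entry size and the allocator's chunk limit.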