On Fri, Aug 20, 2021 at 1:29 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Thinking about this stuff further, I was wondering if one way to
> handle the bounded shared hash table problem would be to replace the
> latest boundary in the map whenever it was full. But at that point,
> do we even need a hash table? This led me to revisit the two-element
> approach that was discussed upthread. What if we only stored the
> earliest and latest segment boundaries at any given time? Once the
> earliest boundary is added, it never changes until the segment is
> flushed and it is removed. The latest boundary, however, will be
> updated any time we register another segment. Once the earliest
> boundary is removed, we replace it with the latest boundary. This
> strategy could cause us to miss intermediate boundaries, but AFAICT
> the worst case scenario is that we hold off creating .ready files a
> bit longer than necessary.

I think this is a promising approach. We could also have a small
fixed-size array, so that we only have to risk losing track of
anything when we overflow the array. But I guess I'm still unconvinced
that there's a real possibility of genuinely needing multiple
elements. Suppose we are thinking of adding a second element to the
array (or the hash table). I feel like it's got to be safe to just
remove the first one. If not, then apparently the WAL record that
caused us to make the first entry isn't totally flushed yet - which I
still think is impossible.
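
For concreteness, here is a rough sketch of the two-element scheme
described above. It is only an illustration under simplifying
assumptions: the names (BoundaryTracker, tracker_register,
tracker_flush) are made up and do not come from any patch in this
thread, LSNs and segment numbers are plain integers, and locking is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real WAL types. */
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* A segment plus the end of the record that crosses its boundary. */
typedef struct SegBoundary
{
    XLogSegNo   seg;
    XLogRecPtr  end_lsn;
    bool        valid;
} SegBoundary;

/* Only the earliest and the latest registered boundary are kept. */
typedef struct BoundaryTracker
{
    SegBoundary earliest;
    SegBoundary latest;
} BoundaryTracker;

static void
tracker_init(BoundaryTracker *t)
{
    assert(t != NULL);
    t->earliest.valid = false;
    t->latest.valid = false;
}

/*
 * Register a boundary. The earliest slot is filled once and then left
 * alone until it is flushed; every later registration overwrites the
 * latest slot, so intermediate boundaries can be forgotten.
 */
static void
tracker_register(BoundaryTracker *t, XLogSegNo seg, XLogRecPtr end_lsn)
{
    SegBoundary *slot;

    assert(t != NULL);
    slot = t->earliest.valid ? &t->latest : &t->earliest;
    slot->seg = seg;
    slot->end_lsn = end_lsn;
    slot->valid = true;
}

/*
 * Report the newest tracked segment whose boundary record is flushed.
 * Returns false if no tracked boundary has been reached yet. When the
 * earliest boundary is consumed, the latest one takes its place.
 */
static bool
tracker_flush(BoundaryTracker *t, XLogRecPtr flushed, XLogSegNo *ready_seg)
{
    bool        found = false;

    assert(t != NULL && ready_seg != NULL);
    while (t->earliest.valid && t->earliest.end_lsn <= flushed)
    {
        *ready_seg = t->earliest.seg;
        found = true;
        t->earliest = t->latest;
        t->latest.valid = false;
    }
    return found;
}
```

If boundaries are registered for segments 1, 2 and 3 in that order, the
entry for segment 2 is overwritten by the one for segment 3. Flushing
past the first boundary reports segment 1, and nothing further is
reported until the flush pointer reaches the boundary recorded for
segment 3. That is the "hold off a bit longer than necessary" case.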
--
Robert Haas
EDB: http://www.enterprisedb.com