Re: Making empty Bitmapsets always be NULL - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Making empty Bitmapsets always be NULL |
Date | |
Msg-id | CAApHDvr5O41MuUjw0DQKqmAnv7QqvmLqXReEd5o4nXTzWp8-+w@mail.gmail.com Whole thread Raw |
In response to | Making empty Bitmapsets always be NULL (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Making empty Bitmapsets always be NULL
|
List | pgsql-hackers |
On Wed, 1 Mar 2023 at 10:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > When I designed the Bitmapset module, I set things up so that an empty > Bitmapset could be represented either by a NULL pointer, or by an > allocated object all of whose bits are zero. I've recently come to > the conclusion that that was a bad idea and we should instead have > a convention like the longstanding invariant for Lists: an empty > list is represented by NIL and nothing else. I know I'm late to the party here, but I just wanted to add that I agree with this and also that I've never been a fan of having to decide if it was safe to check for NULL when I needed performance or if I needed to use bms_is_empty() because the set might have all its words set to 0. I suggest tightening the rule even further so instead of just empty sets having to be represented as NULL, the rule should be that sets should never contain any trailing zero words, which is effectively a superset of the "empty is NULL" rule that you've just changed. If we did this, then various functions can shake loose some crufty code and various other functions become more optimal due to not having to loop over trailing zero words needlessly. For example. * bms_equal() and bms_compare() can precheck nwords match before troubling themselves with looping over each member, and; * bms_is_subset() can return false early if 'a' has more words than 'b', and; * bms_subset_compare() becomes more simple as it does not have to look for trailing 0 words, and; * bms_nonempty_difference() can return true early if 'a' has more words than 'b', plus no need to check for trailing zero words at the end. We can also chop the set down to size in; bms_intersect(), bms_difference(), bms_int_members(), bms_del_members() and bms_intersect() which saves looping needlessly over empty words when various other BMS operations are performed later on the set, for example, bms_next_member(), bms_prev_member, bms_copy(), etc. The only reason I can think of that this would be a bad idea is that if we want to add members again then we need to do repalloc(). If we're only increasing the nwords back to what it had been on some previous occasion then repalloc() is effectively a no-op, so I doubt this will really be a net negative. I think the effort we'll waste by carrying around needless trailing zero words in most cases is likely to outweigh the overhead of any no-op repallocs. Take bms_int_members(), for example, we'll purposefully 0 out all the trailing words possibly having to read in new cachelines from RAM to do so. It would be better to leave having to read those in again until we actually need to do something more useful with them, like adding some new members to the set again. We'll have to dirty those cachelines then anyway and we may have flushed those cachelines out of the CPU cache by the time we get around to adding the new members again. I've coded this up in the attached and followed the lead in list.c and added a function named check_bitmapset_invariants() which ensures the above rule is followed. I think the code as it stands today should really have something like that anyway. The patch also optimizes sub-optimal newly added code which calls bms_is_empty_internal() when we have other more optimal means to determine if the set is empty or not. David
Attachment
pgsql-hackers by date: