Thread: [MASSMAIL]further improving roles_is_member_of()

[MASSMAIL]further improving roles_is_member_of()

From
Nathan Bossart
Date:
(moved to a new thread)

On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
> I wrote:
>> ... I still see the problematic GRANT taking ~250ms, compared
>> to 5ms in v15.  roles_is_member_of is clearly on the hook for that.
> 
> Ah: looks like that is mainly the fault of the list_append_unique_oid
> calls in roles_is_member_of.  That's also an O(N^2) cost of course,
> though with a much smaller constant factor.
> 
> I don't think we have any really cheap way to de-duplicate the role
> OIDs, especially seeing that it has to be done on-the-fly within the
> collection loop, and the order of roles_list is at least potentially
> interesting.  Not sure how to make further progress without a lot of
> work.

I looked at this one again because I suspect these "thousands of roles"
cases are going to continue to appear.  Specifically, I tried to convert
the cached roles lists to hash tables to avoid the list_member_oid linear
searches.  That actually was pretty easy to do.  The most complicated part
is juggling a couple of lists to keep track of the roles we need to iterate
over in roles_is_member_of().

AFAICT the only catch is that select_best_grantor() seems to depend on the
ordering of roles_list so that it prefers a "closer" role.  To deal with
that, I added a "depth" field to the entry struct that can be used as a
tiebreaker.  I'm not certain that this is good enough, though.

As shown in the attached work-in-progress patch, this actually ends up
removing more code than it adds, and it seems to provide similar results to
HEAD (using the benchmark from the previous thread [0]).  I intend to test
it with many more roles to see if it's better in more extreme cases.

[0] https://postgr.es/m/341186.1711037256%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: further improving roles_is_member_of()

From
Nathan Bossart
Date:
On Thu, Apr 11, 2024 at 11:16:33PM -0500, Nathan Bossart wrote:
> As shown in the attached work-in-progress patch, this actually ends up
> removing more code than it adds, and it seems to provide similar results to
> HEAD (using the benchmark from the previous thread [0]).  I intend to test
> it with many more roles to see if it's better in more extreme cases.

Even with 100K roles, the Bloom filter added in commit d365ae7 seems to do
a pretty good job at keeping up with the hash table approach.  The callers
of roles_is_member_of() that do list_member_oid() on the returned list
might benefit a little from a hash table, but I'm skeptical it makes much
difference in practice.  This was an interesting test, but I'll likely
withdraw this patch shortly.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com