On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > After taking a look at the patch optimizing SpecialJoinInfo allocations, > I decided to take a quick look at this one too. I don't have any > specific comments on the code yet, but it seems quite a bit more complex > than the other patch ... it's introducing a HTAB into the optimizer, > surely that has costs too.
Thanks for looking into this too.
> > I started by doing the same test as with the other patch, comparing > master to the two patches (applied independently and both). And I got > about this (in MB): > > tables master sjinfo rinfo both > ----------------------------------------------- > 2 37 36 34 33 > 3 138 129 122 113 > 4 421 376 363 318 > 5 1495 1254 1172 931 > > Unlike the SpecialJoinInfo patch, I haven't seen any reduction in > planning time for this one.
Yeah. That agreed with my observation as well.
> > The reduction in allocated memory is nice. I wonder what's allocating > the remaining memory, and we'd have to do to reduce it further.
Please see reply to SpecialJoinInfo thread. Other that the current patches, we require memory efficient Relids implementation. I have shared some ideas in the slides I shared in the other thread, but haven't spent time experimenting with any ideas there.
> > However, this is a somewhat extreme example - it's joining 5 tables, > each with 1000 partitions, using a partition-wise join. It reduces the > amount of memory, but the planning time is still quite high (and > essentially the same as without the patch). So it's not like it'd make > them significantly more practical ... do we have any other ideas/plans > how to improve that?
Yuya has been working on reducing planning time [1]. Has some significant improvements in that area based on my experiments. But those patches are complex and still WIP.
> > AFAIK we don't expect this to improve "regular" cases with modest number > of tables / partitions, etc. But could it make them slower in some case? >
AFAIR, my experiments did not show any degradation in regular cases with modest number of tables/partitions. The variation in planning time was with the usual planning time variations.
Documenting some comments from todays' patch review session
1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.
2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commuted versions may be tracked outside RestrictInfo
Combining the above two, it feels like we need a single hash table with (commuted, rinfo_serial, relids) as key to store all the translations of a RestrictInfo.