Re: Speeding up parts of the planner using a binary search treestructure for nodes - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Speeding up parts of the planner using a binary search treestructure for nodes
Date
Msg-id CA+HiwqG8OxgxeQNyAvzwcSKXME8gg=xh-wVnyXNNhw9UNPi0VA@mail.gmail.com
Whole thread Raw
In response to Speeding up parts of the planner using a binary search tree structurefor nodes  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi David,

On Fri, May 29, 2020 at 2:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
> In [1] I mentioned that I'd like to look into coding a data structure
> to allow Node types to be looked up more efficiently than what List
> allows.  There are many places in the planner where we must determine
> if a given Node is in some List of Nodes. This is an O(n) operation.
> We could reduce these to as low as O(log n) with a binary search tree.
>
> A few places in the code that does this which can become a problem
> when the lists being searched are large:
>
> get_eclass_for_sort_expr()
> tlist_member()
>
> For the former, it's quite easy to see this come up in the profiles by
> querying a partitioned table with many partitions. e.g
>
> create table listp (a int) partition by list(a);
> select 'create table listp'||x::Text || ' partition of listp for
> values in('||x::text||');' from generate_Series(1,10000)x;
> \gexec
> create index on listp(a);
> explain select * from listp order by a;
>
> There are a few places that this new data structure could be used. My
> primary interest at the moment is EquivalenceClass.ec_members. There
> are perhaps other interesting places, such as planner targetlists, but
> obviously, any changes would need to be proved worthwhile before
> they're made.

I'm glad to see you mention this problem.

I have often wondered if we could do something about these data
structures growing bigger with the number of partitions in the first
place.  I could imagine that when "child" EC members were designed,
Tom maybe didn't think we'd continue relying on the abstraction for
the 10000 partitions case.  Making the lookup algorithm smarter with
binary or hashed search, as you are trying to do, might help us
somewhat, but would it be interesting to consider alternative designs
for the underlying abstraction?  Sorry, I don't have any concrete
ideas to offer at the moment, but thought sharing this perspective of
the problem might inspire others whose may have some.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Should we remove a fallback promotion? take 2
Next
From: Amit Kapila
Date:
Subject: Re: elog(DEBUG2 in SpinLocked section.