Re: Speeding up parts of the planner using a binary search treestructure for nodes - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Speeding up parts of the planner using a binary search treestructure for nodes
Date
Msg-id CAG-ACPWy93jG2GhYN71cctzYqbCe2QrpBSYoUcBtpHpSYt3miQ@mail.gmail.com
Whole thread Raw
In response to Re: Speeding up parts of the planner using a binary search treestructure for nodes  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On Tue, 2 Jun 2020 at 03:13, David Rowley <dgrowleyml@gmail.com> wrote:


It does seem likely to me that hash tables would be a more efficient
structure, but the gains might not be as big as you think. To perform
a lookup in the table we need to hash the entire node to get the hash
value, then perform at least one equal comparison.  With the Binary
Search Lists, we'd need more comparisons, but those can be partial
comparisons as they can abort early when we find the first mismatching
field in the Node. When the number of items in the list is fairly
small that might actually be less work, especially when the Node type
varies (since that's the first field we compare). Hash tables are
likely to become more efficient when the number of items is larger.
The general case is that we'll just have a small number of items. I'd
like to improve the less common situation when the number of items is
large with minimal impact for when the number of items is small.

Agree with that. I think we can again borrow from the way we search a join - when small number of joins use list and for a larger number use hash table. I am not suggesting that we use list in this case, but the idea is to use two data structures. May be every eclass will use one of them depending upon the number of members. Queries involving partitioned tables with lots of partitions will have a large number of child eclass members.
 

> > tlist_member()
>
> hash table by expression as key here as well?

The order of the tlist does matter in many cases. I'm unsure how we
could track the order that items were added to the hash table and then
obtain the items back in that order in an efficient way. I imagine
we'd need to store the item in some subsequent data structure, such as
a List.
 
If we use hash table then we retain the list as well. But your idea below looks better.
 
There's obviously going to be some overhead to doing that.
The idea with the Binary Search Lists was that items would be stored
in an array, similar to List, but the cells of the list would contain
the offset index to its parent and left and right child.  New items
would always take the next free slot in the list, just like List does
so we'd easily be able to get the insert order by looping over the
array of elements in order.

That seems like a good idea. I am worried that the expression nodes do not have any inherent ordering and we are proposing to use a structure which relies on order.

--
Best Wishes,
Ashutosh

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: A wrong index choose issue because of inaccurate statistics
Next
From: Tom Lane
Date:
Subject: Re: snowball release