It does seem likely to me that hash tables would be a more efficient structure, but the gains might not be as big as you think. To perform a lookup in the table we need to hash the entire node to get the hash value, then perform at least one equal comparison. With the Binary Search Lists, we'd need more comparisons, but those can be partial comparisons as they can abort early when we find the first mismatching field in the Node. When the number of items in the list is fairly small that might actually be less work, especially when the Node type varies (since that's the first field we compare). Hash tables are likely to become more efficient when the number of items is larger. The general case is that we'll just have a small number of items. I'd like to improve the less common situation when the number of items is large with minimal impact for when the number of items is small.
Agree with that. I think we can again borrow from the way we search a join - when small number of joins use list and for a larger number use hash table. I am not suggesting that we use list in this case, but the idea is to use two data structures. May be every eclass will use one of them depending upon the number of members. Queries involving partitioned tables with lots of partitions will have a large number of child eclass members.
> > tlist_member() > > hash table by expression as key here as well?
The order of the tlist does matter in many cases. I'm unsure how we could track the order that items were added to the hash table and then obtain the items back in that order in an efficient way. I imagine we'd need to store the item in some subsequent data structure, such as a List.
If we use hash table then we retain the list as well. But your idea below looks better.
There's obviously going to be some overhead to doing that. The idea with the Binary Search Lists was that items would be stored in an array, similar to List, but the cells of the list would contain the offset index to its parent and left and right child. New items would always take the next free slot in the list, just like List does so we'd easily be able to get the insert order by looping over the array of elements in order.
That seems like a good idea. I am worried that the expression nodes do not have any inherent ordering and we are proposing to use a structure which relies on order.