Hello hackers,
David Rowley posted some interesting stuff about an ArrayList[1] that
would replace List in many places. (I probably would have replied to
that thread if I hadn't changed email account, sorry.) I had similar
thoughts when I needed to maintain a sorted vector for my refactoring
proposal for the fsync queue, as part of my undo log work, except I
came at the problem with some ideas from C++ in mind. However, now
this stuff seems to be redundant for now, given some decisions we've
arrived at in the thread about how to track the fsync queue, and I'm
more focused on getting the undo work done.
I wanted to share what I had anyway, since it seems to be related to
David's proposal but make some different trade-offs. Perhaps it will
be useful for ideas, code, or examples of what not to do :-) Or
perhaps I'll come back to it later.
Specialising the comparator and element size of functions like
qsort(), binary_search(), unique() is a well known technique for going
faster, and the resulting programming environment has better type
safety. It's easy to measure a speedup for specialised qsort of small
objects (compared to pg_qsort() with function pointer and object size
as arguments). Doing that in a way that works well with automatically
managed vectors (and also any array, including in shmem etc) seemed
like a good plan to me, hence this prototyping work.
The basic idea is to use simplehash-style generic programming, like a
kind of poor man's C++ standard library. The vector type is
instantiated for a given type like Oid etc, and then you can
instantiate specialised qsort() etc. The vector has a 'small vector
optimisation' where you can hold very small lists without allocating
any extra memory until you get past (say) 3 members. I was planning
to extend the qsort implementation a bit further to that it could
replace both pg_qsort() and the Perl-based tuple qsort generator, and
then we'd have only one copy of the algorithm in the tree (the dream
of generic programmers). I wanted to do the same with unique(), since
I'd already noticed that we have many open coded examples of that
algorithm scattered throughout the tree.
Some of the gratuitous C++isms should be removed including some
nonsense about const qualifiers, use of begin and end rather than data
and size (more common in C code), some some other details, and I was
planning to fix some of that before I reposted the patch set as part
of the larger undo patch set, but now that I'm not going to do that...
here is a snapshot of the patch set as-is, with toy examples showing
several examples of List-of-Oid relationOids being replaced with a
specialised vector (with a missed opportunity for it to be sorted and
use binary_search() instead of find()), and the List-of-Node
p_joinexprs being replaced with a specialised vector. I think those
last things are the sort of thing that David was targeting.
[1] https://www.postgresql.org/message-id/flat/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com
--
Thomas Munro
https://enterprisedb.com