Re: POC: converting Lists into arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Re: POC: converting Lists into arrays
Date
Msg-id 4555.1551128115@sss.pgh.pa.us
Whole thread Raw
In response to Re: POC: converting Lists into arrays  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Peter Geoghegan <pg@bowt.ie> writes:
> If we knew that the list code was the bottleneck in a handful of
> cases, then I'd come down on Robert's side here. It would then be
> possible to update the relevant bottlenecked code in an isolated
> fashion, while getting the lion's share of the benefit. However, I
> don't think that that's actually possible. The costs of using Lists
> everywhere are real and measurable, but it's also highly distributed.
> At least, that's my recollection from previous discussion from several
> years back. I remember talking about this with Andres in early 2016.

Yeah, that's exactly the point.  If we could replace some small number
of places with a vector-ish data structure and get all/most of the win,
then that would be the way to go about it.  But I'm pretty sure that
we aren't going to make much of an improvement without wholesale
changes.  Nor is it really all that attractive to have some pieces of
the parse/plan/execution tree data structures using one kind of list
while other places use another.  If we're to attack this at all,
I think we should go for a wholesale replacement.

Another way of looking at this is that if we expected that extensions
had a lot of private Lists, unrelated to these core data structures,
it might be worth preserving the List implementation so as not to cause
problems for such usage.  But I doubt that that's the case; or that
any such private lists are more likely to be broken by these API changes
than the core usage is (600 changes in however many lines we've got is
not a lot); or that people would really want to deal with two independent
list implementations with different behaviors just to avoid revisiting
some portions of their code while they're being forced to revisit others
anyway.

> If you want to stop third party code from compiling, you can find a
> way to do that without really changing your approach. Nothing stops
> you from changing some symbol names minimally, and then making
> corresponding mechanical changes to all of the client code within the
> tree. Third party code authors would then follow this example, with
> the expectation that it's probably going to be a totally mechanical
> process.

Yeah, if we expected that only mechanical changes would be needed, and
forcing certain syntax changes would be a good guide to what had to be
done, then this'd be a helpful way to proceed.  The lnext changes in
my proposed patch do indeed work like that.  But the part that's actually
hard is finding/fixing the places where you can't safely use lnext
anymore, and there's nothing very mechanical about that.  (Unless you want
to just forbid lnext altogether, which maybe is a reasonable thing to
contemplate, but I judged it overkill.)

BTW, I neglected to respond to Robert's earlier point that

>>> It is also perhaps worth mentioning that reimplementing a List as an
>>> array means that it is... not a list.  That is a pretty odd state of
>>> affairs

I think the reason we have Lisp-style lists all over the place has little
to do with whether those are ideal data structures, and a lot to do with
the fact that chunks of Postgres were originally written in Lisp, and
in that language using lists for everything is just How It's Done.
I don't have any problem with regarding that nomenclature as being mostly
a legacy thing, which is how I documented it in the proposed revision
to pg_list.h's header comment.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] generated columns
Next
From: Andres Freund
Date:
Subject: Re: POC: converting Lists into arrays