Re: POC: converting Lists into arrays - Mailing list pgsql-hackers

From David Rowley
Subject Re: POC: converting Lists into arrays
Date
Msg-id CAKJS1f-pNfoUJrmU8vgD1WrKLqs9WxOVLQ0RpSWu9h6W9RnypA@mail.gmail.com
Whole thread Raw
In response to Re: POC: converting Lists into arrays  (Andres Freund <andres@anarazel.de>)
Responses Re: POC: converting Lists into arrays
Re: POC: converting Lists into arrays
List pgsql-hackers
On Tue, 5 Mar 2019 at 11:11, Andres Freund <andres@anarazel.de> wrote:
> On 2019-03-04 16:28:40 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I don't buy this. I think e.g. redisgning the way we represent
> > > targetlists would be good (it's e.g. insane that we recompute
> > > descriptors out of them all the time), and would reduce their allocator
> > > costs.
> >
> > Maybe we're not on the same page here, but it seems to me that that'd be
> > addressable with pretty localized changes (eg, adding more fields to
> > TargetEntry, or keeping a pre-instantiated output tupdesc in each Plan
> > node).  But if the concern is about the amount of palloc bandwidth going
> > into List cells, we're not going to be able to improve that with localized
> > data structure changes; it'll take something like the patch I've proposed.
>
> What I'm saying is that it'd be reasonable to replace the use of list
> for targetlists with 'list2' without a wholesale replacement of all the
> list code, and it'd give us benefits.

So you think targetlists are the only case to benefit from an array
based list? (Ignoring the fact that I already showed another case)
When we discover the next thing to benefit, then the replacement will
be piecemeal, just the way Tom would rather not do it.  I personally
don't want to be up against huge resistance when I discover that
turning a single usage of a List into List2 is better.  We'll need to
consider backpatching pain / API breakage *every single time*.

A while ago I did have a go at changing some List implementations for
my then proposed ArrayList and it was beyond a nightmare, as each time
I changed one I realised I needed to change another. In the end, I
just gave up. Think of all the places we have forboth() and
forthree(), we'll need to either provide a set of macros that take
various combinations of List and List2 or do some conversion
beforehand.  With respect, if you don't believe me, please take my
ArrayList patch [1] and have a go at changing targetlists to use
ArrayLists all the way from the parser through to the executor. I'll
be interested in the diff stat once you're done.

It's true that linked lists are certainly better for some stuff;
list_concat() is going to get slower, lcons() too, but likely we can
have a bonus lcons() elimination round at some point. I see quite a
few of them that look like they could be changed to lappend().  I also
just feel that if we insist on more here then we'll get about nothing.
I'm also blocked on my partition performance improvement goals on
list_nth() being O(N), so I'm keen to see progress here and do what I
can to help with that.  With list_concat() I find that pretty scary
anyway. Using it means we can have a valid list that does not get it's
length updated when someone appends a new item. Most users of that do
list_copy() to sidestep that and other issues... which likely is
something we'd want to rip out with Tom's patch.

[1] https://www.postgresql.org/message-id/CAKJS1f_2SnXhPVa6eWjzy2O9A=ocwgd0Cj-LQeWpGtrWqbUSDA@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: jsonpath
Next
From: Andres Freund
Date:
Subject: Inheriting table AMs for partitioned tables