Re: [HACKERS] ArrayLists instead of List (for some things) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] ArrayLists instead of List (for some things)
Date
Msg-id 20171102154453.q5k6kdqntk3iysw3@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] ArrayLists instead of List (for some things)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2017-11-02 10:38:34 -0400, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On 3 November 2017 at 03:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> We've jacked up the List API and driven a new implementation underneath
> >> once before.  Maybe it's time to do that again.
> 
> > Maybe, but the new implementation is not going to do well with places
> > where we perform lcons(). Probably many of those places could be
> > changed to lappend(), but I bet there's plenty that need prepend.
> 
> [ shrug... ] To me, that means this implementation isn't necessarily
> the right solution.
> 
> It seems to me that a whole lot of the complaints about this could be
> resolved simply by improving the List infrastructure to allocate ListCells
> in batches.  That would address the question of "too much palloc traffic"
> and greatly improve the it-accesses-all-over-memory situation too.

It'll however not solve the problem that pointer chained datastructures
have very poor access patterns.


> Possibly there are more aggressive changes that could be implemented
> without breaking too many places, but I think it would be useful to
> start there and see what it buys us.

I'm suspicious that that will result in causing pain twice, once doing
the compromise thing, and once doing it right.


FWIW, I'd started pursuing converting a lot of executor uses of lists
with arrays. But everywhere I did, particularly the uses inside the
expression machinery, I noticed that larger changes where needed. For
the expression stuff we needed a much larger architectural change (hence
the stuff in 10 I did with Tom's help).  The other big user of lists at
the execution phase are targetlists - I think there the architectural
change is also bigger: Rebuilding/analyzing all the targetlists,
tupledescs etc at execution time is the architectural problem, not the
use of lists itself (although that's also bad).   The amount of cycles
spent in ExecTypeFromTL() etc and friends is partially caused by the use
of lists, but the real problem is that we're doing all that work in the
first place.

So I personally think most uses of lists at execution time are
archetecturally wrong, therefore optimizing their implementation is just
a bandaid.  Where I think the arguments for a more optimized
implementation are better is the parser, where we really don't ahead of
time know the size of the data we're dealing with.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] proposal: schema variables
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] proposal: schema variables