Re: POC: converting Lists into arrays - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC: converting Lists into arrays
Date
Msg-id CA+TgmoZM4c=v0ifR++i09JFtNtnqTq_HkFmbFjwWqbUJffOLmA@mail.gmail.com
Whole thread Raw
In response to Re: POC: converting Lists into arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: POC: converting Lists into arrays
Re: POC: converting Lists into arrays
List pgsql-hackers
On Sun, Mar 3, 2019 at 1:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > My main observation was from when the expression evaluation was using
> > lists all over. List iteration overhead was very substantial there. But
> > that's not a problem anymore, because all of those are gone now due to
> > the expression rewrite.  I personally wasn't actually advocating for a
> > new list implementation, I was/am advocating that we should move some
> > tasks over to a more optimized representation.
>
> I doubt that you'll get far with that; if this experiment is anything
> to go by, it's going to be really hard to make the case that twiddling
> the representation of widely-known data structures is worth the work
> and bug hazards.

I'm befuddled by this comment.  Andres is arguing that we shouldn't go
do a blind search-and-replace, but rather change certain things, and
you're saying that's going to be really hard because twiddling the
representation of widely-known data structures is really hard.  But if
we only change certain things, we don't *need* to twiddle the
representation of a widely-known data structure.  We just add a new
one and convert the things that benefit from it, like I proposed
upthread (and promptly got told I was wrong).

I think the reason why you're not seeing a performance benefit is
because the problem is not that lists are generically a more expensive
data structure than arrays, but that there are cases when they are
more expensive than arrays.  If you only ever push/pop at the front,
of course a list is going to be better.  If you often look up elements
by index, of course an array is going to be better.  If you change
every case where the code currently uses a list to use something else
instead, then you're changing both the winning and losing cases.
Yeah, changing things individually is more work, but that's how you
get the wins without incurring the losses.

I think David's results go in this direction, too.  Code that was
written on the assumption that list_nth() is slow is going to avoid
using it as much as possible, and therefore no benefit is to be
expected from making it fast.  If the author written the same code
assuming that the underlying data structure was an array rather than a
list, they might have picked a different algorithm which, as David's
results shows, could be a lot faster in some cases.  But it's not
going to come from just changing the way lists work internally; it's
going to come from redesigning the algorithms that are using lists to
do something better instead, as Andres's example of linearized
expression evaluation also shows.

> The cases I've been looking at suggest to me that we'd make far
> more progress on the excessive-palloc'ing front if we could redesign
> things to reduce unnecessary copying of parsetrees.  Right now the
> planner does an awful lot of copying because of fear of unwanted
> modifications of multiply-linked subtrees.  I suspect that we could
> reduce that overhead with some consistently enforced rules about
> not scribbling on input data structures; but it'd take a lot of work
> to get there, and I'm afraid it'd be easily broken :-(

I think that's a separate but also promising thing to attack, and I
agree that it'd take a lot of work to get there.  I don't think that
the problem with either parse-tree-copying or list usage is that no
performance benefits are to be had; I think it's that the amount of
work required to get those benefits is pretty large.  If it were
otherwise, somebody likely would have done it before now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Ibrar Ahmed
Date:
Subject: Re: Problem with default partition pruning
Next
From: Laurenz Albe
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) andKey Management Service (KMS)