Re: [HACKERS] ArrayLists instead of List (for some things) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [HACKERS] ArrayLists instead of List (for some things)
Date
Msg-id 20171102142748.GG4628@tamriel.snowman.net
Whole thread Raw
In response to [HACKERS] ArrayLists instead of List (for some things)  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] ArrayLists instead of List (for some things)  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David,

* David Rowley (david.rowley@2ndquadrant.com) wrote:
> Our List implementation internally uses linked lists which are
> certainly good for some things, but pretty bad at other things. Linked
> lists are pretty bad when you want to fetch the Nth element, as it
> means looping ever each previous element until you get to the Nth.
> They're also pretty bad for a modern CPU due to their memory access
> pattern.

I can certainly understand this complaint.

> We could get around a few of these problems if Lists were implemented
> internally as arrays, however, arrays are pretty bad if we want to
> delete an item out the middle as we'd have to shuffle everything over
> one spot. Linked lists are much better here... at least they are when
> you already have the cell you want to remove... so we can't expect
> that we can just rewrite List to use arrays instead.

This actually depends on just how you delete the item out of the array,
though there's a trade-off there.  If you "delete" the item by marking
it as "dead" then you don't need to re-shuffle everything, but, of
course, then you have to make sure to 'skip' over those by running down
the array for each list_nth() call- but here's the thing, with a small
list that's all in a tighly packed array, that might not be too bad.
Certainly far better than having to grab random memory on each step and
I'm guessing you'd only actually store the "data" item for a given list
member in the array if it's a integer/oid/etc list, not when it's
actually a pointer being stored in the list, so you're always going to
be working with pretty tightly packed arrays where scanning the list on
the list_nth call really might be darn close to as fast as using an
offset to the actual entry in the array, unless it's a pretty big list,
but I don't think we've actually got lots of multi-hundred-deep lists.

Of course, you'd have to watch out for cases where there's a lot of
deletes happening, since that would effectively make this list longer.

Anyway, just something to consider.  Having to actually store the
"delete" flag would have some cost to it also, unless you happen to
already be storing something per entry and have an extra bit to spare.

Others who are more versed in CPU magics than I am are certainly welcome
to comment on if they think this makes sense to consider, I'm no CPU
architecture guru.

> Anyway, please don't debate the usages of the new type here. As for
> all the above plans, I admit to not having a full handle on them yet.

+1.

Thanks!

Stephen

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [HACKERS] ArrayLists instead of List (for some things)
Next
From: Aleksandr Parfenov
Date:
Subject: Re: [HACKERS] pgbench - use enum for meta commands