Re: [HACKERS] ArrayLists instead of List (for some things) - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] ArrayLists instead of List (for some things)
Date
Msg-id CAKJS1f-5K1phLzPQRpdbEZ45W-aWeYuC7V=uTaHC2aXK_zsq4A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] ArrayLists instead of List (for some things)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 3 November 2017 at 03:27, Stephen Frost <sfrost@snowman.net> wrote:
> * David Rowley (david.rowley@2ndquadrant.com) wrote:
>> We could get around a few of these problems if Lists were implemented
>> internally as arrays, however, arrays are pretty bad if we want to
>> delete an item out the middle as we'd have to shuffle everything over
>> one spot. Linked lists are much better here... at least they are when
>> you already have the cell you want to remove... so we can't expect
>> that we can just rewrite List to use arrays instead.
>
> This actually depends on just how you delete the item out of the array,
> though there's a trade-off there.  If you "delete" the item by marking
> it as "dead" then you don't need to re-shuffle everything, but, of
> course, then you have to make sure to 'skip' over those by running down
> the array for each list_nth() call- but here's the thing, with a small
> list that's all in a tighly packed array, that might not be too bad.
> Certainly far better than having to grab random memory on each step and
> I'm guessing you'd only actually store the "data" item for a given list
> member in the array if it's a integer/oid/etc list, not when it's
> actually a pointer being stored in the list, so you're always going to
> be working with pretty tightly packed arrays where scanning the list on
> the list_nth call really might be darn close to as fast as using an
> offset to the actual entry in the array, unless it's a pretty big list,
> but I don't think we've actually got lots of multi-hundred-deep lists.

I was hoping to have list_nth as always O(1) so that we'd never be
tempted again to use arrays directly like we are not for things like
simle_rel_array[].
Probably it could be made to work quickly in most cases by tracking if
there are any deleted items and just do the direct array lookups if
nothing is deleted, and if something is deleted then visit index n
minus bms_num_members() of some deleted bitmapset for the bits earlier
than the Nth element. bms_num_members() could be made faster with
64bit bitmap words and __builtin_popcountl() falling back on the
number_of_ones[] array when not available. Would also require a
bms_num_members_before() function.

In the implementation that I attached, I showed an example of how to
delete items out an array list, which I think is quite a bit cleaner
than how we generally do it today with List. (e.g in
reconsider_outer_join_clauses()) However, it'll still perform badly
when just a single known item is being removed.


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] ArrayLists instead of List (for some things)
Next
From: Craig Ringer
Date:
Subject: Re: [HACKERS] Custom compression methods