Pre-alloc ListCell's optimization - Mailing list pgsql-hackers

From Stephen Frost
Subject Pre-alloc ListCell's optimization
Date
Msg-id 20110525025621.GH4548@tamriel.snowman.net
Whole thread Raw
Responses Re: Pre-alloc ListCell's optimization  (Stephen Frost <sfrost@snowman.net>)
Re: Pre-alloc ListCell's optimization  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Pre-alloc ListCell's optimization  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Pre-alloc ListCell's optimization  (Robert Haas <robertmhaas@gmail.com>)
Re: Pre-alloc ListCell's optimization  (Stephen Frost <sfrost@snowman.net>)
Re: Pre-alloc ListCell's optimization  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Greetings,

  Someone (*cough*Haas*cough) made a claim over beers at PGCon that it
  would be very difficult to come up with a way to pre-allocate List
  entries and maintain the current List API.  I'll admit that it wasn't
  quite as trivial as I had *hoped*, but attached is a proof-of-concept
  patch which does it.

  A couple of notes regarding the patch:

  First, it uses ffs(), which might not be fully portable..  We could
  certainly implement the same thing in userspace and use ffs() when
  it's available.

  Second, there are a couple of bugs (or at least, I'll characterize
  them that way) where we're pfree'ing a list which has been passed to
  list_concat().  That's not strictly legal as either argument passed to
  list_concat() could be returned and so one might end up free'ing the
  list pointer that was just returned.  Those pfree's are commented out
  in this patch, but really should probably just be removed or replaced
  with 'right' code (if it's critical to free this stuff).

  Third, COPY_NODE_CELL() wasn't used anywhere other than _copyList and
  would be difficult to keep as a macro given the way allocations happen
  now for lists.  It's no longer being used at all and therefore should
  really be removed.

  Fourth, the current implementation goes ahead and allocates 8
  ListCell's, which quadruples the size of a List (40 bytes to 168
  bytes, assuming 64bit ints).  I don't see that as really being an
  issue, but perhaps others would disagree.

  Fifth, list_concat() has become more like list_concat_unique() and
  friends (and hence slower).  Instead of being able to just tack one
  list on to the end of the other, we have to do an actual copy of the
  list.  This is due to having to allow callers to list_free(), which
  will call pfree(), and we don't have any way to know which ListCell's
  from the *old* list were pre-allocated and which weren't, which can
  end up causing pfree() to crash when it's passed an invalid pointer.
  In general, I'd hope that not having to palloc() for a small list
  might make up for a lot of that slowdown, but you can't really argue
  that anything O(n) can compete with the previous O(1) approach.

  Finally, sorry it's kind of a fugly patch, it's just a proof of
  concept and I'd be happy to clean it up if others feel it's worthwhile
  and a reasonable approach, but I really need to get it out there and
  take a break from it (I've been a bit obsessive-compulsive about it
  since PGCon.. :D).

      Thanks!

        Stephen

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: tackling full page writes
Next
From: Stephen Frost
Date:
Subject: Re: Pre-alloc ListCell's optimization