Re: embedded list v2 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: embedded list v2
Date
Msg-id 201209151008.51251.andres@2ndquadrant.com
Whole thread Raw
In response to Re: embedded list v2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: embedded list v2
List pgsql-hackers
Hi Tom,

On Saturday, September 15, 2012 07:32:45 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Friday, September 14, 2012 10:48:35 PM Tom Lane wrote:
> >> Instead let's provide a macro for an empty list value, so that you can
> >> do something like
> >> static ilist_d_head DatabaseList = EMPTY_DLIST;
> > 
> > I don't think that can work with dlists because they are circular and
> > need their pointers to be adjusted.
> 
> Well, actually, that just brings us to the main point which is: I do not
> believe that circular links are a good design choice here.  They prevent
> the possibility of trivial list initialization, they make plain
> iteration over the list more expensive, and you have provided no
> evidence that there's any meaningful savings in other operations, much
> less enough to justify those disadvantages.
I think I have talked about the reasoning on the list before, but here it 
goes: The cases where I primarily used them are cases where the list 
*manipulation* is a considerable part of the expense. In those situations 
having less branches is beneficial and, to my knowledge, cannot be done in 
normal flat lists.
For the initial user of those lists - the slab allocator for postgres which I 
intend to finish at some point - the move to circular lists instead of 
classical lists was an improvement in the 12-15% range.

Inhowfar do they make iteration more expensive? ptr != head shouldn't be more 
expensive than !ptr.

Imo, that leaves initialization where I admit that you have a case. I don't 
find it a big problem in practise though.

> >> The apparently random decisions to make some things inline functions
> >> and other things macros (even though they evaluate their args multiple
> >> times) will come back to bite us.  I am much more interested in
> >> unsurprising behavior of these constructs than I am in squeezing out
> >> an instruction or two, so I'm very much against the unsafe macros.
> > 
> > At the moment the only thing that are macros are things that cannot be
> > expressed as inline functions because they return the actual contained
> > type and/or because they contain a for() loop.
> 
> I don't really care.  If you can't build it without multiple-evaluation
> macros, it's too dangerous for a fundamental construct that's meant to
> be used all over the place.  Time to redesign.
Its not like pg doesn't use any other popularly used macros that have multiple 
evaluation hazarards. Even in cases where they *could* be replaced by inline 
functions without that danger.

> One possible option, though it's a bit restrictive, is to insist that
> the list node be the first element of whatever it's embedded in,
> eliminating the need for ilist_container altogether.  This would not
> work if we meant to put these kinds of list links into Node-derived
> structs, but I suspect we don't need that.
I already had list elements which are in multiple lists at the same time and I 
think replacing some of the pg_list.h usages might be a good idea even for 
Node derived structures given the List manipulation overhead we have seen in 
several profiles.

> Then for instance (given the additional choice to not use circular links)
> dlist_get_head would degenerate to ((type *) (ptr)->head.next), which
> eliminates its multi-evaluation hazard and saves a few instructions as well.
I still fail to see why not using cirular lists removes instructions in such a 
situation:

Get the first element in a circular list:
(ptr)->head.next

->head.next is at a constant offset from the start of *ptr, just as a ->first 
pointer is.

In iterations like:

for(name = (ptr)->head.next;name != &(ptr)->head;name = name->next)
{
}

you have one potentially additional indexed memory access (&ptr->head) for the 
whole loop to an address which will normally lie on the same cacheline as the 
already accessed ->next pointer.

> Or if you don't want to do that, dlist_get_head(type, membername, ptr)
> could be defined as
>     ((type *) dlist_do_get_head(ptr, offsetof(type, membername)))
> where dlist_do_get_head is an inline'able function, eliminating the
> multi-evaluation-of-ptr hazard.
Thats a rather neat idea. Let me play with it for a second, it might make some 
of the macros safe, although I don't see how it could work for for() loops.

Just to make that clear, purely accessing the first node can be done without 
any macros at al, its just the combination of returning the first node and 
getting the contained element that needs to be a macro because of the variadic 
type issues (at times, I really wish for c++ style templates...)

I will take a stab at trying to improve Alvaro's current patch wrt to those 
issues.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Amit kapila
Date:
Subject: FW: Minor inheritance/check bug: Inconsistent behavior
Next
From: Andres Freund
Date:
Subject: Re: git tree