Thread: next draft of list rewrite
I've attached the latest header and implementation files (pg_list.h and list.c, respectively) for the new linked list implementation. I'm pretty satisfied with the new API: now would be the ideal time to suggest any changes you think I should make to it. I decided to use a new naming convention: all public functions and macros are prefixed with list_ or cell_, as appropriate, and type-specific functions are suffixed with _int or _oid, as appropriate. ISTM that the "allocate the list itself and the head node in one palloc()" idea that Tom suggested actually won't work: since the head node can change over the life of the list, we need to be able to deallocate a list cell via pfree() -- which would not be the case if a particular node was allocated via the same palloc() as the list itself. BTW, one nice thing about the new List API is that it lets us get rid of the WRITE_INTLIST_FIELD() and related cruft in nodes/, because each type of list (T_List, T_IntList, or T_OidList) is now a legitimate Node in its own right. So we can now use WRITE_NODE_FIELD() to output a list of integers, for example. (The implementation is a little "off the cuff", and I haven't reviewed it; also, since the tree is totally broken with this patch applied, I haven't tested it either. That said, I'd appreciate anyone mentioning any implementation bugs they might happen to spot.) Incidentally, does anyone have any thoughts on the best way to commit this? It requires changing pretty much _every single_ place in the code that uses the List API. Although most of those modifications should be trivial, it will still require touching a lot of files. Should I just make the changes locally and commit them all in one fell swoop? Create a private branch in CVS? Any other suggestions? FWIW, I'm hoping to have this done (and committed to CVS HEAD) by the end of next weekend. -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > I've attached the latest header and implementation files (pg_list.h > and list.c, respectively) for the new linked list implementation. I'm > pretty satisfied with the new API: now would be the ideal time to > suggest any changes you think I should make to it. I decided to use a > new naming convention: all public functions and macros are prefixed > with list_ or cell_, as appropriate, and type-specific functions are > suffixed with _int or _oid, as appropriate. I'm fairly unhappy with that idea. I thought the plan was to keep the API exactly the same as before, with the single exception of having to use ListCell* rather than List* as the type of pointers used to iterate through a list. Your apparent intention to rename every single list routine makes the change far more invasive than it has any need to be. It will undoubtedly break a fair amount of user-written code that we don't have control over; and to what benefit? Everyone who works on the backend is already accustomed to the existing names, which are mostly derived from ancient Lisp tradition. I don't think we should change them. If you want to push forward on it, we had better have a vote in pghackers, rather than assuming everyone concerned will see this discussion in -patches. Other than the naming issues, the code looks reasonable. I noticed one trivial bug, which is that the Asserts in list_member_int and list_member_oid seem backwards. A bigger problem is that list_length, list_head, and list_tail must *not* be macros --- we cannot afford double-evaluation bugs associated with operations as widely used as these. Since these functions are not used within loop bodies, I don't think that avoiding function call overhead for them is conceivably worth the risks involved. > ISTM that the "allocate the list itself and the head node in one > palloc()" idea that Tom suggested actually won't work: since the head > node can change over the life of the list, we need to be able to > deallocate a list cell via pfree() -- which would not be the case if a > particular node was allocated via the same palloc() as the list > itself. No, you'd simply leave that cell-space wasted if the head member were to change. It's doable, though it would complicate the deletion routine and probably nconc as well. It's entirely likely that it'd be more complexity than it's worth, though, so if you don't want to do it that's fine with me. Certainly it makes sense not to do it in the first pass. > BTW, one nice thing about the new List API is that it lets us get rid > of the WRITE_INTLIST_FIELD() and related cruft in nodes/, because each > type of list (T_List, T_IntList, or T_OidList) is now a legitimate > Node in its own right. So we can now use WRITE_NODE_FIELD() to output > a list of integers, for example. Right, that was a benefit I was expecting ... > (The implementation is a little "off the cuff", and I haven't reviewed > it; also, since the tree is totally broken with this patch applied, I > haven't tested it either. This is one reason I'd suggest refraining from renaming everything. Without the rename, it'd be possible to change the implementation with little or no change to the rest of the tree. The idea I had was to temporarily place "(ListCell*)" casts in foreach, lfirst, and lnext, so that they would work without complaint if the iteration variable is misdeclared as List* rather than ListCell*. Then it should be possible to test and commit the reimplemented list.c with few if any source-code changes elsewhere. After that, you can incrementally go through the rest of the tree and change iteration variables to ListCell*, removing the casts in the macros when all is done. This is a lot more pleasant way to proceed than a "big bang" changeover. (Removal of the FastList code could also be done incrementally.) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > I thought the plan was to keep the API exactly the same as before, > with the single exception of having to use ListCell* rather than > List* as the type of pointers used to iterate through a list. I thought that once we had decided to change the API, anything was fair game: either we need to change every call site of the API (which would be required if we adopted ListCell* as the type of the foreach iteration variable), or we don't. > It will undoubtedly break a fair amount of user-written code that we > don't have control over Granted, but so will changing the API at all: as above, the ListCell* change would cause approximately the same amount of breakage for users. > Everyone who works on the backend is already accustomed to the > existing names, which are mostly derived from ancient Lisp > tradition. The heritage of the names was part of my reason for changing them: the Lisp names imply a certain style of list implementation (cons cells, linear-time length and append ops but constant time prepend) that is no longer being used. In addition, that makes names like lfirst() (which would now be applied to a ListCell*) no longer very meaningful. In addition, there are plenty of infelicities that it would be nice to remove from the API. For example: - names like lispRemove() that aren't very meaningful (AFAICS), even in the present implementation - some names use case to separate words (e.g. LispRemove(), freeList()), some use underscores (e.g. set_union()), while some use nothing (e.g. llastnode()) - some names begin with capital letters (e.g. FastAppend()), the rest do not - some names use a 'ptr/int/oid' prefix to denote type-specific variants of a function, whereas some use a 'i' or 'o' suffix for the same purpose. - some names are prefixed with 'l', some are not -- without apparently rhyme or reason for this distinction - some functions take the List* as their first argument (e.g. lappend()), some do not (e.g. nth()) -- again, without any good reason for the inconsistency that I could see > If you want to push forward on it, we had better have a vote in > pghackers, rather than assuming everyone concerned will see this > discussion in -patches. I'm still open to being convinced, BTW: I only write the code as it is without prior discussion because I misunderstood your earlier position. I'll send a mail to hackers if we're still in disagreement at the end of this thread. > I noticed one trivial bug, which is that the Asserts in > list_member_int and list_member_oid seem backwards. Thanks: Alvaro was already kind enough to point that out via email. > A bigger problem is that list_length, list_head, and list_tail must > *not* be macros --- we cannot afford double-evaluation bugs > associated with operations as widely used as these. Fair enough. I'll also take a look at GCC-style inline functions... > No, you'd simply leave that cell-space wasted if the head member > were to change. It's doable, though it would complicate the > deletion routine and probably nconc as well. It's entirely likely > that it'd be more complexity than it's worth, though, so if you > don't want to do it that's fine with me. Certainly it makes sense > not to do it in the first pass. Ok, I won't worry about it for now. > [ a scheme for iteratively committing the changes ] > This is a lot more pleasant way to proceed than a "big bang" > changeover. I agree that your method is easier procedurally, but ISTM that the aggregate amount of work required is about the same: we still need to change pretty much every call site of the list API. Yes, it is slightly easier if we can do this one call site at a time, and yes, it's slightly easier if s/List/ListCell/ is the only required change, but there is about the same degree of work involved either way -- and IMHO the benefit of a well named API is worth the short-term pain. Furthermore, it is quite possible to reduce the pain induced by my method. For example, we could create a private CVS branch. In that branch, it wouldn't matter if the tree is broken for a week or two, in which time we can make the changes across the tree at our leisure, and then merge them into HEAD in one relatively painless branch landing. -Neil
Neil Conway <neilc@samurai.com> writes: > I thought that once we had decided to change the API, anything was > fair game: either we need to change every call site of the API (which > would be required if we adopted ListCell* as the type of the foreach > iteration variable), or we don't. Changing the iteration variable type only impacts places that use foreach, lfirst, and lnext, which is a small subset of the places that use the complete API. (A quick look at a couple of planner files suggested that it'd be about half as many changes, though I'm not certain that the ones I looked at are very representative.) A bigger point is that the iteration variable type makes no real difference in terms of reading the code, whereas a wholesale change in function names will entail a fair amount of relearning. BTW, I noticed that lfirst() gets applied to both iteration variables and true Lists, so we will certainly need to do something about that. The second case is much less frequent, so possibly lfirst(first_cell(list)) is the right sort of locution for it. It might be worthwhile to define ListCell and List as the same struct temporarily (with some wasted fields in each case) until all the cases like that can be fixed. > In addition, there are plenty of infelicities that it would be nice to > remove from the API. For example: Yes, there's inconsistency in the names. That's true all through Postgres, and is an inevitable result when you have code that's been worked on by many people over many years. If we were going to start enforcing some project-wide naming standards, we could make things prettier-looking, but I don't see the call to do it to list.c alone. > I agree that your method is easier procedurally, but ISTM that the > aggregate amount of work required is about the same: we still need to > change pretty much every call site of the list API. You could still do it incrementally, by making macros that implement the old function names on top of the new. But I remain of the opinion that renaming functions simply for the sake of renaming is a dead loss of time --- and not only your time, but a lot of other contributors'. > Furthermore, it is quite possible to reduce the pain induced by my > method. For example, we could create a private CVS branch. In that > branch, it wouldn't matter if the tree is broken for a week or two, in > which time we can make the changes across the tree at our leisure, and > then merge them into HEAD in one relatively painless branch landing. I doubt the branch landing would be painless, unless the rest of us sit around twiddling our thumbs meanwhile. This is such core code that a wholesale renaming would likely break any patch anyone else has in progress. regards, tom lane