Re: POC: converting Lists into arrays - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: POC: converting Lists into arrays |
Date | |
Msg-id | CA+TgmoZM4c=v0ifR++i09JFtNtnqTq_HkFmbFjwWqbUJffOLmA@mail.gmail.com Whole thread Raw |
In response to | Re: POC: converting Lists into arrays (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: POC: converting Lists into arrays
Re: POC: converting Lists into arrays |
List | pgsql-hackers |
On Sun, Mar 3, 2019 at 1:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > My main observation was from when the expression evaluation was using > > lists all over. List iteration overhead was very substantial there. But > > that's not a problem anymore, because all of those are gone now due to > > the expression rewrite. I personally wasn't actually advocating for a > > new list implementation, I was/am advocating that we should move some > > tasks over to a more optimized representation. > > I doubt that you'll get far with that; if this experiment is anything > to go by, it's going to be really hard to make the case that twiddling > the representation of widely-known data structures is worth the work > and bug hazards. I'm befuddled by this comment. Andres is arguing that we shouldn't go do a blind search-and-replace, but rather change certain things, and you're saying that's going to be really hard because twiddling the representation of widely-known data structures is really hard. But if we only change certain things, we don't *need* to twiddle the representation of a widely-known data structure. We just add a new one and convert the things that benefit from it, like I proposed upthread (and promptly got told I was wrong). I think the reason why you're not seeing a performance benefit is because the problem is not that lists are generically a more expensive data structure than arrays, but that there are cases when they are more expensive than arrays. If you only ever push/pop at the front, of course a list is going to be better. If you often look up elements by index, of course an array is going to be better. If you change every case where the code currently uses a list to use something else instead, then you're changing both the winning and losing cases. Yeah, changing things individually is more work, but that's how you get the wins without incurring the losses. I think David's results go in this direction, too. Code that was written on the assumption that list_nth() is slow is going to avoid using it as much as possible, and therefore no benefit is to be expected from making it fast. If the author written the same code assuming that the underlying data structure was an array rather than a list, they might have picked a different algorithm which, as David's results shows, could be a lot faster in some cases. But it's not going to come from just changing the way lists work internally; it's going to come from redesigning the algorithms that are using lists to do something better instead, as Andres's example of linearized expression evaluation also shows. > The cases I've been looking at suggest to me that we'd make far > more progress on the excessive-palloc'ing front if we could redesign > things to reduce unnecessary copying of parsetrees. Right now the > planner does an awful lot of copying because of fear of unwanted > modifications of multiply-linked subtrees. I suspect that we could > reduce that overhead with some consistently enforced rules about > not scribbling on input data structures; but it'd take a lot of work > to get there, and I'm afraid it'd be easily broken :-( I think that's a separate but also promising thing to attack, and I agree that it'd take a lot of work to get there. I don't think that the problem with either parse-tree-copying or list usage is that no performance benefits are to be had; I think it's that the amount of work required to get those benefits is pretty large. If it were otherwise, somebody likely would have done it before now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: