Re: Reducing tuple overhead - Mailing list pgsql-hackers
From | Joshua D. Drake |
---|---|
Subject | Re: Reducing tuple overhead |
Date | |
Msg-id | 5539253D.8000506@commandprompt.com |
In response to | Re: Reducing tuple overhead (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
Responses | Re: Reducing tuple overhead |
List | pgsql-hackers |
On 04/23/2015 09:42 AM, Jim Nasby wrote:
>
> On 4/23/15 11:24 AM, Andres Freund wrote:
>> I do wonder what, in realistic cases, is actually the bigger contributor
>> to the overhead. The tuple header or the padding we liberally add in
>> many cases...
>
> Assuming you're talking about padding between fields...
>
> Several years ago Enova paid Command Prompt to start work on logical
> column ordering, and part of the motivation for that was to allow
> re-ordering physical tuples into the most efficient on-disk format
> possible. I think I did some tests re-arranging some tables into the
> theoretically most efficient order and measuring heap size. I think
> there was some modest size improvement, maybe 10-15%? This was several
> years ago so it's all foggy. Maybe Josh can find some of this in CMD's
> ticketing system?

Yeah I dug around. I don't see anything about size improvement but here are our notes:

Alvaro said:

I ended up not producing notes as regularly as I had initially hoped. To try and make up for it, here's an update covering everything I've done since I started working on this issue.

This patch turned out to be completely different than what we had initially thought. We had thought it was going to be a matter of finding out places that used "attnum" and replace it with either attnum, attlognum or attphysnum, depending on what order was necessary on any given spot. This wasn't an easy thing to do because there are several hundreds of those. So it was supposed to be amazingly time-consuming and rather boring work.

This has nothing to do with reality: anywhere from parser down to optimizer and executor, the way things work is that a list of attributes is built, processed, and referenced. Some places assume that the list is in a certain order that's always the same order for those three cases.
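
[Editorial sketch, not part of the original notes.] The three numberings named above can be pictured with a toy model. The notes do not define them precisely, so the reading used here is an assumption: attnum is treated as a stable identity number, attlognum as the logical (display) position, and attphysnum as the physical (on-disk) position. The Attribute class, the in_order helper, and the sample table are all hypothetical and are not PostgreSQL code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for one column's catalog entry."""
    name: str
    attnum: int      # assumed: identity number, fixed when the column is created
    attlognum: int   # assumed: logical (display) position
    attphysnum: int  # assumed: physical (on-disk) position


def in_order(attrs, key):
    """Return the column names sorted by the chosen numbering field."""
    return [a.name for a in sorted(attrs, key=lambda a: getattr(a, key))]


# Toy table: created as (flag, id, note), shown as (id, flag, note),
# and stored as (id, note, flag).
attrs = [
    Attribute("flag", attnum=1, attlognum=2, attphysnum=3),
    Attribute("id", attnum=2, attlognum=1, attphysnum=1),
    Attribute("note", attnum=3, attlognum=3, attphysnum=2),
]

identity_order = in_order(attrs, "attnum")
logical_order = in_order(attrs, "attlognum")
physical_order = in_order(attrs, "attphysnum")
```

The same set of attributes yields three different lists, so any code that receives "the attribute list" has to know which of the three orders it was given.
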
So the way to develop this feature is to change those places so that instead of receiving the list in one of these orders, they instead receive it in a different order.

So what I had to do early on, was find a way to retrieve the sort order from catalogs, preserve it when TupleDescriptors are built, and ensure the attribute list is extracted from TupleDesc in the correct order. But it turned out that this is not enough, because down in the parser guts, a target list is constructed; and later, a TupleDescriptor is built from the target list. So it's necessary to preserve the sorting info from the original tuple descriptor into the target list (which means adding order info to Var and TargetEntry nodes), so that the new TupleDesc can also have it.

Today I'm finding that even more than that is necessary. It turns out that the RangeTableEntries (i.e. the entries in the FROM clause of a query) have an item dubbed "eref" which is a list of column names; due to my changes in the earlier parser stages, this list is sorted in logical column order; but the code to resolve things such as columns used in JOIN/ON clauses walks the list (which is in logical order) and then uses the number of times it had to walk the elements in the list to construct a Var (column reference) in "attnum" order -- so it finds a different column, and it all fails.

So what I'm doing now is modify the RangeTableEntry node to keep a mapping list of logical to identity numbers. Then I'll have to search for places using the rte->eref->colnames and make sure that they correctly use attlognum as index into it.

And then later:

First of all I should note that I discussed the approach mentioned above to pgsql-hackers and got a very interesting comment from Tom Lane that adding sorting info to Var and TargetEntry nodes was not a very good idea because it'd break stored rules whenever a table column changed.
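
[Editorial sketch, not part of the original notes.] The JOIN/ON lookup failure described above, and the mapping-list fix, can be reproduced in miniature. This is a toy model under stated assumptions, not the backend code: colnames_logical stands in for rte->eref->colnames, and attnum_of, resolve_buggy, resolve_fixed, and logical_to_attnum are hypothetical names invented for the illustration.

```python
# Column names in logical order (a stand-in for rte->eref->colnames).
colnames_logical = ["id", "flag", "note"]

# Hypothetical toy catalog: identity number (attnum) of each column.
attnum_of = {"flag": 1, "id": 2, "note": 3}
name_of_attnum = {num: name for name, num in attnum_of.items()}


def resolve_buggy(colname):
    """Walk the logical-order list and treat the 1-based position as attnum."""
    return colnames_logical.index(colname) + 1


# Mapping list kept alongside the names: logical position -> identity number.
logical_to_attnum = [attnum_of[name] for name in colnames_logical]


def resolve_fixed(colname):
    """Find the logical position, then translate it through the mapping list."""
    position = colnames_logical.index(colname)
    return logical_to_attnum[position]


# Which column does a reference to "id" actually end up pointing at?
buggy = name_of_attnum[resolve_buggy("id")]
fixed = name_of_attnum[resolve_fixed("id")]
```

In this toy table the position-based lookup resolves "id" to the column whose attnum is 1, which is "flag"; going through the mapping list resolves it to "id" as intended.
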
So I went back and studied that code and noticed that it was really the change in RangeTableEntry that's doing the good magic; those other changes are fortunately not necessary. (Though they were a necessary vehicle for me to understand how the other stuff works.)

I've been continuing to study the backend code looking for uses of attribute lists that assume a single ordering. As I get more into it, more complex cases appear. The number of cases is fortunately bounded, though. Most of the uses of straight attribute lists are in places that do not require modification, or require little work or thought to update correctly.

However, some other places are not like that. I have "fixed" SQL functions two times now. I had believed the second fix (which I took to be "mostly correct") was going to be the final one, but I found out just now that it's not, and the proper fix is going to require something a bit more low-level (namely, a projection step that reorders columns correctly after the fact). Fortunately, I believe that this extra projection step is going to fix a lot of other cases too, which I originally had no idea how to attack. Moreover, understanding that bit means I also figured out what Tom Lane meant in the second half of his response to my original pgsql-hackers comment. So I think we're good on that front.

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.