Re: Reducing tuple overhead - Mailing list pgsql-hackers

From Joshua D. Drake
Subject Re: Reducing tuple overhead
Msg-id 5539253D.8000506@commandprompt.com
In response to Re: Reducing tuple overhead  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Reducing tuple overhead
List pgsql-hackers
On 04/23/2015 09:42 AM, Jim Nasby wrote:
>
> On 4/23/15 11:24 AM, Andres Freund wrote:
>> I do wonder what, in realistic cases, is actually the bigger contributor
>> to the overhead. The tuple header or the padding we liberally add in
>> many cases...
>
> Assuming you're talking about padding between fields...
>
> Several years ago Enova paid Command Prompt to start work on logical
> column ordering, and part of the motivation for that was to allow
> re-ordering physical tuples into the most efficient on-disk format
> possible. I think I did some tests re-arranging some tables into the
> theoretically most efficient order and measuring heap size. I think
> there was some modest size improvement, maybe 10-15%? This was several
> years ago so it's all foggy. Maybe Josh can find some of this in CMD's
> ticketing system?
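
To make the padding point concrete, here is a rough sketch of how column order changes the bytes spent on alignment. The Col struct, the helper names and the sample columns are all made up for illustration; the sizes and alignments are the usual ones for bool, bigint and integer, and the sketch ignores the tuple header, the null bitmap and the final tuple-level alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column description: data width and required alignment, in bytes. */
typedef struct {
    const char *name;
    size_t      size;
    size_t      align;   /* must be a power of two */
} Col;

/* Round off up to the next multiple of align. */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Bytes used by the column data, padding included, when the
 * columns are stored in array order. */
static size_t data_size(const Col *cols, size_t ncols)
{
    size_t off = 0;

    for (size_t i = 0; i < ncols; i++)
        off = align_up(off, cols[i].align) + cols[i].size;
    return off;
}

/* Declared order: bool, bigint, bool, integer (illustrative only). */
static const Col declared[] = {
    {"flag_a", 1, 1}, {"id", 8, 8}, {"flag_b", 1, 1}, {"count", 4, 4},
};

/* Same columns, widest alignment first. */
static const Col reordered[] = {
    {"id", 8, 8}, {"count", 4, 4}, {"flag_a", 1, 1}, {"flag_b", 1, 1},
};
```

With these made-up columns, data_size(declared, 4) comes to 24 bytes and data_size(reordered, 4) to 14. Real tables will behave differently, so this says nothing about the numbers Jim remembers.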

Yeah I dug around. I don't see anything about size improvement but here 
are our notes:

Alvaro said:

I ended up not producing notes as regularly as I had initially hoped. To 
try and make up for it, here's an update covering everything I've done 
since I started working on this issue.

This patch turned out to be completely different from what we had 
initially thought. We had thought it was going to be a matter of finding 
places that used "attnum" and replacing it with either attnum, 
attlognum or attphysnum, depending on what order was necessary at any 
given spot. This wasn't an easy thing to do because there are several 
hundred of those. So it was supposed to be amazingly time-consuming and 
rather boring work.

This has nothing to do with reality: anywhere from parser down to 
optimizer and executor, the way things work is that a list of attributes 
is built, processed, and referenced. Some places assume that the list is 
in a certain order that's always the same order for those three cases. 
So the way to develop this feature is to change those places so that 
instead of receiving the list in one of these orders, they instead 
receive it in a different order.
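
As a rough illustration of "receiving the list in a different order", the sketch below keeps all three numbers on each attribute and hands back the list sorted by whichever one the caller asks for. The Attr struct, the AttrOrder enum and attrs_in_order() are hypothetical and not taken from the patch; treating attnum as the permanent identity number is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a catalog attribute row. */
typedef struct {
    const char *name;
    int         attnum;      /* assumed: permanent identity number */
    int         attlognum;   /* logical (display) position */
    int         attphysnum;  /* physical (storage) position */
} Attr;

typedef enum { BY_IDENTITY, BY_LOGICAL, BY_PHYSICAL } AttrOrder;

/* Pick the number that defines the requested ordering. */
static int order_key(const Attr *a, AttrOrder ord)
{
    switch (ord) {
        case BY_LOGICAL:  return a->attlognum;
        case BY_PHYSICAL: return a->attphysnum;
        default:          return a->attnum;
    }
}

/* Fill out[0..n-1] with pointers into attrs, sorted by the requested
 * ordering. Plain insertion sort; column counts are small. */
static void attrs_in_order(const Attr *attrs, size_t n, AttrOrder ord,
                           const Attr **out)
{
    for (size_t i = 0; i < n; i++) {
        size_t j = i;

        while (j > 0 && order_key(out[j - 1], ord) > order_key(&attrs[i], ord)) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = &attrs[i];
    }
}

/* Illustrative table: created as (a, b, c), displayed as (b, a, c),
 * stored as (b, c, a). */
static const Attr sample_attrs[] = {
    {"a", 1, 2, 3}, {"b", 2, 1, 1}, {"c", 3, 3, 2},
};
```

A caller that wants display order would pass BY_LOGICAL and get b, a, c; one that builds the on-disk tuple would pass BY_PHYSICAL and get b, c, a.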

So what I had to do early on was find a way to retrieve the sort order 
from catalogs, preserve it when TupleDescriptors are built, and ensure 
the attribute list is extracted from TupleDesc in the correct order. But 
it turned out that this is not enough, because down in the parser guts, 
a target list is constructed; and later, a TupleDescriptor is built from 
the target list. So it's necessary to preserve the sorting info from the 
original tuple descriptor into the target list (which means adding order 
info to Var and TargetEntry nodes), so that the new TupleDesc can also 
have it.

Today I'm finding that even more than that is necessary. It turns out 
that the RangeTableEntries (i.e. the entries in the FROM clause of a 
query) have an item dubbed "eref" which is a list of column names; due 
to my changes in the earlier parser stages, this list is sorted in 
logical column order; but the code to resolve things such as columns 
used in JOIN/ON clauses walks the list (which is in logical order) and 
then uses the number of times it had to walk the elements in the list to 
construct a Var (column reference) in "attnum" order -- so it finds a 
different column, and it all fails.

So what I'm doing now is modifying the RangeTableEntry node to keep a 
mapping list of logical to identity numbers. Then I'll have to search 
for places using rte->eref->colnames and make sure that they 
correctly use attlognum as an index into it.
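
A minimal sketch of the failure and of the mapping idea, under heavy simplification: MiniRTE and both lookup functions are hypothetical and are not the real RangeTableEntry or parser code. The first lookup repeats the mistake described above (list position used as the column number), the second goes through a logical-to-identity map.

```c
#include <assert.h>
#include <string.h>

#define NCOLS 3

/* Hypothetical, stripped-down range table entry. */
typedef struct {
    const char *colnames[NCOLS];          /* names in logical order */
    int         lognum_to_attnum[NCOLS];  /* identity number per logical slot */
} MiniRTE;

/* The failing approach: the position reached while walking the list
 * is taken to be the column number. Returns 0 if the name is absent. */
static int lookup_by_position(const MiniRTE *rte, const char *name)
{
    for (int i = 0; i < NCOLS; i++)
        if (strcmp(rte->colnames[i], name) == 0)
            return i + 1;
    return 0;
}

/* With the map: walk the same list, then translate the logical
 * position into the identity number. Returns 0 if the name is absent. */
static int lookup_with_map(const MiniRTE *rte, const char *name)
{
    for (int i = 0; i < NCOLS; i++)
        if (strcmp(rte->colnames[i], name) == 0)
            return rte->lognum_to_attnum[i];
    return 0;
}

/* Illustrative table created as (a, b, c) but displayed as (b, a, c). */
static const MiniRTE sample_rte = {
    {"b", "a", "c"},
    {2, 1, 3},
};
```

For column "a", lookup_by_position() answers 2, which is really column "b", while lookup_with_map() answers 1.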

And then later:

First of all I should note that I discussed the approach mentioned above 
on pgsql-hackers and got a very interesting comment from Tom Lane that 
adding sorting info to Var and TargetEntry nodes was not a very good 
idea because it'd break stored rules whenever a table column changed. So 
I went back and studied that code and noticed that it was really the 
change in RangeTableEntry that's doing the good magic; those other 
changes are fortunately not necessary. (Though they were a necessary 
vehicle for me to understand how the other stuff works.)

I've been continuing to study the backend code looking for uses of 
attribute lists that assume a single ordering. As I get more into it, 
more complex cases appear. The number of cases is fortunately bounded, 
though. Most of the uses of straight attribute lists are in places that 
do not require modification, or require little work or thought to update 
correctly.

However, some other places are not like that. I have "fixed" SQL 
functions two times now. I believed the second fix (which I took to be 
"mostly correct") was going to be the final one, but I found out just 
now that it's not, and the proper fix is going to require 
something a bit more low-level (namely, a projection step that reorders 
columns correctly after the fact). Fortunately, I believe that this 
extra projection step is going to fix a lot of other cases too, which I 
originally had no idea how to attack. Moreover, understanding that bit 
means I also figured out what Tom Lane meant in the second half of his 
response to my original pgsql-hackers comment. So I think we're good on 
that front.
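
For what it's worth, the "projection step that reorders columns after the fact" can be pictured roughly as below. This is only a sketch: project_reorder() and its index map are hypothetical, plain ints stand in for column values, and the real executor projection machinery is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a row from the order it was produced in (src) into the order
 * the consumer expects (dst). target_to_source[i] is the zero-based
 * index in src of the value that belongs in output column i. */
static void project_reorder(const int *src, const size_t *target_to_source,
                            size_t ncols, int *dst)
{
    for (size_t i = 0; i < ncols; i++)
        dst[i] = src[target_to_source[i]];
}
```

Given a source row {10, 20, 30} and the map {2, 0, 1}, the output row is {30, 10, 20}. The appeal of doing it this way is that the code producing the row does not need to know which order the consumer wants.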




-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


