Thread: Re: [HACKERS] Terrible performance on wide selects
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, January 22, 2003 3:15 PM > To: Steve Crawford > Cc: pgsql-performance@postgreSQL.org; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Terrible performance on wide selects > > > Steve Crawford sent me some profiling results for queries > involving wide tuples (hundreds of columns). > > > Done, results attached. nocachegetattr seems to be the > likely suspect. > > Yipes, you can say that again. > > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 93.38 26.81 26.81 885688 0.03 0.03 nocachegetattr > > 0.00 0.00 1/885688 heapgettup [159] > 0.00 0.00 1/885688 > CatalogCacheComputeTupleHashValue [248] > 0.00 0.00 5/885688 SearchCatCache [22] > 13.40 0.00 442840/885688 ExecEvalVar [20] > 13.40 0.00 442841/885688 printtup [12] > [11] 93.4 26.81 0.00 885688 nocachegetattr [11] > > > Half of the calls are coming from printtup(), which seems > relatively easy to fix. > > /* > * send the attributes of this tuple > */ > for (i = 0; i < natts; ++i) > { > ... > origattr = heap_getattr(tuple, i + 1, typeinfo, > &isnull); > ... > } > > The trouble here is that in the presence of variable-width > fields, heap_getattr requires a linear scan over the tuple > --- and so the total time spent in it is O(N^2) in the number > of fields. > > What we could do is reinstitute heap_deformtuple() as the inverse of > heap_formtuple() --- but make it extract Datums for all the > columns in a single pass over the tuple. This would reduce > the time in printtup() from O(N^2) to O(N), which would > pretty much wipe out that part of the problem. > > The other half of the calls are coming from ExecEvalVar, > which is a harder problem to solve, since those calls are > scattered all over the place. It's harder to see how to get > them to share work. Any ideas out there? Is it possible that the needed information could be retrieved by querying the system metadata to collect the column information? Once the required tuple attributes are described, it could form a binding list that allocates a buffer of sufficient size with pointers to the required column start points. Maybe I don't really understand the problem, but it seems simple enough to do it once for the whole query. If this is utter stupidity, please disregard and have a hearty laugh at my expense. ;-)
"Dann Corbit" <DCorbit@connx.com> writes: > Maybe I don't really understand the problem, but it seems simple enough > to do it once for the whole query. We already do cache column offsets when they are fixed. The code that's the problem executes when there's a variable-width column in the table --- which means that all columns to its right are not at fixed offsets, and have to be scanned for separately in each tuple, AFAICS. regards, tom lane
Tom Lane kirjutas N, 23.01.2003 kell 02:04: > "Dann Corbit" <DCorbit@connx.com> writes: > > Maybe I don't really understand the problem, but it seems simple enough > > to do it once for the whole query. > > We already do cache column offsets when they are fixed. The code that's > the problem executes when there's a variable-width column in the table > --- which means that all columns to its right are not at fixed offsets, > and have to be scanned for separately in each tuple, AFAICS. Not only varlen columns, but also NULL columns forbid knowing the offsets beforehand. -- Hannu Krosing <hannu@tm.ee>
>>>Hannu Krosing said: > Tom Lane kirjutas N, 23.01.2003 kell 02:04: > > We already do cache column offsets when they are fixed. The code that's > > the problem executes when there's a variable-width column in the table > > --- which means that all columns to its right are not at fixed offsets, > > and have to be scanned for separately in each tuple, AFAICS. > > Not only varlen columns, but also NULL columns forbid knowing the > offsets beforehand. Does this mean, that constructing tables where fixed length fields are 'before' variable lenght fields and 'possibly null' fields might increase performance? Daniel
Daniel Kalchev <daniel@digsys.bg> writes: > Does this mean, that constructing tables where fixed length fields are > 'before' variable lenght fields and 'possibly null' fields might increase > performance? There'd have to be no nulls, period, to get any useful performance difference --- but yes, in theory putting fixed-length columns before variable-length ones is a win. I wouldn't bother going out to rearrange your schemas though ... at least not before you do some tests to prove that it's worthwhile. regards, tom lane
On Thu, 23 Jan 2003, Daniel Kalchev wrote: > Does this mean, that constructing tables where fixed length fields are > 'before' variable lenght fields and 'possibly null' fields might increase > performance? This, I believe, is why DB2 always puts (in physical storage) all of the fixed-length fields before the variable-length fields. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC