Thread: Re: Terrible performance on wide selects

From: "Dann Corbit"
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, January 22, 2003 4:18 PM
> To: Dann Corbit
> Cc: Steve Crawford; pgsql-performance@postgreSQL.org;
> pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Terrible performance on wide selects
>
>
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Why not waste a bit of memory and make the row buffer the maximum
> > possible length? E.g. for varchar(2000) allocate 2000 characters +
> > size element and point to the start of that thing.
>
> Surely you're not proposing that we store data on disk that way.
>
> The real issue here is avoiding overhead while extracting
> columns out of a stored tuple.  We could perhaps use a
> different, less space-efficient format for temporary tuples
> in memory than we do on disk, but I don't think that will
> help a lot.  The nature of O(N^2) bottlenecks is you have to
> kill them all --- for example, if we fix printtup and don't
> do anything with ExecEvalVar, we can't do more than double
> the speed of Steve's example, so it'll still be slow.  So we
> must have a solution for the case where we are disassembling
> a stored tuple, anyway.
>
> I have been sitting here toying with a related idea, which is
> to use the heap_deformtuple code I suggested before to form
> an array of pointers to Datums in a specific tuple (we could
> probably use the TupleTableSlot mechanisms to manage the
> memory for these).  Then subsequent accesses to individual
> columns would just need an array-index operation, not a
> nocachegetattr call.  The trick with that would be that if
> only a few columns are needed out of a row, it might be a net
> loss to compute the Datum values for all columns.  How could
> we avoid slowing that case down while making the wide-tuple
> case faster?
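
One possible answer to the question that closes the quoted message is to decode lazily: walk the stored tuple only as far as the highest column requested so far, and remember where the walk stopped. The sketch below is a toy illustration in C, not PostgreSQL code. The LazySlot type, the functions lazy_slot_init and lazy_slot_getattr, and the storage format (each column is a 1-byte length followed by its bytes) are all assumptions invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COLS 64

/* Hypothetical per-tuple cache of already-decoded column positions. */
typedef struct {
    const uint8_t *data;             /* packed columns: [len][bytes]... */
    int            natts;            /* number of columns in the tuple */
    int            nvalid;           /* columns decoded so far */
    size_t         next_off;         /* offset of first undecoded column */
    const uint8_t *values[MAX_COLS]; /* start of each decoded column */
    uint8_t        lengths[MAX_COLS];
    long           steps;            /* columns walked, for illustration */
} LazySlot;

void lazy_slot_init(LazySlot *slot, const uint8_t *data, int natts)
{
    assert(natts >= 0 && natts <= MAX_COLS);
    slot->data = data;
    slot->natts = natts;
    slot->nvalid = 0;
    slot->next_off = 0;
    slot->steps = 0;
}

/* Return column attnum (0-based). Only columns not yet decoded are
 * walked, so fetching a few leading columns stays cheap, and fetching
 * all N columns costs N steps in total rather than N*(N+1)/2. */
const uint8_t *lazy_slot_getattr(LazySlot *slot, int attnum, uint8_t *len)
{
    assert(attnum >= 0 && attnum < slot->natts);
    while (slot->nvalid <= attnum) {
        uint8_t n = slot->data[slot->next_off];
        slot->lengths[slot->nvalid] = n;
        slot->values[slot->nvalid] = slot->data + slot->next_off + 1;
        slot->next_off += 1 + (size_t) n;
        slot->nvalid++;
        slot->steps++;
    }
    *len = slot->lengths[attnum];
    return slot->values[attnum];
}
```

With this arrangement a query that touches only the first column walks one column, while a repeated access to any column already decoded is a plain array lookup.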

For the disk case, why not have the start of the record contain an array
of offsets to the start of the data for each column?  It would only be
necessary to have a list for variable fields.

So (for instance) if you have 12 variable fields, you would store 12
integers at the start of the record.
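
To make the proposal concrete, here is a minimal sketch in C of a record that begins with an array of offsets to its variable-length fields. It is an illustration only, not an existing on-disk format. The names record_build and record_get_var, the 16-bit offset width, the leading field count, and the extra end-of-record offset (used to derive the length of the last field) are all assumptions added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, all integers uint16_t in host byte order:
 *   [nvar][start_0]...[start_{nvar-1}][end][field bytes...]
 * Offsets are measured from the start of the record. */

/* Pack nvar NUL-terminated strings into buf.
 * Returns the record size in bytes, or 0 if it does not fit. */
size_t record_build(uint8_t *buf, size_t cap,
                    const char *const *fields, uint16_t nvar)
{
    size_t   header = sizeof(uint16_t) * ((size_t) nvar + 2);
    size_t   pos = header;
    uint16_t off;

    if (header > cap)
        return 0;
    memcpy(buf, &nvar, sizeof nvar);
    for (uint16_t i = 0; i < nvar; i++) {
        size_t len = strlen(fields[i]);
        if (pos + len > cap || pos + len > UINT16_MAX)
            return 0;
        off = (uint16_t) pos;
        memcpy(buf + sizeof(uint16_t) * (1 + (size_t) i), &off, sizeof off);
        memcpy(buf + pos, fields[i], len);
        pos += len;
    }
    off = (uint16_t) pos;   /* end marker: gives the last field its length */
    memcpy(buf + sizeof(uint16_t) * (1 + (size_t) nvar), &off, sizeof off);
    return pos;
}

/* Locate variable field i with two header reads and no scan of the
 * preceding fields. */
const uint8_t *record_get_var(const uint8_t *buf, uint16_t i, uint16_t *len)
{
    uint16_t nvar, start, end;

    memcpy(&nvar, buf, sizeof nvar);
    assert(i < nvar);
    memcpy(&start, buf + sizeof(uint16_t) * (1 + (size_t) i), sizeof start);
    memcpy(&end, buf + sizeof(uint16_t) * (2 + (size_t) i), sizeof end);
    *len = (uint16_t) (end - start);
    return buf + start;
}
```

The lookup cost no longer depends on the field's position, at the price of one header entry per variable field in every record.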

Re: Terrible performance on wide selects

From: Tom Lane
"Dann Corbit" <DCorbit@connx.com> writes:
> For the disk case, why not have the start of the record contain an array
> of offsets to the start of the data for each column?  It would only be
> necessary to have a list for variable fields.

No, you'd need an entry for *every* column (or at least, every one to
the right of the first variable-width column or NULL).  That's a lot of
overhead, especially in comparison to datatypes like bool or int4 ...

            regards, tom lane
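
The overhead argument in this reply can be put into numbers. The reasoning is that a fixed-width column sitting after a variable-width one has no position computable from the table definition alone, so it needs its own offset entry. The C sketch below counts entries under both schemes. It is a rough illustration under stated assumptions: a column is described only by its width in bytes (0 meaning variable-width), NULLs are ignored, and the function names are made up for the example.

```c
#include <stddef.h>

/* widths[i] > 0: fixed-width column of that many bytes.
 * widths[i] == 0: variable-width column. */

/* Entries needed if only variable-width columns get an offset. */
int entries_variable_only(const int *widths, int ncols)
{
    int count = 0;
    for (int i = 0; i < ncols; i++)
        if (widths[i] == 0)
            count++;
    return count;
}

/* Entries needed if every column to the right of the first
 * variable-width column gets an offset. Columns up to and including
 * the first variable-width one start at a position known from the
 * schema, so they need none. */
int entries_after_first_variable(const int *widths, int ncols)
{
    for (int i = 0; i < ncols; i++)
        if (widths[i] == 0)
            return ncols - i - 1;
    return 0;   /* all columns fixed-width: no offsets needed */
}

/* Bytes of fixed-width data stored to the right of the first
 * variable-width column, for comparison with the offset table size. */
size_t fixed_bytes_after_first_variable(const int *widths, int ncols)
{
    size_t total = 0;
    int    seen_variable = 0;

    for (int i = 0; i < ncols; i++) {
        if (seen_variable && widths[i] > 0)
            total += (size_t) widths[i];
        if (widths[i] == 0)
            seen_variable = 1;
    }
    return total;
}
```

For a row laid out as int4, varchar, bool, int4, bool (widths 4, 0, 1, 4, 1), the first function returns 1 and the second returns 3. Assuming 2-byte offsets, that is 6 bytes of offsets to locate 6 bytes of bool and int4 data, which is the kind of overhead the reply objects to.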