Thread: Re: [HACKERS] Terrible performance on wide selects

Re: [HACKERS] Terrible performance on wide selects

From: "Dann Corbit" <DCorbit@connx.com>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, January 22, 2003 4:04 PM
> To: Dann Corbit
> Cc: Steve Crawford; pgsql-performance@postgreSQL.org;
> pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Terrible performance on wide selects
>
>
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Maybe I don't really understand the problem, but it seems simple
> > enough to do it once for the whole query.
>
> We already do cache column offsets when they are fixed.  The
> code that's the problem executes when there's a
> variable-width column in the table
> --- which means that all columns to its right are not at
> fixed offsets, and have to be scanned for separately in each
> tuple, AFAICS.

Why not waste a bit of memory and make the row buffer the maximum
possible length?
E.g. for varchar(2000) allocate 2000 characters + size element and point
to the start of that thing.

If we have 64K rows, even at that it is a pittance.  If someone designs
10,000 row tables, then it will allocate an annoyingly large block of
memory, but bad designs are always going to cause a fuss.
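
For illustration, a minimal C sketch of the kind of fixed-width row buffer
being suggested (the struct and names are invented for this example, not
anything in PostgreSQL):

/* Illustrative only: a row buffer where every column is padded to its
 * maximum declared width, so each column lives at a fixed, precomputable
 * offset. */
#include <stdint.h>
#include <stddef.h>

#define NAME_MAXLEN 2000                 /* e.g. varchar(2000) */

typedef struct FixedWidthRow
{
    int32_t id;                          /* fixed-width column */
    int32_t name_len;                    /* actual length of the varchar */
    char    name[NAME_MAXLEN];           /* always reserves the full width */
    int32_t other;                       /* still at a fixed offset */
} FixedWidthRow;

/* Column offsets are compile-time constants, so no per-tuple scanning: */
static const size_t name_offset  = offsetof(FixedWidthRow, name);
static const size_t other_offset = offsetof(FixedWidthRow, other);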

Re: [HACKERS] Terrible performance on wide selects

From: Tom Lane <tgl@sss.pgh.pa.us>
"Dann Corbit" <DCorbit@connx.com> writes:
> Why not waste a bit of memory and make the row buffer the maximum
> possible length?
> E.g. for varchar(2000) allocate 2000 characters + size element and point
> to the start of that thing.

Surely you're not proposing that we store data on disk that way.

The real issue here is avoiding overhead while extracting columns out of
a stored tuple.  We could perhaps use a different, less space-efficient
format for temporary tuples in memory than we do on disk, but I don't
think that will help a lot.  The nature of O(N^2) bottlenecks is you
have to kill them all --- for example, if we fix printtup and don't do
anything with ExecEvalVar, we can't do more than double the speed of
Steve's example, so it'll still be slow.  So we must have a solution for
the case where we are disassembling a stored tuple, anyway.

I have been sitting here toying with a related idea, which is to use the
heap_deformtuple code I suggested before to form an array of pointers to
Datums in a specific tuple (we could probably use the TupleTableSlot
mechanisms to manage the memory for these).  Then subsequent accesses to
individual columns would just need an array-index operation, not a
nocachegetattr call.  The trick with that would be that if only a few
columns are needed out of a row, it might be a net loss to compute the
Datum values for all columns.  How could we avoid slowing that case down
while making the wide-tuple case faster?
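
For concreteness, here is a rough sketch of the deform-once idea under an
assumed toy tuple layout (each field is a 4-byte length followed by that many
bytes of data); it is not heap_deformtuple or the real on-disk format:

/* Deform a whole tuple into per-column arrays in one left-to-right pass,
 * so that later column accesses are plain array indexing. */
#include <stdint.h>
#include <string.h>

typedef struct DeformedTuple
{
    int          natts;      /* number of columns; set by the caller */
    const char **values;     /* start of each column's data */
    int32_t     *lengths;    /* length of each column's data */
} DeformedTuple;

/* O(natts) total; after this, fetching any column is O(1). */
static void
deform_all(const char *tup, DeformedTuple *out)
{
    const char *p = tup;

    for (int i = 0; i < out->natts; i++)
    {
        int32_t len;

        memcpy(&len, p, sizeof(len));   /* read the length prefix */
        p += sizeof(len);
        out->values[i] = p;             /* column i starts here */
        out->lengths[i] = len;
        p += len;                       /* skip to the next column */
    }
}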

            regards, tom lane

Re: [HACKERS] Terrible performance on wide selects

From: Hannu Krosing <hannu@tm.ee>
Tom Lane wrote on Thu, 23.01.2003 at 02:18:
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Why not waste a bit of memory and make the row buffer the maximum
> > possible length?
> > E.g. for varchar(2000) allocate 2000 characters + size element and point
> > to the start of that thing.
>
> Surely you're not proposing that we store data on disk that way.
>
> The real issue here is avoiding overhead while extracting columns out of
> a stored tuple.  We could perhaps use a different, less space-efficient
> format for temporary tuples in memory than we do on disk, but I don't
> think that will help a lot.  The nature of O(N^2) bottlenecks is you
> have to kill them all --- for example, if we fix printtup and don't do
> anything with ExecEvalVar, we can't do more than double the speed of
> Steve's example, so it'll still be slow.  So we must have a solution for
> the case where we are disassembling a stored tuple, anyway.
>
> I have been sitting here toying with a related idea, which is to use the
> heap_deformtuple code I suggested before to form an array of pointers to
> Datums in a specific tuple (we could probably use the TupleTableSlot
> mechanisms to manage the memory for these).  Then subsequent accesses to
> individual columns would just need an array-index operation, not a
> nocachegetattr call.  The trick with that would be that if only a few
> columns are needed out of a row, it might be a net loss to compute the
> Datum values for all columns.  How could we avoid slowing that case down
> while making the wide-tuple case faster?

make the pointer array incrementally for O(N) performance:

i.e. for a tuple with 100 cols, allocate an array of 100 pointers, plus
keep a count of how many are actually valid,

so the first call to get col[5] will fill the first 5 positions in the
array, save the number 5, and then access tuple[ptrarray[5]]

next call to get col[75] will start from col[5] and fill up to col[75]

next call to col[76] will start from col[75] and fill up to col[76]

next call to col[60] will just get tuple[ptrarray[60]]

the above description assumes 1-based non-C arrays ;)
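
A minimal C sketch of this incremental scheme (0-based, with the same
invented length-prefixed tuple layout as the earlier sketch; not actual
PostgreSQL code) might look like:

/* nvalid starts at 0 and nextp at the start of the tuple whenever a new
 * tuple is loaded. */
#include <stdint.h>
#include <string.h>

typedef struct IncrDeformState
{
    int          natts;     /* total number of columns */
    int          nvalid;    /* how many leading columns are already located */
    const char  *nextp;     /* where column number nvalid starts */
    const char **values;    /* lazily filled start-of-column pointers */
    int32_t     *lengths;   /* lazily filled column lengths */
} IncrDeformState;

/* Return a pointer to column k, extending the cache only as far as needed.
 * The first access to a late column costs O(k); repeated accesses to any
 * already-located column are O(1). */
static const char *
get_column(IncrDeformState *d, int k)
{
    while (d->nvalid <= k)
    {
        int32_t len;

        memcpy(&len, d->nextp, sizeof(len));
        d->nextp += sizeof(len);
        d->values[d->nvalid] = d->nextp;
        d->lengths[d->nvalid] = len;
        d->nextp += len;
        d->nvalid++;
    }
    return d->values[k];
}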

--
Hannu Krosing <hannu@tm.ee>

Re: [HACKERS] Terrible performance on wide selects

From: Hannu Krosing <hannu@tm.ee>
Hannu Krosing wrote on Thu, 23.01.2003 at 12:11:

> make the pointer array incrementally for O(N) performance:
>
> i.e. for a tuple with 100 cols, allocate an array of 100 pointers, plus
> keep a count of how many are actually valid,

Additionally, this should also make repeated determination of NULL fields
faster - just put a NULL pointer in and voila - no more bit-shifting and
AND-ing to find out whether the field is null.

One has to watch the NULL bitmap on the first pass anyway.

> so the first call to get col[5] will fill the first 5 positions in the
> array, save the number 5, and then access tuple[ptrarray[5]]
>
> next call to get col[75] will start from col[5] and fill up to col[75]
>
> next call to col[76] will start from col[75] and fill up to col[76]
>
> next call to col[60] will just get tuple[ptrarray[60]]
>
> the above description assumes 1-based non-C arrays ;)
--
Hannu Krosing <hannu@tm.ee>

Re: [HACKERS] Terrible performance on wide selects

From: Tom Lane <tgl@sss.pgh.pa.us>
Hannu Krosing <hannu@tm.ee> writes:
>> i.e. for a tuple with 100 cols, allocate an array of 100 pointers, plus
>> keep a count of how many are actually valid,

> Additionally, this should also make repeated determination of NULL fields
> faster - just put a NULL pointer in and voila - no more bit-shifting and
> AND-ing to find out whether the field is null.

Right, the output of the operation would be a pair of arrays: Datum
values and is-null flags.  (NULL pointers don't work for pass-by-value
datatypes.)

I like the idea of keeping track of a last-known-column position and
incrementally extending that as needed.

I think the way to manage this is to add the overhead data (the output
arrays and last-column state) to TupleTableSlots.  Then we'd have
a routine similar to heap_getattr except that it takes a TupleTableSlot
and makes use of the extra state data.  The infrastructure to manage
the state data is already in place: for example, ExecStoreTuple would
reset the last-known-column to 0, ExecSetSlotDescriptor would be
responsible for allocating the output arrays using the natts value from
the provided tupdesc, etc.

This wouldn't help for accesses that are not in the context of a slot,
but certainly all the ones from ExecEvalVar are.  The executor always
works with tuples stored in slots, so I think we could fix all the
high-traffic cases this way.
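
For illustration only, a sketch of how that state might hang off a slot-like
structure; the names are hypothetical, this is not the real TupleTableSlot,
ExecStoreTuple, or heap_getattr, the tuple layout is again the invented
length-prefixed one, and NULL handling is omitted for brevity:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct SlotSketch
{
    const char  *tuple;     /* the stored tuple */
    int          natts;     /* from the tuple descriptor */
    int          lastattr;  /* how many leading columns are deformed */
    const char  *nextp;     /* scan position for the next column */
    const char **values;    /* allocated when the descriptor is set */
    bool        *isnull;    /* NULL flags live here, not as NULL pointers */
} SlotSketch;

/* Analogue of ExecStoreTuple: storing a new tuple resets the cache. */
static void
slot_store(SlotSketch *slot, const char *tuple)
{
    slot->tuple = tuple;
    slot->nextp = tuple;
    slot->lastattr = 0;
}

/* Analogue of the proposed slot-aware getattr: extend the cache only up to
 * the requested 1-based attribute number, then answer from the arrays. */
static const char *
slot_getattr_sketch(SlotSketch *slot, int attnum, bool *isnull)
{
    while (slot->lastattr < attnum)
    {
        int32_t len;

        memcpy(&len, slot->nextp, sizeof(len));
        slot->nextp += sizeof(len);
        slot->values[slot->lastattr] = slot->nextp;
        slot->isnull[slot->lastattr] = false;
        slot->nextp += len;
        slot->lastattr++;
    }
    *isnull = slot->isnull[attnum - 1];
    return slot->values[attnum - 1];
}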

            regards, tom lane

Re: [HACKERS] Terrible performance on wide selects

From: Bruce Momjian <pgman@candle.pha.pa.us>
Added to TODO:

    * Cache last known per-tuple offsets to speed long tuple access


---------------------------------------------------------------------------

Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> >> i.e. for a tuple with 100 cols, allocate an array of 100 pointers, plus
> >> keep a count of how many are actually valid,
>
> > Additionally, this should also make repeated determination of NULL fields
> > faster - just put a NULL pointer in and voila - no more bit-shifting and
> > AND-ing to find out whether the field is null.
>
> Right, the output of the operation would be a pair of arrays: Datum
> values and is-null flags.  (NULL pointers don't work for pass-by-value
> datatypes.)
>
> I like the idea of keeping track of a last-known-column position and
> incrementally extending that as needed.
>
> I think the way to manage this is to add the overhead data (the output
> arrays and last-column state) to TupleTableSlots.  Then we'd have
> a routine similar to heap_getattr except that it takes a TupleTableSlot
> and makes use of the extra state data.  The infrastructure to manage
> the state data is already in place: for example, ExecStoreTuple would
> reset the last-known-column to 0, ExecSetSlotDescriptor would be
> responsible for allocating the output arrays using the natts value from
> the provided tupdesc, etc.
>
> This wouldn't help for accesses that are not in the context of a slot,
> but certainly all the ones from ExecEvalVar are.  The executor always
> works with tuples stored in slots, so I think we could fix all the
> high-traffic cases this way.
>
>             regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073