Thread: Re: Terrible performance on wide selects

Re: Terrible performance on wide selects

From
"Dann Corbit"
Date:
[snip]
> For the disk case, why not have the start of the record
> contain an array of offsets to the start of the data for each
> column?  It would only be necessary to have a list for
> variable fields.
>
> So (for instance) if you have 12 variable fields, you would
> store 12 integers at the start of the record.

You have to store this information anyway (for variable length objects).
By storing it at the front of the record you would lose nothing (except
the logical coupling of an object with its length).  But I would think
that it would not consume any additional storage.

Re: [PERFORM] Terrible performance on wide selects

From
Hannu Krosing
Date:
Dann Corbit wrote on Thu, 23.01.2003 at 02:39:
> [snip]
> > For the disk case, why not have the start of the record
> > contain an array of offsets to the start of the data for each
> > column?  It would only be necessary to have a list for
> > variable fields.
> >
> > So (for instance) if you have 12 variable fields, you would
> > store 12 integers at the start of the record.
>
> You have to store this information anyway (for variable length objects).
> By storing it at the front of the record you would lose nothing (except
> the logical coupling of an object with its length).  But I would think
> that it would not consume any additional storage.

I don't think it will win much either (except for possible cache
locality with really huge page sizes), as the problem is _not_ scanning
over big strings finding their end marker, but instead is chasing long
chains of pointers.

There could be some merit in the idea of storing, at the beginning of the
tuple, pointers for all fields starting with the first varlen field (a
16-bit int should be enough), so people can minimize the overhead by moving
fixlen fields to the beginning. Once we have this setup, we no longer need
the varlen lengths /stored/ together with the field data.

This adds the complexity of converting from (len,data) to (ptr,...,data)
when constructing the tuple,

as a tuple (int,int,int,varchar,varchar)

which is currently stored as

(intdata1, intdata2, intdata3, (len4, vardata4), (len5,vardata5))

should be rewritten on storage to

(ptr4,ptr5),(intdata1, intdata2, intdata3, vardata4,vardata5)

but it seems to solve the O(N) problem quite nicely (and forces no
storage growth for tuples with fixlen fields at the beginning of the tuple).

We must also account for NULL fields in the calculations.
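
The two layouts above can be sketched roughly as follows. This is only an
illustration, not PostgreSQL's actual on-disk format: the function names are
hypothetical, and it assumes 4-byte ints, a 16-bit length or pointer per
varlen field, no alignment padding and no NULLs.

```python
import struct

def pack_current(ints, strs):
    """(intdata1, intdata2, intdata3, (len4, vardata4), (len5, vardata5))"""
    out = b"".join(struct.pack("<i", v) for v in ints)
    for s in strs:
        # Assumed 16-bit length prefix stored inline with the data.
        out += struct.pack("<H", len(s)) + s
    return out

def pack_proposed(ints, strs):
    """(ptr4, ptr5), (intdata1, intdata2, intdata3, vardata4, vardata5)"""
    fixed = b"".join(struct.pack("<i", v) for v in ints)
    pos = 2 * len(strs) + len(fixed)   # first varlen byte follows header + ints
    offsets = []
    for s in strs:
        offsets.append(pos)
        pos += len(s)
    header = b"".join(struct.pack("<H", o) for o in offsets)
    return header + fixed + b"".join(strs)

def read_varlen(buf, n_varlen, k):
    """Fetch the k-th varlen field of the proposed layout by direct lookup."""
    (start,) = struct.unpack_from("<H", buf, 2 * k)
    if k + 1 < n_varlen:
        (end,) = struct.unpack_from("<H", buf, 2 * (k + 1))
    else:
        end = len(buf)                 # last field runs to the end of the tuple
    return buf[start:end]
```

Under these assumptions both layouts take the same number of bytes, since
each inline length is simply replaced by one pointer in the header, and the
last varlen field can be reached without stepping over the earlier ones.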

--
Hannu Krosing <hannu@tm.ee>

Re: [PERFORM] Terrible performance on wide selects

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> as  tuple (int,int,int,varchar,varchar)
> which is currently stored as
> (intdata1, intdata2, intdata3, (len4, vardata4), (len5,vardata5))
> should be rewritten on storage to
> (ptr4,ptr5),(intdata1, intdata2, intdata3, vardata4,vardata5)

I do not see that this buys anything at all.  heap_getattr still has to
make essentially the same calculation as before to determine column
locations, namely adding up column widths.  All you've done is move the
data that it has to fetch to make the calculation.  If anything, this
will be slower not faster, because now heap_getattr has to keep track
of two positions not one --- not just the next column offset, but also
the index of the next "ptr" to use.  In the existing method it only
needs the column offset, because that's exactly where it can pick up
the next length from.
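
One way to picture this argument is the sketch below. It is not the real
heap_getattr: the schema encoding, widths and function names are assumptions
made for illustration, and for simplicity the header array holds lengths
rather than pointers. It only shows that, for a schema where fixed and
variable columns are mixed, both walks add up column widths, and the second
one carries an extra cursor.

```python
import struct

INT_WIDTH = 4   # assumed width of a fixed-length column
LEN_WIDTH = 2   # assumed width of one length entry

def locate_inline(buf, schema, k):
    """Existing style: one cursor; a varlen length is read at the cursor itself."""
    off = 0
    for col in schema[:k]:
        if col == "int":
            off += INT_WIDTH
        else:
            (n,) = struct.unpack_from("<H", buf, off)
            off += LEN_WIDTH + n
    return off

def locate_split(buf, schema, k):
    """Header-array style: a data offset plus the index of the next header slot."""
    off = LEN_WIDTH * schema.count("var")   # data begins after the header
    slot = 0
    for col in schema[:k]:
        if col == "int":
            off += INT_WIDTH
        else:
            (n,) = struct.unpack_from("<H", buf, LEN_WIDTH * slot)
            off += n
            slot += 1
    return off, slot
```

Both functions loop over the k preceding columns; the split version just
fetches each length from a different place and has to track `slot` as well.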

But the really serious objection is that the datatype functions that
access the data would now also need to be passed two pointers, since
after all they would like to know the length too.  That breaks APIs
far and wide :-(
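
The API concern can be illustrated with a toy pair of functions. These are
hypothetical and are not PostgreSQL's datatype functions; a 16-bit inline
length is assumed. With the length stored in front of the data, a single
offset is enough; once the length lives elsewhere, every such function needs
a second argument.

```python
import struct

def upper_inline(buf, off):
    """Existing style: the value is self-describing, one offset suffices."""
    (n,) = struct.unpack_from("<H", buf, off)
    return buf[off + 2 : off + 2 + n].upper()

def upper_split(buf, data_off, length):
    """Split style: the caller has to pass the length separately."""
    return buf[data_off : data_off + length].upper()
```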

            regards, tom lane