Thread: Column lookup in a row performance
Hi PostgreSQL developers,

I already asked my question on the pgsql-general list and did not find an explanation. Below is the question, mainly copied from [0].

----

I am studying in depth how tuples are organized and how column values are accessed in different databases. As far as I understood, Postgres does not store all column positions in a tuple (e.g. in a header or footer). In contrast, MySQL InnoDB stores column lengths in the record header [1]. At first glance it seems that the Postgres format can carry a significant performance penalty when accessing a single column located after multiple variable-length columns, because finding that column's position in the row requires multiple jumps. In InnoDB, the position of a particular column can be computed right after reading the header.

I found several related threads in the pgsql-hackers archives [2,3] describing significant performance wins in a prototype. Does anyone know why the format is still the same? Perhaps InnoDB and similar formats are not so good after all? Please respond if you have a clue!

----

I ran a rough experiment to check whether the difference is visible. I used the following table (200 pairs of columns):

create table layout(
    i0   int, s0   varchar(255),
    ...
    i199 int, s199 varchar(255)
);

populated it with 1 million rows, and ran the following queries:

SELECT AVG(i0) FROM layout;
SELECT AVG(i199) FROM layout;

On my machine, calculating the average over column i0 took about 1 second, versus about 2.5 seconds for column i199. Similar observations were described in the threads mentioned above. Quite a significant difference!

I made a similar experiment with MySQL (InnoDB) as well, and there the results are the same for the first and last columns.

Some details of how it is stored in InnoDB: variable-length column lengths are stored only in the record header (there is no length prefix before the column data itself). Given a tuple descriptor and the lengths in the header, it is always possible to calculate each column's position without jumping through the entire record. And it seems that the space requirements are the same as in PostgreSQL.

So the InnoDB layout appears to be better, at least for reading. It is still unclear to me why PostgreSQL does not employ a similar layout if it can give significant benefits.

----

[0] https://www.postgresql.org/message-id/flat/CAOykqKc8Uoi3NKVfd5DpTmUzD4rJBWG9Gjo3pr7eaUGLtrstvw%40mail.gmail.com
[1] https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html#innodb-row-format-compact
[2] https://www.postgresql.org/message-id/flat/c58979e50702201307w64b12892uf8dfc3d8bf117ec0%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/87irj16umm.fsf%40enterprisedb.com

--
Best regards,
Ivan Pavlukhin
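The access-pattern difference described above can be sketched in a toy Python model (my own illustration, not code from PostgreSQL or InnoDB; it ignores details such as alignment padding, null bitmaps, and fixed-width columns): with per-datum length prefixes, locating column n means reading every earlier datum's length word in the data area, while with lengths in the record header the offset is a prefix sum over a compact header array, with no data-area reads at all.

```python
# Toy model of the two row layouts -- a sketch only, not PostgreSQL or
# InnoDB code; alignment padding, null bitmaps and fixed-width columns
# are ignored.

def locate_length_prefixed(lengths, n):
    """Postgres-style: each datum carries its own length word, so the
    offset of column n is found by hopping over columns 0..n-1 in the
    data area (one data-area read per hop)."""
    offset = 0
    data_reads = 0
    for i in range(n):
        offset += lengths[i]     # read length word i, skip past the datum
        data_reads += 1
    return offset, data_reads

def locate_header_offsets(lengths, n):
    """InnoDB-style: all lengths sit in the record header, so the offset
    of column n is a prefix sum over the compact header array -- the
    data area itself is never touched while locating the column."""
    return sum(lengths[:n]), 0

row = [7, 120, 3, 64, 15] * 40   # 200 variable-length columns
off_a, reads_a = locate_length_prefixed(row, 199)
off_b, reads_b = locate_header_offsets(row, 199)
assert off_a == off_b            # both layouts agree on the position
assert (reads_a, reads_b) == (199, 0)
```

The AVG(i0) vs AVG(i199) gap in the experiment corresponds to the difference between reads_a and reads_b: the length-prefixed walk touches every earlier column of every tuple.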
Ivan Pavlukhin <vololo100@gmail.com> writes:
> Does anyone know why the format is still the same?

(1) Backwards compatibility, and (2) it's not clear that a different
layout would be a win for all cases.

			regards, tom lane
Tom,

Thank you.

> (1) Backwards compatibility, and (2) it's not clear that a different
> layout would be a win for all cases.

I am curious regarding (2); to my understanding it would be good to find at least one case in which a layout with lengths/offsets in a header is crucially worse. I will be happy if someone can elaborate.

On Sat, Mar 30, 2019 at 17:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Ivan Pavlukhin <vololo100@gmail.com> writes:
> > Does anyone know why the format is still the same?
>
> (1) Backwards compatibility, and (2) it's not clear that a different
> layout would be a win for all cases.
>
>                         regards, tom lane

--
Best regards,
Ivan Pavlukhin
Ivan Pavlukhin <vololo100@gmail.com> writes:
>> (1) Backwards compatibility, and (2) it's not clear that a different
>> layout would be a win for all cases.

> I am curious regarding (2); to my understanding it would be good to
> find at least one case in which a layout with lengths/offsets in a
> header is crucially worse. I will be happy if someone can elaborate.

It seems like you think the only figure of merit here is how fast deform_heap_tuple runs. That's not the case. There are at least two issues:

1. You're not going to be able to do this without making tuples larger overall in many cases; but more data means more I/O, which means less performance. I base this objection on the observation that our existing design allows single-byte length "words" in many common cases, but it's really hard to see how you could avoid storing a full-size offset for each column if you want to be able to access each column in O(1) time without any examination of other columns.

2. Our existing system design has an across-the-board assumption that each variable-length datum has its length embedded in it, so that a single pointer carries enough information for any called function to work with the value. If you remove the length word and expect the length to be computed by subtracting two offsets that are not even physically adjacent to the datum, that stops working. There is no fix for that that doesn't add performance costs and complexity.

Practically speaking, even if we were willing to lose on-disk database compatibility, point 2 breaks so many internal and extension APIs that there's no chance whatever that we could remove the length-word datum headers. That means that the added fields in tuple headers would be pure added space, with no offsetting savings in the data size, making point 1 quite a lot worse.

			regards, tom lane
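Point 1 can be put into rough numbers. The sketch below is my own back-of-the-envelope arithmetic, not anything from the PostgreSQL source: it uses PostgreSQL's actual short varlena rule (a 1-byte length header for values up to 126 bytes of data, 4 bytes otherwise) against a hypothetical fixed 2-byte per-column offset, the minimum width that can address an arbitrary position in an 8 kB heap page.

```python
# Back-of-the-envelope space comparison for point 1 above -- my own rough
# arithmetic; the 2-byte offset width is an assumption, not a real format.

def pg_length_overhead(value_lengths):
    # PostgreSQL varlena: 1-byte length header for short values
    # (up to 126 bytes of data), 4-byte header otherwise.
    return sum(1 if n <= 126 else 4 for n in value_lengths)

def fixed_offset_overhead(value_lengths, width=2):
    # Hypothetical O(1)-addressable layout: one fixed-width offset
    # per column stored in the tuple header.
    return width * len(value_lengths)

short_row = [10] * 200           # 200 short strings, as in the experiment
assert pg_length_overhead(short_row) == 200
assert fixed_offset_overhead(short_row) == 400    # twice the overhead

long_row = [500] * 20            # only rows of long values tilt the other way
assert pg_length_overhead(long_row) == 80
assert fixed_offset_overhead(long_row) == 40
```

And per the closing paragraph: since the embedded length words could not actually be removed without breaking APIs, the per-column offsets would come on top of the existing overhead, not instead of it.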
Tom, thanks for your answer. It definitely makes the picture in my mind clearer.

On Tue, Apr 2, 2019 at 18:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Ivan Pavlukhin <vololo100@gmail.com> writes:
> >> (1) Backwards compatibility, and (2) it's not clear that a different
> >> layout would be a win for all cases.
>
> > I am curious regarding (2); to my understanding it would be good to
> > find at least one case in which a layout with lengths/offsets in a
> > header is crucially worse. I will be happy if someone can elaborate.
>
> It seems like you think the only figure of merit here is how fast
> deform_heap_tuple runs. That's not the case. There are at least
> two issues:
>
> 1. You're not going to be able to do this without making tuples
> larger overall in many cases; but more data means more I/O, which
> means less performance. I base this objection on the observation
> that our existing design allows single-byte length "words" in many
> common cases, but it's really hard to see how you could avoid
> storing a full-size offset for each column if you want to be able
> to access each column in O(1) time without any examination of other
> columns.
>
> 2. Our existing system design has an across-the-board assumption
> that each variable-length datum has its length embedded in it,
> so that a single pointer carries enough information for any called
> function to work with the value. If you remove the length word
> and expect the length to be computed by subtracting two offsets that
> are not even physically adjacent to the datum, that stops working.
> There is no fix for that that doesn't add performance costs and
> complexity.
>
> Practically speaking, even if we were willing to lose on-disk database
> compatibility, point 2 breaks so many internal and extension APIs that
> there's no chance whatever that we could remove the length-word datum
> headers. That means that the added fields in tuple headers would be
> pure added space, with no offsetting savings in the data size, making
> point 1 quite a lot worse.
>
>                         regards, tom lane

--
Best regards,
Ivan Pavlukhin