Thread: Terrible performance on wide selects

Terrible performance on wide selects

From
Steve Crawford
Date:
I have a table which is rather wide (~800 columns) and consists of a few
columns of identifying data (run time, channel and such) and up to several
hundred columns of collected data (no, normalization does not suggest putting
collected data in another table - collected item 1 always corresponds to
collected item 1 but is completely different than item 3).

My test table is very short (62 rows) but in production would grow by several
thousand rows per day. Unfortunately if my test data is correct, performance
on wide selects is so bad that it will render the system unusable.

Here's the test. I have created two versions of the table - one stores the
collected data in an array of text and the other stores the data in
individual columns, no joins, no indexes. Times are averages of many runs -
the times varied very little and the data is small enough that I'm sure it
was served from RAM. Postgres CPU utilization observed on the longer runs was
98-99%. Changing the output format didn't seem to change things significantly.

Times for selecting all the columns in the table:
select * from columnversion;
8,000 ms

select * from arrayversion;
110 ms

select * from arraytocolumnview (data in the array version but converted to
columns in the view)
10,000 ms

Times to select a single column in a table:
select runstarttime from columversion;
32 ms

select runstarttime from arrayversion;
6 ms

So the question is, does it seem reasonable that a query on fundamentally
identical data should take 70-90 times as long when displayed as individual
columns vs. when output as a raw array and, more imporantly, what can I do to
get acceptable performance on this query?

Cheers,
Steve

Re: Terrible performance on wide selects

From
Tom Lane
Date:
Steve Crawford <scrawford@pinpointresearch.com> writes:
> So the question is, does it seem reasonable that a query on fundamentally
> identical data should take 70-90 times as long when displayed as individual
> columns vs. when output as a raw array and, more imporantly, what can I do to
> get acceptable performance on this query?

There are undoubtedly some places that are O(N^2) in the number of
targetlist items.  Feel free to do some profiling to identify them.
It probably won't be real hard to fix 'em once identified.

            regards, tom lane