sequential scan result order vs performance - Mailing list pgsql-hackers

From Andres Freund
Subject sequential scan result order vs performance
Date
Msg-id 20161030073655.rfa6nvbyk4w2kkpk@alap3.anarazel.de
Whole thread Raw
Responses Re: sequential scan result order vs performance  (Andres Freund <andres@anarazel.de>)
Re: sequential scan result order vs performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped).  A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.

When iterating over a page we return tuples in itemid order, which makes
them returned in *descending* order address-wise, as tuples are stored
starting from the end of the page.  But when actually accessing the
tuples, we access them increasing address order (header, and then column
by column). It appears that that, quite understandable confuses prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everwhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Date:
Subject: Re: Patch to implement pg_current_logfile() function
Next
From: Greg Stark
Date:
Subject: Re: JIT compiler for expressions