Thread: sequential scan result order vs performance

sequential scan result order vs performance

From
Andres Freund
Date:
Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped).  A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.

When iterating over a page we return tuples in itemid order, which makes
them returned in *descending* order address-wise, as tuples are stored
starting from the end of the page.  But when actually accessing the
tuples, we access them increasing address order (header, and then column
by column). It appears that that, quite understandable confuses prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everwhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund



Re: sequential scan result order vs performance

From
Andres Freund
Date:
Hi,

On 2016-10-30 00:36:55 -0700, Andres Freund wrote:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

One more point: Over IM Robert commented that it's not guaranteed that
itemid order correlates precisely with reverse "tuple data" order. After
PageRepairFragmentation() that'll not be the case anymore. That's true,
but I suspect in many cases with larger analytics queries the
correlation will still be significant, and we also could make it
guaranteed with the price of making PageRepairFragmentation() a bit more
expensive.

Greetings,

Andres Freund



Re: sequential scan result order vs performance

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

> I do wonder however if it's acceptable to change the result order of
> sequential scans.

I think there will be a lot of howls.  People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order.  We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.

If you were talking about 3X then maybe it would be worth it, but for 30%
(on a subset of queries) I am not excited.

I wonder whether we could instead adjust the rules for insertion so
that tuples tend to be physically in order by itemid.  I'm imagining
leaving two "holes" in a page and sometimes (hopefully not often)
having to shift data during insert to preserve the ordering.
        regards, tom lane



Re: sequential scan result order vs performance

From
Jim Nasby
Date:
On 10/30/16 9:12 AM, Tom Lane wrote:
> I think there will be a lot of howls.  People expect that creating
> a table by inserting a bunch of rows, and then reading back those
> rows, will not change the order.  We already futzed with that guarantee
> a bit with syncscans, but that only affects quite large tables --- and
> even there, we were forced to provide a way to turn it off.

Leaving a 30% performance improvement on the floor because some people 
don't grok how sets work seems insane to me.

We could have a GUC to disable this. I suspect ORDER BY ctid would be 
another option.

BTW, I've sometimes wished for a mode where queries would silently have 
result ordering intentionally futzed, to eliminate any possibility of 
dependence on tuple ordering (as well as having sequences start at some 
random value). I guess with the hooks that are in place today it 
wouldn't be hard to stick a ORDER BY random() in if there wasn't already 
a Sort node at the top level...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: sequential scan result order vs performance

From
Albe Laurenz
Date:
Jim Nasby wrote:
> On 10/30/16 9:12 AM, Tom Lane wrote:
>> I think there will be a lot of howls.  People expect that creating
>> a table by inserting a bunch of rows, and then reading back those
>> rows, will not change the order.  We already futzed with that guarantee
>> a bit with syncscans, but that only affects quite large tables --- and
>> even there, we were forced to provide a way to turn it off.
> 
> Leaving a 30% performance improvement on the floor because some people
> don't grok how sets work seems insane to me.

+1

Yours,
Laurenz Albe

Re: sequential scan result order vs performance

From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

> BTW, I've sometimes wished for a mode where queries would silently have
> result ordering intentionally futzed, to eliminate any possibility of
> dependence on tuple ordering (as well as having sequences start at some
> random value).

FWIW, SQLite has this, in the form of 'PRAGMA reverse_unordered_selects'.

http://sqlite.org/pragma.html#pragma_reverse_unordered_selects

-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live withthe consequences of."
--Skud's Meta-Law
 



Re: sequential scan result order vs performance

From
Corey Huinker
Date:
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby
<spandir="ltr"><<a href="mailto:Jim.Nasby@bluetreble.com" target="_blank">Jim.Nasby@bluetreble.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">BTW,
I'vesometimes wished for a mode where queries would silently have result ordering intentionally futzed, to eliminate
anypossibility of dependence on tuple ordering (as well as having sequences start at some random value). I guess with
thehooks that are in place today it wouldn't be hard to stick a ORDER BY random() in if there wasn't already a Sort
nodeat the top level...</blockquote></div><div class="gmail_extra"><br /></div>+1<br />In Oracle, we sorta had that
featureby adding a parallel hint to a query even if it didn't need it. It came in handy.</div></div>