Re: subselect requires offset 0 for good performance. - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: subselect requires offset 0 for good performance.
Date
Msg-id CAOR=d=0EJw=LHpqFk-y4HSvVuESQDbZ8uExfAxv0KDS7T7sEHw@mail.gmail.com
Whole thread Raw
In response to Re: subselect requires offset 0 for good performance.  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: subselect requires offset 0 for good performance.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, Aug 2, 2013 at 1:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Aug 2, 2013 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>> I extracted all the data like so:
>>> select * into dba.pp_test_wide from original table;
>>> and get this query plan from explain analyze:
>>> http://explain.depesz.com/s/EPx which takes 20 minutes to run.
>>> If I extract it this way:
>>> select tree_sortkey, product_name, deleted_at into db.pp_test_3col
>>> from original table;
>>> I get this plan: http://explain.depesz.com/s/gru which gets a
>>> materialize in it, and suddenly takes 106 ms.
>>
>> There's no reason why suppressing some unrelated columns would change the
>> rowcount estimates, but those two plans show different rowcount estimates.
>>
>> I suspect the *actual* reason for the plan change was that autovacuum had
>> had a chance to update statistics for the one table, and not yet for the
>> other.  Please do a manual ANALYZE on both tables and see if there's
>> still a plan difference.
>
> Interesting. I ran analyze on both tables and sure enough the new test
> table runs fast. Ran analyze on the old table and it runs slow. The
> only thing the old table and its plan are missing is the materialize.
> So what is likely to change from the old table to the new one? Here's
> the explain analyze output from the old table and the same query
> against it: http://explain.depesz.com/s/CtZ and here's the plan with
> offset 0 in it: http://explain.depesz.com/s/Gug note that while the
> estimates are a bit off, the really huge difference here says to me
> some suboptimal method is getting deployed in the background
> somewhere. Do we need a stack trace?

So as a followup. I ran vacuum verbose analyze on the original table,
thinking it might be bloated but it wasn't. Out of 320k or so rows
there were 4k dead tuples recovered, and none that it couldn't
recover.  So now I'm trying to recreate the original table with a
select into with an order by random() on the end. Nope it gets a
materialize in it and runs fast. Well it's danged hard to make a test
case when copying the table with random ordering results in a much
faster query against the same data. I'm at a loss on how to reproduce
this. Are the indexes on the master table leading it astray maybe?
Yep. Added the indexes and performance went right into the dumper. New
plan on new table with old data added in random order now looks like
the old table, only worse because it's on a slower drive. Just to be
complete here's the plan: http://explain.depesz.com/s/PYH Note that I
created new table with order by random() and created indexes. Ran
analyze on it, and the select plan looks similar now:
http://explain.depesz.com/s/bsE

So maybe I can make a test case now. But to summarize, when it can use
indexes this query gets REAL slow because it lacks a materialize step.
That seem about right?


pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: subselect requires offset 0 for good performance.
Next
From: Tom Lane
Date:
Subject: Re: subselect requires offset 0 for good performance.