Thread: Regression: 8.3 2 seconds -> 8.4 100+ seconds

Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Francisco Reyes
CentOS 5.4 and 5.5

Query
SELECT sum(usramt) as givensum,
              sum (case when usedid > 0 then usramt else 0 end) as usedsum
              FROM argrades
              WHERE userstatus in (5) and
              membid in (select distinct members.membid from members, cards
where members.membid = cards.membid and members.membershipid = 40 and useraccount in
( select useraccount from cards where membid in
(select membid from members where commonid = 3594)))


Run on 8.3.7. Result below was from second run. First run took 2 seconds.
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=205095.01..205095.02 rows=1 width=10) (actual time=1.430..1.431 rows=1 loops=1)
   ->  Nested Loop  (cost=948.04..204823.07 rows=54387 width=10) (actual time=0.153..1.016 rows=472 loops=1)
         ->  Unique  (cost=948.04..948.42 rows=76 width=4) (actual time=0.126..0.128 rows=2 loops=1)
               ->  Sort  (cost=948.04..948.23 rows=76 width=4) (actual time=0.126..0.126 rows=2 loops=1)
                     Sort Key: public.members.membid
                     Sort Method:  quicksort  Memory: 25kB
                     ->  Nested Loop  (cost=362.66..945.66 rows=76 width=4) (actual time=0.089..0.113 rows=2 loops=1)
                           ->  Nested Loop  (cost=362.66..891.84 rows=76 width=4) (actual time=0.082..0.097 rows=2 loops=1)
                                 ->  HashAggregate  (cost=362.66..363.07 rows=41 width=29) (actual time=0.055..0.056 rows=2 loops=1)
                                       ->  Nested Loop  (cost=11.79..362.55 rows=41 width=29) (actual time=0.044..0.054 rows=2 loops=1)
                                             ->  HashAggregate  (cost=11.79..12.20 rows=41 width=4) (actual time=0.032..0.032 rows=2 loops=1)
                                                   ->  Index Scan using members_commonid on members  (cost=0.00..11.69 rows=41 width=4) (actual time=0.010..0.013 rows=2 loops=1)
                                                         Index Cond: (commonid = 3594)
                                             ->  Index Scan using cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=0.007..0.007 rows=1 loops=2)
                                                   Index Cond: (public.cards.membid = public.members.membid)
                                 ->  Index Scan using cards_useraccount on cards  (cost=0.00..12.87 rows=2 width=33) (actual time=0.019..0.019 rows=1 loops=2)
                                       Index Cond: (public.cards.useraccount = public.cards.useraccount)
                           ->  Index Scan using members_pkey on members  (cost=0.00..0.70 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=2)
                                 Index Cond: (public.members.membid = public.cards.membid)
                                 Filter: (public.members.membershipid = 40)
         ->  Index Scan using argrades_membid on argrades  (cost=0.00..2673.60 rows=716 width=14) (actual time=0.020..0.319 rows=236 loops=2)
               Index Cond: (argrades.membid = public.members.membid)
               Filter: (argrades.userstatus = 5)
 Total runtime: 1.576 ms


The query plan on 8.4.4 is shown below. Unfortunately, the RPMs used for that install were built with --enable-cassert.
The same run on an identical machine with 8.4.5 took 120 seconds.
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8726078.65..8726078.66 rows=1 width=10) (actual time=280704.491..280704.492 rows=1 loops=1)
   ->  Hash Semi Join  (cost=295565.48..8725460.91 rows=123547 width=10) (actual time=1995.191..280698.091 rows=472 loops=1)
         Hash Cond: (argrades.membid = public.members.membid)
         ->  Seq Scan on argrades  (cost=0.00..7705748.40 rows=197758125 width=14) (actual time=0.033..245808.983 rows=197504297 loops=1)
               Filter: (userstatus = 5)
         ->  Hash  (cost=265133.95..265133.95 rows=2434522 width=4) (actual time=1837.777..1837.777 rows=2 loops=1)
               ->  HashAggregate  (cost=216443.51..240788.73 rows=2434522 width=4) (actual time=1834.352..1837.760 rows=2 loops=1)
                     ->  Hash Join  (cost=31151.39..210357.21 rows=2434522 width=4) (actual time=1620.620..1830.788 rows=2 loops=1)
                           Hash Cond: (public.members.membid = public.cards.membid)
                           ->  Seq Scan on members  (cost=0.00..121379.95 rows=2434956 width=4) (actual time=0.024..1085.143 rows=2435153 loops=1)
                                 Filter: (membershipid = 40)
                           ->  Hash  (cost=719.87..719.87 rows=2434522 width=4) (actual time=241.921..241.921 rows=2 loops=1)
                                 ->  Nested Loop  (cost=293.80..719.87 rows=2434522 width=4) (actual time=228.867..241.909 rows=2 loops=1)
                                       ->  HashAggregate  (cost=293.80..294.13 rows=33 width=29) (actual time=169.551..169.553 rows=2 loops=1)
                                             ->  Nested Loop  (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
                                                   ->  HashAggregate  (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
                                                         ->  Index Scan using members_commonid on members  (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                                                               Index Cond: (commonid = 3594)
                                                   ->  Index Scan using cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
                                                         Index Cond: (public.cards.membid = public.members.membid)
                                       ->  Index Scan using cards_useraccount on cards  (cost=0.00..12.88 rows=2 width=33) (actual time=36.172..36.173 rows=1 loops=2)
                                             Index Cond: (public.cards.useraccount = public.cards.useraccount)
 Total runtime: 280730.327 ms

The 8.4 machines have more memory than the 8.3.7 machine and are in general
much better hardware.

8.4 settings:
shared_buffers = 18GB
effective_cache_size = 18GB

The machines have 72GB of RAM.
I tried turning off sequential scans on the 8.4.5 machine and that did not help.
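That is, roughly (disabling it for the session only and then re-running the EXPLAIN ANALYZE; the exact statement isn't shown above):

SET enable_seqscan = off;  -- session-level; the planner then avoids seq scans where it has any alternative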

Any ideas/suggestions?

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Pavel Stehule
Hello

>
> The 8.4 machines have more memory than the 8.3.7 machine and are in general
> much better hardware.
>
> 8.4 settings:
> shared_buffers = 18GB
> effective_cache_size = 18GB
>
> The machines have 72GB of RAM.
> I tried turning off sequential scans on the 8.4.5 machine and that did not help.
>
> Any ideas/suggestions?
>

increase statistics on columns. The estimation is totally out.
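
For example, something along these lines (the columns named here are only illustrative; apply it to the columns actually used in the joins and WHERE clauses):

-- raise the per-column statistics target, then re-analyze the tables
ALTER TABLE members  ALTER COLUMN commonid SET STATISTICS 1000;
ALTER TABLE argrades ALTER COLUMN membid   SET STATISTICS 1000;
ANALYZE members;
ANALYZE argrades;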

Regards

Pavel Stehule


Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Francisco Reyes
Pavel Stehule writes:

> increase statistics on columns. The estimation is totally out.

Stats when I ran the above was at 250.
Will try 500.

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Marcin Mirosław
On 27.10.2010 16:53, Francisco Reyes wrote:
> Pavel Stehule writes:
>
>> increase statistics on columns. The estimation is totally out.
>
> Stats when I ran the above was at 250.
> Will try 500.


Please paste: ANALYZE VERBOSE argrades
Regards


Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Francisco Reyes
Marcin Mirosław writes:

> On 27.10.2010 16:53, Francisco Reyes wrote:
>> Pavel Stehule writes:
>>
>>> increase statistics on columns. The estimation is totally out.
>>
>> Stats when I ran the above was at 250.
>> Will try 500.

Did 1000.


> Please paste: ANALYZE VERBOSE argrades


analyze verbose argrades;
INFO:  analyzing "public.argrades"
INFO:  "argrades": scanned 75000 of 5122661 pages, containing 3054081 live
rows and 4152 dead rows; 75000 rows in sample, 208600288 estimated total rows
ANALYZE
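
(For what it's worth, the ~208.6M figure is roughly just the live-row density of the sampled pages scaled up to the whole table:)

-- rough sanity check of ANALYZE's row estimate, using the numbers above
SELECT round(3054081.0 / 75000 * 5122661) AS estimated_total_rows;
-- 208600288, matching the estimate reported by ANALYZE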

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Tom Lane
Francisco Reyes <lists@stringsutils.com> writes:
> CentOS 5.4 and 5.5
> Query
> SELECT sum(usramt) as givensum,
>               sum (case when usedid > 0 then usramt else 0 end) as usedsum
>               FROM argrades
>               WHERE userstatus in (5) and
>               membid in (select distinct members.membid from members, cards
> where members.membid = cards.membid and members.membershipid = 40 and useraccount in
> ( select useraccount from cards where membid in
> (select membid from members where commonid = 3594)))

Please provide a self-contained test case for this.  In testing here,
8.4 can still figure out that it ought to use an indexscan against the
large table ... so there is some aspect of your tables that you're not
showing us.
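
A skeleton for such a test case might look like the following (table and index definitions are guesses inferred from the query and plans in this thread, not the real schema; the data distribution would need to approximate production):

CREATE TABLE members  (membid int PRIMARY KEY, membershipid int, commonid int);
CREATE TABLE cards    (membid int, useraccount text);
CREATE TABLE argrades (membid int, usramt numeric, usedid int, userstatus int);

CREATE INDEX members_commonid  ON members  (commonid);
CREATE INDEX cards_membid      ON cards    (membid);
CREATE INDEX cards_useraccount ON cards    (useraccount);
CREATE INDEX argrades_membid   ON argrades (membid);

-- fill with data whose skew approximates production (sizes scaled down here)
INSERT INTO members
  SELECT i, 40, CASE WHEN i % 1000 = 0 THEN i / 1000 ELSE 0 END
  FROM generate_series(1, 2500000) AS i;
INSERT INTO cards
  SELECT i, 'acct' || i
  FROM generate_series(1, 2500000) AS i;
INSERT INTO argrades
  SELECT (random() * 2499999)::int + 1, random() * 100, 0, 5
  FROM generate_series(1, 1000000) AS i;  -- far smaller than the real 200M rows

ANALYZE;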

            regards, tom lane

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Francisco Reyes
Tom Lane writes:

> Please provide a self-contained test case for this.  In testing here,
> 8.4 can still figure out that it ought to use an indexscan against the
> large table ... so there is some aspect of your tables that you're not
> showing us.

It is possible, and very likely, that it is an issue with the distribution
of values.

The data is 40GB+ and the largest table is 200M+ rows.
Setting up a standalone test case may be difficult, but what we could do is
something like WebEx, where you, or someone else we would trust, would
connect to a web conference and tell us what to type.


Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Robert Haas
On Wed, Oct 27, 2010 at 8:41 AM, Francisco Reyes <lists@stringsutils.com> wrote:
>                                ->  Nested Loop  (cost=293.80..719.87
> rows=2434522 width=4) (actual time=228.867..241.909 rows=2 loops=1)
>                                      ->  HashAggregate  (cost=293.80..294.13
> rows=33 width=29) (actual time=169.551..169.553 rows=2 loops=1)
>                                            ->  Nested Loop
>  (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2
> loops=1)
>                                                  ->  HashAggregate
>  (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2
> loops=1)
>                                                        ->  Index Scan using
> members_commonid on members  (cost=0.00..11.25 rows=33 width=4) (actual time
> = 64.688..64.703 rows=2 loops=1)
>                                                              Index Cond:
> (commonid = 3594)
>                                                  ->  Index Scan using
> cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=
> 52.400..52.401 rows=1 loops=2)
>                                                        Index Cond:
> (public.cards.membid = public.members.membid)
>                                      ->  Index Scan using cards_useraccount
> on cards  (cost=0.00..12.88 rows=2 width=33) (actual time=36.172..  36.173
> rows=1 loops=2)
>                                            Index Cond:
> (public.cards.useraccount = public.cards.useraccount)

This part looks really strange to me.  Here we have a nested loop
whose outer side is estimated to produce 33 rows and whose inner side
is estimated to produce 2 rows.  Given that, one would think that the
estimate for the loop as a whole shouldn't be more than 33 * 2 = 66
rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2
is really 2.49999 rounded down).   But the actual estimate is 5 orders
of magnitude larger.  How is that possible?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Francisco Reyes
Robert Haas writes:

> This part looks really strange to me.  Here we have a nested loop
> whose outer side is estimated to produce 33 rows and whose inner side
> is estimated to produce 2 rows.

We have retained someone to help us troubleshoot the issue.
Once the issue has been resolved I will make sure to post our
findings.

Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Greg Smith
Reviving a thread from two months ago... Francisco asked about a query
that was much slower on 8.4 than on 8.3 at
http://archives.postgresql.org/message-id/cone.1288183283.263738.5695.1000@shelca

There was some theorizing about a stats problem.  I've now pulled
pg_stats data from the production server, and that alone doesn't seem to
be enough to account for what's happening.  Robert asked a good question
at
http://archives.postgresql.org/message-id/AANLkTi=VotmtMz5tYcnQ_5nDhZmsvp_9oYq43FF+qjnn@mail.gmail.com
that never was followed up on.  That specifically is what I'm still
confused about, even after collecting a lot more data from this system,
so continuing from there:

Robert Haas wrote:
> This part looks really strange to me.  Here we have a nested loop
> whose outer side is estimated to produce 33 rows and whose inner side
> is estimated to produce 2 rows.  Given that, one would think that the
> estimate for the loop as a whole shouldn't be more than 33 * 2 = 66
> rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2
> is really 2.49999 rounded down).   But the actual estimate is 5 orders
> of magnitude larger.  How is that possible?

That part stuck out to me too.  I have no idea how that particular bit
of Nested Loop ends up with that giant count for estimated output rows.
I thought it would be about 33 * 2 too, so how is that turning into an
estimate of 2434522 output rows?  I believe this is the part that's
executing "cards.useraccount IN".

Here's a cut-down subset of the query that just does the suspicious part
without any nodes above it, run on the 8.4 system having the problem.
This is basically the same plan that was seen in a lower-level node
before, just the number of matching rows for the one loop has gone from
33 to 34 due to more data being added to the table since the previous
run.  I think it's easier to follow the logic it's trying to execute
when simplified like this:

SELECT
  cards.membid
FROM cards
WHERE cards.useraccount IN
    (SELECT useraccount FROM cards WHERE membid IN
      (SELECT membid from members where commonid = 3594)
    )
;

Nested Loop  (cost=303.01..742.28 rows=2469120 width=4) (actual time=0.066..0.082 rows=2 loops=1)
  ->  HashAggregate  (cost=303.01..303.35 rows=34 width=29) (actual time=0.048..0.048 rows=2 loops=1)
        ->  Nested Loop  (cost=11.86..302.93 rows=34 width=29) (actual time=0.034..0.045 rows=2 loops=1)
              ->  HashAggregate  (cost=11.86..12.20 rows=34 width=4) (actual time=0.023..0.024 rows=2 loops=1)
                    ->  Index Scan using members_commonid on members  (cost=0.00..11.77 rows=34 width=4) (actual time=0.014..0.016 rows=2 loops=1)
                          Index Cond: (commonid = 3594)
              ->  Index Scan using cards_membid on cards  (cost=0.00..8.54 rows=1 width=33) (actual time=0.009..0.010 rows=1 loops=2)
                    Index Cond: (public.cards.membid = public.members.membid)
  ->  Index Scan using cards_useraccount on cards  (cost=0.00..12.88 rows=2 width=33) (actual time=0.015..0.016 rows=1 loops=2)
        Index Cond: (public.cards.useraccount = public.cards.useraccount)

It's possible to rewrite this whole thing using a join instead of IN,
and sure enough doing so gives a better plan.  That's how they got
around this being a crippling issue.  I'm still suspicious of what
caused such a performance regression from 8.3 though, where this query
executed so much better.
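
A join-based form of the cut-down query above would look something like this (a sketch only; the exact rewrite used in production isn't shown here):

SELECT DISTINCT c.membid
FROM cards c
JOIN cards c2  ON c2.useraccount = c.useraccount
JOIN members m ON m.membid = c2.membid
WHERE m.commonid = 3594;
-- DISTINCT stands in for the IN semantics, so duplicate matches on
-- useraccount don't multiply the result rows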

Stepping back from that part of the query for a second, the main time
related difference between the 8.3 and 8.4 plans involves how much of
the members table gets scanned.  When 8.3 looks up a matching item in
that table, in order to implement this part of the larger query:

WHERE members.membid = cards.membid AND members.membershipid = 40

It uses the membid index and gets a quick plan, followed by filtering on
membershipid:

->  Index Scan using members_pkey on members  (cost=0.00..0.70 rows=1
width=4) (actual time=0.006..0.007 rows=1 loops=2)
      Index Cond: (public.members.membid = public.cards.membid)
      Filter: (public.members.membershipid = 40)

8.4 is scanning the whole table instead:

->  Seq Scan on members  (cost=0.00..121379.95 rows=2434956 width=4)
(actual time=0.024..1085.143 rows=2435153 loops=1)
      Filter: (membershipid = 40)

Which gives you essentially every single member ID available, to then
match against in a Hash join.  The filter on membershipid isn't
considered selective at all.  I'm not sure why 8.4 isn't also
recognizing the value of being selective on the membid here, to reduce
the number of output rows that come out of that.

Is the mis-estimation of the Nested Loop part causing this sequential
scan to happen, because there are so many more potential values to join
against in the estimate than in reality?  If that's the case, it just
comes full circle back to how the node already discussed above is coming
about.

While there are some statistics anomalies due to data skew on the
production system I don't see how they could explain this Nested Loop
row explosion.  I can tell you in detail why some of the lower-level
data is misestimated by a single order of magnitude.  For example, if
you focus on this inner part:

SELECT useraccount FROM cards WHERE membid IN
  (SELECT membid FROM members WHERE commonid = 3594);

->  Nested Loop  (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
      ->  HashAggregate  (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
            ->  Index Scan using members_commonid on members  (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                  Index Cond: (commonid = 3594)
      ->  Index Scan using cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
            Index Cond: (public.cards.membid = public.members.membid)

The index scan on members_commonid here is estimating 33 rows when there
are actually 2 that match.  Looking at the table stats for this
relation, the distribution is a bit skewed because 99.7% of the rows are
set to the sole named MCV:  the value "0", which is used as a flag for "no
data" here instead of NULL (a standard legacy-system import
compatibility issue).  My guess is that the 250 points of histogram data
aren't quite enough to really capture the distribution of the non-zero
values very well in the wake of that, so it's off by a factor of ~16.
That alone isn't enough of an error to switch any of the efficient index
scans into other forms though.  The actual runtime in this part of the
plan isn't suffering that badly from this error, it's more that other
plan decisions aren't being made well around it.
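
That skew is easy to see directly in pg_stats, for anyone who wants to check a similar case:

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'members'
  AND attname = 'commonid';
-- most_common_vals/most_common_freqs show "0" covering ~99.7% of the rows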

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us