Thread: Two fast queries get slow when combined

Two fast queries get slow when combined

From
cluster
Date:
I have two small queries which are both very fast to evaluate
separately. The first query, "Query 1", calculates some statistics and
the the second query, "Query 2", finds a subset of relevant keys.
When combined into a single query which calculates statistics from only
the subset of relevant keys the evaluation plan explodes and uses both
seq scans and bitmap heap scans.
How can I improve the performance of the combined query?

Queries and output from EXPLAIN ANALYZE can be seen here with some
syntax highlighting:
    http://rafb.net/p/BJIW4p69.html

I will also paste it here:

=============================================================================
QUERY 1 (very *fast*):
=============================================================================
SELECT keyId, count(1) as num_matches
FROM stats
GROUP BY keyId
LIMIT 50

  Limit  (cost=0.00..23.65 rows=50 width=8) (actual time=0.090..2.312
rows=50 loops=1)
    ->  GroupAggregate  (cost=0.00..4687.46 rows=9912 width=8) (actual
time=0.085..2.145 rows=50 loops=1)
          ->  Index Scan using stats_keyId on stats  (cost=0.00..3820.19
rows=99116 width=8) (actual time=0.031..1.016 rows=481 loops=1)
  Total runtime: 2.451 ms
(4 rows)


=============================================================================
QUERY 2 (very *fast*):
=============================================================================
SELECT keyId, sortNum
FROM items i
WHERE sortNum > 123
ORDER BY sortNum
LIMIT 50

  Limit  (cost=0.01..9.87 rows=50 width=8) (actual time=0.068..0.610
rows=50 loops=1)
    InitPlan
      ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.009..0.025 rows=1 loops=1)
            ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.007 rows=1 loops=1)
    ->  Index Scan using items_sortNum on items i  (cost=0.00..1053.67
rows=5344 width=8) (actual time=0.063..0.455 rows=50 loops=1)
          Index Cond: (sortNum >= $0)
  Total runtime: 0.749 ms
(7 rows)



=============================================================================
COMBINED QUERY (very *slow*):
=============================================================================
   SELECT keyId, sortNum, count(1)
   FROM stats s, items i
   WHERE s.keyId = i.keyId AND i.sortNum > 123
   GROUP BY i.keyId, i.sortNum
   ORDER BY i.sortNum
   LIMIT 50

Limit  (cost=3281.72..3281.84 rows=50 width=16) (actual
time=435.838..436.043 rows=50 loops=1)
    InitPlan
      ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.016..0.021 rows=1 loops=1)
            ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.012..0.013 rows=1 loops=1)
    ->  Sort  (cost=3281.71..3289.97 rows=3304 width=16) (actual
time=435.833..435.897 rows=50 loops=1)
          Sort Key: i.sortNum
          ->  Hash Join  (cost=2745.80..3088.59 rows=3304 width=16)
(actual time=364.247..413.164 rows=8490 loops=1)
                Hash Cond: (s.keyId = i.keyId)
                ->  HashAggregate  (cost=2270.53..2394.43 rows=9912
width=8) (actual time=337.869..356.533 rows=9911 loops=1)
                      ->  Seq Scan on items  (cost=0.00..1527.16
rows=99116 width=8) (actual time=0.016..148.118 rows=99116 loops=1)
                ->  Hash  (cost=408.47..408.47 rows=5344 width=12)
(actual time=26.342..26.342 rows=4491 loops=1)
                      ->  Bitmap Heap Scan on items i
(cost=121.67..408.47 rows=5344 width=12) (actual time=5.007..16.898
rows=4491 loops=1)
                            Recheck Cond: (sortNum >= $0)
                            ->  Bitmap Index Scan on items_sortNum
(cost=0.00..120.33 rows=5344 width=0) (actual time=4.273..4.273
rows=13375 loops=1)
                                  Index Cond: (sortNum >= $0)
Total runtime: 436.421 ms
(16 rows)

Re: Two fast queries get slow when combined

From
Heikki Linnakangas
Date:
cluster wrote:
>   SELECT keyId, sortNum, count(1)
>   FROM stats s, items i
>   WHERE s.keyId = i.keyId AND i.sortNum > 123
>   GROUP BY i.keyId, i.sortNum
>   ORDER BY i.sortNum
>   LIMIT 50
>
> Limit  (cost=3281.72..3281.84 rows=50 width=16) (actual
> time=435.838..436.043 rows=50 loops=1)
>    InitPlan
>      ->  Limit  (cost=0.00..0.01 rows=1 width=0) (actual
> time=0.016..0.021 rows=1 loops=1)
>            ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
> time=0.012..0.013 rows=1 loops=1)
>    ->  Sort  (cost=3281.71..3289.97 rows=3304 width=16) (actual
> time=435.833..435.897 rows=50 loops=1)
>          Sort Key: i.sortNum
>          ->  Hash Join  (cost=2745.80..3088.59 rows=3304 width=16)
> (actual time=364.247..413.164 rows=8490 loops=1)
>                Hash Cond: (s.keyId = i.keyId)
>                ->  HashAggregate  (cost=2270.53..2394.43 rows=9912
> width=8) (actual time=337.869..356.533 rows=9911 loops=1)
>                      ->  Seq Scan on items  (cost=0.00..1527.16
> rows=99116 width=8) (actual time=0.016..148.118 rows=99116 loops=1)
>                ->  Hash  (cost=408.47..408.47 rows=5344 width=12)
> (actual time=26.342..26.342 rows=4491 loops=1)
>                      ->  Bitmap Heap Scan on items i
> (cost=121.67..408.47 rows=5344 width=12) (actual time=5.007..16.898
> rows=4491 loops=1)
>                            Recheck Cond: (sortNum >= $0)
>                            ->  Bitmap Index Scan on items_sortNum
> (cost=0.00..120.33 rows=5344 width=0) (actual time=4.273..4.273
> rows=13375 loops=1)
>                                  Index Cond: (sortNum >= $0)
> Total runtime: 436.421 ms
> (16 rows)

There's something odd about that plan. It's doing both a seq scan and a
bitmap scan on "items", but I can't see stats table being mentioned
anywhere. Looking at the row count, I believe that seq scan is actually
on the stats table, not items like it says above. Is that really a
verbatim copy of the output you got?

Which version of Postgres is this?

You could try rewriting the query like this:

SELECT keyId, sortNum,
   (SELECT count(*) FROM stats s WHERE s.keyId = i.keyId) AS stats_cnt
FROM items i
WHERE i.sortNum > 123
ORDER BY sortNum
LIMIT 50

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Two fast queries get slow when combined

From
Tom Lane
Date:
cluster <skrald@amossen.dk> writes:
> Queries and output from EXPLAIN ANALYZE can be seen here with some
> syntax highlighting:
>     http://rafb.net/p/BJIW4p69.html

You are lying to us about how those queries were posed to Postgres
(and no I don't feel a need to explain how I know).  In future please
present the full truth about what you are doing, not a simplification
that you think is sufficient.

But I think the short answer to your question is that query 1 is fast
because it need only select the first 50 rows in some ordering, and
query 2 is fast because it need only select the first 50 rows in
some ordering, but they are not the same ordering so the join query
doesn't get to exploit that shortcut.

            regards, tom lane

Re: Two fast queries get slow when combined

From
cluster
Date:
> There's something odd about that plan. It's doing both a seq scan and a
> bitmap scan on "items", but I can't see stats table being mentioned
> anywhere.

Huh? Aaah, sorry. I made a major search/replace-refactoring (that
obviously went wrong) on all open files in the editor before posting to
this newsgroup, and one of these files was my preparation of the queries
and planner output. Damn.
Sorry for wasting your time! :-(

However, your suggestion worked perfectly. Thanks a lot!

Re: Two fast queries get slow when combined

From
cluster
Date:
> You are lying to us about how those queries were posed to Postgres
> (and no I don't feel a need to explain how I know).

Sorry. The "lying" was not intended as explained in my reply to Heikku.

Thanks for the tips anyways.