[PERFORM] Query performance changes significantly depending on limit value - Mailing list pgsql-performance

From Rowan Seymour
Subject [PERFORM] Query performance changes significantly depending on limit value
Msg-id CAEhK25pV_Q+XgCZk+gfn07ZgjBWxXNdmGWY3gwh+cBZy8MptAg@mail.gmail.com
List pgsql-performance
Hi guys

I'm a bit stuck on a query that performs fantastically up to a certain limit value, after which the planner goes off in a completely different direction and performance gets dramatically worse. I'm using PostgreSQL 9.3.

You can see all the relevant schemas at http://pastebin.com/PNEqw2id. In the test database there are 1,000,000 records in contacts_contact, and about half of those match the subquery on values_value.

The query in question is:

SELECT "contacts_contact".* FROM "contacts_contact"
INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
       AND "contacts_contact"."id" IN (
         SELECT U0."contact_id" FROM "values_value" U0 WHERE (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
       )
) ORDER BY "contacts_contact"."id" DESC LIMIT 222;
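(The plans below were captured by prefixing the query with EXPLAIN; since they include per-node buffer counts, the BUFFERS option must be on as well, i.e.:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "contacts_contact".* FROM "contacts_contact"
-- ... rest of the query above, with the LIMIT varied ...
```
)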

With that limit of 222, it performs like:

Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358 rows=222 loops=1)
  Buffers: shared hit=708 read=63
  ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual time=0.120..3.304 rows=222 loops=1)
        Buffers: shared hit=708 read=63
        ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92) (actual time=0.103..1.968 rows=227 loops=1)
              Merge Cond: (contacts_contact.id = u0.contact_id)
              Buffers: shared hit=24 read=63
              ->  Index Scan Backward using contacts_contact_pkey on contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual time=0.008..0.502 rows=1117 loops=1)
                    Buffers: shared hit=22 read=2
              ->  Index Scan using values_value_field_string_value_contact on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual time=0.086..0.857 rows=227 loops=1)
                    Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                    Buffers: shared hit=2 read=61
        ->  Index Only Scan using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=227)
              Index Cond: ((contactgroup_id = 1) AND (contact_id = contacts_contact.id))
              Heap Fetches: 0
              Buffers: shared hit=684
Total runtime: 3.488 ms


But if I increase the limit to 223 then it performs like:

Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual time=2685.830..2686.534 rows=223 loops=1)
  Buffers: shared hit=767648 read=86530
  ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual time=2685.828..2686.461 rows=223 loops=1)
        Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
        Buffers: shared hit=767648 read=86530
        ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual time=2685.742..2685.804 rows=228 loops=1)
              Sort Key: contacts_contact.id
              Sort Method: quicksort  Memory: 34327kB
              Buffers: shared hit=767648 read=86524
              ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92) (actual time=646.573..2417.291 rows=200412 loops=1)
                    Buffers: shared hit=767648 read=86524
                    ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=646.532..766.200 rows=200412 loops=1)
                          Buffers: shared read=51417
                          ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709 rows=200412 loops=1)
                                Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                Buffers: shared read=51417
                                ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=70.647..70.647 rows=200412 loops=1)
                                      Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                      Buffers: shared read=770
                    ->  Index Scan using contacts_contact_pkey on contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual time=0.007..0.007 rows=1 loops=200412)
                          Index Cond: (id = u0.contact_id)
                          Buffers: shared hit=767648 read=35107
        ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.073..0.273 rows=550 loops=1)
              Index Cond: (contactgroup_id = 1)
              Heap Fetches: 0
              Buffers: shared read=6
Total runtime: 2695.301 ms


I've tried running ANALYZE, but that actually reduced the limit at which things get worse. Any insight into the reasoning of the query planner would be much appreciated.
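In case it helps, this is the kind of thing I've been trying: re-analyzing the table, and (a sketch of a further experiment I haven't fully explored) raising the per-column statistics target so the planner has a better estimate for the string_value predicate before it picks a plan:

```sql
-- What I ran before re-testing the query:
ANALYZE values_value;

-- Possible next step (sketch): raise the statistics target for the
-- filtered column from the default of 100, then re-analyze, in the hope
-- that the planner's row estimate for the subquery improves.
ALTER TABLE values_value ALTER COLUMN string_value SET STATISTICS 1000;
ANALYZE values_value;
```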

Thanks

--
Rowan Seymour | +260 964153686 | @rowanseymour
