Re: Slow "not in array" operation - Mailing list pgsql-performance

From Marco Colli
Subject Re: Slow "not in array" operation
Date
Msg-id CAFvCgN6d0cTQuB_Ye4HHyh61ngWR3UR0O-kWG7CnEbrU38X8-w@mail.gmail.com
Whole thread Raw
In response to Re: Slow "not in array" operation  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Slow "not in array" operation
Re: Slow "not in array" operation
List pgsql-performance
Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the tag 'en' are 4.8M
 
What was the plan for the one that took 500ms?

This is the query / plan without the filter on tags:

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;

                 QUERY PLAN                                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=291342.67..291342.68 rows=1 width=8) (actual time=354.556..354.556 rows=1 loops=1)
   ->  Gather  (cost=291342.05..291342.66 rows=6 width=8) (actual time=354.495..374.305 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial Aggregate  (cost=290342.05..290342.06 rows=1 width=8) (actual time=349.799..349.799 rows=1 loops=7)
               ->  Parallel Index Only Scan using index_subscriptions_on_project_id_and_uid on subscriptions  (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018 rows=1030593 loops=7)
                     Index Cond: (project_id = 123)
                     Heap Fetches: 280849
 Planning Time: 0.753 ms
 Execution Time: 374.483 ms
(10 rows) 

Then if I simply add the exclusion of a single tag, it goes from a few milliseconds to 70s...



On Wed, Nov 13, 2019 at 12:33 AM Jeff Janes <jeff.janes@gmail.com> wrote:

3) Here's the query plan that I get after disabling the seq scan:

                                                                                        QUERY PLAN                                                                                         

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2183938.89..2183938.90 rows=1 width=8) (actual time=94972.253..94972.254 rows=1 loops=1)

   ->  Gather  (cost=2183938.16..2183938.87 rows=7 width=8) (actual time=94952.895..95132.626 rows=8 loops=1)

         Workers Planned: 7

         Workers Launched: 7

         ->  Partial Aggregate  (cost=2182938.16..2182938.17 rows=1 width=8) (actual time=94950.958..94950.958 rows=1 loops=8)

               ->  Parallel Bitmap Heap Scan on subscriptions  (cost=50294.50..2180801.47 rows=854677 width=0) (actual time=1831.342..94895.208 rows=611828 loops=8)

                     Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))

                     Rows Removed by Index Recheck: 2217924

                     Filter: (NOT (tags @> '{en}'::character varying[]))

                     Rows Removed by Filter: 288545

                     Heap Blocks: exact=120301 lossy=134269

                     ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_tags  (cost=0.00..48798.81 rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)

                           Index Cond: (project_id = 123)

 Planning Time: 1.273 ms

 Execution Time: 95132.766 ms

(15 rows)


What was the plan for the one that took 500ms?  I don't see how it is possible that this one is 180 times slower than that one.  Maybe a hot cache versus cold cache?  Also, it seems weird to me that "trashed_at IS NULL" shows up in the recheck but not in the original  Index Cond.  Increasing work_mem can also help, but since the  Bitmap Index Scan itself took half the time there is only so much it can do.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Slow "not in array" operation
Next
From: Morris de Oryx
Date:
Subject: Re: Slow "not in array" operation