Re: Slow "not in array" operation - Mailing list pgsql-performance

From Marco Colli
Subject Re: Slow "not in array" operation
Date
Msg-id CAFvCgN6bQhf03jdakdk1eVMad8r=6nFV=cmy2v_sp0f8_kZ-bg@mail.gmail.com
Whole thread Raw
In response to Re: Slow "not in array" operation  (Michael Lewis <mlewis@entrata.com>)
Responses Re: Slow "not in array" operation  (Michael Lewis <mlewis@entrata.com>)
List pgsql-performance
To be honest, I have simplified the question above. In order to show you the plan, I must show you the actual query, which is this:

=== QUERY ===

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @> ARRAY['en']::varchar[]);


=== QUERY PLAN ===


                                                                      QUERY PLAN                                                                      

------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2152593.04..2152593.05 rows=1 width=8) (actual time=70555.561..70555.561 rows=1 loops=1)

   ->  Gather  (cost=2152592.31..2152593.02 rows=7 width=8) (actual time=70540.641..70702.365 rows=8 loops=1)

         Workers Planned: 7

         Workers Launched: 7

         ->  Partial Aggregate  (cost=2151592.31..2151592.32 rows=1 width=8) (actual time=70537.376..70537.377 rows=1 loops=8)

               ->  Parallel Seq Scan on subscriptions  (cost=0.00..2149490.49 rows=840731 width=0) (actual time=0.742..70479.359 rows=611828 loops=8)

                     Filter: ((trashed_at IS NULL) AND (NOT (tags @> '{en}'::character varying[])) AND (project_id = 123))

                     Rows Removed by Filter: 4572769

 Planning Time: 1.304 ms

 Execution Time: 70702.463 ms

(10 rows)


=== INDEXES ===


Indexes:

    "subscriptions_pkey" PRIMARY KEY, btree (id)

    "index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)

    "index_subscriptions_on_project_id_and_tags" gin (project_id, tags) WHERE trashed_at IS NULL

    "index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)


=== NOTES ===

Running the query without the last filter on tags takes only 500ms. 
Unfortunately I cannot make strict assumptions on data or tags: for example I also have to count subscriptions in a project that don't have tag A and don't have tag B, etc. This means that I cannot simply calculate the total and then make a subtraction.

On Tue, Nov 12, 2019 at 7:40 PM Michael Lewis <mlewis@entrata.com> wrote:
What's the plan for the slow one? What's the time to just count all rows?

pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: Slow "not in array" operation
Next
From: Michael Lewis
Date:
Subject: Re: Slow "not in array" operation