Re: Slow "not in array" operation - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Slow "not in array" operation
Date
Msg-id CAMkU=1xwG6OZd7tzi3+jCmt=QhCX6KTj4027iN7xAT42nA=LmA@mail.gmail.com
Whole thread Raw
In response to Re: Slow "not in array" operation  (Marco Colli <collimarco91@gmail.com>)
Responses Re: Slow "not in array" operation  (Marco Colli <collimarco91@gmail.com>)
List pgsql-performance
On Wed, Nov 13, 2019 at 4:20 AM Marco Colli <collimarco91@gmail.com> wrote:
Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the tag 'en' are 4.8M
 
What was the plan for the one that took 500ms?

This is the query / plan without the filter on tags:

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;

                 QUERY PLAN                                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=291342.67..291342.68 rows=1 width=8) (actual time=354.556..354.556 rows=1 loops=1)
   ->  Gather  (cost=291342.05..291342.66 rows=6 width=8) (actual time=354.495..374.305 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial Aggregate  (cost=290342.05..290342.06 rows=1 width=8) (actual time=349.799..349.799 rows=1 loops=7)
               ->  Parallel Index Only Scan using index_subscriptions_on_project_id_and_uid on subscriptions  (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018 rows=1030593 loops=7)
                     Index Cond: (project_id = 123)
                     Heap Fetches: 280849
 Planning Time: 0.753 ms
 Execution Time: 374.483 ms
(10 rows)

My previous comment about the bitmap index scan taking half the time was a slip of the eye, I was comparing *cost* of the bitmap index scan to the *time* of the overall plan.  But then the question is, why isn't it doing an index-only scan on  "index_subscriptions_on_project_id_and_tags"?  And the answer is that is because it is a GIN index.  Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Rick Otten
Date:
Subject: Re: Slow "not in array" operation
Next
From: Marco Colli
Date:
Subject: Re: Slow "not in array" operation