Re: Slow "not in array" operation - Mailing list pgsql-performance

From Marco Colli
Subject Re: Slow "not in array" operation
Date
Msg-id CAFvCgN4txnTT3Uev-YYOPaExPxX2xBYxq46BCohTUzWfWsmbgQ@mail.gmail.com
Whole thread Raw
In response to Re: Slow "not in array" operation  (Michael Lewis <mlewis@entrata.com>)
Responses Re: Slow "not in array" operation  (Michael Lewis <mlewis@entrata.com>)
Re: Slow "not in array" operation  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Slow "not in array" operation  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated hyperthreads (16 cores) and SSD.
SHOW random_page_cost; => 2

2) What config names should I check exactly? I used some suggestions from the online PGTune, when I first configured the db some months ago:
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16

3) Here's the query plan that I get after disabling the seq scan:

                                                                                        QUERY PLAN                                                                                         

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Finalize Aggregate  (cost=2183938.89..2183938.90 rows=1 width=8) (actual time=94972.253..94972.254 rows=1 loops=1)

   ->  Gather  (cost=2183938.16..2183938.87 rows=7 width=8) (actual time=94952.895..95132.626 rows=8 loops=1)

         Workers Planned: 7

         Workers Launched: 7

         ->  Partial Aggregate  (cost=2182938.16..2182938.17 rows=1 width=8) (actual time=94950.958..94950.958 rows=1 loops=8)

               ->  Parallel Bitmap Heap Scan on subscriptions  (cost=50294.50..2180801.47 rows=854677 width=0) (actual time=1831.342..94895.208 rows=611828 loops=8)

                     Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))

                     Rows Removed by Index Recheck: 2217924

                     Filter: (NOT (tags @> '{en}'::character varying[]))

                     Rows Removed by Filter: 288545

                     Heap Blocks: exact=120301 lossy=134269

                     ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_tags  (cost=0.00..48798.81 rows=6518094 width=0) (actual time=1493.823..1493.823 rows=7203173 loops=1)

                           Index Cond: (project_id = 123)

 Planning Time: 1.273 ms

 Execution Time: 95132.766 ms

(15 rows)



On Tue, Nov 12, 2019 at 8:20 PM Michael Lewis <mlewis@entrata.com> wrote:
It is very interesting to me that the optimizer chose a parallel sequential scan rather than an index scan on either of your indexes that start with project_id that also reference trashed_at.

1) Are you running on SSD type storage? Has random_page_cost been lowered to 1-1.5 or so (close to 1 assumes good cache hits)?
2) It seems you have increased parallel workers. Have you also changed the startup or other cost configs related to how inclined the system is to use sequential scans?
3) If you disable sequential scan, what does the plan look like for this query? (SET ENABLE_SEQSCAN TO OFF;)

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Slow "not in array" operation
Next
From: Michael Lewis
Date:
Subject: Re: Slow "not in array" operation