Re: 600 million rows of data. Bad hardware or need partitioning? - Mailing list pgsql-performance

From Arya F
Subject Re: 600 million rows of data. Bad hardware or need partitioning?
Date
Msg-id CAFoK1azUSgoVLQgPd-GP9UJkPn=91Gc=_UwTOV_SzyuWA01F5w@mail.gmail.com
Whole thread Raw
In response to Re: 600 million rows of data. Bad hardware or need partitioning?  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-performance
On Mon, May 4, 2020 at 12:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
> How long does it take if you first do:
>
> SET enable_nestloop TO off;

I tried this, but it takes much longer

Time: 318620.319 ms (05:18.620)

Below is the EXPLAIN ANALYZE


      QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on test_table  (cost=639.96..4581378.80 rows=10000 width=160)
(actual time=290593.159..290593.159 rows=0 loops=1)
   ->  Hash Join  (cost=639.96..4581378.80 rows=10000 width=160)
(actual time=422.313..194430.318 rows=10000 loops=1)
         Hash Cond: (test_table.id = subquery.id)
         ->  Seq Scan on test_table  (cost=0.00..4200967.98
rows=101238898 width=92) (actual time=296.970..177731.611
rows=101189271 loops=1)
         ->  Hash  (cost=514.96..514.96 rows=10000 width=56) (actual
time=125.312..125.312 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 959kB
               ->  Subquery Scan on subquery  (cost=0.00..514.96
rows=10000 width=56) (actual time=0.030..123.031 rows=10000 loops=1)
                     ->  Limit  (cost=0.00..414.96 rows=10000
width=16) (actual time=0.024..121.014 rows=10000 loops=1)
                           ->  Seq Scan on test_table test_table_1
(cost=0.00..4200967.98 rows=101238898 width=16) (actual
time=0.021..120.106 rows=10000 loops=1)
 Planning Time: 0.304 ms
 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.178 ms, Inlining 155.980 ms, Optimization
100.611 ms, Emission 39.481 ms, Total 298.250 ms
 Execution Time: 290595.448 ms
(15 rows)


> If you find it's faster then you most likely have random_page_cost set
> unrealistically low. In fact, I'd say it's very unlikely that a nested
> loop join will be a win in this case when random pages must be read
> from a mechanical disk, but by all means, try disabling it with the
> above command and see for yourself.

It's much slower with SET enable_nestloop TO off. Any other suggestions?



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Please help! Query jumps from 1s -> 4m
Next
From: Arya F
Date:
Subject: Re: 600 million rows of data. Bad hardware or need partitioning?