Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers

From:    Kohei KaiGai
Subject: Re: DBT-3 with SF=20 got failed
Date:
Msg-id:  CADyhKSWnRTvVzpgFXDrQ5sgvY4E+kWor7Qfj4qCt2VrFDEs+0A@mail.gmail.com
In response to: Re: DBT-3 with SF=20 got failed  (Simon Riggs <simon@2ndQuadrant.com>)
Responses:      Re: DBT-3 with SF=20 got failed  (Simon Riggs <simon@2ndQuadrant.com>)
                Re: DBT-3 with SF=20 got failed  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List:           pgsql-hackers
2015-08-19 20:12 GMT+09:00 Simon Riggs <simon@2ndquadrant.com>:
> On 12 June 2015 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>>
>> I see two ways to fix this:
>>
>> (1) enforce the 1GB limit (probably better for back-patching, if that's
>>     necessary)
>>
>> (2) make it work with hash tables over 1GB
>>
>> I'm in favor of (2) if there's a good way to do that. It seems a bit
>> stupid not to be able to use fast hash table because there's some artificial
>> limit. Are there any fundamental reasons not to use the
>> MemoryContextAllocHuge fix, proposed by KaiGai-san?
>
>
> If there are no objections, I will apply the patch for 2) to HEAD and
> backpatch to 9.5.
>
Please don't rush. :-)

It is not difficult to replace palloc() with palloc_huge(); however, it may lead
to another problem once the planner gives us a crazy estimation.
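For illustration, the kind of one-line replacement being discussed would look
roughly like the sketch below; the call site, memory context and variable names
are assumptions based on nodeHash.c, not a verified patch.

    /*
     * Sketch only: allocate the bucket array with MemoryContextAllocHuge(),
     * which is not subject to the 1GB MaxAllocSize cap that a plain
     * palloc() enforces.  Context and field names here are assumed.
     */
    hashtable->buckets = (HashJoinTuple *)
        MemoryContextAllocHuge(hashtable->batchCxt,
                               nbuckets * sizeof(HashJoinTuple));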
Below is my comment from the other thread.

==========
Also, we may need to pay attention to the reliability of scale estimation
by the planner.
Even though the plan below says that the Join generates 60521928028 rows,
it actually generates 776157676 rows (roughly 1.3% of the estimate).


tpcds100=# EXPLAIN ANALYZE
tpcds100-# select ws1.ws_order_number, ws1.ws_warehouse_sk wh1, ws2.ws_warehouse_sk wh2
tpcds100-#   from web_sales ws1, web_sales ws2
tpcds100-#  where ws1.ws_order_number = ws2.ws_order_number
tpcds100-#    and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028 width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
   Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
   Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
   Rows Removed by Join Filter: 127853313
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=73252.300..79017.420 rows=72001237 loops=1)
         Sort Key: ws1.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237 loops=1)
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=65095.655..128885.811 rows=904010978 loops=1)
         Sort Key: ws2.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237 loops=1)
 Planning time: 0.232 ms
 Execution time: 530176.521 ms
(14 rows)


So, even if we allow nodeHash.c to allocate hash buckets larger than 1GB,
its initial size should be determined carefully.
Probably, 1GB is a good starting point, even if the table is expanded later.
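
To put rough numbers on that, here is a standalone back-of-the-envelope
sketch (not PostgreSQL code); the one-bucket-per-estimated-tuple and
8-byte-pointer figures are assumptions.

#include <stdio.h>
#include <stdint.h>

/*
 * Rough arithmetic for the plan above: how large a bucket array sized
 * straight from the 60.5-billion-row estimate would be, versus an
 * initial array clamped to the familiar 1GB limit and grown later.
 */
int
main(void)
{
    const double   est_rows    = 60521928028.0;  /* planner estimate        */
    const double   actual_rows = 776157676.0;    /* rows actually produced  */
    const uint64_t ptr_size    = 8;              /* one pointer per bucket  */
    const uint64_t one_gb      = 1024ULL * 1024 * 1024;

    double   naive_bytes     = est_rows * ptr_size;  /* sized from estimate */
    uint64_t clamped_buckets = one_gb / ptr_size;    /* 1GB starting point  */

    printf("actual / estimated rows   : %.1f%%\n", 100.0 * actual_rows / est_rows);
    printf("bucket array from estimate: %.0f GB\n", naive_bytes / one_gb);
    printf("clamped initial buckets   : %llu (1GB)\n",
           (unsigned long long) clamped_buckets);
    return 0;
}

With these assumptions, an array sized from the estimate would be around
450GB, while the actual row count would have needed only a few GB, which is
why clamping the initial allocation and expanding at run time looks safer
than trusting the estimate.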

-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


