Thread: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kouhei Kaigai
Date:
Hello,

I noticed that ExecChooseHashTableSize() in nodeHash.c fails on
Assert(nbuckets > 0) when a crazy number of rows is expected.

BACKTRACE:

#0  0x0000003f79432625 in raise () from /lib64/libc.so.6
#1  0x0000003f79433e05 in abort () from /lib64/libc.so.6
#2  0x000000000092600a in ExceptionalCondition (conditionName=0xac1ea0 "!(nbuckets > 0)",
    errorType=0xac1d88 "FailedAssertion", fileName=0xac1d40 "nodeHash.c",
    lineNumber=545) at assert.c:54
#3  0x00000000006851ff in ExecChooseHashTableSize (ntuples=60521928028, tupwidth=8,
    useskew=1 '\001', numbuckets=0x7fff146bff04, numbatches=0x7fff146bff00,
    num_skew_mcvs=0x7fff146bfefc) at nodeHash.c:545
#4  0x0000000000701735 in initial_cost_hashjoin (root=0x253a318, workspace=0x7fff146bffc0,
    jointype=JOIN_SEMI, hashclauses=0x257e4f0, outer_path=0x2569a40,
    inner_path=0x2569908, sjinfo=0x2566f40, semifactors=0x7fff146c0168)
    at costsize.c:2592
#5  0x000000000070e02a in try_hashjoin_path (root=0x253a318, joinrel=0x257d940,
    outer_path=0x2569a40, inner_path=0x2569908, hashclauses=0x257e4f0,
    jointype=JOIN_SEMI, extra=0x7fff146c0150) at joinpath.c:543

See the following EXPLAIN output, taken on a build without --enable-cassert.
The planner expects 60.5B rows from the self-join of a relation with 72M rows.
(This estimate is probably far too high.)

[kaigai@ayu ~]$ (echo EXPLAIN; cat ~/tpcds/query95.sql) | psql tpcds100
                                    QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=9168667273.07..9168667273.08 rows=1 width=20)
   CTE ws_wh
     ->  Custom Scan (GpuJoin)  (cost=3342534.49..654642911.88 rows=60521928028 width=24)
           Bulkload: On (density: 100.00%)
           Depth 1: Logic: GpuHashJoin, HashKeys: (ws_order_number),
                    JoinQual: ((ws_warehouse_sk <> ws_warehouse_sk) AND
                               (ws_order_number = ws_order_number)),
                    nrows (ratio: 84056.77%)
           ->  Custom Scan (BulkScan) on web_sales ws1_1  (cost=0.00..3290612.48 rows=72001248 width=16)
           ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16)
   ->  Sort  (cost=8514024361.19..8514024361.20 rows=1 width=20)
         Sort Key: (count(DISTINCT ws1.ws_order_number))
   :

This crash was triggered by Assert(nbuckets > 0); nbuckets is calculated
as follows:

/*
 * If there's not enough space to store the projected number of tuples and
 * the required bucket headers, we will need multiple batches.
 */
if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
{
    /* We'll need multiple batches */
    long        lbuckets;
    double      dbatch;
    int         minbatch;
    long        bucket_size;

    /*
     * Estimate the number of buckets we'll want to have when work_mem is
     * entirely full.  Each bucket will contain a bucket pointer plus
     * NTUP_PER_BUCKET tuples, whose projected size already includes
     * overhead for the hash code, pointer to the next tuple, etc.
     */
    bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple));
    lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
    lbuckets = Min(lbuckets, max_pointers);
    nbuckets = (int) lbuckets;
    bucket_bytes = nbuckets * sizeof(HashJoinTuple);
    :
    :
}
Assert(nbuckets > 0);
Assert(nbatch > 0);

In my case, hash_table_bytes was 101017630802 and bucket_size was 48, so
my_log2(hash_table_bytes / bucket_size) = 31. lbuckets then ends up
negative, because both the literal "1" and my_log2() are int32, so the
left shift overflows. Min(lbuckets, max_pointers) therefore chooses
0x80000000, which is assigned to nbuckets and triggers the Assert().

The attached patch fixes the problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
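To make the overflow concrete, here is a minimal, self-contained C sketch of
the arithmetic from the report, assuming an LP64 platform (my_log2() is
reimplemented here purely for illustration; the 1 << 31 shift is formally
undefined behavior, but on common two's-complement compilers it yields
INT_MIN, matching the observed 0x80000000):

#include <stdio.h>

/* illustrative stand-in for my_log2(): smallest k such that 2^k >= num */
static int
my_log2(long num)
{
    int     i;
    long    limit;

    for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
        ;
    return i;
}

int
main(void)
{
    long    hash_table_bytes = 101017630802L;   /* values from the report */
    long    bucket_size = 48;
    long    lbuckets;

    /* 101017630802 / 48 = 2104533975, and my_log2(2104533975) = 31 */
    printf("my_log2 = %d\n", my_log2(hash_table_bytes / bucket_size));

    lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);    /* buggy */
    printf("%ld\n", lbuckets);      /* -2147483648: the int shift wrapped */

    lbuckets = 1L << my_log2(hash_table_bytes / bucket_size);   /* fixed */
    printf("%ld\n", lbuckets);      /* 2147483648: 64-bit shift is fine */
    return 0;
}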
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kevin Grittner
Date:
Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

>     long        lbuckets;
>
>     lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
>
> Assert(nbuckets > 0);
>
> In my case, hash_table_bytes was 101017630802 and bucket_size was 48, so
> my_log2(hash_table_bytes / bucket_size) = 31. lbuckets then ends up
> negative, because both the literal "1" and my_log2() are int32, so the
> left shift overflows. Min(lbuckets, max_pointers) therefore chooses
> 0x80000000, which is assigned to nbuckets and triggers the Assert().
>
> The attached patch fixes the problem.

So you changed the literal of 1 to 1U, but doesn't that just double the
threshold for failure? Wouldn't 1L (to match the definition of lbuckets)
be better?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
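A quick sketch of the point Kevin is making, again assuming LP64 (32-bit
int and unsigned int, 64-bit long): 1U buys exactly one more bit before the
shift itself becomes undefined, while 1L matches the width of lbuckets.

#include <stdio.h>

int
main(void)
{
    long    a = 1 << 31;    /* int shift: overflows (undefined behavior) */
    long    b = 1U << 31;   /* unsigned: OK here, but 1U << 32 would not be */
    long    c = 1L << 31;   /* long: well-defined for shift counts up to 62 */

    /* typically prints: -2147483648 2147483648 2147483648 */
    printf("%ld %ld %ld\n", a, b, c);
    return 0;
}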
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kouhei Kaigai
Date:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
>> lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
>
> So you changed the literal of 1 to 1U, but doesn't that just double the
> threshold for failure? Wouldn't 1L (to match the definition of lbuckets)
> be better?
>
Ah, yes, that was a careless mistake on my part.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
David Rowley
Date:
On 19 August 2015 at 08:54, Kevin Grittner <kgrittn@ymail.com> wrote:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> long lbuckets;
>>
>> lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
>>
>> Assert(nbuckets > 0);
>>
>> In my case, hash_table_bytes was 101017630802 and bucket_size was 48, so
>> my_log2(hash_table_bytes / bucket_size) = 31. lbuckets then ends up
>> negative, because both the literal "1" and my_log2() are int32, so the
>> left shift overflows. Min(lbuckets, max_pointers) therefore chooses
>> 0x80000000, which is assigned to nbuckets and triggers the Assert().
>>
>> The attached patch fixes the problem.
>
> So you changed the literal of 1 to 1U, but doesn't that just double the
> threshold for failure? Wouldn't 1L (to match the definition of lbuckets)
> be better?
I agree, but I can only imagine this is happening because the maximum
work_mem setting has been modified in the code you're running.
hash_table_bytes is set based on work_mem:
hash_table_bytes = work_mem * 1024L;
The size of your hash table is 101017630802 bytes, which is:
david=# select pg_size_pretty(101017630802);
pg_size_pretty
----------------
94 GB
(1 row)
david=# set work_mem = '94GB';
ERROR: 98566144 is outside the valid range for parameter "work_mem" (64 .. 2097151)
So I think the only way the following could cause an error is if bucket_size were 1, which it can't be.
lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
I think one day soon we'll need to allow larger work_mem sizes, but there's a lot more to do than this one change.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
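As a cross-check of where KaiGai's 101017630802 comes from, here is a hedged
back-of-the-envelope sketch; it assumes the skew-table carve-out in
ExecChooseHashTableSize(), where SKEW_WORK_MEM_PERCENT is 2, reserving 2% of
the hash table memory for the skew hash table when useskew is set:

#include <stdio.h>

#define SKEW_WORK_MEM_PERCENT   2   /* as in nodeHash.c */

int
main(void)
{
    long    work_mem = 96L * 1024 * 1024;           /* '96GB', in kB */
    long    hash_table_bytes = work_mem * 1024L;    /* 103079215104 */
    long    skew_table_bytes;

    /* 2% of the hash table memory goes to the skew hash table */
    skew_table_bytes = hash_table_bytes * SKEW_WORK_MEM_PERCENT / 100;
    hash_table_bytes -= skew_table_bytes;

    /* prints 101017630802, exactly the value from the report, which
     * suggests work_mem really was set to 96GB */
    printf("%ld\n", hash_table_bytes);
    return 0;
}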
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kouhei Kaigai
Date:
> -----Original Message-----
> From: David Rowley [mailto:david.rowley@2ndquadrant.com]
> Sent: Wednesday, August 19, 2015 9:00 AM
> To: Kevin Grittner
> Cc: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Bug? ExecChooseHashTableSize() got assertion
> failed with crazy number of rows
>
> I agree, but I can only imagine this is happening because the maximum
> work_mem setting has been modified in the code you're running.
>
> hash_table_bytes is set based on work_mem:
>
> hash_table_bytes = work_mem * 1024L;
>
> The size of your hash table is 101017630802 bytes, which is:
>
> david=# select pg_size_pretty(101017630802);
>  pg_size_pretty
> ----------------
>  94 GB
> (1 row)
>
> david=# set work_mem = '94GB';
> ERROR:  98566144 is outside the valid range for parameter "work_mem"
> (64 .. 2097151)
>
Hmm, then why could I set work_mem = '96GB' without error? It was set in
postgresql.conf.

postgres=# SHOW work_mem;
 work_mem
----------
 96GB
(1 row)

> So I think the only way the following could cause an error is if
> bucket_size were 1, which it can't be.
>
> lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
>
> I think one day soon we'll need to allow larger work_mem sizes, but
> there's a lot more to do than this one change.
>
I had overlooked this limitation, but then why can I bypass the GUC range
check?

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
David Rowley
Date:
On 19 August 2015 at 12:23, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> The size of your hash table is 101017630802 bytes, which is:
>>
>> david=# select pg_size_pretty(101017630802);
>>  pg_size_pretty
>> ----------------
>>  94 GB
>> (1 row)
>>
>> david=# set work_mem = '94GB';
>> ERROR:  98566144 is outside the valid range for parameter "work_mem"
>> (64 .. 2097151)
>
> Hmm, then why could I set work_mem = '96GB' without error? It was set in
> postgresql.conf.
>
> postgres=# SHOW work_mem;
>  work_mem
> ----------
>  96GB
> (1 row)
>
>> So I think the only way the following could cause an error is if
>> bucket_size were 1, which it can't be.
>>
>> lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
>>
>> I think one day soon we'll need to allow larger work_mem sizes, but
>> there's a lot more to do than this one change.
>
> I had overlooked this limitation, but then why can I bypass the GUC range
> check?
I'm unable to get the server to start if I set work_mem that big. I also
tried starting the server with 1GB of work_mem and then doing a pg_ctl
reload. In each case I get the same error message that I would have gotten
from: set work_mem = '96GB';
Which version are you running?
Are you sure there are no changes to work_mem's range in guc.c?
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> david=# set work_mem = '94GB';
> ERROR:  98566144 is outside the valid range for parameter "work_mem"
> (64 .. 2097151)

Apparently you're testing on a 32-bit server. 64-bit servers allow
work_mem to go up to INT_MAX kilobytes.

			regards, tom lane
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
David Rowley
Date:
On 19 August 2015 at 12:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> david=# set work_mem = '94GB';
> ERROR: 98566144 is outside the valid range for parameter "work_mem" (64 ..
> 2097151)
Apparently you're testing on a 32-bit server. 64-bit servers allow
work_mem to go up to INT_MAX kilobytes.
Hmm, no, but it does appear that sizeof(long) is 4 bytes for me, despite a 64-bit OS.
I guess this accounts for my misunderstanding that work_mem was limited to 2GB even on 64-bit machines.
From guc.h:
#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
#define MAX_KILOBYTES INT_MAX
#else
#define MAX_KILOBYTES (INT_MAX / 1024)
#endif
Apologies for the noise.
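A quick numeric check, for the record: the upper bound 2097151 in David's
error message is exactly INT_MAX / 1024, so his build took the second branch
of that #if (LLP64 platforms such as 64-bit Windows have a 4-byte long even
on a 64-bit OS).

#include <stdio.h>
#include <limits.h>

int
main(void)
{
    /* LP64 branch: work_mem may go up to INT_MAX kB, i.e. ~2TB */
    printf("INT_MAX        = %d kB\n", INT_MAX);        /* 2147483647 */

    /* 32-bit / LLP64 branch: the "(64 .. 2097151)" range seen above */
    printf("INT_MAX / 1024 = %d kB\n", INT_MAX / 1024); /* 2097151 */
    return 0;
}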
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kouhei Kaigai
Date:
I reported a similar issue before:

* Re: DBT-3 with SF=20 got failed
  http://www.postgresql.org/message-id/557A19D1.9050107@2ndquadrant.com

I didn't get a server crash at that time; instead, palloc() complained
about a request size >= 1GB. So we may need a broader overhaul of HashJoin
to support large data sizes, not just nbuckets around 0x80000000.

We also need to pay attention to the reliability of the planner's scale
estimation. Even though the plan below says the join generates 60521928028
rows, it actually generates 776157676 rows (1.28%).

tpcds100=# EXPLAIN ANALYZE
select ws1.ws_order_number,
       ws1.ws_warehouse_sk wh1,
       ws2.ws_warehouse_sk wh2
  from web_sales ws1, web_sales ws2
 where ws1.ws_order_number = ws2.ws_order_number
   and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028 width=24)
             (actual time=138347.979..491889.343 rows=776157676 loops=1)
   Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
   Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
   Rows Removed by Join Filter: 127853313
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16)
             (actual time=73252.300..79017.420 rows=72001237 loops=1)
         Sort Key: ws1.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48 rows=72001248 width=16)
                                        (actual time=0.023..39951.201 rows=72001237 loops=1)
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16)
             (actual time=65095.655..128885.811 rows=904010978 loops=1)
         Sort Key: ws2.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16)
                                        (actual time=0.014..31046.888 rows=72001237 loops=1)
 Planning time: 0.232 ms
 Execution time: 530176.521 ms
(14 rows)

So even if we allow nodeHash.c to allocate hash buckets larger than 1GB,
the initial size should be determined carefully. 1GB is probably a good
starting point, even if the table is expanded later.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: David Rowley
> Sent: Wednesday, August 19, 2015 10:07 AM
>
> Hmm, no, but it does appear that sizeof(long) is 4 bytes for me, despite
> a 64-bit OS.
>
> I guess this accounts for my misunderstanding that work_mem was limited
> to 2GB even on 64-bit machines.
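One possible shape for the guard KaiGai is describing, as a hypothetical
sketch only (clamp_bucket_count() is an invented helper, not committed
code; MaxAllocSize mirrors the 0x3fffffff limit in memutils.h that palloc()
enforces):

#include <stddef.h>

#define MaxAllocSize    ((size_t) 0x3fffffff)   /* ~1GB, as in memutils.h */

typedef void *HashJoinTuple;    /* stand-in for the real pointer type */

/*
 * Clamp a candidate bucket count so that the bucket-pointer array itself
 * stays below the palloc() limit; the hash table could still be expanded
 * later if the row estimate turns out to be too low.
 */
static long
clamp_bucket_count(long lbuckets)
{
    long    max_alloc_pointers = MaxAllocSize / sizeof(HashJoinTuple);

    return (lbuckets > max_alloc_pointers) ? max_alloc_pointers : lbuckets;
}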
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kevin Grittner
Date:
Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> we may need a broader overhaul of HashJoin to support large data
> sizes, not just nbuckets around 0x80000000.

Perhaps, but this is a clear bug, introduced in the 9.5 code, with an
obvious fix; so I've pushed the change from 1 to 1L on that left shift.

There was clearly an attempt in the surrounding code to deal with size
overruns by using a long, applying a Min(), and casting to int; this one
statement just missed a trick. If we need to further constrain sizes to
keep within allocation limits, that seems like an entirely separate patch.

Thanks for finding and analyzing this and providing a patch!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> we may need a broader overhaul of HashJoin to support large data
>> sizes, not just nbuckets around 0x80000000.

> Perhaps, but this is a clear bug, introduced in the 9.5 code, with an
> obvious fix; so I've pushed the change from 1 to 1L on that left shift.

I don't think it's anywhere near as clear as you think. The preceding
lines should have limited nbuckets to be no more than INT_MAX/2, so how
could an overflow occur there? (The result of 1 << my_log2() should be at
most 0x40000000, AFAICS.) If overflow is possible, how will s/1/1L/ make
it better on machines where int and long are the same size? And on
machines where long is wider than int, you've still got a bug if the
result of the left shift somehow overflowed an int, because it's going to
be assigned to nbuckets, which is an int.

So I think the "1" coding was just fine in context. If there is an
overflow in ExecChooseHashTableSize, it's somewhere else.

			regards, tom lane
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I don't think it's anywhere near as clear as you think. The preceding
> lines should have limited nbuckets to be no more than INT_MAX/2, so how
> could an overflow occur there?

There is a 1 shifted for nbuckets that I didn't touch, because it could
not overflow. The limits on nbuckets are not applied at the point of the
bug.

> (The result of 1 << my_log2() should be at most 0x40000000, AFAICS.)
> If overflow is possible, how will s/1/1L/ make it better on machines
> where int and long are the same size?

In such cases there is not a bug -- at least not with my_log2() returning
less than 63. In those cases the change will make no difference.

> And on machines where long is wider than int, you've still got a bug
> if the result of the left shift somehow overflowed an int, because
> it's going to be assigned to nbuckets, which is an int.

You seem to have missed reading the lines in between:

| lbuckets = 1L << my_log2(hash_table_bytes / bucket_size);
| lbuckets = Min(lbuckets, max_pointers);
| nbuckets = (int) lbuckets;

There is a Min() and a cast to sort that out.

> So I think the "1" coding was just fine in context.

I disagree.

> If there is an overflow in ExecChooseHashTableSize, it's
> somewhere else.

I do believe that there are other places where we need to place limits to
avoid attempts to allocate more than 1GB, but I think patches to address
that should be separate from fixing this particular bug.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Tom Lane
Date:
I wrote:
> I don't think it's anywhere near as clear as you think.

Ah, scratch that --- I was looking at the wrong my_log2() call.
-ENOCAFFEINE.

I'm still doubtful that this is the only overflow risk in that new
ExecChooseHashTableSize code, though. For instance, the only reason the
line immediately above this one isn't broken is that NTUP_PER_BUCKET is
1 ... but surely we shouldn't be coding on that assumption.

			regards, tom lane
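One way to stop leaning on NTUP_PER_BUCKET == 1, sketched here under
invented names and not as a proposed patch, is to do the sizing arithmetic
in double and clamp before narrowing to integer types:

#include <stddef.h>

/*
 * Hypothetical overflow-safe bucket sizing: all intermediate math is done
 * in double and clamped before the narrowing cast, so no integer
 * expression can wrap regardless of NTUP_PER_BUCKET. Assumes max_pointers
 * has already been capped to fit in an int, as in the surrounding code.
 */
static int
choose_nbuckets(double hash_table_bytes, int tupsize,
                int ntup_per_bucket, long max_pointers)
{
    double  bucket_size = (double) tupsize * ntup_per_bucket
                          + sizeof(void *);
    double  dbuckets = hash_table_bytes / bucket_size;

    if (dbuckets > (double) max_pointers)
        dbuckets = (double) max_pointers;
    if (dbuckets < 1.0)
        dbuckets = 1.0;
    return (int) dbuckets;
}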
Re: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm still doubtful that this is the only overflow risk in that new
> ExecChooseHashTableSize code, though.

KaiGai already pointed that out on this thread, and I completely agree;
but I figured that I might as well fix the clear bug with an obvious fix
that was causing an assertion failure during 9.5 testing, pending analysis
and fixes for the other hazards. Getting past the assertion failure might
help identify some other bugs in the area.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company