Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally - Mailing list pgsql-bugs
From | Ashutosh Bapat |
---|---|
Subject | Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally |
Date | |
Msg-id | CAExHW5ty=yQtMdTCRFj0JXnfdns5jME_+-wVpmsTAH8T+4O6aQ@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally (Tender Wang <tndrwang@gmail.com>) |
Responses |
Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally
|
List | pgsql-bugs |
On Wed, Feb 12, 2025 at 6:03 PM Tender Wang <tndrwang@gmail.com> wrote: > > > > Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2025年2月12日周三 20:20写道: >> >> On Wed, Feb 12, 2025 at 5:41 PM Tender Wang <tndrwang@gmail.com> wrote: >> > >> > >> > >> > Tender Wang <tndrwang@gmail.com> 于2025年2月12日周三 19:48写道: >> >> >> >> >> >> >> >> Tender Wang <tndrwang@gmail.com> 于2025年2月12日周三 17:48写道: >> >>> >> >>> >> >>> >> >>> Laurenz Albe <laurenz.albe@cybertec.at> 于2025年2月12日周三 15:38写道: >> >>>> >> >>>> On Wed, 2025-02-12 at 07:01 +0000, PG Bug reporting form wrote: >> >>>> > PostgreSQL version: 17.2 >> >>>> > Operating system: CentOS Linux release 7.9.2009 (Core) >> >>>> > Description: >> >>>> > >> >>>> > After upgrading PG17.2, testing found that when enable_rartitionwise_join is >> >>>> > set to ON, when executing a query, if the main query and subquery have the >> >>>> > same table, the query will report an error. >> >>>> > >> >>>> > The database log shows the following error: >> >>>> > LOG: server process (PID 24796) was terminated by signal 6: Aborted >> >>>> > DETAIL: Failed process was running: select * from test t join orders o on >> >>>> > t.order_id =o.order_id where t.order_id in (select order_id from orders); >> >>>> > LOG: terminating any other active server processes >> >>>> >> >>>> That is a crash that may well indicate a PostgreSQL bug. >> >>>> >> >>>> However, without a way to reproduce the behavior, we won't be able to fix >> >>>> the problem. Try to come up with a self-contained test case. It would >> >>>> also be interesting to know which PostgreSQL extensions are present. >> >>>> >> >>> >> >>> I can reproduce this crash on HEAD. >> >>> >> >>> postgres=# create table test(order_id int, name varchar); >> >>> create table orders(order_id int, name char(10)) partition by hash (order_id); >> >>> CREATE TABLE >> >>> CREATE TABLE >> >>> postgres=# set enable_partitionwise_join = on; >> >>> SET >> >>> postgres=# create table orders_p1 partition of orders for values with ( modulus 32, remainder 0); >> >>> CREATE TABLE >> >>> postgres=# explain select * from test t join orders o on >> >>> t.order_id =o.order_id where t.order_id in (select order_id from orders); >> >>> server closed the connection unexpectedly >> >>> This probably means the server terminated abnormally >> >>> before or while processing the request. >> >>> The connection to the server was lost. Attempting reset: Succeeded. >> >>> >> >> >> >> I debug this crash, I found this crash was related with below commit: >> >> commit 5278d0a2e870c61f9374a7796b90e6f9f6a73638 >> >> Author: Amit Langote <amitlan@postgresql.org> >> >> Date: Mon Mar 25 12:02:40 2024 +0900 >> >> >> >> Reduce memory used by partitionwise joins >> >> >> >> >> >> the joinlist in this query is 1,2,5 >> >> 1 means table t >> >> 2 means table orders >> >> 5 also means table orders, which subquery is pulled up. >> >> When we create join rel for (2,5), then wen enter try_partitionwise_join(). >> >> In this func, we call build_child_join_sjinfo() to build SpecialJoinInfo for child. >> >> In build_child_join_sjinfo(), we use memcpy(), I think we should use copyObject(). >> >> Otherwise, the parent_sjinfo will be freed, for example, after call free_child_join_sjinfo(), >> >> the parent_sjinfo min_lefthand looks like as below: >> >> >> >> (gdb) p *parent_sjinfo->min_lefthand >> >> $7 = {type = 3951489872, nwords = 21994, words = 0x55eaeb8696b0} >> >> >> >> The above bitmap is invalid, so trigger the Assert(bms_is_valid_set(a)); >> >> >> >> Attached patch is my fix. Any thoughts? >> >> Thanks for the report and the fix. Ideally, following line should >> allocate separate memory for the child's bitmapset. >> sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand, >> left_nappinfos, left_appinfos); >> >> Unless no relid in sjinfo->min_lefthand was translated according to >> the given appinfos, adjust_child_relids() should create a new >> Bitmapset. Can you please check why no translation happened. I will >> try your reproduction tomorrow. > > > I step into adjust_child_relids(), see below: > nappinfos = 1; > p appinfo->parent_relid > 2 > call bmsToString(relids) > "(b 1)" > > So no translation you said happened, just return relids. > Hmm. The code there assumes that all the Relids will at least have one parent each of the children involved. For some reason sjinfo->min_lefthand has only relid 1 but not 2 or 5. 2 and 5 are actually the parent relids of the children passed in respectively. The join is between 2 and 5, then why is 1 appearing in the min_lefthand. It might be legitimate, but we need to find the reason. If it's legitimate, I think we need to copy the Relids which haven't undergone any translation so as to keep them isolated from the parent relids. -- Best Wishes, Ashutosh Bapat
pgsql-bugs by date: