BUG #18380: BUG: notify stops working after LEFT JOIN if enable_mergejoin is ON - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18380: BUG: notify stops working after LEFT JOIN if enable_mergejoin is ON
Date
Msg-id 18380-c795d2b4033dfca0@postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18380
Logged by:          Christian Grothoff
Email address:      grothoff@gnu.org
PostgreSQL version: 16.2
Operating system:   Debian GNU/Linux
Description:

We tested with Postgres 14, 15, 16.1 and 16.2; same results.

Overview:
========

We use LISTEN + NOTIFY to implement some long polling. This usually works.
However, we stop getting NOTIFY messages after a LISTENing client issued a
particular SELECT query with a LEFT JOIN in the LISTENing client.  Disabling
"enable_mergejoin" fixes the issue. We believe that the "enable_mergejoin"
logic somehow corrupts the notify state (likely an obscure rare memory
corruption issue). The bug is 100% reproduceable on several of our
systems.

How to reproduce:
===============
* Install GNUnet 0.21.0 [or: git.gnunet.org/gnunet.git master]
* Install GNU Taler exchange, checkout tag 'postgres-bug-mergejoin' [from
git.taler.net/exchange.git]
* Install taler-wallet-core [from git.taler.net/taler-wallet-core]
* run $ taler-harness run-integrationtests deposit 

Buggy outcome
=============

If enable_mergejoin is ON, the test reports this at the end:
      parent: got result
{"name":"deposit","timeSec":35.168,"status":"pass"}
Note that the 35s is due to all kinds of things in the test, none related to
actual postgres query performance.
This is NOT a performance issue. We can set the long-poll threshold higher,
and the time will go up accordingly.

Expected behavior with work-around
==============================

 If enable_mergejoin is OFF, long-polling works and the test reports
something like this at the end:
      parent: got result
{"name":"deposit","timeSec":16.094,"status":"pass"}

This is basically the fastest the integration test will run (~15-17
seconds), the NOTIFY happens as expected.

Relevant SQL query break-down
==========================

The query that causes things to break is in
exchange/src/exchangedb/pg_lookup_transfer_by_deposit.c. The buggy
sub-clause is marked with
#if BUG below:

    PREPARE (pg,
             "get_deposit_without_wtid",
             "SELECT"
             " bdep.wire_salt"
             ",wt.payto_uri"
             ",cdep.amount_with_fee"
             ",denom.fee_deposit"
             ",bdep.wire_deadline"
#if BUG
             ",agt.legitimization_requirement_serial_id"
#endif
             ",aml.status"
             ",aml.kyc_requirement"
             " FROM coin_deposits cdep"
             " JOIN batch_deposits bdep"
             "   USING (batch_deposit_serial_id)"
             " JOIN wire_targets wt"
             "   USING (wire_target_h_payto)"
             " JOIN known_coins kc"
             "   ON (kc.coin_pub = cdep.coin_pub)"
             " JOIN denominations denom"
             "   USING (denominations_serial)"
#if BUG
             " LEFT JOIN aggregation_transient agt "
             "   ON ( (bdep.wire_target_h_payto = agt.wire_target_h_payto)
AND"
             "        (bdep.merchant_pub = agt.merchant_pub) )"
#endif
             " LEFT JOIN aml_status aml"
             "   ON (wt.wire_target_h_payto = aml.h_payto)"
             " WHERE cdep.coin_pub=$1"
             "   AND bdep.merchant_pub=$3"
             "   AND bdep.h_contract_terms=$2"
             " LIMIT 1;");

If we #define BUG to 0 for the code above, we do NOT have to disable the
merge_join to get the desired behavior.


pgsql-bugs by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
Next
From: Tom Lane
Date:
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault