Re: FDW, too long to run explain - Mailing list pgsql-general

From Jeff Janes
Subject Re: FDW, too long to run explain
Date
Msg-id CAMkU=1zokhfktj6KMtyHirjjD15VCab5=e=3uQYkHJuTp15K_A@mail.gmail.com
Whole thread Raw
In response to Re: FDW, too long to run explain  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: FDW, too long to run explain  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: FDW, too long to run explain  (auxsvr <auxsvr@gmail.com>)
List pgsql-general
On Sun, Feb 17, 2019 at 12:41 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain <vjain@opentable.com> wrote:

now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers.

Are you saying the EXPLAIN itself gets killed, or execution of the plan generated based on the EXPLAIN (issued under use_remote_estimate = true) gets killed?  Who is doing the killing, the local side or the foreign side?  Can you include verbatim log entries for this?

After thinking about it a bit more, I think I see the issue here.  The EXPLAIN pursuant to use_remote_estimate is issued in the same remote transaction as the following DECLARE and FETCH's are.  But after the EXPLAIN is issued, the local server executes the query for a different FDW to satisfy some other branch of the UNION ALL, giving the first FDW connection time to do an idle-in-transaction timeout.  This happens even if no rows need to fetched from that FDW, because another branch of the UNION ALL satisfied the LIMIT.

A question for the PostgreSQL hackers would be, Is it necessary and desirable that the EXPLAIN be issued in the same transaction as the eventual DECLARE and FETCHes?  I don't think it is.  I guess if the foreign side table definition got changed between EXPLAIN and DECLARE it would cause problems, but changing the foreign side definition out of sync with the local side can cause problems anyway, so is that important to preserve?

Changing that might narrow but not completely fix the problem, as there might still be delays between the DECLARE and the FETCH or between successive FETCHes. 

So a question for you would be, why do have such an aggressive setting for idle_in_transaction_session_timeout that it causes this to happen?  Couldn't you relax it, perhaps just for the role used for the FDW connections?

Cheers,

Jeff

pgsql-general by date:

Previous
From: Andres Freund
Date:
Subject: Re: WSL (windows subsystem on linux) users will need to turn fsyncoff as of 11.2
Next
From: Tom Lane
Date:
Subject: Re: FDW, too long to run explain