Re: Asynchronous Append on postgres_fdw nodes. - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Asynchronous Append on postgres_fdw nodes.
Date
Msg-id CAPmGK15CyEmstJyzkGsT_1OmArJwD6vQguBEwiz8oYx4zz6ssA@mail.gmail.com
In response to Re: Asynchronous Append on postgres_fdw nodes.  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: Asynchronous Append on postgres_fdw nodes.  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-hackers
On Tue, May 11, 2021 at 11:58 AM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Your patch fixes the problem. But I found two more problems:
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT * FROM (
>         (SELECT * FROM f1)
>                 UNION ALL
>         (SELECT * FROM f2)
>                 UNION ALL
>         (SELECT * FROM l3)
> ) q1 LIMIT 6709;
>                            QUERY PLAN
> --------------------------------------------------------------
>   Limit (actual rows=6709 loops=1)
>     ->  Append (actual rows=6709 loops=1)
>           ->  Async Foreign Scan on f1 (actual rows=1 loops=1)
>           ->  Async Foreign Scan on f2 (actual rows=1 loops=1)
>           ->  Seq Scan on l3 (actual rows=6708 loops=1)
>
> Here we scan 6710 tuples at low level but appended only 6709. Where did
> we lose one tuple?

The extra tuple, which came from f1 or f2, would have been kept in the
Append node's as_asyncresults rather than returned from the Append node
to the Limit node.  The async Foreign Scan nodes fetch tuples before
the Append node asks for them, so a fetched tuple may or may not be
used.
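
To make the counting concrete, here is a toy model in Python.  It is
only a sketch and not the executor code: toy_append and its arguments
are made-up names, the deque merely stands in for as_asyncresults, and
it assumes that l3 runs out after 6708 rows and that both async results
are already buffered by then.

```python
from collections import deque


def toy_append(limit, sync_rows, async_rows):
    """Toy model of Limit over Append; not PostgreSQL's real logic.

    Returns (tuples fetched by the children,
             tuples returned by Append,
             tuples left in the buffer).
    """
    pending = deque()  # stands in for as_asyncresults
    fetched = 0
    returned = 0

    # The async children fetch eagerly, before Append asks for anything.
    for rows in async_rows:
        pending.extend(range(rows))
        fetched += rows

    # Assumption: Append reads the sync child until it is exhausted
    # or the Limit is satisfied.
    for _ in range(sync_rows):
        if returned == limit:
            break
        fetched += 1
        returned += 1

    # Buffered async tuples are handed over only while Limit still asks.
    while pending and returned < limit:
        pending.popleft()
        returned += 1

    return fetched, returned, len(pending)


print(toy_append(limit=6709, sync_rows=6708, async_rows=[1, 1]))
```

Under these assumptions the children fetch 6710 tuples, Append returns
6709, and one tuple stays in the buffer, which matches the numbers in
the plan above.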

> 2.
> SELECT * FROM (
>         (SELECT * FROM f1)
>                 UNION ALL
>         (SELECT * FROM f2)
>                 UNION ALL
>         (SELECT * FROM f3 WHERE a > 0)
> ) q1 LIMIT 3000;
>                            QUERY PLAN
> --------------------------------------------------------------
>   Limit (actual rows=3000 loops=1)
>     ->  Append (actual rows=3000 loops=1)
>           ->  Async Foreign Scan on f1 (actual rows=0 loops=1)
>           ->  Async Foreign Scan on f2 (actual rows=0 loops=1)
>           ->  Foreign Scan on f3 (actual rows=3000 loops=1)
>
> Here we give preference to the synchronous scan. Why?

This is expected behavior, and the reason is to avoid performance
degradation.  You might think it would be better to execute the async
Foreign Scan nodes more aggressively, but that would require
waiting/polling for file descriptor events many times, which is
expensive.  I think there is room for improvement, though.
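
A rough way to see the trade-off is to count poll calls under two
policies.  This is a toy model in Python, not how the executor is
structured: count_polls and the poll_every_tuple flag are hypothetical,
and it assumes the sync child (f3 here) can supply every row the Limit
needs.

```python
def count_polls(limit, sync_rows, poll_every_tuple):
    """Toy model: return (tuples returned, number of poll calls)."""
    polls = 0
    returned = 0
    while returned < limit:
        if poll_every_tuple:
            # Hypothetical aggressive policy: check the async children
            # for file descriptor events before every tuple.
            polls += 1
        if returned < sync_rows:
            returned += 1  # the tuple comes from the sync child
        else:
            # The sync child is empty, so we must wait for async events.
            # In this toy model the async children have no rows to give.
            polls += 1
            break
    return returned, polls


print(count_polls(limit=3000, sync_rows=3000, poll_every_tuple=False))
print(count_polls(limit=3000, sync_rows=3000, poll_every_tuple=True))
```

In this model the lazy policy returns 3000 rows with no polls, while
the aggressive one pays for 3000 polls to return the same rows.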

Thanks!

Best regards,
Etsuro Fujita


