[HACKERS] PATCH: enabling parallel execution for cursors explicitly(experimental) - Mailing list pgsql-hackers

From Tomas Vondra
Subject [HACKERS] PATCH: enabling parallel execution for cursors explicitly(experimental)
Date
Msg-id 98760523-a0d2-8705-38e3-c4602ecf2448@2ndquadrant.com
Whole thread Raw
Responses Re: [HACKERS] PATCH: enabling parallel execution for cursorsexplicitly (experimental)
Re: [HACKERS] PATCH: enabling parallel execution for cursorsexplicitly (experimental)
List pgsql-hackers
Hi,

One of the existing limitations of parallel query is that cursors
generally do not benefit from it [1]. Commit 61c2e1a95f [2] improved the
situation for cursors from procedural languages, but unfortunately for
user-defined cursors parallelism is still disabled.

For many use cases that is perfectly fine, but for applications that
need to process large amounts of data this is rather annoying. When the
result sets are large, cursors are extremely efficient - in terms of
memory consumption, for example. So the applications have to choose
between "cursor" approach (and no parallelism), or parallelism and
uncomfortably large result sets.

I believe there are two main reasons why parallelism is disabled for
user-defined cursors (or queries that might get suspended):

(1) We can't predict what will happen while the query is suspended (and
the transaction is still in "parallel mode"), e.g. the user might run
arbitrary DML which is not allowed.

(2) If the cursor gets suspended, the parallel workers would be still
assigned to it and could not be used for anything else.

Clearly, we can't solve those issues in general, so the default will
probably remain "parallelism disabled".

I propose is to add a new cursor option (PARALLEL), which would allow
parallel plans for that particular user-defined cursor. Attached is an
experimental patch doing this (I'm sure there are some loose ends).

This does not make either any of the issues go away, of course. We still
enforce "no DML while parallel operation in progress" as before, so this
will not work:

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
INSERT INTO t2 VALUES (1);
FETCH 1000 FROM x;
COMMIT;

but this will

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
...
FETCH 1000 FROM x;
CLOSE x;
INSERT INTO t2 VALUES (1);
COMMIT;

Regarding (2), if the user suspends the cursor for a long time, bummer.
The parallel workers will remain assigned, doing nothing. I don't have
any idea how to get around that, but I don't see how we could do better.
I don't see either of these limitations as fatal.

Any opinions / obvious flaws that I missed?

regards

[1]
https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html

[2]
https://www.postgresql.org/message-id/CAOGQiiMfJ%2B4SQwgG%3D6CVHWoisiU0%2B7jtXSuiyXBM3y%3DA%3DeJzmg%40mail.gmail.com

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] pg_control_recovery() return value when not in recovery
Next
From: Gourav Kumar
Date:
Subject: [HACKERS] Turn off transitive joins