Thread: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

[HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
It has come to my attention that when a user has a CONNECTION LIMIT
set, and they make use of parallel query, that their queries can fail
due to the connection limit being exceeded.

Simple test case:

postgres=# CREATE USER user1 LOGIN CONNECTION LIMIT 2;
CREATE ROLE
postgres=# \c postgres user1
You are now connected to database "postgres" as user "user1".
postgres=> CREATE TABLE t1 AS (SELECT i FROM GENERATE_SERIES(1,6000000) s(i));
SELECT 6000000
postgres=> SET max_parallel_workers_per_gather = 2;
SET
postgres=> SELECT COUNT(*) FROM t1;
ERROR:  too many connections for role "user1"
CONTEXT:  parallel worker

Now, as I understand it, parallel query was designed in such a way
that nodeGather could perform all of the work in the main process in
the event that no workers were available, and that the only
user-visible evidence of this would be that the query would be slower
than it would otherwise be.

After a little bit of looking around I see that CountUserBackends()
does not ignore the parallel workers, and counts these as
"CONNECTIONS". It's probably debatable whether these are
connections or not, but I do see that max_connections is separate from
max_worker_processes, per:

/* the extra unit accounts for the autovacuum launcher */
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
max_worker_processes;

so the two don't stomp on each other's feet, which makes me think that
a parallel worker should not consume a user connection, since it's not
eating into max_connections. Also, a convenient fix for this
would be to have CountUserBackends() ignore parallel workers
completely.
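
To make the proposed fix concrete, here is a minimal standalone sketch in C of a count that skips parallel workers. It is not the PostgreSQL implementation: the ProcSlot struct, its is_parallel_worker flag and the function name count_user_connections are all hypothetical stand-ins for the real proc array code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one proc array entry. */
typedef struct
{
    int          pid;                /* 0 means the slot is unused */
    unsigned int role_id;            /* role the process runs as */
    bool         is_parallel_worker; /* assumed flag, for illustration only */
} ProcSlot;

/*
 * Count the client connections held by role_id, skipping unused slots
 * and parallel workers.
 */
int
count_user_connections(const ProcSlot *procs, size_t nprocs,
                       unsigned int role_id)
{
    int count = 0;

    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++)
    {
        if (procs[i].pid == 0)
            continue;           /* unused slot */
        if (procs[i].is_parallel_worker)
            continue;           /* a worker is not a client connection */
        if (procs[i].role_id == role_id)
            count++;
    }
    return count;
}
```

With a count like this, a leader with two parallel workers would be seen as one connection rather than three, so the test case above would stay within CONNECTION LIMIT 2.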

The alternative I've thought of would be to make some additional
checks in RegisterDynamicBackgroundWorker() to make sure we don't get
more workers than the user would be allowed, but that would add more
code while the lock is held and increase contention, and we'd also
somehow need to find a way to reserve the connections until the
parallel workers started, so they were not taken by another concurrent
connection in the meantime. This all sounds pretty horrid.

Perhaps we can provide greater control of parallel workers per user in
a future release for admins who are concerned about users hogging
all of the parallel workers. Yet that's likely premature, as we don't
have a per-query knob for that yet.

Thoughts?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Amit Kapila
Date:
On Wed, Jan 11, 2017 at 2:44 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> It has come to my attention that when a user has a CONNECTION LIMIT
> set, and they make use of parallel query, that their queries can fail
> due to the connection limit being exceeded.
>
> Simple test case:
>
> postgres=# CREATE USER user1 LOGIN CONNECTION LIMIT 2;
> CREATE ROLE
> postgres=# \c postgres user1
> You are now connected to database "postgres" as user "user1".
> postgres=> CREATE TABLE t1 AS (SELECT i FROM GENERATE_SERIES(1,6000000) s(i));
> SELECT 6000000
> postgres=> SET max_parallel_workers_per_gather = 2;
> SET
> postgres=> SELECT COUNT(*) FROM t1;
> ERROR:  too many connections for role "user1"
> CONTEXT:  parallel worker
>
> Now, as I understand it, during the design of parallel query, it was
> designed in such a way that nodeGather could perform all of the work
> in the main process in the event that no workers were available, and
> that the only user visible evidence of this would be the query would
> be slower than it would otherwise be.
>

This has been reported previously [1] and I have explained the reason
why such a behaviour is possible and why this can't be handled in
Gather node.

> After a little bit of looking around I see that CountUserBackends()
> does not ignore the parallel workers, and counts these as
> "CONNECTIONS". It's probably debatable whether these are
> connections or not,

I think this is not only for parallel workers; rather, any background
worker that uses a database connection
(BGWORKER_BACKEND_DATABASE_CONNECTION) will be counted in a similar
way.  I am not sure if it is worth inventing something to treat
such background worker connections differently from backend connections.
However, I think we should document it either in parallel query or in
background worker or in Create User .. Connection section.


[1] - https://www.postgresql.org/message-id/20161222111345.25620.8603%40wrigleys.postgresql.org


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Albe Laurenz
Date:
Amit Kapila wrote:
> On Wed, Jan 11, 2017 at 2:44 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> It has come to my attention that when a user has a CONNECTION LIMIT
>> set, and they make use of parallel query, that their queries can fail
>> due to the connection limit being exceeded.
>>
>> Simple test case:
>>
>> postgres=# CREATE USER user1 LOGIN CONNECTION LIMIT 2;
>> [...]
>> postgres=> SELECT COUNT(*) FROM t1;
>> ERROR:  too many connections for role "user1"
>> CONTEXT:  parallel worker
>>
>> Now, as I understand it, during the design of parallel query, it was
>> designed in such a way that nodeGather could perform all of the work
>> in the main process in the event that no workers were available, and
>> that the only user visible evidence of this would be the query would
>> be slower than it would otherwise be.
>>
>> After a little bit of looking around I see that CountUserBackends()
>> does not ignore the parallel workers, and counts these as
>> "CONNECTIONS". It's probably debatable whether these are
>> connections or not,
> 
> I think this is not only for parallel workers, rather any background
> worker that uses database connection
> (BGWORKER_BACKEND_DATABASE_CONNECTION) will be counted in a similar
> way.  I am not sure if it is worth inventing something to consider
> such background worker connections different from backend connections.
> However, I think we should document it either in parallel query or in
> background worker or in Create User .. Connection section.

I think that this should be fixed rather than documented.
Users will not take it well if their queries error out
in this fashion.

Background processes should not be counted as active connections.
Their limit should be determined by max_worker_processes,
and neither max_connections nor the connection limit per user
or database should take them into account.
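
For illustration, the MaxBackends formula David quoted can be written as a small standalone C function. The name compute_max_backends and its parameters are hypothetical and chosen only for this sketch; the arithmetic is the formula from the quoted source.

```c
#include <assert.h>

/*
 * Total process slots, following the formula quoted earlier in the thread:
 * MaxConnections + autovacuum_max_workers + 1 + max_worker_processes.
 */
int
compute_max_backends(int max_connections, int autovacuum_max_workers,
                     int max_worker_processes)
{
    assert(max_connections >= 0);
    assert(autovacuum_max_workers >= 0);
    assert(max_worker_processes >= 0);

    /* the extra unit accounts for the autovacuum launcher */
    return max_connections + autovacuum_max_workers + 1 +
        max_worker_processes;
}
```

Worker slots are added on top of max_connections here, so raising max_worker_processes grows the total without taking anything away from the connection budget.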

Yours,
Laurenz Albe

Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Robert Haas
Date:
On Tue, Jan 10, 2017 at 4:14 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> It has come to my attention that when a user has a CONNECTION LIMIT
> set, and they make use of parallel query, that their queries can fail
> due to the connection limit being exceeded.

That's bad.

> Now, as I understand it, during the design of parallel query, it was
> designed in such a way that nodeGather could perform all of the work
> in the main process in the event that no workers were available, and
> that the only user visible evidence of this would be the query would
> be slower than it would otherwise be.

That was the intent.

> After a little bit of looking around I see that CountUserBackends()
> does not ignore the parallel workers, and counts these as
> "CONNECTIONS". It's probably debatable whether these are
> connections or not, ...

Yeah.  I think that I looked at the connection limit stuff in the 9.4
time frame and said, well, we shouldn't let people use background
workers as a way of evading the connection limit, so let's continue to
count them against that limit.  Then, later on, I did the work to try
to make it transparent when sufficient parallel workers cannot be
obtained, but forgot about this case or somehow convinced myself that
it didn't matter.

One option is certainly to decide categorically that background
workers are not connections, and therefore CountUserBackends() should
ignore them and InitializeSessionUserId() shouldn't call it when the
session being started is a background worker.  That means that
background workers don't count against the user connection limit, full
stop.  Another option, probably slightly less easy to implement, is to
decide that background workers in general count against the limit but
parallel workers do not.  The third option is to count both background
workers and parallel workers against the limit but somehow recover
gracefully when this error trips.  But I have no idea how we could
implement that third option in a reasonable way.
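
As a rough sketch of the first option, the limit check at session start could simply be skipped for background workers. This is a simplified standalone model in C, not the code in InitializeSessionUserId(): the enum, the function name check_connection_limit and its parameters are hypothetical, and it leaves out details such as any superuser exemption.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    SESSION_OK,
    SESSION_TOO_MANY_CONNECTIONS
} SessionCheck;

/*
 * Decide whether a starting session may proceed.
 *
 * conn_limit < 0 means "no limit".  connections_including_self is the
 * number of counted sessions for the role, including the one starting now.
 */
SessionCheck
check_connection_limit(int conn_limit, int connections_including_self,
                       bool is_background_worker)
{
    assert(connections_including_self >= 0);

    /* Option one: background workers never count against the limit. */
    if (is_background_worker)
        return SESSION_OK;

    if (conn_limit < 0)
        return SESSION_OK;      /* unlimited role */

    if (connections_including_self > conn_limit)
        return SESSION_TOO_MANY_CONNECTIONS;

    return SESSION_OK;
}
```

Under this model a parallel worker always gets SESSION_OK, while a third ordinary connection for a role limited to 2 is still refused.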

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
On 12 January 2017 at 09:36, Robert Haas <robertmhaas@gmail.com> wrote:
> One option is certainly to decide categorically that background
> workers are not connections, and therefore CountUserBackends() should
> ignore them and InitializeSessionUserId() shouldn't call it when the
> session being started is a background worker.  That means that
> background workers don't count against the user connection limit, full
> stop.  Another option, probably slightly less easy to implement, is to
> decide that background workers in general count against the limit but
> parallel workers do not.

I think the root question here which we need to ask ourselves is, what
is "CONNECTION LIMIT" for? I seem to have come around to assuming it's
meant to protect the server and give everyone a fairer chance of
getting a connection to the database. Now, since background workers
don't consume anything from max_connections, then I don't really feel
that a background worker should count towards "CONNECTION LIMIT". I'd
assume any CONNECTION LIMITs that are set for a user would be
calculated based on what max_connections is set to. If we want to
limit background workers in the same manner, then perhaps we'd want to
invent something like "WORKER LIMIT N" in CREATE USER.

> The third option is to count both background
> workers and parallel workers against the limit but somehow recover
> gracefully when this error trips.  But I have no idea how we could
> implement that third option in a reasonable way.

I agree with your view on the third option. I looked at this too and
it seems pretty horrible to try and do anything in that direction. It
seems that even if we suppressed the ERROR message, and had the worker
exit, we'd still briefly consume a background worker slot, which
would reduce the chances of some entitled user obtaining one. In fact,
this is the case as it stands today, even if that moment is brief.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
> On 12 January 2017 at 09:36, Robert Haas <robertmhaas@gmail.com> wrote:
>> One option is certainly to decide categorically that background
>> workers are not connections, and therefore CountUserBackends() should
>> ignore them and InitializeSessionUserId() shouldn't call it when the
>> session being started is a background worker.  That means that
>> background workers don't count against the user connection limit, full
>> stop.

I've attached a patch which is intended to assist discussions on this topic.

The patch adds some notes to the docs to mention that background
workers and prepared xacts are not counted in CONNECTION LIMIT, it
then goes on and makes CountUserBackends() ignore bgworkers. It was
already ignoring prepared xacts. There's a bit of plumbing work to
make the proc array aware of the background worker status. Hopefully
this is suitable. I'm not all that close to that particular area of
the code.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
On 12 January 2017 at 15:24, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached a patch which is intended to assist discussions on this topic.
>
> The patch adds some notes to the docs to mention that background
> workers and prepared xacts are not counted in CONNECTION LIMIT, it
> then goes on and makes CountUserBackends() ignore bgworkers. It was
> already ignoring prepared xacts. There's a bit of plumbing work to
> make the proc array aware of the background worker status. Hopefully
> this is suitable. I'm not all that close to that particular area of
> the code.

Hi Robert,

Wondering if you've had any time to glance over this?

If you think the patch needs more work, or goes about things the wrong
way, let me know, and I'll make the changes.

Thanks

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 7:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 January 2017 at 15:24, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached a patch which is intended to assist discussions on this topic.
>>
>> The patch adds some notes to the docs to mention that background
>> workers and prepared xacts are not counted in CONNECTION LIMIT, it
>> then goes on and makes CountUserBackends() ignore bgworkers. It was
>> already ignoring prepared xacts. There's a bit of plumbing work to
>> make the proc array aware of the background worker status. Hopefully
>> this is suitable. I'm not all that close to that particular area of
>> the code.
>
> Wondering if you've had any time to glance over this?
>
> If you think the patch needs more work, or goes about things the wrong
> way, let me know, and I'll make the changes.

Sorry, this had slipped through the cracks -- I'm having a very hard
time keeping up with the flow of patches and emails.  But it looks
good to me, except that it seems like CountDBBackends() needs the same
fix (and probably a corresponding documentation update).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
On 27 January 2017 at 03:53, Robert Haas <robertmhaas@gmail.com> wrote:
> Sorry, this had slipped through the cracks -- I'm having a very hard
> time keeping up with the flow of patches and emails.  But it looks
> good to me, except that it seems like CountDBBackends() needs the same
> fix (and probably a corresponding documentation update).

Thanks for looking at this.

Looks like there's a few other usages of CountDBBackends() which
require background workers to be counted too, so I ended up creating
CountDBConnections() as I didn't really think adding a bool flag to
CountDBBackends was so nice.

I thought about renaming CountUserBackends() to become
CountUserConnections(), but I've not. Although, perhaps it's better to
break any third party stuff that uses that so that authors can review
which behaviour they need rather than have their extension silently
break?
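
To illustrate the choice of two functions over a bool flag, here is a small standalone sketch in C. It is not the patch itself: the ProcSlot struct and the snake_case function names are hypothetical, and only mirror the idea of CountDBBackends() counting everything while CountDBConnections() skips background workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one proc array entry. */
typedef struct
{
    int          pid;                  /* 0 means the slot is unused */
    unsigned int database_id;          /* database the process is attached to */
    bool         is_background_worker; /* assumed flag, for illustration only */
} ProcSlot;

/* Count every live process attached to the database, workers included. */
int
count_db_backends(const ProcSlot *procs, size_t nprocs, unsigned int db)
{
    int count = 0;

    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++)
    {
        if (procs[i].pid != 0 && procs[i].database_id == db)
            count++;
    }
    return count;
}

/* Count only client connections, for checks against a connection limit. */
int
count_db_connections(const ProcSlot *procs, size_t nprocs, unsigned int db)
{
    int count = 0;

    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++)
    {
        if (procs[i].pid == 0 || procs[i].is_background_worker)
            continue;
        if (procs[i].database_id == db)
            count++;
    }
    return count;
}
```

Callers that need to know whether anything at all is still attached to a database keep using the first function, and only the limit check switches to the second.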

David



-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Andrew Dunstan
Date:

On 01/29/2017 04:07 PM, David Rowley wrote:
> On 27 January 2017 at 03:53, Robert Haas <robertmhaas@gmail.com> wrote:
>> Sorry, this had slipped through the cracks -- I'm having a very hard
>> time keeping up with the flow of patches and emails.  But it looks
>> good to me, except that it seems like CountDBBackends() needs the same
>> fix (and probably a corresponding documentation update).
> Thanks for looking at this.
>
> Looks like there's a few other usages of CountDBBackends() which
> require background workers to be counted too, so I ended up creating
> CountDBConnections() as I didn't really think adding a bool flag to
> CountDBBackends was so nice.
>
> I thought about renaming CountUserBackends() to become
> CountUserConnections(), but I've not. Although, perhaps it's better to
> break any third party stuff that uses that so that authors can review
> which behaviour they need rather than have their extension silently
> break?
>
>



I'm inclined to keep this as is - I don't think we should change the
names at least in the stable releases. I'm not sure how far back it
should be patched. The real effect is going to be felt from 9.6, I
think, but arguably for consistency we should change it back to 9.3 or
9.4. Thoughts?

Other things being equal I intend to commit this later today.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Albe Laurenz
Date:
Andrew Dunstan wrote:
> On 01/29/2017 04:07 PM, David Rowley wrote:
>> Looks like there's a few other usages of CountDBBackends() which
>> require background workers to be counted too, so I ended up creating
>> CountDBConnections() as I didn't really think adding a bool flag to
>> CountDBBackends was so nice.
>>
>> I thought about renaming CountUserBackends() to become
>> CountUserConnections(), but I've not. Although, perhaps it's better to
>> break any third party stuff that uses that so that authors can review
>> which behaviour they need rather than have their extension silently
>> break?
> 
> I'm inclined to keep this as is - I don't think we should change the
> names at least in the stable releases. I'm not sure how far back it
> should be patched. The real effect is going to be felt from 9.6, I
> think, but arguably for consistency we should change it back to 9.3 or
> 9.4. Thoughts?
> 
> Other things being equal I intend to commit this later today.

+1

Maybe it is better not to backpatch farther than 9.6 - I think it is
good to be conservative about backpatching, and, as you say, the effect
won't be noticeable much before 9.6.

Yours,
Laurenz Albe

Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Peter Eisentraut
Date:
On 1/11/17 5:51 PM, David Rowley wrote:
> Now, since background workers
> don't consume anything from max_connections, then I don't really feel
> that a background worker should count towards "CONNECTION LIMIT". I'd
> assume any CONNECTION LIMITs that are set for a user would be
> calculated based on what max_connections is set to. If we want to
> limit background workers in the same manner, then perhaps we'd want to
> invent something like "WORKER LIMIT N" in CREATE USER.

This explanation makes sense, but it kind of upset my background
sessions patch, which would previously have been limited by per-user
connection settings.

So I would like to have a background worker limit per user, as you
allude to.  Attached is a patch that implements a GUC setting
max_worker_processes_per_user.

Besides the uses for background sessions, it can also be useful for
parallel workers, logical replication apply workers, or things like
third-party partitioning extensions.

Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Robert Haas
Date:
On Wed, Feb 15, 2017 at 11:19 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/11/17 5:51 PM, David Rowley wrote:
>> Now, since background workers
>> don't consume anything from max_connections, then I don't really feel
>> that a background worker should count towards "CONNECTION LIMIT". I'd
>> assume any CONNECTION LIMITs that are set for a user would be
>> calculated based on what max_connections is set to. If we want to
>> limit background workers in the same manner, then perhaps we'd want to
>> invent something like "WORKER LIMIT N" in CREATE USER.
>
> This explanation makes sense, but it kind of upset my background
> sessions patch, which would previously have been limited by per-user
> connection settings.
>
> So I would like to have a background worker limit per user, as you
> allude to.  Attached is a patch that implements a GUC setting
> max_worker_processes_per_user.
>
> Besides the uses for background sessions, it can also be useful for
> parallel workers, logical replication apply workers, or things like
> third-party partitioning extensions.
>
> Thoughts?

This isn't going to deliver consistent results if it's set differently
in different sessions, although maybe you could weasel around that by
wording the documentation in just the right way.  It seems OK
otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Peter Eisentraut
Date:
On 2/15/17 11:19, Peter Eisentraut wrote:
> So I would like to have a background worker limit per user, as you
> allude to.  Attached is a patch that implements a GUC setting
> max_worker_processes_per_user.
> 
> Besides the uses for background sessions, it can also be useful for
> parallel workers, logical replication apply workers, or things like
> third-party partitioning extensions.

Given that background sessions have been postponed, is there still
interest in this separate from that?  It would be useful for per-user
parallel worker limits, for example.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Peter Eisentraut
Date:
On 4/6/17 15:01, Peter Eisentraut wrote:
> On 2/15/17 11:19, Peter Eisentraut wrote:
>> So I would like to have a background worker limit per user, as you
>> allude to.  Attached is a patch that implements a GUC setting
>> max_worker_processes_per_user.
>>
>> Besides the uses for background sessions, it can also be useful for
>> parallel workers, logical replication apply workers, or things like
>> third-party partitioning extensions.
> 
> Given that background sessions have been postponed, is there still
> interest in this separate from that?  It would be useful for per-user
> parallel worker limits, for example.

Here is a slightly updated patch for consideration in the upcoming
commit fest.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
David Rowley
Date:
On 24 August 2017 at 11:15, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is a slightly updated patch for consideration in the upcoming
> commit fest.

Hi Peter,

I just had a quick glance over this and wondered about 2 things.

1. Why a GUC and not a new per user option so it can be configured
differently for different users? Something like ALTER USER ... WORKER
LIMIT <n>; perhaps. I mentioned this up-thread a bit.

2.

+ if (count > max_worker_processes_per_user)
+ {
+     ereport(LOG,
+             (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+              errmsg("too many worker processes for role \"%s\"",
+                     GetUserNameFromId(GetUserId(), false))));
+     LWLockRelease(BackgroundWorkerLock);
+     return false;

Unless I've misunderstood something, it seems that this is going to
give random errors to users which might only occur when they run
queries against larger tables. Part of why it made sense not to count
workers towards the CONNECTION LIMIT was the fact that we didn't want
to throw these random errors when workers could not be obtained when
we take precautions in other places to just silently have fewer
workers. There's lots of discussions earlier in this thread about this
and I don't think anyone was in favour of queries randomly working
sometimes.
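
For reference, the "silently have fewer workers" behaviour could look roughly like the following standalone C sketch, where a refused registration just shortens the launch loop instead of raising an error. Everything here is hypothetical (the UserWorkerState struct, both function names, and treating the limit as a maximum number of running workers); it is not taken from the patch or from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user bookkeeping, for illustration only. */
typedef struct
{
    int workers_running;    /* workers currently held by this user */
    int per_user_limit;     /* assumed cap on running workers per user */
} UserWorkerState;

/* Try to take one worker slot; refuse quietly when the cap is reached. */
bool
try_register_worker(UserWorkerState *state)
{
    assert(state != NULL);

    if (state->workers_running >= state->per_user_limit)
        return false;       /* no error raised, caller copes with fewer */

    state->workers_running++;
    return true;
}

/*
 * Ask for "requested" workers and return how many were obtained.  The
 * caller is expected to do the remaining work itself when this is lower
 * than requested, including when it is zero.
 */
int
launch_parallel_workers(UserWorkerState *state, int requested)
{
    int launched = 0;

    assert(requested >= 0);

    for (int i = 0; i < requested; i++)
    {
        if (!try_register_worker(state))
            break;
        launched++;
    }
    return launched;
}
```

In this model a query that asks for three workers while the user is one slot short of the cap gets one worker and still runs to completion, only more slowly.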

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

From
Michael Paquier
Date:
On Fri, Aug 25, 2017 at 6:25 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I just had a quick glance over this and wondered about 2 things.
>
> 1. Why a GUC and not a new per user option so it can be configured
> differently for different users? Something like ALTER USER ... WORKER
> LIMIT <n>; perhaps. I mentioned this up-thread a bit.
>
> 2.
>
> + if (count > max_worker_processes_per_user)
> + {
> +     ereport(LOG,
> +             (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> +              errmsg("too many worker processes for role \"%s\"",
> +                     GetUserNameFromId(GetUserId(), false))));
> +     LWLockRelease(BackgroundWorkerLock);
> +     return false;
>
> Unless I've misunderstood something, it seems that this is going to
> give random errors to users which might only occur when they run
> queries against larger tables. Part of why it made sense not to count
> workers towards the CONNECTION LIMIT was the fact that we didn't want
> to throw these random errors when workers could not be obtained when
> we take precautions in other places to just silently have fewer
> workers. There's lots of discussions earlier in this thread about this
> and I don't think anyone was in favour of queries randomly working
> sometimes.

The status of the patch is incorrect I think. This was marked as needs
review but I can see some input here which has remained unanswered for
three months. I am marking this patch as returned with feedback.
-- 
Michael