Thread: A million users
Hi A simple question before design. Would there be performance issues having perhaps a million users, each having several roles? I could imagine a user would have on average 10-20 roles. Roles would be used to allow or restrict access to tables or rows according to the business rules. /kaare
---------- Forwarded message ---------
From: Kaare Rasmussen <>
From: Kaare Rasmussen <>
Hi
A simple question before design.
Would there be performance issues having perhaps a million users, each
having several roles? I could imagine a user would have on average 10-20
roles.
I tried to grant select permissions to 5000 different roles on one table,
It failed with row size too big already at 2443.
That said,
I am not even talking of catalog bloat.
So please test ...I think there might be some doc that will show the limitations or in the source code, I can check later
On 2024-Nov-13, Vijaykumar Jain wrote: > I tried to grant select permissions to 5000 different roles on one table, > It failed with row size too big already at 2443. But you can grant select to one "reader" role, and grant that one role to however many other roles you want. This way you can have an arbitrary number of roles with indirect access to the table. In real-world usage, this is more convenient that granting access to individual roles on individual tables; likely, you'll grant access to sets of tables/views/functions/etc rather than exactly one, and you can manage that more easily if you have one intermediate role to modify than if you have to mess with 5000 individual roles. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Wed, Nov 13, 2024 at 11:29 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Nov-13, Vijaykumar Jain wrote: > > I tried to grant select permissions to 5000 different roles on one table, > > It failed with row size too big already at 2443. > > But you can grant select to one "reader" role, and grant that one role > to however many other roles you want. This way you can have an > arbitrary number of roles with indirect access to the table. In > real-world usage, this is more convenient that granting access to > individual roles on individual tables; likely, you'll grant access to > sets of tables/views/functions/etc rather than exactly one, and you can > manage that more easily if you have one intermediate role to modify than > if you have to mess with 5000 individual roles. Hi. Sure, that's a good point, and a best practice IMHO. But I already do that, and the original question remain, i.e. how does PostgreSQL with thousands or millions of roles? In my use case, 1000 LOGIN users, and 10'000 schema related ROLEs, is possible, with can translate to millions of GRANTs. It matters to me especially, since I'm using pg_has_role() in RLS predicates, in addition to using ROLEs for access to schemas as usual (see above). I'm not in a point to test that myself at this time, but if anyone looked into how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?), I'd be interested in pointers to that research or those benchmarks. Thanks, --DD
On 11/13/24 12:29, Alvaro Herrera wrote: > On 2024-Nov-13, Vijaykumar Jain wrote: > >> I tried to grant select permissions to 5000 different roles on one table, >> It failed with row size too big already at 2443. > But you can grant select to one "reader" role, and grant that one role > to however many other roles you want. This way you can have an > arbitrary number of roles with indirect access to the table. In > real-world usage, this is more convenient that granting access to > individual roles on individual tables; likely, you'll grant access to > sets of tables/views/functions/etc rather than exactly one, and you can > manage that more easily if you have one intermediate role to modify than > if you have to mess with 5000 individual roles. Exactly! In the later versions, security gets more and more refined and strengthened. So ppl should think about moving away from "public" , and start implementing finer grained schemes of security, as you suggest. + \dp shows prettier than having 1000+ users listed. >
On Wed, Nov 13, 2024, 4:42 PM Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> wrote:
Exactly! In the later versions, security gets more and more refined and
strengthened. So ppl should think about moving away from "public" , and
start implementing finer grained schemes of security, as you suggest. +
\dp shows prettier than having 1000+ users listed.
I wanted to just communicate the limits.
a lot of postgresql architecture can leverage the resources and scale, but not all.
i had 100s of 1000s of tables on my setup where i worked last.
if i did \dt it would freeze all the time. i had to exit the pdwl session, check the source code of how the partition was named and then look for what I wanted.
if things are pretty with psql or not should not be a criteria for how many objects you want to have.
i would expect clear exceptions so one knows what the real problem is.
the error I got did not in anyway communicate the role limits for col size limits.
On Wed, Nov 13, 2024, 5:00 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
On Wed, Nov 13, 2024, 4:42 PM Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> wrote:
Exactly! In the later versions, security gets more and more refined and
strengthened. So ppl should think about moving away from "public" , and
start implementing finer grained schemes of security, as you suggest. +
\dp shows prettier than having 1000+ users listed.I wanted to just communicate the limits.a lot of postgresql architecture can leverage the resources and scale, but not all.i had 100s of 1000s of tables on my setup where i worked last.if i did \dt it would freeze all the time. i had to exit the pdwl session, check the source code of how the partition was named and then look for what I wanted.if things are pretty with psql or not should not be a criteria for how many objects you want to have.i would expect clear exceptions so one knows what the real problem is.the error I got did not in anyway communicate the role limits for col size limits.
so roles are not the problem.
but if you grant them individually select on the same table for ex. then the limits are breached based of size of the col not number of permissions.
Hi Dominique
Hi. Sure, that's a good point, and a best practice IMHO. But I already do that, and the original question remain, i.e. how does PostgreSQL with thousands or millions of roles?
Sorry if my original post was unclear, but I don't expect that there will be much more than perhaps a hundred roles. Each may have from a few up to a million users in them, though.
On 2024-Nov-13, Kaare Rasmussen wrote: > Sorry if my original post was unclear, but I don't expect that there > will be much more than perhaps a hundred roles. Each may have from a > few up to a million users in them, though. In Postgres, a user is a role. So if you have a hundred roles and a million users that these roles are granted to, that means you'll have 100100 roles. (In the worst case, where you grant all one hundred roles to each of the million users, you would end up with 100_000_000 rows in pg_auth_member). I would expect such a system to work mostly fine. It'll need memory for the caches used to store contents of system catalogs. I think you should test it out and see what happens. I haven't seen any published _actual_ benchmarks on this point. That said, having a million users is a bit strange. Do you want to give each visitor to your website a unique Postgres role, or something like that? I think this is unusual, but it should work. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Dominique Devienne: > Hi. Sure, that's a good point, and a best practice IMHO. > But I already do that, and the original question remain, > i.e. how does PostgreSQL with thousands or millions of roles? > In my use case, 1000 LOGIN users, and 10'000 schema related ROLEs, > is possible, with can translate to millions of GRANTs. > > It matters to me especially, since I'm using pg_has_role() in RLS predicates, > in addition to using ROLEs for access to schemas as usual (see above). > I'm not in a point to test that myself at this time, but if anyone looked into > how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?), > I'd be interested in pointers to that research or those benchmarks. I don't have any benchmarks, but the following data point: We use PostgREST [1] which connects to the database with one "authenticator" role. For each request it handles, it does a SET ROLE to a role defined in the Authorization Header (JWT). Some numbers we are running with currently: - 1 authenticator role - ~ 127.000 user roles granted to "authenticator" - ~ 14.000 "scope" roles granted to the user roles (tenants, groups, ..) - ~ 15 "context" roles granted to user roles ("admin", "user", ...) - ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...) Only the access roles have any direct privileges granted. We currently have ~ 700 RLS policies defined. Those are created TO the context roles. The policies check the current role's scope roles to select "allowed" rows. In total, we have ~370.000 roles granted to each other (pg_auth_members). Except for one thing, we have never had any real problems with this. We didn't observe anything getting massively worse with many roles, even though we use them extensively. RLS policies need to be carefully written to get any performance, though. The one problem we found is: The first time the authenticator role does a SET ROLE in a session it's **terribly** slow. With fewer users back then it took 6-7 minutes to do it. Any SET ROLE afterwards in the same session would be fast. Even more annoying - killing the session with SET ROLE running would not work properly and leave zombie processes. Giving the authenticator role the SUPERUSER privilege avoids the problem and makes it instant. However.. that's not very desirable. There were some improvements, IIRC in the 17 cycle (?), in that area, but I had not have the time to test it with that. We are still on v15 and the last time I tested this was ~ two years ago. I still wasn't able to put together a simple reproducer either. You should *probably* be better off with your different LOGIN roles, I assume the role cache builds up much quicker in that case. Hope that helps. Best, Wolfgang [1]: https://postgrest.org
Hi
That said, having a million users is a bit strange. Do you want to give each visitor to your website a unique Postgres role, or something like that? I think this is unusual, but it should work.
OK, thanks for the feedback, all. It may not be necessary to have such a fine grained system, but I was just wondering.
the caches used to store contents of system catalogs. I think you should test it out and see what happens. I haven't seen any published _actual_ benchmarks on this point.
I will, if this progresses. I did want to know if something like this has been tried and failed before.
/kaare
On Wed, Nov 13, 2024 at 12:02 PM <walther@technowledgy.de> wrote:
I don't have any benchmarks, but the following data point:
We use PostgREST [1] which connects to the database with one
"authenticator" role. For each request it handles, it does a SET ROLE to
a role defined in the Authorization Header (JWT).
Some numbers we are running with currently:
- 1 authenticator role
- ~ 127.000 user roles granted to "authenticator"
- ~ 14.000 "scope" roles granted to the user roles (tenants, groups, ..)
- ~ 15 "context" roles granted to user roles ("admin", "user", ...)
- ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...)
Only the access roles have any direct privileges granted.
We currently have ~ 700 RLS policies defined. Those are created TO the
context roles. The policies check the current role's scope roles to
select "allowed" rows.
In total, we have ~370.000 roles granted to each other (pg_auth_members).
Except for one thing, we have never had any real problems with this. We
didn't observe anything getting massively worse with many roles, even
though we use them extensively. RLS policies need to be carefully
written to get any performance, though.
The one problem we found is:
The first time the authenticator role does a SET ROLE in a session it's
**terribly** slow. With fewer users back then it took 6-7 minutes to do
it. Any SET ROLE afterwards in the same session would be fast. Even more
annoying - killing the session with SET ROLE running would not work
properly and leave zombie processes. Giving the authenticator role the
SUPERUSER privilege avoids the problem and makes it instant. However..
that's not very desirable.
There were some improvements, IIRC in the 17 cycle (?), in that area,
but I had not have the time to test it with that. We are still on v15
and the last time I tested this was ~ two years ago. I still wasn't able
to put together a simple reproducer either.
You should *probably* be better off with your different LOGIN roles, I
assume the role cache builds up much quicker in that case.
I'm really interested in how this works. Role-per-user or even the ability to have many roles (370k??) seems like a dream come true. But I always was wary of it because:
a) A connection-per-role hits the ceiling pretty quickly because connections can't be pooled and shared between users and take up a lot of memory etc.
b) One could try to get around this with an authenticator role as you describe, but isn't it then possible to do a RESET ROLE and then another SET ROLE to get access to another user? This of course would have to be through SQL injection or some such, but it seems like that defeats at least some of the purpose of RLS.
Did you find some way to prevent RESET ROLE? I once advocated for a NO RESET option on SET ROLE [1] so that RESET ROLE would be impossible for the rest of the session. Still think it would be helpful.
Thanks,
Eric
Eric Hanson: > Did you find some way to prevent RESET ROLE? I once advocated for a NO > RESET option on SET ROLE [1] so that RESET ROLE would be impossible for > the rest of the session. Still think it would be helpful. Yeah, this is still on my list of things to research more about eventually - currently still unsolved. For my use-case the NO RESET would need to apply until the end of the transaction, not end of the session. I imagine something like an extension, that would: - block any SET SESSION ROLE - block any RESET ROLE - only allow SET LOCAL ROLE when CURRENT_USER has the right to do so Then the effect of SET LOCAL ROLE would still be reversed at the end of the transaction, but you could never "escape" a SET LOCAL ROLE that was set earlier. Best, Wolfgang
On Fri, Nov 22, 2024 at 6:57 AM <walther@technowledgy.de> wrote:
Yeah, this is still on my list of things to research more about
eventually - currently still unsolved.
For my use-case the NO RESET would need to apply until the end of the
transaction, not end of the session.
I imagine something like an extension, that would:
- block any SET SESSION ROLE
- block any RESET ROLE
- only allow SET LOCAL ROLE when CURRENT_USER has the right to do so
Then the effect of SET LOCAL ROLE would still be reversed at the end of
the transaction, but you could never "escape" a SET LOCAL ROLE that was
set earlier.
As things are now, would someone be able to do a RESET ROLE if *any* code/function had a SQL injection vulnerability, or only if there was one in the pooler? Or (ideally) neither. That's what a NO RESET option (or some similar functionality) would provide with certainty.
I found this extension:
but haven't used it. Seems to address this though, they introduce a set_session_auth(token) function and then reset_role requires the token if session_auth has been set.
Thanks,
Eric