Re: Proposal: two new role attributes and/or capabilities? - Mailing list pgsql-hackers

From José Luis Tallón
Subject Re: Proposal: two new role attributes and/or capabilities?
Date
Msg-id 5499C622.5040105@adv-solutions.net
Whole thread Raw
In response to Re: Proposal: two new role attributes and/or capabilities?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 12/23/2014 07:52 PM, Stephen Frost wrote:
> [snip]
> Manually performing VACUUM / VACUUM ANALYZE on the (few) affected
> tables every 12h or so fixes the performance problem for the
> particular queries without impacting the other users too much ---
> the tables and indexes in question have been moved to a separate
> tablespace/disk volume of their own.
> Autovacuum can certainly run vacuum/analyze on a few tables every 12
> hours, so I'm not really following where you see autovacuum being unable
> to cope.  I agree that there *are* such cases, but getting more
> information about those cases and exactly what solution *does* work
> would really help us improve autovacuum to address those use-cases.

I'll try to. I don't have direct access, and the use case is quite edgy 
to be fair.
Plus, the configuration and hardware leaves quite a bit to be desired...

...but it's a real use case and the solution (even if only treating the 
symptoms) is quite straight-forward and easy.
>> In short, this addresses situations where some tables have a much
>> higher update rate than the rest of the database so that performance
>> degrades with time --- the application became unusable after about 6
>> days' worth of updates until the manual vacuums were setup
> This really looks like a configuration issue with autovacuum..  Perhaps
> you need to make it more aggressive than the default and have it run
> more threads?

Yes to both. Up to something which actually affected performance a bit.
But basically only a few tables exhibited this behaviour among several 
hundreds in this particular situation.

> Have you turned the autovacuum logging up all the way?
> Is autovacuum giving up due to locking?
Not one of my systems, and I don't have access to it anymore, but I 
don't think this was the reason.

However, having some hundred million deleted rows piling every few hours 
quite increases the load. For the record, the (closed-source) 
application did issue the DELETEs on the table, so partitioning + 
TRUNCATE child_part was not applicable.


In any case, I was aiming at making this kind of operations possible and 
easier --- regardless of whether they are solving the right problem or 
not, or whether there exists an optimal solution --- since I have seen 
some real life solutions that could benefit from it.
I agree that routine index maintenance is a better match for this 
feature, though :)
>>>>      REINDEX CONCURRENTLY  (but not the blocking, regular, one)
>>>>      REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)
>>> These are interesting, but would these make sense at the role level?
>>> Both of these commands explicitly take specific relations to operate
>>> against, after all.
>> Yup. Let's imagine a cron job invoking psql in order to perform
>> maintenance routine.
> If they make sense at a relation level then they should be
> relation-level GRANT'd permissions, not role-level attributes.

Same as before.
Let's imagine this coupled with REINDEX SCHEMA CONCURRENTLY ... or 
simply when constructing the list of tables dynamically and there is no 
other use for such a grant.
Arguably, this isn't that much of a problem if there exists a way to 
easily revoke all such permissions from all objects in one go (just like 
recently discussed in another thread)

>> The particular command(s) can be generated on-the-fly by querying
>> the catalog and then send them in one go to be run sequentially by
>> the one backend.... as a crude form of rate
>> limiting/quality-of-service of sorts ("renice -p" or even "ionice
>> -p" seems quite inadequate).
> This sounds like it's something that we might want an autovacuum-like
> background process to handle..  Some kind of auto-reindex-concurrently.
> There are already plans to deal with updating of materialized views, as
> I understand it.

While I can definitively see it for materialized views (they *are* 
views, after all), this pattern potentially gets us adding everything 
but the kitchen sink inside the database.
FWIW, it's only a matter of providing a mechanism for maintenance 
routines to use very unprivileged users to perform their duties on the 
whole cluster without having to explicitly grant permissions and/or 
include these into another, regular, role.
Please keep in mind that these  roles [having only LOGIN and 
MAINTENANCE] would NOT be able to perform any DML or DDL whatsoever, nor 
any queries (unless explicitly granted permission for SELECTs).

> [snip]
>> Yes. That's the reason for the question marks  :-\
>> Some "dump to csv then load somewhere else" kind of jobs might
>> benefit from this feature, but I'm not sure the convenience is worth
>> the risk.
> I've run into quite a few processes which would really benefit from
> this, and would even be safe to use (the processes running the COPY
> commands don't have any rights on the directories except through PG),
> but it's not clear if that use-case is sufficiently broad for the
> feature to be worthwhile..  At least, some feel it isn't.  Can you
> describe your use-case more and perhaps the needle will move on that
> point?

I don't have any concrete example for this right now, but a "psql -f 
query_file.sql | processing_script | load_into_somewhere" scheduled to 
run nightly (every few hours, even) is definitively quite common.
The COPY argument (which actually reads and outputs information) 
actually falsifies the above assumptions, I'm afraid. So this task would 
need explicit permissions anyway.

You seem to have better arguments in favor of this last feature, though.

> [snip]
> I suppose, but I think you're over-simplying your use-case.  Consider
> this- do you want this login role to be a member of superuser roles?
> What about DBA or other privilege-user roles?

As pointed out by David Johnston, impersonating superusers shall be 
explicitly forbidden.
The "noimpersonate" attribute (actually, we might need a better name to 
avoid the implication that it means "can not impersonate others" but 
"can not be impersonated") could easily block this loophole.

This feature is definitely orthogonal to the proposed "magical audit 
role" proposal, which I heartily agree with.
We might need to tweak logging a bit under this circumstances to reveal 
the login user vs the authorization user (quite like UID vs EUID for 
processess), though.



Since the changes to pg_authid seem to be about to be reverted, I'll 
wait a bit for a WIP patch --- this is definitively a lot more effort 
without the bitmask and helper funcs :(
There existed another suggestion ---in the form of text-based 
"capabilities" IIRC--- which might be used for this purpose, however.

Thanks,
    J.L.




pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Next
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: Use a bitmask to represent role attributes