Re: Logging parallel worker draught - Mailing list pgsql-hackers
From | Benoit Lobréau |
---|---|
Subject | Re: Logging parallel worker draught |
Date | |
Msg-id | 82ffc7a2-b77d-437b-a684-4a79a32fd87b@dalibo.com Whole thread Raw |
In response to | Re: Logging parallel worker draught (Sami Imseih <samimseih@gmail.com>) |
Responses |
Re: Logging parallel worker draught
|
List | pgsql-hackers |
Hi, thank you for the review and sorry for the delayed answer. On 12/28/24 05:17, Sami Imseih wrote:> Thinking about this further, it seems to me this logging only makes sense > for parallel query and not maintenance commands. This is because > for the latter case, the commands are executed manually and a user can issue > VACUUM VERBOSE ( for the case of vacuum ). For CREATE INDEX, one > can enable DEBUG1. For example: > > postgres=# CREATE INDEX tbl_c1 ON tbl(c1); > DEBUG: building index "tbl_c1" on table "tbl" with request for 1 > parallel workers > DEBUG: index "tbl_c1" can safely use deduplication > CREATE INDEX In my opinion, it's useful: * Centralization of logs: The maintenance operation can be planned in cron jobs. In that context, it could certainly be logged separately via the script. However, when I am doing an audit on a client database, I think it's useful to have all the relevant information in PostgreSQL logs. * Centralization of the configuration: If I am auditing the usage of parallelism, I would like to have a single GUC to enable the logging for all relevant information. * Context: Maintenance workers are part of the global pool of workers. To know why there is a shortage after the fact, having all the information on what was using workers could be useful. I tied this argument when we added the parallel worker info to pg_stat_database and wasn't successful. It would be nice to have the info somewhere. The logging for CREATE INDEX and VACUUM is in separate patches. If you don't mind, I would like to keep it as long as possible. > 2/ For logging parallel query workers, the log line could be simpler. > > Instead of: > > + elog(LOG, "%i parallel nodes planned (%i obtained all their workers, > %i obtained none), " > + "%i workers planned to be launched (%i workers launched)", > > what about something like: > > "launched %d parallel workers (planned: %)" > > For example, if I have 2 gather nodes and they each plan and launch > 2 workers, the log line should be as simple as > > "launched 4 parallel workers (planned: 4)" > > This behavior in which workers planned and launched are aggregated in the > log can be described in the documentation. "%i parallel nodes planned (%i obtained all their workers, %i obtained none)," I added this part during my first reorganization. I was trying to extract as much information as possible and thought that the special case where a parallel node was planned but no workers were obtained was the worst-case scenario, which might be interesting to log. If you think it doesn't bring much value, we can strip it. > 3/ Also, log_parallel_query_workers as the name of the GUC will make more sense > if we go logging parallel query only. I agree that log_parallel_query_workers would make more sense if we focus on logging parallel queries only. Since the maintenance-related logging was added later, I missed this point. Thank you for bringging this up. If agreeable, I will rename the GUC to log_parallel_workers as long as the argument regarding maintenance workers is unresolved. If we decide to strip them, I will revert it to log_parallel_query_workers. >> I don't think "failure" is a good name for the setting as it's >> a bit too harsh. What we really have is a "shortage" of >> workers. I agree that "failure" is too harsh. The term "shortage" better describes the situation. Note: I added Tomas Vondra to the recipients of the email since he advised me on this topic initially and might have an opinion on this. -- Benoit Lobréau Consultant http://dalibo.com
pgsql-hackers by date: