Re: tests against running server occasionally fail, postgres_fdw & tenk1 - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: tests against running server occasionally fail, postgres_fdw & tenk1
Msg-id: 20230209002935.2mq47txxcpvcgyqv@awork3.anarazel.de
In response to: Re: tests against running server occasionally fail, postgres_fdw & tenk1 (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: tests against running server occasionally fail, postgres_fdw & tenk1
List: pgsql-hackers
Hi,

On 2023-02-08 14:03:49 -0800, Peter Geoghegan wrote:
> On Tue, Feb 7, 2023 at 6:47 PM Andres Freund <andres@anarazel.de> wrote:
> > One thing I'm not quite sure what to do about is that we atm use a hardcoded
> > DEBUG2 (not controlled by VERBOSE) in a bunch of places:
> >
> > ereport(DEBUG2,
> >         (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
> >                 vacrel->relname, (long long) index, vacuumed_pages)));
> >
> > ivinfo.message_level = DEBUG2;
> >
> > I find DEBUG2 hard to use to run the entire regression tests, it results in a
> > lot of output. Lots of it far less important than these kinds of details
> > here. So I'd like to use a different log level for them - but without further
> > complications that'd mean they'd show up in VACUUM VERBOSE.
>
> I think that these DEBUG2 traces are of limited value, even for
> experts. I personally never use them -- I just use VACUUM
> VERBOSE/autovacuum logging for everything, since it's far easier to
> read, and isn't missing anything that the DEBUG2 traces have. In fact
> I wonder if we should even have them at all.

I find this information useful when debugging problems. Without it, the log
doesn't tell you which index was being processed when a problem started to
occur, or even that we were scanning indexes at all. IOW, I care a lot more
about log entries denoting the start / end of an index vacuum than about the
attached numbers, although I also care about those to some degree.

> I generally don't care about the details when VACUUM runs out of space
> for dead_items -- these days the important thing is to just avoid it
> completely.

I wonder if it'd make sense to log more verbosely if we do end up running out
of space, but not otherwise? Particularly because it's important to avoid
multi-pass vacuums. Right now there's basically no log output until the vacuum
has finished, which in a bad situation could take days, with many index scan
cycles. Logging enough to be able to see such things happening is IMO
important.

> I also generally don't care about how many index tuples
> were deleted by each index's ambulkdelete() call, since VACUUM VERBOSE
> already shows me the number of LP_DEAD stubs encountered/removed in
> the heap.

Isn't it actually quite useful to see how many entries were removed from an
index relative to the number of LP_DEAD items removed in the heap, and
relative to other indexes? E.g. to understand how effective killtuple-style
logic is?

One annoyance with the multiple log messages emitted as part of an
[auto]vacuum is that we end up logging the context / statement repeatedly,
leading to pointless output like this:

2023-02-08 15:55:01.117 PST [3904676][client backend][2/55:0][psql] LOG: vacuuming "postgres.public.large"
2023-02-08 15:55:01.117 PST [3904676][client backend][2/55:0][psql] STATEMENT: VACUUM (PARALLEL 0) large ;
2023-02-08 15:55:02.671 PST [3904676][client backend][2/55:0][psql] LOG: scanned index "large_pkey" to remove 499994 row versions
2023-02-08 15:55:02.671 PST [3904676][client backend][2/55:0][psql] CONTEXT: while vacuuming index "large_pkey" of relation "public.large"
2023-02-08 15:55:02.671 PST [3904676][client backend][2/55:0][psql] STATEMENT: VACUUM (PARALLEL 0) large ;
...
2023-02-08 15:55:03.496 PST [3904676][client backend][2/55:0][psql] STATEMENT: VACUUM (PARALLEL 0) large ;
2023-02-08 15:55:03.498 PST [3904676][client backend][2/56:0][psql] LOG: vacuuming "postgres.pg_toast.pg_toast_3370138"
2023-02-08 15:55:03.498 PST [3904676][client backend][2/56:0][psql] STATEMENT: VACUUM (PARALLEL 0) large ;
2023-02-08 15:55:03.498 PST [3904676][client backend][2/56:0][psql] LOG: finished vacuuming "postgres.pg_toast.pg_toast_3370138": index scans: 0

I think we should emit most of the non-error/warning messages in vacuum with
errhidestmt(true), errhidecontext(true) to avoid that. The error context is
quite helpful for error messages due to corruption, cancellations and such,
but not for messages where we already are careful to include relation names.
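As a rough sketch of what that'd mean for such a call site (log level, message
wording and variable names are placeholders, not a concrete patch):

    /*
     * Sketch only: tag a routine vacuum progress message so the repeated
     * STATEMENT / CONTEXT log lines get suppressed.  Level, wording and
     * the variables used here are placeholders.
     */
    ereport(LOG,
            (errmsg("scanned index \"%s\" to remove %lld row versions",
                    indname, (long long) ndeleted),
             errhidestmt(true),
             errhidecontext(true)));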
I generally like the improved log output for [auto]vacuum, but the fact that
you can't see anymore when index scans start is imo problematic. Right now you
can't even infer how long the first index scan takes, which really isn't
great.

I'd thus like to:

1) Use errhidestmt(true), errhidecontext(true) for vacuum's
   ereport(non-error-or-warning) calls.

2) Add a message to lazy_vacuum() or lazy_vacuum_all_indexes() that includes
   - num_index_scans
   - how many indexes we'll scan
   - how many dead TIDs we're working on removing
   (a rough sketch of such a message follows after this list)

3) Add a log message at the end of lazy_vacuum_heap_rel() that's emitted only
   (or more verbosely) when lazy_vacuum() was run due to running out of space.
   If we just do the heap scan once, this can easily be inferred from the
   other messages.

4) When we run out of space for dead TIDs, increase the log level for the rest
   of the vacuum. It's sufficiently bad if that happens that we really ought
   to include it in the log by default.

5) Remove the row-version count from vac_bulkdel_one_index()'s message, as
   it's already included in 2). Instead we could emit the content from
   vac_cleanup_one_index(); imo that's a lot more useful when separated out
   for each scan.

6) Possibly remove the log output from vac_cleanup_one_index()?
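To make 2) a bit more concrete, a minimal sketch of what such a message might
look like (field names assume the current LVRelState layout; level and wording
are placeholders, not a concrete patch):

    /*
     * Sketch only, e.g. at the start of lazy_vacuum_all_indexes(): report
     * which index-scan cycle this is, how many indexes will be scanned, and
     * how many dead item identifiers we're trying to remove.
     */
    ereport(LOG,
            (errmsg("table \"%s\": index vacuum cycle %d: removing %lld dead item identifiers from %d indexes",
                    vacrel->relname,
                    vacrel->num_index_scans + 1,
                    (long long) vacrel->dead_items->num_items,
                    vacrel->nindexes),
             errhidestmt(true),
             errhidecontext(true)));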
2) and 3) together make it possible to figure out how long individual scan /
vacuum phases take. 1) should reduce log verbosity sufficiently to make it
easier to actually read the output.

FWIW, I'm not proposing to do all of that in one patch; once I understand a
bit better what we have consensus on and what we don't, I can split it into
steps.

Greetings,

Andres Freund