Re: tests against running server occasionally fail, postgres_fdw & tenk1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: tests against running server occasionally fail, postgres_fdw & tenk1
Date
Msg-id 20230208024748.ijvwabhqu4xlbvin@awork3.anarazel.de
Whole thread Raw
In response to Re: tests against running server occasionally fail, postgres_fdw & tenk1  (Andres Freund <andres@anarazel.de>)
Responses Re: tests against running server occasionally fail, postgres_fdw & tenk1
List pgsql-hackers
Hi,

On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> Another run hit an issue we've been fighting repeatedly on the buildfarm / CI:
> https://cirrus-ci.com/task/5527490404286464
>
https://api.cirrus-ci.com/v1/artifact/task/5527490404286464/testrun/build/testrun/regress-running/regress/regression.diffs
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out    2023-02-06 23:52:43.627604000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out    2023-02-07
00:03:04.535232000+0000
 
> @@ -1930,12 +1930,13 @@
>  SELECT thousand, tenthous FROM tenk1
>  WHERE thousand < 2 AND tenthous IN (1001,3000)
>  ORDER BY thousand;
> -                      QUERY PLAN
> --------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                                      QUERY PLAN
> +--------------------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
> +(4 rows)
>
>  SELECT thousand, tenthous FROM tenk1
>  WHERE thousand < 2 AND tenthous IN (1001,3000)
>
>
> I'd be tempted to disable the test, but it also found genuine issues in a
> bunch of CF entries, and all these test instabilities seem like ones we'd also
> see, with a lower occurrence on the buildfarm.

The last occasion we hit this was at: https://www.postgresql.org/message-id/1346227.1649887693%40sss.pgh.pa.us

I'm working on cleaning up the patch used for debugging in that thread, to
make VACUUM log to the server log if VERBOSE isn't used.

One thing I'm not quite sure what to do about is that we atm use a hardcoded
DEBUG2 (not controlled by VERBOSE) in a bunch of places:

    ereport(DEBUG2,
            (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
                    vacrel->relname, (long long) index, vacuumed_pages)));

    ivinfo.message_level = DEBUG2;

I find DEBUG2 hard to use to run the entire regression tests, it results in a
lot of output. Lots of it far less important than these kinds of details
here. So I'd like to use a different log level for them - but without further
complications that'd mean they'd show up in VACUUUM VERBOSE.

I made them part of VERBOSE for now, but not because I think that's
necessarily the right answer, but because it could be useful for debugging
this stupid flapping test.


I right now set instrument = true when
message_level_is_interesting(DEBUG1). But that probably should be false? I set
it to true because of starttime, but it'd probably be better to just move it
out of the if (instrument). Also would require re-jiggering the condition of
the "main block" doing the logging.


FWIW, running all regression tests that support doing so against a running
server with DEBUG1 results in 8.1MB, DEBUG2 in 17MB.


Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Add LZ4 compression in pg_dump
Next
From: "Sho Kato (Fujitsu)"
Date:
Subject: Difference of ecpg japanese translation in PG15