Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 085eaeac-1009-7ff0-3243-9ba5c760c84e@dunslane.net
In response to Re: [HACKERS] Custom compression methods  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Custom compression methods (buildfarm xupgrade)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 3/24/21 12:45 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Mar 24, 2021 at 11:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> On reflection, though, I wonder if we've made pg_dump do the right
>>> thing anyway.  There is a strong case to be made for the idea that
>>> when dumping from a pre-14 server, it should emit
>>> SET default_toast_compression = 'pglz';
>>> rather than omitting any mention of the variable, which is what
>>> I made it do in aa25d1089.
>> But also ... aren't we just doing this to work around a test case that
>> isn't especially good in the first place? Counting the number of lines
>> in the diff between A and B is an extremely crude proxy for "they're
>> similar enough that we probably haven't broken anything."
> I wouldn't be proposing this if the xversion failures were the only
> reason; making them go away is just a nice side-effect.  The core
> point is that the charter of pg_dump is to reproduce the source
> database's state, and as things stand we're failing to ensure we
> do that.
>
> (But yeah, we really need a better way of making this check in
> the xversion tests.  I don't like the arbitrary "n lines of diff
> is probably OK" business one bit.)
>



Well, I ran this module for years privately and used to have a matrix of
the exact number of diff lines expected for each combination of source
and target branch. If I didn't get that exact number of lines I reported
an error on stderr. That was fine when we weren't reporting the results
on the server, and I just sent an email to -hackers if I found an error.
I kept this matrix by examining the diffs to make sure they were all
benign. That was a pretty laborious process. So I decided to try a
heuristic approach instead, and by trial and error came up with this
2000-line threshold. When this appeared to be working and stable, the
module was released into the wild for other buildfarm owners to deploy.
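
To make the two checks described above concrete, here is a minimal sketch in Python. It is not the buildfarm module's actual code. The function names, the `EXPECTED_DIFF_LINES` matrix and its placeholder branch names, the `<=` comparison, and the way changed lines are counted (added plus removed lines, ignoring context) are all assumptions made for illustration; only the 2000 figure comes from the message.

```python
import difflib

# Threshold taken from the "2000 lines" figure in the message; whether the
# real check is inclusive or exclusive is an assumption here.
DIFF_LINE_THRESHOLD = 2000

# Hypothetical matrix for the older, exact-count approach:
# (source_branch, target_branch) -> expected number of changed lines.
# The branch names and the count are placeholders, not real data.
EXPECTED_DIFF_LINES = {("OLD_BRANCH", "NEW_BRANCH"): 3}


def count_diff_lines(before: str, after: str) -> int:
    """Count lines removed from `before` plus lines added in `after`."""
    a = before.splitlines()
    b = after.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Removed lines come from a[i1:i2], added lines from b[j1:j2].
            changed += (i2 - i1) + (j2 - j1)
    return changed


def check_exact(source: str, target: str, n_lines: int) -> bool:
    """Exact-matrix rule: pass only if the count matches the recorded value."""
    return EXPECTED_DIFF_LINES.get((source, target)) == n_lines


def check_threshold(n_lines: int, threshold: int = DIFF_LINE_THRESHOLD) -> bool:
    """Heuristic rule: pass if the diff is no larger than the threshold."""
    return n_lines <= threshold


if __name__ == "__main__":
    dump_before = "a\nb\nc\n"
    dump_after = "a\nx\nc\nd\n"
    n = count_diff_lines(dump_before, dump_after)
    print(n, check_exact("OLD_BRANCH", "NEW_BRANCH", n), check_threshold(n))
```

The exact rule fails on any unreviewed change, which is why its matrix needed manual upkeep; the threshold rule trades that precision for lower maintenance.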

Nothing is hidden here: the diffs are reported. See for example

<https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2021-03-28%2015%3A37%3A07&stg=xversion-upgrade-REL9_4_STABLE-HEAD>

What we're comparing here is the output of the target branch's pg_dumpall
run against the original source versus its output when run against the
upgraded source.

If someone wants to come up with a better rule for detecting that
nothing has gone wrong, I'll be happy to implement it. I don't
particularly like the current rule either; it's there faute de mieux.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



