Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)
Date
Msg-id b6083df1-623d-4f25-bbb9-9f3fdf292c00@vondra.me
In response to Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 8/21/24 14:58, Robert Haas wrote:
> ...
>
> All we're doing here is taking an incremental backup of a 1-table
> database that had 1 row at the time of the full backup and has had 1
> more row inserted since then. On my system, the last time I ran this
> regression test, this step completed in 410ms. It shouldn't be
> expensive. So I'm inclined to chalk this up to the machine not having
> enough resources. The only thing that I don't really understand is why
> this particular test would fail vs. anything else. We have a bunch of
> tests that take backups. A possibly important difference here is that
> this one is an incremental backup, so it would need to read WAL
> summary files from the beginning of the full backup to the beginning
> of the current backup and combine them into one super-summary that it
> could then use to decide what to include in the incremental backup.
> However, since this is an artificial example with just 1 insert
> between the full and the incremental, it's hard to imagine that being
> expensive, unless there's some low-probability bug that makes it go
> into an infinite loop or chew up a million CPU cycles or something.
> That's not impossible, but given the discussion between you and Tomas,
> I'm kinda hoping it was just a hardware issue.
> 
> Barring objections or other similar trouble reports, I think we should
> just close out this open item.
> 

+1 to just close it

The animal is running FreeBSD on an rpi4, and used to be running from a
flash disk. It seems FreeBSD has some trouble with that, which likely
contributed to the failures (a bit weird that it affected just this test).

Moving to better storage (SATA SSD over USB) improved the situation
quite a bit. It's a bit too early to say for sure, of course. But I don't
think the test itself is broken.


regards

-- 
Tomas Vondra


