Re: O(n^2) system calls in RemoveOldXlogFiles() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: O(n^2) system calls in RemoveOldXlogFiles()
Msg-id 20210112055534.ncnt2rrbzjiybzqd@alap3.anarazel.de
In response to O(n^2) system calls in RemoveOldXlogFiles()  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: O(n^2) system calls in RemoveOldXlogFiles()  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi,

On 2021-01-11 16:35:56 +1300, Thomas Munro wrote:
> I noticed that RemoveXlogFile() has this code:
> 
>         /*
>          * Before deleting the file, see if it can be recycled as a future log
>          * segment. Only recycle normal files, pg_standby for example can create
>          * symbolic links pointing to a separate archive directory.
>          */
>         if (wal_recycle &&
>                 endlogSegNo <= recycleSegNo &&
>                 lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
>                 InstallXLogFileSegment(&endlogSegNo, path,
>                                        true, recycleSegNo, true))
>         {
>                 ereport(DEBUG2,
>                                 (errmsg("recycled write-ahead log file \"%s\"",
>                                                 segname)));
>                 CheckpointStats.ckpt_segs_recycled++;
>                 /* Needn't recheck that slot on future iterations */
>                 endlogSegNo++;
>         }
> 
> I didn't check the migration history of this code but it seems that
> endlogSegNo doesn't currently have the right scoping to achieve the
> goal of that last comment, so checkpoints end up repeatedly searching
> for the next free slot, starting at the low end each time, like so:
> 
> stat("pg_wal/00000001000000000000004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0
> ...
> stat("pg_wal/000000010000000000000073", 0x7fff98b9e060) = -1 ENOENT (No such file or directory)
> 
> stat("pg_wal/00000001000000000000004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0
> ...
> stat("pg_wal/000000010000000000000074", 0x7fff98b9e060) = -1 ENOENT (No such file or directory)
> 
> ... and so on until we've recycled all our recyclable segments.  Ouch.
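
The cost of the pattern quoted above can be illustrated with a small
standalone simulation. This is only a sketch, not PostgreSQL code:
FakeWalDir, slot_exists(), install_segment() and the two recycle_*()
functions are hypothetical names invented for the illustration, and a
counter stands in for the stat() calls. It compares restarting the
free-slot search from the low end for every recycled file against
carrying the search position across files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 1024

/* Hypothetical stand-in for pg_wal: occupied[i] means segment slot i exists. */
typedef struct
{
    bool          occupied[MAX_SLOTS];
    size_t        nslots;
    unsigned long probes;       /* number of simulated stat() calls */
} FakeWalDir;

/* Mark the first nused slots as existing segments; the rest are free. */
static void
fake_wal_init(FakeWalDir *dir, size_t nused)
{
    assert(nused <= MAX_SLOTS);
    dir->nslots = MAX_SLOTS;
    dir->probes = 0;
    for (size_t i = 0; i < dir->nslots; i++)
        dir->occupied[i] = (i < nused);
}

/* Simulated stat(): count the probe and report whether the slot exists. */
static bool
slot_exists(FakeWalDir *dir, size_t slot)
{
    dir->probes++;
    return dir->occupied[slot];
}

/*
 * Install one recycled segment into the first free slot at or after
 * *cursor, leaving *cursor just past the slot that was used.
 */
static bool
install_segment(FakeWalDir *dir, size_t *cursor)
{
    while (*cursor < dir->nslots)
    {
        size_t slot = (*cursor)++;

        if (!slot_exists(dir, slot))
        {
            dir->occupied[slot] = true;
            return true;
        }
    }
    return false;               /* no free slot left */
}

/* Search restarts at slot 0 for every file: the behavior in the trace. */
unsigned long
recycle_with_reset(size_t nused, size_t nrecycle)
{
    FakeWalDir dir;

    assert(nused + nrecycle <= MAX_SLOTS);
    fake_wal_init(&dir, nused);
    for (size_t i = 0; i < nrecycle; i++)
    {
        size_t cursor = 0;      /* position forgotten between files */

        install_segment(&dir, &cursor);
    }
    return dir.probes;
}

/* Search position survives across files, as the code comment intends. */
unsigned long
recycle_with_shared_cursor(size_t nused, size_t nrecycle)
{
    FakeWalDir dir;
    size_t     cursor = 0;      /* carried across all files */

    assert(nused + nrecycle <= MAX_SLOTS);
    fake_wal_init(&dir, nused);
    for (size_t i = 0; i < nrecycle; i++)
        install_segment(&dir, &cursor);
    return dir.probes;
}
```

With 36 occupied slots and 10 files to recycle (illustrative numbers
only), recycle_with_reset(36, 10) counts 415 probes while
recycle_with_shared_cursor(36, 10) counts 46: the first grows with the
product of existing and recycled segments, the second with their sum.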

I found this before as well: https://postgr.es/m/CAB7nPqTB3VcKSSrW2Qj59tYYR2H4+n=5pZbdWou+X9iqVNMCag@mail.gmail.com

I did put a hastily rebased version of that commit in my aio branch
during development: https://github.com/anarazel/postgres/commit/b3cc8adacf7860add8cc62ec373ac955d9d12992

Greetings,

Andres Freund


