Re: Two patches to speed up pg_rewind. - Mailing list pgsql-hackers
From | Paul Guo |
---|---|
Subject | Re: Two patches to speed up pg_rewind. |
Date | |
Msg-id | CABQrize7pf4hbKpeCMT_gsh0Q9QROqvseoS7GqVfyhNqVis4nQ@mail.gmail.com Whole thread Raw |
In response to | Re: Two patches to speed up pg_rewind. (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Two patches to speed up pg_rewind.
|
List | pgsql-hackers |
Thanks for reviewing, please see the replies below. On Tue, Aug 17, 2021 at 3:47 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Aug 05, 2021 at 06:18:03PM +0800, Paul Guo wrote: > > I modified the copy_file_range() patch using the below logic: > > > > If the first call of copy_file_range() fails with errno EXDEV or > > ENOTSUP, pg_rewind > > would not use copy_file_range() in rest code, and if copy_file_range() fails we > > fallback to use the previous read()+write() code logic for the file. > > I have looked at 0001, and I don't really like it. One argument > against this approach is that if pg_rewind fails in the middle of its > operation then we would have done a set of fsync() for nothing, with > the data folder still unusable. I would be curious to see some > numbers to see how much it matters with many physical files (say cases > with thousands of small relations?). > +/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */ > +#if defined(HAVE_SYNC_FILE_RANGE) > +#define PG_FLUSH_DATA_WORKS 1 > +#elif !defined(WIN32) && defined(MS_ASYNC) > +#define PG_FLUSH_DATA_WORKS 1 > +#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED) > +#define PG_FLUSH_DATA_WORKS 1 > > This is wrong for the code frontend on platforms that may finish by > using MS_ASYNC, no? There is no such implementation in file_utils.c > but there is one in fd.c. Yes, it seems that we need to add the MS_ASYNC code (refer that in fd.c) in src/common/file_utils.c:pre_sync_fname(). > + fsync_fname("global/pg_control", false); > + fsync_fname("backup_label", false); > + if (access("recovery.conf", F_OK) == 0) > + fsync_fname("recovery.conf", false); > + if (access("postgresql.auto.conf", F_OK) == 0) > + fsync_fname("postgresql.auto.conf", false); > > This list is wrong on various aspects, no? This would miss custom > configuration files, or included files. I did not understand this. Can you please clarify? Anyway let me explain, here we fsync these files additionally because pg_rewind (possibly) modified these files after rewinding. These files may not be handled/logged in filemap pg_control action is FILE_ACTION_NONE backup_label is excluded recovery.conf is not logged in filemap postgresql.auto.conf may be logged but let's fsync this file for safety. > > - if (showprogress) > - pg_log_info("syncing target data directory"); > - sync_target_dir(); > - > /* Also update the standby configuration, if requested. */ > if (writerecoveryconf && !dry_run) > WriteRecoveryConfig(conn, datadir_target, > GenerateRecoveryConfig(conn, NULL)); > > + if (showprogress) > + pg_log_info("syncing target data directory"); > + perform_sync(filemap); > > Why inverting the order here? We need to synchronize the recoveryconf change finally in perform_sync(). > > + * Pre Linux 5.3 does not allow cross-fs copy_file_range() call > + * (return EXDEV). Some fs do not support copy_file_range() (return > + * ENOTSUP). Here we explicitly disable copy_file_range() for the > + * two scenarios. For other failures we still allow subsequent > + * copy_file_range() try. > + */ > + if (errno == ENOTSUP || errno == EXDEV) > + copy_file_range_support = false; > Are you sure that it is right to always cancel the support of > copy_file_range() after it does not work once? Couldn't it be > possible that things work properly depending on the tablespace being > worked on by pg_rewind? Ideally we should retry when first running into a symlink (e.g. tablespace, wal), but it seems not easy to do gracefully. > Having the facility for copy_file_range() in pg_rewind is not nice at > the end, and we are going to need a run-time check to fallback > dynamically to an equivalent implementation on errno={EXDEV,ENOTSUP}. > Hmm. What about locating all that in file_utils.c instead, with a > brand new routine name (pg_copy_file_range would be one choice)? We > still need the ./configure check, except that the conditions to use > the fallback implementation is in this routine, aka fallback on EXDEV, > ENOTSUP or !HAVE_COPY_FILE_RANGE. The backend may need to solve this > problem at some point, but logging and fd handling will likely just > locate that in fd.c, so having one routine for the purpose of all > frontends looks like a step in the right direction. Yes, seems better to make it generic.
pgsql-hackers by date: