Re: making relfilenodes 56 bits - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: making relfilenodes 56 bits
Date
Msg-id CA+hUKGLViQEkmM0KJa_KHKmBVfNgps2oUpzVByEWXpzGnvO8kA@mail.gmail.com
Whole thread Raw
In response to Re: making relfilenodes 56 bits  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: making relfilenodes 56 bits
List pgsql-hackers
Hi Dilip,

I am very happy to see these commits.  Here's some belated review for
the tombstone-removal patch.

> v7-0004-Don-t-delay-removing-Tombstone-file-until-next.patch

More things you can remove:

 * sync_unlinkfiletag in struct SyncOps
 * the macro UNLINKS_PER_ABSORB
 * global variable pendingUnlinks

This comment after the question mark is obsolete:

                * XXX should we CHECK_FOR_INTERRUPTS in this loop?
Escaping with an
                * error in the case of SYNC_UNLINK_REQUEST would leave the
                * no-longer-used file still present on disk, which
would be bad, so
                * I'm inclined to assume that the checkpointer will
always empty the
                * queue soon.

(I think if the answer to the question is now yes, then we should
replace the stupid sleep with a condition variable sleep, but there's
another thread about that somewhere).

In a couple of places in dbcommands.c you could now make this change:

        /*
-        * Force a checkpoint before starting the copy. This will
force all dirty
-        * buffers, including those of unlogged tables, out to disk, to ensure
-        * source database is up-to-date on disk for the copy.
-        * FlushDatabaseBuffers() would suffice for that, but we also want to
-        * process any pending unlink requests. Otherwise, if a checkpoint
-        * happened while we're copying files, a file might be deleted just when
-        * we're about to copy it, causing the lstat() call in copydir() to fail
-        * with ENOENT.
+        * Force all dirty buffers, including those of unlogged tables, out to
+        * disk, to ensure source database is up-to-date on disk for the copy.
         */
-       RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
-                                         CHECKPOINT_WAIT |
CHECKPOINT_FLUSH_ALL);
+       FlushDatabaseBuffers(src_dboid);

More obsolete comments you could change:

         * If we were copying database at block levels then drop pages for the
         * destination database that are in the shared buffer cache.  And tell
-->      * checkpointer to forget any pending fsync and unlink
requests for files

-->     * Tell checkpointer to forget any pending fsync and unlink requests for
        * files in the database; else the fsyncs will fail at next
checkpoint, or
        * worse, it will delete file

In tablespace.c I think you could now make this change:

        if (!destroy_tablespace_directories(tablespaceoid, false))
        {
-               /*
-                * Not all files deleted?  However, there can be
lingering empty files
-                * in the directories, left behind by for example DROP
TABLE, that
-                * have been scheduled for deletion at next checkpoint
(see comments
-                * in mdunlink() for details).  We could just delete
them immediately,
-                * but we can't tell them apart from important data
files that we
-                * mustn't delete.  So instead, we force a checkpoint
which will clean
-                * out any lingering files, and try again.
-                */
-               RequestCheckpoint(CHECKPOINT_IMMEDIATE |
CHECKPOINT_FORCE | CHECKPOINT_WAIT);
-
+#ifdef WIN32
                /*
                 * On Windows, an unlinked file persists in the
directory listing
                 * until no process retains an open handle for the
file.  The DDL
@@ -523,6 +513,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)

                /* And now try again. */
                if (!destroy_tablespace_directories(tablespaceoid, false))
+#endif
                {
                        /* Still not empty, the files must be important then */
                        ereport(ERROR,



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)