Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id CAJ3gD9ckx6ch34tGk6UWWqhWb6bnd9+8Ax8=wnoZgJ_J_YNUgg@mail.gmail.com
Whole thread Raw
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Thomas Munro <thomas.munro@gmail.com>)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I see that you have made changes in ReorderBufferRestoreChanges to use
> > > PathNameOpenFile, but not in ReorderBufferSerializeTXN.  Is there a
> > > reason for the same?  In my test environment, with the test provided
> > > by you, I got the error (reported in this thread) via
> > > ReorderBufferSerializeTXN.
> >
> > You didn't get this error with the patch applied, did you ?
> >
>
> No, I got this before applying the patch.  However, after applying the
> patch, I got below error in the same test:
>
> postgres=# SELECT 1 from
> pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
> ERROR:  could not read from reorderbuffer spill file: Invalid argument
>
> It seems to me that FileRead API used in the patch can return value <
> 0 on  EOF.  See the API usage in BufFileLoadBuffer.  I got this error
> on a windows machine and in the server log the message was "LOG:
> unrecognized win32 error code: 38" which indicates "Reached the end of
> the file."

On Windows, it is documented that ReadFile() (which is called by
pg_pread) will return false on EOF but only when the file is open for
asynchronous reads/writes. But here we are just dealing with usual
synchronous reads. So pg_pread() code should indeed return 0 on EOF on
Windows. Not yet able to figure out how FileRead() managed to return
this error on Windows. But from your symptoms, it does look like
pg_pread()=>ReadFile() returned false (despite doing asynchronous
reads), and so _dosmaperr() gets called, and then it does not find the
eof error in doserrors[], so the "unrecognized win32 error code"
message is printed. May have to dig up more on this.


>
> > If you were debugging this without the patch applied, I suspect that
> > the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is
> > generating this error is because the max limit must be already crossed
> > because of earlier calls to ReorderBufferRestoreChanges().
> >
> > Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
> > sufficient because the code in that function has made sure the fd gets
> > closed there itself.
> >
>
> Okay, then we might not need it there, but we should at least add a
> comment in ReorderBufferRestoreChanges to explain why we have used a
> different function to operate on the file at that place.

Yeah, that might make sense.

>
> >
> > For the API's that use VFDs (like PathNameOpenFile), the files opened
> > are always recorded in the VfdCache array. So it is not required to do
> > the cleanup at (sub)transaction end, because the kernel fds get closed
> > dynamically in ReleaseLruFiles() whenever they reach max_safe_fds
> > limit. So if a transaction aborts, the fds might remain open, but
> > those will get cleaned up whenever we require more fds, through
> > ReleaseLruFiles(). Whereas, for files opened through
> > OpenTransientFile(), VfdCache is not involved, so this needs
> > transaction end cleanup.
> >
>
> Have you tried by injecting some error?  After getting the error
> mentioned above in email, when I retried the same query, I got the
> below message.
>
> postgres=# SELECT 1 from
> pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
> ERROR:  could not remove file
> "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during
> removal of pg_replslot/regression_slot/xid*: Permission denied
>
> And, then I tried to drop the replication slot and I got below error.
> postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot');
> ERROR:  could not rename file "pg_replslot/regression_slot" to
> "pg_replslot/regression_slot.tmp": Permission denied
>
> It might be something related to Windows

Oh ok, I missed the fact that on Windows we can't delete the files
that are already open, unlike Linux/Unix.
I guess, I may have to use FD_CLOSE_AT_EOXACT flags; or simply use
OpenTemporaryFile(). I wonder though if this same issue might come up
for the other use-case of PathNameOpenFile() :
logical_rewrite_log_mapping().

> but you can once try by
> injecting some error after reading a few files in the code path and
> see the behavior.
Yeah, will check the behaviour, although on Linux, I think I won't get
this error. But yes, like I mentioned above, I think we might have to
arrange for something.



--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Protect syscache from bloating with negative cache entries
Next
From: Thomas Munro
Date:
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files