Re: logical decoding : exceeded maxAllocatedDescs for .spill files - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date
Msg-id CAA4eK1+cey6i6a0zD9kk_eaDXb4RPNZqu4UwXO9LbHAgMpMBkg@mail.gmail.com
Whole thread Raw
In response to Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: logical decoding : exceeded maxAllocatedDescs for .spill files  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I see that you have made changes in ReorderBufferRestoreChanges to use
> > PathNameOpenFile, but not in ReorderBufferSerializeTXN.  Is there a
> > reason for the same?  In my test environment, with the test provided
> > by you, I got the error (reported in this thread) via
> > ReorderBufferSerializeTXN.
>
> You didn't get this error with the patch applied, did you ?
>

No, I got this before applying the patch.  However, after applying the
patch, I got below error in the same test:

postgres=# SELECT 1 from
pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR:  could not read from reorderbuffer spill file: Invalid argument

It seems to me that FileRead API used in the patch can return value <
0 on  EOF.  See the API usage in BufFileLoadBuffer.  I got this error
on a windows machine and in the server log the message was "LOG:
unrecognized win32 error code: 38" which indicates "Reached the end of
the file."

> If you were debugging this without the patch applied, I suspect that
> the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is
> generating this error is because the max limit must be already crossed
> because of earlier calls to ReorderBufferRestoreChanges().
>
> Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
> sufficient because the code in that function has made sure the fd gets
> closed there itself.
>

Okay, then we might not need it there, but we should at least add a
comment in ReorderBufferRestoreChanges to explain why we have used a
different function to operate on the file at that place.

>
> For the API's that use VFDs (like PathNameOpenFile), the files opened
> are always recorded in the VfdCache array. So it is not required to do
> the cleanup at (sub)transaction end, because the kernel fds get closed
> dynamically in ReleaseLruFiles() whenever they reach max_safe_fds
> limit. So if a transaction aborts, the fds might remain open, but
> those will get cleaned up whenever we require more fds, through
> ReleaseLruFiles(). Whereas, for files opened through
> OpenTransientFile(), VfdCache is not involved, so this needs
> transaction end cleanup.
>

Have you tried by injecting some error?  After getting the error
mentioned above in email, when I retried the same query, I got the
below message.

postgres=# SELECT 1 from
pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR:  could not remove file
"pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during
removal of pg_replslot/regression_slot/xid*: Permission denied

And, then I tried to drop the replication slot and I got below error.
postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot');
ERROR:  could not rename file "pg_replslot/regression_slot" to
"pg_replslot/regression_slot.tmp": Permission denied

It might be something related to Windows, but you can once try by
injecting some error after reading a few files in the code path and
see the behavior.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Planner chose a much slower plan in hashjoin, using a large tableas the inner table.
Next
From: Pavel Stehule
Date:
Subject: Re: range_agg