Thread: logical decoding : exceeded maxAllocatedDescs for .spill files
Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying to open file ...", which was also discussed in the thread [1]. This issue is similar but not exactly the same as [1]. In [1], the file for which this error showed up was "pg_logical/mappings/map....", while here it's the .spill file. And here the issue, in short, seems to be: the .spill file does not get closed there and then, unlike in [1] where there was a file descriptor leak.

I could reproduce it using a transaction containing a long series of sub-transactions (it could possibly be reproduced without sub-transactions, but looking at the code I could easily come up with this script using sub-transactions):

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as $$
begin
    -- Iterate this more than 492 times (max transient file descriptors PG would typically allow).
    -- This will create that many sub-transactions due to presence of exception block.
    FOR i IN 1..600 LOOP
        BEGIN
            -- Iterate more than 4096 times (so that changes spill to disk: max_changes_in_memory)
            FOR j IN 1..5000 LOOP
                insert into tab values (1);
            END LOOP;
        EXCEPTION when division_by_zero then perform 'dummy';
        END;
    END LOOP;
    return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres --slot=logical --verbose --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38 (slot logical)
pg_recvlogical: error: unexpected termination of replication stream: ERROR:  exceeded maxAllocatedDescs (492) while trying to open file "pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is: ReorderBufferIterTXNInit() => ReorderBufferRestoreChanges() opens the files, but leaves them open if end of file is not reached. Eventually, if end of file is reached, the file gets closed. The function returns without closing the file descriptor if the reorder buffer changes being restored exceed max_changes_in_memory; presumably the rest of the changes get restored later in another ReorderBufferRestoreChanges() call. But meanwhile, if a lot of such files are left open, we can run out of the maximum number of files that PG decides to keep open (it has some logic that takes into account how many files the system allows to be open, and how many are already open).

Offhand, what I am thinking is: we need to close the file descriptor before returning from ReorderBufferRestoreChanges(), and keep track of the file offset and file path, so that next time we can resume reading from there.

Comments?

[1] https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
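To make the proposal concrete, here is a minimal sketch of that idea in plain POSIX terms. The names and structure are purely illustrative (this is not the reorderbuffer code or the eventual patch): the restore state remembers the spill file's path and offset, and each batch reopens the file, reads from the saved offset with pread(), and closes it again so no descriptor stays allocated between batches.

/*
 * Illustrative sketch only -- hypothetical names, not PostgreSQL code.
 * Instead of keeping the spill file's fd open across calls, remember
 * the path and offset, and reopen/pread/close for every batch of
 * restored changes.
 */
#include <fcntl.h>
#include <unistd.h>

typedef struct SpillReadState
{
    char    path[1024];     /* spill file path */
    off_t   offset;         /* where the next batch starts */
} SpillReadState;

static ssize_t
restore_next_batch(SpillReadState *state, char *buf, size_t len)
{
    int         fd;
    ssize_t     nread;

    fd = open(state->path, O_RDONLY);
    if (fd < 0)
        return -1;

    nread = pread(fd, buf, len, state->offset);
    if (nread > 0)
        state->offset += nread;     /* resume here next time */

    close(fd);                      /* no fd stays open between batches */
    return nread;
}

The extra open/close per batch is exactly the cost that gets debated in the replies below.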
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
From: Alvaro Herrera from 2ndQuadrant
On 2019-Sep-11, Amit Khandekar wrote: > I reproduced the error "exceeded maxAllocatedDescs (492) while trying > to open file ...", which was also discussed about in the thread [1]. > This issue is similar but not exactly the same as [1]. In [1], the > file for which this error used to show up was > "pg_logical/mappings/map...." , while here it's the .spill file. And > here the issue , in short, seems to be : The .spill file does not get > closed there and then, unlike in [1] where there was a file descriptor > leak. Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things. > Looking at the code, what might be happening is, > ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the > files, but leaves them open if end of file is not reached. Eventually > if end of file is reached, it gets closed. The function returns back > without closing the file descriptor if reorder buffer changes being > restored are more than max_changes_in_memory. Probably later on, the > rest of the changes get restored in another > ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot > of such files opened, we can run out of the max files that PG decides > to keep open (it has some logic that takes into account system files > allowed to be open, and already opened). Makes sense. > Offhand, what I am thinking is, we need to close the file descriptor > before returning from ReorderBufferRestoreChanges(), and keep track of > the file offset and file path, so that next time we can resume reading > from there. I think doing this all the time would make restore very slow -- there's a reason we keep the files open, after all. It would be better if we can keep the descriptors open as much as possible, and only close them if there's trouble. I was under the impression that using OpenTransientFile was already taking care of that, but that's evidently not the case. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote: >On 2019-Sep-11, Amit Khandekar wrote: > >> I reproduced the error "exceeded maxAllocatedDescs (492) while trying >> to open file ...", which was also discussed about in the thread [1]. >> This issue is similar but not exactly the same as [1]. In [1], the >> file for which this error used to show up was >> "pg_logical/mappings/map...." , while here it's the .spill file. And >> here the issue , in short, seems to be : The .spill file does not get >> closed there and then, unlike in [1] where there was a file descriptor >> leak. > >Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things. > >> Looking at the code, what might be happening is, >> ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the >> files, but leaves them open if end of file is not reached. Eventually >> if end of file is reached, it gets closed. The function returns back >> without closing the file descriptor if reorder buffer changes being >> restored are more than max_changes_in_memory. Probably later on, the >> rest of the changes get restored in another >> ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot >> of such files opened, we can run out of the max files that PG decides >> to keep open (it has some logic that takes into account system files >> allowed to be open, and already opened). > >Makes sense. > >> Offhand, what I am thinking is, we need to close the file descriptor >> before returning from ReorderBufferRestoreChanges(), and keep track of >> the file offset and file path, so that next time we can resume reading >> from there. > >I think doing this all the time would make restore very slow -- there's a >reason we keep the files open, after all. How much slower? It certainly will have a hit, but maybe it's negligible compared to all the other stuff happening in this code? >It would be better if we can keep the descriptors open as much as >possible, and only close them if there's trouble. I was under the >impression that using OpenTransientFile was already taking care of that, >but that's evidently not the case. > I don't see how the current API could do that transparently - it does track the files, but the user only gets a file descriptor. With just a file descriptor, how could the code know to do reopen/seek when it's going just through the regular fopen/fclose? Anyway, I agree we need to do something, to fix this corner case (many serialized in-progress transactions). ISTM we have two options - either do something in the context of reorderbuffer.c, or extend the transient file API somehow. I'd say the second option is the right thing going forward, because it does allow doing it transparently and without leaking details about maxAllocatedDescs etc. There are two issues, though - it does require changes / extensions to the API, and it's not backpatchable. So maybe we should start with the localized fix in reorderbuffer, and I agree tracking offset seems reasonable. As a sidenote - in the other thread about streaming, one of the patches does change how we log subxact assignments. In the end, this allows using just a single file for the top-level transaction, instead of having one file per subxact. That would also solve this. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Sep-12, Tomas Vondra wrote: > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote: > > On 2019-Sep-11, Amit Khandekar wrote: > > I think doing this all the time would make restore very slow -- there's a > > reason we keep the files open, after all. > > How much slower? It certainly will have a hit, but maybe it's negligible > compared to all the other stuff happening in this code? I dunno -- that's a half-assed guess based on there being many more syscalls, and on the fact that the API is how it is in the first place. (Andres did a lot of perf benchmarking and tweaking back when he was writing this ... I just point out that I have a colleague that had to invent *a lot* of new MemCxt infrastructure in order to make some of Andres' perf optimizations cleaner, just as a semi-related data point. Anyway, I digress.) Anyway, such a fix would pessimize all cases, including every single case that works today (which evidently is almost every single usage of this feature, since we hadn't heard of this problem until yesterday), in order to solve a problem that you can only find in very rare ones. Another point of view is that we should make it work first, then make it fast. But the point remains that it works fine and fast for 99.99% of cases. > > It would be better if we can keep the descriptors open as much as > > possible, and only close them if there's trouble. I was under the > > impression that using OpenTransientFile was already taking care of that, > > but that's evidently not the case. > > I don't see how the current API could do that transparently - it does > track the files, but the user only gets a file descriptor. With just a > file descriptor, how could the code know to do reopen/seek when it's going > just through the regular fopen/fclose? Yeah, I don't know what was in Amit's mind, but it seemed obvious to me that such a fix required changing that API so that a seekpos is kept together with the fd. ReorderBufferRestoreChange is static in reorderbuffer.c so it's not a problem to change its ABI. I agree with trying to get a reorderbuffer-localized back-patchable fix first, then we how to improve from that. > As a sidenote - in the other thread about streaming, one of the patches > does change how we log subxact assignments. In the end, this allows using > just a single file for the top-level transaction, instead of having one > file per subxact. That would also solve this. :-( While I would love to get that patch done and get rid of the issue, it's not a backpatchable fix either. ... however, it does mean we maybe can get away with the reorderbuffer.c-local fix, and then just use your streaming stuff to get rid of the perf problem forever? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I don't see how the current API could do that transparently - it does > track the files, but the user only gets a file descriptor. With just a > file descriptor, how could the code know to do reopen/seek when it's going > just through the regular fopen/fclose? > > Anyway, I agree we need to do something, to fix this corner case (many > serialized in-progress transactions). ISTM we have two options - either do > something in the context of reorderbuffer.c, or extend the transient file > API somehow. I'd say the second option is the right thing going forward, > because it does allow doing it transparently and without leaking details > about maxAllocatedDescs etc. There are two issues, though - it does > require changes / extensions to the API, and it's not backpatchable. > > So maybe we should start with the localized fix in reorderbuffer, and I > agree tracking offset seems reasonable. We've already got code that knows how to track this sort of thing. You just need to go through the File abstraction (PathNameOpenFile or PathNameOpenFilePerm or OpenTemporaryFile) rather than using the functions that deal directly with fds (OpenTransientFile, BasicOpenFile, etc.). It seems like it would be better to reuse the existing VFD layer than to invent a whole new one specific to logical replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
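For readers unfamiliar with that layer, here is a minimal sketch of what reading a spill file through the VFD API in fd.c could look like. The signatures match the PostgreSQL 12-era storage/fd.h (PathNameOpenFile, FileRead with an explicit offset, FileClose), but the function itself and its error handling are illustrative, not the eventual patch:

/*
 * Illustrative sketch: reading a spill file through the VFD layer
 * instead of OpenTransientFile().  VFDs are LRU-managed by fd.c, so
 * kernel fds are closed automatically when max_safe_fds would be
 * exceeded.  FileRead() takes an explicit offset (PostgreSQL 12-era
 * API), so the caller tracks the position itself.
 */
#include "postgres.h"

#include <fcntl.h>

#include "storage/fd.h"

static void
read_spill_file(const char *path)
{
    File    vfd;
    off_t   offset = 0;         /* caller-side position tracking */
    char    buf[8192];
    int     nread;

    vfd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
    if (vfd < 0)
        elog(ERROR, "could not open file \"%s\"", path);

    /* last argument is the wait-event id; 0 here for illustration */
    while ((nread = FileRead(vfd, buf, sizeof(buf), offset, 0)) > 0)
        offset += nread;        /* remember where the next read resumes */

    if (nread < 0)
        elog(ERROR, "could not read file \"%s\"", path);

    FileClose(vfd);
}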
Hi, On 2019-09-12 09:41:02 -0400, Robert Haas wrote: > On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > I don't see how the current API could do that transparently - it does > > track the files, but the user only gets a file descriptor. With just a > > file descriptor, how could the code know to do reopen/seek when it's going > > just through the regular fopen/fclose? > > > > Anyway, I agree we need to do something, to fix this corner case (many > > serialized in-progress transactions). ISTM we have two options - either do > > something in the context of reorderbuffer.c, or extend the transient file > > API somehow. I'd say the second option is the right thing going forward, > > because it does allow doing it transparently and without leaking details > > about maxAllocatedDescs etc. There are two issues, though - it does > > require changes / extensions to the API, and it's not backpatchable. > > > > So maybe we should start with the localized fix in reorderbuffer, and I > > agree tracking offset seems reasonable. > > We've already got code that knows how to track this sort of thing. > You just need to go through the File abstraction (PathNameOpenFile or > PathNameOpenFilePerm or OpenTemporaryFile) rather than using the > functions that deal directly with fds (OpenTransientFile, > BasicOpenFile, etc.). It seems like it would be better to reuse the > existing VFD layer than to invent a whole new one specific to logical > replication. Yea, I agree that that is the right fix. Greetings, Andres Freund
Hi, On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote: > On 2019-Sep-12, Tomas Vondra wrote: > > > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote: > > > On 2019-Sep-11, Amit Khandekar wrote: > > > > I think doing this all the time would make restore very slow -- there's a > > > reason we keep the files open, after all. > > > > How much slower? It certainly will have a hit, but maybe it's negligible > > compared to all the other stuff happening in this code? I'd expect it to be significant. > > As a sidenote - in the other thread about streaming, one of the patches > > does change how we log subxact assignments. In the end, this allows using > > just a single file for the top-level transaction, instead of having one > > file per subxact. That would also solve this. Uhm, how is rollback to savepoint going to be handled in that case? I don't think it's great to just retain space for all rolled back subtransactions. Greetings, Andres Freund
On Thu, Sep 12, 2019 at 11:34:01AM -0700, Andres Freund wrote: >Hi, > >On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote: >> On 2019-Sep-12, Tomas Vondra wrote: >> >> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote: >> > > On 2019-Sep-11, Amit Khandekar wrote: >> >> > > I think doing this all the time would make restore very slow -- there's a >> > > reason we keep the files open, after all. >> > >> > How much slower? It certainly will have a hit, but maybe it's negligible >> > compared to all the other stuff happening in this code? > >I'd expect it to be significant. > > >> > As a sidenote - in the other thread about streaming, one of the patches >> > does change how we log subxact assignments. In the end, this allows using >> > just a single file for the top-level transaction, instead of having one >> > file per subxact. That would also solve this. > >Uhm, how is rollback to savepoint going to be handled in that case? I >don't think it's great to just retain space for all rolled back >subtransactions. > The code can just do ftruncate() to the position of the subtransaction. That's what the patch [1] does. [1] https://commitfest.postgresql.org/24/1927/ regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
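A rough sketch of that idea, with hypothetical names (the real change lives in the reorderbuffer serialization code of the referenced patch): record the spill file offset when a subtransaction starts, and truncate back to it if that subtransaction rolls back.

/*
 * Illustrative sketch only (hypothetical names): with a single spill
 * file per top-level transaction, rolling back to a savepoint can
 * simply truncate the file back to the offset recorded when the
 * subtransaction started, so no space is retained for its changes.
 */
#include <sys/types.h>
#include <unistd.h>

static int
discard_subxact_changes(int spill_fd, off_t subxact_start_offset)
{
    /* drop everything the aborted subxact appended to the spill file */
    return ftruncate(spill_fd, subxact_start_offset);
}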
On Thu, 12 Sep 2019 at 19:11, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > I don't see how the current API could do that transparently - it does > > track the files, but the user only gets a file descriptor. With just a > > file descriptor, how could the code know to do reopen/seek when it's going > > just through the regular fopen/fclose? > > > > Anyway, I agree we need to do something, to fix this corner case (many > > serialized in-progress transactions). ISTM we have two options - either do > > something in the context of reorderbuffer.c, or extend the transient file > > API somehow. I'd say the second option is the right thing going forward, > > because it does allow doing it transparently and without leaking details > > about maxAllocatedDescs etc. There are two issues, though - it does > > require changes / extensions to the API, and it's not backpatchable. > > > > So maybe we should start with the localized fix in reorderbuffer, and I > > agree tracking offset seems reasonable. > > We've already got code that knows how to track this sort of thing. You mean tracking excess kernel fds right ? Yeah, we can use VFDs so that excess fds are automatically closed. But Alvaro seems to be talking in context of tracking of file seek position. VFD does not have a mechanism to track file offsets if one of the vfd cached file is closed and reopened. Robert, are you suggesting to add this capability to VFD ? I agree that we could do it, but for back-patching, offhand I couldn't think of a simpler way. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Amit Khandekar <amitdkhan.pg@gmail.com> writes: > You mean tracking excess kernel fds right ? Yeah, we can use VFDs so > that excess fds are automatically closed. But Alvaro seems to be > talking in context of tracking of file seek position. VFD does not > have a mechanism to track file offsets if one of the vfd cached file > is closed and reopened. Hm. It used to, but somebody got rid of that on the theory that we could use pread/pwrite instead. I'm inclined to think that that was the right tradeoff, but it'd mean that getting logical decoding to adhere to the VFD API requires extra work to track file position on the caller side. Again, though, the advice that's been given here is that we should fix logical decoding to use the VFD API as it stands, not change that API. I concur with that. regards, tom lane
On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Khandekar <amitdkhan.pg@gmail.com> writes: > > You mean tracking excess kernel fds right ? Yeah, we can use VFDs so > > that excess fds are automatically closed. But Alvaro seems to be > > talking in context of tracking of file seek position. VFD does not > > have a mechanism to track file offsets if one of the vfd cached file > > is closed and reopened. > > Hm. It used to, but somebody got rid of that on the theory that > we could use pread/pwrite instead. I'm inclined to think that that > was the right tradeoff, but it'd mean that getting logical decoding > to adhere to the VFD API requires extra work to track file position > on the caller side. Oops. I forgot that we'd removed that. > Again, though, the advice that's been given here is that we should > fix logical decoding to use the VFD API as it stands, not change > that API. I concur with that. A reasonable position. So I guess logical decoding has to track the file position itself, but perhaps use the VFD layer for managing FD pooling. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Again, though, the advice that's been given here is that we should
> > fix logical decoding to use the VFD API as it stands, not change
> > that API. I concur with that.
>
> A reasonable position. So I guess logical decoding has to track the
> file position itself, but perhaps use the VFD layer for managing FD
> pooling.

Yeah, something like the attached patch. I think this tracking of offsets would have been cleaner if we added in-built support in VFD. But yeah, for the back branches at least, we need to handle it outside of VFD. Or maybe we would add it if we find one more use-case.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachment
Amit Khandekar <amitdkhan.pg@gmail.com> writes: > On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Again, though, the advice that's been given here is that we should >>> fix logical decoding to use the VFD API as it stands, not change >>> that API. I concur with that. >> A reasonable position. So I guess logical decoding has to track the >> file position itself, but perhaps use the VFD layer for managing FD >> pooling. > Yeah, something like the attached patch. I think this tracking of > offsets would have been cleaner if we add in-built support in VFD. But > yeah, for bank branches at least, we need to handle it outside of VFD. > Or may be we would add it if we find one more use-case. Again, we had that and removed it, for what seem to me to be solid reasons. It adds cycles when we're forced to close/reopen a file, and it also adds failure modes that we could do without (ie, failure of either the ftell or the lseek, which are particularly nasty because they shouldn't happen according to the VFD abstraction). I do not think there is going to be any argument strong enough to make us put it back, especially not for non-mainstream callers like logical decoding. regards, tom lane
Hi, On 2019-09-14 14:34:21 -0400, Tom Lane wrote: > Amit Khandekar <amitdkhan.pg@gmail.com> writes: > > On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Again, though, the advice that's been given here is that we should > >>> fix logical decoding to use the VFD API as it stands, not change > >>> that API. I concur with that. > > >> A reasonable position. So I guess logical decoding has to track the > >> file position itself, but perhaps use the VFD layer for managing FD > >> pooling. > > > Yeah, something like the attached patch. I think this tracking of > > offsets would have been cleaner if we add in-built support in VFD. But > > yeah, for bank branches at least, we need to handle it outside of VFD. > > Or may be we would add it if we find one more use-case. > > Again, we had that and removed it, for what seem to me to be solid > reasons. It adds cycles when we're forced to close/reopen a file, > and it also adds failure modes that we could do without (ie, failure > of either the ftell or the lseek, which are particularly nasty because > they shouldn't happen according to the VFD abstraction). I do not > think there is going to be any argument strong enough to make us > put it back, especially not for non-mainstream callers like logical > decoding. Yea, I think that's the right call. Avoiding kernel seeks is quite worthwhile, and we shouldn't undo it just because of this usecase. And that'll become more and more important performance-wise (and has already done so, with all the intel fixes making syscalls much slower). I could see an argument for adding a separate generic layer providing position tracking ontop of the VFD abstraction however. Seems quite possible that there's some other parts of the system that could benefit from using VFDs rather than plain fds. And they'd probably also need the positional tracking. Greetings, Andres Freund
On Tue, 17 Sep 2019 at 21:19, Andres Freund <andres@anarazel.de> wrote: > On 2019-09-14 14:34:21 -0400, Tom Lane wrote: > > Amit Khandekar <amitdkhan.pg@gmail.com> writes: > > > Yeah, something like the attached patch. I think this tracking of > > > offsets would have been cleaner if we add in-built support in VFD. But > > > yeah, for bank branches at least, we need to handle it outside of VFD. > > > Or may be we would add it if we find one more use-case. > > > > Again, we had that and removed it, for what seem to me to be solid > > reasons. It adds cycles when we're forced to close/reopen a file, > > and it also adds failure modes that we could do without (ie, failure > > of either the ftell or the lseek, which are particularly nasty because > > they shouldn't happen according to the VFD abstraction). Ok. So you mean, when the caller would call FileRead() for sequential reading, underneath VFD would do a pread(), but if pread() returns error, the errno can belong to read() or it might as well belong to lseek(). If it's due to lseek(), it's not expected from the caller because for the caller it's just a sequential read. Yeah, makes sense. >> I do not > > think there is going to be any argument strong enough to make us > > put it back, especially not for non-mainstream callers like logical > > decoding. Ok. Also, more about putting back is in the below comments ... > > Yea, I think that's the right call. Avoiding kernel seeks is quite > worthwhile, and we shouldn't undo it just because of this usecase. And > that'll become more and more important performance-wise (and has already > done so, with all the intel fixes making syscalls much slower). By the way, I was not thinking about adding back the read() and lseek() calls. I was saying we continue to use the pread() call, so it's just a single system call. FileReadAt(..., offset) would do pread() with user-supplied offset, and FileRead() would do pread() using internally tracked offset. So for the user, FileReadAt() is like pread(), and FileRead() would be like read(). But I agree with Tom's objection about having to unnecessarily handle lseek error codes. > > I could see an argument for adding a separate generic layer providing > position tracking ontop of the VFD abstraction however. Seems quite > possible that there's some other parts of the system that could benefit > from using VFDs rather than plain fds. And they'd probably also need the > positional tracking. Yeah, that also could be done. Probably, for now at least, what everyone seems to agree is to take my earlier attached patch forward. I am going to see if I can add a TAP test for the patch, and will add the patch into the commitfest soon. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
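A sketch of the API shape being described here. This is not an existing fd.c interface; the sequential variant is renamed FileReadSeq to avoid clashing with the real FileRead. Both calls boil down to a single positioned read underneath (the existing FileRead with an explicit offset, as in the PostgreSQL 12-era API), so no lseek() and no extra failure mode is introduced:

/*
 * Hypothetical sketch -- NOT part of fd.c.  A thin wrapper keeps the
 * current position, so FileReadSeq() behaves like read() while
 * FileReadAt() behaves like pread(); both issue one positioned read
 * via the existing FileRead(file, buf, amount, offset, wait_event).
 */
#include "postgres.h"
#include "storage/fd.h"

typedef struct PositionedFile
{
    File    vfd;            /* VFD handle from PathNameOpenFile() */
    off_t   pos;            /* internally tracked offset */
} PositionedFile;

static int
FileReadAt(PositionedFile *f, char *buf, int amount, off_t offset)
{
    /* like pread(): caller-supplied offset, tracked position unchanged */
    return FileRead(f->vfd, buf, amount, offset, 0);
}

static int
FileReadSeq(PositionedFile *f, char *buf, int amount)
{
    /* like read(): use and advance the tracked offset */
    int     nread = FileRead(f->vfd, buf, amount, f->pos, 0);

    if (nread > 0)
        f->pos += nread;
    return nread;
}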
On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Probably, for now at least, what everyone seems to agree is to take my > earlier attached patch forward. > > I am going to see if I can add a TAP test for the patch, and will add > the patch into the commitfest soon. Attached is an updated patch v2. Has a new test scenario added in contrib/test_decoding/sql/spill test, and some minor code cleanup. Going to add this into Nov commitfest. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Attachment
On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > Probably, for now at least, what everyone seems to agree is to take my
> > earlier attached patch forward.
> >
> > I am going to see if I can add a TAP test for the patch, and will add
> > the patch into the commitfest soon.
>
> Attached is an updated patch v2.
>

I see that you have made changes in ReorderBufferRestoreChanges to use PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a reason for the same? In my test environment, with the test provided by you, I got the error (reported in this thread) via ReorderBufferSerializeTXN. See call stack below:

!errfinish(int dummy=0, ...)  Line 441  C
!OpenTransientFilePerm(const char * fileName=0x012deeac, int fileFlags=33033, unsigned short fileMode=384)  Line 2272 + 0x57 bytes  C
!OpenTransientFile(const char * fileName=0x012deeac, int fileFlags=33033)  Line 2256 + 0x15 bytes  C
!ReorderBufferSerializeTXN(ReorderBuffer * rb=0x01ee4d80, ReorderBufferTXN * txn=0x1f9a6ce8)  Line 2302 + 0x11 bytes  C
!ReorderBufferIterTXNInit(ReorderBuffer * rb=0x01ee4d80, ReorderBufferTXN * txn=0x01f08f80)  Line 1044 + 0xd bytes  C

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > I see that you have made changes in ReorderBufferRestoreChanges to use > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a > reason for the same? In my test environment, with the test provided > by you, I got the error (reported in this thread) via > ReorderBufferSerializeTXN. You didn't get this error with the patch applied, did you ? If you were debugging this without the patch applied, I suspect that the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is generating this error is because the max limit must be already crossed because of earlier calls to ReorderBufferRestoreChanges(). Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is sufficient because the code in that function has made sure the fd gets closed there itself. If you are getting this error even with the patch applied, then this needs investigation. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Mon, Nov 18, 2019 at 5:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Probably, for now at least, what everyone seems to agree is to take my > > > earlier attached patch forward. > > > > > > I am going to see if I can add a TAP test for the patch, and will add > > > the patch into the commitfest soon. > > > > Attached is an updated patch v2. > > > > I see that you have made changes in ReorderBufferRestoreChanges to use > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a > reason for the same? > I have one more question regarding this patch. It seems to me that the files opened via OpenTransientFile or OpenTemporaryFile are automatically closed at transaction end(abort), but that doesn't seem to be the case for files opened with PathNameOpenFile. See AtEOXact_Files and AtEOSubXact_Files. So, now with the change proposed by this patch, don't we need to deal it in some other way? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 18 Nov 2019 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > I have one more question regarding this patch. It seems to me that > the files opened via OpenTransientFile or OpenTemporaryFile are > automatically closed at transaction end(abort), but that doesn't seem > to be the case for files opened with PathNameOpenFile. See > AtEOXact_Files and AtEOSubXact_Files. So, now with the change > proposed by this patch, don't we need to deal it in some other way? For the API's that use VFDs (like PathNameOpenFile), the files opened are always recorded in the VfdCache array. So it is not required to do the cleanup at (sub)transaction end, because the kernel fds get closed dynamically in ReleaseLruFiles() whenever they reach max_safe_fds limit. So if a transaction aborts, the fds might remain open, but those will get cleaned up whenever we require more fds, through ReleaseLruFiles(). Whereas, for files opened through OpenTransientFile(), VfdCache is not involved, so this needs transaction end cleanup. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I see that you have made changes in ReorderBufferRestoreChanges to use
> > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a
> > reason for the same? In my test environment, with the test provided
> > by you, I got the error (reported in this thread) via
> > ReorderBufferSerializeTXN.
>
> You didn't get this error with the patch applied, did you ?
>

No, I got this before applying the patch. However, after applying the patch, I got below error in the same test:

postgres=# SELECT 1 from pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR:  could not read from reorderbuffer spill file: Invalid argument

It seems to me that the FileRead API used in the patch can return value < 0 on EOF. See the API usage in BufFileLoadBuffer. I got this error on a Windows machine, and in the server log the message was "LOG: unrecognized win32 error code: 38", which indicates "Reached the end of the file."

> If you were debugging this without the patch applied, I suspect that
> the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is
> generating this error is because the max limit must be already crossed
> because of earlier calls to ReorderBufferRestoreChanges().
>
> Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
> sufficient because the code in that function has made sure the fd gets
> closed there itself.
>

Okay, then we might not need it there, but we should at least add a comment in ReorderBufferRestoreChanges to explain why we have used a different function to operate on the file at that place.

>
> For the API's that use VFDs (like PathNameOpenFile), the files opened
> are always recorded in the VfdCache array. So it is not required to do
> the cleanup at (sub)transaction end, because the kernel fds get closed
> dynamically in ReleaseLruFiles() whenever they reach max_safe_fds
> limit. So if a transaction aborts, the fds might remain open, but
> those will get cleaned up whenever we require more fds, through
> ReleaseLruFiles(). Whereas, for files opened through
> OpenTransientFile(), VfdCache is not involved, so this needs
> transaction end cleanup.
>

Have you tried by injecting some error? After getting the error mentioned above in email, when I retried the same query, I got the below message.

postgres=# SELECT 1 from pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR:  could not remove file "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during removal of pg_replslot/regression_slot/xid*: Permission denied

And, then I tried to drop the replication slot and I got below error.

postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot');
ERROR:  could not rename file "pg_replslot/regression_slot" to "pg_replslot/regression_slot.tmp": Permission denied

It might be something related to Windows, but you can once try by injecting some error after reading a few files in the code path and see the behavior.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I see that you have made changes in ReorderBufferRestoreChanges to use > > > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a > > > reason for the same? In my test environment, with the test provided > > > by you, I got the error (reported in this thread) via > > > ReorderBufferSerializeTXN. > > > > You didn't get this error with the patch applied, did you ? > > > > No, I got this before applying the patch. However, after applying the > patch, I got below error in the same test: > > postgres=# SELECT 1 from > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > ERROR: could not read from reorderbuffer spill file: Invalid argument > > It seems to me that FileRead API used in the patch can return value < > 0 on EOF. See the API usage in BufFileLoadBuffer. I got this error > on a windows machine and in the server log the message was "LOG: > unrecognized win32 error code: 38" which indicates "Reached the end of > the file." On Windows, it is documented that ReadFile() (which is called by pg_pread) will return false on EOF but only when the file is open for asynchronous reads/writes. But here we are just dealing with usual synchronous reads. So pg_pread() code should indeed return 0 on EOF on Windows. Not yet able to figure out how FileRead() managed to return this error on Windows. But from your symptoms, it does look like pg_pread()=>ReadFile() returned false (despite doing asynchronous reads), and so _dosmaperr() gets called, and then it does not find the eof error in doserrors[], so the "unrecognized win32 error code" message is printed. May have to dig up more on this. > > > If you were debugging this without the patch applied, I suspect that > > the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is > > generating this error is because the max limit must be already crossed > > because of earlier calls to ReorderBufferRestoreChanges(). > > > > Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is > > sufficient because the code in that function has made sure the fd gets > > closed there itself. > > > > Okay, then we might not need it there, but we should at least add a > comment in ReorderBufferRestoreChanges to explain why we have used a > different function to operate on the file at that place. Yeah, that might make sense. > > > > > For the API's that use VFDs (like PathNameOpenFile), the files opened > > are always recorded in the VfdCache array. So it is not required to do > > the cleanup at (sub)transaction end, because the kernel fds get closed > > dynamically in ReleaseLruFiles() whenever they reach max_safe_fds > > limit. So if a transaction aborts, the fds might remain open, but > > those will get cleaned up whenever we require more fds, through > > ReleaseLruFiles(). Whereas, for files opened through > > OpenTransientFile(), VfdCache is not involved, so this needs > > transaction end cleanup. > > > > Have you tried by injecting some error? After getting the error > mentioned above in email, when I retried the same query, I got the > below message. 
> > postgres=# SELECT 1 from > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > ERROR: could not remove file > "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during > removal of pg_replslot/regression_slot/xid*: Permission denied > > And, then I tried to drop the replication slot and I got below error. > postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot'); > ERROR: could not rename file "pg_replslot/regression_slot" to > "pg_replslot/regression_slot.tmp": Permission denied > > It might be something related to Windows Oh ok, I missed the fact that on Windows we can't delete the files that are already open, unlike Linux/Unix. I guess, I may have to use FD_CLOSE_AT_EOXACT flags; or simply use OpenTemporaryFile(). I wonder though if this same issue might come up for the other use-case of PathNameOpenFile() : logical_rewrite_log_mapping(). > but you can once try by > injecting some error after reading a few files in the code path and > see the behavior. Yeah, will check the behaviour, although on Linux, I think I won't get this error. But yes, like I mentioned above, I think we might have to arrange for something. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > No, I got this before applying the patch. However, after applying the > > patch, I got below error in the same test: > > > > postgres=# SELECT 1 from > > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > > ERROR: could not read from reorderbuffer spill file: Invalid argument > > > > It seems to me that FileRead API used in the patch can return value < > > 0 on EOF. See the API usage in BufFileLoadBuffer. I got this error > > on a windows machine and in the server log the message was "LOG: > > unrecognized win32 error code: 38" which indicates "Reached the end of > > the file." > > On Windows, it is documented that ReadFile() (which is called by > pg_pread) will return false on EOF but only when the file is open for > asynchronous reads/writes. But here we are just dealing with usual > synchronous reads. So pg_pread() code should indeed return 0 on EOF on > Windows. Not yet able to figure out how FileRead() managed to return > this error on Windows. But from your symptoms, it does look like > pg_pread()=>ReadFile() returned false (despite doing asynchronous > reads), and so _dosmaperr() gets called, and then it does not find the > eof error in doserrors[], so the "unrecognized win32 error code" > message is printed. May have to dig up more on this. Hmm. See also this report: https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
From: Juan José Santamaría Flecha
On Tue, Nov 19, 2019 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > No, I got this before applying the patch. However, after applying the
> > patch, I got below error in the same test:
> >
> > postgres=# SELECT 1 from
> > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
> > ERROR: could not read from reorderbuffer spill file: Invalid argument
> >
> > It seems to me that FileRead API used in the patch can return value <
> > 0 on EOF. See the API usage in BufFileLoadBuffer. I got this error
> > on a windows machine and in the server log the message was "LOG:
> > unrecognized win32 error code: 38" which indicates "Reached the end of
> > the file."
>
> On Windows, it is documented that ReadFile() (which is called by
> pg_pread) will return false on EOF but only when the file is open for
> asynchronous reads/writes. But here we are just dealing with usual
> synchronous reads. So pg_pread() code should indeed return 0 on EOF on
> Windows. Not yet able to figure out how FileRead() managed to return
> this error on Windows. But from your symptoms, it does look like
> pg_pread()=>ReadFile() returned false (despite doing asynchronous
> reads), and so _dosmaperr() gets called, and then it does not find the
> eof error in doserrors[], so the "unrecognized win32 error code"
> message is printed. May have to dig up more on this.
Hmm. See also this report:
https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com
The files from pgwin32_open() are open for synchronous access, while pg_pread() uses the asynchronous functionality to offset the read. Under these circumstances, a read past EOF will return ERROR_HANDLE_EOF (38), as explained in:
https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863
Regards,
Juan José Santamaría Flecha
On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:
> On Tue, Nov 19, 2019 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> > On Windows, it is documented that ReadFile() (which is called by
>> > pg_pread) will return false on EOF but only when the file is open for
>> > asynchronous reads/writes. But here we are just dealing with usual
>> > synchronous reads. So pg_pread() code should indeed return 0 on EOF on
>> > Windows. Not yet able to figure out how FileRead() managed to return
>> > this error on Windows. But from your symptoms, it does look like
>> > pg_pread()=>ReadFile() returned false (despite doing asynchronous
>> > reads), and so _dosmaperr() gets called, and then it does not find the
>> > eof error in doserrors[], so the "unrecognized win32 error code"
>> > message is printed. May have to dig up more on this.
>>
>> Hmm. See also this report:
>>
>> https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com
>>
>
> The files from pgwin32_open() are open for synchronous access, while pg_pread() uses the asynchronous functionality to offset the read. Under these circumstances, a read past EOF will return ERROR_HANDLE_EOF (38), as explained in:

Oh, thanks.

> https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863

!?!

Amit, since it looks like you are Windows-enabled and have a repro, would you mind confirming that this fixes the problem?

--- a/src/port/pread.c
+++ b/src/port/pread.c
@@ -41,6 +41,9 @@ pg_pread(int fd, void *buf, size_t size, off_t offset)
 	overlapped.Offset = offset;
 	if (!ReadFile(handle, buf, size, &result, &overlapped))
 	{
+		if (GetLastError() == ERROR_HANDLE_EOF)
+			return 0;
+
 		_dosmaperr(GetLastError());
 		return -1;
 	}
On Wed, Nov 20, 2019 at 7:58 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha > > https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863 > > !?! One thing I don't understand (besides, apparently, the documentation): how did this problem escape detection by check-world for such a long time? Surely we expect to hit the end of various temporary files in various tests. Is it intermittent, or dependent on Windows version, or something like that?
On Wed, 20 Nov 2019 at 01:05, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 7:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha
> > > https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863
> >
> > !?!

Thanks Juan and Thomas for pointing to these links where this was already discussed.

> One thing I don't understand (besides, apparently, the documentation):
> how did this problem escape detection by check-world for such a long
> time? Surely we expect to hit the end of various temporary files in
> various tests. Is it intermittent, or dependent on Windows version,
> or something like that?

Possibly there aren't any callers who try to pread() at end-of-file using FileRead/pg_pread:

- mdread() seems to read from an offset which it knows is inside the end of file, including the whole BLCKSZ.
- BufFileLoadBuffer() seems to deliberately ignore FileRead()'s return value if it is -1:
      if (file->nbytes < 0)
          file->nbytes = 0;
- XLogPageRead() also seems to know that the offset is a valid offset.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Nov 20, 2019 at 12:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha
> <juanjo.santamaria@gmail.com> wrote:
> > On Tue, Nov 19, 2019 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >> On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >> > On Windows, it is documented that ReadFile() (which is called by
> >> > pg_pread) will return false on EOF but only when the file is open for
> >> > asynchronous reads/writes. But here we are just dealing with usual
> >> > synchronous reads. So pg_pread() code should indeed return 0 on EOF on
> >> > Windows. Not yet able to figure out how FileRead() managed to return
> >> > this error on Windows. But from your symptoms, it does look like
> >> > pg_pread()=>ReadFile() returned false (despite doing asynchronous
> >> > reads), and so _dosmaperr() gets called, and then it does not find the
> >> > eof error in doserrors[], so the "unrecognized win32 error code"
> >> > message is printed. May have to dig up more on this.
> >>
> >> Hmm. See also this report:
> >>
> >> https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com
> >>
> >
> > The files from pgwin32_open() are open for synchronous access, while pg_pread() uses the asynchronous functionality to offset the read. Under these circumstances, a read past EOF will return ERROR_HANDLE_EOF (38), as explained in:
>
> Oh, thanks.
>
> > https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863
>
> !?!
>
> Amit, since it looks like you are Windows-enabled and have a repro,
> would you mind confirming that this fixes the problem?
>
> --- a/src/port/pread.c
> +++ b/src/port/pread.c
> @@ -41,6 +41,9 @@ pg_pread(int fd, void *buf, size_t size, off_t offset)
>  	overlapped.Offset = offset;
>  	if (!ReadFile(handle, buf, size, &result, &overlapped))
>  	{
> +		if (GetLastError() == ERROR_HANDLE_EOF)
> +			return 0;
> +
>  		_dosmaperr(GetLastError());
>  		return -1;
>  	}

Yes, this works for me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 4:54 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > - BufFileLoadBuffer() seems to deliberately ignore FileRead()'s return > value if it is -1 > if (file->nbytes < 0) file->nbytes = 0; Ok, that's a different problem we need to fix then. But it does explain how we didn't know. And sure enough there is "unrecognized win32 error code: 38" LOG-spam on the build farm, at places where tuplestores are expected: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=whelk&dt=2019-11-20%2002%3A41%3A41&stg=check
On Tue, Nov 19, 2019 at 4:58 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > For the API's that use VFDs (like PathNameOpenFile), the files opened > > > are always recorded in the VfdCache array. So it is not required to do > > > the cleanup at (sub)transaction end, because the kernel fds get closed > > > dynamically in ReleaseLruFiles() whenever they reach max_safe_fds > > > limit. So if a transaction aborts, the fds might remain open, but > > > those will get cleaned up whenever we require more fds, through > > > ReleaseLruFiles(). Whereas, for files opened through > > > OpenTransientFile(), VfdCache is not involved, so this needs > > > transaction end cleanup. > > > > > > > Have you tried by injecting some error? After getting the error > > mentioned above in email, when I retried the same query, I got the > > below message. > > > > postgres=# SELECT 1 from > > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > > ERROR: could not remove file > > "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during > > removal of pg_replslot/regression_slot/xid*: Permission denied > > > > And, then I tried to drop the replication slot and I got below error. > > postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot'); > > ERROR: could not rename file "pg_replslot/regression_slot" to > > "pg_replslot/regression_slot.tmp": Permission denied > > > > It might be something related to Windows > > Oh ok, I missed the fact that on Windows we can't delete the files > that are already open, unlike Linux/Unix. > I guess, I may have to use FD_CLOSE_AT_EOXACT flags; or simply use > OpenTemporaryFile(). > I think setting FD_CLOSE_AT_EOXACT won't work unless you also set have_xact_temporary_files because it checks that flag in CleanupTempFiles. Also, OpenTemporaryFile() doesn't take the input file path, so how will you use it? > I wonder though if this same issue might come up > for the other use-case of PathNameOpenFile() : > logical_rewrite_log_mapping(). > It is possible, but I haven't tested that path. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 20, 2019 at 12:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > +		if (GetLastError() == ERROR_HANDLE_EOF)
> > +			return 0;
> Yes, this works for me.

Thanks, pushed.
On Tue, Nov 19, 2019 at 4:58 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Have you tried by injecting some error? After getting the error > > mentioned above in email, when I retried the same query, I got the > > below message. > > > > postgres=# SELECT 1 from > > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > > ERROR: could not remove file > > "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during > > removal of pg_replslot/regression_slot/xid*: Permission denied > > > > And, then I tried to drop the replication slot and I got below error. > > postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot'); > > ERROR: could not rename file "pg_replslot/regression_slot" to > > "pg_replslot/regression_slot.tmp": Permission denied > > > > It might be something related to Windows > > Oh ok, I missed the fact that on Windows we can't delete the files > that are already open, unlike Linux/Unix. > See comment in pgunlink() "We need to loop because even though PostgreSQL uses flags that allow unlink while the file is open, other applications might have the file open without those flags.". Can you once see if there is any flag that you have missed to pass to allow this? If there is nothing we can do about it, then we might need to use some different API or maybe define a new API that can handle this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 19, 2019 at 4:58 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Have you tried by injecting some error? After getting the error > > > mentioned above in email, when I retried the same query, I got the > > > below message. > > > > > > postgres=# SELECT 1 from > > > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > > > ERROR: could not remove file > > > "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during > > > removal of pg_replslot/regression_slot/xid*: Permission denied > > > > > > And, then I tried to drop the replication slot and I got below error. > > > postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot'); > > > ERROR: could not rename file "pg_replslot/regression_slot" to > > > "pg_replslot/regression_slot.tmp": Permission denied > > > > > > It might be something related to Windows > > > > Oh ok, I missed the fact that on Windows we can't delete the files > > that are already open, unlike Linux/Unix. > > > > See comment in pgunlink() "We need to loop because even though > PostgreSQL uses flags that allow unlink while the file is open, other > applications might have the file > open without those flags.". Can you once see if there is any flag > that you have missed to pass to allow this? If there is nothing we > can do about it, then we might need to use some different API or maybe > define a new API that can handle this. Hmm, looks like there is one such flag: FILE_SHARE_DELETE. When file is opened with this flag, other processes can delete as well as rename the file. But it turns out that in pgwin32_open(), we already use FILE_SHARE_DELETE. So, this is again confusing. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
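For reference, this is roughly what the sharing flags look like at the Win32 level. This is a standalone sketch, not pgwin32_open() itself: the third argument to CreateFileA is the share mode, and omitting FILE_SHARE_DELETE there is what would normally block deletes and renames while a handle is open.

/*
 * Standalone Win32 sketch (not pgwin32_open()).  FILE_SHARE_DELETE in
 * the share mode is what allows other handles to delete or rename the
 * file while this one is open; pgwin32_open() already passes it,
 * which is why the remaining "Permission denied" failures are
 * surprising.
 */
#include <windows.h>

HANDLE
open_shared_for_read(const char *path)
{
    return CreateFileA(path,
                       GENERIC_READ,
                       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                       NULL,                    /* default security */
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       NULL);                   /* no template file */
}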
On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > See comment in pgunlink() "We need to loop because even though > PostgreSQL uses flags that allow unlink while the file is open, other > applications might have the file > open without those flags.". Can you once see if there is any flag > that you have missed to pass to allow this? > If there is nothing we > can do about it, then we might need to use some different API or maybe > define a new API that can handle this. There were objections against modifying the vfd api only for this replication-related use-case. Having a new API will require all the changes required to enable the virtual FDs feature that we need from vfd. If nothing works out from the FILE_SHARE_DELETE thing, I am thinking, we can use VFD, plus we can keep track of per-subtransaction vfd handles, and do something similar to AtEOSubXact_Files(). -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
From
Juan José Santamaría Flecha
Date:
On Wed, Nov 20, 2019 at 9:48 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote:
> See comment in pgunlink() "We need to loop because even though
> PostgreSQL uses flags that allow unlink while the file is open, other
> applications might have the file
> open without those flags.". Can you once see if there is any flag
> that you have missed to pass to allow this?
> If there is nothing we
> can do about it, then we might need to use some different API or maybe
> define a new API that can handle this.
There were objections against modifying the vfd api only for this
replication-related use-case. Having a new API will require all the
changes required to enable the virtual FDs feature that we need from
vfd. If nothing works out from the FILE_SHARE_DELETE thing, I am
thinking, we can use VFD, plus we can keep track of per-subtransaction
vfd handles, and do something similar to AtEOSubXact_Files().
The comment about "other applications might have the file open without those flags." is surely due to systems working with an antivirus touching Postgres files.
I was not able to reproduce the Permission denied error with current HEAD until I opened another CMD inside the "pg_replslot/regression_slot" folder. That will be problematic: is the deletion of the folder actually needed?
Regards,
Juan José Santamaría Flecha
On 2019-Nov-20, Juan José Santamaría Flecha wrote: > I was not able to reproduce the Permission denied error with current HEAD, > until I opened another CMD inside the "pg_replslot/regression_slot" folder. > This will be problematic, is the deletion of the folder actually needed? Yes :-( The code assumes that if the directory is there, then it's valid. Trying to remove that assumption is probably a more invasive fix. I think ReplicationSlotDropAcquired is too pessimistic (no recourse if the rename fails) and too optimistic (this will almost never happen). We could change it so that the rename is retried a few times, and avoid the failure. (Naturally, the rmtree should also be retried.) The code seems written with the POSIX semantics in mind, but it seems easy to improve. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
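For illustration, a rough POSIX-flavoured sketch of the retry idea; the attempt count and the 100 ms delay are arbitrary assumptions here, not values from the server code.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
rename_with_retries(const char *from, const char *to, int attempts)
{
    int         i;

    for (i = 0; i < attempts; i++)
    {
        if (rename(from, to) == 0)
            return 0;

        /*
         * Typical transient causes on Windows: another process (antivirus,
         * a shell sitting in the directory) briefly holds it open.
         */
        fprintf(stderr, "rename %s -> %s failed (%s), retrying\n",
                from, to, strerror(errno));
        usleep(100 * 1000);     /* wait 100 ms before the next attempt */
    }
    return -1;
}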
On Wed, 20 Nov 2019 at 19:24, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Nov-20, Juan José Santamaría Flecha wrote: > > > I was not able to reproduce the Permission denied error with current HEAD, > > until I opened another CMD inside the "pg_replslot/regression_slot" folder. > > This will be problematic, is the deletion of the folder actually needed? > > Yes :-( The code assumes that if the directory is there, then it's > valid. Trying to remove that assumption is probably a more invasive > fix. > > I think ReplicationSlotDropAcquired is too pessimistic (no recourse if > the rename fails) and too optimistic (this will almost never happen). > We could change it so that the rename is retried a few times, and avoid > the failure. (Naturally, the rmtree should also be retried.) The code > seems written with the POSIX semantics in mind, but it seems easy to > improve. Just to be clear, there are two issues being discussed here : 1. Issue with the patch, where pg_replslot/slotname/xid-*.spill files can't be removed because the same backend process has left these files opened because of an abort. This is happening despite the file being opened using FILE_SHARE_DELETE flag. I am going to investigate (possibly the flag is not applicable in case a single process is involved) 2. This existing issue where pg_replslot/slotname directory removal will fail if someone else is accessing this directory. This has nothing to do with the patch. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Wed, Nov 20, 2019 at 5:41 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 9:48 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >> >> On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > See comment in pgunlink() "We need to loop because even though >> > PostgreSQL uses flags that allow unlink while the file is open, other >> > applications might have the file >> > open without those flags.". Can you once see if there is any flag >> > that you have missed to pass to allow this? >> >> > If there is nothing we >> > can do about it, then we might need to use some different API or maybe >> > define a new API that can handle this. >> >> There were objections against modifying the vfd api only for this >> replication-related use-case. Having a new API will require all the >> changes required to enable the virtual FDs feature that we need from >> vfd. If nothing works out from the FILE_SHARE_DELETE thing, I am >> thinking, we can use VFD, plus we can keep track of per-subtransaction >> vfd handles, and do something similar to AtEOSubXact_Files(). >> > > The comment about "other applications might have the file open without those flags." is surely due to systems working withan antivirus touching Postgres files. > > I was not able to reproduce the Permission denied error with current HEAD, > I am not sure what exactly you tried. Can you share the steps and your environment details? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 2:18 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > See comment in pgunlink() "We need to loop because even though > > PostgreSQL uses flags that allow unlink while the file is open, other > > applications might have the file > > open without those flags.". Can you once see if there is any flag > > that you have missed to pass to allow this? > > > If there is nothing we > > can do about it, then we might need to use some different API or maybe > > define a new API that can handle this. > > There were objections against modifying the vfd api only for this > replication-related use-case. Having a new API will require all the > changes required to enable the virtual FDs feature that we need from > vfd. If nothing works out from the FILE_SHARE_DELETE thing, > While experimenting with FILE_SHARE_DELETE, I think you can once try to open/close the file before unlink. If you read the specs [1] of this flag, it seems they allow you to open the file with delete access even when it is already opened by someone else. I am not sure if that is helpful, but at least we can try out. > I am > thinking, we can use VFD, plus we can keep track of per-subtransaction > vfd handles, and do something similar to AtEOSubXact_Files(). > I think if we can't make the current API work, then it is better to sketch the design for this approach and probably the design of new API using existing infrastructure. Then we can see which approach people prefer. [1] - Enables subsequent open operations on a file or device to request delete access. Otherwise, other processes cannot open the file or device if they request delete access. (https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea) -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
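As an illustration of the sharing rule quoted above, a small Windows-only sketch: the delete succeeds while handles are still open only because every open handle granted FILE_SHARE_DELETE, and even then the file merely becomes delete-pending until the last handle is closed. The file name is made up and error handling is minimal.

#include <windows.h>
#include <stdio.h>

int
main(void)
{
    const char *path = "demo.spill";    /* hypothetical file name */
    HANDLE      h1,
                h2;

    h1 = CreateFileA(path, GENERIC_WRITE,
                     FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                     NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    h2 = CreateFileA(path, GENERIC_READ,
                     FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                     NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h1 == INVALID_HANDLE_VALUE || h2 == INVALID_HANDLE_VALUE)
        return 1;

    /* Succeeds: both existing handles allowed delete sharing. */
    if (!DeleteFileA(path))
        printf("DeleteFileA failed: %lu\n", (unsigned long) GetLastError());

    /* The file is now "delete pending" and disappears only here. */
    CloseHandle(h2);
    CloseHandle(h1);
    return 0;
}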
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
From
Juan José Santamaría Flecha
Date:
On Thu, Nov 21, 2019 at 5:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 20, 2019 at 5:41 PM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 9:48 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>
>> On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > See comment in pgunlink() "We need to loop because even though
>> > PostgreSQL uses flags that allow unlink while the file is open, other
>> > applications might have the file
>> > open without those flags.". Can you once see if there is any flag
>> > that you have missed to pass to allow this?
>>
>> > If there is nothing we
>> > can do about it, then we might need to use some different API or maybe
>> > define a new API that can handle this.
>>
>> There were objections against modifying the vfd api only for this
>> replication-related use-case. Having a new API will require all the
>> changes required to enable the virtual FDs feature that we need from
>> vfd. If nothing works out from the FILE_SHARE_DELETE thing, I am
>> thinking, we can use VFD, plus we can keep track of per-subtransaction
>> vfd handles, and do something similar to AtEOSubXact_Files().
>>
>
> The comment about "other applications might have the file open without those flags." is surely due to systems working with an antivirus touching Postgres files.
>
> I was not able to reproduce the Permission denied error with current HEAD,
>
I am not sure what exactly you tried. Can you share the steps and
your environment details?
Sure, I was trying to reproduce the Permission denied error after the ERROR_HANDLE_EOF fix.
1. Using a clean environment [1], the spill.sql script produces the expected output.
2. I manually injected a negative value for readBytes after @@ -2611,10 +2627,11 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn. In doing so, pg_logical_slot_get_changes() failed, but subsequent executions did not run into Permission denied.
3. During the cleanup of some of the tests, pg_drop_replication_slot() failed because the "pg_replslot/regression_slot" folder was in use.
[1] Win10 (1903) MSVC 19.22.27905
Regards,
Juan José Santamaría Flecha
On Thu, Nov 21, 2019 at 8:32 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote: > > On Thu, Nov 21, 2019 at 5:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Nov 20, 2019 at 5:41 PM Juan José Santamaría Flecha >> <juanjo.santamaria@gmail.com> wrote: >> > >> > On Wed, Nov 20, 2019 at 9:48 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >> >> >> >> On Wed, 20 Nov 2019 at 13:10, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > See comment in pgunlink() "We need to loop because even though >> >> > PostgreSQL uses flags that allow unlink while the file is open, other >> >> > applications might have the file >> >> > open without those flags.". Can you once see if there is any flag >> >> > that you have missed to pass to allow this? >> >> >> >> > If there is nothing we >> >> > can do about it, then we might need to use some different API or maybe >> >> > define a new API that can handle this. >> >> >> >> There were objections against modifying the vfd api only for this >> >> replication-related use-case. Having a new API will require all the >> >> changes required to enable the virtual FDs feature that we need from >> >> vfd. If nothing works out from the FILE_SHARE_DELETE thing, I am >> >> thinking, we can use VFD, plus we can keep track of per-subtransaction >> >> vfd handles, and do something similar to AtEOSubXact_Files(). >> >> >> > >> > The comment about "other applications might have the file open without those flags." is surely due to systems workingwith an antivirus touching Postgres files. >> > >> > I was not able to reproduce the Permission denied error with current HEAD, >> > >> >> I am not sure what exactly you tried. Can you share the steps and >> your environment details? >> > > Sure, I was trying to reproduce the Permission denied error after the ERROR_HANDLE_EOF fix. > Have you tried before that fix , if not, can you once try by temporarily reverting that fix in your environment and share the output of each step? After you get the error due to EOF, check that you have .spill files in pg_replslot/<slot_name>/ and then again try to get changes by pg_logical_slot_get_changes(). If you want, you can use the test provided in Amit Khandekar's patch. > > [1] Win10 (1903) MSVC 19.22.27905 > I have tested this on Windows7. I am not sure if it is due to a different version of windows, but I think we can't rule out that possibility. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 22 Nov 2019 at 09:08, Amit Kapila <amit.kapila16@gmail.com> wrote: > Have you tried before that fix , if not, can you once try by > temporarily reverting that fix in your environment and share the > output of each step? After you get the error due to EOF, check that > you have .spill files in pg_replslot/<slot_name>/ and then again try > to get changes by pg_logical_slot_get_changes(). If you want, you > can use the test provided in Amit Khandekar's patch. On my Linux machine, I added elog() in ReorderBufferRestoreChanges(), just after FileRead() returns 0. This results in error. But the thing is, in ReorderBufferCommit(), the error is already handled using PG_CATCH : PG_CATCH(); { ..... AbortCurrentTransaction(); ....... if (using_subtxn) RollbackAndReleaseCurrentSubTransaction(); ........ ........ /* remove potential on-disk data, and deallocate */ ReorderBufferCleanupTXN(rb, txn); } So ReorderBufferCleanupTXN() removes all the .spill files using unlink(). And on Windows, what should happen is : unlink() should succeed because the file is opened using FILE_SHARE_DELETE. But the files should still remain there because these are still open. It is just marked for deletion until there is no one having opened the file. That is what is my conclusion from running a sample attached program test.c . But what you are seeing is "Permission denied" errors. Not sure why unlink() is failing. The thing that is still a problem is : On Windows, if the file remains open, and later even when the unlink() succeeds, the file will be left there until it is closed. So subsequent operations will open the same old file. Not sure what happens if we open a file that is marked for deletion. - Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Attachment
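A sketch along the lines of the experiment described above (not the actual attached test.c): unlink a still-open file twice on Windows. The first delete succeeds but only marks the file delete-pending; the second one fails with error 5 (ERROR_ACCESS_DENIED), which is what the backend surfaces as "Permission denied". The file name is made up.

#include <windows.h>
#include <stdio.h>

int
main(void)
{
    const char *path = "xid-demo.spill";    /* hypothetical file name */
    HANDLE      h;
    BOOL        ok;

    h = CreateFileA(path, GENERIC_WRITE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* First unlink: succeeds, file becomes delete-pending (handle still open). */
    ok = DeleteFileA(path);
    printf("first DeleteFileA: %d\n", ok);

    /* Second unlink on the delete-pending file: fails with error 5. */
    ok = DeleteFileA(path);
    printf("second DeleteFileA: %d, error %lu\n", ok, (unsigned long) GetLastError());

    CloseHandle(h);             /* only now is the file actually removed */
    return 0;
}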
On Fri, Nov 22, 2019 at 11:00 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Fri, 22 Nov 2019 at 09:08, Amit Kapila <amit.kapila16@gmail.com> wrote: > > Have you tried before that fix , if not, can you once try by > > temporarily reverting that fix in your environment and share the > > output of each step? After you get the error due to EOF, check that > > you have .spill files in pg_replslot/<slot_name>/ and then again try > > to get changes by pg_logical_slot_get_changes(). If you want, you > > can use the test provided in Amit Khandekar's patch. > > On my Linux machine, I added elog() in ReorderBufferRestoreChanges(), > just after FileRead() returns 0. This results in error. But the thing is, in > ReorderBufferCommit(), the error is already handled using PG_CATCH : > > PG_CATCH(); > { > ..... > AbortCurrentTransaction(); > ....... > if (using_subtxn) > RollbackAndReleaseCurrentSubTransaction(); > ........ > ........ > /* remove potential on-disk data, and deallocate */ > ReorderBufferCleanupTXN(rb, txn); > } > > So ReorderBufferCleanupTXN() removes all the .spill files using unlink(). > > And on Windows, what should happen is : unlink() should succeed > because the file is opened using FILE_SHARE_DELETE. But the files > should still remain there because these are still open. It is just > marked for deletion until there is no one having opened the file. That > is what is my conclusion from running a sample attached program test.c > I think this is exactly the reason for the problem. In my test [1], the error "permission denied" occurred when I second time executed pg_logical_slot_get_changes() which means on first execution the unlink would have been successful but the files are still not removed as they were not closed. Then on second execution, it gets an error "Permission denied" when it again tries to unlink files via ReorderBufferCleanupSerializedTXNs(). . > But what you are seeing is "Permission denied" errors. Not sure why > unlink() is failing. > In your test program, if you try to unlink the file second time, you should see the error "Permission denied". [1] - https://www.postgresql.org/message-id/CAA4eK1%2Bcey6i6a0zD9kk_eaDXb4RPNZqu4UwXO9LbHAgMpMBkg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 22 Nov 2019 at 4:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 22, 2019 at 11:00 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Fri, 22 Nov 2019 at 09:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Have you tried before that fix , if not, can you once try by
> > temporarily reverting that fix in your environment and share the
> > output of each step? After you get the error due to EOF, check that
> > you have .spill files in pg_replslot/<slot_name>/ and then again try
> > to get changes by pg_logical_slot_get_changes(). If you want, you
> > can use the test provided in Amit Khandekar's patch.
>
> On my Linux machine, I added elog() in ReorderBufferRestoreChanges(),
> just after FileRead() returns 0. This results in error. But the thing is, in
> ReorderBufferCommit(), the error is already handled using PG_CATCH :
>
> PG_CATCH();
> {
> .....
> AbortCurrentTransaction();
> .......
> if (using_subtxn)
> RollbackAndReleaseCurrentSubTransaction();
> ........
> ........
> /* remove potential on-disk data, and deallocate */
> ReorderBufferCleanupTXN(rb, txn);
> }
>
> So ReorderBufferCleanupTXN() removes all the .spill files using unlink().
>
> And on Windows, what should happen is : unlink() should succeed
> because the file is opened using FILE_SHARE_DELETE. But the files
> should still remain there because these are still open. It is just
> marked for deletion until there is no one having opened the file. That
> is what is my conclusion from running a sample attached program test.c
>
I think this is exactly the reason for the problem. In my test [1],
the error "permission denied" occurred when I second time executed
pg_logical_slot_get_changes() which means on first execution the
unlink would have been successful but the files are still not removed
as they were not closed. Then on second execution, it gets an error
"Permission denied" when it again tries to unlink files via
ReorderBufferCleanupSerializedTXNs().
.
> But what you are seeing is "Permission denied" errors. Not sure why
> unlink() is failing.
>
In your test program, if you try to unlink the file second time, you
should see the error "Permission denied".
I tested using the sample program and indeed I got the error 5 (access denied) when I called unlink the second time.
[1] - https://www.postgresql.org/message-id/CAA4eK1%2Bcey6i6a0zD9kk_eaDXb4RPNZqu4UwXO9LbHAgMpMBkg%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
From
Juan José Santamaría Flecha
Date:
On Fri, Nov 22, 2019 at 4:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 21, 2019 at 8:32 PM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
>
> [1] Win10 (1903) MSVC 19.22.27905
>
I have tested this on Windows7. I am not sure if it is due to a
different version of windows, but I think we can't rule out that
possibility.
This seems to be the case. The unexpected behaviour is on my end: my system behaves as described for FILE_DISPOSITION_POSIX_SEMANTICS [1].
The expected behaviour is what you have already diagnosed.
[1] https://docs.microsoft.com/es-es/windows-hardware/drivers/ddi/ntddk/ns-ntddk-_file_disposition_information_ex
Regards,
Juan José Santamaría Flecha
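For reference, the Win32-level way to request those POSIX delete semantics explicitly looks roughly like the sketch below (it assumes Windows 10 and a recent SDK; this only illustrates the flag the linked document describes, it is not code from the patch).

#include <windows.h>

static BOOL
unlink_posix_semantics(const char *path)
{
    HANDLE      h;
    FILE_DISPOSITION_INFO_EX fdi;
    BOOL        ok;

    h = CreateFileA(path, DELETE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    /*
     * With POSIX semantics the name is removed from the directory as soon as
     * this handle is closed, even if other FILE_SHARE_DELETE handles are
     * still open on the file.
     */
    fdi.Flags = FILE_DISPOSITION_FLAG_DELETE | FILE_DISPOSITION_FLAG_POSIX_SEMANTICS;
    ok = SetFileInformationByHandle(h, FileDispositionInfoEx, &fdi, sizeof(fdi));
    CloseHandle(h);
    return ok;
}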
On Fri, Nov 22, 2019 at 7:38 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Fri, 22 Nov 2019 at 4:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> I think this is exactly the reason for the problem. In my test [1], >> the error "permission denied" occurred when I second time executed >> pg_logical_slot_get_changes() which means on first execution the >> unlink would have been successful but the files are still not removed >> as they were not closed. Then on second execution, it gets an error >> "Permission denied" when it again tries to unlink files via >> ReorderBufferCleanupSerializedTXNs(). >> >> >> . >> > But what you are seeing is "Permission denied" errors. Not sure why >> > unlink() is failing. >> > >> >> In your test program, if you try to unlink the file second time, you >> should see the error "Permission denied". > > I tested using the sample program and indeed I got the error 5 (access denied) when I called unlink the second time. > So, what is the next step here? How about if we somehow check whether the file exists before doing unlink, say by using stat? If that doesn't work, I think we might need to go in the direction of tracking file handles in some way, so that they can be closed during an abort. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 26 Nov 2019 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 22, 2019 at 7:38 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Fri, 22 Nov 2019 at 4:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> I think this is exactly the reason for the problem. In my test [1], > >> the error "permission denied" occurred when I second time executed > >> pg_logical_slot_get_changes() which means on first execution the > >> unlink would have been successful but the files are still not removed > >> as they were not closed. Then on second execution, it gets an error > >> "Permission denied" when it again tries to unlink files via > >> ReorderBufferCleanupSerializedTXNs(). > >> > >> > >> . > >> > But what you are seeing is "Permission denied" errors. Not sure why > >> > unlink() is failing. > >> > > >> > >> In your test program, if you try to unlink the file second time, you > >> should see the error "Permission denied". > > > > I tested using the sample program and indeed I got the error 5 (access denied) when I called unlink the second time. > > > > So, what is the next step here? How about if we somehow check whether > the file exists before doing unlink, say by using stat? But the thing is, the behaviour is so much in a grey area, that we cannot reliably say for instance that when stat() says there is no such file, there is indeed no such file, and if we re-create the same file when it is still open, it is always going to open a new file, etc. > If that doesn't work, I think we might need to go in the direction of tracking > file handles in some way, so that they can be closed during an abort. Yeah, that is one way. I am still working on different approaches. WIll get back with proposals. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Tue, Nov 26, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Tue, 26 Nov 2019 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > So, what is the next step here? How about if we somehow check whether > > the file exists before doing unlink, say by using stat? > But the thing is, the behaviour is so much in a grey area, that we > cannot reliably say for instance that when stat() says there is no > such file, there is indeed no such file, > Why so? > and if we re-create the same > file when it is still open, it is always going to open a new file, > etc. > Yeah, or maybe even if we don't create with the same name, there is always be some dangling file which again doesn't sound like a good thing. > > If that doesn't work, I think we might need to go in the direction of tracking > > file handles in some way, so that they can be closed during an abort. > Yeah, that is one way. I am still working on different approaches. > WIll get back with proposals. > Fair enough. See, if you can also consider an approach that is local to ReorderBuffer module wherein we can track those handles in ReorderBufferTxn or some other place local to that module. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 26 Nov 2019 at 12:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 26, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Tue, 26 Nov 2019 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > So, what is the next step here? How about if we somehow check whether > > > the file exists before doing unlink, say by using stat? > > But the thing is, the behaviour is so much in a grey area, that we > > cannot reliably say for instance that when stat() says there is no > > such file, there is indeed no such file, > > > > Why so? This was just an example. What I meant was, we are really not sure of the behaviour of file operations when the file is in this state. unlink() returning "Permission denied" when called twice is itself weird enough. > > > and if we re-create the same > > file when it is still open, it is always going to open a new file, > > etc. > > > > Yeah, or maybe even if we don't create with the same name, there is > always be some dangling file which again doesn't sound like a good > thing. Right. > > > > If that doesn't work, I think we might need to go in the direction of tracking > > > file handles in some way, so that they can be closed during an abort. > > Yeah, that is one way. I am still working on different approaches. > > WIll get back with proposals. > > > > Fair enough. See, if you can also consider an approach that is local > to ReorderBuffer module wherein we can track those handles in > ReorderBufferTxn or some other place local to that module. What I found was : We do attempt to close the opened vfds in the PG_CATCH block. In ReorderBufferCommit(), ReorderBufferIterTXNFinish is called both in PG_TRY and PG_CATCH. This closes all the opened vfds. But the issue is : if the ereport() occurs inside ReorderBufferIterTXNInit(), then iterstate is still NULL. So in PG_CATCH, ReorderBufferIterTXNFinish() is not called, so the vfds in state->entries[] remain open. We can have &iterstate passed to ReorderBufferIterTXNInit() as another argument, and initialize it first thing inside the function. This way, it will never be NULL. But need to be careful about the possibility of having a iterstate in a half-cooked state, so cleanup might use some uninitialized handles. Will work on it. At least, we can make sure the iterstate->entries handle doesn't have junk values. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
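A simplified, self-contained analogue of that idea, using made-up names (IterState, iter_init, iter_finish) rather than the actual reorderbuffer code: the init function publishes its state through the out parameter before doing anything that can fail, and the cleanup function tolerates a partially initialized state, so the caller's error path can always close whatever was opened.

#include <stdio.h>
#include <stdlib.h>

typedef struct IterState
{
    FILE      **entries;        /* open per-transaction spill files */
    int         nentries;
} IterState;

static int
iter_init(IterState **iter_state, const char **paths, int npaths)
{
    IterState  *state = calloc(1, sizeof(IterState));
    int         i;

    *iter_state = state;        /* publish before anything can fail */
    if (state == NULL)
        return -1;

    state->entries = calloc((size_t) npaths, sizeof(FILE *));
    if (state->entries == NULL)
        return -1;

    for (i = 0; i < npaths; i++)
    {
        state->entries[i] = fopen(paths[i], "rb");
        if (state->entries[i] == NULL)
            return -1;          /* caller's cleanup still sees what is open */
        state->nentries++;
    }
    return 0;
}

static void
iter_finish(IterState *state)
{
    int         i;

    if (state == NULL)          /* init never managed to publish anything */
        return;
    for (i = 0; i < state->nentries; i++)
        fclose(state->entries[i]);
    free(state->entries);
    free(state);
}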
On Wed, 27 Nov 2019 at 14:16, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > What I found was : We do attempt to close the opened vfds in the > PG_CATCH block. In ReorderBufferCommit(), ReorderBufferIterTXNFinish > is called both in PG_TRY and PG_CATCH. This closes all the opened > vfds. But the issue is : if the ereport() occurs inside > ReorderBufferIterTXNInit(), then iterstate is still NULL. So in > PG_CATCH, ReorderBufferIterTXNFinish() is not called, so the vfds in > state->entries[] remain open. > > We can have &iterstate passed to ReorderBufferIterTXNInit() as another > argument, and initialize it first thing inside the function. This way, > it will never be NULL. But need to be careful about the possibility of > having a iterstate in a half-cooked state, so cleanup might use some > uninitialized handles. Will work on it. At least, we can make sure the > iterstate->entries handle doesn't have junk values. Done as stated above; attached v3 patch. I have verified that the file handles do get closed in PG_CATCH block via ReorderBufferIterTXNFinish(). -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Attachment
On Tue, Dec 3, 2019 at 11:10 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Wed, 27 Nov 2019 at 14:16, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > What I found was : We do attempt to close the opened vfds in the > > PG_CATCH block. In ReorderBufferCommit(), ReorderBufferIterTXNFinish > > is called both in PG_TRY and PG_CATCH. This closes all the opened > > vfds. But the issue is : if the ereport() occurs inside > > ReorderBufferIterTXNInit(), then iterstate is still NULL. So in > > PG_CATCH, ReorderBufferIterTXNFinish() is not called, so the vfds in > > state->entries[] remain open. > > > > We can have &iterstate passed to ReorderBufferIterTXNInit() as another > > argument, and initialize it first thing inside the function. This way, > > it will never be NULL. But need to be careful about the possibility of > > having a iterstate in a half-cooked state, so cleanup might use some > > uninitialized handles. Will work on it. At least, we can make sure the > > iterstate->entries handle doesn't have junk values. > > Done as stated above; attached v3 patch. I have verified that the file > handles do get closed in PG_CATCH block via > ReorderBufferIterTXNFinish(). > I couldn't reproduce the original problem (on HEAD) reported with the test case in the patch. So, I can't verify the fix. I think it is because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either change the value of logical_decoding_work_mem or change the test in some way so that the original problem can be reproduced. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 3, 2019 at 11:10 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > Done as stated above; attached v3 patch. I have verified that the file > > handles do get closed in PG_CATCH block via > > ReorderBufferIterTXNFinish(). > > > > I couldn't reproduce the original problem (on HEAD) reported with the > test case in the patch. So, I can't verify the fix. I think it is > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and > 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either > change the value of logical_decoding_work_mem or change the test in > some way so that the original problem can be reproduced. > Few comments: ---------------------- 1. + /* Now that the state fields are initialized, it is safe to return it. */ + *iter_state = state; + /* allocate heap */ state->heap = binaryheap_allocate(state->nr_txns, ReorderBufferIterCompare, Is there a reason for not initializing iter_state after binaryheap_allocate? If we do so, then we don't need additional check you have added in ReorderBufferIterTXNFinish. 2. /* No harm in resetting the offset even in case of failure */ file->curOffset = 0; The above comment is not clear because you are not setting it in case of error rather this is a success path. 3. + * + * Note: The iterator state is returned through iter_state parameter rather + * than the function's return value. This is because the state gets cleaned up + * in a PG_CATCH block, so we want to make sure the caller gets back the state + * even if this function throws an exception, so that the state resources can + * be cleaned up. How about changing it slightly as follows to make it more clear. "Note: The iterator state is returned through iter_state parameter rather than the function's return value. This is because the state gets cleaned up in a PG_CATCH block in the caller, so we want to make sure the caller gets back the state even if this function throws an exception." 4. I think we should also check how much time increase will happen for test_decoding regression test after the test added by this patch? 5. One naive question about the usage of PathNameOpenFile(). When it reaches the max limit, it will automatically close one of the files, but how will that be reflected in the data structure (TXNEntryFile) you are managing. Basically, when PathNameOpenFile closes some file, how will the corresponding vfd in TXNEntryFile be changed. Because if it is not changed, then won't it start pointing to some wrong filehandle? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 5, 2019 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Dec 3, 2019 at 11:10 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > > > > Done as stated above; attached v3 patch. I have verified that the file > > > handles do get closed in PG_CATCH block via > > > ReorderBufferIterTXNFinish(). > > > > > > > I couldn't reproduce the original problem (on HEAD) reported with the > > test case in the patch. So, I can't verify the fix. I think it is > > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and > > 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either > > change the value of logical_decoding_work_mem or change the test in > > some way so that the original problem can be reproduced. Yeah, it does seem like the commit cec2edfa78592 must have caused the test to not reproduce on your env, although the test does fail for me still. Setting logical_decoding_work_mem to a low value does sound like a good idea. Will work on it. > > > > Few comments: > ---------------------- > > 1. > + /* Now that the state fields are initialized, it is safe to return it. */ > + *iter_state = state; > + > /* allocate heap */ > state->heap = > binaryheap_allocate(state->nr_txns, > ReorderBufferIterCompare, > > Is there a reason for not initializing iter_state after > binaryheap_allocate? If we do so, then we don't need additional check > you have added in ReorderBufferIterTXNFinish. If iter_state is initialized *after* binaryheap_allocate, then we won't be able to close the vfds if binaryheap_allocate() ereports(). > > 2. > /* No harm in resetting the offset even in case of failure */ > file->curOffset = 0; > > The above comment is not clear because you are not setting it in case > of error rather this is a success path. I meant, even if PathNameOpenFile() failed, it is ok to set file->curOffset to 0, so need not set it only in case of *fd >= 0. In most of the cases, fd would be valid, so just set file->curOffset to 0 always. > > 3. > + * > + * Note: The iterator state is returned through iter_state parameter rather > + * than the function's return value. This is because the state gets cleaned up > + * in a PG_CATCH block, so we want to make sure the caller gets back the state > + * even if this function throws an exception, so that the state resources can > + * be cleaned up. > > How about changing it slightly as follows to make it more clear. > "Note: The iterator state is returned through iter_state parameter > rather than the function's return value. This is because the state > gets cleaned up in a PG_CATCH block in the caller, so we want to make > sure the caller gets back the state even if this function throws an > exception." Agreed. Will do that in the next patch version. > > 4. I think we should also check how much time increase will happen for > test_decoding regression test after the test added by this patch? Yeah, it currently takes noticeably longer compared to the others. Let's see if setting logical_decoding_work_mem to a min value allows us to reproduce the test with much lesser number of inserts. > > 5. One naive question about the usage of PathNameOpenFile(). When it > reaches the max limit, it will automatically close one of the files, > but how will that be reflected in the data structure (TXNEntryFile) > you are managing. Basically, when PathNameOpenFile closes some file, > how will the corresponding vfd in TXNEntryFile be changed. 
Because if > it is not changed, then won't it start pointing to some wrong > filehandle? In PathNameOpenFile(), excess kernel fds could be closed (ReleaseLruFiles). But with that, the vfds themselves don't get invalidated. Only the underlying kernel fd gets closed, and the vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a non-null vfd->fileName means the vfd slot is valid; check FileIsValid). So later, when FileRead(vfd1) is called and that vfd1 happens to be the one that had got it's kernel fd closed, it gets opened again through FileAccess(). -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
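To illustrate that behaviour, here is a toy, self-contained analogue of a virtual file descriptor slot: the kernel fd can be released under fd pressure, but the slot keeps the path and the current offset, and the next read transparently reopens the file and seeks back. Names (VirtualFd, vfd_access, and so on) are invented for the sketch; this is not the fd.c implementation.

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct VirtualFd
{
    char        path[256];      /* non-empty means the slot is valid */
    int         fd;             /* -1 when the kernel fd has been released */
    off_t       cur_offset;     /* where the next read should continue */
} VirtualFd;

static int
vfd_open(VirtualFd *vfd, const char *path)
{
    snprintf(vfd->path, sizeof(vfd->path), "%s", path);
    vfd->fd = open(path, O_RDONLY);
    vfd->cur_offset = 0;
    return (vfd->fd < 0) ? -1 : 0;
}

/* Reopen the kernel fd if it was released, and seek to the saved offset. */
static int
vfd_access(VirtualFd *vfd)
{
    if (vfd->fd < 0)
    {
        vfd->fd = open(vfd->path, O_RDONLY);
        if (vfd->fd < 0)
            return -1;
        lseek(vfd->fd, vfd->cur_offset, SEEK_SET);
    }
    return 0;
}

static ssize_t
vfd_read(VirtualFd *vfd, void *buf, size_t len)
{
    ssize_t     nread;

    if (vfd_access(vfd) < 0)
        return -1;
    nread = read(vfd->fd, buf, len);
    if (nread > 0)
        vfd->cur_offset += nread;   /* remember how far we have read */
    return nread;
}

/* Under fd pressure, release only the kernel fd; the slot stays usable. */
static void
vfd_release_kernel_fd(VirtualFd *vfd)
{
    if (vfd->fd >= 0)
    {
        close(vfd->fd);
        vfd->fd = -1;
    }
}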
On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Few comments: > > ---------------------- > > > > 1. > > + /* Now that the state fields are initialized, it is safe to return it. */ > > + *iter_state = state; > > + > > /* allocate heap */ > > state->heap = > > binaryheap_allocate(state->nr_txns, > > ReorderBufferIterCompare, > > > > Is there a reason for not initializing iter_state after > > binaryheap_allocate? If we do so, then we don't need additional check > > you have added in ReorderBufferIterTXNFinish. > > If iter_state is initialized *after* binaryheap_allocate, then we > won't be able to close the vfds if binaryheap_allocate() ereports(). > Is it possible to have vfds opened before binaryheap_allocate(), if so how? > > > > 5. One naive question about the usage of PathNameOpenFile(). When it > > reaches the max limit, it will automatically close one of the files, > > but how will that be reflected in the data structure (TXNEntryFile) > > you are managing. Basically, when PathNameOpenFile closes some file, > > how will the corresponding vfd in TXNEntryFile be changed. Because if > > it is not changed, then won't it start pointing to some wrong > > filehandle? > > In PathNameOpenFile(), excess kernel fds could be closed > (ReleaseLruFiles). But with that, the vfds themselves don't get > invalidated. Only the underlying kernel fd gets closed, and the > vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a > non-null vfd->fileName means the vfd slot is valid; check > FileIsValid). So later, when FileRead(vfd1) is called and that vfd1 > happens to be the one that had got it's kernel fd closed, it gets > opened again through FileAccess(). > I was under impression that once the fd is closed due to excess kernel fds that are opened, the slot in VfdCache array could be resued by someone else, but on closer inspection that is not true. It will be only available for reuse after we explicitly call FileClose, right? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote: > On Tue, Nov 19, 2019 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >> > On Windows, it is documented that ReadFile() (which is called by >> > pg_pread) will return false on EOF but only when the file is open for >> > asynchronous reads/writes. But here we are just dealing with usual >> > synchronous reads. So pg_pread() code should indeed return 0 on EOF on >> > Windows. Not yet able to figure out how FileRead() managed to return >> > this error on Windows. But from your symptoms, it does look like >> > pg_pread()=>ReadFile() returned false (despite doing asynchronous >> > reads), and so _dosmaperr() gets called, and then it does not find the >> > eof error in doserrors[], so the "unrecognized win32 error code" >> > message is printed. May have to dig up more on this. >> >> Hmm. See also this report: >> >> https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com > > The files from pgwin32_open() are open for synchronous access, while pg_pread() uses the asynchronous functionality tooffset the read. Under these circunstances, a read past EOF will return ERROR_HANDLE_EOF (38), as explained in: > > https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863 FWIW, I sent a pull request to see if the MicrosoftDocs project agrees that the ReadFile page is misleading on this point: https://github.com/MicrosoftDocs/sdk-api/pull/7
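For context, a pread-style wrapper on Windows looks roughly like the sketch below: the read offset is passed through an OVERLAPPED structure even on a synchronous handle, and in that mode reading past end of file makes ReadFile() fail with ERROR_HANDLE_EOF, which has to be mapped back to "0 bytes read". This only illustrates the behaviour being discussed; it is not the actual src/port code.

#include <windows.h>
#include <string.h>

static int
pread_win32(HANDLE h, void *buf, DWORD len, unsigned long long offset, DWORD *nread)
{
    OVERLAPPED  ov;

    memset(&ov, 0, sizeof(ov));
    ov.Offset = (DWORD) offset;
    ov.OffsetHigh = (DWORD) (offset >> 32);

    if (!ReadFile(h, buf, len, nread, &ov))
    {
        if (GetLastError() == ERROR_HANDLE_EOF)
        {
            *nread = 0;         /* a read past EOF is just EOF, not an error */
            return 0;
        }
        return -1;              /* genuine I/O error */
    }
    return 0;
}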
On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Few comments: > > > ---------------------- > > > > > > 1. > > > + /* Now that the state fields are initialized, it is safe to return it. */ > > > + *iter_state = state; > > > + > > > /* allocate heap */ > > > state->heap = > > > binaryheap_allocate(state->nr_txns, > > > ReorderBufferIterCompare, > > > > > > Is there a reason for not initializing iter_state after > > > binaryheap_allocate? If we do so, then we don't need additional check > > > you have added in ReorderBufferIterTXNFinish. > > > > If iter_state is initialized *after* binaryheap_allocate, then we > > won't be able to close the vfds if binaryheap_allocate() ereports(). > > > > Is it possible to have vfds opened before binaryheap_allocate(), if so how? No it does not look possible for the vfds to be opened before binaryheap_allocate(). But actually, the idea behind placing the iter_state at the place where I put is that, we should return back the iter_state at the *earliest* place in the code where it is safe to return. > > I couldn't reproduce the original problem (on HEAD) reported with the > > test case in the patch. So, I can't verify the fix. I think it is > > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and > > 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either > > change the value of logical_decoding_work_mem or change the test in > > some way so that the original problem can be reproduced. Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Yeah, it does seem like the commit cec2edfa78592 must have caused the > test to not reproduce on your env, although the test does fail for me > still. Setting logical_decoding_work_mem to a low value does sound > like a good idea. Will work on it. I checked that setting logical_decoding_work_mem to its min value (64KB) causes early serialization. So I think if you set this, you should be able to reproduce with the spill.sql test that has the new testcase. I will anyway set this value in the test. Also check below ... >> 4. I think we should also check how much time increase will happen for >> test_decoding regression test after the test added by this patch? Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Yeah, it currently takes noticeably longer compared to the others. > Let's see if setting logical_decoding_work_mem to a min value allows > us to reproduce the test with much lesser number of inserts. The test in the patch takes around 20 seconds, as compared to the max time of 2 seconds any of the other tests take in that test suite. But if we set the max_files_per_process to a very low value (say 26) as against the default 1000, we can reproduce the issue with as low as 20 sub-transactions as against the 600 that I used in spill.sql test. And with this, the test runs in around 4 seconds, so this is good. But the problem is : max_files_per_process needs server restart. So either we have to shift this test to src/test/recovery in one of the logical_decoding test, or retain it in contrib/test_decoding and let it run for 20 seconds. Let me know if you figure out any other approach. > > > > > > > 5. One naive question about the usage of PathNameOpenFile(). 
When it > > > reaches the max limit, it will automatically close one of the files, > > > but how will that be reflected in the data structure (TXNEntryFile) > > > you are managing. Basically, when PathNameOpenFile closes some file, > > > how will the corresponding vfd in TXNEntryFile be changed. Because if > > > it is not changed, then won't it start pointing to some wrong > > > filehandle? > > > > In PathNameOpenFile(), excess kernel fds could be closed > > (ReleaseLruFiles). But with that, the vfds themselves don't get > > invalidated. Only the underlying kernel fd gets closed, and the > > vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a > > non-null vfd->fileName means the vfd slot is valid; check > > FileIsValid). So later, when FileRead(vfd1) is called and that vfd1 > > happens to be the one that had got it's kernel fd closed, it gets > > opened again through FileAccess(). > > > > I was under impression that once the fd is closed due to excess kernel > fds that are opened, the slot in VfdCache array could be resued by > someone else, but on closer inspection that is not true. It will be > only available for reuse after we explicitly call FileClose, right? Yes, that's right. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Wed, Dec 11, 2019 at 4:17 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > 1. > > > > + /* Now that the state fields are initialized, it is safe to return it. */ > > > > + *iter_state = state; > > > > + > > > > /* allocate heap */ > > > > state->heap = > > > > binaryheap_allocate(state->nr_txns, > > > > ReorderBufferIterCompare, > > > > > > > > Is there a reason for not initializing iter_state after > > > > binaryheap_allocate? If we do so, then we don't need additional check > > > > you have added in ReorderBufferIterTXNFinish. > > > > > > If iter_state is initialized *after* binaryheap_allocate, then we > > > won't be able to close the vfds if binaryheap_allocate() ereports(). > > > > > > > Is it possible to have vfds opened before binaryheap_allocate(), if so how? > No it does not look possible for the vfds to be opened before > binaryheap_allocate(). But actually, the idea behind placing the > iter_state at the place where I put is that, we should return back the > iter_state at the *earliest* place in the code where it is safe to > return. > Sure, I get that point, but it seems it is equally good to do this after binaryheap_allocate(). It will be sligthly better because if there is any error in binaryheap_allocate, then we don't need to even call ReorderBufferIterTXNFinish(). > > >> 4. I think we should also check how much time increase will happen for > >> test_decoding regression test after the test added by this patch? > Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > Yeah, it currently takes noticeably longer compared to the others. > > Let's see if setting logical_decoding_work_mem to a min value allows > > us to reproduce the test with much lesser number of inserts. > > The test in the patch takes around 20 seconds, as compared to the max > time of 2 seconds any of the other tests take in that test suite. > > But if we set the max_files_per_process to a very low value (say 26) > as against the default 1000, we can reproduce the issue with as low as > 20 sub-transactions as against the 600 that I used in spill.sql test. > And with this, the test runs in around 4 seconds, so this is good. > Do you get 4s even after setting the minimum value of logical_decoding_work_mem? I think 4s is also too much for this test which is going to test one extreme scenario. Can you try with some bigger rows or something else to reduce this time? I think if we can get it close to 2s or whatever maximum time taken by any other logical decoding tests, then good, otherwise, it doesn't seem like a good idea to add this test. > But > the problem is : max_files_per_process needs server restart. So either > we have to shift this test to src/test/recovery in one of the > logical_decoding test, or retain it in contrib/test_decoding and let > it run for 20 seconds. Let me know if you figure out any other > approach. > I don't think 20s for one test is acceptable. So, we should just give up that path, you can try what I suggested above, if we can reduce the test time, then good, otherwise, I suggest to drop the test from the patch. Having said that, we should try our best to reduce the test time as it will be good if we can have such a test. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 12 Dec 2019 at 09:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 11, 2019 at 4:17 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > > > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > 1. > > > > > + /* Now that the state fields are initialized, it is safe to return it. */ > > > > > + *iter_state = state; > > > > > + > > > > > /* allocate heap */ > > > > > state->heap = > > > > > binaryheap_allocate(state->nr_txns, > > > > > ReorderBufferIterCompare, > > > > > > > > > > Is there a reason for not initializing iter_state after > > > > > binaryheap_allocate? If we do so, then we don't need additional check > > > > > you have added in ReorderBufferIterTXNFinish. > > > > > > > > If iter_state is initialized *after* binaryheap_allocate, then we > > > > won't be able to close the vfds if binaryheap_allocate() ereports(). > > > > > > > > > > Is it possible to have vfds opened before binaryheap_allocate(), if so how? > > No it does not look possible for the vfds to be opened before > > binaryheap_allocate(). But actually, the idea behind placing the > > iter_state at the place where I put is that, we should return back the > > iter_state at the *earliest* place in the code where it is safe to > > return. > > > > Sure, I get that point, but it seems it is equally good to do this > after binaryheap_allocate(). It will be sligthly better because if > there is any error in binaryheap_allocate, then we don't need to even > call ReorderBufferIterTXNFinish(). All right. WIll do that. > > > > > >> 4. I think we should also check how much time increase will happen for > > >> test_decoding regression test after the test added by this patch? > > Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Yeah, it currently takes noticeably longer compared to the others. > > > Let's see if setting logical_decoding_work_mem to a min value allows > > > us to reproduce the test with much lesser number of inserts. > > > > The test in the patch takes around 20 seconds, as compared to the max > > time of 2 seconds any of the other tests take in that test suite. > > > > But if we set the max_files_per_process to a very low value (say 26) > > as against the default 1000, we can reproduce the issue with as low as > > 20 sub-transactions as against the 600 that I used in spill.sql test. > > And with this, the test runs in around 4 seconds, so this is good. > > > > Do you get 4s even after setting the minimum value of > logical_decoding_work_mem? I think 4s is also too much for this test > which is going to test one extreme scenario. Can you try with some > bigger rows or something else to reduce this time? I think if we can > get it close to 2s or whatever maximum time taken by any other logical > decoding tests, then good, otherwise, it doesn't seem like a good idea > to add this test. > > > But > > the problem is : max_files_per_process needs server restart. So either > > we have to shift this test to src/test/recovery in one of the > > logical_decoding test, or retain it in contrib/test_decoding and let > > it run for 20 seconds. Let me know if you figure out any other > > approach. > > > > I don't think 20s for one test is acceptable. 
So, we should just give > up that path, you can try what I suggested above, if we can reduce the > test time, then good, otherwise, I suggest to drop the test from the > patch. Having said that, we should try our best to reduce the test > time as it will be good if we can have such a test. I tried; it is actually roughly 3.4 seconds. Note that reducing logical_decoding_work_mem does not reduce the test time. It causes the serialization to start early, and so increases the chance of reproducing the problem. During restore of the serialized data, we still use max_changes_in_memory. So max_changes_in_memory is the one that allows us to reduce the number of transactions required, so we can cut down on the outer loop iterations and make the test finish much earlier. But also note that, we can't use the test suite in contrib/test_decoding, because max_changes_in_memory needs server restart. So we need to shift this test to src/test/recovery. And there, I guess it is not that critical for the testcase to be very quick because the tests in general are much slower than the ones in contrib/test_decoding, although it would be nice to make it fast. What I propose is to modify max_changes_in_memory, do a server restart (which takes hardly a sec), run the testcase (3.5 sec) and then restart after resetting the guc. So totally it will be around 4-5 seconds. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Thu, 12 Dec 2019 at 11:34, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > So max_changes_in_memory is the one > that allows us to reduce the number of transactions required, so we > can cut down on the outer loop iterations and make the test finish > much earlier. > > But also note that, we can't use the test suite in > contrib/test_decoding, because max_changes_in_memory needs server > restart. So we need to shift this test to src/test/recovery. And > there, I guess it is not that critical for the testcase to be very > quick because the tests in general are much slower than the ones in > contrib/test_decoding, although it would be nice to make it fast. What > I propose is to modify max_changes_in_memory, do a server restart > (which takes hardly a sec), run the testcase (3.5 sec) and then > restart after resetting the guc. So totally it will be around 4-5 > seconds. Sorry I meant max_files_per_process. We need to reduce max_files_per_process, so that it causes max_safe_fds to be reduced, and so only a few transactions are sufficient to reproduce the problem, because the reserveAllocatedDesc() will return false much sooner due to low max_safe_fds. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Thu, Dec 12, 2019 at 11:53 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Thu, 12 Dec 2019 at 11:34, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > So max_changes_in_memory is the one > > that allows us to reduce the number of transactions required, so we > > can cut down on the outer loop iterations and make the test finish > > much earlier. > > > > > But also note that, we can't use the test suite in > > contrib/test_decoding, because max_changes_in_memory needs server > > restart. So we need to shift this test to src/test/recovery. And > > there, I guess it is not that critical for the testcase to be very > > quick because the tests in general are much slower than the ones in > > contrib/test_decoding, although it would be nice to make it fast. What > > I propose is to modify max_changes_in_memory, do a server restart > > (which takes hardly a sec), run the testcase (3.5 sec) and then > > restart after resetting the guc. So totally it will be around 4-5 > > seconds. > > Sorry I meant max_files_per_process. > Okay, what time other individual tests take in that directory on your machine? How about providing a separate test patch for this case so that I can also test it? I think we can take the opinion of others as well if they are fine with adding this test, otherwise, we can go ahead with the main patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 12 Dec 2019 at 14:18, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 12, 2019 at 11:53 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Thu, 12 Dec 2019 at 11:34, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > So max_changes_in_memory is the one > > > that allows us to reduce the number of transactions required, so we > > > can cut down on the outer loop iterations and make the test finish > > > much earlier. > > > > > > > > But also note that, we can't use the test suite in > > > contrib/test_decoding, because max_changes_in_memory needs server > > > restart. So we need to shift this test to src/test/recovery. And > > > there, I guess it is not that critical for the testcase to be very > > > quick because the tests in general are much slower than the ones in > > > contrib/test_decoding, although it would be nice to make it fast. What > > > I propose is to modify max_changes_in_memory, do a server restart > > > (which takes hardly a sec), run the testcase (3.5 sec) and then > > > restart after resetting the guc. So totally it will be around 4-5 > > > seconds. > > > > Sorry I meant max_files_per_process. > > > > Okay, what time other individual tests take in that directory on your > machine? For src/test/recovery directory, on average, a test takes about 4-5 seconds. > How about providing a separate test patch for this case so > that I can also test it? Attached is a v4 patch that also addresses your code comments so far. I have included the test case in 006_logical_decoding.pl. I observed that the test case just adds only about 0.5 to 1 sec time. Please verify on your env also, and also whether the test reproduces the issue without the code changes. > I think we can take the opinion of others as > well if they are fine with adding this test, otherwise, we can go > ahead with the main patch. Sure. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Thu, Dec 12, 2019 at 9:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > Attached is a v4 patch that also addresses your code comments so far. > I have included the test case in 006_logical_decoding.pl. I observed > that the test case just adds only about 0.5 to 1 sec time. Please > verify on your env also, and also whether the test reproduces the > issue without the code changes. > It takes roughly the same time on my machine as well. I have checked on Windows as well, it increases the time from 14 to 16 (17) seconds for this test. I don't think this is any big increase considering the timing of other tests and it would be good to have a test for such boundary conditions. I have slightly change the comments in the patch and ran pgindent. Attached, find the patch with a proposed commit message. I have also made minor changes related to below code in patch: - else if (readBytes != sizeof(ReorderBufferDiskChange)) + + file->curOffset += readBytes; + + if (readBytes != sizeof(ReorderBufferDiskChange)) Why the size is added before the error check? I think it should be after that check, so changed accordingly. Similarly, I don't see why we need to change 'else if' to 'if' in this code, so changed back. I think we need to change/tweak the test for back branches as there we don't have logical_decoding_work_mem. Can you please look into that and see if you can run perltidy for the test file. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, 14 Dec 2019 at 11:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 12, 2019 at 9:50 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > Attached is a v4 patch that also addresses your code comments so far. > > I have included the test case in 006_logical_decoding.pl. I observed > > that the test case just adds only about 0.5 to 1 sec time. Please > > verify on your env also, and also whether the test reproduces the > > issue without the code changes. > > > > It takes roughly the same time on my machine as well. I have checked > on Windows as well, it increases the time from 14 to 16 (17) seconds > for this test. I don't think this is any big increase considering the > timing of other tests and it would be good to have a test for such > boundary conditions. I have slightly change the comments in the patch > and ran pgindent. Attached, find the patch with a proposed commit > message. > > I have also made minor changes related to below code in patch: > - else if (readBytes != sizeof(ReorderBufferDiskChange)) > + > + file->curOffset += readBytes; > + > + if (readBytes != > sizeof(ReorderBufferDiskChange)) > > Why the size is added before the error check? The logic was : even though it's an error that the readBytes does not match the expected size, the file read is successful so update the vfd offset as early as possible. In our case, this might not matter much, but who knows, in the future, in the exception block (say, in ReorderBufferIterTXNFinish, someone assumes that the file offset is correct and does something with that, then we will get in trouble, although I agree that it's very unlikely. But IMO, because we want to simulate the file offset support in vfd, we should update the file offset immediately after a file read is known to have succeeded. > I think it should be > after that check, so changed accordingly. Similarly, I don't see why > we need to change 'else if' to 'if' in this code, so changed back. Since for adding the size before the error check I had to remove the else-if, so to be consistent, I removed the else-if at surrounding places also. > > I think we need to change/tweak the test for back branches as there we > don't have logical_decoding_work_mem. Can you please look into that Yeah, I believe we need to backport up to PG 9.4 where logical decoding was introduced, so I am first trying out with 9.4 branch. > and see if you can run perltidy for the test file. Hmm, I tried perltidy, and it seems to mostly add a space after ( and a space before ) if there's already; so "('postgres'," is replaced by "(<space> 'postgres',". And this is going to be inconsistent with other places. And it replaces tab with spaces. Do you think we should try perltidy, or have we before been using this tool for the tap tests ? -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Mon, Dec 16, 2019 at 3:26 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Sat, 14 Dec 2019 at 11:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I have also made minor changes related to below code in patch: > > - else if (readBytes != sizeof(ReorderBufferDiskChange)) > > + > > + file->curOffset += readBytes; > > + > > + if (readBytes != > > sizeof(ReorderBufferDiskChange)) > > > > Why the size is added before the error check? > The logic was : even though it's an error that the readBytes does not > match the expected size, the file read is successful so update the vfd > offset as early as possible. In our case, this might not matter much, > but who knows, in the future, in the exception block (say, in > ReorderBufferIterTXNFinish, someone assumes that the file offset is > correct and does something with that, then we will get in trouble, > although I agree that it's very unlikely. > I am not sure if there is any such need, but even if it is there, I think updating after a *short* read (read less than expected) doesn't seem like a good idea because there is clearly some problem with the read call. Also, in the case below that case where we read the actual change data, the offset is updated after the check of *short* read. I don't see any advantage in such an inconsistency. I still feel it is better to update the offset after all error checks. > > > and see if you can run perltidy for the test file. > Hmm, I tried perltidy, and it seems to mostly add a space after ( and > a space before ) if there's already; so "('postgres'," is replaced by > "(<space> 'postgres',". And this is going to be inconsistent with > other places. And it replaces tab with spaces. Do you think we should > try perltidy, or have we before been using this tool for the tap tests > ? > See text in src/test/perl/README (Note that all tests and test tools should have perltidy run on them before patches are submitted, using perltidy - profile=src/tools/pgindent/perltidyrc). It is recommended to use perltidy. Now, if it is making the added code inconsistent with nearby code, then I suggest to leave it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 16 Dec 2019 at 16:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 16, 2019 at 3:26 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Sat, 14 Dec 2019 at 11:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have also made minor changes related to below code in patch: > > > - else if (readBytes != sizeof(ReorderBufferDiskChange)) > > > + > > > + file->curOffset += readBytes; > > > + > > > + if (readBytes != > > > sizeof(ReorderBufferDiskChange)) > > > > > > Why the size is added before the error check? > > The logic was : even though it's an error that the readBytes does not > > match the expected size, the file read is successful so update the vfd > > offset as early as possible. In our case, this might not matter much, > > but who knows, in the future, in the exception block (say, in > > ReorderBufferIterTXNFinish, someone assumes that the file offset is > > correct and does something with that, then we will get in trouble, > > although I agree that it's very unlikely. > > > > I am not sure if there is any such need, but even if it is there, I > think updating after a *short* read (read less than expected) doesn't > seem like a good idea because there is clearly some problem with the > read call. Also, in the case below that case where we read the actual > change data, the offset is updated after the check of *short* read. I > don't see any advantage in such an inconsistency. I still feel it is > better to update the offset after all error checks. Ok, no problem; I don't see any harm in doing the updates after the size checks. By the way, the backport patch is turning out to be simpler. It's because in pre-12 versions, the file offset is part of the Vfd structure, so all the offset handling is not required. > > > > > > and see if you can run perltidy for the test file. > > Hmm, I tried perltidy, and it seems to mostly add a space after ( and > > a space before ) if there's already; so "('postgres'," is replaced by > > "(<space> 'postgres',". And this is going to be inconsistent with > > other places. And it replaces tab with spaces. Do you think we should > > try perltidy, or have we before been using this tool for the tap tests > > ? > > > > See text in src/test/perl/README (Note that all tests and test tools > should have perltidy run on them before patches are submitted, using > perltidy - profile=src/tools/pgindent/perltidyrc). It is recommended > to use perltidy. > > Now, if it is making the added code inconsistent with nearby code, > then I suggest to leave it. In many places, it is becoming inconsistent, but will see if there are some places where it does make sense and does not break consistency. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
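[Editor's note] To make the agreed ordering concrete, here is a minimal sketch of the read step inside ReorderBufferRestoreChanges() as the master/v12 patch shapes it. This is a sketch reconstructed from the fragments quoted above, not the committed code; the vfd member name, rb->outbuf, and the WAIT_EVENT_REORDER_BUFFER_READ wait event are assumptions.

    /* (fragment from inside the restore loop; readBytes is an int) */
    readBytes = FileRead(file->vfd, rb->outbuf,
                         sizeof(ReorderBufferDiskChange),
                         file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);

    if (readBytes == 0)
    {
        /* end of this spill segment: release the VFD before moving on */
        FileClose(file->vfd);
        break;
    }

    if (readBytes < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from reorderbuffer spill file: %m")));

    /* a short read is an error; report it before touching the offset */
    if (readBytes != sizeof(ReorderBufferDiskChange))
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
                        readBytes, (uint32) sizeof(ReorderBufferDiskChange))));

    /* only now advance the tracked position, per the review above */
    file->curOffset += readBytes;

Because the position lives in the caller's bookkeeping rather than in a kept-open kernel descriptor, the VFD layer is free to close and later reopen the spill file whenever the backend runs short of file descriptors, which is the point of the fix.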
On Tue, 17 Dec 2019 at 17:40, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > By the way, the backport patch is turning out to be simpler. It's > because in pre-12 versions, the file offset is part of the Vfd > structure, so all the offset handling is not required. Please have a look at the attached backport patch for PG 11. branch. Once you are ok with the patch, I will port it on other branches. Note that in the patch, wherever applicable I have renamed the fd variable to vfd to signify that it is a vfd, and not the kernel fd. If we don't do the renaming, the patch would be still smaller, but I think the renaming makes sense. The recovery TAP tests don't seem to be there on 9.4 and 9.5 branch, so I think it's ok to not have any tests with the patches on these branches that don't have the tap tests. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Wed, Dec 18, 2019 at 12:34 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Tue, 17 Dec 2019 at 17:40, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > By the way, the backport patch is turning out to be simpler. It's > > because in pre-12 versions, the file offset is part of the Vfd > > structure, so all the offset handling is not required. > > Please have a look at the attached backport patch for PG 11. branch. > Once you are ok with the patch, I will port it on other branches. > Note that in the patch, wherever applicable I have renamed the fd > variable to vfd to signify that it is a vfd, and not the kernel fd. If > we don't do the renaming, the patch would be still smaller, but I > think the renaming makes sense. > The other usage of PathNameOpenFile in md.c is already using 'fd' as a variable name (also, if you see example in fd.h, that also uses fd as variable name), so I don't see any problem with using fd especially if that leads to lesser changes. Apart from that, your patch LGTM. > The recovery TAP tests don't seem to be there on 9.4 and 9.5 branch, > so I think it's ok to not have any tests with the patches on these > branches that don't have the tap tests. > Yeah, that is fine. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 19 Dec 2019 at 11:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 12:34 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > On Tue, 17 Dec 2019 at 17:40, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > By the way, the backport patch is turning out to be simpler. It's > > > because in pre-12 versions, the file offset is part of the Vfd > > > structure, so all the offset handling is not required. > > > > Please have a look at the attached backport patch for PG 11. branch. > > Once you are ok with the patch, I will port it on other branches. > > Note that in the patch, wherever applicable I have renamed the fd > > variable to vfd to signify that it is a vfd, and not the kernel fd. If > > we don't do the renaming, the patch would be still smaller, but I > > think the renaming makes sense. > > > > The other usage of PathNameOpenFile in md.c is already using 'fd' as a > variable name (also, if you see example in fd.h, that also uses fd as > variable name), so I don't see any problem with using fd especially if > that leads to lesser changes. Ok. I have retained fd name. > Apart from that, your patch LGTM. Attached are the patches from master back up to 94 branch. PG 9.4 and 9.5 have a common patch to be applied : pg94_95_use_vfd_for_logrep.patch From PG 9.6 onwards, each version has a separate patch. For PG 9.6, there is no logical decoding perl test file. So I have made a new file 006_logical_decoding_spill.pl that has only the specific testcase. Also, for building the test_decoding.so, I had to add the EXTRA_INSTALL=contrib/test_decoding line in the src/test/recovery/Makefile, because this is the first time we are using the plugin in the 9.6 tap test. From PG 10 onwards, pgstat_report_*() calls around read() are removed in the patch, because FileRead() itself reports the wait events. From PG 12 onwards, the vfd offset handling had to be added, because the offset is not present in Vfd structure. In master, logical_decoding_work_mem is used in the test file. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Fri, Dec 20, 2019 at 9:31 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Attached are the patches from master back up to 94 branch. > > PG 9.4 and 9.5 have a common patch to be applied : > pg94_95_use_vfd_for_logrep.patch > From PG 9.6 onwards, each version has a separate patch. > > For PG 9.6, there is no logical decoding perl test file. So I have > made a new file 006_logical_decoding_spill.pl that has only the > specific testcase. Also, for building the test_decoding.so, I had to > add the EXTRA_INSTALL=contrib/test_decoding line in the > src/test/recovery/Makefile, because this is the first time we are > using the plugin in the 9.6 tap test. > I am not sure if we need to go that far for 9.6 branch. If the other tests for logical decoding are not present, then I don't see why we need to create a new test file for this test only. Also, I think this will make the patch the same for 9.4,9.5 and 9.6. > From PG 10 onwards, pgstat_report_*() calls around read() are removed > in the patch, because FileRead() itself reports the wait events. > Why there are different patches for 10 and 11? We should try to minimize the difference between patches in different branches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 24 Dec 2019 at 10:41, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 20, 2019 at 9:31 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > Attached are the patches from master back up to 94 branch. > > > > PG 9.4 and 9.5 have a common patch to be applied : > > pg94_95_use_vfd_for_logrep.patch > > From PG 9.6 onwards, each version has a separate patch. > > > > For PG 9.6, there is no logical decoding perl test file. So I have > > made a new file 006_logical_decoding_spill.pl that has only the > > specific testcase. Also, for building the test_decoding.so, I had to > > add the EXTRA_INSTALL=contrib/test_decoding line in the > > src/test/recovery/Makefile, because this is the first time we are > > using the plugin in the 9.6 tap test. > > > > I am not sure if we need to go that far for 9.6 branch. If the other > tests for logical decoding are not present, then I don't see why we > need to create a new test file for this test only. Also, I think this > will make the patch the same for 9.4,9.5 and 9.6. Ok. I tested pg94_95_use_vfd_for_logrep.patch for 9.6 branch, and it works there. So please use this patch for all the three branches. > > > From PG 10 onwards, pgstat_report_*() calls around read() are removed > > in the patch, because FileRead() itself reports the wait events. > > > > Why there are different patches for 10 and 11? For PG10, OpenTransientFile() and PathNameOpenFile() each have an extra parameter for specifying file creation modes such as S_IRUSR or S_IWUSR. For 11, these functions don't accept the flags, rather the file is always opened with PG_FILE_MODE_OWNER. Because of these differences in the calls, the PG 10 patch does not apply on 11. > > We should try to minimize the difference between patches in different branches. Ok. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
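[Editor's note] For anyone comparing the branch patches, a hedged sketch of the call that differs; the variable name and flag choices here are illustrative, not copied from the attachments:

    /* PG 10: PathNameOpenFile() still takes a file-creation mode
     * (irrelevant when only reading, so 0 is fine) */
    fd = PathNameOpenFile(path, O_RDONLY | PG_BINARY, 0);

    /* PG 11: the mode argument is gone; PG_FILE_MODE_OWNER is implied */
    fd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);

The surrounding logic is shared, which is why the 10 and 11 patches differ essentially only in the open calls.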
On Tue, Dec 24, 2019 at 2:31 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Tue, 24 Dec 2019 at 10:41, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 20, 2019 at 9:31 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Attached are the patches from master back up to 94 branch. > > > > > > PG 9.4 and 9.5 have a common patch to be applied : > > > pg94_95_use_vfd_for_logrep.patch > > > From PG 9.6 onwards, each version has a separate patch. > > > > > > For PG 9.6, there is no logical decoding perl test file. So I have > > > made a new file 006_logical_decoding_spill.pl that has only the > > > specific testcase. Also, for building the test_decoding.so, I had to > > > add the EXTRA_INSTALL=contrib/test_decoding line in the > > > src/test/recovery/Makefile, because this is the first time we are > > > using the plugin in the 9.6 tap test. > > > > > > > I am not sure if we need to go that far for 9.6 branch. If the other > > tests for logical decoding are not present, then I don't see why we > > need to create a new test file for this test only. Also, I think this > > will make the patch the same for 9.4,9.5 and 9.6. > > Ok. I tested pg94_95_use_vfd_for_logrep.patch for 9.6 branch, and it > works there. So please use this patch for all the three branches. > Pushed! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 2, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Dec 24, 2019 at 2:31 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> >
> > Ok. I tested pg94_95_use_vfd_for_logrep.patch for 9.6 branch, and it
> > works there. So please use this patch for all the three branches.
> >
> Pushed!

I see one failure in REL_10_STABLE [1] which seems to be due to this commit:
-------------------
t/006_logical_decoding.pl (Wstat: 7424 Tests: 10 Failed: 0)
Non-zero exit status: 29
Parse errors: Bad plan. You planned 11 tests but ran 10.
Files=14, Tests=122, 1968 wallclock secs ( 0.10 usr 0.03 sys + 19.00 cusr 21.98 csys = 41.11 CPU)
Result: FAIL
Makefile:19: recipe for target 'check' failed
make: *** [check] Error 1
See below snippet from 006_logical_decoding_master.log
..
..
2020-01-03 01:30:48.254 UTC [12189836:9] t/006_logical_decoding.pl STATEMENT: SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL)
WHERE data LIKE '%INSERT%' ORDER BY lsn LIMIT 1;
2020-01-03 01:30:51.990 UTC [6882174:3] LOG: server process (PID 12189836) was terminated by signal 11
2020-01-03 01:30:51.990 UTC [6882174:4] DETAIL: Failed process was running: SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL)
WHERE data LIKE '%INSERT%' ORDER BY lsn LIMIT 1;
2020-01-03 01:30:51.990 UTC [6882174:5] LOG: terminating any other active server processes
The strange thing is that the same test passes on master on the same machine [2], and for 10 as well it passes on other machines, so I am not sure what could cause this. Any clue?
On Fri, Jan 3, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jan 2, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Dec 24, 2019 at 2:31 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > >
> > >
> > > Ok. I tested pg94_95_use_vfd_for_logrep.patch for 9.6 branch, and it
> > > works there. So please use this patch for all the three branches.
> > >
> > Pushed!
>
> I see one failure in REL_10_STABLE [1] which seems to be due to this commit:

I tried this test on my CentOS and Power8 machine more than 50 times, but couldn't reproduce it. So, adding Noah to see if he can try this test [1] on his machine (tern) and get a stack trace or some other information?
[1] - make -C src/test/recovery/ check PROVE_TESTS=t/006_logical_decoding.pl
On Fri, 3 Jan 2020 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 3, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Thu, Jan 2, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> >>> On Tue, Dec 24, 2019 at 2:31 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >>> > >>> > >>> > Ok. I tested pg94_95_use_vfd_for_logrep.patch for 9.6 branch, and it >>> > works there. So please use this patch for all the three branches. >>> > >>> >>> Pushed! >> >> >> >> I see one failure in REL_10_STABLE [1] which seems to be due to this commit: >> > > I tried this test on my CentOs and Power8 machine more than 50 times, but couldn't reproduce it. So, adding Noah to seeif he can try this test [1] on his machine (tern) and get stack track or some other information? > > [1] - make -C src/test/recovery/ check PROVE_TESTS=t/006_logical_decoding.pl I also tested multiple times using PG 10 branch; also tried to inject an error so that PG_CATCH related code also gets covered, but unfortunately didn't get the crash on my machine. I guess, we will have to somehow get the stacktrace. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Fri, Jan 03, 2020 at 02:20:09PM +0530, Amit Khandekar wrote: > On Fri, 3 Jan 2020 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 3, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> I see one failure in REL_10_STABLE [1] which seems to be due to this commit: > > > > I tried this test on my CentOS and Power8 machine more than 50 times, but couldn't reproduce it. So, adding Noah to see if he can try this test [1] on his machine (tern) and get a stack trace or some other information? > > > > [1] - make -C src/test/recovery/ check PROVE_TESTS=t/006_logical_decoding.pl > > I also tested multiple times using PG 10 branch; also tried to inject > an error so that PG_CATCH related code also gets covered, but > unfortunately didn't get the crash on my machine. I guess, we will > have to somehow get the stacktrace. I have buildfarm member tern running this test in a loop. In the 290 iterations so far, it hasn't failed. I'll leave it running for another week or so. The buildfarm client can capture stack traces, but it currently doesn't do so for TAP test suites (search the client code for get_stack_trace). If someone feels like writing a fix for that, it would be a nice improvement. Perhaps, rather than having the client code know all the locations where core files might appear, failed runs should walk the test directory tree for core files?
On Sun, Jan 5, 2020 at 12:21 AM Noah Misch <noah@leadboat.com> wrote: > > On Fri, Jan 03, 2020 at 02:20:09PM +0530, Amit Khandekar wrote: > > On Fri, 3 Jan 2020 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Fri, Jan 3, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> I see one failure in REL_10_STABLE [1] which seems to be due to this commit: > > > > > > I tried this test on my CentOs and Power8 machine more than 50 times, but couldn't reproduce it. So, adding Noah tosee if he can try this test [1] on his machine (tern) and get stack track or some other information? > > > > > > [1] - make -C src/test/recovery/ check PROVE_TESTS=t/006_logical_decoding.pl > > > > I also tested multiple times using PG 10 branch; also tried to inject > > an error so that PG_CATCH related code also gets covered, but > > unfortunately didn't get the crash on my machine. I guess, we will > > have to somehow get the stacktrace. > > I have buildfarm member tern running this test in a loop. In the 290 > iterations so far, it hasn't failed. I've leave it running for another week > or so. > Okay, thanks! FYI, your other machine 'mandril' also exhibits the exact same behavior and on v10. Both the machines (tern and mandril) seem to have the same specs which seems to be the reason that they are failing in the same way. The thing that bothers me is that the fix and test are the same for v11 and test passes for v11 on both machines. Does this indicate any random behavior or maybe some other bug in v10 which is discovered by this test? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, 5 Jan 2020 at 00:21, Noah Misch <noah@leadboat.com> wrote: > The buildfarm client can capture stack traces, but it currently doesn't do so > for TAP test suites (search the client code for get_stack_trace). If someone > feels like writing a fix for that, it would be a nice improvement. Perhaps, > rather than having the client code know all the locations where core files > might appear, failed runs should walk the test directory tree for core files? I think this might end up having the same code to walk the directory spread out on multiple files. Instead, I think in the build script, in get_stack_trace(), we can do an equivalent of "find <inputdir> -name "*core*" , as against the current way in which it looks for core files only in the specific data directory. So get_stack_trace(bindir, datadir) would change to get_stack_trace(bindir, input_dir) where input_dir can be any directory that can contain multiple data directories. E.g. a recovery test can create multiple instances so there would be multiple data directories inside the test directory. Noah, is it possible to run a patch'ed build script once I submit a patch, so that we can quickly get the stack trace ? I mean, can we do this before getting the patch committed ? I guess, we can run the build script with a single branch specified, right ? -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On Sun, Jan 5, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jan 5, 2020 at 12:21 AM Noah Misch <noah@leadboat.com> wrote: > > > > On Fri, Jan 03, 2020 at 02:20:09PM +0530, Amit Khandekar wrote: > > > On Fri, 3 Jan 2020 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 3, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > >> I see one failure in REL_10_STABLE [1] which seems to be due to this commit: > > > > > > > > I tried this test on my CentOs and Power8 machine more than 50 times, but couldn't reproduce it. So, adding Noahto see if he can try this test [1] on his machine (tern) and get stack track or some other information? > > > > > > > > [1] - make -C src/test/recovery/ check PROVE_TESTS=t/006_logical_decoding.pl > > > > > > I also tested multiple times using PG 10 branch; also tried to inject > > > an error so that PG_CATCH related code also gets covered, but > > > unfortunately didn't get the crash on my machine. I guess, we will > > > have to somehow get the stacktrace. > > > > I have buildfarm member tern running this test in a loop. In the 290 > > iterations so far, it hasn't failed. I've leave it running for another week > > or so. > > > > Okay, thanks! FYI, your other machine 'mandril' also exhibits the > exact same behavior and on v10. Both the machines (tern and mandril) > seem to have the same specs which seems to be the reason that they are > failing in the same way. The thing that bothers me is that the fix > and test are the same for v11 and test passes for v11 on both > machines. Does this indicate any random behavior or maybe some other > bug in v10 which is discovered by this test? > Another thing to notice here is that on buildfarm 'tern' (for v10), it is getting reproduced, whereas when you ran it independently, then the problem is not reproduced even after so many runs. What could be the difference which is causing this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 08, 2020 at 02:50:53PM +0530, Amit Khandekar wrote: > On Sun, 5 Jan 2020 at 00:21, Noah Misch <noah@leadboat.com> wrote: > > The buildfarm client can capture stack traces, but it currently doesn't do so > > for TAP test suites (search the client code for get_stack_trace). If someone > > feels like writing a fix for that, it would be a nice improvement. Perhaps, > > rather than having the client code know all the locations where core files > > might appear, failed runs should walk the test directory tree for core files? > > I think this might end up having the same code to walk the directory > spread out on multiple files. Instead, I think in the build script, in > get_stack_trace(), we can do an equivalent of "find <inputdir> -name > "*core*" , as against the current way in which it looks for core files > only in the specific data directory. Agreed. > Noah, is it possible to run a patch'ed build script once I submit a > patch, so that we can quickly get the stack trace ? I mean, can we do > this before getting the patch committed ? I guess, we can run the > build script with a single branch specified, right ? Yes to all questions, but it would not have helped in this case. First, v10 deletes PostgresNode base directories at the end of this test file, despite the failure[1]. Second, the stack trace was minimal: (gdb) bt #0 0xd011119c in extend_brk () from /usr/lib/libc.a(shr.o) Even so, a web search for "extend_brk" led to the answer. By default, 32-bit AIX binaries get only 256M of RAM for stack and sbrk. The new regression test used more than that, hence this crash. Setting LDR_CNTRL=MAXDATA=0x80000000 in the environment cured the crash. I've put that in the buildfarm member configuration and started a new run. (PostgreSQL documentation actually covers this problem: https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX) [1] It has the all_tests_passing() logic in an attempt to stop this. I'm guessing it didn't help because the file failed by calling die "connection error: ...", not by reporting a failure to Test::More via ok(0) or similar.
Noah Misch <noah@leadboat.com> writes: > Even so, a web search for "extend_brk" led to the answer. By default, 32-bit > AIX binaries get only 256M of RAM for stack and sbrk. The new regression test > used more than that, hence this crash. Hm, so (1) Why did we get a crash and not some more-decipherable out-of-resources error? Can we improve that experience? (2) Should we be dialing back the resource consumption of this test? Even on machines where it doesn't fail outright, I'd imagine that it's costing a lot of buildfarm cycles. Is it actually worth that? regards, tom lane
On Thu, Jan 9, 2020 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Noah Misch <noah@leadboat.com> writes: > > Even so, a web search for "extend_brk" led to the answer. By default, 32-bit > > AIX binaries get only 256M of RAM for stack and sbrk. The new regression test > > used more than that, hence this crash. > > Hm, so > > (1) Why did we get a crash and not some more-decipherable out-of-resources > error? Can we improve that experience? > > (2) Should we be dialing back the resource consumption of this test? > In HEAD, we have a guc variable 'logical_decoding_work_mem' by which we can control the memory usage of changes and we have used that, but for back branches, we don't have such a control. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 9, 2020 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Noah Misch <noah@leadboat.com> writes: > > Even so, a web search for "extend_brk" led to the answer. By default, 32-bit > > AIX binaries get only 256M of RAM for stack and sbrk. The new regression test > > used more than that, hence this crash. > > Hm, so > > (1) Why did we get a crash and not some more-decipherable out-of-resources > error? Can we improve that experience? > > (2) Should we be dialing back the resource consumption of this test? > Even on machines where it doesn't fail outright, I'd imagine that it's > costing a lot of buildfarm cycles. Is it actually worth that? > After the latest changes by Noah, the tern and mandrill both are green. I will revert the test added by this patch unless there is some strong argument to keep it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-Jan-09, Amit Kapila wrote: > In HEAD, we have a guc variable 'logical_decoding_work_mem' by which > we can control the memory usage of changes and we have used that, but > for back branches, we don't have such a control. > After the latest changes by Noah, the tern and mandrill both are > green. I will revert the test added by this patch unless there is > some strong argument to keep it. Hmm, so why not revert the test only in the back branches, given that it's not so onerous in master? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Hmm, so why not revert the test only in the back branches, given that
> it's not so onerous in master?

I grow tired of repeating myself, but: it's purely accidental that this test passes in master for the existing set of buildfarm members. If I have to do so to prove my point, I will set up a buildfarm member that uses USE_NAMED_POSIX_SEMAPHORES, and then insist that the patch cope with that.

But the real issue is that the test is abusing max_files_per_process to do something it was never intended for. What it was intended for, and works well at, is to constrain the total FD consumption of a collection of backends. It doesn't work well to constrain the maximum allocatedDescs consumption, because there's too much variability in our demand for other FDs. If we feel that we should have a test that is constraining that, then we need to invent some other mechanism to do it with. If we're not willing to invent an appropriate mechanism to support the test, then we should drop the test, because a half-baked test is worse than none.

An appropriate mechanism, perhaps, would be some way to constrain max_safe_fds directly, without any platform- or environment-dependent effects in the way. It could be as simple as

 	/*
 	 * Take off the FDs reserved for system() etc.
 	 */
 	max_safe_fds -= NUM_RESERVED_FDS;

+	/*
+	 * Apply debugging limit, if defined.
+	 */
+#ifdef MAX_SAFE_FDS_LIMIT
+	max_safe_fds = Min(max_safe_fds, MAX_SAFE_FDS_LIMIT);
+#endif
+
 	/*
 	 * Make sure we still have enough to get by.
 	 */

and then somebody who was concerned about this could run a buildfarm member with "-DMAX_SAFE_FDS_LIMIT=10" or so.

regards, tom lane
On 2020-Jan-09, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Hmm, so why not revert the test only in the back branches, given that > > it's not so onerous in master? > > I grow tired of repeating myself, but: it's purely accidental that this > test passes in master for the existing set of buildfarm members. Oh, I forgot we had that problem. I agree with reverting the test, rather than building all the extra functionality needed to make it more stable, in that case. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Amit Kapila <amit.kapila16@gmail.com> writes: > On Thu, Jan 9, 2020 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Noah Misch <noah@leadboat.com> writes: >>> Even so, a web search for "extend_brk" led to the answer. By default, 32-bit >>> AIX binaries get only 256M of RAM for stack and sbrk. The new regression test >>> used more than that, hence this crash. >> Hm, so >> (1) Why did we get a crash and not some more-decipherable out-of-resources >> error? Can we improve that experience? >> (2) Should we be dialing back the resource consumption of this test? > In HEAD, we have a guc variable 'logical_decoding_work_mem' by which > we can control the memory usage of changes and we have used that, but > for back branches, we don't have such a control. I poked into this a bit more by running the src/test/recovery tests under restrictive ulimit settings. I used ulimit -s 1024 ulimit -v 250000 (At least on my 64-bit RHEL6 box, reducing ulimit -v much below this causes initdb to fail, apparently because the post-bootstrap process tries to load all our tsearch and encoding conversion shlibs at once, and it hasn't got enough VM space to do so. Someday we may have to improve that.) I did not manage to duplicate Noah's crash this way. What I see in the v10 branch is that the new 006_logical_decoding.pl test fails, but with a clean "out of memory" error. The memory map dump that that produces fingers the culprit pretty unambiguously: ... ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used ReorderBufferByXid: 24576 total in 2 blocks; 11888 free (3 chunks); 12688 used Slab: TXN: 8192 total in 1 blocks; 5208 free (21 chunks); 2984 used Slab: Change: 2170880 total in 265 blocks; 2800 free (35 chunks); 2168080 used ... Grand total: 226714720 bytes in 27327 blocks; 590888 free (785 chunks); 226123832 used The test case is only inserting 50K fairly-short rows, so this seems like an unreasonable amount of memory to be consuming for that; and even if you think it's reasonable, it clearly isn't going to scale to large production transactions. Now, the good news is that v11 and later get through 006_logical_decoding.pl just fine under the same restriction. So we did something in v11 to fix this excessive memory consumption. However, unless we're willing to back-port whatever that was, this test case is clearly consuming excessive resources for the v10 branch. We're not out of the woods either. I also observe that v12 and HEAD fall over, under these same test conditions, with a stack-overflow error in the 012_subtransactions.pl test. This seems to be due to somebody's decision to use a heavily recursive function to generate a bunch of subtransactions. Is there a good reason for hs_subxids() to use recursion instead of a loop? If there is, what's the value of using 201 levels rather than, say, 10? Anyway it remains unclear why Noah's machine got a crash instead of something more user-friendly. But the reason why it's only in the v10 branch seems non-mysterious. regards, tom lane
I wrote: > ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used > The test case is only inserting 50K fairly-short rows, so this seems > like an unreasonable amount of memory to be consuming for that; and > even if you think it's reasonable, it clearly isn't going to scale > to large production transactions. > Now, the good news is that v11 and later get through > 006_logical_decoding.pl just fine under the same restriction. > So we did something in v11 to fix this excessive memory consumption. > However, unless we're willing to back-port whatever that was, this > test case is clearly consuming excessive resources for the v10 branch. I dug around a little in the git history for backend/replication/logical/, and while I find several commit messages mentioning memory leaks and faulty spill logic, they all claim to have been back-patched as far as 9.4. It seems reasonably likely to me that this result is telling us about an actual bug, ie, faulty back-patching of one or more of those fixes into v10 and perhaps earlier branches. I don't know this code well enough to take point on looking for the problem, though. regards, tom lane
On Thu, Jan 09, 2020 at 12:45:41AM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Even so, a web search for "extend_brk" led to the answer. By default, 32-bit > > AIX binaries get only 256M of RAM for stack and sbrk. The new regression test > > used more than that, hence this crash. > > Hm, so > > (1) Why did we get a crash and not some more-decipherable out-of-resources > error? Can we improve that experience? By default, 32-bit AIX binaries have maxdata:0x00000000. Specifying maxdata:0x10000000 provides the same 256M of RAM, yet it magically changes the SIGSEGV to ENOMEM: $ OBJECT_MODE=32 gcc maxdata.c && ./a.out Segmentation fault $ OBJECT_MODE=32 gcc -Wl,-bmaxdata:0x00000000 maxdata.c && ./a.out Segmentation fault $ OBJECT_MODE=32 gcc -Wl,-bmaxdata:0x10000000 maxdata.c && ./a.out done at 255 MiB: Not enough space We could add -Wl,-bmaxdata:0x10000000 (or a higher value) to LDFLAGS when building for 32-bit AIX. > (2) Should we be dialing back the resource consumption of this test? > Even on machines where it doesn't fail outright, I'd imagine that it's > costing a lot of buildfarm cycles. Is it actually worth that? The test's resource usage, being quite low, should not be a factor in the test's fate. On my usual development machine, the entire 006_logical_decoding.pl file takes just 3s and ~250 MiB of RAM.
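[Editor's note] Noah's maxdata.c was not posted; the sketch below is only a guess at what such a probe looks like (grab the data segment a megabyte at a time until allocation fails, then report how far it got), but it reproduces the shape of the output shown above:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        size_t      mib = 0;

        for (;;)
        {
            char       *p = malloc(1024 * 1024);

            if (p == NULL)
            {
                printf("done at %zu MiB: %s\n", mib, strerror(errno));
                return 0;
            }
            memset(p, 0, 1024 * 1024);  /* touch the pages so they really count */
            mib++;
        }
    }

With the default maxdata setting the loop dies before it can report anything, while linking with -Wl,-bmaxdata:0x10000000 lets malloc() fail cleanly, matching the "Segmentation fault" versus "done at 255 MiB: Not enough space" outputs above.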
On Fri, Jan 10, 2020 at 6:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used > > > The test case is only inserting 50K fairly-short rows, so this seems > > like an unreasonable amount of memory to be consuming for that; and > > even if you think it's reasonable, it clearly isn't going to scale > > to large production transactions. > > > Now, the good news is that v11 and later get through > > 006_logical_decoding.pl just fine under the same restriction. > > So we did something in v11 to fix this excessive memory consumption. > > However, unless we're willing to back-port whatever that was, this > > test case is clearly consuming excessive resources for the v10 branch. > > I dug around a little in the git history for backend/replication/logical/, > and while I find several commit messages mentioning memory leaks and > faulty spill logic, they all claim to have been back-patched as far > as 9.4. > > It seems reasonably likely to me that this result is telling us about > an actual bug, ie, faulty back-patching of one or more of those fixes > into v10 and perhaps earlier branches. > I think it would be good to narrow down this problem, but it seems we can do this separately. I think to avoid forgetting about this, can we track it somewhere as an open issue (In Older Bugs section of PostgreSQL 12 Open Items or some other place)? It seems to me that this test has found a problem in back-branches, so we might want to keep it after removing the max_files_per_process restriction. However, unless we narrow down this memory leak it is not a good idea to keep it at least not in v10. So, we have the below options: (a) remove this test entirely from all branches and once we found the memory leak problem in back-branches, then consider adding it again without max_files_per_process restriction. (b) keep this test without max_files_per_process restriction till v11 and once the memory leak issue in v10 is found, we can back-patch to v10 as well. Suggestions? > If I have to do so to prove my point, I will set up a buildfarm member > that uses USE_NAMED_POSIX_SEMAPHORES, and then insist that the patch > cope with that. > Shall we document that under USE_NAMED_POSIX_SEMAPHORES, we consume additional fd? I thought about it because the minimum limit for max_files_per_process is 25 and the system won't even start if someone has used a platform where USE_NAMED_POSIX_SEMAPHORES is enabled. Also, if it would have been explicitly mentioned, then I think this test wouldn't have tried to become so optimistic about max_files_per_process. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> writes: > On Thu, Jan 09, 2020 at 12:45:41AM -0500, Tom Lane wrote: >> (1) Why did we get a crash and not some more-decipherable out-of-resources >> error? Can we improve that experience? > By default, 32-bit AIX binaries have maxdata:0x00000000. Specifying > maxdata:0x10000000 provides the same 256M of RAM, yet it magically changes the > SIGSEGV to ENOMEM: > ... > We could add -Wl,-bmaxdata:0x10000000 (or a higher value) to LDFLAGS when > building for 32-bit AIX. +1, seems like that would improve matters considerably on that platform. >> (2) Should we be dialing back the resource consumption of this test? >> Even on machines where it doesn't fail outright, I'd imagine that it's >> costing a lot of buildfarm cycles. Is it actually worth that? > The test's resource usage, being quite low, should not be a factor in the > test's fate. On my usual development machine, the entire > 006_logical_decoding.pl file takes just 3s and ~250 MiB of RAM. Yeah, as I noted downthread, it appears that initdb itself can't succeed with less than ~250MB these days. My old-school self feels like that's excessive, but I must admit I'm not motivated to go reduce it right now. But I think it's a clear win to fail with "out of memory" rather than "SIGSEGV", so I think we ought to adjust the AIX build options as you suggest. regards, tom lane
On Fri, Jan 10, 2020 at 9:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 10, 2020 at 6:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I wrote: > > > ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used > > > > > The test case is only inserting 50K fairly-short rows, so this seems > > > like an unreasonable amount of memory to be consuming for that; and > > > even if you think it's reasonable, it clearly isn't going to scale > > > to large production transactions. > > > > > Now, the good news is that v11 and later get through > > > 006_logical_decoding.pl just fine under the same restriction. > > > So we did something in v11 to fix this excessive memory consumption. > > > However, unless we're willing to back-port whatever that was, this > > > test case is clearly consuming excessive resources for the v10 branch. > > > > I dug around a little in the git history for backend/replication/logical/, > > and while I find several commit messages mentioning memory leaks and > > faulty spill logic, they all claim to have been back-patched as far > > as 9.4. > > > > It seems reasonably likely to me that this result is telling us about > > an actual bug, ie, faulty back-patching of one or more of those fixes > > into v10 and perhaps earlier branches. > > > > I think it would be good to narrow down this problem, but it seems we > can do this separately. I think to avoid forgetting about this, can > we track it somewhere as an open issue (In Older Bugs section of > PostgreSQL 12 Open Items or some other place)? > > It seems to me that this test has found a problem in back-branches, so > we might want to keep it after removing the max_files_per_process > restriction. However, unless we narrow down this memory leak it is > not a good idea to keep it at least not in v10. So, we have the below > options: > (a) remove this test entirely from all branches and once we found the > memory leak problem in back-branches, then consider adding it again > without max_files_per_process restriction. > (b) keep this test without max_files_per_process restriction till v11 > and once the memory leak issue in v10 is found, we can back-patch to > v10 as well. > I am planning to go with option (a) and attached are patches to revert the entire test on HEAD and back branches. I am planning to commit these by Tuesday unless someone has a better idea. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Fri, Jan 10, 2020 at 9:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> ... So, we have the below >> options: >> (a) remove this test entirely from all branches and once we found the >> memory leak problem in back-branches, then consider adding it again >> without max_files_per_process restriction. >> (b) keep this test without max_files_per_process restriction till v11 >> and once the memory leak issue in v10 is found, we can back-patch to >> v10 as well. > I am planning to go with option (a) and attached are patches to revert > the entire test on HEAD and back branches. I am planning to commit > these by Tuesday unless someone has a better idea. Makes sense to me. We've certainly found out something interesting from this test, but not what it was expecting to find ;-). I think that there could be scope for two sorts of successor tests: * I still like my idea of directly constraining max_safe_fds through some sort of debug option. But to my mind, we want to run the entire regression suite with that restriction, not just one small test. * The seeming bug in v10 suggests that we aren't testing large enough logical-decoding cases, or at least aren't noticing leaks in that area. I'm not sure what a good design is for testing that. I'm not thrilled with just using a larger (and slower) test case, but it's not clear to me how else to attack it. regards, tom lane
On Sat, Jan 11, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Fri, Jan 10, 2020 at 9:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> ... So, we have the below > >> options: > >> (a) remove this test entirely from all branches and once we found the > >> memory leak problem in back-branches, then consider adding it again > >> without max_files_per_process restriction. > >> (b) keep this test without max_files_per_process restriction till v11 > >> and once the memory leak issue in v10 is found, we can back-patch to > >> v10 as well. > > > I am planning to go with option (a) and attached are patches to revert > > the entire test on HEAD and back branches. I am planning to commit > > these by Tuesday unless someone has a better idea. > > Makes sense to me. We've certainly found out something interesting > from this test, but not what it was expecting to find ;-). I think > that there could be scope for two sorts of successor tests: > > * I still like my idea of directly constraining max_safe_fds through > some sort of debug option. But to my mind, we want to run the entire > regression suite with that restriction, not just one small test. > Good idea. > * The seeming bug in v10 suggests that we aren't testing large enough > logical-decoding cases, or at least aren't noticing leaks in that > area. I'm not sure what a good design is for testing that. I'm not > thrilled with just using a larger (and slower) test case, but it's > not clear to me how else to attack it. > It is not clear to me either at this stage, but I think we can decide that after chasing the issue in v10. My current plan is to revert this test and make a note of the memory leak problem found (probably track in Older Bugs section of PostgreSQL 12 Open Items). I think once we found the issue id v10, we might be in a better position to decide if the test on the lines of the current test would make sense or we need something else. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 09, 2020 at 07:40:12PM -0500, Tom Lane wrote: >I wrote: >> ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used > >> The test case is only inserting 50K fairly-short rows, so this seems >> like an unreasonable amount of memory to be consuming for that; and >> even if you think it's reasonable, it clearly isn't going to scale >> to large production transactions. > >> Now, the good news is that v11 and later get through >> 006_logical_decoding.pl just fine under the same restriction. >> So we did something in v11 to fix this excessive memory consumption. >> However, unless we're willing to back-port whatever that was, this >> test case is clearly consuming excessive resources for the v10 branch. > >I dug around a little in the git history for backend/replication/logical/, >and while I find several commit messages mentioning memory leaks and >faulty spill logic, they all claim to have been back-patched as far >as 9.4. > >It seems reasonably likely to me that this result is telling us about >an actual bug, ie, faulty back-patching of one or more of those fixes >into v10 and perhaps earlier branches. > >I don't know this code well enough to take point on looking for the >problem, though. > Well, one thing we did in 11 is introduction of the Generation context. In 10 we're still stashing all tuple data into the main AllocSet. I wonder if backporting a4ccc1cef5a04cc054af83bc4582a045d5232cb3 and a couple of follow-up fixes would make the issue go away. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Thu, Jan 09, 2020 at 07:40:12PM -0500, Tom Lane wrote: >> It seems reasonably likely to me that this result is telling us about >> an actual bug, ie, faulty back-patching of one or more of those fixes >> into v10 and perhaps earlier branches. > Well, one thing we did in 11 is introduction of the Generation context. > In 10 we're still stashing all tuple data into the main AllocSet. I > wonder if backporting a4ccc1cef5a04cc054af83bc4582a045d5232cb3 and a > couple of follow-up fixes would make the issue go away. Hm. I'm loath to back-port Generation contexts. But looking at a4ccc1cef5a04cc054af83bc4582a045d5232cb3, I see that (a) the commit message mentions space savings, but (b) the replaced code in reorderbuffer.c doesn't look like it really would move the needle much in that regard. The old code had a one-off slab allocator that we got rid of, but I don't see any actual leak there ... remind me where the win came from, exactly? regards, tom lane
On Sat, Jan 11, 2020 at 10:53:57PM -0500, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Thu, Jan 09, 2020 at 07:40:12PM -0500, Tom Lane wrote: >>> It seems reasonably likely to me that this result is telling us about >>> an actual bug, ie, faulty back-patching of one or more of those fixes >>> into v10 and perhaps earlier branches. > >> Well, one thing we did in 11 is introduction of the Generation context. >> In 10 we're still stashing all tuple data into the main AllocSet. I >> wonder if backporting a4ccc1cef5a04cc054af83bc4582a045d5232cb3 and a >> couple of follow-up fixes would make the issue go away. > >Hm. I'm loath to back-port Generation contexts. Yeah, I agree. My suggestion was to try backpatching it and see if it resolves the issue. >But looking at >a4ccc1cef5a04cc054af83bc4582a045d5232cb3, I see that (a) the >commit message mentions space savings, but (b) the replaced code >in reorderbuffer.c doesn't look like it really would move the needle >much in that regard. The old code had a one-off slab allocator >that we got rid of, but I don't see any actual leak there ... >remind me where the win came from, exactly? > Well, the problem is that in 10 we allocate tuple data in the main memory ReorderBuffer context, and when the transaction gets decoded we pfree() it. But in AllocSet that only moves the data to the freelists, it does not release it entirely. So with the right allocation pattern (sufficiently diverse chunk sizes) this can easily result in allocation of large amount of memory that is never released. I don't know if this is what's happening in this particular test, but I wouldn't be surprised by it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
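[Editor's note] As background on the retention behaviour Tomas describes, here is a self-contained toy model; it is deliberately not aset.c, and every name in it (pool_alloc, pool_free, the size classes) is invented for illustration. Freed chunks are only parked on a per-size freelist owned by the pool, so the pool's footprint stays at its high-water mark even when nothing is live:

    #include <stdio.h>
    #include <stdlib.h>

    #define NCLASSES 12             /* power-of-two size classes: 8 B .. 16 KB */

    typedef struct chunk
    {
        struct chunk *next;
    } chunk;

    static chunk   *freelist[NCLASSES];
    static size_t   total_from_os;  /* bytes ever obtained from malloc */

    static int
    size_class(size_t sz)
    {
        int     c = 0;
        size_t  cap = 8;

        while (cap < sz && c < NCLASSES - 1)
        {
            cap <<= 1;
            c++;
        }
        return c;
    }

    static void *
    pool_alloc(size_t sz)
    {
        int     c = size_class(sz);

        if (freelist[c] != NULL)
        {
            chunk  *ch = freelist[c];

            freelist[c] = ch->next;
            return ch;
        }
        total_from_os += (size_t) 8 << c;
        return malloc((size_t) 8 << c);
    }

    static void
    pool_free(void *p, size_t sz)
    {
        chunk  *ch = (chunk *) p;
        int     c = size_class(sz);

        ch->next = freelist[c];
        freelist[c] = ch;           /* parked on the freelist, never given back */
    }

    int
    main(void)
    {
        enum { N = 20000 };
        static void *live[N];

        /* build up a peak of mixed-size allocations (think: decoded tuple data) */
        for (int i = 0; i < N; i++)
            live[i] = pool_alloc((size_t) 16 << (i % 8));

        /* decoding finishes and everything is "pfree"d ... */
        for (int i = 0; i < N; i++)
            pool_free(live[i], (size_t) 16 << (i % 8));

        /* ... yet the pool still owns its peak footprint */
        printf("live chunks: 0, still held by the pool: %zu bytes\n", total_from_os);
        return 0;
    }

The Generation context added in v11 frees an entire block as soon as every chunk on it is dead, which is consistent with v11 getting through the same test. Whether this retention pattern is actually what bites the v10 test is questioned just below, so treat this only as background on the mechanism.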
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Sat, Jan 11, 2020 at 10:53:57PM -0500, Tom Lane wrote: >> remind me where the win came from, exactly? > Well, the problem is that in 10 we allocate tuple data in the main > memory ReorderBuffer context, and when the transaction gets decoded we > pfree() it. But in AllocSet that only moves the data to the freelists, > it does not release it entirely. So with the right allocation pattern > (sufficiently diverse chunk sizes) this can easily result in allocation > of large amount of memory that is never released. > I don't know if this is what's happening in this particular test, but I > wouldn't be surprised by it. Nah, don't think I believe that: the test inserts a bunch of tuples, but they look like they will all be *exactly* the same size. CREATE TABLE decoding_test(x integer, y text); ... FOR i IN 1..10 LOOP BEGIN INSERT INTO decoding_test(x) SELECT generate_series(1,5000); EXCEPTION when division_by_zero then perform 'dummy'; END; regards, tom lane
On Sun, Jan 12, 2020 at 8:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 11, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > * The seeming bug in v10 suggests that we aren't testing large enough > > logical-decoding cases, or at least aren't noticing leaks in that > > area. I'm not sure what a good design is for testing that. I'm not > > thrilled with just using a larger (and slower) test case, but it's > > not clear to me how else to attack it. > > > > It is not clear to me either at this stage, but I think we can decide > that after chasing the issue in v10. My current plan is to revert > this test and make a note of the memory leak problem found (probably > track in Older Bugs section of PostgreSQL 12 Open Items). > Pushed the revert and added an open item in the 'Older Bugs' section of PostgreSQL 12 Open Items [1]. [1] - https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 14, 2020 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jan 12, 2020 at 8:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 11, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > * The seeming bug in v10 suggests that we aren't testing large enough > > > logical-decoding cases, or at least aren't noticing leaks in that > > > area. I'm not sure what a good design is for testing that. I'm not > > > thrilled with just using a larger (and slower) test case, but it's > > > not clear to me how else to attack it. > > > > > > > It is not clear to me either at this stage, but I think we can decide > > that after chasing the issue in v10. My current plan is to revert > > this test and make a note of the memory leak problem found (probably > > track in Older Bugs section of PostgreSQL 12 Open Items). > > > > Pushed the revert > Sidewinder is green now on back branches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill files" for this side issue: On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote: > v10 > deletes PostgresNode base directories at the end of this test file, despite > the failure[1]. > [1] It has the all_tests_passing() logic in an attempt to stop this. I'm > guessing it didn't help because the file failed by calling die "connection > error: ...", not by reporting a failure to Test::More via ok(0) or similar. That is what happened. We should test the exit status to decide whether to keep temporaries, as attached. PostgresNode does that, since commit 90627cf (thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us). That thread already discussed $SUBJECT[1] and the __DIE__ handler being redundant[2]. I plan to back-patch, since it's most useful for v10 and v9.6. [1] https://postgr.es/m/CAMsr+YFyFU=+MVFZqhthfMW22x5-h517e6ck6ET+DT+X4bUO7g@mail.gmail.com [2] https://postgr.es/m/FEA925B2-C3AE-4BA9-9194-5F5616AD0794@yesql.se
Attachment
> On 2 Feb 2020, at 18:01, Noah Misch <noah@leadboat.com> wrote: > > Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill > files" for this side issue: Thanks, I hadn't seen this. > On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote: >> v10 >> deletes PostgresNode base directories at the end of this test file, despite >> the failure[1]. > >> [1] It has the all_tests_passing() logic in an attempt to stop this. I'm >> guessing it didn't help because the file failed by calling die "connection >> error: ...", not by reporting a failure to Test::More via ok(0) or similar. > > That is what happened. We should test the exit status to decide whether to > keep temporaries, as attached. PostgresNode does that, since commit 90627cf > (thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us). That > thread already discussed $SUBJECT[1] and the __DIE__ handler being > redundant[2]. I plan to back-patch, since it's most useful for v10 and v9.6. I'm travelling and haven't been able to test, but this makes sense from reading. +1 on backpatching. cheers ./daniel
On Sun, Jan 12, 2020 at 9:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > On Sat, Jan 11, 2020 at 10:53:57PM -0500, Tom Lane wrote:
> >> remind me where the win came from, exactly?
>
> > Well, the problem is that in 10 we allocate tuple data in the main
> > memory ReorderBuffer context, and when the transaction gets decoded we
> > pfree() it. But in AllocSet that only moves the data to the freelists,
> > it does not release it entirely. So with the right allocation pattern
> > (sufficiently diverse chunk sizes) this can easily result in allocation
> > of large amount of memory that is never released.
>
> > I don't know if this is what's happening in this particular test, but I
> > wouldn't be surprised by it.
>
> Nah, don't think I believe that: the test inserts a bunch of tuples,
> but they look like they will all be *exactly* the same size.
>
> CREATE TABLE decoding_test(x integer, y text);
> ...
>
>   FOR i IN 1..10 LOOP
>     BEGIN
>       INSERT INTO decoding_test(x) SELECT generate_series(1,5000);
>     EXCEPTION
>       when division_by_zero then perform 'dummy';
>     END;
>
I performed the same test in pg11 and reproduced the issue on the
commit prior to a4ccc1cef5a04 (Generational memory allocator).

ulimit -s 1024
ulimit -v 300000

wal_level = logical
max_replication_slots = 4

And executed the following code snippet (shared by Amit Khandekar
earlier in the thread).

SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');

CREATE TABLE decoding_test(x integer, y text);
do $$
BEGIN
    FOR i IN 1..10 LOOP
        BEGIN
            INSERT INTO decoding_test(x) SELECT generate_series(1,3000);
        EXCEPTION
            when division_by_zero then perform 'dummy';
        END;
    END LOOP;
END $$;

SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL) LIMIT 10;

I got the following error:
ERROR:  out of memory
DETAIL:  Failed on request of size 8208.

After that, I applied the "Generational memory allocator" patch and
that solved the issue. From the error message, it is evident that the
underlying code is trying to allocate a MaxTupleSize memory for each
tuple. So, I re-introduced the following lines (which are removed by
a4ccc1cef5a04) on top of the patch:

--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
 
 	alloc_len = tuple_len + SizeofHeapTupleHeader;
 
+	if (alloc_len < MaxHeapTupleSize)
+		alloc_len = MaxHeapTupleSize;

And, the issue got reproduced with the same error:
WARNING:  problem in Generation Tuples: number of free chunks 0 in
block 0x7fe9e9e74010 exceeds 1018 allocated
.....
ERROR:  out of memory
DETAIL:  Failed on request of size 8208.

I don't understand the code well enough to comment whether we can
back-patch only this part of the code. But, this seems to allocate a
huge amount of memory per chunk although the tuple is small. Thoughts?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 4, 2020 at 10:15 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Sun, Jan 12, 2020 at 9:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > > On Sat, Jan 11, 2020 at 10:53:57PM -0500, Tom Lane wrote: > > >> remind me where the win came from, exactly? > > > > > Well, the problem is that in 10 we allocate tuple data in the main > > > memory ReorderBuffer context, and when the transaction gets decoded we > > > pfree() it. But in AllocSet that only moves the data to the freelists, > > > it does not release it entirely. So with the right allocation pattern > > > (sufficiently diverse chunk sizes) this can easily result in allocation > > > of large amount of memory that is never released. > > > > > I don't know if this is what's happening in this particular test, but I > > > wouldn't be surprised by it. > > > > Nah, don't think I believe that: the test inserts a bunch of tuples, > > but they look like they will all be *exactly* the same size. > > > > CREATE TABLE decoding_test(x integer, y text); > > ... > > > > FOR i IN 1..10 LOOP > > BEGIN > > INSERT INTO decoding_test(x) SELECT generate_series(1,5000); > > EXCEPTION > > when division_by_zero then perform 'dummy'; > > END; > > > I performed the same test in pg11 and reproduced the issue on the > commit prior to a4ccc1cef5a04 (Generational memory allocator). > > ulimit -s 1024 > ulimit -v 300000 > > wal_level = logical > max_replication_slots = 4 > > And executed the following code snippet (shared by Amit Khandekar > earlier in the thread). > .. > > SELECT data from pg_logical_slot_get_changes('test_slot', NULL, NULL) LIMIT 10; > > I got the following error: > ERROR: out of memory > DETAIL: Failed on request of size 8208. > > After that, I applied the "Generational memory allocator" patch and > that solved the issue. From the error message, it is evident that the > underlying code is trying to allocate a MaxTupleSize memory for each > tuple. So, I re-introduced the following lines (which are removed by > a4ccc1cef5a04) on top of the patch: > > --- a/src/backend/replication/logical/reorderbuffer.c > +++ b/src/backend/replication/logical/reorderbuffer.c > @@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len) > > alloc_len = tuple_len + SizeofHeapTupleHeader; > > + if (alloc_len < MaxHeapTupleSize) > + alloc_len = MaxHeapTupleSize; > > And, the issue got reproduced with the same error: > WARNING: problem in Generation Tuples: number of free chunks 0 in > block 0x7fe9e9e74010 exceeds 1018 allocated > ..... > ERROR: out of memory > DETAIL: Failed on request of size 8208. > > I don't understand the code well enough to comment whether we can > back-patch only this part of the code. > I don't think we can just back-patch that part of code as it is linked to the way we are maintaining a cache (~8MB) for frequently allocated objects. See the comments around the definition of max_cached_tuplebufs. But probably, we can do something once we reach such a limit, basically, once we know that we have already allocated max_cached_tuplebufs number of tuples of size MaxHeapTupleSize, we don't need to allocate more of that size. Does this make sense? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
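A self-contained sketch of that rule follows. The helper choose_alloc_len() and the capitalized constants are stand-ins invented for the example; in the real code the decision would live in ReorderBufferGetTupleBuf() alongside the existing slab-cache bookkeeping, and the actual patch may differ in detail.

#include <stddef.h>
#include <stdio.h>

/* stand-ins for the backend's constants, for illustration only */
#define MAX_HEAP_TUPLE_SIZE         8160        /* MaxHeapTupleSize on 8 kB pages */
#define SIZEOF_HEAP_TUPLE_HEADER    23          /* SizeofHeapTupleHeader */
#define MAX_CACHED_TUPLEBUFS        (4096 * 2)  /* max_cached_tuplebufs */

/*
 * Proposed rule: round requests up to the full MaxHeapTupleSize only while
 * the slab cache of full-size buffers can still grow; once it is full, any
 * additional full-size buffer would just be pfree'd back to aset.c on
 * release, so allocate only what the tuple actually needs.
 */
static size_t
choose_alloc_len(size_t tuple_len, size_t nr_cached_tuplebufs)
{
    size_t      alloc_len = tuple_len + SIZEOF_HEAP_TUPLE_HEADER;

    if (alloc_len < MAX_HEAP_TUPLE_SIZE &&
        nr_cached_tuplebufs < MAX_CACHED_TUPLEBUFS)
        alloc_len = MAX_HEAP_TUPLE_SIZE;

    return alloc_len;
}

int
main(void)
{
    printf("cache not yet full: %zu bytes\n", choose_alloc_len(100, 0));
    printf("cache full:         %zu bytes\n",
           choose_alloc_len(100, MAX_CACHED_TUPLEBUFS));
    return 0;
}

The idea is simply that full-size allocations only pay off while they can
still end up in the slab cache; beyond that point they are pure overhead for
small tuples.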
On Tue, Feb 4, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I don't think we can just back-patch that part of code as it is linked > to the way we are maintaining a cache (~8MB) for frequently allocated > objects. See the comments around the definition of > max_cached_tuplebufs. But probably, we can do something once we reach > such a limit, basically, once we know that we have already allocated > max_cached_tuplebufs number of tuples of size MaxHeapTupleSize, we > don't need to allocate more of that size. Does this make sense? > Yeah, this makes sense. I've attached a patch that implements the same. It solves the problem reported earlier. This solution will at least slow down the process of going OOM even for very small sized tuples. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi,

On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> And, the issue got reproduced with the same error:
> WARNING:  problem in Generation Tuples: number of free chunks 0 in
> block 0x7fe9e9e74010 exceeds 1018 allocated

That seems like a problem in generation.c - because this should be
unreachable, I think?

Tomas?

Regards,

Andres
Hi,

On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> I performed the same test in pg11 and reproduced the issue on the
> commit prior to a4ccc1cef5a04 (Generational memory allocator).
>
> ulimit -s 1024
> ulimit -v 300000
>
> wal_level = logical
> max_replication_slots = 4
>
> [...]
>
> After that, I applied the "Generational memory allocator" patch and
> that solved the issue. From the error message, it is evident that the
> underlying code is trying to allocate a MaxTupleSize memory for each
> tuple. So, I re-introduced the following lines (which are removed by
> a4ccc1cef5a04) on top of the patch:
>
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
>
>  	alloc_len = tuple_len + SizeofHeapTupleHeader;
>
> +	if (alloc_len < MaxHeapTupleSize)
> +		alloc_len = MaxHeapTupleSize;

Maybe I'm being slow here - but what does this actually prove? Before
the generation contexts were introduced we avoided fragmentation (which
would make things unusably slow) using a brute force method (namely
forcing all tuple allocations to be of the same/maximum size).

Which means that yes, we'll need more memory than necessary. Do you
think you see anything but that here?

It's good that the situation is better now, but I don't think this
means we necessarily need to backpatch something nontrivial?

Greetings,

Andres Freund
On Fri, Feb 07, 2020 at 10:33:48AM -0800, Andres Freund wrote:
>Hi,
>
>On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
>> And, the issue got reproduced with the same error:
>> WARNING:  problem in Generation Tuples: number of free chunks 0 in
>> block 0x7fe9e9e74010 exceeds 1018 allocated
>
>That seems like a problem in generation.c - because this should be
>unreachable, I think?
>
>Tomas?

That's rather strange. How could we print this message? The code looks
like this:

    if (block->nfree >= block->nchunks)
        elog(WARNING, "problem in Generation %s: number of free chunks %d in block %p exceeds %d allocated",
             name, block->nfree, block, block->nchunks);

so this says 0 >= 1018. Or am I missing something?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2020-01-11 23:20:56 -0500, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Nah, don't think I believe that: the test inserts a bunch of tuples,
> but they look like they will all be *exactly* the same size.
>
> CREATE TABLE decoding_test(x integer, y text);
> ...
>
>   FOR i IN 1..10 LOOP
>     BEGIN
>       INSERT INTO decoding_test(x) SELECT generate_series(1,5000);
>     EXCEPTION
>       when division_by_zero then perform 'dummy';
>     END;

I think the reason this triggers higher memory usage in older versions is
that, before commit

cec2edfa7859279f36d2374770ca920c59c73dd8
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-11-16 17:49:33 +0530

    Add logical_decoding_work_mem to limit ReorderBuffer memory usage.

we enforced how many changes to keep in memory (vs on disk)

/*
 * Maximum number of changes kept in memory, per transaction. After that,
 * changes are spooled to disk.
 *
 * The current value should be sufficient to decode the entire transaction
 * without hitting disk in OLTP workloads, while starting to spool to disk in
 * other workloads reasonably fast.
 *
 * At some point in the future it probably makes sense to have a more elaborate
 * resource management here, but it's not entirely clear what that would look
 * like.
 */
static const Size max_changes_in_memory = 4096;

on a per-transaction basis. And that subtransactions are *different*
transactions for that purpose (as they can be rolled back separately).

As the test generates loads of records for different subtransactions, they
each end up having quite a few changes (including the tuples pointed to!)
in memory at the same time.

Due to the way the limit of 4096 interacts with the 5000 rows inserted
above, we only hit the out of memory error when loading. That's because
when decoding (before the commit has been seen), we spill after 4096
changes:

2020-02-07 11:18:22.399 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 585 to disk
2020-02-07 11:18:22.419 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 586 to disk
2020-02-07 11:18:22.431 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 587 to disk
2020-02-07 11:18:22.443 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 588 to disk
2020-02-07 11:18:22.454 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 589 to disk
2020-02-07 11:18:22.465 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 590 to disk
2020-02-07 11:18:22.477 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 591 to disk
2020-02-07 11:18:22.488 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 592 to disk
2020-02-07 11:18:22.499 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 593 to disk
2020-02-07 11:18:22.511 PST [1136134][3/2] DEBUG:  spill 4096 changes in XID 594 to disk

so there are 5000 - 4096 changes in memory for each of them, times 10. But
when actually calling the output plugin (at the commit record), we start
with loading changes back into memory from the start of each
subtransaction. That first entails spilling the tail of that transaction
to disk, and then loading the start:

2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG:  StartSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG:  StartSubTransaction(2) name: replay; blockState: SUB BEGIN; state: INPROGR, xid/subid/cid: 0/2/0
2020-02-07 11:18:22.515 PST [1136134][3/2] DEBUG:  spill 904 changes in XID 585 to disk
2020-02-07 11:18:22.524 PST [1136134][3/2] DEBUG:  restored 4096 changes in XID 585 into memory
2020-02-07 11:18:22.524 PST [1136134][3/2] DEBUG:  spill 904 changes in XID 586 to disk
2020-02-07 11:18:22.534 PST [1136134][3/2] DEBUG:  restored 4096 changes in XID 586 into memory
2020-02-07 11:18:22.534 PST [1136134][3/2] DEBUG:  spill 904 changes in XID 587 to disk
2020-02-07 11:18:22.544 PST [1136134][3/2] DEBUG:  restored 4096 changes in XID 587 into memory
2020-02-07 11:18:22.544 PST [1136134][3/2] DEBUG:  spill 904 changes in XID 588 to disk
2020-02-07 11:18:22.554 PST [1136134][3/2] DEBUG:  restored 4096 changes in XID 588 into memory
2020-02-07 11:18:22.554 PST [1136134][3/2] DEBUG:  spill 904 changes in XID 589 to disk
TopMemoryContext: 161440 total in 7 blocks; 80240 free (68 chunks); 81200 used
...

Because each transaction has 4096 changes in memory, we actually need more
memory here than we did during the decoding phase, where all but the
"current" subtransaction only have 5000 - 4096 changes in memory.

If we instead change the test to insert 4096*2 - 1 tuples each, we run out
of memory earlier:

2020-02-07 11:23:20.540 PST [1136134][3/12] DEBUG:  spill 4096 changes in XID 610 to disk
2020-02-07 11:23:20.565 PST [1136134][3/12] DEBUG:  spill 4096 changes in XID 611 to disk
2020-02-07 11:23:20.587 PST [1136134][3/12] DEBUG:  spill 4096 changes in XID 612 to disk
2020-02-07 11:23:20.608 PST [1136134][3/12] DEBUG:  spill 4096 changes in XID 613 to disk
2020-02-07 11:23:20.630 PST [1136134][3/12] DEBUG:  spill 4096 changes in XID 614 to disk
TopMemoryContext: 161440 total in 7 blocks; 79264 free (82 chunks); 82176 used
...
2020-02-07 11:23:20.655 PST [1136134][3/12] ERROR:  out of memory
2020-02-07 11:23:20.655 PST [1136134][3/12] DETAIL:  Failed on request of size 8208.
2020-02-07 11:23:20.655 PST [1136134][3/12] STATEMENT:  SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL,NULL);

The reason the per-subxact enforcement of max_changes_in_memory isn't as
noticeable in 11 is that there we have the generational context, which
means that each of the 4096*10 tuples we have in memory doesn't allocate
MaxHeapTupleSize, but instead something like ~30 bytes.

I wonder if we, in the back branches that don't have the generation
context, should just reduce the size of slab-allocated tuples to ~1024
bytes instead of MaxHeapTupleSize. That's an almost trivial change, as we
already have to support tuples above that limit (in cases where the
oldtuple in an update/delete contains toasted columns that we "inlined").
POC for that attached.

Greetings,

Andres Freund
Attachment
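To put rough numbers on the decode-versus-replay difference described above, here is a back-of-the-envelope model (a hypothetical standalone program, not part of the attached POC; it assumes MaxHeapTupleSize is 8160 bytes on 8 kB pages and ignores per-change bookkeeping overhead):

#include <stdio.h>

int
main(void)
{
    const long  max_changes_in_memory = 4096;   /* per (sub)transaction */
    const long  rows_per_subxact = 5000;        /* from the test case above */
    const long  nsubxacts = 10;
    const long  max_heap_tuple_size = 8160;     /* assumed MaxHeapTupleSize */

    /*
     * While decoding, each subtransaction spills to disk once it has
     * accumulated 4096 changes, so only the remainder stays in memory.
     */
    long        decode_changes =
        (rows_per_subxact - max_changes_in_memory) * nsubxacts;

    /*
     * While replaying (at the commit record), the iterator keeps up to
     * 4096 restored changes per subtransaction in memory at the same time.
     */
    long        replay_changes = max_changes_in_memory * nsubxacts;

    printf("decoding:  %ld changes, ~%ld MB if each is rounded up to MaxHeapTupleSize\n",
           decode_changes, decode_changes * max_heap_tuple_size / (1024 * 1024));
    printf("replaying: %ld changes, ~%ld MB if each is rounded up to MaxHeapTupleSize\n",
           replay_changes, replay_changes * max_heap_tuple_size / (1024 * 1024));
    return 0;
}

That is several times more tuple memory at replay time than at decode time,
which would be consistent with the out-of-memory error only appearing while
the changes are being loaded back in.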
Hi, On 2020-02-07 20:02:01 +0100, Tomas Vondra wrote: > On Fri, Feb 07, 2020 at 10:33:48AM -0800, Andres Freund wrote: > > Hi, > > > > On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote: > > > And, the issue got reproduced with the same error: > > > WARNING: problem in Generation Tuples: number of free chunks 0 in > > > block 0x7fe9e9e74010 exceeds 1018 allocated > > > > That seems like a problem in generation.c - because this should be > > unreachable, I think? > That's rather strange. How could we print this message? The code looks > like this > > if (block->nfree >= block->nchunks) > elog(WARNING, "problem in Generation %s: number of free chunks %d in block %p exceeds %d allocated", > name, block->nfree, block, block->nchunks); > > so this says 0 >= 1018. Or am I missing something? Indeed, it's pretty weird. I can't reproduce it either. Kuntal, which exact git version did you repro this on? What precise settings / platform? Greetings, Andres Freund
On Sat, Feb 8, 2020 at 12:10 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> > I performed the same test in pg11 and reproduced the issue on the
> > commit prior to a4ccc1cef5a04 (Generational memory allocator).
> >
> > ulimit -s 1024
> > ulimit -v 300000
> >
> > wal_level = logical
> > max_replication_slots = 4
> >
> > [...]
> >
> > After that, I applied the "Generational memory allocator" patch and
> > that solved the issue. From the error message, it is evident that the
> > underlying code is trying to allocate a MaxTupleSize memory for each
> > tuple. So, I re-introduced the following lines (which are removed by
> > a4ccc1cef5a04) on top of the patch:
> >
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
> >
> >  	alloc_len = tuple_len + SizeofHeapTupleHeader;
> >
> > +	if (alloc_len < MaxHeapTupleSize)
> > +		alloc_len = MaxHeapTupleSize;
>
> Maybe I'm being slow here - but what does this actually prove? Before
> the generation contexts were introduced we avoided fragmentation (which
> would make things unusably slow) using a brute force method (namely
> forcing all tuple allocations to be of the same/maximum size).
>

It seems for this we formed a cache of max_cached_tuplebufs number of
objects and we don't need to allocate more than that number of tuples of
size MaxHeapTupleSize because we will anyway return that memory to aset.c.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hello,

On Sat, Feb 8, 2020 at 1:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-02-07 20:02:01 +0100, Tomas Vondra wrote:
> > On Fri, Feb 07, 2020 at 10:33:48AM -0800, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> > > > And, the issue got reproduced with the same error:
> > > > WARNING:  problem in Generation Tuples: number of free chunks 0 in
> > > > block 0x7fe9e9e74010 exceeds 1018 allocated
> > >
> > > That seems like a problem in generation.c - because this should be
> > > unreachable, I think?
>
> > That's rather strange. How could we print this message? The code looks
> > like this
> >
> > if (block->nfree >= block->nchunks)
> >     elog(WARNING, "problem in Generation %s: number of free chunks %d in block %p exceeds %d allocated",
> >          name, block->nfree, block, block->nchunks);
> >
> > so this says 0 >= 1018. Or am I missing something?
>
> Indeed, it's pretty weird. I can't reproduce it either. Kuntal, which
> exact git version did you repro this on? What precise settings /
> platform?
>
I've used the following steps:

Platform:
OS: Ubuntu 64-bit 18.04.2
VMWare Fusion Version 8.5.10 on my MacOS 10.14.6
16GB RAM with 8 core processors

git checkout -b pg11 remotes/origin/REL_11_STABLE
git reset --hard a4ccc1cef5a04cc (Generational memory allocator)
git apply Set-alloc_len-as-MaxHeapTupleSize.patch

Then, I performed the same test. I've attached the test.sql file for the
same. With this test, I wanted to check how the generational memory
allocator patch is solving the issue. As reported by you as well earlier
in the thread, each of the 4096*10 in-memory tuples doesn't allocate
MaxHeapTupleSize, but instead something like ~30 bytes with this patch.
IMHO, this is the change that is making the difference. If we allocate
MaxHeapTupleSize instead for all the tuples, we'll encounter the same
out-of-memory issue.

I haven't looked into the issue in the generational tuple context yet.
It's possible that the changes I've done in the attached patch don't make
sense and break the logic of the generational memory allocator. :-)

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Feb 9, 2020 at 9:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> It seems for this we formed a cache of max_cached_tuplebufs number of
> objects and we don't need to allocate more than that number of tuples
> of size MaxHeapTupleSize because we will anyway return that memory to
> aset.c.
>
In the approach suggested by Amit (approach 1), once we allocate the
max_cached_tuplebufs number of MaxHeapTupleSize, we can use the actual
length of the tuple for allocating memory. So, if we have m
subtransactions, the memory usage at worst case will be,

(max_cached_tuplebufs * MaxHeapTupleSize) cache +
(Maximum changes in a subtransaction before spilling) * m * (Actual tuple size)

= 64 MB cache + 4095 * m * (Actual tuple size)

In the approach suggested by Andres (approach 2), we're going to
reduce the size of a cached tuple to 1024 bytes. So, if we have m
sub-transactions, the memory usage at worst case will be,

(max_cached_tuplebufs * 1024 bytes) cache +
(Maximum changes in a subtransaction before spilling) * m * 1024 bytes

= 8 MB cache + 4095 * m * 1024 (considering the size of the tuple is
less than 1024 bytes)

Once the cache is filled, for 1000 sub-transactions operating on tuple
size, say 100 bytes, approach 1 will allocate 390 MB of memory
(approx.) whereas approach 2 will allocate 4GB of memory
approximately. If there is no obvious error that I'm missing, I think
we should implement the first approach.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
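As a quick sanity check of those two worst-case figures, a throwaway calculation (a hypothetical standalone program; the 8160-byte MaxHeapTupleSize and the 4096 * 2 cache size are assumptions chosen to match the 64 MB / 8 MB cache figures above) reproduces them:

#include <stdio.h>

int
main(void)
{
    const double    mb = 1024.0 * 1024.0;
    const double    max_cached_tuplebufs = 4096 * 2;    /* assumed cache size */
    const double    max_heap_tuple_size = 8160;         /* assumed MaxHeapTupleSize */
    const double    changes_before_spill = 4095;
    const double    m = 1000;                           /* subtransactions */
    const double    tuple_size = 100;                   /* bytes */

    /* approach 1: full-size cache, then actual tuple sizes */
    printf("approach 1: %.0f MB cache + %.0f MB\n",
           max_cached_tuplebufs * max_heap_tuple_size / mb,
           changes_before_spill * m * tuple_size / mb);

    /* approach 2: everything rounded up to 1024 bytes */
    printf("approach 2: %.0f MB cache + %.0f MB\n",
           max_cached_tuplebufs * 1024 / mb,
           changes_before_spill * m * 1024 / mb);

    return 0;
}

This prints roughly 64 MB + 391 MB for approach 1 and 8 MB + 3999 MB for
approach 2, which lines up with the ~390 MB versus ~4 GB comparison above.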
On Fri, Feb 7, 2020 at 5:32 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Tue, Feb 4, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I don't think we can just back-patch that part of code as it is linked > > to the way we are maintaining a cache (~8MB) for frequently allocated > > objects. See the comments around the definition of > > max_cached_tuplebufs. But probably, we can do something once we reach > > such a limit, basically, once we know that we have already allocated > > max_cached_tuplebufs number of tuples of size MaxHeapTupleSize, we > > don't need to allocate more of that size. Does this make sense? > > > > Yeah, this makes sense. I've attached a patch that implements the > same. It solves the problem reported earlier. This solution will at > least slow down the process of going OOM even for very small sized > tuples. > The patch seems to be in right direction and the test at my end shows that it resolves the issue. One minor comment: * those. Thus always allocate at least MaxHeapTupleSize. Note that tuples * generated for oldtuples can be bigger, as they don't have out-of-line * toast columns. + * + * But, if we've already allocated the memory required for building the + * cache later, we don't have to allocate memory more than the size of the + * tuple. */ How about modifying the existing comment as: "Most tuples are below MaxHeapTupleSize, so we use a slab allocator for those. Thus always allocate at least MaxHeapTupleSize till the slab cache is filled. Note that tuples generated for oldtuples can be bigger, as they don't have out-of-line toast columns."? Have you tested this in 9.6 and 9.5? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 14, 2020 at 4:06 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Sun, Feb 9, 2020 at 9:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > It seems for this we formed a cache of max_cached_tuplebufs number of > > objects and we don't need to allocate more than that number of tuples > > of size MaxHeapTupleSize because we will anyway return that memory to > > aset.c. > > > In the approach suggested by Amit (approach 1), once we allocate the > max_cached_tuplebufs number of MaxHeapTupleSize, we can use the actual > length of the tuple for allocating memory. So, if we have m > subtransactions, the memory usage at worst case will be, > > (max_cached_tuplebufs * MaxHeapTupleSize) cache + > (Maximum changes in a subtransaction before spilling) * m * (Actual tuple size) > > = 64 MB cache + 4095 * m * (Actual tuple size) > > In the approach suggested by Andres (approach 2), we're going to > reduce the size of a cached tuple to 1024 bytes. So, if we have m > sub-transactions, the memory usage at worst case will be, > > (max_cached_tuplebufs * 1024 bytes) cache + (Maximum changes in a > subtransaction before spilling) * m * 1024 bytes > > = 8 MB cache + 4095 * m * 1024 (considering the size of the tuple is > less than 1024 bytes) > > Once the cache is filled, for 1000 sub-transactions operating on tuple > size, say 100 bytes, approach 1 will allocate 390 MB of memory > (approx.) whereas approach 2 will allocate 4GB of memory > approximately. If there is no obvious error that I'm missing, I think > we should implement the first approach. > Your calculation seems correct to me. So, I think we should proceed with the patch written by you. Andres, any objections on proceeding with Kuntal's patch for back-branches (10, 9.6 and 9.5)? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-02-18 11:20:17 +0530, Amit Kapila wrote:
> Andres, any objections on proceeding with Kuntal's patch for
> back-branches (10, 9.6 and 9.5)?

Yes. In my past experiments that led to *terrible* allocator performance
due to fragmentation. Like, up to 90% of the time spent in aset.c. Try a
workload with a number of overlapping transactions that have different
tuple sizes.

I'm not even sure it's the right thing to do anything in the back
branches, to be honest. If somebody hits this badly they likely have done
so before, and they at least have the choice to upgrade, but if we
regress performance for more people...
On Tue, Feb 18, 2020 at 11:33 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-02-18 11:20:17 +0530, Amit Kapila wrote:
> > Andres, any objections on proceeding with Kuntal's patch for
> > back-branches (10, 9.6 and 9.5)?
>
> Yes. In my past experiments that led to *terrible* allocator
> performance due to fragmentation. Like, up to 90% of the time spent in
> aset.c. Try a workload with a number of overlapping transactions that
> have different tuple sizes.
>

I thought the slab cache would have addressed that. But it is possible
that, if there are many such overlapping transactions, this might lead to
a performance regression. OTOH, the current code might also lead to worse
performance for transactions with multiple subtransactions, as they would
frequently need to malloc.

> I'm not even sure it's the right thing to do anything in the back
> branches, to be honest. If somebody hits this badly they likely have done
> so before, and they at least have the choice to upgrade, but if we
> regress performance for more people...

I can see that for some cases the current code might give better
performance, but OTOH, consuming memory at a high rate in some other
cases is not good either. But you are right that we can always ask such
users to upgrade (which, again, is sometimes painful for some users), so
maybe the right thing is to do nothing here.

Does anyone else have an opinion on this?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com