Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table. - Mailing list pgsql-bugs

From Kasahara Tatsuhito
Subject Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
Msg-id CAP0=ZVLue4DBqV5cSD63JbXryV7+UKXdukfoLyEOpw+2EHRFCg@mail.gmail.com
In response to Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.  (tomohiro hiramitsu <hiramit.tm@gmail.com>)
Responses Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
List pgsql-bugs
Hi,

On Mon, Dec 28, 2020 at 4:31 PM tomohiro hiramitsu <hiramit.tm@gmail.com> wrote:
>
> Hi
>
> > Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
> >
> > That said, I'd really prefer to see *some* kind of counter that could
> > get people noticing this a bit earlier, rather than just bugging them
> > in the logfile once it's gone over a threshold.  A statistics counter
> > maybe, but I guess that'd have to be tracked at a per-table level to
> > be really useful, which would make it potentially fairly expensive to
> > keep around...
>
> I think per-table statistics on OID assignments are a good idea, but how about simply logging the event to notify users as a first step?
> I think it's easy to implement and helpful for users.
+1
It is a simple and quick way for users to learn whether OID assignment is taking a long time.
At least it will be better than the current behavior.
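
To make it concrete, here is a rough, untested sketch (not the v01 patch
itself) of how GetNewOidWithIndex() could emit such messages. The
variable names, constants and message wording are only illustrative;
the thresholds follow the description quoted below (first message after
1 million iterations, interval doubling, at most 5 messages, plus one
final message once an OID is found):

#include "postgres.h"

#include "access/genam.h"
#include "access/htup.h"
#include "access/transam.h"
#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"

Oid
GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
{
    Oid         newOid;
    SysScanDesc scan;
    ScanKeyData key;
    bool        collides;
    uint64      retries = 0;
    uint64      next_log = 1000000;     /* first LOG after ~1 million loops */
    int         logged = 0;

    do
    {
        /* let statement_timeout / query cancel break out of a long search */
        CHECK_FOR_INTERRUPTS();

        newOid = GetNewObjectId();

        /* probe the index (e.g. the TOAST index) for an existing entry */
        ScanKeyInit(&key, oidcolumn, BTEqualStrategyNumber,
                    F_OIDEQ, ObjectIdGetDatum(newOid));
        scan = systable_beginscan(relation, indexId, true,
                                  SnapshotAny, 1, &key);
        collides = HeapTupleIsValid(systable_getnext(scan));
        systable_endscan(scan);

        retries++;
        if (collides && logged < 5 && retries >= next_log)
        {
            ereport(LOG,
                    (errmsg("still searching for an unused OID in relation \"%s\" (%llu attempts so far)",
                            RelationGetRelationName(relation),
                            (unsigned long long) retries)));
            next_log *= 2;              /* exponential backoff between messages */
            logged++;
        }
    } while (collides);

    /* if we complained at least once, also report the eventual success */
    if (logged > 0)
        ereport(LOG,
                (errmsg("new OID assigned in relation \"%s\" after %llu attempts",
                        RelationGetRelationName(relation),
                        (unsigned long long) retries)));

    return newOid;
}

With something along these lines, the first few messages warn the DBA
while the search is still running, and the final one makes it easy to
correlate the log output with the eventual completion.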

> First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, and provide an opportunity for the user to deal with this problem with statement_timeout etc.
> This is an interim measure, but I think it is better than the current situation.
>
> Since it is complicated to handle WaitEvents and statistics for each table, I first considered a patch that simply outputs logs, giving priority to avoiding the situation where the server appears to hang without emitting any log output.
>
> * v01-0001-OID_log_output.patch
>    In this patch, GetNewOidWithIndex outputs a log when it loops too many times to assign a new OID.
>    Also, if that log has been output at least once, another message is logged when the OID is finally assigned.
>
>    * This patch does not cause an error even if it loops too many times.
>      How much time can be tolerated in the OID search loop depends on the user, so I think the user should decide when to raise an error by setting statement_timeout.
>
>    * The log output interval threshold doubles each time ("*= 2", exponential backoff) and the maximum number of log messages is set to 5.
>      The reason for this scheme is to reduce the number of log messages while still emitting them before the statement is stopped by statement_timeout etc. The appropriate value for this threshold is debatable.
>      Since it is not necessary to keep logging indefinitely, the upper limit is set to 5 messages.
>
>    * The log output interval threshold for the first message is set to 1 million iterations.
>      In my environment, it takes about 4 seconds to loop 1 million times, and the 5th message is output after about 2 minutes. I think this threshold is about right considering statement_timeout, but it is debatable.
>
> Logging does not need to continue throughout the OID assignment. I think the first few messages and the last one are enough.
>
> I will add this patch to the commitfest. I look forward to your feedback about the patch.
>
>
> By the way
> How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID?
> For example, skip one and assign an OID.
>
> As a side effect of this method, the OID counter advances faster.
> (Maybe the btree index will become fragmented faster.)
I think the idea of assigning OIDs sparsely is interesting.
However, we would need to find out how much it affects performance.
It could be a non-negligible overhead, especially for large amounts of
data insertion/updating.
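
For illustration only, a minimal sketch of the "skip one" idea mentioned
above, layered on top of the existing GetNewObjectId() counter (the
function name GetNewSparseObjectId is hypothetical, not a concrete
proposal):

#include "postgres.h"

#include "access/transam.h"

/*
 * Return only even OIDs, so newly assigned values land on every other
 * slot of the 32-bit space.  The obvious cost is that the shared OID
 * counter advances (and wraps around) twice as fast, and this does not
 * help with rows that already occupy odd OIDs.
 */
static Oid
GetNewSparseObjectId(void)
{
    Oid         oid;

    do
    {
        oid = GetNewObjectId();
    } while (oid % 2 != 0);

    return oid;
}

Whether burning the OID counter twice as fast is acceptable is exactly
the kind of overhead that would need to be measured.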

Best regards,

> It may be better to apply the skipping only to the TOAST chunk_id, as this method may have other side effects.
>
> regards.
> --
> Tomohiro Hiramitsu
> NTT Open Source Software Center
>
> On Mon, Dec 21, 2020 at 4:44 PM Magnus Hagander <magnus@hagander.net> wrote:
>>
>> On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
>> >
>> > Hi,
>> >
>> > On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote:
>> > > This definitely sounds like something that's worth putting out as a
>> > > wait event. Even before you start traversing millions of OIDs it might
>> > > gradually start to show up, and being able to monitor that would
>> > > definitely be useful.
>> >
>> > I don't think this is likely to work well as a wait event. All the index
>> > traversals etc will do IO, acquire locks, etc, which will overwrite the
>> > wait event and reset it to nothing once done.
>>
>> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
>>
>> That said, I'd really prefer to see *some* kind of counter that could
>> get people noticing this a bit earlier, rather than just bugging them
>> in the logfile once it's gone over a threshold.  A statistics counter
>> maybe, but I guess that'd have to be tracked at a per-table level to
>> be really useful, which would make it potentially fairly expensive to
>> keep around...
>>
>> --
>>  Magnus Hagander
>>  Me: https://www.hagander.net/
>>  Work: https://www.redpill-linpro.com/
>>
>>
>>
>>


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


