Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table. - Mailing list pgsql-bugs

From Kasahara Tatsuhito
Subject Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
Msg-id CAP0=ZVKdHgLGvFJHE_6x2PTrw50_FLqPUSdRNzVTKSb_y8naCQ@mail.gmail.com
In response to Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.  (Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com>)
List pgsql-bugs
Hi,

On Wed, Jan 6, 2021 at 11:13 AM Kasahara Tatsuhito
<kasahara.tatsuhito@gmail.com> wrote:
>
> Hi,
>
> On Mon, Dec 28, 2020 at 4:31 PM tomohiro hiramitsu <hiramit.tm@gmail.com> wrote:
> >
> > Hi
> >
> > > Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
> > >
> > > That said, I'd really prefer to see *some* kind of counter that could
> > > get people noticing this a bit earlier, rather than just bugging them
> > > in the logfile once it's gone over a threshold.  A statistics counter
> > > maybe, but I guess that'd have to be tracked at a per-table level to
> > > be really useful, which would make it potentially fairly expensive to
> > > keep around...
> >
> > I think the statistics on OID assignments per-table is a good idea, but how about simply logging the event to notify users as the first step?
> > I think it's easy to implement and helpful for users.
> +1
> It is a simple and quick way for users to know if the OID assignment
> is taking a long time or not.
> At least it will be better than the current behavior.
>
> > First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, and provide an opportunity for the user to deal with this problem with statement_timeout etc.
> > This is an interim measure, but I think it's better than the status quo.
> >
> > Since it is complicated to handle WaitEvent and statistics for each table, I first considered a patch that simply outputs logs, giving priority to avoiding a hang state with no log output at all.
> >
> > * v01-0001-OID_log_output.patch
> >    In this patch, GetNewOidWithIndex outputs a log message when it loops too many times while trying to assign a new OID.
> >    Also, if that message has been output at least once, another message is logged once the OID is finally assigned.
> >
> >    * This patch does not cause an error even if it loops too many times.
> >      How much time can be tolerated in the OID search loop depends on the user, so I think that the user should decide the time to make an error with statement_timeout.
> >
> >    * The log output interval threshold increases by "*= 2" (exponential backoff) and the maximum number of log outputs is set to 5.
> >      The reason for setting this threshold is to reduce the number of log outputs and to output the log before stopping at statement_timeout etc. I think the appropriate value for this threshold is controversial.
> >      Since it is not necessary to keep logging, the upper limit is set to 5 times.
> >
> >    * The log output interval threshold is set to 1 million for the first log output.
> >      In my environment, it takes about 4 seconds to loop 1 million times, and the 5th log is output after about 2 minutes. I think this threshold is just right considering statement_timeout, but it's controversial.
> >
> > Log output does not need to continue throughout OID assignment. I think the first few messages and the last one are enough.
> >
> > I will add this patch to the commitfest. I look forward to your feedback about the patch.
I tested the patch.
The patch applies cleanly to the current HEAD
(eb42110d952f8d1ad4049b8f2491e9dfba75ffed), and the regression tests pass.

I actually tried to reproduce OID assignment conflicts using the
following procedure.
================================================================================================
DROP TABLE IF EXISTS t1;

-- Create a table. Set TOAST_TUPLE_TARGET low to generate many TOAST records.
SELECT 'CREATE UNLOGGED TABLE t1 (c1 int,' || string_agg(str, ',') ||
       ') WITH (TOAST_TUPLE_TARGET = 128);'
  FROM (SELECT ' c' || i || ' text' AS str FROM generate_series(2,21) i) t;\gexec

-- Generate data and insert it. This query generates 20 TOAST records per row,
-- so it results in 16 million TOAST rows (same as GETNEWOID_NOTIFICATION_LIMIT).
WITH r AS (
    SELECT string_agg(str, '') AS d
      FROM (SELECT md5(clock_timestamp()::text) AS str
              FROM generate_series(1,4) t) s
)
INSERT INTO t1
SELECT generate_series(1,800000),
       d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d
  FROM r;

-- Check min(chunk_id), to which the OID counter is rewound in the following test
SELECT 'SELECT min(chunk_id) FROM pg_toast.' || relname
  FROM pg_class
 WHERE oid = (SELECT reltoastrelid FROM pg_class WHERE relname = 't1');\gexec

-- Check the duration of a normal (non-OID-conflicting) insert
\timing
WITH r AS (
    SELECT string_agg(str, '') AS d
      FROM (SELECT md5(clock_timestamp()::text) AS str
              FROM generate_series(1,4) t) s
)
INSERT INTO t1 SELECT 1,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d FROM r;

 INSERT 0 1
 Time: 0.899 ms

-- Reset the next OID to the value of min(chunk_id) by using pg_resetwal (-o)
-- Check the duration of a problematic (OID-conflicting) insert
\timing
WITH r AS (
    SELECT string_agg(str, '') AS d
      FROM (SELECT md5(clock_timestamp()::text) AS str
              FROM generate_series(1,4) t) s
)
INSERT INTO t1 SELECT 1,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d FROM r;

 INSERT 0 1
 Time: 36399.478 ms (00:36.399)
================================================================================================
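The reset step itself is only mentioned in a comment above, so here is a sketch of what it involves. The data directory and the OID value 16400000 are placeholders: the actual value should be the min(chunk_id) obtained above, and the server must be stopped first because pg_resetwal refuses to run against a live cluster.

```shell
# pg_resetwal only works on a stopped cluster
pg_ctl -D "$PGDATA" stop

# Rewind the next-OID counter to the smallest existing TOAST chunk_id
# (16400000 is a placeholder for the min(chunk_id) value found above),
# so that newly assigned chunk_ids collide with existing rows
pg_resetwal -o 16400000 "$PGDATA"

pg_ctl -D "$PGDATA" start
```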

And I got the following log messages.
(The STATEMENT section is long and has been cut)
===================================================================================================
2021-02-18 19:04:59.055 JST [32153] LOG:  failed to assign new OID in relation "pg_toast_32018451" after 1000000 retries
2021-02-18 19:04:59.055 JST [32153] STATEMENT:  WITH r ...
2021-02-18 19:05:01.299 JST [32153] LOG:  failed to assign new OID in relation "pg_toast_32018451" after 2000000 retries
2021-02-18 19:05:01.299 JST [32153] STATEMENT:  WITH r ...
2021-02-18 19:05:05.802 JST [32153] LOG:  failed to assign new OID in relation "pg_toast_32018451" after 4000000 retries
2021-02-18 19:05:05.802 JST [32153] STATEMENT:  WITH r ...
2021-02-18 19:05:14.853 JST [32153] LOG:  failed to assign new OID in relation "pg_toast_32018451" after 8000000 retries
2021-02-18 19:05:14.853 JST [32153] STATEMENT:  WITH r ...
2021-02-18 19:05:33.185 JST [32153] LOG:  failed to assign new OID in relation "pg_toast_32018451" after 16000000 retries
2021-02-18 19:05:33.185 JST [32153] STATEMENT:  WITH r ...
2021-02-18 19:05:33.185 JST [32153] LOG:  the new OID has been assigned in relation "pg_toast_32018451" after 16000021 retries
2021-02-18 19:05:33.185 JST [32153] STATEMENT:  WITH r ...
===================================================================================================

It works as expected, and I think the message is useful.

Here are some comments about the code.

+#define GETNEWOID_NOTIFICATION_INTERVAL_FACTOR 1000000

It seems to be more of a minimum value for conflict detection than an
interval factor, so it would be better to use another name, such as
GETNEWOID_NOTIFICATION_MINVAL.

+        /* retry count and notification limit check */
+        if (retry_count == next_notify && next_notify <= GETNEWOID_NOTIFICATION_LIMIT)
+        {
+            ereport(LOG,
+                    (errmsg("failed to assign new OID in relation \"%s\" after " UINT64_FORMAT " retries",
+                            RelationGetRelationName(relation), retry_count)));
+            next_notify *= 2;    /* double it for the next notification */
+        }
+        retry_count++;

I think we should add more comments explaining why this LOG message is output.

Best regards,


> >
> >
> > By the way
> > How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID?
> > For example, skip one and assign an OID.
> >
> > As a side effect of this method, the OID progresses faster.
> > (Maybe the btree index will be fragmented faster)
> I think the idea of assigning OIDs sparsely is interesting.
> However, I think we need to find out how much it would affect performance.
> It could be a non-negligible overhead, especially for large amounts of
> data insertion/updating.
>
> Best regards,
>
> > It may be better to include skips only for TOAST chunk_id, as this method may have other side effects.
> >
> > regards.
> > --
> > Tomohiro Hiramitsu
> > NTT Open Source Software Center
> >
> > On Mon, Dec 21, 2020 at 16:44 Magnus Hagander <magnus@hagander.net> wrote:
> >>
> >> On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
> >> >
> >> > Hi,
> >> >
> >> > On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote:
> >> > > This definitely sounds like something that's worth putting out as a
> >> > > wait event. Even before you start traversing millions of OIDs it might
> >> > > gradually start to show up, and being able to monitor that would
> >> > > definitely be useful.
> >> >
> >> > I don't think this is likely to work well as a wait event. All the index
> >> > traversals etc will do IO, acquire locks, etc, which will overwrite the
> >> > wait event and reset it to nothing once done.
> >>
> >> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
> >>
> >> That said, I'd really prefer to see *some* kind of counter that could
> >> get people noticing this a bit earlier, rather than just bugging them
> >> in the logfile once it's gone over a threshold.  A statistics counter
> >> maybe, but I guess that'd have to be tracked at a per-table level to
> >> be really useful, which would make it potentially fairly expensive to
> >> keep around...
> >>
> >> --
> >>  Magnus Hagander
> >>  Me: https://www.hagander.net/
> >>  Work: https://www.redpill-linpro.com/
> >>
> >>
> >>
> >>
>
>
> --
> Tatsuhito Kasahara
> kasahara.tatsuhito _at_ gmail.com



--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


