Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table. - Mailing list pgsql-bugs

From Magnus Hagander
Subject Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
Date
Msg-id CABUevExw6AHqC_U7nH83=rvF-xY_U8AUDbR94kBDqDR0hGqKXw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
List pgsql-bugs
On Wed, Nov 18, 2020 at 2:18 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/11/17 3:15, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      16722
> > Logged by:          Sergey Koposov
> > Email address:      skoposov@ed.ac.uk
> > PostgreSQL version: 11.9
> > Operating system:   debian
> > Description:
> >
> > Hi,
> >
> > When ingesting a billion or so rows in the table that has some array columns
> > (see schema below),  at some point all the backends doing the ingestion hang
> > with 100% CPU usage. When investigating, I think I can trace this to the
> > limit of 2^32 toasted records per table.
>
> Yes, this is the limitation of the number of out-of-line values in toast.
> https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit
>
>
> > See the gdb full backtrace  of
> > hanged backends in the bottom. When the problem occurs, it has 272 mill
> > records.
> > Basically the hanging happens in GetNewOidWithIndex called by
> > toast_save_datum.
> > While I understand the limit for the toast number is there to stay, but the
> > behaviour of PG next to the limit is IMO a bug (or at least non-trivial) .
> >
> > I would rather prefer to see an error-message as opposed to backends
> > hanging.
>
> To emit an error, we need to check that there is no unused OID for
> the toast and it would take very long to do that. So I'm not sure
> if to emit an error message really improves the current situation or not.
>
> OTOH it might be good idea to emit a warning message (every time we
> cannot find unused OID in recent 1 million OIDs, for example) or report
> the issue as wait event, or something while the record insertion is hanging
> because of toast limit, so that we can easily detect the issue.

This definitely sounds like something that's worth putting out as a
wait event. Even before you start traversing millions of OIDs it might
gradually start to show up, and being able to monitor that would
definitely be useful.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



pgsql-bugs by date:

Previous
From: Amit Kapila
Date:
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Next
From: Jonathan Tauman
Date:
Subject: Issues with pg_dump command