Thread: High table creation rate results in “File exists” error

High table creation rate results in “File exists” error

From
Radek Slupik
Date:
Hi,

We occasionally run into an internal error when creating tables. The error message is «could not create file "base/17006/2851173576": File exists» with SQLSTATE 58P02. The attachment is included with this email contains the PostgreSQL logs. Note that some identifiers are renamed in the logs because they contain sensitive information, but identifier names are unlikely the source of the error.

An old mailing list thread [1] suggests a problem related to OID wraparound. Our use case involves creating and dropping thousands of tables every day. The CREATE TABLE statements for these tables are similar to the one in the attached logs, but with different table names and columns. Nonetheless, the error is difficult to reproduce as it happens only sporadically. Retrying the CREATE TABLE statement fixes the problem temporarily.

Some more information about our PostgreSQL instance and usage:
  • PostgreSQL version: PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit.
  • The DBMS did not crash or abruptly shut down recently.
  • The file base/17006/2851173576 was actually in use by an index, as indicated by pg_class.relfilenode.
  • Autovacuum is disabled for these tables. Logging is disabled for these tables upon creation, but enabled after filling them with data.
  • The query “SELECT max(oid) FROM pg_class” reports 4,294,603,797, which is close to 2³² - 1.
Kind regards,

Radek Slupik
Channable

Attachment

Re: High table creation rate results in “File exists” error

From
Tomas Vondra
Date:
On Tue, Jan 21, 2020 at 02:04:24PM +0100, Radek Slupik wrote:
>Hi,
>
>We occasionally run into an internal error when creating tables. The error
>message is «could not create file "base/17006/2851173576": File exists»
>with SQLSTATE 58P02. The attachment is included with this email contains
>the PostgreSQL logs. Note that some identifiers are renamed in the logs
>because they contain sensitive information, but identifier names are
>unlikely the source of the error.
>
>An old mailing list thread [1] suggests a problem related to OID
>wraparound. Our use case involves creating and dropping thousands of tables
>every day. The CREATE TABLE statements for these tables are similar to the
>one in the attached logs, but with different table names and columns.
>Nonetheless, the error is difficult to reproduce as it happens only
>sporadically. Retrying the CREATE TABLE statement fixes the problem
>temporarily.
>
>Some more information about our PostgreSQL instance and usage:
>
>   - PostgreSQL version: PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on
>   x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0
>   20160609, 64-bit.
>   - The DBMS did not crash or abruptly shut down recently.
>   - The file base/17006/2851173576 was actually in use by an index, as
>   indicated by pg_class.relfilenode.
>   - Autovacuum is disabled for these tables. Logging is disabled for these
>   tables upon creation, but enabled after filling them with data.
>   - The query “SELECT max(oid) FROM pg_class” reports 4,294,603,797, which
>   is close to 2³² - 1.
>

So what was the actual rate of table creation? How long did it take to
get to this state?

What kernel/filesystem was used? What did lsof say?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: High table creation rate results in “File exists” error

From
Ruud van Asseldonk
Date:
Hi, thanks for looking into this!

On Wed, 22 Jan 2020 at 18:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> So what was the actual rate of table creation? How long did it take to
> get to this state?

Currently we create tables at about 1.8 Hz. We drop them at the same
rate, almost all tables are short-lived (between 1d and 1h). We
started seeing this error roughly 1.5 years after we started using
this system, but in the beginning the rate was not as high as it is
now.

I don’t have all historical occurrences at hand, but in the past 14
days, the error has occurred twice.

> What kernel/filesystem was used? What did lsof say?

We use kernel 4.15.0-1036-gcp and the file system is ext4. Lsof does
not currently list the file in the most recent error message, but I
don't know what the state was at the time of the error.

Best,
Ruud