Thread: Re: Issue with pg_dump due to Schema OID Error
On 12/19/24 01:43, arons wrote: > I forgot to attache the script. In MyTestBugSchema01.baseProc() you meant to have: select MyTestBugSchema.afunction( u.username ) instead of select MyTestBugSchema2.afunction( u.username ) In other words there is a predefined MyTestBugSchema? > > On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani <renzo.dani@gmail.com > <mailto:renzo.dani@gmail.com>> wrote: > > Hi, > > > Recently, I encountered a problem during a database export using > pg_dump. > > > Here is the error message: > > > pg_dump: last built-in OID is 16383 > > pg_dump: reading extensions > > pg_dump: identifying extension members > > pg_dump: reading schemas > > pg_dump: reading user-defined tables > > pg_dump: reading user-defined functions > > pg_dump: error: schema with OID 41960442 does not exist > > > To investigate the issue, I ran the following query: > > > SELECT * FROM pg_proc WHERE pronamespace = 41960442; > > > The result: > > > oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;probin;prosqlbody;proconfig;proacl > > 41966618;remapprotocoltypeids;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{pprotocoltypeids};;; > > > I resolved the issue by removing the problematic record (admin > privileges required): > > > DELETE FROM pg_proc WHERE oid = 41966618; > > > This situation seems inconsistent and likely should not occur under > normal conditions. > > > While I’m unsure exactly when this issue originated in our > environment, I was able to reproduce it by performing concurrent > modifications on the schema. > > > To demonstrate, I wrote a bash script (test_bug.sh) that starts two > threads running in parallel. > > Each thread drops the schema with CASCADE and recreates it using the > SQL script search_bug.sql. > > > To use the script, you’ll need to adapt two variables at the > beginning of the script: PGPASSWORD and URL. > > > Using this script, I reproduced the problem on PostgreSQL versions > 16.1 and 17.1. > > It typically takes less than a minute to trigger the issue. > > The script terminates automatically as soon as the problem is detected. > > > Here are additional references that might be related to this issue: > > > https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net <https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net> > > https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com <https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com> > > > > Let me know if you need additional information. > > > Best regards > > Renzo > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/19/24 08:21, Renzo Dani wrote: Reply to list also. Ccing list. > HI Adrian, > you are right, there is a typo, the correct would be to have: > MyTestBugSchema2.afunction( u.username ) > > In any case the problem appears in my tests also with that script. To be clear the test script did not use MyTestBugSchema2.afunction( u.username ) but instead MyTestBugSchema.afunction( u.username ). If that is the case where did MyTestBugSchema come from? > I think the execution is not really important, I suppose what create the > problem is the drop cascade of the schema. It is important if someone wants to replicate the test case. > > BR > Renzo > > On Thu, Dec 19, 2024 at 5:15 PM Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 12/19/24 01:43, arons wrote: > > I forgot to attache the script. > > In MyTestBugSchema01.baseProc() you meant to have: > > select MyTestBugSchema.afunction( u.username ) > > instead of > > select MyTestBugSchema2.afunction( u.username ) > > In other words there is a predefined MyTestBugSchema? > > > > > On Thu, Dec 19, 2024 at 10:41 AM Renzo Dani <renzo.dani@gmail.com > <mailto:renzo.dani@gmail.com> > > <mailto:renzo.dani@gmail.com <mailto:renzo.dani@gmail.com>>> wrote: > > > > Hi, > > > > > > Recently, I encountered a problem during a database export using > > pg_dump. > > > > > > Here is the error message: > > > > > > pg_dump: last built-in OID is 16383 > > > > pg_dump: reading extensions > > > > pg_dump: identifying extension members > > > > pg_dump: reading schemas > > > > pg_dump: reading user-defined tables > > > > pg_dump: reading user-defined functions > > > > pg_dump: error: schema with OID 41960442 does not exist > > > > > > To investigate the issue, I ran the following query: > > > > > > SELECT * FROM pg_proc WHERE pronamespace = 41960442; > > > > > > The result: > > > > > > > oid;proname;pronamespace;proowner;prolang;procost;prorows;provariadic;prosupport;prokind;prosecdef;proleakproof;proisstrict;proretset;provolatile;proparallel;pronargs;pronargdefaults;prorettype;proargtypes;proallargtypes;proargmodes;proargnames;proargdefaults;protrftypes;prosrc;probin;prosqlbody;proconfig;proacl > > > > > 41966618;remapprotocoltypeids;41960442;19214494;13547;100;0;0;-;f;f;f;f;f;v;u;1;0;25;25;;;{pprotocoltypeids};;; > > > > > > I resolved the issue by removing the problematic record (admin > > privileges required): > > > > > > DELETE FROM pg_proc WHERE oid = 41966618; > > > > > > This situation seems inconsistent and likely should not occur > under > > normal conditions. > > > > > > While I’m unsure exactly when this issue originated in our > > environment, I was able to reproduce it by performing concurrent > > modifications on the schema. > > > > > > To demonstrate, I wrote a bash script (test_bug.sh) that > starts two > > threads running in parallel. > > > > Each thread drops the schema with CASCADE and recreates it > using the > > SQL script search_bug.sql. > > > > > > To use the script, you’ll need to adapt two variables at the > > beginning of the script: PGPASSWORD and URL. > > > > > > Using this script, I reproduced the problem on PostgreSQL > versions > > 16.1 and 17.1. > > > > It typically takes less than a minute to trigger the issue. > > > > The script terminates automatically as soon as the problem is > detected. > > > > > > Here are additional references that might be related to this > issue: > > > > > > > https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net <https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net> <https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net <https://www.postgresql.org/message-id/flat/20110209003823.GA93840%40mr-paradox.net>> > > > > > https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com <https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com> <https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com <https://www.postgresql.org/message-id/flat/BB8AF37F-E3D9-4DE0-B398-AF89748704F5%40bandwidth.com>> > > > > > > > > Let me know if you need additional information. > > > > > > Best regards > > > > Renzo > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/19/24 08:50, Adrian Klaver wrote: > On 12/19/24 08:46, Renzo Dani wrote: > > Again. > > Reply to list also using Reply All. > Ccing list. > >> Hi Adrian, >> here a new version of the script that I just tested produce the same >> problem. >> The script do not rely now on any additional relation than the ones >> defined into the script. I could not replicate using Ubuntu 22.04 and Postgres 16.6. From your original post: "I reproduced the problem on PostgreSQL versions 16.1 and 17.1." The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those and then try again. >> >> BR >> Renzo >> -- Adrian Klaver adrian.klaver@aklaver.com
Hi Adrian,
Thanks for the test, I installed also version 17.2 and test with a fresh newly created db.
At the beginning I cannot reproduce the problem as well.
Than I start adding the extension we use, one after one, and testing in between.
I found out that the problem appear immediately as soon as I add the extension pg_stat_statements:
create extension pg_stat_statements WITH SCHEMA public;
alter system set shared_preload_libraries = 'pg_stat_statements';
Can you maybe test as well with this extension?
the other that I installed before that are: pgcrypto and pg_prewarm.
here the current installed one:
dbtest=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+------------------------------------------------------------------------
pg_prewarm | 1.2 | public | prewarm relation data
pg_stat_statements | 1.11 | public | track planning and execution statistics of all SQL statements executed
pgcrypto | 1.3 | public | cryptographic functions
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(4 rows)
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+------------------------------------------------------------------------
pg_prewarm | 1.2 | public | prewarm relation data
pg_stat_statements | 1.11 | public | track planning and execution statistics of all SQL statements executed
pgcrypto | 1.3 | public | cryptographic functions
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(4 rows)
dbtest=# select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 reda build on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
(1 row)
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 reda build on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
(1 row)
I'll do some additional tests.
BR
Renzo
On Thu, Dec 19, 2024 at 6:21 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/19/24 08:50, Adrian Klaver wrote:
> On 12/19/24 08:46, Renzo Dani wrote:
>
> Again.
>
> Reply to list also using Reply All.
> Ccing list.
>
>> Hi Adrian,
>> here a new version of the script that I just tested produce the same
>> problem.
>> The script do not rely now on any additional relation than the ones
>> defined into the script.
I could not replicate using Ubuntu 22.04 and Postgres 16.6.
From your original post:
"I reproduced the problem on PostgreSQL versions 16.1 and 17.1."
The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those
and then try again.
>>
>> BR
>> Renzo
>>
--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes: > The latest bug fixes of each are 16.6 and 17.2. I would upgrade to those > and then try again. Highly unlikely to make any difference. What's evidently going on here is that the test script attempts to do DROP SCHEMA concurrently with another session that's creating an object inside that schema. (Here, that's a function, but the particular type of object doesn't really matter.) There are three possible outcomes of that: 1. The object creation commits soon enough that DROP SCHEMA sees it, and drops the object along with the schema. 2. The object creation begins after DROP SCHEMA commits, and fails because the schema is not to be found. 3. The object creation goes through, leaving a now-dangling schema OID reference in the object's catalog entry. The object is useless because it's unnamable, but it won't really cause any trouble except for applications that scan the system catalogs (like pg_dump). Exactly none of these outcomes result in a usable object, so one wonders why your application is doing this sort of thing often enough to hit the race condition. We could prevent case 3 by locking the schema during object creation, converting it to one of the other cases. We actually do that for tables, but not for any other object types, reasoning that the greatly increased cost of locking would outweigh the problems that dangling objects create. (Note that to eliminate the issue fully, we'd have to lock every referenced object not only schemas; for example, also the data types of the function's arguments and result.) Also, adding such locking might well lead to deadlocks in concurrent add/drop scenarios, not just performance costs. tl;dr: it's been like this a long time, and I don't really foresee us accepting the costs of making it not act like that. I seem to recall someone submitting a patch recently that would add such locking, but I doubt it'll get accepted. regards, tom lane