Thread: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.

Hi hackers,

We encountered an issue lately, that if the database grants too many roles `datacl` is toasted, following which, the drop database command will fail with error "wrong tuple length".

To reproduce the issue, please follow below steps:

CREATE DATABASE test;

-- create helper function
CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
declare
          mycounter int;
begin
          for mycounter in select i from generate_series(1,2000) i loop
                    execute 'CREATE ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
                    execute 'GRANT ALL ON DATABASE test to aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
          end loop;
          return 'ok';
end;
$body$ language plpgsql volatile strict; 

-- create roles and grant on the database.
SELECT data_tuple(); 

-- drop database command, this will result in "wrong tuple length" error.
DROP DATABASE test;

The root cause of this behaviour is that the HeapTuple in dropdb function fetches a copy of pg_database tuple from system cache.
But the system cache flattens any toast attributes, which cause the length check to fail in heap_inplace_update.

A patch for this issue is attached to the mail, the solution is to change the logic to fetch the tuple by directly scanning pg_database rather than using the catcache.

Regards,
Ayush




Attachment
Hi Ayush,

On 8/13/24 07:37, Ayush Tiwari wrote:
> Hi hackers,
> 
> We encountered an issue lately, that if the database grants too many
> roles `datacl` is toasted, following which, the drop database command
> will fail with error "wrong tuple length".
> 
> To reproduce the issue, please follow below steps:
> 
> CREATE DATABASE test;
> 
> -- create helper function
> CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
> declare
>           mycounter int;
> begin
>           for mycounter in select i from generate_series(1,2000) i loop
>                     execute 'CREATE
> ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
>                     execute 'GRANT ALL ON DATABASE test to
> aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
>           end loop;
>           return 'ok';
> end;
> $body$ language plpgsql volatile strict; 
> 
> -- create roles and grant on the database.
> SELECT data_tuple(); 
> 
> -- drop database command, this will result in "wrong tuple length" error.
> DROP DATABASE test;
> 
> The root cause of this behaviour is that the HeapTuple in dropdb
> function fetches a copy of pg_database tuple from system cache.
> But the system cache flattens any toast attributes, which cause the
> length check to fail in heap_inplace_update.
> 
> A patch for this issue is attached to the mail, the solution is to
> change the logic to fetch the tuple by directly scanning pg_database
> rather than using the catcache.
> 

Thanks for the report. I can reproduce the issue following your
instructions, and the fix seems reasonable ...


But there's also one thing I don't quite understand. I did look for
other places that might have a similar issue, that is places that

  1) lookup tuple using SearchSysCacheCopy1

  2) call on the tuple heap_inplace_update


And I found about four places doing that:

  - index_update_stats (src/backend/catalog/index.c)

  - create_toast_table (src/backend/catalog/toasting.c)

  - vac_update_relstats / vac_update_datfrozenxid (commands/vacuum.c)


But I haven't managed to trigger the same kind of failure for any of
those places, despite trying. AFAIK that's because those places update
pg_class, and that doesn't have TOAST, so the tuple length can't change.


So this fix seems reasonable.

-- 
Tomas Vondra



On 8/16/24 13:26, Tomas Vondra wrote:
> Hi Ayush,
> 
> ...
> 
> So this fix seems reasonable.
> 

I've pushed this to all affected branches, except for 11 which is EOL.

I thought about adding a test, but I couldn't think of a TAP test where
this would really fit, and it didn't seem very practical to have a test
creating hundreds of roles. So I abandoned the idea.


Thanks for the report and the fix!

-- 
Tomas Vondra



On 8/19/24 11:01, Yugo Nagata wrote:
> On Mon, 19 Aug 2024 00:35:39 +0200
> Tomas Vondra <tomas@vondra.me> wrote:
> 
>> On 8/16/24 13:26, Tomas Vondra wrote:
>>> Hi Ayush,
>>>
>>> ...
>>>
>>> So this fix seems reasonable.
>>>
>>
>> I've pushed this to all affected branches, except for 11 which is EOL.
>>
>> I thought about adding a test, but I couldn't think of a TAP test where
>> this would really fit, and it didn't seem very practical to have a test
>> creating hundreds of roles. So I abandoned the idea.
> 
> I tried to add Assert in heap_inplace_update to prevent possible similar 
> failures, but I gave up because I could not find a good way to determine if
> a tuple is detoasted of not.
> 

Right, not sure there's a good way to check for that.

> By the way, I found a comment in vac_update_datfrozenxid() and EventTriggerOnLogin()
> that explains why we  could not use tuples from the syscache for heap_inplace_update. 
> I think it is better ad d the same comment in dropdb(). I attached a trivial patch for it.
> 

Agreed. That seems like a nice improvement to the comment.


regards

-- 
Tomas Vondra



On 8/19/24 12:16, Tomas Vondra wrote:
> On 8/19/24 11:01, Yugo Nagata wrote:
>
> ...
> 
>> By the way, I found a comment in vac_update_datfrozenxid() and EventTriggerOnLogin()
>> that explains why we  could not use tuples from the syscache for heap_inplace_update. 
>> I think it is better ad d the same comment in dropdb(). I attached a trivial patch for it.
>>
> 
> Agreed. That seems like a nice improvement to the comment.
> 

Done, thanks for the suggestion / patch.


regards

-- 
Tomas Vondra