Thread: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance

Hi

We have a master code block which starts many small, short operations that create a table and insert data into that table in many threads.

Nothing is done in the master code itself; we follow an orchestration pattern, where the master just sends a message about what to do, and that work is done on other database connections, not the connection used by the master code.

In the master code I add a sleep after the CRUD operations are done, to make it easier to test. The test table will not change in the rest of this master code (in real life more happens in the master code, of course).

Then we start testing VACUUM and running very simple SQL tests in another window.

We can now show that we get a runtime of "3343.794 ms" instead of "0.123 ms", which is what we get when we are able to remove dead rows and run a new ANALYZE.

The problem is that as long as the master code is active, we cannot remove all dead rows, and that is what seems to be killing the performance.

By "active" I mean hanging on pg_sleep; remember that this master has not created the test table or inserted any data into this test table itself.

Is this the expected behavior?

Is it possible to get around this problem in any way?

In this note you will find a detailed description and a simple standalone test script: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/issues/67#note_1779300212
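
To make the shape of the test concrete, here is a simplified sketch of what the script does (table name and exact statements are illustrative; the linked script is the authoritative version):

-- Session 1 ("master"): an open snapshot that never touches the test table.
BEGIN;
SELECT pg_sleep(600);   -- holds a snapshot for as long as it runs

-- Session 2 (meanwhile): churn a completely different table, then vacuum it.
CREATE TABLE test_null (id int PRIMARY KEY);
INSERT INTO test_null SELECT generate_series(1, 100000);
DELETE FROM test_null;
VACUUM VERBOSE test_null;
-- VACUUM reports dead row versions that are not yet removable as long as
-- session 1 holds its snapshot, and queries on test_null stay slow.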

I have tested on "PostgreSQL 14.10 (Homebrew) on aarch64-apple-darwin23.0.0, compiled by Apple clang version 15.0.0 (clang-1500.0.40.1), 64-bit" and "PostgreSQL 12.6 (Ubuntu 12.6-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit"

Thanks.

Lars

On Mon, 2024-02-19 at 16:14 +0000, Lars Aksel Opsahl wrote:
> Then we start testing VACUUM and running very simple SQL tests in another window.
>
> We can now show that we get a runtime of "3343.794 ms" instead of "0.123 ms",
> which is what we get when we are able to remove dead rows and run a new ANALYZE.
>
> The problem is that as long as the master code is active, we cannot remove
> all dead rows, and that is what seems to be killing the performance.
>
> By "active" I mean hanging on pg_sleep; remember that this master has not
> created the test table or inserted any data into this test table itself.
>
> Is this the expected behavior?

It is not entirely clear what you are doing, but it seems like you are holding
a database transaction open, and yes, then it is expected behavior that
VACUUM cannot clean up dead rows in the table.

Make sure that your database transactions are short.
Don't use table or row locks to synchronize application threads.
What you could use to synchronize your application threads are advisory locks;
they are not tied to a database transaction.
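
For example, a minimal sketch with session-level advisory locks (the lock key 42 is arbitrary):

SELECT pg_advisory_lock(42);       -- blocks until the lock is free; the lock
                                   -- belongs to the session, not a transaction
-- ... do a short unit of work and COMMIT it ...
SELECT pg_advisory_unlock(42);     -- release explicitly when done

SELECT pg_try_advisory_lock(42);   -- non-blocking variant: true if acquired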

Yours,
Laurenz Albe




From: Laurenz Albe <laurenz.albe@cybertec.at>
>
>It is not entirely clear what you are doing, but it seems like you are holding
>a database transaction open, and yes, then it is expected behavior that
>VACUUM cannot clean up dead rows in the table.
>
>Make sure that your database transactions are short.
>Don't use table or row locks to synchronize application threads.
>What you could use to synchronize your application threads are advisory locks;
>they are not tied to a database transaction.
>

Hi

The details are here: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/issues/67#note_1779300212 and
here is also a reference to the test script that shows the problem: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/uploads/9a0988b50f05386ec9d91d6600bb03ec/test_issue_67.sql

I am not taking any locks; I just do plain CRUD operations.

The key point is that the master code is not creating any tables or inserting rows; that is done by many short operations, as you suggested.

But even though the master code is not doing any operations against the test table, it is blocking removal of dead rows.

If this is expected behavior, it means that any long-running transaction will block removal of dead rows for all visible tables in the database, and that seems like a problem or weakness of PostgreSQL.

While writing this I was thinking that maybe I can get around the problem by making the table not visible to the master code, but that makes it very complicated for me.

Thanks.

Lars


From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
>From: Laurenz Albe <laurenz.albe@cybertec.at>
>>
>>It is not entirely clear what you are doing, but it seems like you are holding
>>a database transaction open, and yes, then it is expected behavior that
>>VACUUM cannot clean up dead rows in the table.
>>
>>Make sure that your database transactions are short.
>>Don't use table or row locks to synchronize application threads.
>>What you could use to synchronize your application threads are advisory locks;
>>they are not tied to a database transaction.
>>
>
>Hi
>
>The details are here: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/issues/67#note_1779300212 and
>here is also a reference to the test script that shows the problem: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/uploads/9a0988b50f05386ec9d91d6600bb03ec/test_issue_67.sql
>
>I am not taking any locks; I just do plain CRUD operations.
>
>The key point is that the master code is not creating any tables or inserting rows; that is done by many short operations, as you suggested.
>
>But even though the master code is not doing any operations against the test table, it is blocking removal of dead rows.
>
>If this is expected behavior, it means that any long-running transaction will block removal of dead rows for all visible tables in the database, and that seems like a problem or weakness of PostgreSQL.
>
>While writing this I was thinking that maybe I can get around the problem by making the table not visible to the master code, but that makes it very complicated for me.
>
>Thanks.
>
>Lars

Hi

I have now tested running the master (orchestration) code as user joe.

In the master code I connect back as user lop, create the test table test_null, and insert data in many tiny operations.

User joe, who runs the long-running operation, does not know anything about the table test_null and does not have any grants on that table.

The table test_null is not granted to public either.    

The problem is the same: the long-running transaction belonging to joe will kill the performance on a table which user joe does not have any access to or know anything about.

If this is expected behavior, it means that any user on the database who writes a long-running SQL statement that does not even insert any data can kill performance for any other user in the database.

So applications like QGIS, which seem to keep connections open for a while, can then also kill the performance for any other user in the database.

Having PostgreSQL work like this also makes it very difficult to debug performance issues, because a problem may just be a side effect of an unrelated SQL statement.

So I hope this is not the case, and that I have done something wrong or that there are some parameters that can be adjusted to get around this problem.

Thanks

Lars

On Tue, 2024-02-20 at 05:46 +0000, Lars Aksel Opsahl wrote:
> If this is expected behavior, it means that any user on the database who
> writes a long-running SQL statement that does not even insert any data can
> kill performance for any other user in the database.

Yes, that is the case.  A long running query will hold a snapshot, and no data
visible in that snapshot can be deleted.

That can cause bloat, which can impact performance.
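
If you want to see which sessions are holding back the horizon, a query along these lines helps (backend_xmin is the snapshot that VACUUM has to respect):

SELECT pid, usename, state,
       xact_start,
       now() - xact_start AS xact_age,
       backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;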

> So applications like QGIS, which seem to keep connections open for a while,
> can then also kill the performance for any other user in the database.

No, that is not a problem.  Keeping *connections* open is a good thing.  It is
keeping data-modifying transactions, cursors, or long-running queries open
that constitutes a problem.

Yours,
Laurenz Albe




From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Tuesday, February 20, 2024 8:29 AM
>Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
>Laurenz Albe <laurenz.albe@cybertec.at>
>​Lars Aksel Opsahl;​
>pgsql-performance@lists.postgresql.org​
>On Tue, 2024-02-20 at 05:46 +0000, Lars Aksel Opsahl wrote:
>> If this is expected behavior, it means that any user on the database who
>> writes a long-running SQL statement that does not even insert any data can
>> kill performance for any other user in the database.
>
>Yes, that is the case.  A long running query will hold a snapshot, and no data
>visible in that snapshot can be deleted.
>
>That can cause bloat, which can impact performance.
>

Hi

Thanks for the chat; it seems like I finally found a solution that works for this test code.

Adding a COMMIT to the master code, like here /uploads/031b350bc1f65752b013ee4ae5ae64a3/test_issue_67_with_commit.sql, even if there is nothing to commit, seems to solve the problem. That makes sense based on what you say, because then the master code gets a new snapshot and releases the old one.
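
Roughly, the master loop now has this shape (next_step() is just a placeholder here for the real dispatch call in the script):

BEGIN;
SELECT next_step();   -- the actual work runs on other connections
COMMIT;               -- even with nothing to write, this releases the old
                      -- snapshot so VACUUM elsewhere can clean up
BEGIN;
SELECT pg_sleep(1);   -- the wait now only pins a fresh snapshot
COMMIT;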

The reason why I like to use psql as the master/orchestration code, and not C/Python/Bash and so on, is to make it simpler to use, code, and test.

Lars

Postgres index usage

From: "Dirschel, Steve"

I’ve found multiple postings out there saying you can query pg_stat_all_indexes and look at idx_scan to know if an index has been used by queries.  I want to be 100% sure I can rely on that table/column to know if an index has never been used.

I queried that table for a specific index and idx_scan is 0.  I queried pg_statio_all_indexes and can see idx_blks_read and idx_blks_hit have numbers in there.  If the index is not being used, then what is causing idx_blks_read and idx_blks_hit to increase over time?  I’m wondering if those increase due to DML on the table.  Could anyone please confirm that I can rely on pg_stat_all_indexes.idx_scan to know if queries are using an index, and that the increases over time in idx_blks_read and idx_blks_hit in pg_statio_all_indexes would be from DML (or possibly vacuum or other things)?
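
For reference, the two views can be compared per index with something like this ('my_table' is a placeholder):

SELECT s.indexrelname, s.idx_scan, io.idx_blks_read, io.idx_blks_hit
FROM pg_stat_all_indexes s
JOIN pg_statio_all_indexes io USING (indexrelid)
WHERE s.relname = 'my_table';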

Thanks in advance.


RE: Postgres index usage

From: "Dirschel, Steve"

Didn’t mention - this is Aurora Postgres version 14.6, if that matters for my question. Thanks

Re: Postgres index usage

From: Tom Lane

"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
> I queried that table for a specific index and idx_scan is 0.  I
> queried pg_statio_all_indexes and can see idx_blks_read and
> idx_blks_hit have numbers in there.  If the index is not being used
> then what is causing idx_blks_read and idx_blks_hit to increase over
> time?  I'm wondering if those increase due to DML on the table.

Yes, I think that's the case: index updates will cause the per-block
counters to advance, but only an index search will increment idx_scan.

I'd recommend testing this theory for yourself in an idle database,
though.  It's not impossible that Aurora works differently from
community PG.
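
A minimal sketch of such a test (names invented, otherwise idle database assumed):

CREATE TABLE t (id int PRIMARY KEY, val text);
SELECT pg_stat_reset();             -- clear this database's statistics

INSERT INTO t SELECT g, 'x' FROM generate_series(1, 10000) g;   -- DML only
UPDATE t SET id = id + 100000;      -- non-HOT update: maintains the index
                                    -- without searching it

SELECT idx_scan                     -- expected to stay 0
FROM pg_stat_all_indexes WHERE indexrelname = 't_pkey';
SELECT idx_blks_read, idx_blks_hit  -- expected to have advanced
FROM pg_statio_all_indexes WHERE indexrelname = 't_pkey';
-- (counters are reported with a small delay; wait a moment if they look stale)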

Another thing to keep in mind is that in versions before PG 15,
the statistics subsystem is (by design) unreliable and might sometimes
miss events under load.  This effect isn't big enough to invalidate
a conclusion that an index with idx_scan = 0 isn't being used, but
it's something to keep in mind when running small tests that are
only expected to record a few events.

            regards, tom lane



Re: Postgres index usage

From: Greg Sabino Mullane

On Wed, Aug 7, 2024 at 1:06 PM Dirschel, Steve <steve.dirschel@thomsonreuters.com> wrote:

I’ve found multiple postings out there saying you can query pg_stat_all_indexes and look at idx_scan to know if an index has been used by queries.  I want to be 100% sure I can rely on that table/column to know if an index has never been used.


Also make sure you check pg_stat_all_indexes on your replicas. Each has its own independent idx_scan counters. So while your primary is not using a particular index, one or more of your replicas might be.
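
For example, something like this has to come back with idx_scan = 0 on the primary and on every replica before the index can be considered unused:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_all_indexes
WHERE idx_scan = 0
  AND schemaname NOT IN ('pg_catalog', 'pg_toast');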

Cheers,
Greg