Thread: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
"not related" code blocks for removal of dead rows when using vacuum and this kills the performance
Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
On Mon, 2024-02-19 at 16:14 +0000, Lars Aksel Opsahl wrote: > Then we start testing VACUUM and very simple SQL testing in another window. > > We can now show we have performance of "3343.794 ms" and not "0.123 ms", which > is what we get when we are able to remove dead rows and run a new analyze. > > The problem is that as long as the master code is active, we cannot remove > alle dead rows and that what seems to be killing the performance. > > With active I mean in hanging on pg_sleep and remember that this master has > not created the test table or inserted any data in this test table it self. > > Is the expected behavior ? It is not entirely clear what you are doing, but it seems like you are holding a database transaction open, and yes, then it is expected behavior that VACUUM cannot clean up dead rows in the table. Make sure that your database transactions are short. Don't use table or row locks to synchronize application threads. What you could use to synchronize your application threads are advisory locks, they are not tied to a database transaction. Yours, Laurenz Albe
Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
On Tue, 2024-02-20 at 05:46 +0000, Lars Aksel Opsahl wrote: > If this is expected behavior it means that any user on the database that writes > a long running sql that does not even insert any data can kill performance for > any other user in the database. Yes, that is the case. A long running query will hold a snapshot, and no data visible in that snapshot can be deleted. That can cause bloat, which can impact performance. > So applications like QGIS who seems to keep open connections for a while can > then also kill the performance for any other user in the data. No, that is not a problem. Keeping *connections* open is a good thing. It is keeping data modifying transactions, cursors or long-running queries open that constitutes a problem. Yours, Laurenz Albe
Re: "not related" code blocks for removal of dead rows when using vacuum and this kills the performance
Sent: Tuesday, February 20, 2024 8:29 AM
I’ve found multiple postings out there saying you can query pg_stat_all_indexes and look at idx_scan to know if an index has been used by queries. I want to be 100% sure I can rely on that table/column to know if an index has never been used.
I queried that table for a specific index and idx_scan is 0. I queried pg_statio_all_indexes and can see idx_blks_read and idx_blks_hit have numbers in there. If the index is not being used then what it causing idx_blks_read and idx_blks_hit to increase over time? I’m wondering if those increase due to DML on the table. Could anyone please confirm I can rely on pg_stat_all_index.idx_scan to know if queries are using an index and the increases over time in idx_blks_read and idx_blks_hit in pg_statio_all_indexes would be from DML (or possibly vacuum or other things)?
Thanks in advance.
Didn’t mention- this is Aurora Postgres version 14.6 if that matters for my question. Thanks
From: Dirschel, Steve <steve.dirschel@thomsonreuters.com>
Sent: Wednesday, August 7, 2024 12:06 PM
To: pgsql-performance@lists.postgresql.org
Subject: Postgres index usage
I’ve found multiple postings out there saying you can query pg_stat_all_indexes and look at idx_scan to know if an index has been used by queries. I want to be 100% sure I can rely on that table/column to know if an index has never been used.
I queried that table for a specific index and idx_scan is 0. I queried pg_statio_all_indexes and can see idx_blks_read and idx_blks_hit have numbers in there. If the index is not being used then what it causing idx_blks_read and idx_blks_hit to increase over time? I’m wondering if those increase due to DML on the table. Could anyone please confirm I can rely on pg_stat_all_index.idx_scan to know if queries are using an index and the increases over time in idx_blks_read and idx_blks_hit in pg_statio_all_indexes would be from DML (or possibly vacuum or other things)?
Thanks in advance.
This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes: > I queried that table for a specific index and idx_scan is 0. I > queried pg_statio_all_indexes and can see idx_blks_read and > idx_blks_hit have numbers in there. If the index is not being used > then what it causing idx_blks_read and idx_blks_hit to increase over > time? I'm wondering if those increase due to DML on the table. Yes, I think that's the case: index updates will cause the per-block counters to advance, but only an index search will increment idx_scan. I'd recommend testing this theory for yourself in an idle database, though. It's not impossible that Aurora works differently from community PG. Another thing to keep in mind is that in versions before PG 15, the statistics subsystem is (by design) unreliable and might sometimes miss events under load. This effect isn't big enough to invalidate a conclusion that an index with idx_scan = 0 isn't being used, but it's something to keep in mind when running small tests that are only expected to record a few events. regards, tom lane
I’ve found multiple postings out there saying you can query pg_stat_all_indexes and look at idx_scan to know if an index has been used by queries. I want to be 100% sure I can rely on that table/column to know if an index has never been used.