Re: vacuum on table1 skips rows because of a query on table2 - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: vacuum on table1 skips rows because of a query on table2
Date:
Msg-id: CA+Tgmobgc3o+emMH3MXGxxg884j_V+kTwQ9sbJBRb-5d2OSyLw@mail.gmail.com
In response to: Re: vacuum on table1 skips rows because of a query on table2 (Virender Singla <virender.cse@gmail.com>)
Responses: Re: vacuum on table1 skips rows because of a query on table2
           Re: vacuum on table1 skips rows because of a query on table2
List: pgsql-hackers
On Sat, Oct 26, 2019 at 1:44 PM Virender Singla <virender.cse@gmail.com> wrote:
> If long-running transaction is "read committed", then we are sure that any new query coming
> (even on same table1 as vacuum table) will need snapshot on point of time query start and not the time transaction
> starts (but still why read committed transaction on table2 cause vacuum on table1 to skip rows).

I wish that this argument were completely correct, but it isn't, because the current query could involve a function written in some procedural language (or in C) which could do anything, including accessing tables that the query hasn't previously touched. It could be that the function will only be called towards the end of the current query's execution, or it could be that it's going to be called multiple times and does different things each time. Now, this is pretty unlikely, and most queries don't behave anything like that; they do things like "+" or "coalesce", which don't open new tables. There are contrary examples, though, even among functions built into core, like "table_to_xmlschema", which takes a relation OID as an argument and thus may open a new relation each time it's called.

If we had some way of analyzing a query and determining whether it uses any functions or operators that open new tables, then this kind of optimization might be possible, but we don't. However, even if we did have such infrastructure, it wouldn't solve all of our problems, because VACUUM would have to know which sessions were running queries that might open new tables and which were running queries that won't -- and, among the latter, which tables those sessions already have open. We could make the former available via a new shared-memory flag, and the latter could, perhaps, be deduced from the lock table, which is already shared. But even if we did all that, VACUUM would potentially have to do significantly more work to deduce the xmin horizon for each table that it wanted to process.

Even given all that, I'm moderately confident that something like this would benefit a lot of people. However, it would probably hurt some people too, either because of the overhead of figuring out that the current query won't lock any more relations, so that we can advertise that fact in shared memory, or because of the increased overhead of figuring out the xmin horizon for a table to be vacuumed. Users with short-running queries and small tables would be the most likely to be harmed. On the other hand, for users with giant tables, even more aggressive approaches might pay off - e.g. recomputing the xmin horizon every 1GB or so, because it might have advanced, and the effort to recheck it could be repaid by allowing us to vacuum more stuff sooner.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
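To make the hazard concrete, here is a minimal SQL sketch of the kind of query described above. It is only a hypothetical illustration (the table definitions, their id columns, and the helper function peek_table1 are assumptions, not taken from the thread): a read-committed query that names only table2 yet opens table1 partway through execution. The table_to_xmlschema() call at the end is the built-in example mentioned in the message.

-- Hypothetical helper: its body reads table1, even though the calling
-- query below only names table2.
CREATE FUNCTION peek_table1(k integer) RETURNS bigint
LANGUAGE plpgsql STABLE AS $$
BEGIN
    -- Opens table1 at call time, in the middle of the outer query's
    -- execution, under the outer query's snapshot.
    RETURN (SELECT count(*) FROM table1 WHERE id > k);
END;
$$;

-- Nothing in this query text reveals that table1 will be touched; the
-- function might not be called until the last row of table2 is scanned.
SELECT t2.id, peek_table1(t2.id)
FROM table2 t2;

-- A core function with the same property: it takes a relation OID and
-- opens that relation when it runs.
SELECT table_to_xmlschema('table1'::regclass, true, false, '');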
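For anyone who wants to observe the conservative behaviour the thread is about, here is a rough two-session sketch, again assuming the hypothetical tables table1 and table2; the exact wording of the VACUUM VERBOSE output varies across releases.

-- Session 1: a long-running read-committed query that touches only table2.
SELECT count(*), pg_sleep(600) FROM table2;

-- Session 2, while the query above is still running: the first session
-- advertises a backend_xmin in pg_stat_activity, and that is the horizon
-- VACUUM on the unrelated table1 has to respect.
SELECT pid, state, backend_xmin, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL;

VACUUM (VERBOSE) table1;
-- The VERBOSE output reports dead row versions that cannot yet be
-- removed, i.e. the rows VACUUM skipped because of the other session's
-- xmin.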