Re: Seqscan in MAX(index_column) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Seqscan in MAX(index_column) |
Date | |
Msg-id | 11010.1062771161@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Seqscan in MAX(index_column) (Christopher Browne <cbbrowne@acm.org>) |
List | pgsql-hackers |
Christopher Browne <cbbrowne@acm.org> writes: > Wouldn't this more or less be the same thing as having a trigger that > does, upon each insert/delete "update pg_counts set count = count + 1 > where reltable = 45232;"? (... where 1 would be -1 for deletes, and where > 45232 is the OID of the table...) I think that's exactly what Bruce was suggesting. A slightly more efficient variant is for each transaction to save up its net deltas, and issue a single UPDATE for each table it's touched just before it commits. But that just reduces the volume of update traffic, it doesn't fundamentally alter the concept. > Technically, it seems _feasible_, albeit with the problem that it > turns pg_counts into a pretty horrid bottleneck. Not to mention a likely source of deadlocks. And it still doesn't solve the fundamental objection that you can't get an MVCC-correct answer by examining the table. An idea I was toying with is to do something similar to what was just suggested to David Skoll for his stats problem: instead of using UPDATEs, use INSERTs of delta records. That is, every time a transaction is about to commit, it INSERTs into the counts table a row like "45232 +1" ("I inserted one row") or "45232 -10" ("I deleted ten rows"). Assume that we somehow initialized the counts table with an entry "45232 total-rows" for each table. Then, a COUNT(*) on table 45232 is equivalent to "SELECT SUM(deltas) FROM counts WHERE reltable = 45232". As long as the number of rows you have to look at to compute this sum is smaller than the number of rows in the original table, it's a win. The cool thing about this approach is that it is actually MVCC-correct. If some transaction has committed, but is uncommitted according to your worldview, your SUM will automatically ignore its delta row. Another cool thing is that the INSERTs don't conflict with each other, so there's no contention or deadlock risk. You would periodically (perhaps during VACUUM) update the counts table with operations that are conceptually like BEGIN; INSERT INTO counts SELECT reltable, SUM(deltas) FROM counts WHERE xid < GLOBALXMIN GROUPBY reltable; DELETE FROM counts WHERE xid < GLOBALXMIN; COMMIT; to sweep together the past deltas from transactions that are so old no one cares about their individual effects anymore (GLOBALXMIN is the same cutoff used by VACUUM to decide it can remove a committed-dead tuple). This prevents the number of delta rows from growing indefinitely large over time. > And if I have tables where I insert lots of data, but couldn't care > less how many rows they have, this effort is wasted. Yes, this mechanism would be hugely expensive in any case. I can't see enabling it overall, it would have to be turned on only for specific tables by user command. It'd be interesting to try to code it as a contrib module that's fired by triggers on the tables you want to track. regards, tom lane
pgsql-hackers by date: