Re: So, is COUNT(*) fast now? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: So, is COUNT(*) fast now? |
Date | |
Msg-id | CA+TgmobKtqjhHU6gquNOmhbHcu=zUhBNWwbfv-weKNpS8-NwgA@mail.gmail.com Whole thread Raw |
In response to | Re: So, is COUNT(*) fast now? (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
On Sun, Oct 23, 2011 at 7:01 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Fri, Oct 21, 2011 at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> Also, this line is kind of expensive: >> >> if (!visibilitymap_test(scandesc->heapRelation, >> ItemPointerGetBlockNumber(tid), >> &node->ioss_VMBuffer)) >> >> Around 2%. But I don't see any way to avoid that, or even make it cheaper. > > Could we cache by ItemPointerGetBlockNumber(tid) the results of those > tests, for groups of tids on the same index page? > > How useful this would be would depend on how well-clustered the table > and index are. I thought about that, but the existing code is so ridiculously cheap that it's hard to believe a caching layer would save enough to pay for itself. I mean, I presume that the cost attributed to that line has to be associated with either (a) one of the pointer deferences, (b) the expense of evaluating ItemPointerGetBlockNumber(), (c) setting up the function call, or perhaps (d) overhead incurred as a result of branch mis-prediction. The actual time spent *inside* visibilitymap_test will be attributed to that function, not this one. If you add up the time for this line and visibilitymap_test(), it's like 10% of the runtime, which seems pretty significant. But it's 10% of the runtime that is spent basically a handful of arithmetic operations and then reading a byte from shared memory. It's astonishing to find that so expensive on a test with just one backend running. If you stick some kind of cache in there, it's going to involve adding a branch that isn't there now, and I think that's going to be pricey given how hot this code apparently is. Also, I'm not sure it's safe. Suppose that we lock the index page, return a tuple, check the visibility map, and find the page all visible. Another transaction comes along, adds a tuple to the index page, and clears the visibility map bit. We then go back, relock the index page, and return another tuple. We'd better notice that the visibility map bit has now been cleared, or we're in trouble. I wonder if it might help to create some method for the index to return all the matching keys on a single index page in one call. If you're dealing with an MVCC snapshot, any new tuples added after the start of the scan can't be visible anyway. That would save the overhead of unlocking and relocking the buffer once per tuple, and probably some overhead associated with validating and decoding the possibly-changed page contents each time. If you did it that way, it would also be safe to do what you're proposing - if a bunch of the tuples on the index page are also on the same heap page, you could do one visibility map check for all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: