Re: gprof SELECT COUNT(*) results - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: gprof SELECT COUNT(*) results
Date
Msg-id 1132933130.2906.7.camel@localhost.localdomain
Whole thread Raw
In response to Re: gprof SELECT COUNT(*) results  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 2005-11-25 at 09:54 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Thu, 2005-11-24 at 23:48 -0500, Tom Lane wrote:
> >> Yeah, I was wondering the same.  It'd be possible to rewrite the seqscan
> >> stuff so that we do the visibility tests for all the tuples on a given
> >> page at once, taking the buffer content lock just once, and saving aside
> >> the valid tuple IDs to return later.  This should definitely be faster
> >> when all the tuples actually get fetched.  
> 
> > I was thinking of the brute force approach: take a complete copy of the
> > block when we read the first tuple off it. That way any wierdness
> > on-block is avoided until we logically attempt to read that tuple. It
> > also allows us to palloc the right amount of space first time.
> 
> That occurred to me too, but I rejected it on two grounds:
> * All that data copying will be expensive.
> * We couldn't update tuple commit hint bits on the real page.

OK, the second one is the clincher.

> Also, I'm not at all sure that it's a good idea to release the buffer
> pin before we're genuinely done with the page.  That sort of change
> would impact such things as the VACUUM interlock algorithm.  Maybe it'd
> be OK but I'm disinclined to mess with it for a dubious speed gain.

I think its safer not to try it all in one go. It's worth coming back to
later though: VACUUM will never remove rows the scanner can see anyway.

> Even with my version of the proposal, it'd be risky to use the check-
> all-in-advance approach for non-MVCC snapshots such as SnapshotNow.
> I'm not sure that there are any places that would get broken by missing
> tuples that weren't yet committed when the page was first touched ...
> but I'm not sure there aren't, either.  We could avoid this risk by
> having two paths through heapgettup, though.

That seems prudent. The main performance optimization is required for
MVCC access anyway; we seldom scan catalog tables and hopefully they
never get too big.

> > Are you, or will you be implementing this?
> 
> I plan to take a look at it soon.

Much appreciated and well done to Qingqing for spotting this.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Ipv6 and Postgresql 8.0.3
Next
From: Olivier Thauvin
Date:
Subject: Re: gprof SELECT COUNT(*) results