Home > mailing lists

Re: gprof SELECT COUNT(*) results - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: gprof SELECT COUNT(*) results
Date	November 25, 2005 11:38:52
Msg-id	1132933130.2906.7.camel@localhost.localdomain Whole thread Raw
In response to	Re: gprof SELECT COUNT(*) results (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Fri, 2005-11-25 at 09:54 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Thu, 2005-11-24 at 23:48 -0500, Tom Lane wrote:
> >> Yeah, I was wondering the same.  It'd be possible to rewrite the seqscan
> >> stuff so that we do the visibility tests for all the tuples on a given
> >> page at once, taking the buffer content lock just once, and saving aside
> >> the valid tuple IDs to return later.  This should definitely be faster
> >> when all the tuples actually get fetched.  
> 
> > I was thinking of the brute force approach: take a complete copy of the
> > block when we read the first tuple off it. That way any wierdness
> > on-block is avoided until we logically attempt to read that tuple. It
> > also allows us to palloc the right amount of space first time.
> 
> That occurred to me too, but I rejected it on two grounds:
> * All that data copying will be expensive.
> * We couldn't update tuple commit hint bits on the real page.

OK, the second one is the clincher.

> Also, I'm not at all sure that it's a good idea to release the buffer
> pin before we're genuinely done with the page.  That sort of change
> would impact such things as the VACUUM interlock algorithm.  Maybe it'd
> be OK but I'm disinclined to mess with it for a dubious speed gain.

I think its safer not to try it all in one go. It's worth coming back to
later though: VACUUM will never remove rows the scanner can see anyway.

> Even with my version of the proposal, it'd be risky to use the check-
> all-in-advance approach for non-MVCC snapshots such as SnapshotNow.
> I'm not sure that there are any places that would get broken by missing
> tuples that weren't yet committed when the page was first touched ...
> but I'm not sure there aren't, either.  We could avoid this risk by
> having two paths through heapgettup, though.

That seems prudent. The main performance optimization is required for
MVCC access anyway; we seldom scan catalog tables and hopefully they
never get too big.

> > Are you, or will you be implementing this?
> 
> I plan to take a look at it soon.

Much appreciated and well done to Qingqing for spotting this.

Best Regards, Simon Riggs

pgsql-hackers by date:

From: Tom Lane
Date: 25 November 2005, 11:32:16
Subject: Re: Ipv6 and Postgresql 8.0.3

From: Olivier Thauvin
Date: 25 November 2005, 12:00:03
Subject: Re: gprof SELECT COUNT(*) results

Re: gprof SELECT COUNT(*) results - Mailing list pgsql-hackers

Previous

Next