Thread: make BufferGetBlockNumber() a macro
The attached patch re-implements BufferGetBlockNumber() as a macro, for performance reasons. It also adds an assertion that should probably be present. I ran into this while profiling the hash index creation code (and trying to determine why it's so slow). I noticed that this simple function was being called about 2,500,000 times while creating a hash index on a reasonably sized table (225,000 rows). By implementing this as a macro, the performance of hash index creation improves by 8% (in a totally unscientific and probably unreliable benchmark). Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Attachment
Neil Conway <nconway@klamath.dyndns.org> writes: > The attached patch re-implements BufferGetBlockNumber() as a macro, > for performance reasons. The trouble with this is that it forces storage/buf_internals.h to become part of bufmgr.h, thereby completely destroying any information hiding that we had from the division of the two headers --- any client of bufmgr.h might now access bufmgr internal stuff without noticing that it was doing wrong. I am not convinced that BufferGetBlockNumber is a bottleneck anyway; at least not if you don't have Assert checking enabled. There are a few places that do things like ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum); which I believe results in two calls to BufferGetBlockNumber because of the way BlockIdSet is coded. This would be good to clean up, but it's BlockIdSet's problem not BufferGetBlockNumber's. > It also adds an assertion that should probably > be present. FWIW, BufferIsPinned also checks BufferIsValid; you do not need both. regards, tom lane
On Mon, 01 Apr 2002 17:59:06 -0500 "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > The attached patch re-implements BufferGetBlockNumber() as a macro, > > for performance reasons. > > The trouble with this is that it forces storage/buf_internals.h to > become part of bufmgr.h, thereby completely destroying any information > hiding that we had from the division of the two headers --- any client > of bufmgr.h might now access bufmgr internal stuff without noticing > that it was doing wrong. Yeah, I considered that. IMHO, this isn't that big of a drawback: C's information hiding is weak enough that you need to rely on the client programmer to exercise good judgement anyway; for example, there is absolutely nothing to stop someone from just including storage/buf_internals.h to begin with. Furthermore, in order to actually _see_ the declarations in buf_internals.h, a client programmer still needs to look at it, not bufmgr.h (i.e. the #include only effects cpp). If you prefer, I can move the definition of BufferDesc from storage/buf_internals.h to storage/bufmgr.h; this still destroys a little of the information hiding of the old code, but if a client programmer is stupid enough to manipulate a structure that is marked as internal and isn't packaged with any support functions or other code for dealing with it, they deserve their code getting broken. > I am not convinced that BufferGetBlockNumber is a bottleneck anyway; > at least not if you don't have Assert checking enabled. I didn't have assert checking enabled. As for it being a bottleneck, I'm sure the reason hash index creation is so slow lies elsewhere; but that doesn't stop this from being a performance improvement. > There are a few places that do things like > ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum); > which I believe results in two calls to BufferGetBlockNumber because of > the way BlockIdSet is coded. This would be good to clean up, but it's > BlockIdSet's problem not BufferGetBlockNumber's. If BufferGetBlockNumber is inlined (i.e. implemented as either a macro or a C99 inline function), it becomes just a conditional and an array access, so there's little need to optimize the usage of it any further. As it stands, a full function call is quite a lot of overhead, particularly for something that is called so often (at least for the situation I was looking at). > > It also adds an assertion that should probably > > be present. > > FWIW, BufferIsPinned also checks BufferIsValid; you do not need both. Ah, ok. So BufferIsPinned() is the correct assertion. An updated patch is attached. Also, I removed an obsolete comment + line of code from storage/buf.h, and deleted an incoherent comment from storage/bufmgr.h Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Attachment
Neil Conway <nconway@klamath.dyndns.org> writes: > C's information hiding is weak enough that you need to rely on the > client programmer to exercise good judgement anyway; This is not a good argument for making the hiding even weaker. I still object to this patch. It would be interesting to try to understand *why* BufferGetBlockNumber is showing up so high in your profile; AFAICS it is not called in any paths where it'd be a bottleneck --- ie, there is other much more expensive stuff in the same code paths. Do you have call counts by caller? What exactly was the test case, anyway? regards, tom lane
On Mon, 01 Apr 2002 20:50:27 -0500 "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > C's information hiding is weak enough that you need to rely on the > > client programmer to exercise good judgement anyway; > > This is not a good argument for making the hiding even weaker. I still > object to this patch. Even if it were modified to just move the BufferDesc structure into storage/bufmgr.h? > It would be interesting to try to understand *why* BufferGetBlockNumber > is showing up so high in your profile; It's showing up "high" in the sense that it's being called a lot of times (2.5 million); each call, of course, is fast, and is dominated by other parts of the profile (I just sent in the patch since I thought it was an easy way to improve the performance a little bit). > Do you have call counts by caller? Yes, I have the full output of gprof. I didn't send it along since it's ~400 KB; if you're interested, let me know. > What exactly was the test case, anyway? I created a table with a single TEXT column; I copied the contents of /usr/share/dict/words into the column, 4 times. I then quit the session and started a new one in which I created a hash index on the column, and ran gprof on the resulting profile data. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
On Mon, 01 Apr 2002 20:50:27 -0500 "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > C's information hiding is weak enough that you need to rely on the > > client programmer to exercise good judgement anyway; > > This is not a good argument for making the hiding even weaker. I still > object to this patch. Ok. The attached trivial patch removes some obsolete or incoherent comments and fixes an assertion in BufferGetBlockNumber. The macro-ization of BufferGetBlockNumber is _not_ included. Please apply. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Neil Conway wrote: > On Mon, 01 Apr 2002 20:50:27 -0500 > "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > Neil Conway <nconway@klamath.dyndns.org> writes: > > > C's information hiding is weak enough that you need to rely on the > > > client programmer to exercise good judgement anyway; > > > > This is not a good argument for making the hiding even weaker. I still > > object to this patch. > > Ok. The attached trivial patch removes some obsolete or incoherent > comments and fixes an assertion in BufferGetBlockNumber. The > macro-ization of BufferGetBlockNumber is _not_ included. > > Please apply. > > Cheers, > > Neil > > -- > Neil Conway <neilconway@rogers.com> > PGP Key ID: DB3C29FC [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026