Re: [HACKERS] mdnblocks is an amazing time sink in huge relations - Mailing list pgsql-hackers

From Vadim Mikheev
Subject Re: [HACKERS] mdnblocks is an amazing time sink in huge relations
Date
Msg-id 380D6B63.9E7A158@krs.ru
Whole thread Raw
In response to Re: [HACKERS] mdnblocks is an amazing time sink in huge relations  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses RE: [HACKERS] mdnblocks is an amazing time sink in huge relations  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> > I agreed that there is no way to get accurate estimation for
> > # of rows to be seen by a query...
> > Well, let's keep up-to-date # of rows present in relation:
> > in any case a query will have to read them and this is what
> > we need to estimate cost of simple scans, as for costs of
> > joins - now way, currently(?) -:(
> 
> The optimizer is perfectly happy with approximate tuple counts.  It has
> to make enough other guesstimates that I really doubt an exact tuple
> count could help it measurably.  So updating the count via VACUUM is an
> appropriate solution as far as the optimizer is concerned.  The only

I'm not sure: scans read all (visible/unvisible, committed/uncommittd) 
tuples before making visibility decision... though, relpages is 
primary value for cost estimation.

> good reason I've ever seen for trying to keep an exact tuple count at
> all times is to allow short-circuiting of SELECT COUNT(*) queries ---
> and even there, it's only useful if there's no WHERE or GROUP BY.

...and only in READ COMMITTED mode...

> As far as I can see, keeping an exact tuple count couldn't possibly
> be worth the overhead it'd take.  Keeping an exact block count may
> have the same problem, but I'm not sure either way.

relpages is more significant thing to know. Extending relation
by one block at time is bad for insert performance. It would be nice
to have two values per relation in shared cache - # of blocks and
last block used. On the other hand this is mostly mdmgr issue.

But in any case, it seems to me that using shmem for 
shared catalog cache is much worther than for
shared cache invalidation...

Vadim


pgsql-hackers by date:

Previous
From: Peter Mount
Date:
Subject: RE: [HACKERS] Readline use in trouble?
Next
From: Gerd Thielemann AEK
Date:
Subject: