Re: Seqscan in MAX(index_column) - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Seqscan in MAX(index_column)
Date	September 4, 2003 23:03:15
Msg-id	200309050202.h8522jr06156@candle.pha.pa.us
In response to	Re: Seqscan in MAX(index_column) (Christopher Browne <cbbrowne@acm.org>)
Responses	Re: Seqscan in MAX(index_column)
List	pgsql-hackers

Christopher Browne wrote:
> > IMHO portability is an important point. People are used to MAX() and
> > COUNT(*), and will be surprised that they need some special
> > treatment. While the reasons for this are perfectly explainable,
> > speeding up these aggregates with some extra effort would make porting
> > a bit easier.
> 
> The availability of cleverness with MAX()/MIN() is no grand surprise;
> it would be very nice to get some expansion of that to "SELECT VALUE
> FROM TABLE WHERE (CRITERIA) ORDER BY VALUE DESCENDING LIMIT 1;"
> 
> But I'm _very_ curious as to what the anticipated treatment to collect
> COUNT() more efficiently would be.  I would expect that it would only
> be able to get tuned much more if there's NO "where" clause, so that
> it could use some ("magically-kept-up-to-date") stats on table size.
> 
> I don't see any way to optimize COUNT when numbers of rows can
> continually vary.  Storing stats somewhere will just make updates more
> expensive.  And if those stats are for the table, that doesn't help me
> if I want "COUNT(*) FROM TABLE WHERE UPDATED_ON BETWEEN NOW() - '1
> day' and NOW()".

Yes, only a COUNT(*) with no WHERE clause would use the cached stats.

My idea is that a transaction doing a COUNT(*) would first look to see
if there is already a visible cached value; if not, it would do the
COUNT(*) and insert the result into the cache table.  Any INSERT/DELETE
would remove the value from the cache.  As I see it, the commit of the
INSERT/DELETE transaction would then auto-invalidate the cache at the
exact time the transaction commits.  This would allow MVCC visibility of
the counts.
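
As a rough illustration of the invalidate-on-write idea, here is a
minimal sketch in Python using SQLite as a stand-in.  It is not
PostgreSQL, and it runs on a single connection, so it shows none of the
MVCC visibility behavior described above.  The items table, the
count_cache table, the trigger names, and cached_count() are all
hypothetical names made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for the real backend; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT);
-- Hypothetical cache table: at most one row per cached table.
CREATE TABLE count_cache (tbl TEXT PRIMARY KEY, n INTEGER NOT NULL);
-- Any INSERT or DELETE on items throws the cached count away.
CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN DELETE FROM count_cache WHERE tbl = 'items'; END;
CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN DELETE FROM count_cache WHERE tbl = 'items'; END;
""")


def cached_count(conn):
    """Return (count, came_from_cache) for an unqualified COUNT(*) on items."""
    row = conn.execute(
        "SELECT n FROM count_cache WHERE tbl = 'items'"
    ).fetchone()
    if row is not None:
        return row[0], True
    # Cache miss: do the real scan and remember the answer.
    n = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.execute("INSERT INTO count_cache (tbl, n) VALUES ('items', ?)", (n,))
    return n, False


conn.executemany("INSERT INTO items (val) VALUES (?)", [("a",), ("b",), ("c",)])
first = cached_count(conn)    # miss: counts the rows and fills the cache
second = cached_count(conn)   # hit: served from count_cache
conn.execute("DELETE FROM items WHERE val = 'a'")
third = cached_count(conn)    # miss again: the DELETE invalidated the cache
```

In this sketch the trigger's delete from count_cache is part of the
writing transaction, so a rollback of the INSERT/DELETE would also
restore the cached row.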

A trickier idea would be for INSERT/DELETE to UPDATE the cached value. 
It might be possible to always have a valid cache value for COUNT(*).
(COPY would also need to update the cache.)
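
The update-in-place variant might look roughly like the sketch below,
again with SQLite standing in for the real backend and with hypothetical
table and trigger names (items, count_cache, items_ins, items_del).  It
assumes the cache row is seeded once and then maintained by every
INSERT/DELETE.  It does not cover a bulk-load path such as COPY.

```python
import sqlite3

# Assumption: SQLite stands in for the real backend; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT);
CREATE TABLE count_cache (tbl TEXT PRIMARY KEY, n INTEGER NOT NULL);
-- Seed the cache once; from here on it is kept current by the triggers.
INSERT INTO count_cache (tbl, n) VALUES ('items', 0);
CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN UPDATE count_cache SET n = n + 1 WHERE tbl = 'items'; END;
CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN UPDATE count_cache SET n = n - 1 WHERE tbl = 'items'; END;
""")


def cached_count(conn):
    """Read the maintained count without scanning items."""
    return conn.execute(
        "SELECT n FROM count_cache WHERE tbl = 'items'"
    ).fetchone()[0]


def real_count(conn):
    """Full COUNT(*) scan, used here only to check the cache."""
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


conn.executemany("INSERT INTO items (val) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE val = 'a'")
conn.commit()
after_commit = (cached_count(conn), real_count(conn))

# The cache update rides in the same transaction as the INSERT,
# so rolling back the INSERT rolls back the counter as well.
conn.execute("INSERT INTO items (val) VALUES ('d')")
in_flight = cached_count(conn)
conn.rollback()
after_rollback = (cached_count(conn), real_count(conn))
```

One cost worth noting: every writer now updates the same cache row, so
in a real multi-user system concurrent INSERT/DELETE transactions would
likely contend on that row.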

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
