Re: *sigh* - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: *sigh*
Date
Msg-id 000501c3cdf7$44d71b70$7bc886d9@LaptopDellXP
Whole thread Raw
In response to Re: *sigh*  (Mark Kirkwood <markir@paradise.net.nz>)
Responses Re: *sigh*
List pgsql-hackers
Can I chip in? I've had a look in the past at the way various databases
perform this. Most just go and read the data, though Informix does seem
to keep a permanent record of the number of rows in a table...which
probably adds overhead you don't really want.

Select count(*) could be evaluated against any available index
sub-tables, since all that is required is to count the rows. That would
be significantly faster than a full file scan and accurate too. You'd
simply count the pointers, after evaluating any WHERE clause against the
indexed col values - so it won't work except for fairly simple
count(*)'s. 

Why not implement estimated_count as a dictionary lookup, directly using
the value recorded there by the analyze? That would be the easiest way
to reuse existing code and give you access to many previously calculated
values.

This whole area is a major performance improver, with lots of
cross-overs with the materialized view sub-project.

Could you say a little more about why you wanted to achieve this?

Best Regards

Simon Riggs
2nd Quadrant
+44-7900-255520 

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mark Kirkwood
Sent: Monday, December 29, 2003 08:36
To: Randolf Richardson
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] *sigh*

*growl* - it sounds like the business...and I was all set to code it, 
however after delving into Pg's aggregation structure a bit, it suffers 
a fatal flaw :

There appears to be no way to avoid visiting every row when defining an 
aggregate (even if you do nothing on each one) -- which defeats the 
whole point of my suggestion (i.e avoiding the visit to every row)

To make the original idea work requires amending the definition of Pg 
aggregates to introduce "fake" aggregates that don't actually get 
evaulated for every row. At this point I am not sure if this sort of 
modification is possible or reasonable - others who know feel free to 
chip in :-)

regards

Mark

Randolf Richardson wrote:

>"markir@paradise.net.nz (Mark Kirkwood)" wrote in 
>comp.databases.postgresql.hackers:
>
>[sNip]
>  
>
>>How about:
>>
>>Implement a function "estimated_count" that can be used instead of 
>>"count". It could use something like the algorithm in 
>>src/backend/commands/analyze.c to get a reasonably accurate psuedo
count 
>>quickly.
>>
>>The advantage of this approach is that "count" still means
(exact)count 
>>(for your xact snapshot anyway). Then the situation becomes:
>>
>>Want a fast count? - use estimated_count(*)
>>Want an exact count - use count(*)
>>    
>>
>
>        I think this is an excellent solution.
>
>  
>


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings



pgsql-hackers by date:

Previous
From: Martin Marques
Date:
Subject: Re: [GENERAL] Connecting to Postgres
Next
From: Michael Gill
Date:
Subject: Restrict users from describing table