
Re: sigh - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: sigh
Date	January 3, 2004 19:20:03
Msg-id	000501c3cdf7$44d71b70$7bc886d9@LaptopDellXP
In response to	Re: sigh (Mark Kirkwood <markir@paradise.net.nz>)
Responses	Re: sigh
List	pgsql-hackers


Can I chip in? I've had a look in the past at the way various databases
perform this. Most just go and read the data, though Informix does seem
to keep a permanent record of the number of rows in a table...which
probably adds overhead you don't really want.

SELECT count(*) could be evaluated against any available index
sub-table, since all that is required is to count the rows. That would
be significantly faster than a full file scan and accurate too. You'd
simply count the index pointers, after evaluating any WHERE clause
against the indexed column values - so it won't work except for fairly
simple count(*)'s.
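
As a rough illustration of counting index pointers instead of scanning the table, here is a toy sketch in Python. The table, the index layout and all function names are hypothetical, and the sketch ignores transaction visibility, which a real implementation would have to deal with before the count could be called exact.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy table: each row is (id, status); the list position
# stands in for the row pointer.
rows = [(1, "open"), (2, "closed"), (3, "open"), (4, "open"), (5, "closed")]

# Hypothetical index on status: (key, row_pointer) pairs kept in key order.
status_index = sorted((status, ptr) for ptr, (_id, status) in enumerate(rows))
index_keys = [key for key, _ptr in status_index]


def count_by_scan(status):
    """Full scan: visit every row and test the WHERE condition."""
    return sum(1 for _id, s in rows if s == status)


def count_by_index(status):
    """Count index entries whose key matches, without reading any row."""
    lo = bisect_left(index_keys, status)
    hi = bisect_right(index_keys, status)
    return hi - lo


def count_all_by_index():
    """An unqualified count(*) is just the number of index entries."""
    return len(status_index)
```

Both paths give the same answer here; the index path only works because the WHERE condition refers to the indexed column alone.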

Why not implement estimated_count as a dictionary lookup, directly using
the value recorded there by ANALYZE? That would be the easiest way to
reuse existing code, and it would give you access to many previously
calculated values.
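
A minimal sketch of the lookup idea, in Python rather than backend C. The dictionaries standing in for tables and for the system catalog, and the function names, are assumptions made for illustration. The key name reltuples is borrowed from the pg_class column where PostgreSQL keeps its row-count estimate.

```python
# Hypothetical in-memory stand-ins for user tables and the system catalog.
tables = {"orders": list(range(10_000))}
catalog = {}  # table name -> statistics recorded by analyze()


def analyze(name):
    """Record the row count; a real ANALYZE would estimate it from a sample."""
    catalog[name] = {"reltuples": float(len(tables[name]))}


def estimated_count(name):
    """Return the stored estimate without touching the table's rows."""
    return int(catalog[name]["reltuples"])


def exact_count(name):
    """Visit every row, as count(*) does."""
    return sum(1 for _ in tables[name])


analyze("orders")
tables["orders"].extend(range(50))  # rows added after the last analyze
```

The estimate is only as fresh as the last analyze: after the 50 extra rows arrive, estimated_count("orders") still reports the older figure while exact_count("orders") sees them.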

This whole area is a major performance improver, with lots of
cross-overs with the materialized view sub-project.

Could you say a little more about why you wanted to achieve this?

Best Regards

Simon Riggs
2nd Quadrant
+44-7900-255520 

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mark Kirkwood
Sent: Monday, December 29, 2003 08:36
To: Randolf Richardson
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] *sigh*

*growl* - it sounds like the business...and I was all set to code it, 
however after delving into Pg's aggregation structure a bit, it suffers 
from a fatal flaw:

There appears to be no way to avoid visiting every row when defining an 
aggregate (even if you do nothing on each one) -- which defeats the 
whole point of my suggestion (i.e. avoiding the visit to every row).

To make the original idea work requires amending the definition of Pg 
aggregates to introduce "fake" aggregates that don't actually get 
evaluated for every row. At this point I am not sure if this sort of 
modification is possible or reasonable - others who know feel free to 
chip in :-)
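
To make the flaw concrete, here is a toy model in Python of an aggregate defined by a state transition function and a final function. The driver loop and all names are hypothetical, not Pg's executor code; it is only meant to show that a do-nothing transition function is still called once per row.

```python
def run_aggregate(rows, sfunc, initcond, finalfunc=lambda state: state):
    """Drive an aggregate with one transition-function call per input row."""
    state, calls = initcond, 0
    for row in rows:
        state = sfunc(state, row)
        calls += 1
    return finalfunc(state), calls


def count_sfunc(state, _row):
    """Transition function for an exact count."""
    return state + 1


def noop_sfunc(state, _row):
    """Does nothing, but is still invoked for every row."""
    return state


rows = range(1000)
count_result, count_calls = run_aggregate(rows, count_sfunc, 0)
noop_result, noop_calls = run_aggregate(rows, noop_sfunc, 0)
```

Both runs make 1000 calls, so an estimated_count written as an ordinary aggregate would save nothing over count(*).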

regards

Mark

Randolf Richardson wrote:

>"markir@paradise.net.nz (Mark Kirkwood)" wrote in 
>comp.databases.postgresql.hackers:
>
>[sNip]
>  
>
>>How about:
>>
>>Implement a function "estimated_count" that can be used instead of 
>>"count". It could use something like the algorithm in 
>>src/backend/commands/analyze.c to get a reasonably accurate pseudo
>>count quickly.
>>
>>The advantage of this approach is that "count" still means
>>(exact) count (for your xact snapshot anyway). Then the situation
>>becomes:
>>
>>Want a fast count? - use estimated_count(*)
>>Want an exact count? - use count(*)
>>    
>>
>
>        I think this is an excellent solution.
>
>  
>

