Thread: PL/PgSQL for counting all rows in all tables.
Folks, I've noticed that when coming into an organization, I need to do some counting on what's in all the tables in a db. This shortens that process considerably, with the usual caveat that count(*) is a heavy operation. By the way, the 3 lines following "godawful hack" point to something PL/PgSQL ought (imho) to be able to do, namely something like EXECUTE INTO [ record | rowtype | type ] [sql TEXT string returning a single row]; Here 'tis: version 0.01... CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER); CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count AS ' DECLARE the_count RECORD; t_name RECORD; r table_count%ROWTYPE; BEGIN FOR t_name IN SELECT c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = ''r'' AND n.nspname = ''public'' ORDER BY 1 LOOP -- The next 3 lines are a godawful hack. :P FOR the_countIN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname LOOP END LOOP; r.table_name:= t_name.relname; r.num_rows := the_count.count; RETURN NEXT r; END LOOP; RETURN; END; ' LANGUAGE plpgsql; COMMENT ON FUNCTION count_em_all () IS 'Spits out all tables in the public schema and the exact row counts for each.'; -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 My version: ANALYZE; SELECT n.nspname, relname, reltuples FROM pg_class c, pg_namespace n WHERE c.relnamespace=n.oid AND relkind='r' AND NOT n.nspname ~ '^pg_' ORDER BY 1,2; - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200409282138 -----BEGIN PGP SIGNATURE----- iD8DBQFBWhIivJuQZxSWSsgRAm3cAJwLySzR5mpOHDT06LN6vj9M5zyyOQCgt4j1 gWDPcXJoUZbTlZvxxrE7fcY= =MOMs -----END PGP SIGNATURE-----
"Greg Sabino Mullane" wrote: > ANALYZE; > > SELECT n.nspname, relname, reltuples > FROM pg_class c, pg_namespace n > WHERE c.relnamespace=n.oid > AND relkind='r' > AND NOT n.nspname ~ '^pg_' > ORDER BY 1,2; Maybe this gem should be passed onto the pgadmin folks. When you click on a table name in the interface it does what I can only presume is a count(*) from relation, which takes forever on enormous relations. It then claims this to be a row estimate anyway, so they could probably drop the analyze. Ian -- Ian Freislich
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ian FREISLICH > Sent: 05 October 2004 09:57 > To: Greg Sabino Mullane > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables. > > "Greg Sabino Mullane" wrote: > > ANALYZE; > > > > SELECT n.nspname, relname, reltuples > > FROM pg_class c, pg_namespace n > > WHERE c.relnamespace=n.oid > > AND relkind='r' > > AND NOT n.nspname ~ '^pg_' > > ORDER BY 1,2; > > Maybe this gem should be passed onto the pgadmin folks. When > you click on a table name in the interface it does what I can > only presume is a count(*) from relation, which takes forever > on enormous relations. It then claims this to be a row > estimate anyway, so they could probably drop the analyze. The 'Rows (counted)' value is taken from a count(*), but only if the 'Rows (estimated)' value (which comes from pg_class.reltuples, as above, but without the costly analyze) is less than the cut-off value set in the options dialogue. So, if you never want to wait for the exact row count, just set the appropriate option to zero. Regards Dave
On Tue, 2004-10-05 at 05:09, Dave Page wrote: > > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ian FREISLICH > > Sent: 05 October 2004 09:57 > > To: Greg Sabino Mullane > > Cc: pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables. > > > > "Greg Sabino Mullane" wrote: > > > ANALYZE; > > > > > > SELECT n.nspname, relname, reltuples > > > FROM pg_class c, pg_namespace n > > > WHERE c.relnamespace=n.oid > > > AND relkind='r' > > > AND NOT n.nspname ~ '^pg_' > > > ORDER BY 1,2; > > > > Maybe this gem should be passed onto the pgadmin folks. When > > you click on a table name in the interface it does what I can > > only presume is a count(*) from relation, which takes forever > > on enormous relations. It then claims this to be a row > > estimate anyway, so they could probably drop the analyze. > > The 'Rows (counted)' value is taken from a count(*), but only if the > 'Rows (estimated)' value (which comes from pg_class.reltuples, as above, > but without the costly analyze) is less than the cut-off value set in > the options dialogue. So, if you never want to wait for the exact row > count, just set the appropriate option to zero. > How do you handle table growth that makes the reltuples value out of whack since the last analyze? Seems unfortunate that people may not realize that the numbers they are looking at are incorrect but I don't see much way to avoid it. Seems new tables would have that problem too since they would default to 1000... do you analyze after table creation? Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
> -----Original Message----- > From: Robert Treat [mailto:xzilla@users.sourceforge.net] > Sent: 11 October 2004 22:30 > To: Dave Page > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables. > > How do you handle table growth that makes the reltuples value > out of whack since the last analyze? Seems unfortunate that > people may not realize that the numbers they are looking at > are incorrect but I don't see much way to avoid it. Right-click the table object and select 'Count' on the current versions. Previously, iirc it showed the message 'Refresh table to count' in the actual count field, so you did a right-click -> Refresh. > Seems new tables would have that problem too since they would > default to 1000... do you analyze after table creation? No. 1000 is below the default threshold, so it'll get counted. If you lower the threshold, it will say 'not counted' in the actual count field. Regards, Dave.
On Tuesday 12 October 2004 03:22, Dave Page wrote: > > -----Original Message----- > > From: Robert Treat [mailto:xzilla@users.sourceforge.net] > > Sent: 11 October 2004 22:30 > > To: Dave Page > > Cc: pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables. > > > > How do you handle table growth that makes the reltuples value > > out of whack since the last analyze? Seems unfortunate that > > people may not realize that the numbers they are looking at > > are incorrect but I don't see much way to avoid it. > > Right-click the table object and select 'Count' on the current versions. > Previously, iirc it showed the message 'Refresh table to count' in the > actual count field, so you did a right-click -> Refresh. > Maybe I didn't phrase that quite right. How would a user know that he needs to do a real count? For example, if I have a table with est 1 million rows, and I load another 1 million rows into it, wont pgadmin show me 1 million rows until I run an analyze? Even if I run a manual count, wont it show 1 million next time I come into the application, and that time I may not realize that the table is off by 1 million rows so I take the estimated count at face value. BTW The reason I'm asking about this is we're trying to come up with a good scheme for phppgadmin to show estimated counts without showing incorrect numbers to users... or at least giving them a clue that the numbers might be really off. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: >> >>Right-click the table object and select 'Count' on the current versions. >>Previously, iirc it showed the message 'Refresh table to count' in the >>actual count field, so you did a right-click -> Refresh. >> > > > Maybe I didn't phrase that quite right. How would a user know that he needs to > do a real count? For example, if I have a table with est 1 million rows, and > I load another 1 million rows into it, wont pgadmin show me 1 million rows > until I run an analyze? Even if I run a manual count, wont it show 1 million > next time I come into the application, and that time I may not realize that > the table is off by 1 million rows so I take the estimated count at face > value. pgAdmin will either show the true row count from a SELECT COUNT(*), or "not counted". The last rowcount from vacuum is named "estimated row count", so it's clearly distinguishable which number is counted and which is estimated. Regards, Andreas
> BTW The reason I'm asking about this is we're trying to come up with a good > scheme for phppgadmin to show estimated counts without showing incorrect > numbers to users... or at least giving them a clue that the numbers might be > really off. The vacuum daemon should know how far off the count might be. You can display 100 +- 15%, where the 100 is collected by ANALYZE and the 15% is determined based on the vacuum daemon estimation. Of course, AVD needs to be implemented and the interface created (a pg_count view?)...
> -----Original Message----- > From: Robert Treat [mailto:xzilla@users.sourceforge.net] > Sent: 12 October 2004 21:21 > To: Dave Page > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables. > > Maybe I didn't phrase that quite right. How would a user know > that he needs to do a real count? For example, if I have a > table with est 1 million rows, and I load another 1 million > rows into it, wont pgadmin show me 1 million rows until I run > an analyze? Even if I run a manual count, wont it show 1 > million next time I come into the application, and that time > I may not realize that the table is off by 1 million rows so > I take the estimated count at face value. Yeah, that's exactly what can happen. If it makes you feel any better about doing it in phppgadmin, then we never got any complaints about it! We simply rely on the fact that it's labelled as an estimate. I can't see that there's any other way around it that doesn't require running a potentially very expensive ANALYZE to update the stats. Regards, Dave