Thread: getpid() function
I have implemented this TODO item: * Add getpid() function to backend There were a large number of pg_stat functions that access pids and backends slots so I added it there: test=> select pg_stat_get_backend_mypid(); pg_stat_get_backend_mypid --------------------------- 2757 (1 row) Applied. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: src/backend/utils/adt/pgstatfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/pgstatfuncs.c,v retrieving revision 1.4 diff -c -r1.4 pgstatfuncs.c *** src/backend/utils/adt/pgstatfuncs.c 25 Oct 2001 05:49:45 -0000 1.4 --- src/backend/utils/adt/pgstatfuncs.c 31 Jul 2002 00:36:27 -0000 *************** *** 19,24 **** --- 19,25 ---- extern Datum pg_stat_get_blocks_hit(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_idset(PG_FUNCTION_ARGS); + extern Datum pg_stat_get_backend_mypid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_pid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_dbid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_userid(PG_FUNCTION_ARGS); *************** *** 208,213 **** --- 209,221 ---- ((ReturnSetInfo *) (fcinfo->resultinfo))->isDone = ExprMultipleResult; PG_RETURN_INT32(result); + } + + + Datum + pg_stat_get_backend_mypid(PG_FUNCTION_ARGS) + { + PG_RETURN_INT32(MyProcPid); } Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.246 diff -c -r1.246 pg_proc.h *** src/include/catalog/pg_proc.h 24 Jul 2002 19:11:13 -0000 1.246 --- src/include/catalog/pg_proc.h 31 Jul 2002 00:36:36 -0000 *************** *** 2703,2708 **** --- 2703,2710 ---- DESCR("Statistics: Number of blocks found in cache"); DATA(insert OID = 1936 ( pg_stat_get_backend_idset PGNSP PGUID 12 f f t t s 0 23 "" pg_stat_get_backend_idset- _null_ )); DESCR("Statistics: Currently active backend IDs"); + DATA(insert OID = 2026 ( pg_stat_get_backend_mypid PGNSP PGUID 12 f f t f s 0 23 "" pg_stat_get_backend_mypid- _null_ )); + DESCR("Statistics: My backend ID"); DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 f f t f s 1 23 "23" pg_stat_get_backend_pid -_null_ )); DESCR("Statistics: PID of backend"); DATA(insert OID = 1938 ( pg_stat_get_backend_dbid PGNSP PGUID 12 f f t f s 1 26 "23" pg_stat_get_backend_dbid- _null_ ));
On Tue, Jul 30, 2002 at 08:40:13PM -0400, Bruce Momjian wrote: > I have implemented this TODO item: > > * Add getpid() function to backend > > There were a large number of pg_stat functions that access pids and > backends slots so I added it there: > > test=> select pg_stat_get_backend_mypid(); If we're going to add it to pg_stat_*, why is 'backend' part of the name? All the existing backend_* function fetch some piece of data about a given backend -- whereas this function does not (it takes no arguments). IMHO, a better name would be something like 'backend_process_id()', or 'unix_pid', or 'backend_pid()'. Also, can you add some documentation on this? Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
OK, renamed to backend_pid() to match the libpq name. I was unsure about merging it into the stats stuff myself. setest=> select backend_pid(); backend_pid ------------- 12996 (1 row) Where does the mention belong in the docs? I have it in the monitoring section in the stats section right now. --------------------------------------------------------------------------- Neil Conway wrote: > On Tue, Jul 30, 2002 at 08:40:13PM -0400, Bruce Momjian wrote: > > I have implemented this TODO item: > > > > * Add getpid() function to backend > > > > There were a large number of pg_stat functions that access pids and > > backends slots so I added it there: > > > > test=> select pg_stat_get_backend_mypid(); > > If we're going to add it to pg_stat_*, why is 'backend' part of the > name? All the existing backend_* function fetch some piece of data > about a given backend -- whereas this function does not (it takes > no arguments). > > IMHO, a better name would be something like 'backend_process_id()', > or 'unix_pid', or 'backend_pid()'. > > Also, can you add some documentation on this? > > Cheers, > > Neil > > -- > Neil Conway <neilconway@rogers.com> > PGP Key ID: DB3C29FC > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: doc/src/sgml/monitoring.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/monitoring.sgml,v retrieving revision 1.7 diff -c -r1.7 monitoring.sgml *** doc/src/sgml/monitoring.sgml 22 Mar 2002 19:20:14 -0000 1.7 --- doc/src/sgml/monitoring.sgml 31 Jul 2002 01:42:11 -0000 *************** *** 481,490 **** </row> <row> <entry><function>pg_stat_get_backend_pid</function>(<type>integer</type>)</entry> <entry><type>integer</type></entry> <entry> ! PID of backend process </entry> </row> --- 481,498 ---- </row> <row> + <entry><function>backend_pid</function>()</entry> + <entry><type>integer</type></entry> + <entry> + Process ID of current backend + </entry> + </row> + + <row> <entry><function>pg_stat_get_backend_pid</function>(<type>integer</type>)</entry> <entry><type>integer</type></entry> <entry> ! Process ID of all backend processes </entry> </row> Index: src/backend/utils/adt/pgstatfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/pgstatfuncs.c,v retrieving revision 1.5 diff -c -r1.5 pgstatfuncs.c *** src/backend/utils/adt/pgstatfuncs.c 31 Jul 2002 00:40:40 -0000 1.5 --- src/backend/utils/adt/pgstatfuncs.c 31 Jul 2002 01:42:12 -0000 *************** *** 19,25 **** extern Datum pg_stat_get_blocks_hit(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_idset(PG_FUNCTION_ARGS); ! extern Datum pg_stat_get_backend_mypid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_pid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_dbid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_userid(PG_FUNCTION_ARGS); --- 19,25 ---- extern Datum pg_stat_get_blocks_hit(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_idset(PG_FUNCTION_ARGS); ! extern Datum backend_pid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_pid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_dbid(PG_FUNCTION_ARGS); extern Datum pg_stat_get_backend_userid(PG_FUNCTION_ARGS); *************** *** 213,219 **** Datum ! pg_stat_get_backend_mypid(PG_FUNCTION_ARGS) { PG_RETURN_INT32(MyProcPid); } --- 213,219 ---- Datum ! backend_pid(PG_FUNCTION_ARGS) { PG_RETURN_INT32(MyProcPid); } Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.247 diff -c -r1.247 pg_proc.h *** src/include/catalog/pg_proc.h 31 Jul 2002 00:40:40 -0000 1.247 --- src/include/catalog/pg_proc.h 31 Jul 2002 01:42:20 -0000 *************** *** 2703,2710 **** DESCR("Statistics: Number of blocks found in cache"); DATA(insert OID = 1936 ( pg_stat_get_backend_idset PGNSP PGUID 12 f f t t s 0 23 "" pg_stat_get_backend_idset- _null_ )); DESCR("Statistics: Currently active backend IDs"); ! DATA(insert OID = 2026 ( pg_stat_get_backend_mypid PGNSP PGUID 12 f f t f s 0 23 "" pg_stat_get_backend_mypid- _null_ )); ! DESCR("Statistics: My backend ID"); DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 f f t f s 1 23 "23" pg_stat_get_backend_pid -_null_ )); DESCR("Statistics: PID of backend"); DATA(insert OID = 1938 ( pg_stat_get_backend_dbid PGNSP PGUID 12 f f t f s 1 26 "23" pg_stat_get_backend_dbid- _null_ )); --- 2703,2710 ---- DESCR("Statistics: Number of blocks found in cache"); DATA(insert OID = 1936 ( pg_stat_get_backend_idset PGNSP PGUID 12 f f t t s 0 23 "" pg_stat_get_backend_idset- _null_ )); DESCR("Statistics: Currently active backend IDs"); ! DATA(insert OID = 2026 ( backend_pid PGNSP PGUID 12 f f t f s 0 23 "" backend_pid - _null_ )); ! DESCR("Statistics: Current backend ID"); DATA(insert OID = 1937 ( pg_stat_get_backend_pid PGNSP PGUID 12 f f t f s 1 23 "23" pg_stat_get_backend_pid -_null_ )); DESCR("Statistics: PID of backend"); DATA(insert OID = 1938 ( pg_stat_get_backend_dbid PGNSP PGUID 12 f f t f s 1 26 "23" pg_stat_get_backend_dbid- _null_ ));
On Tue, Jul 30, 2002 at 09:48:42PM -0400, Bruce Momjian wrote: > OK, renamed to backend_pid() to match the libpq name. Ok, thanks. > Where does the mention belong in the docs? I have it in the monitoring > section in the stats section right now. I'd vote for User's Guide -> Functions & Operators -> Misc. Functions. I don't think it belongs in the monitoring section, since it isn't part of the stats collector and isn't really used for monitoring. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway wrote: > On Tue, Jul 30, 2002 at 09:48:42PM -0400, Bruce Momjian wrote: > > OK, renamed to backend_pid() to match the libpq name. > > Ok, thanks. > > > Where does the mention belong in the docs? I have it in the monitoring > > section in the stats section right now. > > I'd vote for User's Guide -> Functions & Operators -> Misc. Functions. I > don't think it belongs in the monitoring section, since it isn't part of > the stats collector and isn't really used for monitoring. OK, docs moved to new section. I kept the function in pgstatfuncs.c. Not sure where else to put it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
nconway@klamath.dyndns.org (Neil Conway) writes: > On Tue, Jul 30, 2002 at 09:48:42PM -0400, Bruce Momjian wrote: >> Where does the mention belong in the docs? I have it in the monitoring >> section in the stats section right now. > I'd vote for User's Guide -> Functions & Operators -> Misc. Functions. There is a table in that section for "session information functions", which seems the correct choice. regards, tom lane
Tom Lane wrote: > nconway@klamath.dyndns.org (Neil Conway) writes: > > On Tue, Jul 30, 2002 at 09:48:42PM -0400, Bruce Momjian wrote: > >> Where does the mention belong in the docs? I have it in the monitoring > >> section in the stats section right now. > > > I'd vote for User's Guide -> Functions & Operators -> Misc. Functions. > > There is a table in that section for "session information functions", > which seems the correct choice. That's where I put it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Tue, Jul 30, 2002 at 09:48:42PM -0400, Bruce Momjian wrote: > > OK, renamed to backend_pid() to match the libpq name. I was unsure > about merging it into the stats stuff myself. > > setest=> select backend_pid(); > backend_pid > ------------- > 12996 > (1 row) Is there some common convention of names? Why not pg_backend_pid()? Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > Is there some common convention of names? No, there isn't (for example, pg_stat_backend_id() versus current_schema() -- or pg_get_viewdef() versus obj_description() ). Now that we have table functions, we might be using more built-in functions to provide information to the user -- so there will be an increasing need for some kind of naming convention for built-in functions. However, establishing a naming convention without breaking backwards compatibility might be tricky. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
On Thu, 2002-08-01 at 10:44, Neil Conway wrote: > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > Is there some common convention of names? > functions. However, establishing a naming convention without > breaking backwards compatibility might be tricky. Supporting both names for a release with comments in the release notes stating the old names will disappear soon should be enough. Most of the time it'll be a simple replacement command. Providing a find -exec sed statement may help.
On Thu, Aug 01, 2002 at 10:44:23AM -0400, Neil Conway wrote: > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > Is there some common convention of names? > > No, there isn't (for example, pg_stat_backend_id() versus I know -- for this I asked. IMHO for large project like PostgreSQLit's important. It's not good if there is possible speculateaboutname of new function. It must be unmistakable -- for this is needfulmake some convension. If somebody add newfunction and it's released,it's in the PostgreSQL almost forever. > current_schema() -- or pg_get_viewdef() versus obj_description() ). > Now that we have table functions, we might be using more built-in > functions to provide information to the user -- so there will be > an increasing need for some kind of naming convention for built-in > functions. However, establishing a naming convention without > breaking backwards compatibility might be tricky.Yes, but we can try be clean for new stuff. Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Added to TODO: * Consistently name server-side internal functions --------------------------------------------------------------------------- Karel Zak wrote: > On Thu, Aug 01, 2002 at 10:44:23AM -0400, Neil Conway wrote: > > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > > Is there some common convention of names? > > > > No, there isn't (for example, pg_stat_backend_id() versus > > I know -- for this I asked. IMHO for large project like PostgreSQL > it's important. It's not good if there is possible speculate about > name of new function. It must be unmistakable -- for this is needful > make some convension. If somebody add new function and it's released, > it's in the PostgreSQL almost forever. > > > current_schema() -- or pg_get_viewdef() versus obj_description() ). > > Now that we have table functions, we might be using more built-in > > functions to provide information to the user -- so there will be > > an increasing need for some kind of naming convention for built-in > > functions. However, establishing a naming convention without > > breaking backwards compatibility might be tricky. > > Yes, but we can try be clean for new stuff. > > Karel > > -- > Karel Zak <zakkr@zf.jcu.cz> > http://home.zf.jcu.cz/~zakkr/ > > C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I can rename backend_pid if people want. I just made it consistent with the other functions in that docs area. Comments? --------------------------------------------------------------------------- Karel Zak wrote: > On Thu, Aug 01, 2002 at 10:44:23AM -0400, Neil Conway wrote: > > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > > Is there some common convention of names? > > > > No, there isn't (for example, pg_stat_backend_id() versus > > I know -- for this I asked. IMHO for large project like PostgreSQL > it's important. It's not good if there is possible speculate about > name of new function. It must be unmistakable -- for this is needful > make some convension. If somebody add new function and it's released, > it's in the PostgreSQL almost forever. > > > current_schema() -- or pg_get_viewdef() versus obj_description() ). > > Now that we have table functions, we might be using more built-in > > functions to provide information to the user -- so there will be > > an increasing need for some kind of naming convention for built-in > > functions. However, establishing a naming convention without > > breaking backwards compatibility might be tricky. > > Yes, but we can try be clean for new stuff. > > Karel > > -- > Karel Zak <zakkr@zf.jcu.cz> > http://home.zf.jcu.cz/~zakkr/ > > C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Thu, 2002-08-01 at 19:41, Bruce Momjian wrote: > > Added to TODO: > > * Consistently name server-side internal functions I'd suggest: * Make up rules for consistently naming server-side internal functions * Consistently name _new_ server-side internal functions * make a plan for moving existing server-side internal functions to consistent naming --------------- Hannu
On Thu, Aug 01, 2002 at 05:09:52PM +0200, Karel Zak wrote: > I know -- for this I asked. IMHO for large project like PostgreSQL > it's important. It's not good if there is possible speculate about > name of new function. It must be unmistakable -- for this is needful > make some convension. If somebody add new function and it's released, > it's in the PostgreSQL almost forever. I agree that a naming convention would be useful in some circumstances, but for commonly-used functions, I think it would do more harm than good. 'pg_nextval()' is awfully ugly, for example. And if we're going to have a naming convention for builtin functions, what about builtin types? 'pg_int4', anyone? :-) Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway writes: > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > Is there some common convention of names? > > No, there isn't (for example, pg_stat_backend_id() versus > current_schema() -- or pg_get_viewdef() versus obj_description() ). The "pg_" naming scheme is obsolete because system and user namespaces are now isolated. Anything involving "get" is also redundant, IMHO, because we aren't dealing with object-oriented things. Besides that, the convention in SQL seems to be to use full noun phrases with words separated by underscores. So if "pg_get_viewdef" where reinvented today, by me, it would be called "view_definition". A whole 'nother issue is to use the right terms for the right things. For example, the term "backend" is rather ambiguous and PostgreSQL uses it differently from everyone else. Instead I would use "server process" when referring to the PID. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut wrote: > Neil Conway writes: > > > On Thu, Aug 01, 2002 at 12:01:52PM +0200, Karel Zak wrote: > > > Is there some common convention of names? > > > > No, there isn't (for example, pg_stat_backend_id() versus > > current_schema() -- or pg_get_viewdef() versus obj_description() ). > > The "pg_" naming scheme is obsolete because system and user namespaces are > now isolated. Anything involving "get" is also redundant, IMHO, because > we aren't dealing with object-oriented things. Besides that, the > convention in SQL seems to be to use full noun phrases with words > separated by underscores. > > So if "pg_get_viewdef" where reinvented today, by me, it would be called > "view_definition". > > A whole 'nother issue is to use the right terms for the right things. For > example, the term "backend" is rather ambiguous and PostgreSQL uses it > differently from everyone else. Instead I would use "server process" when > referring to the PID. Yes, I wanted to match libpq's function, which is the way people used to get the pid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> No, there isn't (for example, pg_stat_backend_id() versus > current_schema() -- or pg_get_viewdef() versus obj_description() ). > Now that we have table functions, we might be using more built-in > functions to provide information to the user -- so there will be > an increasing need for some kind of naming convention for built-in > functions. However, establishing a naming convention without > breaking backwards compatibility might be tricky. I personally think that as many functions as possible should be prefixed pg_*... People are still used to avoiding pg_ as a prefix. Chris
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I can rename backend_pid if people want. I just made it consistent > with the other functions in that docs area. Comments? I'd go for pg_backend_pid, I think. It's not an SQL standard function and certainly never will be, so some sort of prefix seems appropriate. Perhaps a more relevant question is why are we cluttering the namespace with any such function at all? What's the use case for it? We've gotten along fine without one so far, and I don't really think that we *ought* to be exposing random bits of internal implementation details at the SQL level. regards, tom lane
... > Perhaps a more relevant question is why are we cluttering the namespace > with any such function at all? What's the use case for it? We've > gotten along fine without one so far, and I don't really think that we > *ought* to be exposing random bits of internal implementation details > at the SQL level. Actually, I was wondering the same thing, maybe for a different reason. Exposing the backend internals could have security implications (though don't make me concoct a scenario to prove it ;) Although it might have some usefulness for debugging, I think it should not be an "installed by default" feature, so istm would be a great candidate for a contrib/ function or library. If someone needs it, it is almost immediately available. - Thomas
On Thu, Aug 01, 2002 at 01:41:49PM -0400, Bruce Momjian wrote: > > Added to TODO: > > * Consistently name server-side internal functions OK, good start of discussion is define groups of the PostgreSQL functions: 1/ Extern compatible functions The functions compatible with standards or customs or others SQL servers. For example trim, to_char, ... 2/ PostgreSQLspecific functions used in standard SQL operations (the function works with standard data and not load it from internal PostgreSQL stuff). For example convert(), all datetype function like int(). The name convenition must be like names in group 1/ 3/ PostgreSQL specific system functions For example pg_backend_pid(). IMHO clean solution is use "pg_" prefix. 4/ The calls without '( )' For example "SELECT current_user;" IMHO right is not use "pg_" prefix _if_ you call it without braces. _But_ if youcall it with '()' and function can be member of group 3/ is right use "pg_" prefix. For example: SELECT current_user; SELECT pg_current_user(); 5/ Deprecated functions In docs marked as "deprecated" and will removed in some major release (for example in 8.0). 6/ ??? -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
> 2/ PostgreSQL specific functions used in standard SQL operations > > (the function works with standard data and not load it from > internal PostgreSQL stuff). > > For example convert(), all datetype function like int(). The name > convenition must be like names in group 1/ FYI, I have been proposing SQL99 compatible convert(). I would like to add it if no one objects. -- Tatsuo Ishii
> > For example convert(), all datetype function like int(). The name > > convenition must be like names in group 1/ > > FYI, I have been proposing SQL99 compatible convert(). I would like to > add it if no one objects. No objection, but what does it do out of interest? Will it cause a backwards compatibility problem at all? Chris
Hello, Sorry if it's wrong list for the question. Could you suggest some tweaks to the PostgreSQL 7.2.1 to handle the following types of tables faster? Here we have table "stats" with something over one millon records. Obvious "SELECT COUNT(*) FROM stats " takes over 40 seconds to execute, and this amount of time does not shorten considerably in subsequent similar requests. All the databases are vacuumed nightly. CREATE TABLE "stats" ( "url" varchar(50), "src_port" varchar(10), "ip" varchar(16), "dst_port" varchar(10), "proto"varchar(10), "size" int8, "login" varchar(20), "start_date" timestamptz, "end_date" timestamptz, "aggregated"int4 ); CREATE INDEX "aggregated_stats_key" ON "stats" ("aggregated"); CREATE INDEX "ip_stats_key" ON "stats" ("ip"); stats=> explain select count(*) from stats; NOTICE: QUERY PLAN: Aggregate (cost=113331.10..113331.10 rows=1 width=0) -> Seq Scan on stats (cost=0.00..110085.28 rows=1298328 width=0) EXPLAIN stats=> select count(*) from stats; count ---------1298328 (1 row) The system is FreeBSD-4.6-stable, softupdates on, Athlon XP 1500+, 512 Mb DDR, ATA 100 HDD. Thanks in advance, Yar
> Here we have table "stats" with something over one millon records. > Obvious "SELECT COUNT(*) FROM stats " takes over 40 seconds to execute, > and this amount of time does not shorten considerably in subsequent > similar requests. All the databases are vacuumed nightly. Doing a row count requires a sequential scan in Postgres. Try creating another summary table that just has one row and one column and is an integer. Then, create a trigger on your stats table that fires whenever a new row is added or deleted and updates the tally of rows in the summary table. Then, just select from the summary table to get an instantaneous count. Of course, insert and deletes will be marginally slowed down. Refer to the docs for CREATE TRIGGER, CREATE FUNCTION and PL/PGSQL for more info on how to do this. Regards, Chris
On Fri, Aug 02, 2002 at 05:38:37PM +0900, Tatsuo Ishii wrote: > > 2/ PostgreSQL specific functions used in standard SQL operations > > > > (the function works with standard data and not load it from > > internal PostgreSQL stuff). > > > > For example convert(), all datetype function like int(). The name > > convenition must be like names in group 1/ > > FYI, I have been proposing SQL99 compatible convert(). I would like to > add it if no one objects. I use convert() as example only. I think there is more functionfor group 2/. Karel -- Karel Zak <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Christopher Kings-Lynne wrote: >Doing a row count requires a sequential scan in Postgres. > >Try creating another summary table that just has one row and one column and >is an integer. > > I have THREE summary tables derived from "stats" with different levels of aggregation. They work quite fast, But: 1) Summary tables grow too 2) There are requests which cannot be predicted, so they involve the "stats" table itself. So I am still interested in PostgreSQL's ability to deal with multimillon records tables. Best regards, Yar.
On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote: > > So I am still interested in PostgreSQL's ability to deal with > multimillon records tables. [x-posted and Reply-To: to -general; this isn't a development problem.] We have tables with multimillion records, and they are fast. But not fast to count(). The MVCC design of PostgreSQL will give you very few concurerncy problems, but you pay for that in the response time of certain kinds of aggregates, which cannot use an index. A -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
times change if you do "SELECT COUNT(1) FROM stats" ? -- :: Sergio A. Kessler :: Linux user #64005 - http://counter.li.org "Yaroslav Dmitriev" <yar@warlock.ru> escribi� en el mensaje news:3D4A49E7.6090405@warlock.ru... > Hello, > > Sorry if it's wrong list for the question. Could you suggest some tweaks > to the PostgreSQL 7.2.1 to handle the following types of tables faster? > > Here we have table "stats" with something over one millon records. > Obvious "SELECT COUNT(*) FROM stats " takes over 40 seconds to execute, > and this amount of time does not shorten considerably in subsequent > similar requests. All the databases are vacuumed nightly. > > CREATE TABLE "stats" ( > "url" varchar(50), > "src_port" varchar(10), > "ip" varchar(16), > "dst_port" varchar(10), > "proto" varchar(10), > "size" int8, > "login" varchar(20), > "start_date" timestamptz, > "end_date" timestamptz, > "aggregated" int4 > ); > CREATE INDEX "aggregated_stats_key" ON "stats" ("aggregated"); > CREATE INDEX "ip_stats_key" ON "stats" ("ip"); > > stats=> explain select count(*) from stats; > NOTICE: QUERY PLAN: > > Aggregate (cost=113331.10..113331.10 rows=1 width=0) > -> Seq Scan on stats (cost=0.00..110085.28 rows=1298328 width=0) > > EXPLAIN > stats=> select count(*) from stats; > count > --------- > 1298328 > (1 row) > > The system is FreeBSD-4.6-stable, softupdates on, Athlon XP 1500+, 512 Mb DDR, ATA 100 HDD. > > Thanks in advance, > Yar > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Thomas Lockhart wrote: > ... > > Perhaps a more relevant question is why are we cluttering the namespace > > with any such function at all? What's the use case for it? We've > > gotten along fine without one so far, and I don't really think that we > > *ought* to be exposing random bits of internal implementation details > > at the SQL level. > > Actually, I was wondering the same thing, maybe for a different reason. > Exposing the backend internals could have security implications (though > don't make me concoct a scenario to prove it ;) > > Although it might have some usefulness for debugging, I think it should > not be an "installed by default" feature, so istm would be a great > candidate for a contrib/ function or library. If someone needs it, it is > almost immediately available. It was requested because it is exposed in libpq and people need it to generate unique names and stuff like that from within psql and functions. Seems like a valid use for the pid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Perhaps a more relevant question is why are we cluttering the namespace > > with any such function at all? What's the use case for it? > It was requested because it is exposed in libpq and people need it to > generate unique names and stuff like that from within psql and > functions. Seems like a valid use for the pid. The sole reason libpq exposes it is so that you can tell a self-notify from an incoming notify. (ie, given you are LISTENing on a condition that both you and other clients send NOTIFYs for, is this particular message one that you sent yourself, or not? Compare the originator PID in the NOTIFY message to your backend_pid to find out.) I put that feature in back around 6.4, because it allowed some important optimizations in an app I had that used LISTEN/NOTIFY a lot. Since NOTIFY messages aren't even visible at the SQL level, the above is not a reason for making PIDs visible at the SQL level. I'm really dubious about using backend PID for the sort of purpose you suggest. Unique names would be *much* more safely handled with, say, a sequence generator. If you are not using libpq or another client library that can give you a backend-PID API call, then very likely you don't have a lot of control over the backend connection either, and shouldn't assume that backend PID is going to be stable for you. (Think about pooled connections in a webserver, etc.) Finally, the most legitimate uses of PID (like discovering a backend PID to send SIGINT to, when some client query is running wild) are not supported at all by a function that can only return your own backend's PID, because that's seldom the PID you need to know. The pg_stat_activity view handles this much better. So I'm still unconvinced that we need or want this ... regards, tom lane
> So I am still interested in PostgreSQL's ability to deal with > multimillon records tables. Postgres has no problem with multimillion row tables - many people on this list run them - just don't do sequential scans on them if you can't afford the time it takes. Chris
On Sat, 2002-08-03 at 01:25, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Perhaps a more relevant question is why are we cluttering the namespace > > > with any such function at all? What's the use case for it? > > > It was requested because it is exposed in libpq and people need it to > > generate unique names and stuff like that from within psql and > > functions. Seems like a valid use for the pid. > > The sole reason libpq exposes it is so that you can tell a self-notify > from an incoming notify. (ie, given you are LISTENing on a condition > that both you and other clients send NOTIFYs for, is this particular > message one that you sent yourself, or not? Compare the originator PID > in the NOTIFY message to your backend_pid to find out.) I put that > feature in back around 6.4, because it allowed some important > optimizations in an app I had that used LISTEN/NOTIFY a lot. > > Since NOTIFY messages aren't even visible at the SQL level, the above is > not a reason for making PIDs visible at the SQL level. When I last time showed how backend_pid function can be trivially defined as hannu=# create function getpid() returns int hannu-# as '/lib/libc.so.6','getpid' language 'C'; CREATE hannu=# select getpid();getpid -------- 2832 (1 row) You claimed that NOTIFY uses some _other_ backend id (i.e. not process id). But when I now tested it it seems that this is not the case, notify does use the actual process id. hannu=# listen a; LISTEN hannu=# notify a; NOTIFY Asynchronous NOTIFY 'a' from backend with pid 2832 received. > > So I'm still unconvinced that we need or want this ... > And you can do it trivially as long as we support old-style C functions anyway. ------------ Hannu
Hannu Krosing <hannu@tm.ee> writes: > You claimed that NOTIFY uses some _other_ backend id (i.e. not process > id). I did? Must have been momentary brain fade on my part. It's always been process ID. regards, tom lane
As I remember, most cases where people have recently been asking for backend pid were related to temp tables because they were named by pid. I don't think they are anymore. (?) We can do two things. We can either rename it to pg_backend_pid and move it to the statistics section in the docs, where the backend pids of all active backends are available, or remove my code additions and see if anyone asks for it in 7.3. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Perhaps a more relevant question is why are we cluttering the namespace > > > with any such function at all? What's the use case for it? > > > It was requested because it is exposed in libpq and people need it to > > generate unique names and stuff like that from within psql and > > functions. Seems like a valid use for the pid. > > The sole reason libpq exposes it is so that you can tell a self-notify > from an incoming notify. (ie, given you are LISTENing on a condition > that both you and other clients send NOTIFYs for, is this particular > message one that you sent yourself, or not? Compare the originator PID > in the NOTIFY message to your backend_pid to find out.) I put that > feature in back around 6.4, because it allowed some important > optimizations in an app I had that used LISTEN/NOTIFY a lot. > > Since NOTIFY messages aren't even visible at the SQL level, the above is > not a reason for making PIDs visible at the SQL level. > > I'm really dubious about using backend PID for the sort of purpose you > suggest. Unique names would be *much* more safely handled with, say, > a sequence generator. If you are not using libpq or another client > library that can give you a backend-PID API call, then very likely you > don't have a lot of control over the backend connection either, and > shouldn't assume that backend PID is going to be stable for you. > (Think about pooled connections in a webserver, etc.) > > Finally, the most legitimate uses of PID (like discovering a backend PID > to send SIGINT to, when some client query is running wild) are not > supported at all by a function that can only return your own backend's > PID, because that's seldom the PID you need to know. The > pg_stat_activity view handles this much better. > > So I'm still unconvinced that we need or want this ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > As I remember, most cases where people have recently been asking for > backend pid were related to temp tables because they were named by pid. Ah, good point. > I don't think they are anymore. (?) Check. > We can do two things. We can either rename it to pg_backend_pid and > move it to the statistics section in the docs, where the backend pids of > all active backends are available, or remove my code additions and see > if anyone asks for it in 7.3. Let's take it out and wait to see if anyone really still wants it. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > As I remember, most cases where people have recently been asking for > > backend pid were related to temp tables because they were named by pid. > > Ah, good point. > > > I don't think they are anymore. (?) > > Check. > > > We can do two things. We can either rename it to pg_backend_pid and > > move it to the statistics section in the docs, where the backend pids of > > all active backends are available, or remove my code additions and see > > if anyone asks for it in 7.3. > > Let's take it out and wait to see if anyone really still wants it. Just when I am ready to throw it away, I come up with a use for the function: test=> select * from pg_stat_activity where procpid != backend_pid(); This shows all activity _except_ my session, which pgmonitor or others may want to use, and I can think of no other way to do it. Comments? Maybe this is why it should be called pg_backend_id and put in the stat section. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Let's take it out and wait to see if anyone really still wants it. > Just when I am ready to throw it away, I come up with a use for the > function: > test=> select * from pg_stat_activity where procpid != backend_pid(); > This shows all activity _except_ my session, which pgmonitor or others > may want to use, and I can think of no other way to do it. Hm. Actually this seems like an argument for exposing MyBackendId, since what pg_stat_activity really depends on is BackendId. But as that view is presently defined, you'd not be able to writeWHERE backendid = my_backend_id() because the view doesn't expose backendid. > Comments? Maybe this is why it should be called pg_backend_id and put > in the stat section. *Please* don't call it pg_backend_id --- that invites confusion with BackendId which is a different thing. I'd suggest pg_backend_pid. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Let's take it out and wait to see if anyone really still wants it. > > > Just when I am ready to throw it away, I come up with a use for the > > function: > > > test=> select * from pg_stat_activity where procpid != backend_pid(); > > > This shows all activity _except_ my session, which pgmonitor or others > > may want to use, and I can think of no other way to do it. > > Hm. Actually this seems like an argument for exposing MyBackendId, since > what pg_stat_activity really depends on is BackendId. But as that view > is presently defined, you'd not be able to write > WHERE backendid = my_backend_id() > because the view doesn't expose backendid. Yes. > > Comments? Maybe this is why it should be called pg_backend_id and put > > in the stat section. > > *Please* don't call it pg_backend_id --- that invites confusion with > BackendId which is a different thing. > > I'd suggest pg_backend_pid. Sorry, I mean pg_backend_pid. I could expose backend_id but it may confuse people so pid is probably better. If you had the id, you could use pg_stat_get_backend_pid() to get the pid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Sorry, I mean pg_backend_pid. Okay, I was unsure if that was a typo or not. > I could expose backend_id but it may > confuse people so pid is probably better. If you had the id, you could > use pg_stat_get_backend_pid() to get the pid. Yeah, I thought of suggesting pg_backend_id() to return MyBackendId and then pg_stat_get_backend_pid() to get the PID, but was stopped by the thought that this breaks down if the stats collector isn't running. While I'm not convinced that there's any need for backend PID that's not connected to looking at stats-collector results, it's probably foolish to set up a mechanism that doesn't work outside that context. Let's go with pg_backend_pid(). regards, tom lane
> Hello, > > Sorry if it's wrong list for the question. Could you suggest some > tweaks to the PostgreSQL 7.2.1 to handle the following types of tables > faster? > > Here we have table "stats" with something over one millon records. > Obvious "SELECT COUNT(*) FROM stats " takes over 40 seconds to > execute, and this amount of time does not shorten considerably in > subsequent similar requests. All the databases are vacuumed nightly. > > CREATE TABLE "stats" ( > "url" varchar(50), > "src_port" varchar(10), > "ip" varchar(16), > "dst_port" varchar(10), > "proto" varchar(10), > "size" int8, > "login" varchar(20), > "start_date" timestamptz, > "end_date" timestamptz, > "aggregated" int4 > ); > CREATE INDEX "aggregated_stats_key" ON "stats" ("aggregated"); > CREATE INDEX "ip_stats_key" ON "stats" ("ip"); > > stats=> explain select count(*) from stats; > NOTICE: QUERY PLAN: > > Aggregate (cost=113331.10..113331.10 rows=1 width=0) > -> Seq Scan on stats (cost=0.00..110085.28 rows=1298328 width=0) > > EXPLAIN > stats=> select count(*) from stats; > count > --------- > 1298328 > (1 row) > > The system is FreeBSD-4.6-stable, softupdates on, Athlon XP 1500+, 512 > Mb DDR, ATA 100 HDD. > > Thanks in advance, > Yar > I have been dealing with a similar problem.. First I switched to scsi, second I installed enough memory and increased shared memory (in both freebsd kernel and pg.conf) so that the entire database could fit into ram; this combined with the summary table idea keeps me out of most trouble
We have tables of over 3.1 million records. Performance is fine for most things as long as access hits an index. As already stated, count(*) takes a long time. Just took over a minute for me to check the record count. Our DB is primarily a data warehouse role. Creating an index on a char(43) field on that table from scratch takes a while, but I think that's expected. Under normal loads we have well under 1 second "LIKE" queries on that the indexed char(43) field in the table with a join on a table of 1.1 million records using a char(12) primary key. Server is a Dell PowerEdge 2400, Dual PIII 667's with a gig of memory, 800 something megs allocated to postgres shared buffers. -Pete Andrew Sullivan wrote: >On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote: > > >>So I am still interested in PostgreSQL's ability to deal with >>multimillon records tables. >> >> > >[x-posted and Reply-To: to -general; this isn't a development >problem.] > >We have tables with multimillion records, and they are fast. But not >fast to count(). The MVCC design of PostgreSQL will give you very >few concurerncy problems, but you pay for that in the response time >of certain kinds of aggregates, which cannot use an index. > >A > > >