Thread: Explain explained
Hi,

I'm trying to figure out how to interpret the query plan explanation
returned by postgresql. There are a couple of things that I can't
explain to myself, so help is appreciated. This is the query plan
returned by the query I'm inspecting:

 Nested Loop  (cost=0.00..450.39 rows=1 width=128)
   ->  Nested Loop  (cost=0.00..444.71 rows=1 width=140)
         ->  Nested Loop  (cost=0.00..443.72 rows=1 width=148)
               ->  Nested Loop  (cost=0.00..442.40 rows=1 width=80)
                     ->  Nested Loop  (cost=0.00..440.28 rows=1 width=72)
                           ->  Nested Loop  (cost=0.00..438.97 rows=1 width=4)
                                 ->  Seq Scan on individual_name name_0  (cost=0.00..430.68 rows=1 width=4)
                                       Filter: (name = 'http://www.University0.edu'::text)
                                 ->  Index Scan using object_role_assertion_b_role_idx on object_role_assertion ora_0  (cost=0.00..8.28 rows=1 width=8)
                                       Index Cond: ((name_0.id = ora_0.b) AND (ora_0.object_role = 69))
                           ->  Index Scan using individual_name_pkey on individual_name name_1  (cost=0.00..1.30 rows=1 width=68)
                                 Index Cond: (name_1.id = ora_0.a)
                     ->  Index Scan using object_role_assertion_b_role_idx on object_role_assertion ora_1  (cost=0.00..2.10 rows=1 width=8)
                           Index Cond: ((ora_0.a = ora_1.b) AND (ora_1.object_role = 44))
               ->  Index Scan using individual_name_pkey on individual_name name_2  (cost=0.00..1.30 rows=1 width=68)
                     Index Cond: (name_2.id = ora_1.a)
         ->  Index Scan using concept_assertion_pkey on concept_assertion ca_0  (cost=0.00..0.98 rows=1 width=4)
               Index Cond: ((ca_0.concept = 5) AND (ca_0.individual = ora_1.a))
   ->  Index Scan using concept_assertion_pkey on concept_assertion ca_1  (cost=0.00..5.67 rows=1 width=4)
         Index Cond: ((ca_1.concept = 20) AND (ora_0.a = ca_1.individual))

Now, the first question is how to read this. My understanding is that
the plan contains some nested loops, and the first constraint is on the
table individual_name, filtering the name attribute with the value
'http://www.University0.edu'. This is done by a sequential scan.
Next we have an index scan on object_role_assertion using
object_role_assertion_b_role_idx. The result set returned by the
sequential and index scans is then joined with the index scan on
individual_name using individual_name_pkey with the condition
name_1.id = ora_0.a ... and so on.

If this sequence is correct, I'm wondering about the following:

1/ How does postgresql know that the constraint individual_name.name =
'http://www.University0.edu' matches one row (which is in fact true)?
I'm aware that databases rely on statistics of data distribution, but
it doesn't seem possible to me to know the exact statistics for each
possible value of an attribute. Further, why does postgresql expect a
row count of 1 for each operation?

2/ Sequential scans seem to me more expensive compared to index scans.
I'm wondering why the sequential scan on individual_name is the first
executed in the plan. Having or not having an index on
individual_name.name doesn't change anything in the query plan. Why?

3/ There is one condition in the query, i.e. concept_assertion.concept
= 5, with an empty result set, i.e. selectivity 0. In fact, the last
index scan on concept_assertion ca_1 in the plan is never executed
(this is what 'explain analyze' tells me). I'm wondering why this
constraint is not executed first. By executing it first, we could skip
executing everything else. Postgresql seems to have fine-grained
statistics: why does it get the estimate for individual_name.name =
'http://www.University0.edu' exactly right, and is wrong on
concept_assertion.concept = 5? I did execute analyze before.

Thanks for your explanations,

markus
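[For reference, the shape of the query behind the plan above can be
reconstructed from the aliases and the Index/Filter conditions; the
actual select list is not shown in the plan, so the columns chosen here
are a guess:]

```sql
-- Reconstructed from the plan's join and filter conditions;
-- the real select list of the original query is unknown.
SELECT name_1.name, name_2.name
FROM individual_name name_0
JOIN object_role_assertion ora_0
  ON name_0.id = ora_0.b AND ora_0.object_role = 69
JOIN individual_name name_1
  ON name_1.id = ora_0.a
JOIN object_role_assertion ora_1
  ON ora_0.a = ora_1.b AND ora_1.object_role = 44
JOIN individual_name name_2
  ON name_2.id = ora_1.a
JOIN concept_assertion ca_0
  ON ca_0.concept = 5 AND ca_0.individual = ora_1.a
JOIN concept_assertion ca_1
  ON ca_1.concept = 20 AND ca_1.individual = ora_0.a
WHERE name_0.name = 'http://www.University0.edu';
```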
On Tue, Mar 4, 2008 at 9:53 AM, Markus Stocker <markus@wilabs.ch> wrote:
> Hi,
>
> I'm trying to figure out how to interpret the query plan explanation
> returned by postgresql.

Good info here:
http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf
"Markus Stocker" <markus@wilabs.ch> writes:
> 1/ How does postgresql know that the constraint individual_name.name =
> 'http://www.University0.edu' matches one row (which is in fact true)?

It doesn't *know* that, except in the special case where it sees there's
a unique index on that column. You didn't show us the table definitions
so it's not clear if that's the case or not. Otherwise it's just
estimating based on the pg_stats statistics for the column. There's a
new section in the 8.3 docs that gives some more details about the
estimation process:
http://www.postgresql.org/docs/8.3/static/planner-stats-details.html

> Further, why does postgresql expect a row count of 1 for each
> operation?

It's not expecting more than one join partner at each step. Again
that's just statistical.

> 2/ Sequential scans seem to me more expensive compared to index scans.
> I'm wondering why the sequential scan on individual_name is the first
> executed in the plan.

I was wondering that too; it looks like it should be a candidate for an
index search. Datatype problem maybe? Again, you've not shown us the
table definitions...

> 3/ There is one condition in the query, i.e. concept_assertion.concept
> = 5, with an empty result set, i.e. selectivity 0. In fact, the last
> index scan on concept_assertion ca_1 in the plan is never executed
> (this is what 'explain analyze' tells me). I'm wondering why this
> constraint is not executed first. By executing it first, we could skip
> executing everything else.

Postgres never uses a rowcount estimate of less than one row, so it's
not going to try to optimize that way. There are various rationales
behind that choice, but the main one is that we don't trust the
statistics unreservedly. The odds that an estimate of zero is more
accurate than an estimate of one just aren't good enough, and the
likelihood of generating a really awful plan if we did believe zero
rows is too high.
(In fact, I've been considering whether it wouldn't be a good idea to
enforce a minimum estimate of two rows whenever we don't see a unique
index proving there can be only one. This would help discourage the
planner from using nestloops in cases where a nestloop loses badly for
more than one matching row.)

			regards, tom lane
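[The per-column statistics mentioned above can be inspected directly in
the pg_stats view; for the column in question (table and column names
taken from the thread), something like:]

```sql
-- Inspect the planner's statistics for individual_name.name.
-- null_frac, n_distinct, and most_common_vals/most_common_freqs
-- are the inputs to the selectivity estimate discussed here.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'individual_name'
  AND attname = 'name';
```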
Tom,

Thanks for your inputs, very enlightening!

On Tue, Mar 4, 2008 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Markus Stocker" <markus@wilabs.ch> writes:
>> 1/ How does postgresql know that the constraint individual_name.name =
>> 'http://www.University0.edu' matches one row (which is in fact true)?
>
> It doesn't *know* that, except in the special case where it sees there's
> a unique index on that column.

Indeed, there is a unique index on that column, sorry about that. In
fact, pg_stats says that there are no MCVs for that column (obviously).
So that explains a lot. I get

  sel(individual_name.name = 'http://www.University0.edu')
      = (1 - null_frac) / num_distinct = (1 - 0) / 17174
  rows = rel_cardinality * sel = 17174 * (1 / 17174) = 1

which I suppose explains too why the expected number of rows is 1.

> There's a new section in the 8.3 docs that gives some more details
> about the estimation process:
> http://www.postgresql.org/docs/8.3/static/planner-stats-details.html

Thanks for pointing to this, very well documented.

>> Further, why does postgresql expect a row count of 1 for each
>> operation?
>
> It's not expecting more than one join partner at each step. Again
> that's just statistical.

Sounds reasonable. The first constraint (i.e. individual_name.name)
returns (an estimated) 1 row (see above), hence successive joins can't
return more than one.

>> 2/ Sequential scans seem to me more expensive compared to index scans.
>> I'm wondering why the sequential scan on individual_name is the first
>> executed in the plan.
>
> I was wondering that too; it looks like it should be a candidate for an
> index search. Datatype problem maybe? Again, you've not shown us the
> table definitions...

I guess this is explained too now, at least partially. If I drop the
index on individual_name.name, pg_stats still says that there are no
MCVs. Hence, the expected number of rows doesn't change (even after
analyze; the values simply are unique).
Not sure how the sequential scan influences the overall costs compared
to the index scan, though.

The schema for the individual_name relation is

 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 name   | text    | not null
Indexes:
    "individual_name_pkey" PRIMARY KEY, btree (id)
    "individual_name_name_idx" UNIQUE, btree (name)

whereas the schema for the concept_assertion relation is

   Column   |  Type   | Modifiers
------------+---------+-----------
 concept    | integer | not null
 individual | integer | not null
Indexes:
    "concept_assertion_pkey" PRIMARY KEY, btree (concept, individual)
    "concept_assertion_concept_idx" btree (concept)
Foreign-key constraints:
    "concept_assertion_concept_fkey" FOREIGN KEY (concept) REFERENCES tbox_name(id)
    "concept_assertion_individual_fkey" FOREIGN KEY (individual) REFERENCES individual_name(id)

>> 3/ There is one condition in the query, i.e. concept_assertion.concept
>> = 5, with an empty result set, i.e. selectivity 0. In fact, the last
>> index scan on concept_assertion ca_1 in the plan is never executed
>> (this is what 'explain analyze' tells me). I'm wondering why this
>> constraint is not executed first. By executing it first, we could skip
>> executing everything else.
>
> Postgres never uses a rowcount estimate of less than one row, so it's
> not going to try to optimize that way. There are various rationales
> behind that choice, but the main one is that we don't trust the
> statistics unreservedly. The odds that an estimate of zero is more
> accurate than an estimate of one just aren't good enough, and the
> likelihood of generating a really awful plan if we did believe zero
> rows is too high.

Agreed, makes a lot of sense. If I try to calculate the estimated rows
for concept_assertion.concept = 5 (i.e.
the constraint which effectively returns an empty result set) I get

  sel(concept_assertion.concept = 5)
      = (1 - sum(mcv_freqs)) / (n_distinct - num_mcv) = 0.021545454
  rows = rel_cardinality * sel = 18128 * 0.021545454 = (rounded) 391

Clearly, at least by inspecting the two constraints individually, i.e.
individual_name.name = 'http://www.University0.edu' and
concept_assertion.concept = 5, the optimizer chooses the first as it
returns fewer expected rows, i.e. 1 vs. 391.

Hope the explanations sound somewhat reasonable.

Thanks again,

markus
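[The gap between the 391-row estimate and the empty actual result is
visible directly with EXPLAIN ANALYZE, which prints the planner's
estimate and the measured row count side by side:]

```sql
-- The output shows both the estimate ("rows=391") and the reality
-- ("actual time=... rows=0") for the predicate in question.
EXPLAIN ANALYZE
SELECT * FROM concept_assertion WHERE concept = 5;
```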
"Markus Stocker" <markus@wilabs.ch> writes:
> On Tue, Mar 4, 2008 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 2/ Sequential scans seem to me more expensive compared to index scans.
>>> I'm wondering why the sequential scan on individual_name is the first
>>> executed in the plan.
>>
>> I was wondering that too; it looks like it should be a candidate for an
>> index search. Datatype problem maybe? Again, you've not shown us the
>> table definitions...

> I guess this is explained too now, at least partially.

No, I meant it seemed like that should have been an indexscan; fetching
one row via an index should have an estimated cost much less than 400.

What do you get if you just do

  explain select * from individual_name where name = 'http://www.University0.edu'

If it still says seqscan, what if you force it with

  set enable_seqscan = off;

			regards, tom lane
On Tue, Mar 4, 2008 at 5:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, I meant it seemed like that should have been an indexscan; fetching
> one row via an index should have an estimated cost much less than 400.
>
> What do you get if you just do
>   explain select * from individual_name where name = 'http://www.University0.edu'
> If it still says seqscan, what if you force it with
>   set enable_seqscan = off;

Sorry, I haven't been accurate in my explanations. If I have an index
on individual_name.name then postgresql always executes an index scan.
If I drop it, then it uses a sequential scan, and if I then force it
with enable_seqscan = off I get this

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on individual_name  (cost=100000000.00..100000430.67 rows=1 width=68)
   Filter: (name = 'http://www.University0.edu'::text)

So, the query plan in my first email used the sequential scan because
the index on individual_name.name had been dropped. Sorry about that.

markus
Hello,

I'm converting a database from SQLite to PostgreSQL, and I would like
some clarifications about the numeric type.

I want to store monetary values; they need to have decimals, of course,
and most of the time a scale of 2. I see that PostgreSQL has a monetary
type, but in some places I read it's better to stay with the numeric
type.

So, with numeric, we can specify how many decimals (the scale) and the
precision. What happens if I only know how many decimals I want to
store, and have no limit on the precision? Is it valid to have a
numeric with no precision and a scale of 2?

I'm playing a little, and sometimes postgreSQL doesn't allow changing
the precision of already-defined fields. Can I do this and later change
it?

  CREATE DOMAIN currency AS numeric(10,2);

So, maybe sometimes I'll need to have numbers greater than 8 digits
(10 - 2 = 8)? Also, I read that the memory requirements increase with
the number of digits I declare ... Am I worrying for nothing?

thanks,

raimon
Raimon Fernandez <coder@montx.com> writes:
> Is it valid to have a numeric with no precision and a scale of 2?

No. You can do something like NUMERIC(1000,2) though. It's pretty
much like VARCHAR --- the number you give is an upper limit, it's
not causing any wasted space to be allocated.

			regards, tom lane
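[A minimal sketch of the suggestion above, using the domain name from
Raimon's earlier mail; the table and values are made up for
illustration:]

```sql
-- A generous precision cap: 2 decimal places, up to 998 digits
-- before the decimal point. No space is wasted for small values.
CREATE DOMAIN currency AS numeric(1000,2);

CREATE TABLE invoice (
    id     serial PRIMARY KEY,
    amount currency NOT NULL
);

-- Input values are rounded to the declared scale:
INSERT INTO invoice (amount) VALUES (19.995);  -- stored as 20.00
```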
On 07/03/2008, at 14:37, Tom Lane wrote:
> Raimon Fernandez <coder@montx.com> writes:
>> Is it valid to have a numeric with no precision and a scale of 2?
>
> No. You can do something like NUMERIC(1000,2) though. It's pretty
> much like VARCHAR --- the number you give is an upper limit, it's
> not causing any wasted space to be allocated.

thanks ...

raimon
Hello,

I'm trying to back up a PostgreSQL database which is installed on a
remote Windows machine. At the moment, I'm doing a backup with pgAdmin3
1.8.2 rev. 7032 from OS X, selecting the db and then Backup with the
'compress' option.

Later, with this file, I do a restore on my local copy of postgresql,
but first I have to delete the database and create a new empty one;
then I can restore the backup.

It seems that all the data is there, but I have some warnings like
these, which I don't really know what they mean ...

...
pg_restore: creating FUNCTION pldbg_abort_target(integer)
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 21; 1255 16419 FUNCTION pldbg_abort_target(integer) postgres
pg_restore: [archiver (db)] could not execute query: ERROR: could not access file "$libdir/pldbgapi": No such file or directory
    Command was: CREATE FUNCTION pldbg_abort_target(session integer) RETURNS SETOF boolean AS '$libdir/pldbgapi', 'pldbg_abort_target' ...
pg_restore: [archiver (db)] could not execute query: ERROR: function public.pldbg_abort_target(integer) does not exist
    Command was: ALTER FUNCTION public.pldbg_abort_target(session integer) OWNER TO postgres;
pg_restore: creating FUNCTION pldbg_attach_to_port(integer)
pg_restore: [archiver (db)] Error from TOC entry 22; 1255 16420 FUNCTION pldbg_attach_to_port(integer) postgres
pg_restore: [archiver (db)] could not execute query: ERROR: could not access file "$libdir/pldbgapi": No such file or directory
    Command was: CREATE FUNCTION pldbg_attach_to_port(portnumber integer) RETURNS integer AS '$libdir/pldbgapi', 'pldbg_attach_to_port' ...

... and so on, up to 40 warnings ...

Do I have to worry about them?

thanks,

raimon
Raimon Fernandez <coder@montx.com> writes:
> It seems that all the data is there, but I have some warnings like
> these, which I don't really know what they mean ...

> pg_restore: [archiver (db)] could not execute query: ERROR: could not
> access file "$libdir/pldbgapi": No such file or directory

Seems you've forgotten to install an add-on package or two --- this
one sounds like it's the EDB plpgsql debugger.

> Do I have to worry about them?

Yes, if you still want to use those functions...

			regards, tom lane
On 09/04/2008, at 16:43, Tom Lane wrote:
> Raimon Fernandez <coder@montx.com> writes:
>> It seems that all the data is there, but I have some warnings like
>> these, which I don't really know what they mean ...
>
>> pg_restore: [archiver (db)] could not execute query: ERROR: could not
>> access file "$libdir/pldbgapi": No such file or directory
>
> Seems you've forgotten to install an add-on package or two --- this
> one sounds like it's the EDB plpgsql debugger.

Well, I didn't install this copy of PostgreSQL, so I don't know what
has been installed; I think it's a default Windows installation.

>> Do I have to worry about them?
>
> Yes, if you still want to use those functions...

Not at the moment, as I really don't know what they are or what they
are for ... So, how can I make a 'standard' backup without all those
packages? If I'm able to re-create the whole database, that's enough
(with sequences, schemas, etc.).

thanks,

raimon
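[One way to skip the debugger functions at restore time is pg_restore's
list/filter mechanism; this is an untested sketch, and the file and
database names are made up:]

```shell
# Dump the table of contents of the custom-format backup file,
# then drop the entries that refer to the pldbg_* functions:
pg_restore -l backup.dump > backup.list
grep -v 'pldbg_' backup.list > backup.filtered.list

# Restore only the entries remaining in the edited list:
pg_restore -L backup.filtered.list -d mydb backup.dump
```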
Hello,

After some months of PostgreSQL on OS X, I'm trying to install a
complete package of Ruby on Rails with PostgreSQL, Capistrano, git,
etc. etc. on a Debian linux.

I've installed PostgreSQL from source, and executed the commands to
install the start-up script:

$ cp contrib/start-scripts/linux /etc/init.d/postgresql
$ update-rc.d postgresql defaults

At boot time, I'm getting a Permission denied:

/etc/init.d/rc: line 78: /etc/rc2.d/S20postgresql : Permission denied

Even if I execute this line as root I get the permission denied:

/etc/init.d/postgresql start

I can start the database logged in as the postgres user by executing:

/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data

my postgres user is the postgres user, all with the default values.

here is the file, in case someone can spot any error ...

thanks in advance,

r.

#! /bin/sh

# chkconfig: 2345 98 02
# description: PostgreSQL RDBMS

# This is an example of a start/stop script for SysV-style init, such
# as is used on Linux systems. You should edit some of the variables
# and maybe the 'echo' commands.
#
# Place this file at /etc/init.d/postgresql (or
# /etc/rc.d/init.d/postgresql) and make symlinks to
#   /etc/rc.d/rc0.d/K02postgresql
#   /etc/rc.d/rc1.d/K02postgresql
#   /etc/rc.d/rc2.d/K02postgresql
#   /etc/rc.d/rc3.d/S98postgresql
#   /etc/rc.d/rc4.d/S98postgresql
#   /etc/rc.d/rc5.d/S98postgresql
# Or, if you have chkconfig, simply:
#   chkconfig --add postgresql
#
# Proper init scripts on Linux systems normally require setting lock
# and pid files under /var/run as well as reacting to network
# settings, so you should treat this with care.

# Original author: Ryan Kirkpatrick <pgsql@rkirkpat.net>

# $PostgreSQL: pgsql/contrib/start-scripts/linux,v 1.8 2006/07/13 14:44:33 petere Exp $

## EDIT FROM HERE

# Installation prefix
prefix=/usr/local/pgsql

# Data directory
PGDATA="/usr/local/pgsql/data"

# Who to run the postmaster as, usually "postgres". (NOT "root")
PGUSER=postgres

# Where to keep a log file
PGLOG="$PGDATA/serverlog"

## STOP EDITING HERE

# The path that is to be used for the script
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# What to use to start up the postmaster (we do NOT use pg_ctl for this,
# as it adds no value and can cause the postmaster to misrecognize a stale
# lock file)
DAEMON="$prefix/bin/postmaster"

# What to use to shut down the postmaster
PGCTL="$prefix/bin/pg_ctl"

set -e

# Only start if we can find the postmaster.
test -x $DAEMON || exit 0

# Parse command line parameters.
case $1 in
  start)
	echo -n "Starting PostgreSQL: "
	su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
	echo "ok"
	;;
  stop)
	echo -n "Stopping PostgreSQL: "
	su - $PGUSER -c "$PGCTL stop -D '$PGDATA' -s -m fast"
	echo "ok"
	;;
  restart)
	echo -n "Restarting PostgreSQL: "
	su - $PGUSER -c "$PGCTL stop -D '$PGDATA' -s -m fast -w"
	su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
	echo "ok"
	;;
  reload)
	echo -n "Reload PostgreSQL: "
	su - $PGUSER -c "$PGCTL reload -D '$PGDATA' -s"
	echo "ok"
	;;
  status)
	su - $PGUSER -c "$PGCTL status -D '$PGDATA'"
	;;
  *)
	# Print help
	echo "Usage: $0 {start|stop|restart|reload|status}" 1>&2
	exit 1
	;;
esac

exit 0
Raimon Fernandez wrote:
> Hello,
>
> After some months of PostgreSQL on OS X, I'm trying to install a
> complete package of Ruby on Rails with PostgreSQL, Capistrano, git,
> etc. etc. on a Debian linux.
>
> I've installed PostgreSQL from source, and executed the commands to
> install the start-up script:
>
> $ cp contrib/start-scripts/linux /etc/init.d/postgresql
> $ update-rc.d postgresql defaults
>
> At boot time, I'm getting a Permission denied:
>
> /etc/init.d/rc: line 78: /etc/rc2.d/S20postgresql : Permission denied
>
> Even if I execute this line as root I get the permission denied:
>
> /etc/init.d/postgresql start
>
> I can start the database logged in as the postgres user by executing:
>
> /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
>
> my postgres user is the postgres user, all with the default values.
>
> here is the file, in case someone can spot any error ...
>
> thanks in advance,
>
> r.

You probably need to grant it executable permissions:

  chmod +x /etc/init.d/postgresql

should do the trick.

-salman
On 30/01/2009, at 17:52, salman wrote:
> Raimon Fernandez wrote:
>> Hello,
>>
>> After some months of PostgreSQL on OS X, I'm trying to install a
>> complete package of Ruby on Rails with PostgreSQL, Capistrano, git,
>> etc. etc. on a Debian linux.
>>
>> I've installed PostgreSQL from source, and executed the commands to
>> install the start-up script:
>>
>> $ cp contrib/start-scripts/linux /etc/init.d/postgresql
>> $ update-rc.d postgresql defaults
>>
>> At boot time, I'm getting a Permission denied:
>>
>> /etc/init.d/rc: line 78: /etc/rc2.d/S20postgresql : Permission denied
>>
>> Even if I execute this line as root I get the permission denied:
>>
>> /etc/init.d/postgresql start
>>
>> I can start the database logged in as the postgres user by executing:
>>
>> /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
>>
>> my postgres user is the postgres user, all with the default values.
>>
>> here is the file, in case someone can spot any error ...
>>
>> thanks in advance,
>>
>> r.
>
> You probably need to grant it executable permissions. chmod +x
> /etc/init.d/postgresql should do the trick.

thanks salman, that did the trick .....

r.
Hello,

What's the difference between:

  SELECT count(id) FROM scanns WHERE tipus='esc';     => 11235
  SELECT count(pages) FROM scanns WHERE tipus='esc';  => 11165
  SELECT count(*) FROM scanns WHERE tipus='esc';      => 11235

Why are some rows not included in the count(pages) SELECT?

There are no NULL values in the pages field:

  SELECT count(pages) FROM scanns WHERE tipus='esc' AND pages IS NOT NULL;  => 11165
  SELECT count(pages) FROM scanns WHERE tipus='esc' AND pages IS NULL;      => 0

And I think the three SELECTs should return the same row count ...

thanks for your help,

r.
On 09/09/2009, at 8:34, Raimon Fernandez wrote:
> Hello,
>
> What's the difference between:
>
> SELECT count(id) FROM scanns WHERE tipus='esc'; => 11235
> SELECT count(pages) FROM scanns WHERE tipus='esc'; => 11165
> SELECT count(*) FROM scanns WHERE tipus='esc'; => 11235
>
> Why are some rows not included in the count(pages) SELECT?
>
> There are no NULL values in the pages field:
>
> SELECT count(pages) FROM scanns WHERE tipus='esc' AND pages IS NOT NULL; => 11165
> SELECT count(pages) FROM scanns WHERE tipus='esc' AND pages IS NULL; => 0
>
> And I think the three SELECTs should return the same row count ...

OK, I found the problem ... this test was wrong:

  SELECT count(pages) FROM scanns WHERE tipus='esc' AND pages IS NULL;  => 0

(count(pages) never counts NULL values, so here it always returns 0.)
This is the correct one:

  SELECT count(*) FROM scanns WHERE tipus='esc' AND pages IS NULL;  => 70

thanks,

raimon
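[The underlying rule: count(expression) counts only the rows where the
expression is non-NULL, while count(*) counts rows. A tiny
illustration; the table name and values here are made up:]

```sql
-- count(col) skips NULLs; count(*) does not.
CREATE TEMP TABLE scanns_demo (id int, pages int);
INSERT INTO scanns_demo VALUES (1, 10), (2, NULL), (3, 7);

SELECT count(*)     FROM scanns_demo;                       -- 3
SELECT count(pages) FROM scanns_demo;                       -- 2
SELECT count(pages) FROM scanns_demo WHERE pages IS NULL;   -- always 0
SELECT count(*)     FROM scanns_demo WHERE pages IS NULL;   -- 1
```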