Thread: forced sequential scan when condition has current_user
hi, just a small question: is it normal that PostgreSQL 8.4.1 always uses sequential scanning on any table when there is a condition having the constant "current_user"? Of course there is a btree index set on that table, but the DBMS just doesn't want to utilize it. When I replace current_user to any string, the planner uses the index normally. I can demonstrate it with the following simple query: SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE psz.felhasznalo_id = current_user; Explain analyze: "Seq Scan on projektszervezet psz (cost=0.00..255.07 rows=42 width=9)" " Filter: ((felhasznalo_id)::name = "current_user"())" Metadata: CREATE TABLE "projekt"."projektszervezet" ( CONSTRAINT "projektszervezet_pkey" PRIMARY KEY("kotesszam", "felhasznalo_id"), CONSTRAINT "projektszervezet_fk_felhasznalo" FOREIGN KEY ("felhasznalo_id") REFERENCES "felhasznalo"."felhasznalo"("felhasznalo_id") ON DELETE RESTRICT ON UPDATE RESTRICT NOT DEFERRABLE, CONSTRAINT "projektszervezet_fk_projekt" FOREIGN KEY ("kotesszam") REFERENCES "projekt"."projekt"("kotesszam") ON DELETE CASCADE ON UPDATE CASCADE NOT DEFERRABLE, CONSTRAINT "projektszervezet_fk_szerep" FOREIGN KEY ("szerep_id") REFERENCES "felhasznalo"."szerep"("szerep_id") ON DELETE RESTRICT ON UPDATE RESTRICT NOT DEFERRABLE ) INHERITS ("projekt"."projektszervezet_sablon") WITH OIDS; CREATE INDEX "projektszervezet_idx_felhasznalo_id" ON "projekt"."projektszervezet" USING btree ("felhasznalo_id"); CREATE INDEX "projektszervezet_idx_kotesszam" ON "projekt"."projektszervezet" USING btree ("kotesszam"); CREATE TRIGGER "projektszervezet_archivalas" BEFORE UPDATE OR DELETE ON "projekt"."projektszervezet" FOR EACH ROW EXECUTE PROCEDURE "public"."projektszervezet_archivalas_trigger"(); CREATE TRIGGER "projektszervezet_idopecset" BEFORE UPDATE ON "projekt"."projektszervezet" FOR EACH ROW EXECUTE PROCEDURE "public"."idopecset_trigger"(); CREATE TRIGGER "projektszervezet_naplozas" BEFORE INSERT OR UPDATE OR DELETE ON "projekt"."projektszervezet" FOR EACH ROW EXECUTE PROCEDURE "public"."projektszervezet_naplozas_trigger"(); Inherited table: CREATE TABLE "projekt"."projektszervezet_sablon" ( "kotesszam" VARCHAR(10) NOT NULL, "felhasznalo_id" VARCHAR NOT NULL, "szerep_id" VARCHAR(3), "felvivo" VARCHAR DEFAULT "current_user"() NOT NULL, "felvitel_idopont" TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL, "modosito" VARCHAR, "modositas_idopont" TIMESTAMP WITHOUT TIME ZONE, "elso_felvivo" VARCHAR DEFAULT "current_user"() NOT NULL, "elso_felvitel_idopont" TIMESTAMP(0) WITHOUT TIME ZONE DEFAULT now() NOT NULL ) WITH OIDS; CREATE TRIGGER "projektszervezet_idopecset" BEFORE UPDATE ON "projekt"."projektszervezet_sablon" FOR EACH ROW EXECUTE PROCEDURE "public"."idopecset_trigger"(); Thanks! Balazs
2010/1/4 Keresztury Balázs <balazs@gaslightmusic.hu>: > just a small question: is it normal that PostgreSQL 8.4.1 always uses > sequential scanning on any table when there is a condition having the > constant "current_user"? Of course there is a btree index set on that table, > but the DBMS just doesn't want to utilize it. When I replace current_user to > any string, the planner uses the index normally. > > I can demonstrate it with the following simple query: > > SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE > psz.felhasznalo_id = current_user; > > Explain analyze: > > "Seq Scan on projektszervezet psz (cost=0.00..255.07 rows=42 width=9)" > " Filter: ((felhasznalo_id)::name = "current_user"())" You've only got 42 rows in that table - PostgreSQL probably thinks a sequential scan will be faster. It might even be right. The thing is, PostgreSQL doesn't know at planning time what the value of current_user() will be, so the plan can't depend on that; the planner just takes its best shot. But if you provide a particular value in the query then it will look at the stats and see what seems to make the most sense for that particular value. So using one of the more commonly-occuring value in the table might produce a sequential scan, while a less common value might lead to an index scan. ...Robert
Actually table itself has ~8000 records. I don't know why does it report 42 rows, since there is not even a matching row in the table for this specific condition.. But as we all know, the universal answer for every question is 42 ;) Autovacuum is on, and I also did some vacuuming before I started to play with this query. I could implement a function into my application to replace current_user to the actual username, but it just doesn't worth it. By the way, replacing current_user to a text constant reduces cost from 255->72, so there is a significant difference. Don't you think this is actually a bug, not a feature? balazs -----Original Message----- From: Robert Haas [mailto:robertmhaas@gmail.com] Sent: Monday, January 04, 2010 10:59 PM To: Keresztury Balázs Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] forced sequential scan when condition has current_user 2010/1/4 Keresztury Balázs <balazs@gaslightmusic.hu>: > just a small question: is it normal that PostgreSQL 8.4.1 always uses > sequential scanning on any table when there is a condition having the > constant "current_user"? Of course there is a btree index set on that table, > but the DBMS just doesn't want to utilize it. When I replace current_user to > any string, the planner uses the index normally. > > I can demonstrate it with the following simple query: > > SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE > psz.felhasznalo_id = current_user; > > Explain analyze: > > "Seq Scan on projektszervezet psz (cost=0.00..255.07 rows=42 width=9)" > " Filter: ((felhasznalo_id)::name = "current_user"())" You've only got 42 rows in that table - PostgreSQL probably thinks a sequential scan will be faster. It might even be right. The thing is, PostgreSQL doesn't know at planning time what the value of current_user() will be, so the plan can't depend on that; the planner just takes its best shot. But if you provide a particular value in the query then it will look at the stats and see what seems to make the most sense for that particular value. So using one of the more commonly-occuring value in the table might produce a sequential scan, while a less common value might lead to an index scan. ...Robert
On Jan 4, 2010, at 1:59 PM, Robert Haas wrote: > The thing is, PostgreSQL doesn't know at planning time what the value of > current_user() will be, so the plan can't depend on that; the planner > just takes its best shot. current_user() is a stable function and the manual is explicit that the result of stable function can be used in an indexscan: "A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for allrows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a singlecall. In particular, it is safe to use an expression containing such a function in an index scan condition. (Sincean index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILEfunction in an index scan condition.)" postgres=# select provolatile from pg_proc where proname = 'current_user'; provolatile ------------- s So, I think the OP's question is still valid. Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k
2010/1/4 Erik Jones <ejones@engineyard.com>: > On Jan 4, 2010, at 1:59 PM, Robert Haas wrote: >> The thing is, PostgreSQL doesn't know at planning time what the value of >> current_user() will be, so the plan can't depend on that; the planner >> just takes its best shot. > > current_user() is a stable function and the manual is explicit that the result of stable function can be used in an indexscan: That's true, but what I said is also true. It CAN be used in an index scan, and on a sufficiently large table it WILL be used in an index scan (I tried it). But the planner doesn't automatically use an index just because there is one; it tries to gauge whether that's the right strategy. Unfortunately, in cases where it is comparing to a function rather than a constant, its estimates are not always terribly accurate. One thing I notice is that the OP has not included any information on how fast the seqscan or index-scan actually is. If the seqscan is slower than the index-scan, then the OP might want to consider adjusting the page cost parameters - EXPLAIN ANALYZE output for both plans (perhaps obtained by temporarily setting enable_seqscan to false) would be helpful in understanding what is happening. ...Robert
Erik Jones wrote: > On Jan 4, 2010, at 1:59 PM, Robert Haas wrote: > >> The thing is, PostgreSQL doesn't know at planning time what the value of >> current_user() will be, so the plan can't depend on that; the planner >> just takes its best shot. > > current_user() is a stable function and the manual is explicit that the result of stable function can be used in an indexscan: Yes ... but the planner doesn't know the value current_user will return, so it can't use its statistics on the frequency with which a _particular_ value occurs to make decisions. It has to come up with the best generic plan for any value that current_user might return. It's as if current_user were a query parameter that won't be resolved until EXECUTE time. Arguably, in this particular case the planner *could* know what value current_user will return, but adding such special cases to the planner without a really good reason seems undesirable. -- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes: > Erik Jones wrote: >> current_user() is a stable function and the manual is explicit that the result of stable function can be used in an indexscan: > Yes ... but the planner doesn't know the value current_user will return, I think it's got nothing to do with that and everything to do with the fact that he's comparing to a varchar rather than text column ... regards, tom lane
> -----Mensaje original----- > De: Keresztury Balázs > > hi, > > just a small question: is it normal that PostgreSQL 8.4.1 > always uses sequential scanning on any table when there is a > condition having the constant "current_user"? Of course there > is a btree index set on that table, but the DBMS just doesn't > want to utilize it. When I replace current_user to any > string, the planner uses the index normally. > > I can demonstrate it with the following simple query: > > SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE > psz.felhasznalo_id = current_user; > Probably you are comparing different types. Try explicitly casting current_user to text: SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE psz.felhasznalo_id = current_user::text
You are right along with the others, the seq scan was only forced because of the varchar-text comparision.. Using the cast solves the problem. Thanks for the answers everyone! Balazs -----Original Message----- From: Fernando Hevia [mailto:fhevia@ip-tel.com.ar] Sent: Tuesday, January 05, 2010 3:16 PM To: 'Keresztury Balázs'; pgsql-performance@postgresql.org Subject: RE: [PERFORM] forced sequential scan when condition has current_user > -----Mensaje original----- > De: Keresztury Balázs > > hi, > > just a small question: is it normal that PostgreSQL 8.4.1 > always uses sequential scanning on any table when there is a > condition having the constant "current_user"? Of course there > is a btree index set on that table, but the DBMS just doesn't > want to utilize it. When I replace current_user to any > string, the planner uses the index normally. > > I can demonstrate it with the following simple query: > > SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE > psz.felhasznalo_id = current_user; > Probably you are comparing different types. Try explicitly casting current_user to text: SELECT psz.kotesszam FROM projekt.projektszervezet psz WHERE psz.felhasznalo_id = current_user::text