Thread: qurey plan and indices
Hi, I've experienced a brutal speedup (order of 2) separateing the following subquery and making it manually: explain select name,description from descriptions where in (select name from descriptions where description like '%Bankverbindung%'); Seq Scan on descriptions (cost=163.98 rows=3575 width=24) SubPlan -> Seq Scan on descriptions (cost=163.98 rows=2 width=12) [I had no patient to wait the resuls...] If I give manually the subquery it gives an immediate response (5 records at all). If I copy manually that 5 record as constants after 'in', then I receive an immediate response from the main query too! explain select name,description from descriptions where name in ('accounts', 'no_account', 'no_account', 'no_acc', 'account'); Index Scan using desc_ind3, desc_ind3, desc_ind3, desc_ind3 on descriptions (cost=9.01 rows=36 width=24) I saw in the archive that in, and not in can cause performance degradation, but this is not the case now. booth queries are using 'in' Why can't be the subquery evaluated first, and then run the main query with the results? ============== the stucture: =============================== CREATE TABLE "descriptions" ( "objectid" int4 DEFAULT nextval ( 'DescSeq' ) NOT NULL, "ts" int4 NOT NULL, "classid" int4 NOT NULL, "languageid" int4 NOT NULL, "priority" int4 NOT NULL, "name" character varying(50)NOT NULL, "description" character varying(200)); CREATE INDEX "desc_ind3" on "descriptions" using btree ( "name" "varchar_ops" ); CREATE INDEX "desc_ind2" on "descriptions" using btree ( "description" "varchar_ops" ); CREATE INDEX "desc_ind1" on "descriptions" using btree ( "classid" "int4_ops", "name" "varchar_ops" ); CREATE UNIQUE INDEX "desc_idx" on "descriptions" using btree ( "classid" "int4_ops", "languageid" "int4_ops", "name" "varchar_ops" ); CREATE UNIQUE INDEX "descriptions_pkey" on "descriptions" using btree ( "objectid" "int4_ops" ); version: postgresql 6.5, debian potato.
MESZAROS Attila wrote: > > Hi, > > I've experienced a brutal speedup (order of 2) separateing the following > subquery and making it manually: > > explain select name,description > from descriptions > where in (select name > from descriptions > where description like '%Bankverbindung%'); > > Seq Scan on descriptions (cost=163.98 rows=3575 width=24) > SubPlan > -> Seq Scan on descriptions (cost=163.98 rows=2 width=12) > [I had no patient to wait the resuls...] > Correct me if I am wrong, however if using LIKE '%something%' (which means, the field contains 'something' somewhere in the field) there is never a chance to use the index you have created - a sequential table scan MUST be made, thus you have to read all 3575 rows to return the set. However, if you change your query so that you are looking for LIKE 'Bankverbindung%' then an index can be used becase you know what the first few characters are known. Hope this helps, Cordially Patrick Giagnocavo a222@redrose.net
Patrick Giagnocavo <a222@redrose.net> writes: > MESZAROS Attila wrote: >> >> I've experienced a brutal speedup (order of 2) separateing the following >> subquery and making it manually: >> >> explain select name,description >> from descriptions >> where in (select name >> from descriptions >> where description like '%Bankverbindung%'); > Correct me if I am wrong, however if using > LIKE '%something%' > (which means, the field contains 'something' somewhere in the field) > there is never a chance to use the index you have created - a > sequential table scan MUST be made, thus you have to read all 3575 > rows to return the set. Yes. The inner query must happen by sequential scan. But the outer query can use an index if it is in the form... where (name = 'a') or (name = 'b') or (name = 'c'); which is what "where name in ('a','b','c')" gets translated to. However, the *real* problem here is that the inner query is treated as a subplan, which means it is re-evaluated for each tuple scanned by the outer query. This is not very bright; the system ought to notice that the inner query does not depend on the state of the outer query, and do it only once. There is already a notion of InitPlan vs. SubPlan. I'm not sure why this case is being classified as a SubPlan, but it sure looks like it ought to be done as an InitPlan... regards, tom lane
Hi, > LIKE '%something%' > (which means, the field contains 'something' somewhere in the field)I should have to remove that % sign...this is not thereal problem, bceause the subquery alone withbooth the % signs responds faster than a second. I think the real problem is that, the subquery is evaluated 3575times :(((( If you see the plan of the direct manul query in my previous mailyou will see that there is no sequential search at all.... what makes the difference???? Attila
> However, the *real* problem here is that the inner query is treated > as a subplan, which means it is re-evaluated for each tuple scanned > by the outer query. This is not very bright; the system ought to > notice that the inner query does not depend on the state of the > outer query, and do it only once. There is already a notion of > InitPlan vs. SubPlan. I'm not sure why this case is being classified > as a SubPlan, but it sure looks like it ought to be done as an > InitPlan...So this means, it is a bug somewhere in the planner?who should we report this?? Attila any workarounds? [better than temptables?]
Tom Lane wrote: > > Yes. The inner query must happen by sequential scan. But the outer > query can use an index if it is in the form > ... where (name = 'a') or (name = 'b') or (name = 'c'); > which is what "where name in ('a','b','c')" gets translated to. > > However, the *real* problem here is that the inner query is treated > as a subplan, which means it is re-evaluated for each tuple scanned > by the outer query. This is not very bright; the system ought to > notice that the inner query does not depend on the state of the > outer query, and do it only once. There is already a notion of > InitPlan vs. SubPlan. I'm not sure why this case is being classified > as a SubPlan, but it sure looks like it ought to be done as an > InitPlan... Not in all cases. First, to use InitPlan you would have to add DISTINCT to subquery to avoid duplicates. Second, there is another way of optimization - cache/hash subquery results to avoid re-execution of subquery plan. Planner should make decision what way to follow. Vadim
> Not in all cases. First, to use InitPlan you would have to add > DISTINCT to subquery to avoid duplicates. Second, there is another > way of optimization - cache/hash subquery results to avoid > re-execution of subquery plan. Planner should make decision what > way to follow. Yep, that's what I was looking for. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026