Re: unoptimized nested loops - Mailing list pgsql-general
From | Tim Kelly |
---|---|
Subject | Re: unoptimized nested loops |
Date | |
Msg-id | 629B42C2.6020107@dialectronics.com Whole thread Raw |
In response to | Re: unoptimized nested loops (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-general |
I resolved the problem by eliminating the planner from the decision making altogether, through using a (permanently) temporary table populated by the subset of data records from an initial query generated from the metadata, then searching only in the temp table for the actual data records I want. I ran this code into a function sxa ("search by author"). In my particular case I am looking for the same snippet of text over and over again so I hardwired it into the function. The function uses 'like' instead of '=' on the author so I am actually comparing an exact match using the inner join to a pattern match in the function. c_db=> select count(id) from content; count --------- 1248954 (1 row) Time: 587.325 ms c_db=> select count(id) from data inner join metadata on data.id = metadata.id where author = 'Powers'; count ------- 347 (1 row) Time: 519.435 ms c_db=> select count(id) from data inner join metadata on data.id = metadata.id where author = 'Powers' and content like '%some text%'; count ------- 14 (1 row) Time: 209895.655 ms c_db=> select count(id) from sxa('Powers'); count ------- 14 (1 row) Time: 1794.600 ms The above function run time includes deleting the previous search results and creating 347 records in the temp table. I also find that it appears to sometimes run even faster depending on the search (due to variation in content length) and can be even less than half the above time: c_db=> select count(id) from sxa('Zelazny'); count ------- 13 (1 row) Time: 790.551 ms The inner join time run time variance of searches is greater than the total time for any search with the function. Total run time for any inner join search was always 209 to 211 seconds, as all of the content values appear to be searched. Use of a temp table reduced search time of 1.25m records from 3 1/2 minutes to less than two seconds and in some cases to less than one. In summary, the answer to how to overcome a bad decision by the postgresql planner appears to be move the subset of data into a temporary table and force postgresql to look in it instead. The version of postgresql I am using is 8.4.1, admittedly old. If there is a newer version of postgresql that has fixed this, please point me to it and I will see if I can upgrade to it, instead of crafting functions. Also, if someone could refresh my memory on how relational databases are supposed to work by default, I would appreciate it. I seem to have gotten confused somewhere. tim c_db=> \d t Table "public.t" Column | Type | Modifiers ---------+-----------------------+----------- id | character varying(30) | not null content | text | Indexes: "t_pkey" PRIMARY KEY, btree (id) CREATE OR REPLACE FUNCTION public.sxa(author character varying) RETURNS TABLE(id character varying) LANGUAGE sql AS $function$ delete from t; insert into t (select id, content from metadata inner join data on metadata.id = data.id where author like $1); select id from data where id in (select id from t where content like '%some text%'); $function$ ;
pgsql-general by date: