Thread: Performance Killer 'IN' ?
Hossa, I just made a little test on our test-database. I have an excel sheet with about 2000 entries that should be updated with the same value. In a first try I generated an SQL-Syntax for every entry like: UPDATE xyz SET status=-6 WHERE id=xyz1 AND status>-1; UPDATE xyz SET status=-6 WHERE id=xyz2 AND status>-1; UPDATE xyz SET status=-6 WHERE id=... AND status>-1; The execution of the ~2000 SQL-commands took about 5-10 seconds. Then I tried the same with generating only one request using IN with the twothousand entries like: UPDATE xyz WHERE id IN (xyz1, xyz2, ....) AND status>-1; and it took about 10 Minutes to execute. So it is nearly a hundred times slower. Can this be verified? Is there anything that can be done about that? Else I would need to have a few words with our programmers... By the way the testsystem is a basic Suse 9.3 with a default postgres installation 8.0.x Thanks and *greets* Kai
> and it took about 10 Minutes to execute. So it is nearly a hundred times > slower. Can this be verified? Is there anything that can be done about > that? Else I would need to have a few words with our programmers... OK... I overstated a little bit. The real numbers are: 1.) 21.5 seconds 2.) 363.7 seconds But it is still a significant difference.
Kai Hessing wrote: >> and it took about 10 Minutes to execute. So it is nearly a hundred times >> slower. Can this be verified? Is there anything that can be done about >> that? Else I would need to have a few words with our programmers... > > OK... I overstated a little bit. The real numbers are: > > 1.) 21.5 seconds > 2.) 363.7 seconds > > But it is still a significant difference. Can you provide an explain analyze of each query? > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Joshua D. Drake wrote: > Kai Hessing wrote: >> 1.) 21.5 seconds >> 2.) 363.7 seconds >> >> But it is still a significant difference. > > Can you provide an explain analyze of each query? There isn't an explain analyze of the first variant, because there are just 2000 SQL-Updates. The explain analyze for just one of the commands (UPDATE xyz SET status=-6 WHERE phon='xyz1' AND status>-1;) is: ------------------ Index Scan using phon_phon_idx on phon (cost=0.00..5193.83 rows=530 width=148) (actual time=0.146..0.146 rows=0 loops=1) Index Cond: ((phon)::text = 'xyz'::text) Filter: (status > -1) Total runtime: 0.387 ms 1 Datensätze (means data sets) Laufzeit gesamt: 16.682 ms (means running time) ------------------ The second one (UPDATE xyz WHERE id IN (xyz1, xyz2, ....) AND status>-1;) returns: ------------------ Seq Scan on phon (cost=0.00..1573304.58 rows=105931 width=148) (actual time=369563.565..369563.565 rows=0 loops=1) Filter: ((((phon)::text = 'xyz1) OR ((phon)::text = 'xyz2'::text) OR ((phon)::text = 'xyz3'::text) OR ((phon)::text = 'xyz4'::text) OR ... [all the 2000 entries from the IN clause] Total runtime: 369566.954 ms 667 Datensätze (means data sets) Laufzeit gesamt: 370,179.246 ms (means running time) ------------------ Hope, that helps. Have a nice weekend. Being back to work on monday ;) *greets* Kai
On 3/31/06, Kai Hessing <kai.hessing@hobsons.de> wrote: > The second one (UPDATE xyz WHERE id IN (xyz1, xyz2, ....) AND > status>-1;) returns: > ------------------ > Seq Scan on phon (cost=0.00..1573304.58 rows=105931 width=148) (actual > time=369563.565..369563.565 rows=0 loops=1) Just a shot in the dark: does the plan stay the same, when you remove the ' AND status > -1' ? -- marko
Kai Hessing <kai.hessing@hobsons.de> writes: > Index Scan using phon_phon_idx on phon (cost=0.00..5193.83 rows=530 > width=148) (actual time=0.146..0.146 rows=0 loops=1) > ... > Seq Scan on phon (cost=0.00..1573304.58 rows=105931 width=148) (actual > time=369563.565..369563.565 rows=0 loops=1) You need to look into the discrepancy between estimated and actual row counts. (I suppose the reason you're showing 0 rows here is that you already did these UPDATEs and so none of the rows in question pass the status filter --- but how many rows are there matching the phon index conditions?) Perhaps a larger statistics target for the phon column would be a good idea. regards, tom lane
Marko Kreen wrote: > On 3/31/06, Kai Hessing <kai.hessing@hobsons.de> wrote: >> The second one (UPDATE xyz WHERE id IN (xyz1, xyz2, ....) AND >> status>-1;) returns: >> ------------------ >> Seq Scan on phon (cost=0.00..1573304.58 rows=105931 width=148) (actual >> time=369563.565..369563.565 rows=0 loops=1) > > Just a shot in the dark: does the plan stay the same, > when you remove the ' AND status > -1' ? No difference: I skipped the 'AND status > -1' and have the following results... Using 2000x 'UPDATE phon SET status=-6 WHERE ' returns each time: ---------------------------- Index Scan using phon_phon_idx on phon (cost=0.00..5179.80 rows=1587 width=148) (actual time=31.452..31.470 rows=1 loops=1) Index Cond: ((phon)::text = 'wink4103@uni-trier.de'::text) Total runtime:3.414 ms ---------------------------- (Total runtime for all 2000 Updates: 23335.393 ms Using the 'UPDATE xyz WHERE id IN ('xyz1', 'xyz2', other 2000 values.....)' returns: ---------------------------- Seq Scan on phon (cost=0.00..1564960.67 rows=317227 width=148) (actual time=68.315..365621.761 rows=2522 loops=1) Filter: (((phon)::text = '..... Total runtime: 393182.745 ms ----------------------------
Tom Lane wrote: > Kai Hessing <kai.hessing@hobsons.de> writes: >> Index Scan using phon_phon_idx on phon (cost=0.00..5193.83 rows=530 >> width=148) (actual time=0.146..0.146 rows=0 loops=1) >> ... >> Seq Scan on phon (cost=0.00..1573304.58 rows=105931 width=148) (actual >> time=369563.565..369563.565 rows=0 loops=1) > > You need to look into the discrepancy between estimated and actual row > counts. (I suppose the reason you're showing 0 rows here is that you > already did these UPDATEs and so none of the rows in question pass the > status filter --- but how many rows are there matching the phon index > conditions?) Perhaps a larger statistics target for the phon column > would be a good idea. Yes... The 0 rows are there because I did the command before. Now I resetted the test database to a previous state and dopped the 'AND status>-1' in the SQL-syntax: Using the 'UPDATE xyz WHERE id IN ('xyz1', 'xyz2', other 2000 values.....)' returns: ---------------------------- Seq Scan on phon (cost=0.00..1564960.67 rows=317227 width=148) (actual time=68.315..365621.761 rows=2522 loops=1) Filter: (((phon)::text = '..... Total runtime: 393182.745 ms ---------------------------- (please see also the other post <49ck9sFo32mbU1@individual.net> ) What do you mean with larger statistics target?
On 4/3/06, Kai Hessing <kai.hessing@hobsons.de> wrote: > Marko Kreen wrote: > > Just a shot in the dark: does the plan stay the same, > > when you remove the ' AND status > -1' ? > > No difference: I skipped the 'AND status > -1' and have the following > results... Ok. Thanks. I once had similar query and it seemed that postgres got confused if there was big IN and alse something else in WHERE clause. Dropping the rest got postgres to use index for IN. But I did not have time to research it then and with your result seems it's not a problem in recent versions. -- marko
Kai Hessing <kai.hessing@hobsons.de> writes: > Yes... The 0 rows are there because I did the command before. Now I > resetted the test database to a previous state and dopped the 'AND > status>-1' in the SQL-syntax: > Using the 'UPDATE xyz WHERE id IN ('xyz1', 'xyz2', other 2000 > values.....)' returns: > ---------------------------- > Seq Scan on phon (cost=0.00..1564960.67 rows=317227 width=148) (actual > time=68.315..365621.761 rows=2522 loops=1) > Filter: (((phon)::text = '..... Well, here's our problem it would seem: the planner is estimating the IN clause to match 317227 rows, rather than the actual 2522. That's naturally going to bias it against choosing an indexscan. You need to get that estimate closer before there's going to be much chance of choosing the right plan. > What do you mean with larger statistics target? See ALTER TABLE SET STATISTICS, or just change default_statistics_target and re-ANALYZE. regards, tom lane
Tom Lane wrote: > Well, here's our problem it would seem: the planner is estimating the IN > clause to match 317227 rows, rather than the actual 2522. That's > naturally going to bias it against choosing an indexscan. You need to > get that estimate closer before there's going to be much chance of > choosing the right plan. > >> What do you mean with larger statistics target? > > See ALTER TABLE SET STATISTICS, or just change default_statistics_target > and re-ANALYZE. Thanks, that definitly looks like a starting point. I will test it and post my results. Btw. what happens if the estimation would be to low?
Kai Hessing wrote: >>> What do you mean with larger statistics target? >> >> See ALTER TABLE SET STATISTICS, or just change default_statistics_target >> and re-ANALYZE. > > Thanks, that definitly looks like a starting point. I will test it and > post my results. Btw. what happens if the estimation would be to low? unbelievable... I found the problem: Doing a simple ANALYZE after resetting the test-db to a previous state is my friend. The results now are: --------- Index Scan using phon_phon_idx, phon_phon_idx, phon_phon_idx, ..... (cost=0.00..56793.09 rows=4068 width=53) (actual time=0.155..363.810 rows=2534 loops=1) Index Cond: (((phon)::text = 'xyz'::text) OR ..... Filter: (status > -1) Total runtime: 11140.480 ms --------- Thank you very much, this was the right hint!