Re: Differences in identical queries - Mailing list pgsql-general
From | Rob Schall |
---|---|
Subject | Re: Differences in identical queries |
Date | |
Msg-id | 45E868EC.3050101@callone.net Whole thread Raw |
In response to | Re: Differences in identical queries (Richard Huxton <dev@archonet.com>) |
List | pgsql-general |
The final answer wound up being.... ALTER TABLE current ALTER COLUMN orignum SET STATISTICS 100; and then an ANALYZE current; Then the queries started running like a champ (split seconds, not minutes). Thanks Richard! Rob Richard Huxton wrote: > Rob Schall wrote: >> When I reanalyzed the anitmp table with just the 4 entries (2 with >> istf=true and 2 with istf=false), both queries then ran the same >> way/time. >> So it would appear, if you want to do a join or a subselect (IN), then >> the number of items if will be comparing it to must be less than a >> certain number. > > I'm still not sure how you're concluding that. > > > In my case, the full amount of false's that fail is >> actually 40. So if you have a table of 2 million items (in current) and >> want to join it to a table of 40 items, then it must do the longer >> hashed join version and not the nested loop. >> >> Below are the results with the anitmp table with 42 items. 40 set as >> false, and 2 set as true. Is there a way to rewrite my query to have it >> run as quick as the others? > >> calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp >> ON current.orignum=anitmp.ani AND istf=false; >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------- >> >> Hash Join (cost=1.63..1031833.26 rows=256563 width=108) (actual >> time=1889.469..155380.749 rows=653 loops=1) > > OK - so here's the root of the problem. The planner thinks it'll get > back 256,563 rows but actually gets 653. If you actually got more than > 200,000 rows back then a seq-scan on current might well make sense. > >> Hash Cond: ("outer".orignum = "inner".ani) >> -> Seq Scan on current (cost=0.00..920527.00 rows=10873900 >> width=108) (actual time=670.402..136192.991 rows=10681150 loops=1) >> -> Hash (cost=1.52..1.52 rows=41 width=8) (actual time=0.187..0.187 >> rows=0 loops=1) >> -> Seq Scan on anitmp (cost=0.00..1.52 rows=41 width=8) >> (actual time=0.014..0.108 rows=40 loops=1) >> Filter: (istf = false) > > Hmm - what sort of distribution of values do you have in "orignum" - > telephone numbers, so presumably they're quite distinct. > > I'd be tempted to up the statistics on that column, reanalyse both > tables and see what happens. > ALTER TABLE current ALTER COLUMN orignum SET STATISTICS=100; > You can set values up to 1000, start at 100 and step up. Not checked > the syntax on that last statement btw. > > You can see the before and after effects by looking at: > SELECT * FROM pg_stats WHERE tablename='current' AND attname='orignum'; >
pgsql-general by date: