Re: Why is seq search preferred here by planner? - Mailing list pgsql-sql
Subject:        Re: Why is seq search preferred here by planner?
Msg-id:         1286.219.65.233.8.1051120485.squirrel@mail.trade-india.com
In response to: Re: Why is seq search preferred here by planner? (Josh Berkus <josh@agliodbs.com>)
Responses:      Re: Why is seq search preferred here by planner?
List:           pgsql-sql
> Mallah,
>
>> Hmm, I am not running postgresql on a CRAY :-)
>>
>> That was the time for "begin work;", since I am explain-analysing an
>> UPDATE and am therefore putting it in a transaction.
>>
>> The actual time was nearing 300 secs, which is 5 mins.
>>
>> I did an explain of the UPDATE ... FROM variant of the query; it has
>> the same plan.
>
> It's the same query. When you forget the FROM, postgres tries to insert
> it for you ... in fact, you generally get a notice like "Inserting
> missing FROM clause for table 'personal_account_details'".

Josh, when I get such notices the result of the update is usually wrong.
I do not get that notice for the SQL I posted.

> Am I reading your analyze right?

I think you may have missed parts of it.

> I'm never 100% sure with 7.2 ... you *are* updating 150,000 rows?

By 7.2 do you mean pgsql 7.2? No, I am running pgsql 7.3.2.
Yes, I did update 150,000 rows that time.

> If I'm right, then the query is using a seq scan because it's faster
> than an index scan for a large number of rows. You can always test this
> by running the query after a SET ENABLE_SEQSCAN = FALSE; generally the
> resulting query will take 5-10 times as long.

Yes, as explained in the docs (somewhere) and by many people, a seq scan
is faster when the majority of the rows satisfy the search criteria.

What bothers me is this: if pgsql has to look up personal_account_details
by userid (which is incidentally the primary key there), why would it not
use the unique index? Logically the query should be looking up the
country field from personal_account_details for every userid (row)
encountered in the email_bank table. Maybe I am wrong in how I understand
the explain output.

tradein_clients=# explain UPDATE email_bank
                  SET country = personal_account_details.country
                  FROM personal_account_details
                  WHERE email_bank.userid > 0
                    AND email_bank.userid = personal_account_details.userid
                    AND ( email_bank.country <> personal_account_details.country
                          OR email_bank.country IS NULL );
                                   QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Join  (cost=14497.06..37806.55 rows=186849 width=144)
   Hash Cond: ("outer".userid = "inner".userid)
   Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
   ->  Seq Scan on email_bank  (cost=0.00..16268.10 rows=216658 width=130)
         Filter: (userid > 0)
   ->  Hash  (cost=14113.45..14113.45 rows=153445 width=14)
         ->  Seq Scan on personal_account_details  (cost=0.00..14113.45 rows=153445 width=14)
(7 rows)

I am particularly bothered about the second seq scan. Also, sorry, I do
not understand "Hash"; are there any docs that explain it?
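For reference, here is roughly how I intend to test your ENABLE_SEQSCAN
suggestion. It is only a sketch against the two tables above, run inside
a transaction that gets rolled back so nothing is actually updated:

    begin work;
    -- discourage sequential scans for this session only, as an experiment
    set enable_seqscan to off;
    explain analyze
    UPDATE email_bank
       SET country = personal_account_details.country
      FROM personal_account_details
     WHERE email_bank.userid > 0
       AND email_bank.userid = personal_account_details.userid
       AND ( email_bank.country <> personal_account_details.country
             OR email_bank.country IS NULL );
    -- discard the test update and restore the default planner behaviour
    rollback;
    reset enable_seqscan;

My current reading of the plan, and please correct me if I am wrong, is
that the "Hash" step reads personal_account_details once and builds an
in-memory hash table keyed on userid, and every email_bank row is then
probed against that table; since both tables are visited in full anyway,
the planner sees no benefit in the unique index.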
> Your query delay time is updating 150,000 rows, period. That's a *lot*
> of disk activity, and it can only be improved by adjusting your
> postgresql.conf, the disk locations of your files, the speed of your
> disk array, and your I/O bandwidth.

My WAL logs (pg_xlog) are already on a dedicated disk [ but that's a
different topic ].

> How long does this take?
>
> UPDATE email_bank set country=personal_account_details.country
> FROM personal_account_details
> where userid > 0 and userid=personal_account_details.userid
> and email_bank.country <> personal_account_details.country
> or email_bank.country IS NULL;

It took very little time, because there were actually very few rows left
to update. I agree the "email_bank.country <>
personal_account_details.country or email_bank.country IS NULL" condition
was a neat thing to do, which I didn't in the last query :-( This time
most of the rows were already equal, because I updated them a little
while back with that 5-minute query ;-)

Anyway, could you please explain the second "seq scan" part? Here is the
EXPLAIN ANALYZE output:

tradein_clients=# begin work ; explain analyze UPDATE email_bank
                  SET country = personal_account_details.country
                  FROM personal_account_details
                  WHERE email_bank.userid > 0
                    AND email_bank.userid = personal_account_details.userid
                    AND ( email_bank.country <> personal_account_details.country
                          OR email_bank.country IS NULL );
BEGIN
Time: 720.04 ms
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=14497.06..37806.55 rows=186849 width=144) (actual time=5874.10..8754.98 rows=1033 loops=1)
   Hash Cond: ("outer".userid = "inner".userid)
   Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
   ->  Seq Scan on email_bank  (cost=0.00..16268.10 rows=216658 width=130) (actual time=0.83..1361.35 rows=156669 loops=1)
         Filter: (userid > 0)
   ->  Hash  (cost=14113.45..14113.45 rows=153445 width=14) (actual time=5855.59..5855.59 rows=0 loops=1)
         ->  Seq Scan on personal_account_details  (cost=0.00..14113.45 rows=153445 width=14) (actual time=0.04..5550.93 rows=153466 loops=1)
 Total runtime: 9370.74 msec
(8 rows)

One more thing: don't you think the parentheses I added were necessary
for the query to update the proper rows? Without them, the "or" part
could evaluate to true even for non-matching userids, resulting in
updates of unnecessary rows.
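And just to spell out why I think the parentheses matter: AND binds more
tightly than OR, so I read the two forms of the WHERE clause as below.
This is only a sketch using SELECT count(*), so the difference can be
checked without actually updating anything:

    -- Form without parentheses: the OR is applied last, so every
    -- email_bank row with a NULL country qualifies, even when the
    -- userids do not match at all (this count includes the cross
    -- product for those rows, so it can be large and slow).
    SELECT count(*)
      FROM email_bank, personal_account_details
     WHERE email_bank.userid > 0
       AND email_bank.userid = personal_account_details.userid
       AND email_bank.country <> personal_account_details.country
        OR email_bank.country IS NULL;

    -- Form with parentheses: the userid join condition always applies,
    -- and the OR only chooses between "country differs" and "country
    -- is NULL".
    SELECT count(*)
      FROM email_bank, personal_account_details
     WHERE email_bank.userid > 0
       AND email_bank.userid = personal_account_details.userid
       AND ( email_bank.country <> personal_account_details.country
             OR email_bank.country IS NULL );

If that reading is right, the first form is exactly the kind of
"unnecessary rows" update I was worried about.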