Re: Why is seq search preferred here by planner? - Mailing list pgsql-sql

From
Subject Re: Why is seq search preferred here by planner?
Date
Msg-id 1286.219.65.233.8.1051120485.squirrel@mail.trade-india.com
Whole thread Raw
In response to Re: Why is seq search preferred here by planner?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Why is seq search preferred here by planner?
List pgsql-sql
> Mallah,
>
>> Hmm i am not running postgresql on a CRAY  :-)
>>
>>  that was the time for "begin work;"  since
>> i am explain analysing an update i am putting it in transaction.
>>
>> the actualt time was nearing 300 secs which is 5 mins
>>
>> i did an explain of the UPDATE FROM variant of the query it has the same plan for it.
>
> It's the same query.  When you forget the FROM, postgres tries to insert it  for you ... in
> fact, you generally get a notice like "Inserting missing FROM  clause for table
> 'personal_account_details'".

Josh when i get that such  Notices the result of update is usually worng.
I do not get that notice for the SQL i posted.

>
> Am I reading your anaylze right?
i think u may have missed parts of it.
 I'm never 100% sure with 7.2 ... you *are*  updating 150,000
> rows?

7.2 u mean ,  pgsql 7.2 ?  No i am running pgsql 7.3.2

yes i did update 1,50,000 rows that time.

>
> If I'm right, then the query is using a seq scan because it's faster than an  index scan for a
> large number of rows.   You can always test this by runninng  the query after an SET
> ENABLE_SEQSCAN = FALSE; generally the resulting query  will take 5-10 times as long.


Yes as explained in docs (somewhere ) and by many ppl seq scan is faster when
majority of  the rows are stisfy the search  criteria.

my botheration was that if pgsql were to search the personal_account_details with userid
(which is incidently the pkey there) why would it not use the uniq  index  ?
logically the query shud be looking the coutry feild from the personal_account_details
for every userid (row) encoutered in email_bank table.

may be i am wrong here in correctly under standing the  explain output.

tradein_clients=# explain UPDATE email_bank set country=personal_account_details.country FROM
personal_account_details where email_bank.userid > 0 and
email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL );                                          QUERY PLAN
----------------------------------------------------------------------------------------------Hash Join
(cost=14497.06..37806.55rows=186849 width=144)  Hash Cond: ("outer".userid = "inner".userid)  Join Filter:
(("outer".country<> "inner".country) OR ("outer".country IS NULL))  ->  Seq Scan on email_bank  (cost=0.00..16268.10
rows=216658width=130)        Filter: (userid > 0)  ->  Hash  (cost=14113.45..14113.45 rows=153445 width=14)        ->
SeqScan on personal_account_details  (cost=0.00..14113.45 rows=153445 width=14) 
(7 rows)

I am bothered abt the second seq scan , sorry i do not understand "Hash" is there any docs that
explain me that ?

>
> Your query delay time is updating 150,000 rows, period.   That's a *lot* of  disk activity, and
> it can only be improved by adjusting your postgresql.conf,  the disk locations of your files,
> the speed of your disk array, and your I/O  bandwidth.

my WAL logs (pg_xlog) are already in a dedicated disk [ buts thats a different topic ]

>
> How long does this take?
>
> UPDATE email_bank set country=personal_account_details.country
> FROM personal_account_details
> where userid > 0 and userid=personal_account_details.userid
>     and email_bank.country <> personal_account_details.country
>     or email_bank.country IS NULL;

Yes It took very less , because there were actually very less rows to update .
I agree the  "email_bank.country <> personal_account_details.country     or email_bank.country IS NULL"
was a neat thing to do in last query which i didnt'   :-(

this time since most of the rows were already equal becoz i updated them a little while
back with that 5 mins query ;-)

anyway could u plez explain the second "seq scan part" ?



HERE WAS THE EXPLIAN ANALYZE OUTPUT:

hey dont you think the parentheis i added were necessary for the query
to update proper rows ? becoz the  "or" part could have evaluated  to true even for
no matching userids resulting in update of unecessary rows.


tradein_clients=# begin work ; explain analyze UPDATE email_bank set
country=personal_account_details.country FROM personal_account_details where email_bank.userid > 0
and email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL );BEGIN
Time: 720.04 ms                                                                 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
HashJoin  (cost=14497.06..37806.55 rows=186849 width=144) (actual time=5874.10..8754.98rows=1033 loops=1)   Hash Cond:
("outer".userid= "inner".userid)  Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))  ->
SeqScan on email_bank  (cost=0.00..16268.10 rows=216658 width=130) (actual  time=0.83..1361.35 rows=156669 loops=1)
   Filter: (userid > 0)  ->  Hash  (cost=14113.45..14113.45 rows=153445 width=14) (actual time=5855.59..5855.59 rows=0
loops=1)        ->  Seq Scan on personal_account_details  (cost=0.00..14113.45 rows=153445 width=14)        (actual
time=0.04..5550.93rows=153466 loops=1) Total runtime: 9370.74 msec 
(8 rows)






>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
>
> ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9'
> the postmaster



-----------------------------------------
Get your free web based email at trade-india.com.  "India's Leading B2B eMarketplace.!"
http://www.trade-india.com/



pgsql-sql by date:

Previous
From: Rod Taylor
Date:
Subject: Re: Optomizing left outer joins
Next
From: Richard Huxton
Date:
Subject: Re: OUTER JOIN