Thread: Joins and full index scans...mysql vs postgres?
I am issuing a query like this:

SELECT *
  FROM users users
  LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
  LEFT JOIN useraux ON useraux.uid = users.uid;

The joins are all on the PKs of the tables. It takes 1000ms to run on
Postgres; the identical query runs in 230ms on MySQL. The problem seems
to stem from Postgres's insistence on doing three full index scans,
where MySQL does one and joins 1:1 against the results of the first. I
have switched the joins to inner joins and the difference is
negligible. Here are the explains on both Postgres and MySQL. Is there
a way to optimize this basic query for Postgres that I am missing?

Postgres explain:

Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522)
  Merge Cond: ("outer".uid = "inner".uid)
  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
        Merge Cond: ("outer".uid = "inner".user_id)
        ->  Index Scan using users_pkey on users  (cost=0.00..763.81 rows=6528 width=100)
        ->  Index Scan using phorum_users_base_pkey on phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168)
  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40 rows=7582 width=262)

MySQL explain:

id, select_type, table,               type,     possible_keys, key,       key_len, ref,              rows, extra
1,  'PRIMARY',   'USERS',             'ALL',    '',            '',        '',      '',               6528, ''
1,  'PRIMARY',   'phorum_users_base', 'eq_ref', 'PRIMARY',     'PRIMARY', '4',     'wh2o.USERS.UID', 1,    ''
1,  'PRIMARY',   'useraux',           'eq_ref', 'PRIMARY',     'PRIMARY', '4',     'wh2o.USERS.UID', 1,    ''
On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> Postgres Explain

We need to see EXPLAIN ANALYZE results here.

What's your work_mem set to?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
Does this work:

Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual time=0.057..123.659 rows=6528 loops=1)
  Merge Cond: ("outer".uid = "inner".uid)
  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264) (actual time=0.030..58.876 rows=6528 loops=1)
        Merge Cond: ("outer".uid = "inner".user_id)
        ->  Index Scan using users_pkey on users  (cost=0.00..763.81 rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)
        ->  Index Scan using phorum_users_base_pkey on phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual time=0.007..15.674 rows=9845 loops=1)
  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40 rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)
Total runtime: 127.442 ms

> On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> > Postgres Explain
>
> We need to see EXPLAIN ANALYZE results here.
>
> What's your work_mem set to?
>
> /* Steinar */
work_mem is set to the default; increasing it decreases performance.

> Does this work:
> [snip EXPLAIN ANALYZE output]
> "Total runtime: 127.442 ms"
On Wed, 22 Feb 2006, ryan groth wrote:

> Does this work:
> [snip EXPLAIN ANALYZE output]
> "Total runtime: 127.442 ms"

Well, this implies the query took about 127 ms on the server side. Where
did the 1000 ms number come from (was that on a client, and if so, what
type)?
Hmm, it came from the timer on the pgAdmin III SQL query tool. I guess
the 1,000ms includes the round trip? The weird thing is that the MySQL
server is running a default configuration on a virtual machine
(P3/1.3GHz, configured for 128MB RAM) over a 100Mbit ethernet
connection, while Postgres is running on a real P4/3.0GHz with 4GB,
accessed over localhost. Timings from the MySQL query tool indicate
that the 6.5k-record query runs in "1.3346s (.3361s)", vs. the pgAdmin
query tool saying the query runs in "997+3522 ms". Am I reading these
numbers wrong? Are these numbers reflective of application performance?
Is there an optimization I am missing?

Ryan

> Well, this implies the query took about 127 ms on the server side. Where
> did the 1000 ms number come from (was that on a client, and if so, what
> type)?
On Wed, 2006-02-22 at 12:11, ryan groth wrote:
> Does this work:
> [snip EXPLAIN ANALYZE output]
> "Total runtime: 127.442 ms"

In MySQL, have you tried writing a short Perl or PHP script, or even
timing the mysql client running in one-shot mode (I assume it can do
that) from the outside, to see how long it takes to actually run the
query AND retrieve the data? My guess is that most of the time for both
queries will be taken up in delivering the data.
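For instance, something like this from a shell (a rough sketch only:
"vmhost" is a placeholder, "wh2o" is the schema name taken from the
MySQL explain, and it assumes passwordless access and the same database
name on both servers):

  # One-shot timing of the MySQL client; output is discarded, so this
  # measures query execution plus delivery of all rows to the client.
  time mysql -h vmhost wh2o -e "
    SELECT * FROM users
    LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
    LEFT JOIN useraux ON useraux.uid = users.uid" > /dev/null

  # The equivalent against Postgres over localhost:
  time psql wh2o -c "
    SELECT * FROM users
    LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
    LEFT JOIN useraux ON useraux.uid = users.uid" > /dev/null

Comparing the two wall-clock times against the 127 ms EXPLAIN ANALYZE
figure should show how much of the total is data delivery rather than
query execution.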
> "997+3522 ms". Am I reading these numbers wrong? Are these numbers > reflective of application performance? Is there an optimization I am > missing? It also reflects the time it takes to pgadmin to insert the results into its GUI... If you want to get an approximation of the time the server needs to process your request, without the data marshalling time on the network and anything, you can either use EXPLAIN ANALYZE (but mysql doesn't have it, and the instrumentation adds overhead), or simply something like "SELECT sum(1) FROM (query to benchmark)", which only returns 1 row, and the sum() overhead is minimal, and it works on most databases. I find it useful because in knowing which portion of the time is spent by the server processing the query, or in data transfer, or in data decoding on the client side, or simply in displaying...
ryan groth wrote:
> I am issuing a query like this:
> SELECT *
>   FROM users users
>   LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
>   LEFT JOIN useraux ON useraux.uid = users.uid;

I'm not sure if postgres would rewrite your query to do the joins
properly, though I guess someone else might've already suggested this :)

I'm probably wrong, but I read that as:

join users -> phorum_users_base (ON users.uid = phorum_users_base.user_id)
join phorum_users_base -> useraux (ON useraux.uid = users.uid)

which won't be indexable, because users.uid doesn't exist in
phorum_users_base. Try:

SELECT *
  FROM users users
  LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
  LEFT JOIN useraux ON useraux.uid = phorum_users_base.user_id;

or

SELECT *
  FROM users u, phorum_users_base pub, useraux ua
 WHERE u.uid = pub.user_id
   AND ua.uid = u.uid
   AND pub.user_id = ua.uid;

-- 
Postgresql & php tutorials
http://www.designmagick.com/
The pgAdmin query tool is known to give an answer about 5x the real
answer - don't believe it!

ryan groth wrote:
> Hmm, it came from the timer on the pgAdmin III SQL query tool. I guess
> the 1,000ms includes the round trip?
> [snip]
> the pgAdmin query tool saying the query runs in "997+3522 ms". Am I
> reading these numbers wrong? Are these numbers reflective of
> application performance? Is there an optimization I am missing?
Christopher Kings-Lynne wrote:
> The pgAdmin query tool is known to give an answer about 5x the real
> answer - don't believe it!

Everybody please forget the factor of 5 immediately. It's not a factor
at all, but GUI update time that is *added*, and it depends on
rows*columns.

> ryan groth wrote:
>> the pgAdmin query tool saying the query runs in "997+3522 ms".

That means 997ms until all data is at the client (libpq reports the
rowset); the rest is GUI overhead.

Regards,
Andreas