Thread: Joins and full index scans...mysql vs postgres?

Joins and full index scans...mysql vs postgres?

From
"ryan groth"
Date:
I am issing a query like this:
SELECT *
   FROM users users
   LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
   LEFT JOIN useraux ON useraux.uid = users.uid;

The joins are all on the PKs of the tables. It takes 1000ms to run on
postgres. The identical mysql version runs in 230ms. The problem seems
to stem from postgres's insistence to do three complete table scans,
where mysql does one and joins 1:1 against the results of the first. I
have switched the joins to inner joins and the difference is negligible.
Here are the explains on both postgres and mysql. Is there a way to
optimize this basic query for postgres that I am missing?

Postgres Explain

Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522)
Merge Cond: ("outer".uid = "inner".uid)
  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
        Merge Cond: ("outer".uid = "inner".user_id)
        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
rows=6528 width=100)
        ->  Index Scan using phorum_users_base_pkey on phorum_users_base
 (cost=0.00..822.92 rows=9902 width=1168)
  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
rows=7582 width=262)


MySQL Explain:

id,select_type,table,possible_keys,key,key_len,ref,rows,extra
1, 'PRIMARY', 'USERS', 'ALL', '', '', '', '', 6528, ''
1, 'PRIMARY', 'phorum_users_base', 'eq_ref', 'PRIMARY', 'PRIMARY', '4',
'wh2o.USERS.UID', 1, ''
1, 'PRIMARY', 'useraux', 'eq_ref', 'PRIMARY', 'PRIMARY', '4',
'wh2o.USERS.UID', 1, ''


Re: Joins and full index scans...mysql vs postgres?

From
"Steinar H. Gunderson"
Date:
On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> Postgres Explain

We need to see EXPLAIN ANALYZE results here.

What's your work_mem set to?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Joins and full index scans...mysql vs postgres?

From
"ryan groth"
Date:
Does this work:

"Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
time=0.057..123.659 rows=6528 loops=1)"
"  Merge Cond: ("outer".uid = "inner".uid)"
"  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
(actual time=0.030..58.876 rows=6528 loops=1)"
"        Merge Cond: ("outer".uid = "inner".user_id)"
"        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
"        ->  Index Scan using phorum_users_base_pkey on
phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
time=0.007..15.674 rows=9845 loops=1)"
"  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
"Total runtime: 127.442 ms"


> On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> > Postgres Explain
>
> We need to see EXPLAIN ANALYZE results here.
>
> What's your work_mem set to?
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>

--


> On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> > Postgres Explain
>
> We need to see EXPLAIN ANALYZE results here.
>
> What's your work_mem set to?
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>

--


Re: Joins and full index scans...mysql vs postgres?

From
"ryan groth"
Date:
workmem is set to the default, increasing it decreases performance.

> Does this work:
>
> "Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
> time=0.057..123.659 rows=6528 loops=1)"
> "  Merge Cond: ("outer".uid = "inner".uid)"
> "  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
> (actual time=0.030..58.876 rows=6528 loops=1)"
> "        Merge Cond: ("outer".uid = "inner".user_id)"
> "        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
> rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
> "        ->  Index Scan using phorum_users_base_pkey on
> phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
> time=0.007..15.674 rows=9845 loops=1)"
> "  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
> rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
> "Total runtime: 127.442 ms"
>
>
> > On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> > > Postgres Explain
> >
> > We need to see EXPLAIN ANALYZE results here.
> >
> > What's your work_mem set to?
> >
> > /* Steinar */
> > --
> > Homepage: http://www.sesse.net/
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faq
> >
> >
>
> --
>
>
> > On Wed, Feb 22, 2006 at 12:26:47PM -0500, ryan groth wrote:
> > > Postgres Explain
> >
> > We need to see EXPLAIN ANALYZE results here.
> >
> > What's your work_mem set to?
> >
> > /* Steinar */
> > --
> > Homepage: http://www.sesse.net/
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faq
> >
> >
>
> --
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>
>

--


Re: Joins and full index scans...mysql vs postgres?

From
Stephan Szabo
Date:
On Wed, 22 Feb 2006, ryan groth wrote:

> Does this work:
>
> "Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
> time=0.057..123.659 rows=6528 loops=1)"
> "  Merge Cond: ("outer".uid = "inner".uid)"
> "  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
> (actual time=0.030..58.876 rows=6528 loops=1)"
> "        Merge Cond: ("outer".uid = "inner".user_id)"
> "        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
> rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
> "        ->  Index Scan using phorum_users_base_pkey on
> phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
> time=0.007..15.674 rows=9845 loops=1)"
> "  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
> rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
> "Total runtime: 127.442 ms"

Well, this implies the query took about 127 ms on the server side. Where
did the 1000 ms number come from (was that on a client, and if so, what
type)?

Re: Joins and full index scans...mysql vs postgres?

From
"ryan groth"
Date:
Hmm, it came from the timer on the pgadmin III sql query tool. I guess
the 1,000ms includes the round-trip? See the wierd thing is that
mysqlserver is running default configuration on a virtual machine
(P3/1.3GHZ conf'd for 128mb ram) over a 100m/b ethernet connection.
Postgres is running on a real P4/3.0ghz 4GB running localhost. Timings
from the mysql query tool indicate that the 6.5k record query runs in
"1.3346s (.3361s)" vs. the pgadmin query tool saying that the query runs
"997+3522 ms". Am I reading these numbers wrong? Are these numbers
reflective of application performance? Is there an optimization I am
missing?

Ryan


> On Wed, 22 Feb 2006, ryan groth wrote:
>
> > Does this work:
> >
> > "Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
> > time=0.057..123.659 rows=6528 loops=1)"
> > "  Merge Cond: ("outer".uid = "inner".uid)"
> > "  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
> > (actual time=0.030..58.876 rows=6528 loops=1)"
> > "        Merge Cond: ("outer".uid = "inner".user_id)"
> > "        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
> > rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
> > "        ->  Index Scan using phorum_users_base_pkey on
> > phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
> > time=0.007..15.674 rows=9845 loops=1)"
> > "  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
> > rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
> > "Total runtime: 127.442 ms"
>
> Well, this implies the query took about 127 ms on the server side. Where
> did the 1000 ms number come from (was that on a client, and if so, what
> type)?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>

--


Re: Joins and full index scans...mysql vs postgres?

From
Scott Marlowe
Date:
On Wed, 2006-02-22 at 12:11, ryan groth wrote:
> Does this work:
>
> "Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
> time=0.057..123.659 rows=6528 loops=1)"
> "  Merge Cond: ("outer".uid = "inner".uid)"
> "  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
> (actual time=0.030..58.876 rows=6528 loops=1)"
> "        Merge Cond: ("outer".uid = "inner".user_id)"
> "        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
> rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
> "        ->  Index Scan using phorum_users_base_pkey on
> phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
> time=0.007..15.674 rows=9845 loops=1)"
> "  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
> rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
> "Total runtime: 127.442 ms"

In MySQL, have you tried writing a short perl or php script or even
timing the mysql client running in one shot mode (I assume it can do
that) from the outside to see how long it takes to actually run the
query AND retrieve the data?

My guess is most of the time for both queries will be taken in
delivering the data.

Re: Joins and full index scans...mysql vs postgres?

From
PFC
Date:
> "997+3522 ms". Am I reading these numbers wrong? Are these numbers
> reflective of application performance? Is there an optimization I am
> missing?

    It also reflects the time it takes to pgadmin to insert the results into
its GUI...

    If you want to get an approximation of the time the server needs to
process your request, without the data marshalling time on the network and
anything, you can either use EXPLAIN ANALYZE (but mysql doesn't have it,
and the instrumentation adds overhead), or simply something like "SELECT
sum(1) FROM (query to benchmark)", which only returns 1 row, and the sum()
overhead is minimal, and it works on most databases. I find it useful
because in knowing which portion of the time is spent by the server
processing the query, or in data transfer, or in data decoding on the
client side, or simply in displaying...

Re: Joins and full index scans...mysql vs postgres?

From
Chris
Date:
ryan groth wrote:
> I am issing a query like this:
> SELECT *
>    FROM users users
>    LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
>    LEFT JOIN useraux ON useraux.uid = users.uid;
>


I'm not sure if postgres would rewrite your query to do the joins
properly, though I guess someone else might've already suggested this :)


I'm probably wrong but I read that as:

join users -> phorum_users_base (ON users.uid = phorum_users_base.user_id)

join phorum_users_base -> useraux (ON useraux.uid = users.uid) which
won't be indexable because u.uid doesn't exist in phorum_users_base.



Try

SELECT *
FROM users users
LEFT JOIN phorum_users_base ON users.uid = phorum_users_base.user_id
LEFT JOIN useraux ON useraux.uid = phorum_users_base.user_id

or

SELECT *
FROM users u, phorum_users_base pub, useraux ua WHERE u.uid =
pub.user_id AND au.uid = u.uid AND pub.user_id=au.uid;


--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Joins and full index scans...mysql vs postgres?

From
Christopher Kings-Lynne
Date:
The pgAdmin query tool is known to give an answer about 5x the real
answer - don't believe it!

ryan groth wrote:
> Hmm, it came from the timer on the pgadmin III sql query tool. I guess
> the 1,000ms includes the round-trip? See the wierd thing is that
> mysqlserver is running default configuration on a virtual machine
> (P3/1.3GHZ conf'd for 128mb ram) over a 100m/b ethernet connection.
> Postgres is running on a real P4/3.0ghz 4GB running localhost. Timings
> from the mysql query tool indicate that the 6.5k record query runs in
> "1.3346s (.3361s)" vs. the pgadmin query tool saying that the query runs
> "997+3522 ms". Am I reading these numbers wrong? Are these numbers
> reflective of application performance? Is there an optimization I am
> missing?
>
> Ryan
>
>
>> On Wed, 22 Feb 2006, ryan groth wrote:
>>
>>> Does this work:
>>>
>>> "Merge Left Join  (cost=0.00..2656.36 rows=6528 width=1522) (actual
>>> time=0.057..123.659 rows=6528 loops=1)"
>>> "  Merge Cond: ("outer".uid = "inner".uid)"
>>> "  ->  Merge Left Join  (cost=0.00..1693.09 rows=6528 width=1264)
>>> (actual time=0.030..58.876 rows=6528 loops=1)"
>>> "        Merge Cond: ("outer".uid = "inner".user_id)"
>>> "        ->  Index Scan using users_pkey on users  (cost=0.00..763.81
>>> rows=6528 width=100) (actual time=0.016..9.446 rows=6528 loops=1)"
>>> "        ->  Index Scan using phorum_users_base_pkey on
>>> phorum_users_base  (cost=0.00..822.92 rows=9902 width=1168) (actual
>>> time=0.007..15.674 rows=9845 loops=1)"
>>> "  ->  Index Scan using useraux_pkey on useraux  (cost=0.00..846.40
>>> rows=7582 width=262) (actual time=0.007..11.935 rows=7529 loops=1)"
>>> "Total runtime: 127.442 ms"
>> Well, this implies the query took about 127 ms on the server side. Where
>> did the 1000 ms number come from (was that on a client, and if so, what
>> type)?
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>>
>


Re: Joins and full index scans...mysql vs postgres?

From
Andreas Pflug
Date:
Christopher Kings-Lynne wrote:
> The pgAdmin query tool is known to give an answer about 5x the real
> answer - don't believe it!

Everybody please forget immediately the factor 5. It's no factor at all,
but the GUI update time that is *added*, which depends on rows*columns.


> ryan groth wrote:
>
>> the pgadmin query tool saying that the query runs
>> "997+3522 ms".

Means 997ms until all data is at the client (libpq reports the rowset),
the rest is GUI overhead.

Regards,
Andreas