Thread: Many to many join seems slow?

Many to many join seems slow?

From: Drew Wilson
I'm trying to debug a query that gets all the French translations for
all US string values. Ultimately, my goal is to rank them all by edit
distance, and only pick the top N.

However, I cannot get the basic many-to-many join to return all the
results in less than 3 seconds, which seems slow to me. (My
competition is an in-memory Perl hash that runs on client machines
and provides results in around 3 seconds, after a 30-second startup time.)

The simplified schema is :
    source ->> translation_pair <<- translation

The keys are all sequence-generated OIDs. I do wonder whether
performance would be better if I used the string values themselves as
keys, to get better data distribution. Would that help?

There are 159283 rows in source
There are 1723935 rows in translation, of which 159686 are French
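
For reference, here is a minimal sketch of the schema as described
above. Everything beyond the table names, the join keys, and the two
value columns is illustrative rather than my actual DDL, and the two
non-PK index definitions are guesses based on the index names that
show up in the plan below:

    CREATE TABLE language (
        language_id  serial PRIMARY KEY,
        name         text UNIQUE              -- e.g. 'French'
    );

    CREATE TABLE source (
        source_id    serial PRIMARY KEY,
        value        text                     -- the US string value
    );

    CREATE TABLE translation (
        translation_id  serial PRIMARY KEY,
        language_id     integer REFERENCES language,
        value           text                  -- the translated string
    );

    -- the many-to-many join table
    CREATE TABLE translation_pair (
        source_id       integer REFERENCES source,
        translation_id  integer REFERENCES translation,
        PRIMARY KEY (source_id, translation_id)
    );

    -- guessed from the index names visible in the EXPLAIN output
    CREATE INDEX translation_language_l_key ON translation (language_id);
    CREATE INDEX translation_pair_translation_id ON translation_pair (translation_id);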

=# explain SELECT s.source_id, s.value AS sourceValue, t.value AS translationValue
       FROM
           source s,
           translation_pair tp,
           translation t,
           language l
       WHERE
           s.source_id = tp.source_id
           AND tp.translation_id = t.translation_id
           AND t.language_id = l.language_id
           AND l.name = 'French' ;

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=524224.49..732216.29 rows=92447 width=97)
   Merge Cond: (tp.source_id = s.source_id)
   ->  Sort  (cost=524224.49..524455.60 rows=92447 width=55)
         Sort Key: tp.source_id
         ->  Nested Loop  (cost=1794.69..516599.30 rows=92447 width=55)
               ->  Nested Loop  (cost=1794.69..27087.87 rows=86197 width=55)
                     ->  Index Scan using language_name_key on "language" l  (cost=0.00..8.27 rows=1 width=4)
                           Index Cond: ((name)::text = 'French'::text)
                     ->  Bitmap Heap Scan on translation t  (cost=1794.69..25882.43 rows=95774 width=59)
                           Recheck Cond: (t.language_id = l.language_id)
                           ->  Bitmap Index Scan on translation_language_l_key  (cost=0.00..1770.74 rows=95774 width=0)
                                 Index Cond: (t.language_id = l.language_id)
               ->  Index Scan using translation_pair_translation_id on translation_pair tp  (cost=0.00..5.67 rows=1 width=8)
                     Index Cond: (tp.translation_id = t.translation_id)
   ->  Index Scan using source_pkey on source s  (cost=0.00..206227.65 rows=159283 width=46)
(15 rows)

I'm running Postgres 8.2.3 on the latest Mac OS X 10.4.x. The CPU is a
3GHz dual-core Intel Xeon, with 5GB of RAM. The drive is very fast,
although I don't know the configuration (I think it's an XRaid with
three SAS/SCSI 70GB Seagate drives).

The usual performance-related configuration values are:
work_mem           32MB
shared_buffers     32MB
max_fsm_pages      204800
max_fsm_relations  1000
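
For context, the eventual ranking step will look roughly like the
sketch below. It assumes the contrib fuzzystrmatch module is installed
for levenshtein(); the input string and the LIMIT are placeholders:

    -- rank the French translations by edit distance to an input string
    -- and keep only the closest N (sketch only)
    SELECT s.source_id,
           s.value AS sourceValue,
           t.value AS translationValue,
           levenshtein(s.value, 'some input string') AS distance
    FROM source s,
         translation_pair tp,
         translation t,
         language l
    WHERE s.source_id = tp.source_id
      AND tp.translation_id = t.translation_id
      AND t.language_id = l.language_id
      AND l.name = 'French'
    ORDER BY distance
    LIMIT 10;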


Thanks for any advice,

Drew

Re: Many to many join seems slow?

From: Alvaro Herrera
Drew Wilson wrote:

> =# explain SELECT s.source_id, s.value AS sourceValue, t.value AS
> translationValue
>       FROM
>           source s,
>           translation_pair tp,
>           translation t,
>           language l
>       WHERE
>           s.source_id = tp.source_id
>           AND tp.translation_id = t.translation_id
>           AND t.language_id = l.language_id
>           AND l.name = 'French' ;

Please provide an EXPLAIN ANALYZE of the query.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Many to many join seems slow?

From: Drew Wilson
> Please provide an EXPLAIN ANALYZE of the query.

Oops, sorry about that.

=# EXPLAIN ANALYZE SELECT s.source_id, s.value as sourceValue, t.value as translationValue
-#     FROM
-#         source s,
-#     translation_pair tp,
-#     translation t,
-#     language l
-#     WHERE
-#     s.source_id = tp.source_id
-#     AND tp.translation_id = t.translation_id
-#     AND t.language_id = l.language_id
-#     AND l.name = 'French' ;

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=524224.49..732216.29 rows=92447 width=97) (actual time=1088.871..1351.840 rows=170759 loops=1)
   Merge Cond: (tp.source_id = s.source_id)
   ->  Sort  (cost=524224.49..524455.60 rows=92447 width=55) (actual time=1088.774..1113.483 rows=170759 loops=1)
         Sort Key: tp.source_id
         ->  Nested Loop  (cost=1794.69..516599.30 rows=92447 width=55) (actual time=23.252..929.847 rows=170759 loops=1)
               ->  Nested Loop  (cost=1794.69..27087.87 rows=86197 width=55) (actual time=23.194..132.139 rows=159686 loops=1)
                     ->  Index Scan using language_name_key on "language" l  (cost=0.00..8.27 rows=1 width=4) (actual time=0.030..0.031 rows=1 loops=1)
                           Index Cond: ((name)::text = 'French'::text)
                     ->  Bitmap Heap Scan on translation t  (cost=1794.69..25882.43 rows=95774 width=59) (actual time=23.155..95.227 rows=159686 loops=1)
                           Recheck Cond: (t.language_id = l.language_id)
                           ->  Bitmap Index Scan on translation_language_l_key  (cost=0.00..1770.74 rows=95774 width=0) (actual time=22.329..22.329 rows=159686 loops=1)
                                 Index Cond: (t.language_id = l.language_id)
               ->  Index Scan using translation_pair_translation_id on translation_pair tp  (cost=0.00..5.67 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=159686)
                     Index Cond: (tp.translation_id = t.translation_id)
   ->  Index Scan using source_pkey on source s  (cost=0.00..206227.65 rows=159283 width=46) (actual time=0.086..110.564 rows=186176 loops=1)
 Total runtime: 1366.757 ms
(16 rows)

On May 15, 2007, at 7:05 AM, Alvaro Herrera wrote:

> Drew Wilson wrote:
>
>> =# explain SELECT s.source_id, s.value AS sourceValue, t.value AS
>> translationValue
>>       FROM
>>           source s,
>>           translation_pair tp,
>>           translation t,
>>           language l
>>       WHERE
>>           s.source_id = tp.source_id
>>           AND tp.translation_id = t.translation_id
>>           AND t.language_id = l.language_id
>>           AND l.name = 'French' ;
>
> Please provide an EXPLAIN ANALYZE of the query.
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.


Re: Many to many join seems slow?

From: Heikki Linnakangas
Drew Wilson wrote:
> Merge Join  (cost=524224.49..732216.29 rows=92447 width=97) (actual time=1088.871..1351.840 rows=170759 loops=1)
> ...
> Total runtime: 1366.757 ms

It looks like the query itself actually runs in well under 3 seconds;
the rest of the time is spent fetching the 170759 result rows to the client.
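
One way to confirm that, for example, is to wrap the query so that no
rows are shipped to the client; with \timing on, something like this
should run in roughly the EXPLAIN ANALYZE time:

    SELECT count(*)
    FROM (SELECT s.source_id, s.value AS sourceValue, t.value AS translationValue
          FROM source s, translation_pair tp, translation t, language l
          WHERE s.source_id = tp.source_id
            AND tp.translation_id = t.translation_id
            AND t.language_id = l.language_id
            AND l.name = 'French') AS sub;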

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Many to many join seems slow?

From: Daniel Cristian Cruz
2007/5/15, Drew Wilson <drewmwilson@gmail.com>:
> =# explain SELECT s.source_id, s.value AS sourceValue, t.value AS
> translationValue
>        FROM
>            source s,
>            translation_pair tp,
>            translation t,
>            language l
>        WHERE
>            s.source_id = tp.source_id
>            AND tp.translation_id = t.translation_id
>            AND t.language_id = l.language_id
>            AND l.name = 'French' ;
>
>                                                           QUERY PLAN
> --------------------------------------------------------------------------------
> Merge Join  (cost=524224.49..732216.29 rows=92447 width=97)

This way you get all word matches for the French language. Shouldn't
it be all matches for a specific word (s.value = 'word' in WHERE)?
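
For example (with 'word' as a placeholder for the string being looked up):

    SELECT s.source_id, s.value AS sourceValue, t.value AS translationValue
    FROM source s, translation_pair tp, translation t, language l
    WHERE s.source_id = tp.source_id
      AND tp.translation_id = t.translation_id
      AND t.language_id = l.language_id
      AND l.name = 'French'
      AND s.value = 'word';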

--
Daniel Cristian Cruz

Re: Many to many join seems slow?

From: Drew Wilson
Yes, I'll be filtering by string value. However, I just wanted to see
how long it takes to scan all translations in a particular language.

Drew

On May 15, 2007, at 9:00 AM, Daniel Cristian Cruz wrote:

> 2007/5/15, Drew Wilson <drewmwilson@gmail.com>:
>> =# explain SELECT s.source_id, s.value AS sourceValue, t.value AS
>> translationValue
>>        FROM
>>            source s,
>>            translation_pair tp,
>>            translation t,
>>            language l
>>        WHERE
>>            s.source_id = tp.source_id
>>            AND tp.translation_id = t.translation_id
>>            AND t.language_id = l.language_id
>>            AND l.name = 'French' ;
>>
>>                                                           QUERY PLAN
>> --------------------------------------------------------------------------------
>> Merge Join  (cost=524224.49..732216.29 rows=92447 width=97)
>
> This way you get all word matches for the French language. Shouldn't
> it be all matches for a specific word (s.value = 'word' in WHERE)?
>
> --
> Daniel Cristian Cruz


Re: Many to many join seems slow?

From: Drew Wilson
You're right. If I redirect output to /dev/null, the query completes
in 1.4s.

# \o /dev/null
# SELECT s.source_id, s.value as sourceValue, t.value as
translationValue...
...
Time: 1409.557 ms
#

That'll do for now.

Thanks,

Drew

On May 15, 2007, at 7:17 AM, Heikki Linnakangas wrote:

> Drew Wilson wrote:
>> Merge Join  (cost=524224.49..732216.29 rows=92447 width=97) (actual time=1088.871..1351.840 rows=170759 loops=1)
>> ...
>> Total runtime: 1366.757 ms
>
> It looks like the query itself actually runs in well under 3 seconds;
> the rest of the time is spent fetching the 170759 result rows to the
> client.
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com