Re: Slow join over three tables - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Slow join over three tables
Date
Msg-id 792DDB09-CCAC-41EB-9127-0425E84F06B6@gmail.com
Whole thread Raw
In response to Re: Slow join over three tables  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Slow join over three tables
Re: Slow join over three tables
List pgsql-general
> On 27 Apr 2016, at 4:09, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 27 April 2016 at 11:27, Tim van der Linden <tim@shisaa.jp> wrote:

>> The query:
>>
>> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
>> FROM reports r
>> JOIN report_drugs d ON d.rid = r.id
>> JOIN report_adverses a ON a.rid = r.id
>> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain'])
>> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ORDER BY r.created;
>>

> Likely the only way to speed this up would be to create indexes;
>
> create index on reports (id, age, gender, created);

Since you're not using age and gender in this (particular) query until the rows are combined into a result set already,
itdoesn't make a whole lot of sense to add them to the index. Moreover, since your'e ordering by created, I'd at least
putthat directly after id: 

    create index on reports (id, created);

> create index on report_adverses (rid, adverse);

In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off
withthe inverse of above index: 
    create index on report_adverses (adverse, rid);
    create index on report_drugs (drug, rid);

Do these tables have a primary key and can that be used here?

I hope I'm not being superfluous here, but don't forget to vacuum analyze after creating those indexes. Eventually
autovacuumwill do that for you, but you could otherwise be running tests to verify the impact of adding those indexes
beforeautovacuum gets around to it. 

Finally, perhaps it's more efficient to weed out all unrelated drugs and adverses before relating them to reports,
somethinglike: 

SELECT r.id, r.age, r.gender, r.created, x.adverse, x.drug
FROM reports r
JOIN (
    SELECT d.rid, a.adverse, d.drug
    FROM report_drugs d
    JOIN report_adverses a ON a.rid = d.rid
    WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain'])
    AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363])
) x ON x.rid = r.id
ORDER BY r.created;

Looking at the cardinality of your tables that does seem a bit unlikely though. Still, worth a shot...

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



pgsql-general by date:

Previous
From: Tim van der Linden
Date:
Subject: Re: Slow join over three tables
Next
From: Tim van der Linden
Date:
Subject: Re: Slow join over three tables