Re: Most efficient way of querying M 'related' tables where N out of M may contain the key - Mailing list pgsql-performance

From David G. Johnston
Subject Re: Most efficient way of querying M 'related' tables where N out of M may contain the key
Date
Msg-id CAKFQuwZhfAGJrhcQ0F-F9VHqyX_CAaSku5Dk8jm+YWh9hWWYDA@mail.gmail.com
Whole thread Raw
In response to Re: Most efficient way of querying M 'related' tables where N out of M may contain the key  (Stephane Bailliez <sbailliez@gmail.com>)
List pgsql-performance
On Fri, Aug 21, 2015 at 8:07 AM, Stephane Bailliez <sbailliez@gmail.com> wrote:

On Thu, Aug 20, 2015 at 8:19 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

​SELECT [...]
FROM (SELECT reference_id, [...] FROM table_where_referenced_id_is_a_pk WHERE reference_id EXISTS/IN/JOIN)​
 
​src
​LEFT JOIN type1 USING (reference_id)
LEFT JOIN type2 USING (reference_id)
[...]


​Place ^ in a CTE named (find_all)​


there are no tables where reference_id is a pk, I could create one or do : select reference_id from ( values (..), (...), (...) .... )

the tricky part with the join (and where I was not clear about it in my original description) is that a reference_id  can match in multiple tables (eg. it can be a fk in type1 and type2), so it then becomes a bit harder to collect all the common attributes and 'types' when doing joins like this.

For example let's assume there is a group_id to be be retrieved among all tables as a common attribute:

if reference_id was existing only in one table, I could do coalesce(type1.group_id, ... type5.group_id) as group_id in the main select
however that would not work in this case.


​WITH find_all (reference_id, type_identifier, type_id) AS ( ... )
SELECT ​type_identifier, array_agg(reference_id), array_agg(type_id)
FROM find_all
WHERE type_identifier IS NOT NULL
GROUP BY type_identifier

​find_all will return at least one row, possibly empty if no matches are present, and will return multiple rows if more than one matches.  You can use array_agg as shown, or play around with custom composite types, ​or even build a JSON document.

David J.
 

pgsql-performance by date:

Previous
From: Stephane Bailliez
Date:
Subject: Re: Most efficient way of querying M 'related' tables where N out of M may contain the key
Next
From: Genc, Ömer
Date:
Subject: Performance bottleneck due to array manipulation