Thread: Most efficient way of querying M 'related' tables where N out of M may contain the key

Most efficient way of querying M 'related' tables where N out of M may contain the key

From

Stephane Bailliez

Date:

21 August 2015, 00:03:38

Pretty bad subject description... but let me try to explain.

I'm trying to figure out what would be the most efficient way to query data from multiple tables using a foreign key.

Right now the table design is such that I have multiple tables that share some common information, and some specific information. (in OO world we could see them as derived tables) For the sake of simplicity let's assume there are only 5,

table type1(int id, varchar(24) reference_id, ....specific columns)
table type2(int id, varchar(24) reference_id, ....specific columns)
table type3(int id, varchar(24) reference_id, ....specific columns)
table type4(int id, varchar(24) reference_id, ....specific columns)
table type5(int id, varchar(24) reference_id, ....specific columns)

NB: you could imagine those 5 tables inheriting from a base_type table that shares a few attributes.

I have a list of reference ids, those reference ids could be in any of those 5 tables but I don't know in which one.

I want to most efficiently retrieve the data on N out of 5 relevant tables but still want to query individually those 5 tables (for orm simplicity reason).

So the idea is first to identify which tables I should query for.

The naive implementation would be to always query those 5 tables for the list of reference ids, however 90% of the time the reference ids would only be stored in one single table though. So 4/5th of the queries would then be irrelevant.

what I initially came up with was doing a union of the tables such as:

SELECT 'type1', id FROM type5 WHERE reference_id IN (....)
UNION
SELECT 'type2', id FROM type4 WHERE reference_id IN (....)

UNION
...
SELECT 'type2', id FROM type3 WHERE reference_id IN (....)

then effectively figuring the list of which reference ids are in type1, type2, type3, ...etc..

and then issuing the right select to the tables for the related reference ids.

which means in best case scenario I would only do 2 queries instead of 5.
1 to retrieve the list of reference ids per 'type'
1 to retrieve the list of types with the corresponding reference ids

I'm trying to figure out if there is a more efficient way to retrieve this information than doing a union across all tables (there can be a couple hundreds reference ids to query for in the list)

I was thinking worse case scenario I could maintain this information in another table via triggers to avoid doing this union, but that seems a bit of a hammer solution initially and wondering if there is not something simpler via joins that could be more performant.

Thanks for any suggestions.

Re: Most efficient way of querying M 'related' tables where N out of M may contain the key

From

"David G. Johnston"

Date:

21 August 2015, 00:19:27

On Thu, Aug 20, 2015 at 8:03 PM, Stephane Bailliez <sbailliez@gmail.com> wrote:

Pretty bad subject description... but let me try to explain.

I'm trying to figure out what would be the most efficient way to query data from multiple tables using a foreign key.

SELECT [...]

FROM (SELECT reference_id, [...] FROM table_where_referenced_id_is_a_pk WHERE reference_id EXISTS/IN/JOIN)

src

LEFT JOIN type1 USING (reference_id)

LEFT JOIN type2 USING (reference_id)

[...]

Or consider whether PostgreSQL Inheritance would work - though basically its a friendly API over the "UNION ALL" query you proposed.

David J.

Re: Most efficient way of querying M 'related' tables where N out of M may contain the key

From

Stephane Bailliez

Date:

21 August 2015, 12:07:31

On Thu, Aug 20, 2015 at 8:19 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

SELECT [...]
FROM (SELECT reference_id, [...] FROM table_where_referenced_id_is_a_pk WHERE reference_id EXISTS/IN/JOIN)

src
LEFT JOIN type1 USING (reference_id)
LEFT JOIN type2 USING (reference_id)
[...]

there are no tables where reference_id is a pk, I could create one or do : select reference_id from ( values (..), (...), (...) .... )

the tricky part with the join (and where I was not clear about it in my original description) is that a reference_id can match in multiple tables (eg. it can be a fk in type1 and type2), so it then becomes a bit harder to collect all the common attributes and 'types' when doing joins like this.

For example let's assume there is a group_id to be be retrieved among all tables as a common attribute:

if reference_id was existing only in one table, I could do coalesce(type1.group_id, ... type5.group_id) as group_id in the main select

however that would not work in this case.

I could work around the common attributes however.

But for retrieving the types, what I really need to have as a return of this query is data that allows me to partition the reference_id for each type like:

type1 -> ref1, ref2, ref5

type2 -> ref1, ref3

type3 -> ref4, ref3

I guess I could try to return an array and fill it with case/when for each table eg. something like

ARRAY( CASE WHEN type1.id IS NOT NULL THEN 'type1' END, ... CASE WHEN type1.id IS NOT NULL THEN 'type5' END)

and then collect all the non-null values in the code.

Or consider whether PostgreSQL Inheritance would work - though basically its a friendly API over the "UNION ALL" query you proposed.

The problem with postgresql inheritance is that it would not play well with the orm and substantially complicates implementation.

Thanks for the all the ideas, that helps me a lot to brainstorm more.

Re: Most efficient way of querying M 'related' tables where N out of M may contain the key

From

"David G. Johnston"

Date:

21 August 2015, 12:24:43

On Fri, Aug 21, 2015 at 8:07 AM, Stephane Bailliez <sbailliez@gmail.com> wrote:

On Thu, Aug 20, 2015 at 8:19 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

SELECT [...]
FROM (SELECT reference_id, [...] FROM table_where_referenced_id_is_a_pk WHERE reference_id EXISTS/IN/JOIN)

src
LEFT JOIN type1 USING (reference_id)
LEFT JOIN type2 USING (reference_id)
[...]

Place ^ in a CTE named (find_all)

there are no tables where reference_id is a pk, I could create one or do : select reference_id from ( values (..), (...), (...) .... )

the tricky part with the join (and where I was not clear about it in my original description) is that a reference_id can match in multiple tables (eg. it can be a fk in type1 and type2), so it then becomes a bit harder to collect all the common attributes and 'types' when doing joins like this.

For example let's assume there is a group_id to be be retrieved among all tables as a common attribute:

if reference_id was existing only in one table, I could do coalesce(type1.group_id, ... type5.group_id) as group_id in the main select
however that would not work in this case.

WITH find_all (reference_id, type_identifier, type_id) AS ( ... )

SELECT type_identifier, array_agg(reference_id), array_agg(type_id)

FROM find_all

WHERE type_identifier IS NOT NULL

GROUP BY type_identifier

find_all will return at least one row, possibly empty if no matches are present, and will return multiple rows if more than one matches. You can use array_agg as shown, or play around with custom composite types, or even build a JSON document.

David J.