Thread: Performance issue in foreign-key-aware join estimation

Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
In connection with David Rowley's proposal to change bitmapset.c to use
64-bit words, I dug out an old test case I had for a complex-to-plan query
(attached).  Andres Freund posted this to the lists perhaps ten years ago,
though I can't locate the original posting right now.

I was distressed to discover via perf that 69% of the runtime of this
test now goes into match_eclasses_to_foreign_key_col().  That seems
clearly unacceptable.  There aren't an unreasonable number of foreign key
constraints in the underlying schema --- mostly one per table, and there's
only one table with as many as 3.  However, poking around in the planner
data structures, it turns out there are:

888 base relations

1005 EquivalenceClasses

167815 fkey_list entries initially

690 fkey_list entries after match_foreign_keys_to_quals trims them

So the reason match_eclasses_to_foreign_key_col is so dominant in the
runtime is it's invoked 167815 times and has to scan a thousand
EquivalenceClasses (unsuccessfully) on most of those calls.

How did the fkey_list get that big?  I think the issue is that the
query touches the same tables many many times (888 baserels, but
there are only 20 distinct tables in the schema) and we get an
O(N^2) growth in the apparent number of FKs.

Clearly, we ought to rethink that data structure.  I'm not sure
offhand how to make it better, but this is pretty awful.

Perhaps there'd also be some use in having better indexing for
the EquivalenceClass list, but again I'm not sure what that'd
look like.

            regards, tom lane

\set ON_ERROR_STOP on

DROP SCHEMA IF EXISTS test_data CASCADE;
DROP SCHEMA IF EXISTS test_view CASCADE;
CREATE SCHEMA test_data;
CREATE SCHEMA test_view;
SET SEARCH_PATH = test_data, test_view;

CREATE TABLE proband (
       proband_id bigserial PRIMARY KEY
);

CREATE TABLE sample (
       sample_id bigserial PRIMARY KEY
);


CREATE TABLE proband__sample (
       proband_id bigint NOT NULL REFERENCES proband,
       sample_id bigint NOT NULL REFERENCES sample,

       UNIQUE(proband_id, sample_id)
);

CREATE TABLE project (
      project_id bigserial PRIMARY KEY,
      name text NOT NULL UNIQUE
);

/*
 * Stuff like:
 *     -Double Entry
 *     -Single Entry
 *     -Machine Read
 *     - ...
 */
CREATE TABLE data_quality (
       data_quality_id bigserial PRIMARY KEY,
       name text NOT NULL UNIQUE,
       description text
);


CREATE TABLE information_set (
       information_set_id bigserial PRIMARY KEY,
       name text NOT NULL UNIQUE,
       description text
);

CREATE TABLE information (
       information_id bigserial PRIMARY KEY,
       information_set_id bigint NOT NULL REFERENCES information_set,
       name text NOT NULL UNIQUE,
       description text
);
CREATE INDEX information__information_set_id ON information (information_set_id);



CREATE TABLE information_set_instance (
       information_set_instance_id bigserial PRIMARY KEY,
       information_set_id bigint NOT NULL REFERENCES information_set
);
CREATE INDEX information_set_instance__information_set_id ON information_set_instance (information_set_id);

CREATE TABLE information_set_instance__proband (
       information_set_instance_id bigint NOT NULL REFERENCES information_set_instance,
       proband_id bigint NOT NULL REFERENCES proband,
       UNIQUE (information_set_instance_id, proband_id)
);
CREATE INDEX information_set_instance__proband__information_set_id ON information_set_instance__proband
(information_set_instance_id);
CREATE INDEX information_set_instance__proband__proband_id ON information_set_instance__proband (proband_id);

CREATE TABLE information_set_instance__sample (
       information_set_instance_id bigint NOT NULL REFERENCES information_set_instance,
       sample_id bigint NOT NULL REFERENCES sample,
       UNIQUE (information_set_instance_id, sample_id)
);

CREATE INDEX information_set_instance__sample__information_set_id ON information_set_instance__sample
(information_set_instance_id);
CREATE INDEX information_set_instance__sample__sample_id ON information_set_instance__sample (sample_id);

CREATE TABLE information_instance (
       information_instance_id bigserial PRIMARY KEY,
       information_set_instance_id bigint NOT NULL REFERENCES information_set_instance,
       information_id bigint NOT NULL REFERENCES information,
       data_quality_id int REFERENCES data_quality
);
CREATE INDEX information_instance__information_set_instance_id ON
information_set_instance(information_set_instance_id);
CREATE INDEX information_instance__information_id ON information(information_id);

CREATE TABLE information_about_tnm (
       information_about_tnm bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       t int,
       n int,
       m int
);
CREATE INDEX information_about_tnm__information_instance_id ON information_about_tnm(information_instance_id);


CREATE TABLE information_about_sequenced_data (
       information_about_sequenced_data bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       filename text,
       data text
);
CREATE INDEX information_about_sequenced_data__information_instance_id ON
information_about_sequenced_data(information_instance_id);


CREATE TABLE information_about_time (
       information_about_sequenced_data bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       data timestamp
);
CREATE INDEX information_about_time__information_instance_id ON information_about_time(information_instance_id);

CREATE TABLE information_about_text (
       information_about_text_data bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       data text
);
CREATE INDEX information_about_text__information_instance_id ON information_about_text(information_instance_id);

/*
 * Normally refers to a table containing icd10 classiffication codes
 */
CREATE TABLE information_about_icd10_classification (
       information_about_icd10_classification_data bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       data int
);
CREATE INDEX information_about_icd10_classification__information_instance_id ON
information_about_icd10_classification(information_instance_id);

CREATE TABLE information_about_location (
       information_about_allowance bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       street text,
       building text
       -- Additional Attributes
);
CREATE INDEX information_about_location__information_instance_id ON
information_about_location(information_instance_id);


CREATE TABLE information_about_allowance (
       information_about_allowance bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       granted bool
);
CREATE INDEX information_about_allowance__information_instance_id ON
information_about_allowance(information_instance_id);

CREATE TABLE information_about_clinic (
       information_about_clinic bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       clinic_id int
);
CREATE INDEX information_about_clinic__information_instance_id ON information_about_allowance(information_instance_id);

CREATE TABLE information_about_personel (
       information_about_personel bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       name text
);
CREATE INDEX information_about_personel__information_instance_id ON
information_about_allowance(information_instance_id);

/*
 * Lots of other
 * information_about_* tables
 */

CREATE TABLE information_about_placeholder (
       information_about_personel bigserial PRIMARY KEY,
       information_instance_id bigint REFERENCES information_instance,
       data text
);
CREATE INDEX information_about_placeholder__information_instance_id ON
information_about_allowance(information_instance_id);


/*
 * Views
 */

DROP SCHEMA IF EXISTS test_view CASCADE;
CREATE SCHEMA test_view;
SET SEARCH_PATH = test_view, test_data;


CREATE OR REPLACE VIEW information_set_completition_status AS
SELECT
    information_set.information_set_id,
    information_set_instance.information_set_instance_id,
    COUNT(information.information_id) AS possible_information_nr,
    COUNT(information_instance.information_id) AS available_information_nr
FROM
    information_set_instance
    JOIN information_set USING(information_set_id)
    JOIN information USING(information_set_id)
    LEFT JOIN information_instance USING(information_id)
GROUP BY
    information_set.information_set_id,
    information_set_instance.information_set_instance_id;


CREATE OR REPLACE VIEW information_generic_allowance AS
SELECT
    information_set_instance__proband.proband_id

    ,information_set.information_set_id
    ,information_set_instance.information_set_instance_id

    ,generic_allowance.granted AS generic_allowance_granted
    ,anonymized_use_noncommercial_research.granted AS anonymized_use_noncommercial_research_granted
    ,anonymized_use_commercial_research.granted AS anonymized_use_commercial_research_granted
    ,contact_in_noncommercial_research.granted AS contact_in_noncommercial_research_granted
    ,contact_in_commercial_research.granted AS contact_in_commercial_research_granted
    ,information_6.granted AS information_6_granted
    ,information_7.granted AS information_7_granted
    ,information_8.granted AS information_8_granted
    ,information_9.granted AS information_9_granted
    ,information_10.granted AS information_10_granted
    ,information_11.granted AS information_11_granted

FROM
    information_set_instance__proband
    JOIN information_set_instance USING (information_set_instance_id)
    JOIN information_set USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'generic_allowance'
    ) AS generic_allowance
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'anonymized_use_noncommercial_research'
    ) AS anonymized_use_noncommercial_research
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'anonymized_use_commercial_research'
    ) AS anonymized_use_commercial_research
    USING (information_set_id)
    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'contact_in_noncommercial_research'
    ) AS contact_in_noncommercial_research
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'contact_in_commercial_research'
    ) AS contact_in_commercial_research
    USING (information_set_id)

    /*
     * All further joins have pointless names to ease the writing
     */
    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_5'
    ) AS information_6
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_6'
    ) AS information_7
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_7'
    ) AS information_8
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_8'
    ) AS information_9
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_9'
    ) AS information_10
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_10'
    ) AS information_11
    USING (information_set_id)

    LEFT JOIN (
        SELECT
            information.information_set_id,
            information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_11'
    ) AS information_12
    USING (information_set_id)
WHERE TRUE
    AND information_set.name = 'generic_allowance_v3';


CREATE OR REPLACE VIEW information_genetic_allowance AS
SELECT
    information_set_instance__proband.proband_id

    ,information_set.information_set_id
    ,information_set_instance.information_set_instance_id

    ,notification_incurable_illness_found.granted AS notification_incurable_illness_found_granted
    ,notification_curable_illness_found.granted AS notification_curable_illness_found_granted
    ,notification_inheritable_illness_found.granted AS notification_inheritable_illness_found_granted
    ,use_of_genetic_data_if_ill_internal.granted AS use_of_genetic_data_if_ill_internal_granted
    ,use_of_genetic_data_if_ill_external.granted AS use_of_genetic_data_if_ill_external_granted
    ,information_6.granted AS information_6_granted
    ,information_7.granted AS information_7_granted
    ,information_8.granted AS information_8_granted
    ,information_9.granted AS information_9_granted
    ,information_10.granted AS information_10_granted
    ,information_11.granted AS information_11_granted
    ,information_12.granted AS information_12_granted
    ,information_13.granted AS information_13_granted
    ,information_14.granted AS information_14_granted
    ,information_15.granted AS information_15_granted
    ,information_16.granted AS information_16_granted
    ,information_17.granted AS information_17_granted
    ,information_18.granted AS information_18_granted
    ,information_19.granted AS information_19_granted
    ,information_20.granted AS information_20_granted
    ,information_21.granted AS information_21_granted


FROM
    information_set_instance__proband
    JOIN information_set_instance USING (information_set_instance_id)
    JOIN information_set USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'notification_incurable_illness_found'
    ) AS notification_incurable_illness_found
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'notification_curable_illness_found'
    ) AS notification_curable_illness_found
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'notification_inheritable_illness_found'
    ) AS notification_inheritable_illness_found
    USING (information_set_id)
    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'use_of_genetic_data_if_ill_internal'
    ) AS use_of_genetic_data_if_ill_internal
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'use_of_genetic_data_if_ill_external'
    ) AS use_of_genetic_data_if_ill_external
    USING (information_set_id)

    /*
     * All further joins have pointless names to ease the writing
     */
    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_6'
    ) AS information_6
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_7'
    ) AS information_7
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_8'
    ) AS information_8
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_9'
    ) AS information_9
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_10'
    ) AS information_10
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_11'
    ) AS information_11
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_12'
    ) AS information_12

    USING (information_set_id)
    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_13'
    ) AS information_13
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_14'
    ) AS information_14
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_15'
    ) AS information_15
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_16'
    ) AS information_16
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_17'
    ) AS information_17
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_18'
    ) AS information_18
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_19'
    ) AS information_19
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_20'
    ) AS information_20
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_allowance.granted
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_allowance USING (information_instance_id)
        WHERE true
            AND information.name = 'information_21'
    ) AS information_21
    USING (information_set_id)
WHERE TRUE
    AND information_set.name = 'genetic_allowance_v3';







CREATE OR REPLACE VIEW information_patient_diagnosis AS
SELECT
    information_set_instance__proband.proband_id

    ,information_set.information_set_id
    ,information_set_instance.information_set_instance_id

    ,diagnosis_icd10.data AS diagnosis_icd10
FROM
    information_set_instance__proband
    JOIN information_set_instance USING (information_set_instance_id)
    JOIN information_set USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_icd10_classification.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_icd10_classification USING (information_instance_id)
        WHERE true
            AND information.name = 'diagnosis'
    ) AS diagnosis_icd10
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_time.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_time USING (information_instance_id)
        WHERE true
            AND information.name = 'diagnosis_date'
    ) AS diagnosis_date
    USING (information_set_id)


    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_clinic.clinic_id
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_clinic USING (information_instance_id)
        WHERE true
            AND information.name = 'diagnosing_clinic'
    ) AS diagnosis_clinic
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_personel.name
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_personel USING (information_instance_id)
        WHERE true
            AND information.name = 'diagnosing_personel'
    ) AS diagnosis_personel
    USING (information_set_id)


    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_1'
    ) AS information_1
    USING (information_set_id)


    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_2'
    ) AS information_2
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_3'
    ) AS information_3
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_4'
    ) AS information_4
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_5'
    ) AS information_5
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_6'
    ) AS information_6
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_7'
    ) AS information_7
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_8'
    ) AS information_8
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_9'
    ) AS information_9
    USING (information_set_id)


    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_10'
    ) AS information_10
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_11'
    ) AS information_11
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_12'
    ) AS information_12
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_13'
    ) AS information_13
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_14'
    ) AS information_14
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_15'
    ) AS information_15
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_16'
    ) AS information_16
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_17'
    ) AS information_17
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_18'
    ) AS information_18
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_19'
    ) AS information_19
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_20'
    ) AS information_20
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_21'
    ) AS information_21
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_22'
    ) AS information_22
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_23'
    ) AS information_23
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_24'
    ) AS information_24
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_25'
    ) AS information_25
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_26'
    ) AS information_26
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_27'
    ) AS information_27
    USING (information_set_id)


WHERE TRUE
    AND information_set.name = 'patient_diagnosis_plain';



CREATE OR REPLACE VIEW information_patient_placeholder AS
SELECT
    information_set_instance__proband.proband_id

    ,information_set.information_set_id
    ,information_set_instance.information_set_instance_id

    ,information_1.data AS information_1_data
    ,information_2.data AS information_2_data
    ,information_3.data AS information_3_data
    ,information_4.data AS information_4_data
    ,information_5.data AS information_5_data
    ,information_6.data AS information_6_data
    ,information_7.data AS information_7_data
    ,information_8.data AS information_8_data
    ,information_9.data AS information_9_data
    ,information_10.data AS information_10_data
    ,information_11.data AS information_11_data
    ,information_12.data AS information_12_data
    ,information_13.data AS information_13_data
    ,information_14.data AS information_14_data
    ,information_15.data AS information_15_data
    ,information_16.data AS information_16_data
    ,information_17.data AS information_17_data
    ,information_18.data AS information_18_data
    ,information_19.data AS information_19_data
    ,information_20.data AS information_20_data
    ,information_21.data AS information_21_data
    ,information_22.data AS information_22_data
    ,information_23.data AS information_23_data
    ,information_24.data AS information_24_data
    ,information_25.data AS information_25_data
    ,information_26.data AS information_26_data
    ,information_27.data AS information_27_data
FROM
    information_set_instance__proband
    JOIN information_set_instance USING (information_set_instance_id)
    JOIN information_set USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_1'
    ) AS information_1
    USING (information_set_id)


    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_2'
    ) AS information_2
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_3'
    ) AS information_3
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_4'
    ) AS information_4
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_5'
    ) AS information_5
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_6'
    ) AS information_6
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_7'
    ) AS information_7
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_8'
    ) AS information_8
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_9'
    ) AS information_9
    USING (information_set_id)


    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_10'
    ) AS information_10
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_11'
    ) AS information_11
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_12'
    ) AS information_12
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_13'
    ) AS information_13
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_14'
    ) AS information_14
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_15'
    ) AS information_15
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_16'
    ) AS information_16
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_17'
    ) AS information_17
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_18'
    ) AS information_18
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_19'
    ) AS information_19
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_20'
    ) AS information_20
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_21'
    ) AS information_21
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_22'
    ) AS information_22
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_23'
    ) AS information_23
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_24'
    ) AS information_24
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_25'
    ) AS information_25
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_26'
    ) AS information_26
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_27'
    ) AS information_27
    USING (information_set_id)


WHERE TRUE
    AND information_set.name = 'patient_placeholder';



CREATE OR REPLACE VIEW information_sample_placeholder AS
SELECT
    information_set_instance__sample.sample_id

    ,information_set.information_set_id
    ,information_set_instance.information_set_instance_id

    ,information_1.data AS information_1_data
    ,information_2.data AS information_2_data
    ,information_3.data AS information_3_data
    ,information_4.data AS information_4_data
    ,information_5.data AS information_5_data
    ,information_6.data AS information_6_data
    ,information_7.data AS information_7_data
    ,information_8.data AS information_8_data
    ,information_9.data AS information_9_data
    ,information_10.data AS information_10_data
    ,information_11.data AS information_11_data
    ,information_12.data AS information_12_data
    ,information_13.data AS information_13_data
    ,information_14.data AS information_14_data
    ,information_15.data AS information_15_data
    ,information_16.data AS information_16_data
    ,information_17.data AS information_17_data
    ,information_18.data AS information_18_data
    ,information_19.data AS information_19_data
    ,information_20.data AS information_20_data
    ,information_21.data AS information_21_data
    ,information_22.data AS information_22_data
    ,information_23.data AS information_23_data
    ,information_24.data AS information_24_data
    ,information_25.data AS information_25_data
    ,information_26.data AS information_26_data
    ,information_27.data AS information_27_data

FROM
    information_set_instance__sample
    JOIN information_set_instance USING (information_set_instance_id)
    JOIN information_set USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_1'
    ) AS information_1
    USING (information_set_id)


    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_2'
    ) AS information_2
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_3'
    ) AS information_3
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_4'
    ) AS information_4
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_5'
    ) AS information_5
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_6'
    ) AS information_6
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_7'
    ) AS information_7
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_8'
    ) AS information_8
    USING (information_set_id)

    JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_9'
    ) AS information_9
    USING (information_set_id)


    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_10'
    ) AS information_10
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_11'
    ) AS information_11
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_12'
    ) AS information_12
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_13'
    ) AS information_13
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_14'
    ) AS information_14
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_15'
    ) AS information_15
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_16'
    ) AS information_16
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_17'
    ) AS information_17
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_18'
    ) AS information_18
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_19'
    ) AS information_19
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_20'
    ) AS information_20
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_21'
    ) AS information_21
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_22'
    ) AS information_22
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_23'
    ) AS information_23
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_24'
    ) AS information_24
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_25'
    ) AS information_25
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_26'
    ) AS information_26
    USING (information_set_id)

    LEFT JOIN (
        SELECT
        information.information_set_id,
               information_about_placeholder.data
        FROM
            information
            JOIN information_instance USING (information_id)
            JOIN information_about_placeholder USING (information_instance_id)
        WHERE true
            AND information.name = 'information_27'
    ) AS information_27
    USING (information_set_id)


WHERE TRUE
    AND information_set.name = 'sample_placeholder';
SET SEARCH_PATH = test_data, test_view;
/*
 * moderately complex query
 */
explain
SELECT *
FROM
    proband
    JOIN proband__sample USING(proband_id)
    JOIN sample USING (sample_id)

    JOIN information_generic_allowance USING (proband_id)
    JOIN information_genetic_allowance USING (proband_id)
    JOIN information_patient_diagnosis diag_1 USING (proband_id)
    JOIN information_patient_diagnosis diag_2 USING (proband_id)
    JOIN information_patient_placeholder patient_histology USING (proband_id)

    JOIN information_sample_placeholder sample_rna_experiment_1_status USING (sample_id)
    JOIN information_sample_placeholder sample_rna_experiment_1_data USING (sample_id)

    JOIN information_sample_placeholder sample_rna_experiment_2_status USING (sample_id)
    JOIN information_sample_placeholder sample_rna_experiment_2_data USING (sample_id)

    JOIN information_sample_placeholder sample_rna_experiment_3_status USING (sample_id)
    JOIN information_sample_placeholder sample_rna_experiment_3_data USING (sample_id)

WHERE TRUE
    AND information_generic_allowance.generic_allowance_granted = true
    AND information_genetic_allowance.information_6_granted = true

    AND diag_1.diagnosis_icd10 = 1343 /*some icd code*/
    AND diag_2.diagnosis_icd10 = 1344 /*another icd code*/

    AND (
        SELECT
         available_information_nr / available_information_nr
        FROM
        information_set_completition_status
    WHERE
        information_set_completition_status.information_set_instance_id = patient_histology.information_set_instance_id
    ) > 0.8


    AND (
        SELECT
         available_information_nr / available_information_nr
        FROM
        information_set_completition_status
    WHERE
        information_set_completition_status.information_set_instance_id = diag_1.information_set_instance_id
    ) > 0.5


    AND (
        SELECT
         available_information_nr / available_information_nr
        FROM
        information_set_completition_status
    WHERE
        information_set_completition_status.information_set_instance_id = diag_2.information_set_instance_id
    ) > 0.5;

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Fri, 21 Dec 2018 at 06:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was distressed to discover via perf that 69% of the runtime of this
> test now goes into match_eclasses_to_foreign_key_col().  That seems
> clearly unacceptable.

Agreed. That's pretty terrible.

I looked at this a bit and see that
match_eclasses_to_foreign_key_col() is missing any smarts to skip
equivalence classes that don't have ec_relids bits for both rels. With
that added the run-time is reduced pretty dramatically.

I've only tested with a debug build as of now, but I get:

Unpatched:

$ pgbench -n -T 60 -f query.sql postgres
latency average = 18411.604 ms

Patched:
latency average = 8748.177 ms

Going by my profiler this drops match_eclasses_to_foreign_key_col()
down to just 10% of total planner time for this query. The new
bms_is_member() call is pretty hot inside that function though.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 22 Dec 2018 at 09:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Going by my profiler this drops match_eclasses_to_foreign_key_col()
> down to just 10% of total planner time for this query. The new
> bms_is_member() call is pretty hot inside that function though.

I should have said 28% instead of 10%. 10% is the time spent
exclusively just in that function (not in a function called by that
function). The bms_is_member() call I mention above is about 20% of
the total time, which likely includes some other call sites too.

Back in [1], I mentioned that I'd like to start moving away from our
linked list based implementation of List and start using an array
based version instead. If we had this we could easily further improve
this code fkey matching code to not even look near equivalence classes
that don't contain members for the relations in question with a design
something like:

1. Make PlannerInfo->eq_classes an array list instead of an array,
this will significantly improve the performance of list_nth().
2. Have a Bitmapset per relation that indexes which items in
eq_classes have ec_members for the given relation.
3. In match_eclasses_to_foreign_key_col() perform a bms_overlap() on
the eq_classes index bitmapsets for the relations at either side of
the foreign key then perform a bms_next_member() type loop over the
result of that in order to skip over the eq_classes items that can't
match.

Since match_foreign_keys_to_quals() calls
match_eclasses_to_foreign_key_col() once for each item in
PlannerInfo->fkey_list (167815 items in this case) and again for each
foreign key column in those keys, then this should significantly
reduce the effort required since we make a pass over *every*
equivalence  class each time match_eclasses_to_foreign_key_col() gets
called.

This array list implementation is something I did want to get one for
PG12. The height of the bar to do this is pretty high given what was
mentioned in [2].

[1] https://www.postgresql.org/message-id/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/21592.1509632225%40sss.pgh.pa.us

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 22 Dec 2018 at 10:53, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Back in [1], I mentioned that I'd like to start moving away from our
> linked list based implementation of List and start using an array
> based version instead. If we had this we could easily further improve
> this code fkey matching code to not even look near equivalence classes
> that don't contain members for the relations in question with a design
> something like:
>
> 1. Make PlannerInfo->eq_classes an array list instead of an array,
> this will significantly improve the performance of list_nth().
> 2. Have a Bitmapset per relation that indexes which items in
> eq_classes have ec_members for the given relation.
> 3. In match_eclasses_to_foreign_key_col() perform a bms_overlap() on
> the eq_classes index bitmapsets for the relations at either side of
> the foreign key then perform a bms_next_member() type loop over the
> result of that in order to skip over the eq_classes items that can't
> match.

Using the above idea, but rather than going to the trouble of storing
PlannerInfo->eq_classes as an array type list, if we build the array
on the fly inside match_foreign_keys_to_quals(), then build a
Bitmapset type index to mark which of the eclasses contains members
for each relation, then I can get the run-time for the function down
to just 0.89%.  Looking at other functions appearing high on the
profile I also see; have_relevant_eclass_joinclause() (14%),
generate_join_implied_equalities_for_ecs() (23%).

I think if we seriously want to improve planning performance when
there are many stored equivalence classes, then we need to have
indexing along the lines of what I've outlined above.

I've attached the patch I used to test this idea.  It might be
possible to develop this into something committable, perhaps if we
invent a new function in equivclass.c that builds the index into a
single struct and we pass a pointer to that down to the functions that
require the index.  Such a function could also optionally skip
indexing eclasses such as ones with ec_has_volatile or ec_has_const
when the use case for the index requires ignoring such eclasses.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Mon, 24 Dec 2018 at 09:38, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Using the above idea, but rather than going to the trouble of storing
> PlannerInfo->eq_classes as an array type list, if we build the array
> on the fly inside match_foreign_keys_to_quals(), then build a
> Bitmapset type index to mark which of the eclasses contains members
> for each relation, then I can get the run-time for the function down
> to just 0.89%.  Looking at other functions appearing high on the
> profile I also see; have_relevant_eclass_joinclause() (14%),
> generate_join_implied_equalities_for_ecs() (23%).

I've now expanded the proof of concept patch to use this indexing
technique for have_relevant_eclass_joinclause() and
generate_join_implied_equalities_for_ecs().  With Tom's test from
up-thread, I get:

Master:
latency average = 14125.374 ms

Patched:
latency average = 2417.164 ms

There are some other cases, such as
generate_implied_equalities_for_column(), that are possibly also
indexable, but in that case, we cannot use ec_relids to help build the
index since it does not keep track of other member relation
equivalence class members. That function is appearing at about 3.3% of
total plan time with the patched version of the code, so there's still
some small gains to be had there. The performance of
has_relevant_eclass_joinclause() could likely also be improved from
this indexing. According to my profiling, it's currently still about
2.6% of total planning time with the patched version.

I've attached the updated (rough) proof of concept patch.  I ended up
stuffing the equivalence class index structure into PlannerInfo so
that it would be available in all the places it was required, but
build just once higher up the call stack. I don't believe this is the
correct solution for a finished patch, but I didn't really have any
better ideas and this seemed good enough to demonstrate what the
performance could look like.

Other ideas I have to further improve the performance of this query
would be to move the fkey_list out of PlannerInfo and instead include
a per-relation list inside RelOptInfo. This would allow
get_foreign_key_join_selectivity() to just look at foreign keys that
are relevant to the given relations rather than having to skip all
foreign keys that are not. This function is still accounting for about
5.5% of the total planning time for this query.  I imagine it wouldn't
hurt match_foreign_keys_to_quals() too much to have it loop over each
RelOptInfo and look at the foreign keys defined on each of those.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
Tomas Vondra
Date:
On 12/24/18 1:07 PM, David Rowley wrote:
> On Mon, 24 Dec 2018 at 09:38, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Using the above idea, but rather than going to the trouble of storing
>> PlannerInfo->eq_classes as an array type list, if we build the array
>> on the fly inside match_foreign_keys_to_quals(), then build a
>> Bitmapset type index to mark which of the eclasses contains members
>> for each relation, then I can get the run-time for the function down
>> to just 0.89%.  Looking at other functions appearing high on the
>> profile I also see; have_relevant_eclass_joinclause() (14%),
>> generate_join_implied_equalities_for_ecs() (23%).
> 
> I've now expanded the proof of concept patch to use this indexing
> technique for have_relevant_eclass_joinclause() and
> generate_join_implied_equalities_for_ecs().  With Tom's test from
> up-thread, I get:
> 
> Master:
> latency average = 14125.374 ms
> 
> Patched:
> latency average = 2417.164 ms
> 

Yes, I can confirm these measurements. On my machine, timing on master
is about 10530ms, with v1 of the patch it drops to 2600ms and v2 pushes
it down to 1610ms.

I however observe failures on 4 regression test suites - inherit,
equivclass, partition_join and partition_prune (diff attached). That's a
bit surprising, because AFAICS the patch merely optimizes the execution
and should not change the planning otherwise (all the failures seem to
be a query plan changing in some way). I'm not sure if the plans are
correct or better than the old ones.

The other thing is that we probably should not use a single test case to
measure the optimization - I doubt it can improve less extreme queries,
but maybe we should verify it does not regress them?

With the patch attached, bms_overlap gets quite high in the profiles. I
think one reason for that is that all bitmap operations (not just
bms_overlap) always start the loop at 0. For the upper boundary is
usually determined as Min() of the lengths, but we don't do that for
lower boundary because we don't track that. The bitmaps for eclasses are
likely sparse, so this is quite painful. Attaches is a simple patch that
adds tracking of "minword" and uses it in various bms_ methods to skip
initial part of the loop. On my machine this reduces the timings by
roughly 5% (from 1610 to 1530 ms).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Tue, 25 Dec 2018 at 13:46, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> I however observe failures on 4 regression test suites - inherit,
> equivclass, partition_join and partition_prune (diff attached). That's a
> bit surprising, because AFAICS the patch merely optimizes the execution
> and should not change the planning otherwise (all the failures seem to
> be a query plan changing in some way). I'm not sure if the plans are
> correct or better than the old ones.

Seems I didn't run the tests after doing a last-minute move of the
create_eclass_index() call in make_one_rel(). I'd previously had it
just below set_base_rel_pathlists(), which meant that the index didn't
exist in some cases so it would fall back on the original code.  It
appears the use case call from set_base_rel_pathlists() require
is_child eclass members to be indexed too, but these were not in v1 or
v2 since ec_relids does not record child rel members.

It seems simple enough to fix this just by adding ec_allrelids and
setting it for all members rather than just non-children members.
This also allows me to add the two additional cases to allow
generate_implied_equalities_for_column() and
has_relevant_eclass_joinclause() to also make use of the eclass index.
This further reduces the total planning time for the query on my
machine to 2304 ms.

> The other thing is that we probably should not use a single test case to
> measure the optimization - I doubt it can improve less extreme queries,
> but maybe we should verify it does not regress them?

Agreed. That still needs to be verified.  Although I'm yet unsure what
sort of form we could use this idea in.  I wonder if it's fine to
create this eclass index on the fly, or if it's something we should
keep maintained all the time.  The problem I saw with keeping it all
the time is down to eq_classes being a List and list_nth() is slow,
which means the bms_next_member() loops I've added would perform
poorly when compiled with a linked list lookup.

> With the patch attached, bms_overlap gets quite high in the profiles. I
> think one reason for that is that all bitmap operations (not just
> bms_overlap) always start the loop at 0. For the upper boundary is
> usually determined as Min() of the lengths, but we don't do that for
> lower boundary because we don't track that. The bitmaps for eclasses are
> likely sparse, so this is quite painful. Attaches is a simple patch that
> adds tracking of "minword" and uses it in various bms_ methods to skip
> initial part of the loop. On my machine this reduces the timings by
> roughly 5% (from 1610 to 1530 ms).

Seems like an interesting idea, although for the most part, Bitmapsets
are small and this adds some overhead to all use cases.  I doubt it is
worth the trouble for bms_is_member(), since that does not need to
perform a loop over each bitmapword.  It looks like most of the
bms_overlap() usages are caused by functions like
have_join_order_restriction(), join_is_legal(),
check_outerjoin_delay() and add_paths_to_joinrel(), all of which are
performing a loop over the join_info_list. Perhaps that could be
indexed a similar way to the eq_classes List. I'd imagine there's
about another 20-30% of performance to squeeze out of this by doing
that, plus a bit more by making the fkey_list per RelOptInfo.

Attached is v3 of the hacked up proof of concept performance demo patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
Tomas Vondra
Date:
On 12/25/18 3:48 AM, David Rowley wrote:
> On Tue, 25 Dec 2018 at 13:46, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I however observe failures on 4 regression test suites - inherit,
>> equivclass, partition_join and partition_prune (diff attached). That's a
>> bit surprising, because AFAICS the patch merely optimizes the execution
>> and should not change the planning otherwise (all the failures seem to
>> be a query plan changing in some way). I'm not sure if the plans are
>> correct or better than the old ones.
> 
> Seems I didn't run the tests after doing a last-minute move of the
> create_eclass_index() call in make_one_rel(). I'd previously had it
> just below set_base_rel_pathlists(), which meant that the index didn't
> exist in some cases so it would fall back on the original code.  It
> appears the use case call from set_base_rel_pathlists() require
> is_child eclass members to be indexed too, but these were not in v1 or
> v2 since ec_relids does not record child rel members.
> 
> It seems simple enough to fix this just by adding ec_allrelids and
> setting it for all members rather than just non-children members.
> This also allows me to add the two additional cases to allow
> generate_implied_equalities_for_column() and
> has_relevant_eclass_joinclause() to also make use of the eclass index.
> This further reduces the total planning time for the query on my
> machine to 2304 ms.
> 

OK, that makes sense.

>> The other thing is that we probably should not use a single test case to
>> measure the optimization - I doubt it can improve less extreme queries,
>> but maybe we should verify it does not regress them?
> 
> Agreed. That still needs to be verified.  Although I'm yet unsure what
> sort of form we could use this idea in.  I wonder if it's fine to
> create this eclass index on the fly, or if it's something we should
> keep maintained all the time.  The problem I saw with keeping it all
> the time is down to eq_classes being a List and list_nth() is slow,
> which means the bms_next_member() loops I've added would perform
> poorly when compiled with a linked list lookup.
> 

Yeah, good questions. I think the simplest thing we could do is building
them on the first access - that would at least ensure we don't build the
index without accessing it at least once.

Of course, it might still be faster to do the check directly, for small
numbers of eclasses / fkeys and rels. Perhaps there's some sort of
heuristics deciding when to build the indexes, but that seems like an
overkill at this point.

What I propose is constructing a "minimal" simple query invoking this
code (something like two rels with a join on a foreign key) and measure
impact on that. That seems like a fairly realistic use case.

ALso, I wonder what is te goal here - how much do we need to redude the
duration to be happy? Initially I'd say "on par with the state before
the FK patch" but I guess we're already optimizing beyond that point.
What was the timing before adding the FK stuff?

>> With the patch attached, bms_overlap gets quite high in the profiles. I
>> think one reason for that is that all bitmap operations (not just
>> bms_overlap) always start the loop at 0. For the upper boundary is
>> usually determined as Min() of the lengths, but we don't do that for
>> lower boundary because we don't track that. The bitmaps for eclasses are
>> likely sparse, so this is quite painful. Attaches is a simple patch that
>> adds tracking of "minword" and uses it in various bms_ methods to skip
>> initial part of the loop. On my machine this reduces the timings by
>> roughly 5% (from 1610 to 1530 ms).
> 
> Seems like an interesting idea, although for the most part, Bitmapsets
> are small and this adds some overhead to all use cases.  I doubt it is
> worth the trouble for bms_is_member(), since that does not need to
> perform a loop over each bitmapword.

Perhaps. The bitmapsets are generally small in number of members, but
the values may be actually quite high in some cases (I don't have any
exact statistics, though).

You're right it adds a bit of overhead, but I'd expect that to be
outweighted by the reduction of cache misses. But 5% is fairly close to
noise.

> It looks like most of the bms_overlap() usages are caused by
> functions like have_join_order_restriction(), join_is_legal(), 
> check_outerjoin_delay() and add_paths_to_joinrel(), all of which are 
> performing a loop over the join_info_list. Perhaps that could be 
> indexed a similar way to the eq_classes List. I'd imagine there's 
> about another 20-30% of performance to squeeze out of this by doing 
> that, plus a bit more by making the fkey_list per RelOptInfo.
> 

Looks promising.

> Attached is v3 of the hacked up proof of concept performance demo patch.
> 

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Wed, 26 Dec 2018 at 09:50, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Yeah, good questions. I think the simplest thing we could do is building
> them on the first access - that would at least ensure we don't build the
> index without accessing it at least once.

I think we first need to focus on what is back-patchable here.  The
problem I see with the equivalence class index idea is that it would
require passing the index down into
match_eclasses_to_foreign_key_col() which is not a static function, so
we can't really go changing its signature on a backbranch.

Another idea would be to create a new version of
match_eclasses_to_foreign_key_col() which uses the index, which would
mean we'd not break any extensions that might happen to use
match_eclasses_to_foreign_key_col().

Ideally, the quick fix in the v1 patch would be good enough for the
backbranches, but a quick bit of benchmarking shows that there's still
a big regression to what the performance is like without the foreign
keys.

(Average of EXPLAIN over 60 seconds)

foreign key qual matching code commented out: 2486.204 ms
Master: 13909.551 ms
v1 patch: 7310.719 ms

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 22 Dec 2018 at 09:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Fri, 21 Dec 2018 at 06:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I was distressed to discover via perf that 69% of the runtime of this
> > test now goes into match_eclasses_to_foreign_key_col().  That seems
> > clearly unacceptable.
>
> Agreed. That's pretty terrible.
>
> I looked at this a bit and see that
> match_eclasses_to_foreign_key_col() is missing any smarts to skip
> equivalence classes that don't have ec_relids bits for both rels. With
> that added the run-time is reduced pretty dramatically.
>
> I've only tested with a debug build as of now, but I get:
>
> Unpatched:
>
> $ pgbench -n -T 60 -f query.sql postgres
> latency average = 18411.604 ms
>
> Patched:
> latency average = 8748.177 ms

So that this does not get lost, I've added an entry for the original
patch for the March commitfest.

While the patch does not bring the performance back to what it was
before this code was added, it makes a massive dent in the additional
overhead.

https://commitfest.postgresql.org/22/1984/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Tue, 5 Feb 2019 at 22:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
> So that this does not get lost, I've added an entry for the original
> patch for the March commitfest.

Attaching the original patch again so the commitfest bot gets off my
back about the other one not applying.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> Attaching the original patch again so the commitfest bot gets off my
> back about the other one not applying.

Pushed that one.  I'm interested by the "POC" patch, but I agree
that it'd take some research to show that it isn't a net negative
for simple queries.  It sounds like you're not really interested
in pursuing that right now?

Anyway, I rebased the POC patch up to HEAD, just in case anyone
still wants to play with it.

            regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 65302fe..2ffa00c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2351,6 +2351,7 @@ _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
     WRITE_NODE_FIELD(ec_sources);
     WRITE_NODE_FIELD(ec_derives);
     WRITE_BITMAPSET_FIELD(ec_relids);
+    WRITE_BITMAPSET_FIELD(ec_allrelids);
     WRITE_BOOL_FIELD(ec_has_const);
     WRITE_BOOL_FIELD(ec_has_volatile);
     WRITE_BOOL_FIELD(ec_below_outer_join);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0debac7..c5888b8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -216,6 +216,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
     }
     root->total_table_pages = total_pages;

+    root->eq_index = create_eclass_index(root, EQUIVALANCE_IDX_MULTI_MEMBER);
+
     /*
      * Generate access paths for each base rel.
      */
@@ -226,6 +228,9 @@ make_one_rel(PlannerInfo *root, List *joinlist)
      */
     rel = make_rel_from_joinlist(root, joinlist);

+    free_eclass_index(root->eq_index);
+    root->eq_index = NULL;
+
     /*
      * The result should join all and only the query's base rels.
      */
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 61b5b11..570b42f 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -358,6 +358,7 @@ process_equivalence(PlannerInfo *root,
         ec1->ec_sources = list_concat(ec1->ec_sources, ec2->ec_sources);
         ec1->ec_derives = list_concat(ec1->ec_derives, ec2->ec_derives);
         ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids);
+        ec1->ec_allrelids = bms_join(ec1->ec_allrelids, ec2->ec_allrelids);
         ec1->ec_has_const |= ec2->ec_has_const;
         /* can't need to set has_volatile */
         ec1->ec_below_outer_join |= ec2->ec_below_outer_join;
@@ -372,6 +373,7 @@ process_equivalence(PlannerInfo *root,
         ec2->ec_sources = NIL;
         ec2->ec_derives = NIL;
         ec2->ec_relids = NULL;
+        ec2->ec_allrelids = NULL;
         ec1->ec_sources = lappend(ec1->ec_sources, restrictinfo);
         ec1->ec_below_outer_join |= below_outer_join;
         ec1->ec_min_security = Min(ec1->ec_min_security,
@@ -432,6 +434,7 @@ process_equivalence(PlannerInfo *root,
         ec->ec_sources = list_make1(restrictinfo);
         ec->ec_derives = NIL;
         ec->ec_relids = NULL;
+        ec->ec_allrelids = NULL;
         ec->ec_has_const = false;
         ec->ec_has_volatile = false;
         ec->ec_below_outer_join = below_outer_join;
@@ -572,6 +575,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
     {
         ec->ec_relids = bms_add_members(ec->ec_relids, relids);
     }
+    ec->ec_allrelids = bms_add_members(ec->ec_allrelids, relids);
     ec->ec_members = lappend(ec->ec_members, em);

     return em;
@@ -708,6 +712,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
     newec->ec_sources = NIL;
     newec->ec_derives = NIL;
     newec->ec_relids = NULL;
+    newec->ec_allrelids = NULL;
     newec->ec_has_const = false;
     newec->ec_has_volatile = contain_volatile_functions((Node *) expr);
     newec->ec_below_outer_join = false;
@@ -1114,6 +1119,60 @@ generate_join_implied_equalities_for_ecs(PlannerInfo *root,
         nominal_join_relids = join_relids;
     }

+    if (root->eq_index != NULL)
+    {
+        Bitmapset  *inner_ecs = NULL;
+        Bitmapset  *join_ecs = NULL;
+        Bitmapset  *matching_ecs;
+        int            i;
+
+        Assert((root->eq_index->ei_flags & EQUIVALANCE_IDX_MULTI_MEMBER));
+
+        i = -1;
+        while ((i = bms_next_member(inner_relids, i)) > 0)
+            inner_ecs = bms_add_members(inner_ecs, root->eq_index->ei_index[i]);
+
+        i = -1;
+        while ((i = bms_next_member(join_relids, i)) > 0)
+            join_ecs = bms_add_members(join_ecs, root->eq_index->ei_index[i]);
+
+        /* Determine all eclasses in common between inner rels and join rels */
+        matching_ecs = bms_int_members(inner_ecs, join_ecs);
+
+        i = -1;
+        while ((i = bms_next_member(matching_ecs, i)) >= 0)
+        {
+            EquivalenceClass *ec = root->eq_index->ei_classes[i];
+            List       *sublist = NIL;
+
+            /* ECs containing consts do not need any further enforcement */
+            if (ec->ec_has_const)
+                continue;
+
+            /* Ensure the class contains members for any of nominal_join_relids */
+            Assert(bms_overlap(ec->ec_relids, nominal_join_relids));
+
+            if (!ec->ec_broken)
+                sublist = generate_join_implied_equalities_normal(root,
+                                                                  ec,
+                                                                  join_relids,
+                                                                  outer_relids,
+                                                                  inner_relids);
+
+            /* Recover if we failed to generate required derived clauses */
+            if (ec->ec_broken)
+                sublist = generate_join_implied_equalities_broken(root,
+                                                                  ec,
+                                                                  nominal_join_relids,
+                                                                  outer_relids,
+                                                                  nominal_inner_relids,
+                                                                  inner_rel);
+
+            result = list_concat(result, sublist);
+        }
+        return result;
+    }
+
     foreach(lc, eclasses)
     {
         EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc);
@@ -2026,6 +2085,8 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
  * we ignore that fine point here.)  This is much like exprs_known_equal,
  * except that we insist on the comparison operator matching the eclass, so
  * that the result is definite not approximate.
+ *
+ * An eq_index without volatile classes must already exist on 'root'.
  */
 EquivalenceClass *
 match_eclasses_to_foreign_key_col(PlannerInfo *root,
@@ -2038,31 +2099,35 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
     AttrNumber    var2attno = fkinfo->confkey[colno];
     Oid            eqop = fkinfo->conpfeqop[colno];
     List       *opfamilies = NIL;    /* compute only if needed */
-    ListCell   *lc1;
+    EquivalenceClassIndex *eq_index;
+    Bitmapset  *matching_ecs;
+    int            i;

-    foreach(lc1, root->eq_classes)
+    Assert(root->eq_index != NULL);
+    Assert((root->eq_index->ei_flags & EQUIVALANCE_IDX_NON_VOLATILE) != 0);
+
+    eq_index = root->eq_index;
+
+    /* Determine which eclasses contain members for both of the relations */
+    matching_ecs = bms_intersect(eq_index->ei_index[var1varno],
+                                 eq_index->ei_index[var2varno]);
+
+    i = -1;
+    while ((i = bms_next_member(matching_ecs, i)) >= 0)
     {
-        EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *ec = eq_index->ei_classes[i];
         bool        item1member = false;
         bool        item2member = false;
-        ListCell   *lc2;
+        ListCell   *lc;

-        /* Never match to a volatile EC */
-        if (ec->ec_has_volatile)
-            continue;
-        /* Note: it seems okay to match to "broken" eclasses here */
+        /* the eclass index shouldn't have indexed these */
+        Assert(!ec->ec_has_volatile);

-        /*
-         * If eclass visibly doesn't have members for both rels, there's no
-         * need to grovel through the members.
-         */
-        if (!bms_is_member(var1varno, ec->ec_relids) ||
-            !bms_is_member(var2varno, ec->ec_relids))
-            continue;
+        /* Note: it seems okay to match to "broken" eclasses here */

-        foreach(lc2, ec->ec_members)
+        foreach(lc, ec->ec_members)
         {
-            EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
+            EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
             Var           *var;

             if (em->em_is_child)
@@ -2238,6 +2303,106 @@ generate_implied_equalities_for_column(PlannerInfo *root,
     else
         parent_relids = NULL;    /* not used, but keep compiler quiet */

+    if (root->eq_index != NULL)
+    {
+        Bitmapset       *matched_ecs = NULL;
+        int                i;
+
+        Assert((root->eq_index->ei_flags & EQUIVALANCE_IDX_MULTI_MEMBER));
+
+        i = -1;
+        while ((i = bms_next_member(rel->relids, i)) > 0)
+            matched_ecs = bms_add_members(matched_ecs, root->eq_index->ei_index[i]);
+
+        i = -1;
+        while ((i = bms_next_member(matched_ecs, i)) >= 0)
+        {
+            EquivalenceClass *cur_ec = root->eq_index->ei_classes[i];
+            EquivalenceMember *cur_em;
+            ListCell   *lc2;
+
+            /* Won't generate joinclauses if const */
+            if (cur_ec->ec_has_const)
+                continue;
+
+            /*
+             * Scan members, looking for a match to the target column.  Note that
+             * child EC members are considered, but only when they belong to the
+             * target relation.  (Unlike regular members, the same expression
+             * could be a child member of more than one EC.  Therefore, it's
+             * potentially order-dependent which EC a child relation's target
+             * column gets matched to.  This is annoying but it only happens in
+             * corner cases, so for now we live with just reporting the first
+             * match.  See also get_eclass_for_sort_expr.)
+             */
+            cur_em = NULL;
+            foreach(lc2, cur_ec->ec_members)
+            {
+                cur_em = (EquivalenceMember *) lfirst(lc2);
+                if (bms_equal(cur_em->em_relids, rel->relids) &&
+                    callback(root, rel, cur_ec, cur_em, callback_arg))
+                    break;
+                cur_em = NULL;
+            }
+
+            if (!cur_em)
+                continue;
+
+            /*
+             * Found our match.  Scan the other EC members and attempt to generate
+             * joinclauses.
+             */
+            foreach(lc2, cur_ec->ec_members)
+            {
+                EquivalenceMember *other_em = (EquivalenceMember *) lfirst(lc2);
+                Oid            eq_op;
+                RestrictInfo *rinfo;
+
+                if (other_em->em_is_child)
+                    continue;        /* ignore children here */
+
+                /* Make sure it'll be a join to a different rel */
+                if (other_em == cur_em ||
+                    bms_overlap(other_em->em_relids, rel->relids))
+                    continue;
+
+                /* Forget it if caller doesn't want joins to this rel */
+                if (bms_overlap(other_em->em_relids, prohibited_rels))
+                    continue;
+
+                /*
+                 * Also, if this is a child rel, avoid generating a useless join
+                 * to its parent rel(s).
+                 */
+                if (is_child_rel &&
+                    bms_overlap(parent_relids, other_em->em_relids))
+                    continue;
+
+                eq_op = select_equality_operator(cur_ec,
+                                                 cur_em->em_datatype,
+                                                 other_em->em_datatype);
+                if (!OidIsValid(eq_op))
+                    continue;
+
+                /* set parent_ec to mark as redundant with other joinclauses */
+                rinfo = create_join_clause(root, cur_ec, eq_op,
+                                           cur_em, other_em,
+                                           cur_ec);
+
+                result = lappend(result, rinfo);
+            }
+
+            /*
+             * If somehow we failed to create any join clauses, we might as well
+             * keep scanning the ECs for another match.  But if we did make any,
+             * we're done, because we don't want to return non-redundant clauses.
+             */
+            if (result)
+                break;
+        }
+        return result;
+    }
+
     foreach(lc1, root->eq_classes)
     {
         EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
@@ -2354,6 +2519,23 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
 {
     ListCell   *lc1;

+    if (root->eq_index != NULL)
+    {
+        Bitmapset *rel1_ecs = NULL;
+        Bitmapset *rel2_ecs = NULL;
+
+        int i = -1;
+        while ((i = bms_next_member(rel1->relids, i)) > 0)
+            rel1_ecs = bms_add_members(rel1_ecs, root->eq_index->ei_index[i]);
+
+        i = -1;
+        while ((i = bms_next_member(rel2->relids, i)) > 0)
+            rel2_ecs = bms_add_members(rel2_ecs, root->eq_index->ei_index[i]);
+
+        /* return true if there are any eclasses in common between the two relations */
+        return bms_overlap(rel1_ecs, rel2_ecs);
+    }
+
     foreach(lc1, root->eq_classes)
     {
         EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
@@ -2407,6 +2589,28 @@ has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
 {
     ListCell   *lc1;

+    if (root->eq_index != NULL)
+    {
+        Bitmapset       *matched_ecs = NULL;
+        int                i;
+
+        Assert((root->eq_index->ei_flags & EQUIVALANCE_IDX_MULTI_MEMBER));
+
+        i = -1;
+        while ((i = bms_next_member(rel1->relids, i)) > 0)
+            matched_ecs = bms_add_members(matched_ecs, root->eq_index->ei_index[i]);
+
+        i = -1;
+
+        while ((i = bms_next_member(matched_ecs, i)) >= 0)
+        {
+            EquivalenceClass *ec = root->eq_index->ei_classes[i];
+            if (!bms_is_subset(ec->ec_relids, rel1->relids))
+                return true;
+        }
+        return false;
+    }
+
     foreach(lc1, root->eq_classes)
     {
         EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
@@ -2556,3 +2760,76 @@ is_redundant_with_indexclauses(RestrictInfo *rinfo, List *indexclauses)

     return false;
 }
+
+EquivalenceClassIndex *
+create_eclass_index(PlannerInfo *root, int flags)
+{
+    EquivalenceClassIndex *ec_index = makeNode(EquivalenceClassIndex);
+    EquivalenceClass      **ei_classes;
+    Bitmapset              **ei_index;
+    ListCell *lc;
+    int i;
+
+    ec_index->ei_classes = ei_classes = palloc(sizeof(EquivalenceClass *) *
+        list_length(root->eq_classes));
+    ec_index->ei_index = ei_index = palloc0(sizeof(Bitmapset *) *
+        root->simple_rel_array_size);
+    ec_index->ei_flags = flags;
+
+    /*
+     * Index the eclass list so that we can quickly identify eclasses
+     * belonging to a given relation.  First we build an array to store
+     * each eclass, this allows us to quickly lookup the eclass by array
+     * index.  We then build an index using a Bitmapset for each relation to
+     * mark which eclass array element contains a member for that relation.
+     */
+    i = 0;
+    foreach(lc, root->eq_classes)
+        ei_classes[i++] = lfirst(lc);
+
+    /*
+     * We could build the indexes in the loop above, but looping backwards
+     * allows the Bitmapset to be allocated to its final size on the first
+     * allocation.  Doing this forward could cause small incremental
+     * allocations which can be slower.
+     */
+    for (i--; i >= 0; i--)
+    {
+        int relid = -1;
+
+        while ((relid = bms_next_member(ei_classes[i]->ec_allrelids, relid)) > 0)
+        {
+            EquivalenceClass  *ec = ei_classes[i];
+
+            /* Don't index classes with a const if flags mention not to. */
+            if (ec->ec_has_const && (flags & EQUIVALANCE_IDX_NON_CONST) != 0)
+                continue;
+
+            /* Skip volatile classes when not required in the index */
+            if (ec->ec_has_volatile &&
+                (flags & EQUIVALANCE_IDX_NON_VOLATILE) != 0)
+                continue;
+
+            if (list_length(ec->ec_members) <= 1 &&
+                (flags & EQUIVALANCE_IDX_MULTI_MEMBER) != 0)
+                continue;
+
+            Assert(relid < root->simple_rel_array_size);
+
+            /* mark this eclass as having a member for relid */
+            ei_index[relid] = bms_add_member(ei_index[relid], i);
+        }
+    }
+
+    return ec_index;
+}
+
+void
+free_eclass_index(EquivalenceClassIndex *eci)
+{
+    Assert(IsA(eci, EquivalenceClassIndex));
+
+    pfree(eci->ei_classes);
+    pfree(eci->ei_index);
+}
+
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 2afc3f1..a3bb646 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2433,6 +2433,9 @@ match_foreign_keys_to_quals(PlannerInfo *root)
     List       *newlist = NIL;
     ListCell   *lc;

+
+    root->eq_index = create_eclass_index(root, EQUIVALANCE_IDX_NON_VOLATILE);
+
     foreach(lc, root->fkey_list)
     {
         ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc);
@@ -2578,6 +2581,10 @@ match_foreign_keys_to_quals(PlannerInfo *root)
     }
     /* Replace fkey_list, thereby discarding any useless entries */
     root->fkey_list = newlist;
+
+    /* clean up the eclass index */
+    free_eclass_index(root->eq_index);
+    root->eq_index = NULL;
 }


diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f938925..736a1db 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -259,6 +259,7 @@ typedef enum NodeTag
     /* these aren't subclasses of Path: */
     T_EquivalenceClass,
     T_EquivalenceMember,
+    T_EquivalenceClassIndex,
     T_PathKey,
     T_PathTarget,
     T_RestrictInfo,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a008ae0..f50f98c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -263,6 +263,8 @@ struct PlannerInfo

     List       *eq_classes;        /* list of active EquivalenceClasses */

+    struct EquivalenceClassIndex *eq_index;    /* Temporary eclass index */
+
     List       *canon_pathkeys; /* list of "canonical" PathKeys */

     List       *left_join_clauses;    /* list of RestrictInfos for mergejoinable
@@ -923,6 +925,7 @@ typedef struct EquivalenceClass
     List       *ec_derives;        /* list of derived RestrictInfos */
     Relids        ec_relids;        /* all relids appearing in ec_members, except
                                  * for child members (see below) */
+    Relids        ec_allrelids;    /* all relids, including child members */
     bool        ec_has_const;    /* any pseudoconstants in ec_members? */
     bool        ec_has_volatile;    /* the (sole) member is a volatile expr */
     bool        ec_below_outer_join;    /* equivalence applies below an OJ */
@@ -933,6 +936,22 @@ typedef struct EquivalenceClass
     struct EquivalenceClass *ec_merged; /* set if merged into another EC */
 } EquivalenceClass;

+typedef struct EquivalenceClassIndex
+{
+    NodeTag        type;
+
+    EquivalenceClass      **ei_classes;    /* an array of EquivalenceClasses */
+    Bitmapset              **ei_index; /* an array, indexed by relid contining
+                                       * a bitmapset of each ei_classes item
+                                       * which has a member belonging to this
+                                       * relation */
+    int                        ei_flags;
+} EquivalenceClassIndex;
+
+#define EQUIVALANCE_IDX_NON_CONST        1
+#define EQUIVALANCE_IDX_NON_VOLATILE    2
+#define EQUIVALANCE_IDX_MULTI_MEMBER    4
+
 /*
  * If an EC contains a const and isn't below-outer-join, any PathKey depending
  * on it must be redundant, since there's only one possible value of the key.
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040335a..7c81239 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -169,6 +169,9 @@ extern bool eclass_useful_for_merging(PlannerInfo *root,
 extern bool is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist);
 extern bool is_redundant_with_indexclauses(RestrictInfo *rinfo,
                                List *indexclauses);
+extern EquivalenceClassIndex *create_eclass_index(PlannerInfo *root,
+                    int flags);
+extern void free_eclass_index(EquivalenceClassIndex *eci);

 /*
  * pathkeys.c

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Thu, 21 Feb 2019 at 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pushed that one.  I'm interested by the "POC" patch, but I agree
> that it'd take some research to show that it isn't a net negative
> for simple queries.  It sounds like you're not really interested
> in pursuing that right now?

Thanks for pushing it.

I'm still interested in the POC patch. I just knew it wasn't something
for the back branches and thought something that was would be more
important... Things having gone quiet here wasn't a good source of
inspiration to do any further work on it. It's good to hear you're
interested.

> Anyway, I rebased the POC patch up to HEAD, just in case anyone
> still wants to play with it.

Cool. Thanks.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 21 Feb 2019 at 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I rebased the POC patch up to HEAD, just in case anyone
>> still wants to play with it.

> Cool. Thanks.

I haven't done any of the performance testing that this patch
clearly needs, but just in a quick look through the code:

* I seriously dislike the usage of root->eq_classes, primarily the
underdocumented way that it means one thing in some phases of the
planner and something else in other phases.  You seem to be doing that
in hopes of saving some memory, but is the index data structure really
large enough to matter?  This scheme is certainly unlikely to continue
to work if we add any additional uses of EquivalenceClassIndexes.
So I think it might be better to pass them around as explicit
arguments and/or attach them someplace else than PlannerInfo.

* I'm also not very excited about having both a fast and slow path
in places like has_relevant_eclass_joinclause() depending on whether
the index exists or not.  IMO, if we're going to do this at all,
we should just go over to requiring the index to exist when needed,
and get rid of the slow paths.  (A possible variant to that is "build
the index structure on-demand", though you'd have to be careful about
GEQO memory management.)  Otherwise we'll forever be fighting hidden
planner-performance bugs of the form "if you call function xyz from
here, it'll be way slower than you expected".

* There's not much point in making EquivalenceClassIndex a Node type
if you're not going to wire it up to any of the Node infrastructure
(particularly outfuncs.c, which might be useful for debug purposes).
But I'm not really sure that it ought to be a distinct data structure
at all --- maybe this requirement should be more tightly integrated
with the ECs themselves?  One idea of what that might look like is to
let each base RelOptInfo have a list of associated EquivalenceClasses,
so that you'd only have to search through directly-relevant ECs when
trying to prove something.  But I'm just handwaving here.

* Spell check please, particularly EQUIVALANCE.

* Documentation of the data structure is pretty weak, eg what is
"this relation" in the comment about ei_index?  And how do you
know how long the arrays are, or what they're indexed by?  And
there's no explicit statement that ei_flags is a bitmask of those
other symbols, much less any real statement of what each flag means.

Setting the CF entry to WOA for now.  I wonder though if we should
just push it out to v13 immediately --- are you intending to do more
with it in the near future?

            regards, tom lane


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 9 Mar 2019 at 08:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Setting the CF entry to WOA for now.  I wonder though if we should
> just push it out to v13 immediately --- are you intending to do more
> with it in the near future?

Thanks a lot for going over this and providing feedback.  I put the
patch out there mostly to see if such a thing is something we might
want, and it's good to see no objections to the idea. I didn't want to
waste too much time if there was going to be some serious objections
to the idea.

This is something I'd like to look into for v13.  I think it'll be
much easier to do if we can get your List reimplementation patch in
first. That's going to allow list_nth on PlannerInfo->eq_classes to
work quickly and will remove the need for having to build an array to
store the classes, and as you mention, RelOptInfo could store a
Bitmapset to store which ECs have members for this rel.   I've
modified the CF entry to target v13.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 9 Mar 2019 at 13:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
> This is something I'd like to look into for v13.  I think it'll be
> much easier to do if we can get your List reimplementation patch in
> first. That's going to allow list_nth on PlannerInfo->eq_classes to
> work quickly and will remove the need for having to build an array to
> store the classes, and as you mention, RelOptInfo could store a
> Bitmapset to store which ECs have members for this rel.   I've
> modified the CF entry to target v13.

I started looking at this again and I came up with the attached
eclass_indexes_v2.patch.  This modifies the original patch
significantly so that we no longer build this big eclass index when we
think we might start to need it.  Instead, I've added a Bitmapset
field to RelOptInfo named "eclass_indexes".  During add_eq_member() we
just note down the PlannerInfo->eq_classes index of the eclass we're
adding to in each RelOptInfo->eclass_indexes that's involved in the
new eclass member being added.  This means we can use the Bitmapset
lookup anytime we like. That allowed me to get rid of those special
cases I had added to make use of the index when it exists.  I've now
modified all the existing code to make use of the
RelOptInfo->eclass_indexes field.

As mentioned above, my thoughts on this patch were that this new
method would only be any good once we got Tom's list implementation
patch as that makes list_nth() O(1) instead of O(N).  On benchmarking
this I was quite surprised that it still improves performance quite a
bit even without the list reimplementation patch.

Using Tom's test case [1], I get the following:

* master (82a5649fb)
latency average = 6992.320 ms
latency average = 6990.297 ms
latency average = 7030.619 ms

* master + eclass_indexes_v2
latency average = 2537.536 ms
latency average = 2530.824 ms
latency average = 2543.770 ms

If I add in Tom's list reimplementation too, I get:

* master + Tom's list reimplementation v3 + eclass_indexes
latency average = 1225.910 ms
latency average = 1209.565 ms
latency average = 1207.326 ms

I really didn't expect just this patch alone to speed this up as it
calls list_nth() in a loop.  I can only imagine that with this
workload that these list_nth() loops are not performing many loops,
otherwise, the performance would die off quickly. I thought about how
we could defend against workloads where there's a large number of
eclasses and most match a given relation.  I ended up with a small
function named list_skip_forward() that simply keeps looping for N
times eating N ListCells into the given ListCell.  This does improve
performance a little bit more, but probably, more importantly, should
be a good defence against the worst case.  As is, the function would
become completely redundant with Tom's list reimplementation patch, so
for now, I just snuck it in as a static function in equivclass.c.

I've attached this list_skip_forward.patch too.  This patch is a bit
rough around the edges as I only just started playing with it in the
past 30 mins or so.  Perhaps it shows that we might be able to fix
this in PG12 and not have to wait for the list reimplementation at
all.

The performance with both attached patches is:

* master + eclass_indexes + list_skip_forward_v0
latency average = 2375.383 ms
latency average = 2351.450 ms
latency average = 2339.259 ms

Still nowhere near as good as with the list reimplementation patch,
but way better than master.

[1] https://www.postgresql.org/message-id/6970.1545327857@sss.pgh.pa.us

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sun, 10 Mar 2019 at 21:34, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I started looking at this again and I came up with the attached
> eclass_indexes_v2.patch.

The CF bot didn't like v2. It warned about an uninitialized variable.
My compiler didn't.

Here's v3, hopefully with that fixed.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ eclass_indexes_v3.patch ]

I looked at this for a little bit.  I'm on board with the basic idea
of assigning integer indexes to the ECs and keeping bitmapsets of
relevant EC indexes in RelOptInfos.  However ... man, is that
delete_eclass() thing ugly.  And slow, and fragile-feeling.

I think that maybe we could improve that situation by not trying to
maintain the RelOptInfo.eclass_indexes sets at all during the initial
construction of the ECs, but only setting them up after we've completed
doing that.  We already have an assumption that we know when EC merging
is done and it's safe to make pathkeys that reference the "canonical"
(merged) ECs, so it seems like that would be a point where we could
build the indexing bitmapsets safely.  This hinges on not needing the
bitmapsets till after that point, but it'd be easy to arrange some
Asserts verifying that we don't try to use them before that.

If that doesn't work (because we need the index info sooner), maybe we
could consider never removing ECs from eq_classes, so that their indices
never change, then teaching most/all of the loops over eq_classes to
ignore entries with non-null ec_merged.  But we probably want the indexing
bitmapsets to reflect canonical EC numbers, so we'd still have to do some
updating I fear -- or else be prepared to chase up the ec_merged links
when using the index bitmaps.

Stepping back a little bit, I wonder whether now is the time to rethink
the overall EC data structure, as foreseen in this old comment:

 * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
 * problem, for which there exist better data structures than simple lists.
 * If this code ever proves to be a bottleneck then it could be sped up ---
 * but for now, simple is beautiful.

I'm not very sure what a better data structure would really look like ---
after perusing Sedgewick's section on UNION-FIND, it seems like the
ec_merged links are more or less what he's recommending anyway, and the
expensive part of this is probably all the equal() tests to find whether
two proposed EC member expressions are already in the set of known
expressions.  So I'm just handwaving here.  But my point is that we
needn't feel wedded to the idea of keeping the ECs in a list, if we can
think of some better data structure for them.

            regards, tom lane


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
Thanks for looking at this.

On Mon, 18 Mar 2019 at 10:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked at this for a little bit.  I'm on board with the basic idea
> of assigning integer indexes to the ECs and keeping bitmapsets of
> relevant EC indexes in RelOptInfos.  However ... man, is that
> delete_eclass() thing ugly.  And slow, and fragile-feeling.

Yeah. When I discovered we remove eclasses from the list I was a
little annoyed as the code was pretty simple before having to account
for that.  I'll grant you ugly and slow, but I don't quite see the
fragile part.

> I think that maybe we could improve that situation by not trying to
> maintain the RelOptInfo.eclass_indexes sets at all during the initial
> construction of the ECs, but only setting them up after we've completed
> doing that.  We already have an assumption that we know when EC merging
> is done and it's safe to make pathkeys that reference the "canonical"
> (merged) ECs, so it seems like that would be a point where we could
> build the indexing bitmapsets safely.  This hinges on not needing the
> bitmapsets till after that point, but it'd be easy to arrange some
> Asserts verifying that we don't try to use them before that.

I don't think that's really that easily workable.  Consider, for
example, when add_child_rel_equivalences() is called, this is well
after the location I think you have in mind. Probably this function
should be modified to use the indexes, but it must also update the
indexes too.

> If that doesn't work (because we need the index info sooner), maybe we
> could consider never removing ECs from eq_classes, so that their indices
> never change, then teaching most/all of the loops over eq_classes to
> ignore entries with non-null ec_merged.  But we probably want the indexing
> bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> updating I fear -- or else be prepared to chase up the ec_merged links
> when using the index bitmaps.

That's probably a better solution. Perhaps we can continue to nullify
the ec_members, then just skip eclasses with a NIL ec_members.  I
avoided that in the patch because I was worried about what extension
might be doing, but if you think it's okay, then I can change the
patch.

> Stepping back a little bit, I wonder whether now is the time to rethink
> the overall EC data structure, as foreseen in this old comment:
>
>  * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
>  * problem, for which there exist better data structures than simple lists.
>  * If this code ever proves to be a bottleneck then it could be sped up ---
>  * but for now, simple is beautiful.

I've thought for a while now that it wouldn't be too hard to have
equal(), or some version of it return -1, 0, 1 to allow us to binary
search or build a binary search tree of nodes.  I imagine it wouldn't
be too hard to create a compact binary search tree inside an array
with indexes to the left and right sub-tree instead of pointers. That
would likely be a bit more cache friendly and allow simple
non-recursive traversals of the entire tree.  However, that does not
sound like something this patch should be doing.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Mon, 18 Mar 2019 at 14:06, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Mon, 18 Mar 2019 at 10:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > If that doesn't work (because we need the index info sooner), maybe we
> > could consider never removing ECs from eq_classes, so that their indices
> > never change, then teaching most/all of the loops over eq_classes to
> > ignore entries with non-null ec_merged.  But we probably want the indexing
> > bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> > updating I fear -- or else be prepared to chase up the ec_merged links
> > when using the index bitmaps.
>
> That's probably a better solution. Perhaps we can continue to nullify
> the ec_members, then just skip eclasses with a NIL ec_members.  I
> avoided that in the patch because I was worried about what extension
> might be doing, but if you think it's okay, then I can change the
> patch.

I've modified the patch to do it this way.

The only loop over eq_classes I saw outside of equivclass.c was in
postgres_fdw.c.  This just calls eclass_useful_for_merging() on each
EC in the list. Instead of having that loop skip deleted ECs, I
changed eclass_useful_for_merging() so that it just returns false for
that case.

The only other thing I change was to create a new function named
get_common_eclass_indexes() which removes some duplicate code where we
were getting ECs common to two relations. I also made it so this
function does not allocate unnecessary Bitmapsets when the inputs are
simple relations.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ eclass_indexes_v4.patch ]

I still don't like this approach too much.  I think we can fairly easily
construct the eclass_indexes at a higher level instead of trying to
manage them in add_eq_member, and after some hacking I have the attached.
Some notes:

* To be sure of knowing whether the indexes have been made yet, I added
a new flag variable PlannerInfo.ec_merging_done.  This seems like a
cleaner fix for the dependency between canonical-pathkey construction
and EC construction, too.  I did need to add code to set the flag in
a couple of short-circuit code paths where we never call
generate_base_implied_equalities(), though.

* I kind of want to rename generate_base_implied_equalities() to
something else, considering that it does a good bit more than just
that now.  But I haven't thought of a good name.  I considered
finalize_equivalence_classes(), but that seems like an overstatement
when it's still possible for new sort-key eclasses to get made later.

* I did not include your changes to use the indexes in
generate_join_implied_equalities[_for_ecs], because they were wrong.
generate_join_implied_equalities_for_ecs is supposed to consider only
ECs listed in the input list.  (It's troublesome that apparently we
don't have any regression tests exposing that need; we should try to
make a test case that does show it.)  There's probably some way to
still make use of the indexes there.  If nothing else, we could refactor
so that generate_join_implied_equalities isn't just a wrapper around
generate_join_implied_equalities_for_ecs but has its own looping, and
we only use the indexes in generate_join_implied_equalities not the
other entry point.  I haven't tried though.

* You mentioned postgres_fdw.c's get_useful_ecs_for_relation as
potentially affected by this change.  It looks to me like we could
nuke that function altogether in favor of having its caller scan the
foreign table's eclass_indexes.  I haven't tried that either.


I'm unsure how hard we should push to get something like this into v12.
I'm concerned that its dependency on list_nth might result in performance
regressions in some cases; it's a lot easier to believe that this will
be mostly-a-win with the better List infrastructure we're hoping to get
into v13.  If you want to keep playing with it, OK, but I'm kind of
tempted to shelve it for now.

            regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69179a0..4e8400d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2190,6 +2190,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(cte_plan_ids);
     WRITE_NODE_FIELD(multiexpr_params);
     WRITE_NODE_FIELD(eq_classes);
+    WRITE_BOOL_FIELD(ec_merging_done);
     WRITE_NODE_FIELD(canon_pathkeys);
     WRITE_NODE_FIELD(left_join_clauses);
     WRITE_NODE_FIELD(right_join_clauses);
@@ -2256,6 +2257,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
     WRITE_UINT_FIELD(pages);
     WRITE_FLOAT_FIELD(tuples, "%.0f");
     WRITE_FLOAT_FIELD(allvisfrac, "%.6f");
+    WRITE_BITMAPSET_FIELD(eclass_indexes);
     WRITE_NODE_FIELD(subroot);
     WRITE_NODE_FIELD(subplan_params);
     WRITE_INT_FIELD(rel_parallel_workers);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 61b5b11..804b211 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -64,6 +64,10 @@ static bool reconsider_outer_join_clause(PlannerInfo *root,
                              bool outer_on_left);
 static bool reconsider_full_join_clause(PlannerInfo *root,
                             RestrictInfo *rinfo);
+static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root,
+                              Relids relids);
+static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1,
+                          Relids relids2);


 /*
@@ -341,10 +345,11 @@ process_equivalence(PlannerInfo *root,

         /*
          * Case 2: need to merge ec1 and ec2.  This should never happen after
-         * we've built any canonical pathkeys; if it did, those pathkeys might
-         * be rendered non-canonical by the merge.
+         * the ECs have reached canonical state; otherwise, pathkeys could be
+         * rendered non-canonical by the merge, and relation eclass indexes
+         * would get broken by removal of an eq_classes list entry.
          */
-        if (root->canon_pathkeys != NIL)
+        if (root->ec_merging_done)
             elog(ERROR, "too late to merge equivalence classes");

         /*
@@ -743,6 +748,26 @@ get_eclass_for_sort_expr(PlannerInfo *root,

     root->eq_classes = lappend(root->eq_classes, newec);

+    /*
+     * If EC merging is already complete, we have to mop up by adding the new
+     * EC to the eclass_indexes of the relation(s) mentioned in it.
+     */
+    if (root->ec_merging_done)
+    {
+        int            ec_index = list_length(root->eq_classes) - 1;
+        int            i = -1;
+
+        while ((i = bms_next_member(newec->ec_relids, i)) > 0)
+        {
+            RelOptInfo *rel = root->simple_rel_array[i];
+
+            Assert(rel->reloptkind == RELOPT_BASEREL);
+
+            rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
+                                                 ec_index);
+        }
+    }
+
     MemoryContextSwitchTo(oldcontext);

     return newec;
@@ -800,42 +825,71 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 void
 generate_base_implied_equalities(PlannerInfo *root)
 {
+    int            ec_index;
     ListCell   *lc;
-    Index        rti;

+    /*
+     * At this point, we're done absorbing knowledge of equivalences in the
+     * query, so no further EC merging should happen, and ECs remaining in the
+     * eq_classes list can be considered canonical.  (But note that it's still
+     * possible for new single-member ECs to be added through
+     * get_eclass_for_sort_expr().)
+     */
+    root->ec_merging_done = true;
+
+    ec_index = 0;
     foreach(lc, root->eq_classes)
     {
         EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc);
+        bool        can_generate_joinclause = false;
+        int            i;

         Assert(ec->ec_merged == NULL);    /* else shouldn't be in list */
         Assert(!ec->ec_broken); /* not yet anyway... */

-        /* Single-member ECs won't generate any deductions */
-        if (list_length(ec->ec_members) <= 1)
-            continue;
+        /*
+         * Generate implied equalities that are restriction clauses.
+         * Single-member ECs won't generate any deductions, either here or at
+         * the join level.
+         */
+        if (list_length(ec->ec_members) > 1)
+        {
+            if (ec->ec_has_const)
+                generate_base_implied_equalities_const(root, ec);
+            else
+                generate_base_implied_equalities_no_const(root, ec);

-        if (ec->ec_has_const)
-            generate_base_implied_equalities_const(root, ec);
-        else
-            generate_base_implied_equalities_no_const(root, ec);
+            /* Recover if we failed to generate required derived clauses */
+            if (ec->ec_broken)
+                generate_base_implied_equalities_broken(root, ec);

-        /* Recover if we failed to generate required derived clauses */
-        if (ec->ec_broken)
-            generate_base_implied_equalities_broken(root, ec);
-    }
+            /* Detect whether this EC might generate join clauses */
+            can_generate_joinclause =
+                (bms_membership(ec->ec_relids) == BMS_MULTIPLE);
+        }

-    /*
-     * This is also a handy place to mark base rels (which should all exist by
-     * now) with flags showing whether they have pending eclass joins.
-     */
-    for (rti = 1; rti < root->simple_rel_array_size; rti++)
-    {
-        RelOptInfo *brel = root->simple_rel_array[rti];
+        /*
+         * Mark the base rels cited in each eclass (which should all exist by
+         * now) with the eq_classes indexes of all eclasses mentioning them.
+         * This will let us avoid searching in subsequent lookups.  While
+         * we're at it, we can mark base rels that have pending eclass joins;
+         * this is a cheap version of has_relevant_eclass_joinclause().
+         */
+        i = -1;
+        while ((i = bms_next_member(ec->ec_relids, i)) > 0)
+        {
+            RelOptInfo *rel = root->simple_rel_array[i];

-        if (brel == NULL)
-            continue;
+            Assert(rel->reloptkind == RELOPT_BASEREL);
+
+            rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
+                                                 ec_index);
+
+            if (can_generate_joinclause)
+                rel->has_eclass_joins = true;
+        }

-        brel->has_eclass_joins = has_relevant_eclass_joinclause(root, brel);
+        ec_index++;
     }
 }

@@ -2037,12 +2091,23 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
     Index        var2varno = fkinfo->ref_relid;
     AttrNumber    var2attno = fkinfo->confkey[colno];
     Oid            eqop = fkinfo->conpfeqop[colno];
+    RelOptInfo *rel1 = root->simple_rel_array[var1varno];
+    RelOptInfo *rel2 = root->simple_rel_array[var2varno];
     List       *opfamilies = NIL;    /* compute only if needed */
-    ListCell   *lc1;
-
-    foreach(lc1, root->eq_classes)
+    Bitmapset  *matching_ecs;
+    int            i;
+
+    /* Consider only eclasses mentioning both relations */
+    Assert(root->ec_merging_done);
+    Assert(IS_SIMPLE_REL(rel1));
+    Assert(IS_SIMPLE_REL(rel2));
+    matching_ecs = bms_intersect(rel1->eclass_indexes,
+                                 rel2->eclass_indexes);
+
+    i = -1;
+    while ((i = bms_next_member(matching_ecs, i)) >= 0)
     {
-        EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
         bool        item1member = false;
         bool        item2member = false;
         ListCell   *lc2;
@@ -2052,14 +2117,6 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
             continue;
         /* Note: it seems okay to match to "broken" eclasses here */

-        /*
-         * If eclass visibly doesn't have members for both rels, there's no
-         * need to grovel through the members.
-         */
-        if (!bms_is_member(var1varno, ec->ec_relids) ||
-            !bms_is_member(var2varno, ec->ec_relids))
-            continue;
-
         foreach(lc2, ec->ec_members)
         {
             EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2119,11 +2176,19 @@ add_child_rel_equivalences(PlannerInfo *root,
                            RelOptInfo *parent_rel,
                            RelOptInfo *child_rel)
 {
-    ListCell   *lc1;
+    int            i;

-    foreach(lc1, root->eq_classes)
+    /*
+     * EC merging should be complete already, so we can use the parent rel's
+     * eclass_indexes to avoid searching all of root->eq_classes.
+     */
+    Assert(root->ec_merging_done);
+    Assert(IS_SIMPLE_REL(parent_rel));
+
+    i = -1;
+    while ((i = bms_next_member(parent_rel->eclass_indexes, i)) >= 0)
     {
-        EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
         ListCell   *lc2;

         /*
@@ -2134,14 +2199,6 @@ add_child_rel_equivalences(PlannerInfo *root,
         if (cur_ec->ec_has_volatile)
             continue;

-        /*
-         * No point in searching if parent rel not mentioned in eclass; but we
-         * can't tell that for sure if parent rel is itself a child.
-         */
-        if (parent_rel->reloptkind == RELOPT_BASEREL &&
-            !bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
-            continue;
-
         foreach(lc2, cur_ec->ec_members)
         {
             EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
@@ -2191,6 +2248,14 @@ add_child_rel_equivalences(PlannerInfo *root,
             }
         }
     }
+
+    /*
+     * The child is now mentioned in all the same eclasses as its parent ---
+     * except for corner cases such as a volatile EC.  But it's okay if
+     * eclass_indexes lists too many rels, so just borrow the parent's index
+     * set rather than making a new one.
+     */
+    child_rel->eclass_indexes = parent_rel->eclass_indexes;
 }


@@ -2227,7 +2292,10 @@ generate_implied_equalities_for_column(PlannerInfo *root,
     List       *result = NIL;
     bool        is_child_rel = (rel->reloptkind == RELOPT_OTHER_MEMBER_REL);
     Relids        parent_relids;
-    ListCell   *lc1;
+    int            i;
+
+    /* Should be OK to rely on eclass_indexes */
+    Assert(root->ec_merging_done);

     /* Indexes are available only on base or "other" member relations. */
     Assert(IS_SIMPLE_REL(rel));
@@ -2238,9 +2306,10 @@ generate_implied_equalities_for_column(PlannerInfo *root,
     else
         parent_relids = NULL;    /* not used, but keep compiler quiet */

-    foreach(lc1, root->eq_classes)
+    i = -1;
+    while ((i = bms_next_member(rel->eclass_indexes, i)) >= 0)
     {
-        EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
         EquivalenceMember *cur_em;
         ListCell   *lc2;

@@ -2252,14 +2321,6 @@ generate_implied_equalities_for_column(PlannerInfo *root,
             continue;

         /*
-         * No point in searching if rel not mentioned in eclass (but we can't
-         * tell that for a child rel).
-         */
-        if (!is_child_rel &&
-            !bms_is_subset(rel->relids, cur_ec->ec_relids))
-            continue;
-
-        /*
          * Scan members, looking for a match to the target column.  Note that
          * child EC members are considered, but only when they belong to the
          * target relation.  (Unlike regular members, the same expression
@@ -2352,11 +2413,16 @@ bool
 have_relevant_eclass_joinclause(PlannerInfo *root,
                                 RelOptInfo *rel1, RelOptInfo *rel2)
 {
-    ListCell   *lc1;
+    Bitmapset  *matching_ecs;
+    int            i;

-    foreach(lc1, root->eq_classes)
+    /* Examine only eclasses mentioning both rel1 and rel2 */
+    matching_ecs = get_common_eclass_indexes(root, rel1->relids, rel2->relids);
+
+    i = -1;
+    while ((i = bms_next_member(matching_ecs, i)) >= 0)
     {
-        EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);

         /*
          * Won't generate joinclauses if single-member (this test covers the
@@ -2384,6 +2450,10 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
          * b.y and a.x = 42", it is worth considering a join between a and b,
          * since the join result is likely to be small even though it'll end
          * up being an unqualified nestloop.
+         *
+         * The bms_overlap tests here are normally redundant, given the work
+         * done by get_common_eclass_indexes, but we keep them in case either
+         * rel's eclass_indexes contains extra entries.
          */
         if (bms_overlap(rel1->relids, ec->ec_relids) &&
             bms_overlap(rel2->relids, ec->ec_relids))
@@ -2405,11 +2475,16 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
 bool
 has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
 {
-    ListCell   *lc1;
+    Bitmapset  *matched_ecs;
+    int            i;

-    foreach(lc1, root->eq_classes)
+    /* Examine only eclasses mentioning rel1 */
+    matched_ecs = get_eclass_indexes_for_relids(root, rel1->relids);
+
+    i = -1;
+    while ((i = bms_next_member(matched_ecs, i)) >= 0)
     {
-        EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
+        EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);

         /*
          * Won't generate joinclauses if single-member (this test covers the
@@ -2556,3 +2631,54 @@ is_redundant_with_indexclauses(RestrictInfo *rinfo, List *indexclauses)

     return false;
 }
+
+/*
+ * get_eclass_indexes_for_relids
+ *        Build and return a Bitmapset containing the indexes into root's
+ *        eq_classes list for all eclasses that mention any of these relids
+ */
+static Bitmapset *
+get_eclass_indexes_for_relids(PlannerInfo *root, Relids relids)
+{
+    Bitmapset  *ec_indexes = NULL;
+    int            i = -1;
+
+    /* Should be OK to rely on eclass_indexes */
+    Assert(root->ec_merging_done);
+
+    while ((i = bms_next_member(relids, i)) > 0)
+    {
+        RelOptInfo *rel = root->simple_rel_array[i];
+
+        ec_indexes = bms_add_members(ec_indexes, rel->eclass_indexes);
+    }
+    return ec_indexes;
+}
+
+/*
+ * get_common_eclass_indexes
+ *        Build and return a Bitmapset containing the indexes into root's
+ *        eq_classes list for all eclasses that mention rels in both
+ *        relids1 and relids2.
+ */
+static Bitmapset *
+get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2)
+{
+    Bitmapset  *rel1ecs;
+    Bitmapset  *rel2ecs;
+    int            relid;
+
+    rel1ecs = get_eclass_indexes_for_relids(root, relids1);
+
+    /*
+     * We can get away with just using the relation's eclass_indexes directly
+     * when relids2 is a singleton set.
+     */
+    if (bms_get_singleton_member(relids2, &relid))
+        rel2ecs = root->simple_rel_array[relid]->eclass_indexes;
+    else
+        rel2ecs = get_eclass_indexes_for_relids(root, relids2);
+
+    /* Calculate and return the common EC indexes, recycling the left input. */
+    return bms_int_members(rel1ecs, rel2ecs);
+}
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 56d839b..c34380f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -42,9 +42,7 @@ static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
  *      entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
- * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
- * equivclass.c will complain if a merge occurs after root->canon_pathkeys
- * has become nonempty.)
+ * merging EquivalenceClasses.
  */
 PathKey *
 make_canonical_pathkey(PlannerInfo *root,
@@ -55,6 +53,10 @@ make_canonical_pathkey(PlannerInfo *root,
     ListCell   *lc;
     MemoryContext oldcontext;

+    /* Can't make canonical pathkeys if the set of ECs might still change */
+    if (!root->ec_merging_done)
+        elog(ERROR, "too soon to build canonical pathkeys");
+
     /* The passed eclass might be non-canonical, so chase up to the top */
     while (eclass->ec_merged)
         eclass = eclass->ec_merged;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..9fc1b24 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -142,6 +142,12 @@ query_planner(PlannerInfo *root, List *tlist,
                 set_cheapest(final_rel);

                 /*
+                 * We don't need to run generate_base_implied_equalities, but
+                 * we do need to pretend that EC merging is complete.
+                 */
+                root->ec_merging_done = true;
+
+                /*
                  * We still are required to call qp_callback, in case it's
                  * something like "SELECT 2+2 ORDER BY 1".
                  */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e408e77..0a0fc97 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -620,6 +620,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     root->cte_plan_ids = NIL;
     root->multiexpr_params = NIL;
     root->eq_classes = NIL;
+    root->ec_merging_done = false;
     root->append_rel_list = NIL;
     root->rowMarks = NIL;
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index aebe162..5d4aa60 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -886,6 +886,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     subroot->cte_plan_ids = NIL;
     subroot->multiexpr_params = NIL;
     subroot->eq_classes = NIL;
+    subroot->ec_merging_done = false;
     subroot->append_rel_list = NIL;
     subroot->rowMarks = NIL;
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eb815c2..3a5482b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -121,6 +121,15 @@ plan_set_operations(PlannerInfo *root)
     Assert(parse->distinctClause == NIL);

     /*
+     * In the outer query level, we won't have any true equivalences to deal
+     * with; but we do want to be able to make pathkeys, which will require
+     * single-member EquivalenceClasses.  Indicate that EC merging is complete
+     * so that pathkeys.c won't complain.
+     */
+    Assert(root->eq_classes == NIL);
+    root->ec_merging_done = true;
+
+    /*
      * We'll need to build RelOptInfos for each of the leaf subqueries, which
      * are RTE_SUBQUERY rangetable entries in this Query.  Prepare the index
      * arrays for that.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..4a3a4e3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -178,6 +178,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
     rel->pages = 0;
     rel->tuples = 0;
     rel->allvisfrac = 0;
+    rel->eclass_indexes = NULL;
     rel->subroot = NULL;
     rel->subplan_params = NIL;
     rel->rel_parallel_workers = -1; /* set up in get_relation_info */
@@ -600,6 +601,7 @@ build_join_rel(PlannerInfo *root,
     joinrel->pages = 0;
     joinrel->tuples = 0;
     joinrel->allvisfrac = 0;
+    joinrel->eclass_indexes = NULL;
     joinrel->subroot = NULL;
     joinrel->subplan_params = NIL;
     joinrel->rel_parallel_workers = -1;
@@ -779,6 +781,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
     joinrel->pages = 0;
     joinrel->tuples = 0;
     joinrel->allvisfrac = 0;
+    joinrel->eclass_indexes = NULL;
     joinrel->subroot = NULL;
     joinrel->subplan_params = NIL;
     joinrel->serverid = InvalidOid;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 253e0b7..bdf43ae 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -265,6 +265,8 @@ struct PlannerInfo

     List       *eq_classes;        /* list of active EquivalenceClasses */

+    bool        ec_merging_done;    /* set true once ECs are canonical */
+
     List       *canon_pathkeys; /* list of "canonical" PathKeys */

     List       *left_join_clauses;    /* list of RestrictInfos for mergejoinable
@@ -495,6 +497,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *        pages - number of disk pages in relation (zero if not a table)
  *        tuples - number of tuples in relation (not considering restrictions)
  *        allvisfrac - fraction of disk pages that are marked all-visible
+ *        eclass_indexes - EquivalenceClasses that mention this rel (filled
+ *                         only after EC merging is complete)
  *        subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *        subplan_params - list of PlannerParamItems to be passed to subquery
  *
@@ -668,6 +672,8 @@ typedef struct RelOptInfo
     BlockNumber pages;            /* size estimates derived from pg_class */
     double        tuples;
     double        allvisfrac;
+    Bitmapset  *eclass_indexes; /* Indexes in PlannerInfo's eq_classes list of
+                                 * ECs that mention this rel */
     PlannerInfo *subroot;        /* if subquery */
     List       *subplan_params; /* if subquery */
     int            rel_parallel_workers;    /* wanted number of parallel workers */

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
Thanks for having a hack at this.

On Fri, 22 Mar 2019 at 10:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm unsure how hard we should push to get something like this into v12.
> I'm concerned that its dependency on list_nth might result in performance
> regressions in some cases; it's a lot easier to believe that this will
> be mostly-a-win with the better List infrastructure we're hoping to get
> into v13.  If you want to keep playing with it, OK, but I'm kind of
> tempted to shelve it for now.

Yeah,  mentioned a similar concern above.  The best I could come up
with to combat it was the list_skip_forward function. It wasn't
particularly pretty and was only intended as a stop-gap until List
become array-based. I think it should stop any regression.  I'm okay
with waiting until we get array based Lists, if you think that's best,
but it's a bit sad to leave this regression in yet another major
release.

However, there's always a danger we find some show-stopper with your
list reimplementation patch, in which case I wouldn't really like to
be left with list_skip_forward() in core.

If there's any consensus we want this for v12, then I'll happily look
over your patch, otherwise, I'll look sometime before July.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance issue in foreign-key-aware join estimation

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Fri, 22 Mar 2019 at 10:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm unsure how hard we should push to get something like this into v12.
>> I'm concerned that its dependency on list_nth might result in performance
>> regressions in some cases; ...

> However, there's always a danger we find some show-stopper with your
> list reimplementation patch, in which case I wouldn't really like to
> be left with list_skip_forward() in core.

We could always do something like what we've already done with
simple_rel_array and simple_rte_array, ie, replace the eq_classes
List with a manually-managed pointer array.  Given the small number
of places that touch that list, it wouldn't be too awful --- but
still, I'd only consider that if the List-reimplementation patch
goes down in flames.

            regards, tom lane


Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Fri, 22 Mar 2019 at 10:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > [ eclass_indexes_v4.patch ]
>
> I still don't like this approach too much.  I think we can fairly easily
> construct the eclass_indexes at a higher level instead of trying to
> manage them in add_eq_member, and after some hacking I have the attached.

I've rebased this on top of the pgindent changes (attached) and looked over it.

The only problem I see is that you're not performing a bms_copy() of
the parent's eclass_indexes in add_child_rel_equivalences(). You've
written a comment that claims it's fine to just point to the parent's
in:

/*
* The child is now mentioned in all the same eclasses as its parent ---
* except for corner cases such as a volatile EC.  But it's okay if
* eclass_indexes lists too many rels, so just borrow the parent's index
* set rather than making a new one.
*/
child_rel->eclass_indexes = parent_rel->eclass_indexes;

but that's not true since in get_eclass_for_sort_expr() we perform:

rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
ec_index);

which will work fine about in about 63 out of 64 cases, but once
bms_add_member has to reallocate the set then it'll leave the child
rel's eclass_indexes pointing to freed memory.  I'm not saying I have
any reproducible test case that can crash the backend from this, but
it does seem like a bug waiting to happen.

Apart from that, as far as I can tell, the patch seems fine.

I didn't add the bms_copy(). I'll wait for your comments before doing so.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sat, 25 May 2019 at 16:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
> The only problem I see is that you're not performing a bms_copy() of
> the parent's eclass_indexes in add_child_rel_equivalences(). You've
> written a comment that claims it's fine to just point to the parent's
> in:
>
> /*
> * The child is now mentioned in all the same eclasses as its parent ---
> * except for corner cases such as a volatile EC.  But it's okay if
> * eclass_indexes lists too many rels, so just borrow the parent's index
> * set rather than making a new one.
> */
> child_rel->eclass_indexes = parent_rel->eclass_indexes;
>
> but that's not true since in get_eclass_for_sort_expr() we perform:
>
> rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
> ec_index);
>
> which will work fine about in about 63 out of 64 cases, but once
> bms_add_member has to reallocate the set then it'll leave the child
> rel's eclass_indexes pointing to freed memory.  I'm not saying I have
> any reproducible test case that can crash the backend from this, but
> it does seem like a bug waiting to happen.
>
> Apart from that, as far as I can tell, the patch seems fine.
>
> I didn't add the bms_copy(). I'll wait for your comments before doing so.

I've rebased this on top of the current master. d25ea0127 conflicted
with the old version.

I also tried to get the planner to crash by trying to find a case
where a new eclass is added after setting the child relations
eclass_indexes. I thought this could be done via a call to
make_pathkey_from_sortinfo(), but I was unable to find any case that
passes create_it as true. I added some code to emit a warning when
this happens after a call to add_child_rel_equivalences() and found
that the warning wasn't raised after running make check. However, I'm
still pretty scared by not making a copy of the Bitmapset. It seems
like if anything ever changed in this area then we could end up with
some very rare crashes if the parent's eclass_indexes grew another
bitmapword since the child took it's copy of them, so I added the
bms_copy() in the attached version.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Sun, 16 Jun 2019 at 19:42, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> I've rebased this on top of the current master. d25ea0127 conflicted
> with the old version.

... and again, per recent conflicting change in equivclass.c

I've also taken a fresh set of performance benchmarks since 1cff1b95a
has recently changed the list.c implementation to use arrays instead
of singly-linked-lists.

I've attached 2 patched:

0001: Is the rebased version of eclass_indexes_v7.patch.
0002: Is new and goes even further to improve performance.

Using schema.sql and query.sql from
https://postgr.es/m/6970.1545327857%40sss.pgh.pa.us I get:

master @ 21039555

postgres=# \i query.sql
Time: 5078.105 ms (00:05.078)
Time: 5279.733 ms (00:05.280)
Time: 5375.766 ms (00:05.376)
Time: 5382.716 ms (00:05.383)

master + 0001:

postgres=# \i query.sql
Time: 2116.394 ms (00:02.116)
Time: 2076.883 ms (00:02.077)
Time: 2142.237 ms (00:02.142)
Time: 2199.468 ms (00:02.199)

(2.47x faster than master)

Per what Tom mentioned in
https://postgr.es/m/16252.1553202606@sss.pgh.pa.us about
generate_join_implied_equalities[_for_ecs].   Since
generate_join_implied_equalities() is still quite an overhead in
profiles, it seems to make sense to special purpose this function
rather than have it call generate_join_implied_equalities_for_ecs()
and pass the root->eq_classes list.  Passing the list means we can't
use the new ec_indexes Bitmapset, so can get no benefit of the
improved EC lookup method.

Since generate_join_implied_equalities_for_ecs() is fairly short, I
don't think it's all that bad to keep another slightly altered copy of
it.  Especially given the following performance results from doing so:

master + 0001 + 0002:

postgres=# \i query.sql
Time: 1308.742 ms (00:01.309)
Time: 1294.766 ms (00:01.295)
Time: 1293.113 ms (00:01.293)
Time: 1300.643 ms (00:01.301)

(4.06x faster than master)

Unless there's some objection, I'll be looking into pushing both 0001
and 0002 in a single commit in the next few days.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Thu, 18 Jul 2019 at 19:24, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Unless there's some objection, I'll be looking into pushing both 0001
> and 0002 in a single commit in the next few days.

I've pushed this after doing a bit of final tweaking.

After a re-read, I didn't really like all the code that rechecked that
ec->ec_relids matched the relation we're searching for.  The only code
that seems to be able to put additional members into eclass_indexes
that there's no mention of in the actual class was in
add_child_rel_equivalences(). The comments for the definition of
eclass_indexes said nothing about there being a possibility of the
field containing an index for an EC that knows nothing about the given
relation.   Fixing that either meant fixing the comment to say that
"they *may* contain", or fixing up the code so that it's strict about
what ECs can be mentioned in eclass_indexes.  I went with the fixing
the code option since it also allows us to get rid of some redundant
checks, to which I turned into Asserts() to catch any possible future
bugs that might be introduced by any code that might one day remove
rels from an EC, e.g something like
https://commitfest.postgresql.org/23/1712/

I also did some performance tests on the most simple query I could
think of that uses eclasses.

select2.sql: SELECT * FROM t1 INNER JOIN t2 ON t1.a=t2.a

Master:

$ pgbench -n -f select2.sql -T 60 postgres
tps = 12143.597276 (excluding connections establishing)
tps = 12100.773839 (excluding connections establishing)
tps = 12086.209389 (excluding connections establishing)
tps = 12098.194927 (excluding connections establishing)
tps = 12105.140058 (excluding connections establishing)

Patched:

$ pgbench -n -f select2.sql -T 60 postgres
tps = 12224.597530 (excluding connections establishing)
tps = 12097.286522 (excluding connections establishing)
tps = 12035.306729 (excluding connections establishing)
tps = 11965.848289 (excluding connections establishing)
tps = 12059.846474 (excluding connections establishing)

There's a bit of noise there, but on average we're just 0.25% slower
on the worse case and the benchmarks shown above put us ~406% better
on with the fairly complex query that Tom posted in the initial email
on this thread.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance issue in foreign-key-aware join estimation

From
Andreas Seltenreich
Date:
David Rowley writes:

> On Thu, 18 Jul 2019 at 19:24, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Unless there's some objection, I'll be looking into pushing both 0001
>> and 0002 in a single commit in the next few days.
>
> I've pushed this after doing a bit of final tweaking.

sqlsmith triggers an assertion in this commit (3373c7155350):

TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: "equivclass.c", Line: 764)

Here's a somewhat reduced testcase to be run on the regression db:

--8<---------------cut here---------------start------------->8---
select
    max(date_mii(now()::date, 42)) over (partition by subq_1.c9 order by c3),
    min(c3) over (partition by subq_1.c8 )
from
  (select 1 as c3 from public.partr_def2 as ref_0
        left join public.num_exp_power_10_ln as sample_0
        on (ref_0.a = sample_0.id ) ) as subq_0
    right join (select 1 as c8, 1 as c9) as subq_1
    on (true);
--8<---------------cut here---------------end--------------->8---

Backtrace below.

regards,
Andreas


(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007face8834535 in __GI_abort () at abort.c:79
#2  0x0000562b8d2731a3 in ExceptionalCondition (
    conditionName=conditionName@entry=0x562b8d418320 "!(rel->reloptkind == RELOPT_BASEREL)",
    errorType=errorType@entry=0x562b8d2c601d "FailedAssertion", fileName=fileName@entry=0x562b8d418190 "equivclass.c",
    lineNumber=lineNumber@entry=764) at assert.c:54
#3  0x0000562b8d067ddc in get_eclass_for_sort_expr (root=root@entry=0x562b8e9077c8, expr=expr@entry=0x7facdf610030,
    nullable_relids=0x7facdf6216f8, nullable_relids@entry=0x7facdf615560, opfamilies=0x7facdf621348,
    opcintype=opcintype@entry=2277, collation=collation@entry=0, sortref=1, rel=0x0, create_it=true) at
equivclass.c:764
#4  0x0000562b8d07326e in make_pathkey_from_sortinfo (root=0x562b8e9077c8, expr=0x7facdf610030,
nullable_relids=0x7facdf615560,
    opfamily=397, opcintype=2277, collation=0, reverse_sort=false, nulls_first=false, sortref=1, rel=0x0,
create_it=true)
    at pathkeys.c:215
#5  0x0000562b8d07401f in make_pathkey_from_sortop (create_it=true, sortref=1, nulls_first=false, ordering_op=1072,
    nullable_relids=0x7facdf615560, expr=0x7facdf610030, root=0x562b8e9077c8) at pathkeys.c:258
#6  make_pathkeys_for_sortclauses (root=root@entry=0x562b8e9077c8, sortclauses=sortclauses@entry=0x7facdf620f98,
    tlist=tlist@entry=0x562b8e929768) at pathkeys.c:1086
#7  0x0000562b8d082533 in make_pathkeys_for_window (root=root@entry=0x562b8e9077c8, tlist=0x562b8e929768, wc=<optimized
out>,
    wc=<optimized out>) at planner.c:5642
#8  0x0000562b8d087c81 in create_one_window_path (wflists=<optimized out>, activeWindows=<optimized out>,
    output_target=<optimized out>, input_target=<optimized out>, path=0x7facdf620ea8, window_rel=<optimized out>,
    root=<optimized out>) at planner.c:4663
#9  create_window_paths (activeWindows=<optimized out>, wflists=0x7facdf613b80, output_target_parallel_safe=<optimized
out>,
    output_target=0x7facdf6205b8, input_target=0x7facdf6206f8, input_rel=<optimized out>, root=0x562b8e9077c8) at
planner.c:4588
#10 grouping_planner (root=<optimized out>, inheritance_update=false, tuple_fraction=<optimized out>) at
planner.c:2211
#11 0x0000562b8d089dfa in subquery_planner (glob=glob@entry=0x562b8e91bb20, parse=parse@entry=0x562b8e906740,
    parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0)
    at planner.c:1013
#12 0x0000562b8d08afa7 in standard_planner (parse=0x562b8e906740, cursorOptions=256, boundParams=<optimized out>) at
planner.c:406



Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Mon, 22 Jul 2019 at 00:44, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> sqlsmith triggers an assertion in this commit (3373c7155350):
>
> TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: "equivclass.c", Line: 764)

Thanks for the report.

It looks like this is caused by the join removal code removing the
LEFT JOIN and leaving a dead rel in the eclasses ec_relids.  The fix
is likely either to adjust the Assert to allow that or to add an if
test so that we only bother calling bms_add_member for base rels.  I'm
not quite sure yet.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance issue in foreign-key-aware join estimation

From
David Rowley
Date:
On Mon, 22 Jul 2019 at 01:50, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Mon, 22 Jul 2019 at 00:44, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> > sqlsmith triggers an assertion in this commit (3373c7155350):
> >
> > TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: "equivclass.c", Line: 764)
>
> Thanks for the report.
>
> It looks like this is caused by the join removal code removing the
> LEFT JOIN and leaving a dead rel in the eclasses ec_relids.  The fix
> is likely either to adjust the Assert to allow that or to add an if
> test so that we only bother calling bms_add_member for base rels.  I'm
> not quite sure yet.

I ended up adjusting the Assert to allow dead rels too.  I thought
about adding an if test so we only do the bms_add_member for base
rels, but I didn't really like the special case where eclass_indexes
wouldn't be correctly set for dead rels. I had thoughts that dead rels
were not common enough to go to additional trouble over.  That's
debatable of course.  I also thought about removing the Assert
completely, but it does help verify we don't get anything unexpected
in ec_relids.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services