Re: Join Bad Performance on different data types - Mailing list pgsql-general

From Victor Yegorov
Subject Re: Join Bad Performance on different data types
Date
Msg-id CAGnEbogBaLbxfQPOhy_FUc8ZuLuM5Xtd+SJO3D5vygeyfRW7tw@mail.gmail.com
Whole thread Raw
In response to Join Bad Performance on different data types  (Adarsh Sharma <eddy.adarsh@gmail.com>)
List pgsql-general
2014-03-05 6:53 GMT+02:00 Adarsh Sharma <eddy.adarsh@gmail.com>:
Anyhow working on finding the root cause.

I would do the following:

1. Check your `default_statistics_target`, like:

        SHOW default_statistics_target;

2. Your previously posted `EXPLAIN ANALYZE` with `set enable_seqscan=off` shows
    a significant skew in the estiamted number of rows. I would really bump up the
    `default_statistics_target` higher (modify your `postgresql.conf` and reload),
    up to 500, re-analyzed all tables and checked the performance again.

    If you don't want to do it for the whole cluster, then increase for individual columns.
    I think increasing it up to 500 (or higher) for the columns containing arrays will be
    beneficial in any case:

        ALTER TABLE graph5 ALTER detail_id SET STATISTICS 1000;
        ALTER TABLE graph3 ALTER id_list  SET STATISTICS 1000;
        ALTER TABLE graph2 ALTER targeting_ids SET STATISTICS 1000;

    Of course, you need to `ANALYZE` (or `VACUUM ANALYZE`) the tables after this change.

3. Next, I've reformatted your query the following way (easier for me):

        SELECT DISTINCT
               glt.id || ':' || gtt.name as id_type,
               glt.name,latitude,longitude,radius,
               latitude || ',' || longitude as latlon
          FROM graph5 td JOIN graph6 gtt ON gtt.id=td.entity_type_id JOIN graph2 gcr ON gcr.id=ANY(td.detail_id) JOIN graph3 gtd ON gtd.id=gcr.t_ids JOIN graph1 glt ON glt.id=ANY(gtd.id_list);

    I've ordered joins (well, tables) in the way they depend on each other.
    I would try out to move the first join into the CTE in order to isolate it's results
    and help optimizer to better estiamte the rows. Something like this:

        WITH first AS (
            SELECT td.*,gtt.*
              FROM graph5 td
              JOIN graph6 gtt ON gtt.id=td.entity_type_id
        )
        SELECT DISTINCT
               glt.id || ':' || first.name as id_type,
               glt.name,latitude,longitude,radius,
               latitude || ',' || longitude as latlon
          FROM first
          JOIN graph2 gcr ON gcr.id=ANY(first.detail_id)
          JOIN graph3 gtd ON gtd.id=gcr.t_ids
          JOIN graph1 glt ON glt.id=ANY(gtd.id_list)

4. Try disabling materialization, like `set enable_material=off`.


--
Victor Y. Yegorov

pgsql-general by date:

Previous
From: Venkata Balaji Nagothi
Date:
Subject: Re: How to recovery failed master after failover
Next
From: "Raschick, Hartmut"
Date:
Subject: too-may-open-files log file entries when vauuming under solaris