Thread: BUG #15102: Performance problem when doing join, index are not used

BUG #15102: Performance problem when doing join, index are not used

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15102
Logged by:          Mehdi Rahman
Email address:      mehdirahbvd@gmail.com
PostgreSQL version: 10.3
Operating system:   Windows 2012 r2 std 64 bit
Description:

I did write a query finding the evolution of company global ultimate owner
during years 2007 to 2016. For that, I join a table with ids of all
companies we have to tables with the id of the guo for each year.


The computer is a dedicated server windows 2012 r2 std 64 bit with 64 gb
RAM, 64 bit, 8 cpu x64 and SCSI disks (6 To). As it is dedicated? I did
authorize postgress to use 16 Go RAM and 1 Go per request.

Tables are huge (million of rows) and have indexes on id (queries to create
table and index at end). When doing an explain (see below), the query do not
use my indexes, and runs for half an hour using 1 CPU and barely doing IO.

Here is a sample of data:
Data in tables public.links_xxx_guo is like this. links_2016_guo contains
63075671 rows without duplicates.
"US411568618"    "US174295875L"
"US411652566"    "US542185193"
"US411655168"    "US411849203"
"US410952918"    "US411504721"
"US4-5891774"    "USLEI77366"
"US4-5895366"    "US4-5927464"
"US4-5895406"    "US4-5927464"
"US4-5895441"    "US4-5927464"
"US4-5895468"    "US4-5927464"
"US4-5895661"    "US4-5927464"
"US4-5906622"    "USIA80173288"
"US4-5907020"    "US133257110L"
"US4-5907665"    "US4-0383146"

Data in allsubjects is like this. it contains 124730186 rows without
duplicates.
"AU083774798"
"AU083774798-B0002"
"AU083775357"
"AU083775357-B0001"
"AU083775357-B0002"
"AU083775473"
"AU083775473-B0001"
"AU083775795"
"AU083775795-B0001"
"AU083775795-B0002"
"AU083775795-B0003"
"AU083775795-B0004"
"AU083775795-B0006"
"AU083775928"
"AU083775928-B0003"
"AU083775928-B0004"
"AU083775982"

Here is the query:
select allsubjects."Subsidiary BvD ID" as
SubjectID,links_2007_guo."Shareholder BvD ID" as
GUO_2007,links_2008_guo."Shareholder BvD ID" as
GUO_2008,links_2009_guo."Shareholder BvD ID" as
GUO_2009,links_2010_guo."Shareholder BvD ID" as
GUO_2010,links_2011_guo."Shareholder BvD ID" as
GUO_2011,links_2012_guo."Shareholder BvD ID" as
GUO_2012,links_2013_guo."Shareholder BvD ID" as
GUO_2013,links_2014_guo."Shareholder BvD ID" as
GUO_2014,links_2015_guo."Shareholder BvD ID" as
GUO_2015,links_2016_guo."Shareholder BvD ID" as GUO_2016 into guo50 from
allsubjects left outer join links_2007_guo on allsubjects."Subsidiary BvD
ID" = links_2007_guo."Subsidiary BvD ID" left outer join links_2008_guo on
allsubjects."Subsidiary BvD ID" = links_2008_guo."Subsidiary BvD ID" left
outer join links_2009_guo on allsubjects."Subsidiary BvD ID" =
links_2009_guo."Subsidiary BvD ID" left outer join links_2010_guo on
allsubjects."Subsidiary BvD ID" = links_2010_guo."Subsidiary BvD ID" left
outer join links_2011_guo on allsubjects."Subsidiary BvD ID" =
links_2011_guo."Subsidiary BvD ID" left outer join links_2012_guo on
allsubjects."Subsidiary BvD ID" = links_2012_guo."Subsidiary BvD ID" left
outer join links_2013_guo on allsubjects."Subsidiary BvD ID" =
links_2013_guo."Subsidiary BvD ID" left outer join links_2014_guo on
allsubjects."Subsidiary BvD ID" = links_2014_guo."Subsidiary BvD ID" left
outer join links_2015_guo on allsubjects."Subsidiary BvD ID" =
links_2015_guo."Subsidiary BvD ID" left outer join links_2016_guo on
allsubjects."Subsidiary BvD ID" = links_2016_guo."Subsidiary BvD ID" 

Here is the explain results with real data:
"Merge Left Join  (cost=64186781.61..72528875.02 rows=124730184
width=148)"
"  Merge Cond: ((allsubjects."Subsidiary BvD ID")::text =
(links_2016_guo."Subsidiary BvD ID")::text)"
"  ->  Merge Left Join  (cost=54054997.55..60981441.25 rows=124730184
width=134)"
"        Merge Cond: ((allsubjects."Subsidiary BvD ID")::text =
(links_2015_guo."Subsidiary BvD ID")::text)"
"        ->  Merge Left Join  (cost=45929338.09..51650216.35 rows=124730184
width=120)"
"              Merge Cond: ((allsubjects."Subsidiary BvD ID")::text =
(links_2014_guo."Subsidiary BvD ID")::text)"
"              ->  Merge Left Join  (cost=39700486.68..44416349.85
rows=124730184 width=106)"
"                    Merge Cond: ((allsubjects."Subsidiary BvD ID")::text =
(links_2013_guo."Subsidiary BvD ID")::text)"
"                    ->  Merge Left Join  (cost=34095001.72..37872283.29
rows=124730184 width=92)"
"                          Merge Cond: ((allsubjects."Subsidiary BvD
ID")::text = (links_2012_guo."Subsidiary BvD ID")::text)"
"                          ->  Merge Left Join
(cost=29447210.87..32388567.11 rows=124730184 width=78)"
"                                Merge Cond: ((allsubjects."Subsidiary BvD
ID")::text = (links_2011_guo."Subsidiary BvD ID")::text)"
"                                ->  Merge Left Join
(cost=26186284.29..28442070.80 rows=124730184 width=65)"
"                                      Merge Cond: ((allsubjects."Subsidiary
BvD ID")::text = (links_2010_guo."Subsidiary BvD ID")::text)"
"                                      ->  Merge Left Join
(cost=24091276.15..25805109.28 rows=124730184 width=52)"
"                                            Merge Cond:
((allsubjects."Subsidiary BvD ID")::text = (links_2009_guo."Subsidiary BvD
ID")::text)"
"                                            ->  Merge Left Join
(cost=22434390.29..23652173.77 rows=124730184 width=39)"
"                                                  Merge Cond:
((allsubjects."Subsidiary BvD ID")::text = (links_2008_guo."Subsidiary BvD
ID")::text)"
"                                                  ->  Merge Left Join
(cost=21061837.72..21813642.52 rows=124730184 width=26)"
"                                                        Merge Cond:
((allsubjects."Subsidiary BvD ID")::text = (links_2007_guo."Subsidiary BvD
ID")::text)"
"                                                        ->  Sort
(cost=19932318.48..20244143.94 rows=124730184 width=13)"
"                                                              Sort Key:
allsubjects."Subsidiary BvD ID""
"                                                              ->  Seq Scan
on allsubjects  (cost=0.00..1941633.84 rows=124730184 width=13)"
"                                                        ->  Sort
(cost=1129519.24..1150878.22 rows=8543592 width=25)"
"                                                              Sort Key:
links_2007_guo."Subsidiary BvD ID""
"                                                              ->  Seq Scan
on links_2007_guo  (cost=0.00..145877.92 rows=8543592 width=25)"
"                                                  ->  Sort
(cost=1372552.57..1398244.78 rows=10276881 width=25)"
"                                                        Sort Key:
links_2008_guo."Subsidiary BvD ID""
"                                                        ->  Seq Scan on
links_2008_guo  (cost=0.00..175660.81 rows=10276881 width=25)"
"                                            ->  Sort
(cost=1656885.86..1687589.89 rows=12281613 width=26)"
"                                                  Sort Key:
links_2009_guo."Subsidiary BvD ID""
"                                                  ->  Seq Scan on
links_2009_guo  (cost=0.00..210726.13 rows=12281613 width=26)"
"                                      ->  Sort
(cost=2095008.14..2133362.79 rows=15341861 width=26)"
"                                            Sort Key:
links_2010_guo."Subsidiary BvD ID""
"                                            ->  Seq Scan on links_2010_guo
(cost=0.00..263882.61 rows=15341861 width=26)"
"                                ->  Materialize
(cost=3260926.57..3367710.65 rows=21356816 width=26)"
"                                      ->  Sort
(cost=3260926.57..3314318.61 rows=21356816 width=26)"
"                                            Sort Key:
links_2011_guo."Subsidiary BvD ID""
"                                            ->  Seq Scan on links_2011_guo
(cost=0.00..368939.16 rows=21356816 width=26)"
"                          ->  Materialize  (cost=4647790.85..4797533.67
rows=29948564 width=27)"
"                                ->  Sort  (cost=4647790.85..4722662.26
rows=29948564 width=27)"
"                                      Sort Key: links_2012_guo."Subsidiary
BvD ID""
"                                      ->  Seq Scan on links_2012_guo
(cost=0.00..519326.64 rows=29948564 width=27)"
"                    ->  Materialize  (cost=5605484.97..5784558.15
rows=35814636 width=27)"
"                          ->  Sort  (cost=5605484.97..5695021.56
rows=35814636 width=27)"
"                                Sort Key: links_2013_guo."Subsidiary BvD
ID""
"                                ->  Seq Scan on links_2013_guo
(cost=0.00..622160.36 rows=35814636 width=27)"
"              ->  Materialize  (cost=6228851.41..6426905.59 rows=39610836
width=27)"
"                    ->  Sort  (cost=6228851.41..6327878.50 rows=39610836
width=27)"
"                          Sort Key: links_2014_guo."Subsidiary BvD ID""
"                          ->  Seq Scan on links_2014_guo
(cost=0.00..688528.36 rows=39610836 width=27)"
"        ->  Materialize  (cost=8125659.46..8381013.74 rows=51070856
width=28)"
"              ->  Sort  (cost=8125659.46..8253336.60 rows=51070856
width=28)"
"                    Sort Key: links_2015_guo."Subsidiary BvD ID""
"                    ->  Seq Scan on links_2015_guo  (cost=0.00..888822.56
rows=51070856 width=28)"
"  ->  Materialize  (cost=10131784.05..10447162.41 rows=63075672
width=28)"
"        ->  Sort  (cost=10131784.05..10289473.23 rows=63075672 width=28)"
"              Sort Key: links_2016_guo."Subsidiary BvD ID""
"              ->  Seq Scan on links_2016_guo  (cost=0.00..1097783.72
rows=63075672 width=28)"

Here are the queries to create used table and indexes to reproduce the
problem:
CREATE TABLE public.allsubjects
(
    "Subsidiary BvD ID" character varying(100) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2007_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2008_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2009_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2010_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2011_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2012_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2013_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2014_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2015_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)

CREATE TABLE public.links_2016_guo
(
    "Subsidiary BvD ID" character varying(1500) COLLATE
pg_catalog."default",
    "Shareholder BvD ID" character varying(1500) COLLATE
pg_catalog."default"
)


CREATE INDEX "idx_AllSubjects_SubBvDID"
    ON public.allsubjects USING hash
    ("Subsidiary BvD ID" COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2007_guo_subbvdid
    ON public.links_2007_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2008_guo_subbvdid
    ON public.links_2008_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2009_guo_subbvdid
    ON public.links_2009_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2010_guo_subbvdid
    ON public.links_2010_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2011_guo_subbvdid
    ON public.links_2011_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2012_guo_subbvdid
    ON public.links_2012_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2013_guo_subbvdid
    ON public.links_2013_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2014_guo_subbvdid
    ON public.links_2014_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2015_guo_subbvdid
    ON public.links_2015_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;

CREATE INDEX idx_links_2016_guo_subbvdid
    ON public.links_2016_guo USING btree
    ("Subsidiary BvD ID" COLLATE pg_catalog."default", "Shareholder BvD ID"
COLLATE pg_catalog."default")
    TABLESPACE pg_default;



=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> Here is the query:
> [ select with 11 input tables ]

Perhaps raising join_collapse_limit to 11 or more would let the query
planner find a better plan.  Having said that, I see no especially good
reason to think that sort-and-merge isn't a good join type for this query.
Indexes aren't always the answer, especially not when joining large
numbers of rows as you are here.

Another direction to pursue is to raise work_mem to allow the sorts to
proceed more efficiently.  Don't go overboard on that, but judicious
increases can help.

Lastly, I see no reason whatever to think this is a bug.  You might
have better luck discussing the issue on the pgsql-performance list.

            regards, tom lane


Re: BUG #15102: Performance problem when doing join, index are not used

From
Mehdi Rahman
Date:
Hello,

Thanks a lot for your answer. I did change parameters and will retry the query.

I am sorry for posting in the bad list and will put any future performance questions at pgsql-performance.

Have a nice day,
Mehdi Rahman

2018-03-08 16:41 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
PG Bug reporting form <noreply@postgresql.org> writes:
> Here is the query:
> [ select with 11 input tables ]

Perhaps raising join_collapse_limit to 11 or more would let the query
planner find a better plan.  Having said that, I see no especially good
reason to think that sort-and-merge isn't a good join type for this query.
Indexes aren't always the answer, especially not when joining large
numbers of rows as you are here.

Another direction to pursue is to raise work_mem to allow the sorts to
proceed more efficiently.  Don't go overboard on that, but judicious
increases can help.

Lastly, I see no reason whatever to think this is a bug.  You might
have better luck discussing the issue on the pgsql-performance list.

                        regards, tom lane