Re: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated
Date
Msg-id CAHyXU0wykYJsZN-gDOOv_8E6j8y-2LPj2+R8mnScVuamwem83g@mail.gmail.com
Whole thread Raw
In response to Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated  (Sasa Vilic <sasavilic@gmail.com>)
List pgsql-performance
On Thu, Jun 11, 2015 at 7:18 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
> Hi,
>
> I have a query that takes ridiculously long to complete (over 500ms) but if
> I disable nested loop it does it really fast (24.5ms)
>
> Here are links for
> * first request (everything enabled): http://explain.depesz.com/s/Q1M
> * second request (nested loop disabled): http://explain.depesz.com/s/9ZY
>
> I have also noticed, that setting
>
> set join_collapse_limit = 1;
>
> produces similar results as when nested loops are disabled.
>
> Autovacuumm is running, and I did manually performed both: analyze and
> vacuumm analyze. No effect.
>
> I tried increasing statistics for columns (slot, path_id, key) to 5000 for
> table data. No effect.
>
> I tried increasing statistics for columns (id, parent, key) to 5000 for
> table path. No effect.
>
> I can see, that postgres is doing wrong estimation on request count, but I
> can't figure it out why.
>
> Table path is used to represent tree-like structure.
>
> == QUERY ==
>
> SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip,
> p3.id as id, data.*, server.name
> FROM data
> INNER JOIN path p3 ON data.path_id = p3.id
> INNER JOIN server on data.server_id = server.id
> INNER JOIN path p2 on p2.id = p3.parent
> INNER JOIN path p1 on p1.id = p2.parent
> WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
>     AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
> ;
>
> == TABLES ==
>                                              Table "public.path"
>  Column |         Type          | Modifiers                     | Storage  |
> Description
> --------+-----------------------+---------------------------------------------------+----------+-------------
>  id     | integer               | not null default
> nextval('path_id_seq'::regclass) | plain    |
>  parent | integer |                                                   |
> plain    |
>  key    | character varying(25) | not null
> | extended |
>  value  | character varying(50) | not null
> | extended |
> Indexes:
>     "path_pkey" PRIMARY KEY, btree (id)
>     "path_unique" UNIQUE CONSTRAINT, btree (parent, key, value)
> Foreign-key constraints:
>     "path.fg.parent->path(id)" FOREIGN KEY (parent) REFERENCES path(id)
> Referenced by:
>     TABLE "data" CONSTRAINT "data_fkey_path" FOREIGN KEY (path_id)
> REFERENCES path(id)
>     TABLE "path" CONSTRAINT "path.fg.parent->path(id)" FOREIGN KEY (parent)
> REFERENCES path(id)
> Has OIDs: no
>
>                                Table "public.data"
>   Column   |              Type              | Modifiers | Storage  |
> Description
> -----------+--------------------------------+-----------+----------+-------------
>  slot      | timestamp(0) without time zone | not null  | plain    |
>  server_id | integer                        | not null  | plain    |
>  path_id   | integer                        | not null  | plain    |
>  key       | character varying(50)          | not null  | extended |
>  value     | real                           | not null  | plain    |
> Indexes:
>     "data_pkey" PRIMARY KEY, btree (slot, server_id, path_id, key)
> Foreign-key constraints:
>     "data_fkey_path" FOREIGN KEY (path_id) REFERENCES path(id)
> Has OIDs: no
>
> svilic=> select count(*) from path;
>  count
> -------
>    603
>
> svilic=> select count(*) from path p1 inner join path p2 on p1.id =
> p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;
>  count
> -------
>    463
>
> svilic=> select count(*) from server;
>  count
> -------
>     37
>
> svilic=> select count(*) from data;
>   count
> ----------
>  23495552
>
>
> svilic=> select version();
> version
> -------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc
> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
>
> == SERVER CONFIGURATION ==
>
> shared_buffers = 512MB
> work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
> maintenance_work_mem = 64MB
> checkpoint_segments = 100
> random_page_cost = 4.0
> effective_cache_size = 3072MB
>
> == HARDWARE CONFIGURATION ==
>
> cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
> mem: 8GB
> system is using regular disks, (no raid and no ssd)

huh.  the query looks pretty clean  (except for possible overuse of
surrogate keys which tend to exacerbate planning issues in certain
cases).

Let's try cranking statistics on data.path_id, first to 1000 and then
to 10000 and see how it affects the plan.   The database is clearly
misestimating row counts on that join.

merlin


pgsql-performance by date:

Previous
From: Kaijiang Chen
Date:
Subject: Re: Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?
Next
From: Tomas Vondra
Date:
Subject: Re: Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?