[PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each - Mailing list pgsql-performance

From Karl Czajkowski
Subject [PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each
Date
Msg-id 20170624020116.GA27236@moraine.isi.edu
In response to [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each  (Chris Wilson <chris+postgresql@qwirx.com>)
List pgsql-performance
On Jun 23, Chris Wilson modulated:
> ...
>     create table metric_pos (id serial primary key, pos integer);
>     create index idx_metric_pos_id_pos on metric_pos (id, pos);
> ...
>     create table asset_pos (id serial primary key, pos integer);
> ...

Did you omit a CREATE INDEX statement on asset_pos (id, pos) only from
your problem statement, or also from your actual tests?  Without such
an index, you are forcing the query planner to do that join the hard way.


>     CREATE TABLE metric_value
>     (
>       id_asset integer NOT NULL,
>       id_metric integer NOT NULL,
>       value double precision NOT NULL,
>       date date NOT NULL,
>       timerange_transaction tstzrange NOT NULL,
>       id bigserial NOT NULL,
>       CONSTRAINT cons_metric_value_pk PRIMARY KEY (id)
>     )
>     WITH (
>       OIDS=FALSE
>     );
>
> ...
>     CREATE INDEX idx_metric_value_id_metric_id_asset_date ON
>     metric_value (id_metric, id_asset, date, timerange_transaction,
>     value);
> ...

Have you tried adding a foreign key constraint on the id_asset and
id_metric columns?  I wonder if you'd get a better query plan if the
DB knew that the inner join would not change the number of result
rows.  I think it's doing the join inside the filter step because
it assumes that the inner join may drop rows.
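
Something along these lines (a sketch only; the constraint names are
placeholders, and it assumes the id columns reference the dimension
tables' primary keys as they appear to):

    ALTER TABLE metric_value
      ADD CONSTRAINT fk_metric_value_asset
        FOREIGN KEY (id_asset) REFERENCES asset_pos (id),
      ADD CONSTRAINT fk_metric_value_metric
        FOREIGN KEY (id_metric) REFERENCES metric_pos (id);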

Also, did you include an ANALYZE step between your table creation
statements and your query benchmarks?  Since you are dropping and
recreating test data, you have no stats on anything.
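
E.g., after loading the test data and before timing anything:

    ANALYZE metric_value;
    ANALYZE metric_pos;
    ANALYZE asset_pos;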


> This is an example of the kind of query we would like to speed up:
>
>
>     SELECT metric_pos.pos AS pos_metric, asset_pos.pos AS pos_asset,
>     date, value
>     FROM metric_value
>     INNER JOIN asset_pos ON asset_pos.id = metric_value.id_asset
>     INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
>     WHERE
>     date >= '2016-01-01' and date < '2016-06-01'
>     AND timerange_transaction @> current_timestamp
>     ORDER BY metric_value.id_metric, metric_value.id_asset, date
>

How sparse is the typical result set selected by these date and
timerange predicates?  If it is sparse, I'd think you want your
compound index to start with those two columns.
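
For instance, a date-leading variant of your existing compound index
(an untested sketch, name is a placeholder; the @> containment test on
timerange_transaction still has to be applied as a filter, since a
btree cannot use it, and you would be trading the pre-sorted output of
your current index for a tighter range scan):

    CREATE INDEX idx_metric_value_date_metric_asset ON
    metric_value (date, id_metric, id_asset, timerange_transaction,
    value);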

Finally, your subject line said you were joining hundreds of rows to
millions.  In queries where we used a similarly small dimension table
in the WHERE clause, we saw a massive speedup by pre-evaluating that
dimension query to produce an array of keys, then in-lining those key
constants in the WHERE clause of the main fact-table query, which no
longer had the join in it.

In your case, the equivalent hack would be to compile the small
dimension tables into big CASE statements, I suppose...
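
Illustratively (the id/pos pairs below are made up; in practice you
would generate one WHEN branch per row of each dimension table when
building the query):

    SELECT
      CASE metric_value.id_metric
        WHEN 1 THEN 10 WHEN 2 THEN 20 /* one branch per metric_pos row */
      END AS pos_metric,
      CASE metric_value.id_asset
        WHEN 1 THEN 30 WHEN 2 THEN 40 /* one branch per asset_pos row */
      END AS pos_asset,
      date, value
    FROM metric_value
    WHERE date >= '2016-01-01' AND date < '2016-06-01'
      AND timerange_transaction @> current_timestamp
    ORDER BY metric_value.id_metric, metric_value.id_asset, date;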


Karl

