Thread: help with too slow query

help with too slow query

From
Pedro Jiménez Pérez
Date:
Hello,
  I have this table definition:
CREATE TABLE ism_floatvalues
(
  id_signal bigint NOT NULL, -- Indica la señal a la que pertenece este valor. Clave foránea que referencia al campo id_signal de la tabla ism_signal.
  time_stamp timestamp without time zone NOT NULL, -- Marca de tiempo que indica fecha y hora correpondiente a este dato. Junto con id_signal forma la clave primaria de esta tabla
  var_value double precision, -- Almacena el valor concreto de la señal en la marca de tiempo espeficicada.
  CONSTRAINT ism_floatvalues_id_signal_fkey FOREIGN KEY (id_signal)
      REFERENCES ism_signal (id_signal) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);

CREATE INDEX ism_floatvalues_index_idsignal_timestamp
  ON ism_floatvalues
  USING btree
  (id_signal, time_stamp DESC);



*********************************************

Then I run this query....
*********************************************
EXPLAIN analyze
select round(CAST(sum(var_value) AS numeric),2) as var_value, date_trunc('month', time_stamp) as time_stamp , date_part('month',date_trunc('month', time_stamp)) as month, date_part('year',date_trunc('year', time_stamp)) as year from ism_floatvalues where id_signal in
(
select id_signal from ism_signal where reference = 'EDCA' and id_source in
(
select id_source from ism_installation where id_installation in
(select id_installation from ism_groupxinstallation where id_group = 101)
)
)
and time_stamp > date_trunc('month', current_date - interval '11 months')
group by date_trunc('month', time_stamp), month, year
order by time_stamp

******************************
And this is the result:
******************************

"GroupAggregate  (cost=4766541.62..4884678.62 rows=39483 width=16) (actual time=1302542.073..1302713.154 rows=10 loops=1)"
"  ->  Sort  (cost=4766541.62..4789932.12 rows=9356201 width=16) (actual time=1302444.324..1302531.447 rows=9741 loops=1)"
"        Sort Key: (date_trunc('month'::text, ism_floatvalues.time_stamp)), (date_part('month'::text, date_trunc('month'::text, ism_floatvalues.time_stamp))), (date_part('year'::text, date_trunc('year'::text, ism_floatvalues.time_stamp)))"
"        Sort Method:  quicksort  Memory: 941kB"
"        ->  Hash Join  (cost=545.65..3203518.39 rows=9356201 width=16) (actual time=458941.090..1302245.307 rows=9741 loops=1)"
"              Hash Cond: (ism_floatvalues.id_signal = ism_signal.id_signal)"
"              ->  Seq Scan on ism_floatvalues  (cost=0.00..2965077.57 rows=28817098 width=24) (actual time=453907.600..1002381.652 rows=29114105 loops=1)"
"                    Filter: (time_stamp > date_trunc('month'::text, (('now'::text)::date - '11 mons'::interval)))"
"              ->  Hash  (cost=544.19..544.19 rows=117 width=8) (actual time=733.782..733.782 rows=40 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 2kB"
"                    ->  HashAggregate  (cost=543.02..544.19 rows=117 width=8) (actual time=733.072..733.412 rows=40 loops=1)"
"                          ->  Hash Semi Join  (cost=27.61..542.73 rows=117 width=8) (actual time=638.175..687.934 rows=40 loops=1)"
"                                Hash Cond: (ism_signal.id_source = ism_installation.id_source)"
"                                ->  Bitmap Heap Scan on ism_signal  (cost=18.84..530.42 rows=850 width=16) (actual time=243.690..284.303 rows=850 loops=1)"
"                                      Recheck Cond: ((reference)::text = 'EDCA'::text)"
"                                      ->  Bitmap Index Scan on ism_signal_idx_reference  (cost=0.00..18.63 rows=850 width=0) (actual time=243.429..243.429 rows=865 loops=1)"
"                                            Index Cond: ((reference)::text = 'EDCA'::text)"
"                                ->  Hash  (cost=8.27..8.27 rows=40 width=8) (actual time=394.393..394.393 rows=40 loops=1)"
"                                      Buckets: 1024  Batches: 1  Memory Usage: 2kB"
"                                      ->  Hash Semi Join  (cost=3.25..8.27 rows=40 width=8) (actual time=391.966..394.000 rows=40 loops=1)"
"                                            Hash Cond: (ism_installation.id_installation = ism_groupxinstallation.id_installation)"
"                                            ->  Seq Scan on ism_installation  (cost=0.00..4.17 rows=117 width=16) (actual time=0.086..1.354 rows=117 loops=1)"
"                                            ->  Hash  (cost=2.75..2.75 rows=40 width=8) (actual time=390.274..390.274 rows=40 loops=1)"
"                                                  Buckets: 1024  Batches: 1  Memory Usage: 2kB"
"                                                  ->  Seq Scan on ism_groupxinstallation  (cost=0.00..2.75 rows=40 width=8) (actual time=389.536..389.903 rows=40 loops=1)"
"                                                        Filter: (id_group = 101)"
"Total runtime: 1302731.013 ms"


This query is very slow as you can see, it took about 20 minutos to complete.... Can someone help me to improve performance on this query??
Regards.
--

Pedro Jiménez Pérez
p.jimenez@ismsolar.com

Innovación en Sistemas de Monitorización, S.L.
Edificio Hevimar
C/ Iván Pavlov
2 y 4 - Parcela 4 2ª Planta Local 9
Parque Tecnológico de Andalucía
29590 Campanillas (Málaga)
Tlfno. 952 02 07 13
contacto@ismsolar.com

firma_gpt.jpg, 1 kB

Antes de imprimir, piensa en tu responsabilidad y compromiso con el MEDIO AMBIENTE!

Before printing, think about your responsibility and commitment with the ENVIRONMENT!

CLÁUSULA DE CONFIDENCIALIDAD.- Este mensaje, y en su caso, cualquier fichero anexo al mismo, puede contener información confidencial o legalmente protegida (LOPD 15/1999 de 13 de Diciembre), siendo para uso exclusivo del destinatario. No hay renuncia a la confidencialidad o secreto profesional por cualquier transmisión defectuosa o errónea, y queda expresamente prohibida su divulgación, copia o distribución a terceros sin la autorización expresa del remitente. Si ha recibido este mensaje por error, se ruega lo notifique al remitente enviando un mensaje al correo electrónico contacto@ismsolar.com y proceda inmediatamente al borrado del mensaje original y de todas sus copias. Gracias por su colaboración.

Attachment

Re: help with too slow query

From
Виктор Егоров
Date:
2012/11/2 Pedro Jiménez Pérez <p.jimenez@ismsolar.com>
  I have this table definition:

1) Could you kindly include also information bout ism_signal and ism_installation tables?
2) Please, follow this guide to provide more input: http://wiki.postgresql.org/wiki/Slow_Query_Questions


--
Victor Y. Yegorov

Re: help with too slow query

From
"Albe Laurenz"
Date:
Pedro Jiménez Pérez wrote:
> Sent: Friday, November 02, 2012 1:14 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] help with too slow query
>
> Hello,
>   I have this table definition:
> CREATE TABLE ism_floatvalues
> (
>   id_signal bigint NOT NULL, -- Indica la señal a la que pertenece este valor. Clave foránea que
> referencia al campo id_signal de la tabla ism_signal.
>   time_stamp timestamp without time zone NOT NULL, -- Marca de tiempo que indica fecha y hora
> correpondiente a este dato. Junto con id_signal forma la clave primaria de esta tabla
>   var_value double precision, -- Almacena el valor concreto de la señal en la marca de tiempo
> espeficicada.
>   CONSTRAINT ism_floatvalues_id_signal_fkey FOREIGN KEY (id_signal)
>       REFERENCES ism_signal (id_signal) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE CASCADE
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX ism_floatvalues_index_idsignal_timestamp
>   ON ism_floatvalues
>   USING btree
>   (id_signal, time_stamp DESC);
>
>
>
> *********************************************
>
> Then I run this query....
> *********************************************
> EXPLAIN analyze
> select round(CAST(sum(var_value) AS numeric),2) as var_value, date_trunc('month', time_stamp) as
> time_stamp , date_part('month',date_trunc('month', time_stamp)) as month,
> date_part('year',date_trunc('year', time_stamp)) as year from ism_floatvalues where id_signal in
> (
> select id_signal from ism_signal where reference = 'EDCA' and id_source in
> (
> select id_source from ism_installation where id_installation in
> (select id_installation from ism_groupxinstallation where id_group = 101)
> )
> )
> and time_stamp > date_trunc('month', current_date - interval '11 months')
> group by date_trunc('month', time_stamp), month, year
> order by time_stamp
>
> ******************************
> And this is the result:
> ******************************
>
> "GroupAggregate  (cost=4766541.62..4884678.62 rows=39483 width=16) (actual time=1302542.073..1302713.154 rows=10
loops=1)"
[...]
> "        ->  Hash Join  (cost=545.65..3203518.39 rows=9356201 width=16) (actual time=458941.090..1302245.307
rows=9741loops=1)" 
> "              Hash Cond: (ism_floatvalues.id_signal = ism_signal.id_signal)"
> "              ->  Seq Scan on ism_floatvalues  (cost=0.00..2965077.57 rows=28817098 width=24) (actual
time=453907.600..1002381.652rows=29114105 loops=1)" 
> "                    Filter: (time_stamp > date_trunc('month'::text, (('now'::text)::date - '11 mons'::interval)))"
[...]

> This query is very slow as you can see, it took about 20 minutos to complete.... Can someone help me
> to improve performance on this query??
> Regards.

This sequential scan takes the lion share of the time.

Are the 29 million rows selected in that scan a significant percentage
of the total rows?  If yes, then the sequential scan is the
most efficient way to get the result, and the only remedy is to get
faster I/O or to cache more of the table in RAM.

If the query needs to access a lot of rows to complete, it must
be slow.

Yours,
Laurenz Albe


Re: help with too slow query

From
Pedro Jiménez Pérez
Date:
Ok, here we go:

I'm using postgresql version 8.0

Here is my query that is too slow:  http://explain.depesz.com/s/GbQ

***************************************************
EXPLAIN analyze
select round(CAST(sum(var_value) AS numeric),2) as var_value, date_trunc('month', time_stamp) as time_stamp , date_part('month',date_trunc('month', time_stamp)) as month, date_part('year',date_trunc('year', time_stamp)) as year from ism_floatvalues where id_signal in
(
select id_signal from ism_signal where reference = 'EDCA' and id_source in
(
select id_source from ism_installation where id_installation in
(select id_installation from ism_groupxinstallation where id_group = 101)
)
)
and time_stamp > date_trunc('month', current_date - interval '11 months')
group by date_trunc('month', time_stamp), month, year
order by time_stamp

***************************************************
Here are the tables:

Table ism_floatvalues:
Table ism_floatvalues has about 100 million records.
This table is updated everyday. Everyday we delete the data stored regarding yesterday, usually from 8 am to 13 pm moreless, and then we insert the data for yesterday (complete data) and the data we have available for today (usually from 8 am to 13pm) Then, tomorrow, we start over again.... so about 30% of records are deleted at least one time.
***************************************************
CREATE TABLE ism_floatvalues
(
  id_signal bigint NOT NULL, -- Indica la señal a la que pertenece este valor. Clave foránea que referencia al campo id_signal de la tabla ism_signal.
  time_stamp timestamp without time zone NOT NULL, -- Marca de tiempo que indica fecha y hora correpondiente a este dato. Junto con id_signal forma la clave primaria de esta tabla
  var_value double precision, -- Almacena el valor concreto de la señal en la marca de tiempo espeficicada.
  CONSTRAINT ism_floatvalues_id_signal_fkey FOREIGN KEY (id_signal)
      REFERENCES ism_signal (id_signal) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);
CREATE INDEX ism_floatvalues_index_idsignal_timestamp
  ON ism_floatvalues
  USING btree
  (id_signal, time_stamp DESC);
***************************************************

Table ism_signal:
this table has about 24K records
****************************************************
CREATE TABLE ism_signal
(
  id_signal bigserial NOT NULL, -- Código numérico autoincremental. Clave primaria de la tabla.
  id_source bigint NOT NULL,
  reference character varying NOT NULL, -- Cadena de caracteres con la que se identifica de forma única cada señal ( válida para toda la plataforma de ISM ).
  "name" character varying NOT NULL, -- Cadena de caracteres con la que se muestra este señal al usuario.
  signalclass character varying NOT NULL, -- Indica la clase de la señal. Sólo admite valores measure, global, hourly, daily, monthly, yearly, alarm, event, constant y attribute.
  signaltype character varying NOT NULL, -- Indica el tipo de dato de la señal.
  opcitem character varying, -- Cadena de caracteres que indica el item OPC de donde debe leerse esta señal. Como un mismo servidor OPC puede manejar varios sistemas, este item OPC se personaliza según el el espacio de nombres para una configuración concreta del servidor OPC (si hay 3 inversores gestionados por el mismo servidor OPC, el canal que genéricamente se donomina PCC debe ser personalizado a 1.PCC, 2.PCC o 3.PCC.
  formula character varying, -- Cadena de caracteres que indica la fórmula de ajuste para esta señal (como como argumento el valor leido del servidor OPC y le aplica esta fórmula).
  id_opcserverconf bigint, -- Referencia a un servidor OPC configurado de una forma determinada. Clave foránea al campo id_opcServerConf de la tabla ism_opcServerConf
  decimals smallint, -- Número de cifras decimales que se muestran al usuario
  unit character varying, -- Cadena de caracteres que indica las unidades de medida.
  description text, -- Breve descripción y comentarios adicionales.
  max_value double precision, -- Límite superior para representar gráficamente la magnitud.
  min_value double precision, -- Límite inferior para representar gráficamente la magnitud.
  critical_day date, -- Indica el día a partir del cual debemos empezar a pedir datos de esta señal.
  erased boolean DEFAULT false, -- Indica si este canal debe o no mostrarse en la web. Si vale "false" el dato debe mostrarse.
  writetodb boolean DEFAULT true, -- Indica si este canal debe o no almacenarse en la base de datos.       Si vale "true" el dato debe almacenarse
  dbupdaterate integer DEFAULT 0, -- Indica el intervalo de tiempo en el que debe recuperarse y almacenarse esta señal desde el servidor OPC a la base de datos.
  ordering integer, -- Indica el orden en el que deben mostrarse las señales al usuarioIndica el intervalo de tiempo en el que debe recuperarse y almacenarse esta señal desde el servidor OPC a la base de datos. Si están a NULL los canales se ordenan por orden alfabético.
  sync_level smallint NOT NULL DEFAULT 0, -- ndica el nivel de sincronización.       Si vale 0 indica que el dato se obtiene directamente un servidor OPC asociado a un dispositivo. Si vale N (N>0) significa que la señal se obtiene mediante cálculos sobre alguna señal de nivel N-1 y se lee de un servidor OPC asociado a la base de datos. Las señales de nivel N deben sincronizarse antes que las de nivel N+1.
  hastodaydata boolean NOT NULL DEFAULT true, -- Indica si la medida tiene datos en el dia actual o si por el contrario solo tiene datos de dias ya pasados
  interval_minutes smallint,
  updaterate integer DEFAULT 0,
  CONSTRAINT ism_signal_pkey PRIMARY KEY (id_signal),
  CONSTRAINT ism_signal_id_opcserverconf_fkey FOREIGN KEY (id_opcserverconf)
      REFERENCES ism_opcserverconf (id_opcserverconf) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_signal_id_source_fkey FOREIGN KEY (id_source)
      REFERENCES ism_source (id_source) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_signal_name_check CHECK (name::text <> ''::text),
  CONSTRAINT ism_signal_reference_check CHECK (reference::text ~ '^[_A-Za-z0-9]+$'::text),
  CONSTRAINT ism_signal_signalclass_check2 CHECK (signalclass::text = 'measure'::text OR signalclass::text = 'global'::text OR signalclass::text = 'hourly'::text OR signalclass::text = 'daily'::text OR signalclass::text = 'monthly'::text OR signalclass::text = 'yearly'::text OR signalclass::text = 'alarm'::text OR signalclass::text = 'event'::text OR signalclass::text = 'constant'::text OR signalclass::text = 'attribute'::text OR signalclass::text = 'DAmeasure'::text OR signalclass::text = 'filter'::text),
  CONSTRAINT ism_signal_signaltype_check CHECK (signaltype::text = 'float'::text OR signaltype::text = 'integer'::text OR signaltype::text = 'char'::text OR signaltype::text = 'string'::text OR signaltype::text = 'boolean'::text OR signaltype::text = 'memo'::text OR signaltype::text = 'void'::text)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX ism_signal_idx_id_signal
  ON ism_signal
  USING btree
  (id_signal);

CREATE INDEX ism_signal_idx_id_source
  ON ism_signal
  USING btree
  (id_source);

CREATE INDEX ism_signal_idx_reference
  ON ism_signal
  USING btree
  (reference);

******************************************

Table ism_installation:
This table has about 200 records.
******************************************

CREATE TABLE ism_installation
(
  id_installation bigserial NOT NULL, -- Código numérico autoincremental. Clave primaria de la tabla.
  id_source bigint NOT NULL, -- Código único para cualquier fuente de señales (source). Clave foránea al campo id_source de la tabla ism_source
  "name" character varying NOT NULL, -- Nombre de la instalación. No puede ser cadena vacía ni nulo. No puede haber dos instalaciones con el mismo nombre.
  description text, -- Breve descripción y comentarios adicionales.
  latitude_degree smallint, -- Grados de latitud del emplazamineto de la instalación. Entero entre -180 y +180
  latitude_minute smallint, -- Minutos de latitud del emplazamineto de la instalación. Entero entre 0 y 59
  longitude_degree smallint, -- Grados de longitud del emplazamineto de la instalación. Entero entre -180 y +180
  longitude_minute smallint, -- Minutos de longitud del emplazamineto de la instalación. Entero entre 0 y 59
  city character varying, -- Nombre del término municipal donde está la instalación
  province character varying, -- Nombre de la provincia donde está la instalación.
  id_class bigint, -- Referncia a un elemento de ism_class que indica la tecnología de la instalación(fotovoltaica, eólica, térmica). Clave foránea al campo id_class de la tabla ism_class
  initial_date date,
  last_date date,
  last_hour time without time zone,
  ordering integer DEFAULT 0, -- Orden de las instalaciones de un grupo
  short_name character varying(20), -- Nombre corto de la instalcion para mostrarlo en un menu lateral de la web, que no debe exceder de 20 caracteres
  id_owner bigint,
  id_distributor bigint,
  installation_type character(1),
  id_syncgroup integer,
  address character varying(256),
  ripre character varying(256),
  plantgroup character varying(256),
  power character varying(256),
  edecode character varying(256),
  instgroup character varying(256),
  id_zone bigint,
  active boolean DEFAULT true,
  latitude_second double precision,
  longitude_second double precision,
  id_node integer,
  firstdataday date,
  CONSTRAINT ism_installation_pkey PRIMARY KEY (id_installation),
  CONSTRAINT id_syncgroup_fk FOREIGN KEY (id_syncgroup)
      REFERENCES ism_syncgroup (id_syncgroup) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT id_zone_fk FOREIGN KEY (id_zone)
      REFERENCES ism_counterzone (id_zone) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_installation_distributor_fkey FOREIGN KEY (id_distributor)
      REFERENCES ism_distributor (id_distributor) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_installation_id_class_fkey FOREIGN KEY (id_class)
      REFERENCES ism_class (id_class) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_installation_id_source_fkey FOREIGN KEY (id_source)
      REFERENCES ism_source (id_source) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_installation_owner_fkey FOREIGN KEY (id_owner)
      REFERENCES ism_owner (id_owner) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT ism_installation_name_key UNIQUE (name),
  CONSTRAINT ism_installation_latitude_degree_check CHECK ((-180) <= latitude_degree AND latitude_degree <= 180),
  CONSTRAINT ism_installation_latitude_minute_check CHECK (0 <= latitude_minute AND latitude_minute < 60),
  CONSTRAINT ism_installation_latitude_second_check CHECK (0::double precision <= latitude_second AND latitude_second < 60::double precision),
  CONSTRAINT ism_installation_longitude_degree_check CHECK ((-180) <= longitude_degree AND longitude_degree <= 180),
  CONSTRAINT ism_installation_longitude_minute_check CHECK (0 <= longitude_minute AND longitude_minute < 60),
  CONSTRAINT ism_installation_longitude_second_check CHECK (0::double precision <= longitude_second AND longitude_second < 60::double precision),
  CONSTRAINT ism_installation_name_check CHECK (name::text <> ''::text)
)
WITH (
  OIDS=FALSE
);


****************************************************

Regards.


El 05/11/2012 10:54, Виктор Егоров escribió:
2012/11/2 Pedro Jiménez Pérez <p.jimenez@ismsolar.com>
  I have this table definition:

1) Could you kindly include also information bout ism_signal and ism_installation tables?
2) Please, follow this guide to provide more input: http://wiki.postgresql.org/wiki/Slow_Query_Questions


--
Victor Y. Yegorov

--

Pedro Jiménez Pérez
p.jimenez@ismsolar.com

Innovación en Sistemas de Monitorización, S.L.
Edificio Hevimar
C/ Iván Pavlov
2 y 4 - Parcela 4 2ª Planta Local 9
Parque Tecnológico de Andalucía
29590 Campanillas (Málaga)
Tlfno. 952 02 07 13
contacto@ismsolar.com

firma_gpt.jpg, 1 kB

Antes de imprimir, piensa en tu responsabilidad y compromiso con el MEDIO AMBIENTE!

Before printing, think about your responsibility and commitment with the ENVIRONMENT!

CLÁUSULA DE CONFIDENCIALIDAD.- Este mensaje, y en su caso, cualquier fichero anexo al mismo, puede contener información confidencial o legalmente protegida (LOPD 15/1999 de 13 de Diciembre), siendo para uso exclusivo del destinatario. No hay renuncia a la confidencialidad o secreto profesional por cualquier transmisión defectuosa o errónea, y queda expresamente prohibida su divulgación, copia o distribución a terceros sin la autorización expresa del remitente. Si ha recibido este mensaje por error, se ruega lo notifique al remitente enviando un mensaje al correo electrónico contacto@ismsolar.com y proceda inmediatamente al borrado del mensaje original y de todas sus copias. Gracias por su colaboración.

Attachment

Re: help with too slow query

From
Виктор Егоров
Date:
2012/11/6 Pedro Jiménez Pérez <p.jimenez@ismsolar.com>
> Ok, here we go:
>
> I'm using postgresql version 8.0
>
> Here is my query that is too slow:  http://explain.depesz.com/s/GbQ


Well, I would start with a note, that 8.0 is not supported anymore:
http://www.postgresql.org/support/versioning/
Please, consider upgrading your instance.

Also, it is not handy to provide schema details here and anonymize the
EXPLAIN output.
Here's the visualization of your initial plan: http://explain.depesz.com/s/AOAN

The following join: (ism_floatvalues.id_signal = ism_signal.id_signal)
is wrongly estimated by the planner (row 3 of the above explain visualization).
It looks like NestedLoop join with IndexScan over
ism_floatvalues_index_idsignal_timestamp
might do a better job.

Try the following:
ALTER TABLE ism_floatvalues ALTER COLUMN id_signal SET STATISTICS
1000; /* 1000 is maximum for 8.0 */
ANALYZE ism_floatvalues;

Let me know if it helps.


--
Victor Y. Yegorov


Re: help with too slow query

From
Willem Leenen
Date:
@Victor,
 
Is the reason of the wrong cardinality estimations of the join indeed due to wrong statistics? I thought that the full table scan was due to the index on the timefield couldn't be used with this predicate:
 
time_stamp > date_trunc('month', current_date - interval '11 months')
 
It seems to me that a deterministic FBI should be made of this, deviding the records into month chuncks. Sort of a patch in stead of using partitions. But I'm new to Postgresql, so correct me if i'm wrong,
 
Regards,
Willem Leenen
Oracle DBA
 
> Date: Tue, 6 Nov 2012 14:17:07 +0200
> Subject: Re: [PERFORM] help with too slow query
> From: vyegorov@gmail.com
> To: p.jimenez@ismsolar.com
> CC: pgsql-performance@postgresql.org
>
> 2012/11/6 Pedro Jiménez Pérez <p.jimenez@ismsolar.com>
> > Ok, here we go:
> >
> > I'm using postgresql version 8.0
> >
> > Here is my query that is too slow: http://explain.depesz.com/s/GbQ
>
>
> Well, I would start with a note, that 8.0 is not supported anymore:
> http://www.postgresql.org/support/versioning/
> Please, consider upgrading your instance.
>
> Also, it is not handy to provide schema details here and anonymize the
> EXPLAIN output.
> Here's the visualization of your initial plan: http://explain.depesz.com/s/AOAN
>
> The following join: (ism_floatvalues.id_signal = ism_signal.id_signal)
> is wrongly estimated by the planner (row 3 of the above explain visualization).
> It looks like NestedLoop join with IndexScan over
> ism_floatvalues_index_idsignal_timestamp
> might do a better job.
>
> Try the following:
> ALTER TABLE ism_floatvalues ALTER COLUMN id_signal SET STATISTICS
> 1000; /* 1000 is maximum for 8.0 */
> ANALYZE ism_floatvalues;
>
> Let me know if it helps.
>
>
> --
> Victor Y. Yegorov
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: help with too slow query

From
Виктор Егоров
Date:
2012/11/6 Willem Leenen <willem_leenen@hotmail.com>:
> @Victor,
>
> Is the reason of the wrong cardinality estimations of the join indeed due to
> wrong statistics? I thought that the full table scan was due to the index on
> the timefield couldn't be used with this predicate:
>
> time_stamp > date_trunc('month', current_date - interval '11 months')
>
> It seems to me that a deterministic FBI should be made of this, deviding the
> records into month chuncks. Sort of a patch in stead of using partitions.
> But I'm new to Postgresql, so correct me if i'm wrong,

In 8.0, default_statistics_target=10, which means 1e8 rows big table
will get only 10 ranges
for the histograms, a bit too low to get a proper guess on the data
distribution. I would also
have increased default_statistics_target instance-wide, up to 50 at least.

PostgreSQL can use the index as it is and apply a filter afterwards
for each record emited by
the index scan. Very rough estimate shows, that there'll be round 4.2k
rows for each id_signal
in the ism_floatvalues tables. So index scan looks valid here with the
given setup.

With increased statistics target for the column I hope optimizer will
do a more precise estimate on
the column selectivity and will prefer to do a NestedLoop join between
ism_signal and ism_floatvalues tables.

I haven't considered the FBI though.

I hope I'm not mistaken here, waiting for the OP to provide more input.


--
Victor Y. Yegorov


Re: help with too slow query

From
Willem Leenen
Date:
@Victor,
 
Spasibo for the information, seems valid to me. 
 
Regards,
Willem Leenen

 
> Date: Tue, 6 Nov 2012 16:52:01 +0200
> Subject: Re: [PERFORM] help with too slow query
> From: vyegorov@gmail.com
> To: willem_leenen@hotmail.com
> CC: p.jimenez@ismsolar.com; pgsql-performance@postgresql.org
>
> 2012/11/6 Willem Leenen <willem_leenen@hotmail.com>:
> > @Victor,
> >
> > Is the reason of the wrong cardinality estimations of the join indeed due to
> > wrong statistics? I thought that the full table scan was due to the index on
> > the timefield couldn't be used with this predicate:
> >
> > time_stamp > date_trunc('month', current_date - interval '11 months')
> >
> > It seems to me that a deterministic FBI should be made of this, deviding the
> > records into month chuncks. Sort of a patch in stead of using partitions.
> > But I'm new to Postgresql, so correct me if i'm wrong,
>
> In 8.0, default_statistics_target=10, which means 1e8 rows big table
> will get only 10 ranges
> for the histograms, a bit too low to get a proper guess on the data
> distribution. I would also
> have increased default_statistics_target instance-wide, up to 50 at least.
>
> PostgreSQL can use the index as it is and apply a filter afterwards
> for each record emited by
> the index scan. Very rough estimate shows, that there'll be round 4.2k
> rows for each id_signal
> in the ism_floatvalues tables. So index scan looks valid here with the
> given setup.
>
> With increased statistics target for the column I hope optimizer will
> do a more precise estimate on
> the column selectivity and will prefer to do a NestedLoop join between
> ism_signal and ism_floatvalues tables.
>
> I haven't considered the FBI though.
>
> I hope I'm not mistaken here, waiting for the OP to provide more input.
>
>
> --
> Victor Y. Yegorov
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance