Thread: PostgreSQL 9.0.1 on Windows performance tuning help please

PostgreSQL 9.0.1 on Windows performance tuning help please

From
tuanhoanganh
Date:
I have PostgreSQL 9.0.1 on Windows 2003 Enterprise with 6GB RAM and 4 SATA disks in RAID 10.
I am running SymmetricDS for replication over a WAN. But yesterday there was a big problem: I updated a lot of rows, and the query SymmetricDS uses to find gap data ran very, very slowly.

Here are the relevant settings in my postgresql.conf for tuning PostgreSQL:
effective_cache_size = 4GB
work_mem = 2097151
shared_buffers = 1GB

Here is the query:
explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;

And here is the result:
Nested Loop  (cost=0.00..1517515125.95 rows=26367212590 width=1403) (actual time=14646.390..7745828.163 rows=2764140 loops=1)
  ->  Index Scan using sym_data_pkey on sym_data d  (cost=0.00..637148.72 rows=3129103 width=1403) (actual time=71.989..55643.665 rows=3124631 loops=1)
        Filter: ((channel_id)::text = 'sale_transaction'::text)
  ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.459..2.459 rows=1 loops=3124631)
        Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
        Filter: (g.status = 'GP'::bpchar)
Total runtime: 7746577.478 ms

Here is the sym_data table. It has 437319 rows with data_id between the start_id and end_id of sym_data_gap rows that have status = 'GP'.

CREATE TABLE sym_data
(
  data_id serial NOT NULL,
  table_name character varying(50) NOT NULL,
  event_type character(1) NOT NULL,
  row_data text,
  pk_data text,
  old_data text,
  trigger_hist_id integer NOT NULL,
  channel_id character varying(20),
  transaction_id character varying(255),
  source_node_id character varying(50),
  external_data character varying(50),
  create_time timestamp without time zone,
  CONSTRAINT sym_data_pkey PRIMARY KEY (data_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE sym_data OWNER TO postgres;

-- Index: idx_d_channel_id

-- DROP INDEX idx_d_channel_id;

CREATE INDEX idx_d_channel_id
  ON sym_data
  USING btree
  (data_id, channel_id);

And here is the sym_data_gap table. It has 57838 rows with status = 'GP'.

CREATE TABLE sym_data_gap
(
  start_id integer NOT NULL,
  end_id integer NOT NULL,
  status character(2),
  create_time timestamp without time zone NOT NULL,
  last_update_hostname character varying(255),
  last_update_time timestamp without time zone NOT NULL,
  CONSTRAINT sym_data_gap_pkey PRIMARY KEY (start_id, end_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE sym_data_gap OWNER TO postgres;

-- Index: idx_dg_status

-- DROP INDEX idx_dg_status;

CREATE INDEX idx_dg_status
  ON sym_data_gap
  USING btree
  (status);

Because the query runs so slowly, data is not being replicated between the two sites. Please help me.

Sorry for my English
Tuan Hoang ANh

Re: PostgreSQL 9.0.1 on Windows performance tuning help please

From
"Kevin Grittner"
Date:
tuanhoanganh <hatuan05@gmail.com> wrote:

> I have postgresql 9.0.1

http://www.postgresql.org/support/versioning

> 6GB ram

> work_mem = 2097151

I think that has the potential to push you into swapping:

cc=> set work_mem = 2097151;
SET
cc=> show work_mem;
 work_mem
-----------
 2097151kB
(1 row)

That's 2GB, and that much can be allocated, potentially several
times, per connection.
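
For example, a much more modest default could be set in postgresql.conf and raised per session only for the queries that really need it. The values below are only illustrations, not recommendations tuned for your workload:

-- in postgresql.conf (illustrative value only):
-- work_mem = 64MB

-- or raised just for the current session:
SET work_mem = '256MB';
SHOW work_mem;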

>   ->  Index Scan using sym_data_pkey on sym_data d
>           (cost=0.00..637148.72 rows=3129103 width=1403)
>           (actual time=71.989..55643.665 rows=3124631 loops=1)
>         Filter: ((channel_id)::text = 'sale_transaction'::text)

This index scan is going to randomly access all tuples in the
table's heap. That is probably going to be much slower than a
sequential scan. It is apparently choosing this index to avoid a
sort, because of the mis-estimation on the number of rows.  Is it
critical that the rows be returned in that order?  If not, you might
see much faster performance by leaving off the ORDER BY clause so
that it can use the seqscan.

You could potentially make queries like this much faster by indexing
on channel_id, or by indexing on data_id WHERE channel_id =
'sale_transaction'.
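
Roughly something like the following (the index names are just placeholders):

CREATE INDEX idx_sym_data_channel ON sym_data (channel_id);

-- or a partial index covering only this channel:
CREATE INDEX idx_sym_data_data_id_sale
  ON sym_data (data_id)
  WHERE channel_id = 'sale_transaction';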

You could also set up optimization barriers with clever use of a CTE
or an OFFSET 0 to force it to use a seqscan followed by a sort, but
I would look at the other options first.
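
Untested, but the general shape might be something like this, using the CTE as a fence (an OFFSET 0 subquery works the same way):

WITH d AS (
  SELECT * FROM sym_data WHERE channel_id = 'sale_transaction'
)
SELECT d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data,
       d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
       d.transaction_id, d.source_node_id, d.external_data, ''
FROM d
JOIN sym_data_gap g
  ON g.status = 'GP'
 AND d.data_id BETWEEN g.start_id AND g.end_id
ORDER BY d.data_id;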

-Kevin

Re: PostgreSQL 9.0.1 on Windows performance tuning help please

From
tuanhoanganh
Date:
Thanks for your help.
I created indexes on channel_id and on data_id as you suggested:

-- Index: idx_d_channel_id2

-- DROP INDEX idx_d_channel_id2;

CREATE INDEX idx_d_channel_id2
  ON sym_data
  USING btree
  (channel_id);

-- Index: idx_d_channel_id3

-- DROP INDEX idx_d_channel_id3;

CREATE INDEX idx_d_channel_id3
  ON sym_data
  USING btree
  (data_id)
  WHERE channel_id::text = 'sale_transaction'::text;

-- Index: idx_d_channel_id4

-- DROP INDEX idx_d_channel_id4;

CREATE INDEX idx_d_channel_id4
  ON sym_data
  USING btree
  (data_id)
  WHERE channel_id::text = 'item'::text;

Here is the new explain analyze:

explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;

Nested Loop  (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
  ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
  ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
        Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
        Filter: (g.status = 'GP'::bpchar)
Total runtime: 7651803.073 ms

But the query performance doesn't change.
Please help me.

Tuan Hoang ANh

Re: PostgreSQL 9.0.1 on Windows performance tuning help please

From
Greg Williamson
Date:
Tuan --

>
> Thanks for your help.
> I created indexes on channel_id and on data_id as you suggested:
<...>
>
> explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;
>
> Nested Loop  (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
>   ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
>   ->  Index Scan using sym_data_gap_pkey on sym_data_gap g  (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
>         Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
>         Filter: (g.status = 'GP'::bpchar)
> Total runtime: 7651803.073 ms
>
> But the query performance doesn't change.
> Please help me.

Did you run an ANALYZE on the tables after building the new indexes? The row estimates seem to be wildly off; that may be a symptom of something else and not related, but it is worth ruling out the things that are easily tried.
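
That is, something along these lines:

ANALYZE sym_data;
ANALYZE sym_data_gap;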

HTH,

Greg Williamson


Re: PostgreSQL 9.0.1 on Windows performance tuning help please

From
tuanhoanganh
Date:
Yes, after creating the indexes I ran:
VACUUM VERBOSE ANALYZE sym_data;
VACUUM VERBOSE ANALYZE sym_data_gap;

If I remove the ORDER BY, the query runs faster.

explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction';

Nested Loop  (cost=0.00..1384889042.54 rows=26266634550 width=1400) (actual time=63.546..36699.188 rows=2764140 loops=1)
  ->  Index Scan using idx_dg_status on sym_data_gap g  (cost=0.00..2802.42 rows=75838 width=8) (actual time=63.348..122.565 rows=75838 loops=1)
        Index Cond: (status = 'GP'::bpchar)
  ->  Index Scan using idx_d_channel_id3 on sym_data d  (cost=0.00..13065.83 rows=346352 width=1400) (actual time=0.027..0.450 rows=36 loops=75838)
        Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
Total runtime: 37226.543 ms
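
Maybe, following Kevin's optimization-barrier idea, I can keep this fast plan and still get ordered output with something like the following. This is only an untested sketch:

SELECT *
FROM (
  SELECT d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data,
         d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
         d.transaction_id, d.source_node_id, d.external_data, ''
  FROM sym_data d
  INNER JOIN sym_data_gap g
    ON g.status = 'GP'
   AND d.data_id BETWEEN g.start_id AND g.end_id
  WHERE d.channel_id = 'sale_transaction'
  OFFSET 0
) x
ORDER BY x.data_id;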
