Thread: Why does Postgres use so little memory on Windows?

From:
tuanhoanganh
Date:

Hello

I have a bad query on PostgreSQL 9.0.23 - 64-bit - Windows 2012 R2 - 48 GB RAM

explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;

Here is the result:

Nested Loop  (cost=319.42..4879348246.58 rows=32820035265 width=1525) (actual time=64656.747..5594654.189 rows=3617090 loops=1)
  ->  Index Scan using sym_data_pkey on sym_data d  (cost=0.00..3671742.82 rows=3867095 width=1525) (actual time=9.775..12465.153 rows=3866359 loops=1)
        Filter: ((channel_id)::text = 'sale_transaction'::text)
  ->  Bitmap Heap Scan on sym_data_gap g  (cost=319.42..1133.51 rows=8487 width=8) (actual time=1.438..1.439 rows=1 loops=3866359)
        Recheck Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
        Filter: (g.status = 'GP'::bpchar)
        ->  Bitmap Index Scan on sym_data_gap_pkey  (cost=0.00..317.30 rows=8487 width=0) (actual time=1.436..1.436 rows=1 loops=3866359)
              Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
http://explain.depesz.com/s/c3DT

I have run VACUUM FULL. Here is my PostgreSQL config:

shared_buffers = 2GB
work_mem = 64MB
maintenance_work_mem = 1GB
wal_buffers = 256
effective_cache_size = 4GB
checkpoint_segments = 256
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 256
random_page_cost = 3.5
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 250
max_locks_per_transaction = 2000

When I check Task Manager, I see the postgres process is using only 4-5 MB.

What is happening with my PostgreSQL? Please help me.

Thank you in advance.

Tuan Hoang Anh



From:
Adrian Klaver
Date:

On 02/20/2016 08:46 AM, tuanhoanganh wrote:
> Hello
>
> I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram
>
> explain analyze select d.data_id, d.table_name, d.event_type,
> d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id,
> d.channel_id, d.transaction_id, d.source_node_id, d.external_data, ''
> from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id
> between g.start_id and g.end_id where d.channel_id='sale_transaction'
> order by d.data_id asc;

Took the liberty of reformatting the above here:
http://sqlformat.darold.net/

EXPLAIN ANALYZE
SELECT
     d.data_id,
     d.table_name,
     d.event_type,
     d.row_data,
     d.pk_data,
     d.old_data,
     d.create_time,
     d.trigger_hist_id,
     d.channel_id,
     d.transaction_id,
     d.source_node_id,
     d.external_data,
     ''
FROM
     sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
     AND d.data_id BETWEEN g.start_id
     AND g.end_id
WHERE
     d.channel_id = 'sale_transaction'
ORDER BY
     d.data_id ASC;

The thing that stands out to me is that I do not see that sym_data and
sym_data_gap are actually joined on anything.

Also is it possible to see the schema definitions for the two tables?



>
> Here is result
>
> Nested Loop  (cost=319.42..4879348246.58 rows=32820035265 width=1525) (actual time=64656.747..5594654.189 rows=3617090 loops=1)
>    ->  Index Scan using sym_data_pkey on sym_data d  (cost=0.00..3671742.82 rows=3867095 width=1525) (actual time=9.775..12465.153 rows=3866359 loops=1)
>          Filter: ((channel_id)::text = 'sale_transaction'::text)
>    ->  Bitmap Heap Scan on sym_data_gap g  (cost=319.42..1133.51 rows=8487 width=8) (actual time=1.438..1.439 rows=1 loops=3866359)
>          Recheck Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
>          Filter: (g.status = 'GP'::bpchar)
>          ->  Bitmap Index Scan on sym_data_gap_pkey  (cost=0.00..317.30 rows=8487 width=0) (actual time=1.436..1.436 rows=1 loops=3866359)
>                Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
>
> http://explain.depesz.com/s/c3DT
>
>
> I have run vaccum full. Here is my PostgreSQL config
>
> shared_buffers = 2GB
> work_mem = 64MB
> maintenance_work_mem = 1GB
> wal_buffers = 256
> effective_cache_size = 4GB
> checkpoint_segments = 256
> wal_level = hot_standby
> max_wal_senders = 5
> wal_keep_segments = 256
> random_page_cost = 3.5
> autovacuum_vacuum_threshold = 1000
> autovacuum_analyze_threshold = 250
> max_locks_per_transaction = 2000
>
> When I check taskmanager, I found postgres process is user 4-5MB
>
> What happened with my PostgreSQL. Please help me
>
> Thank you in advance.
>
> Tuan Hoang Anh
>
>
>


--
Adrian Klaver



From:
Tom Lane
Date:

Adrian Klaver <> writes:
> Took liberty of reformatting the above here:
> ...
> FROM
>      sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
>      AND d.data_id BETWEEN g.start_id
>      AND g.end_id
> WHERE
>      d.channel_id = 'sale_transaction'
> ORDER BY
>      d.data_id ASC;

> The thing that stands out to me is that I do not see that sym_data and
> sym_data_gp are actually joined on anything.

The "d.data_id BETWEEN g.start_id AND g.end_id" part is a join condition
... but not one that can be handled by either hash or merge join, because
those require simple equality join conditions.  So the nestloop plan shown
here is really about as good as you're going to get without redesigning
the query and/or the data representation.

It looks like the bitmap heap scan generally returns exactly one row for
each outer row, which makes me wonder if the BETWEEN couldn't be replaced
with some sort of equality.  But that might take some rethinking of the
data.
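For illustration, one way to get to an equality join is to materialize which gap each data_id falls into, and then join on that instead of the BETWEEN. This is only a sketch (Python with an in-memory sqlite3 database standing in for Postgres; the data_to_gap mapping table and the sample rows are hypothetical, and it assumes each data_id falls in exactly one gap):

```python
import sqlite3

# Sketch: precompute a data_id -> gap mapping so the range join
# becomes a plain equality join (hash/merge-joinable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sym_data_gap (start_id INTEGER, end_id INTEGER, status TEXT);
    CREATE TABLE sym_data (data_id INTEGER PRIMARY KEY, channel_id TEXT);
    INSERT INTO sym_data_gap VALUES (1, 3, 'GP'), (4, 6, 'GP');
    INSERT INTO sym_data VALUES (2, 'sale_transaction'), (5, 'sale_transaction');

    -- Hypothetical mapping table: tag each data row with the start_id
    -- of the (single) gap that contains it.
    CREATE TABLE data_to_gap AS
    SELECT d.data_id, g.start_id AS gap_start
    FROM sym_data d
    JOIN sym_data_gap g
      ON g.status = 'GP' AND d.data_id BETWEEN g.start_id AND g.end_id;
""")

# After the one-time mapping, the original query can join on equality:
rows = conn.execute("""
    SELECT d.data_id, g.start_id, g.end_id
    FROM sym_data d
    JOIN data_to_gap m ON m.data_id = d.data_id
    JOIN sym_data_gap g ON g.start_id = m.gap_start
    ORDER BY d.data_id;
""").fetchall()
print(rows)  # -> [(2, 1, 3), (5, 4, 6)]
```

With such a mapping in place, the planner could use a hash or merge join on the equality condition instead of probing the gap table once per outer row, at the cost of keeping the mapping up to date.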

            regards, tom lane


From:
Francisco Olarte
Date:

On Sat, Feb 20, 2016 at 7:13 PM, Adrian Klaver
<> wrote:
.....
> FROM
>     sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
>     AND d.data_id BETWEEN g.start_id
>     AND g.end_id
.....
> The thing that stands out to me is that I do not see that sym_data and
> sym_data_gp are actually joined on anything.

Yes they are, although the formatting hid it somewhat.

It is a classic: data_gap defines intervals via start+end ids over
data, and he wants to join every data row with the corresponding gap. It is a
hard optimization problem without knowing more about the data
distribution; maybe interval types and GIN indexes could help him.
When faced with this kind of structure, depending on the data
distribution, I've solved it via two parallel queries ( gap sorted by
start plus end, data sorted by id, sweep them in parallel joining by
code, the typical tape-update problem; works like a charm for
non-overlapping ranges and even for some overlapping ones with a
couple of queues ). And he seems to want all of the data ( sometimes
this goes faster if you can add a couple of range conditions for
data.id / gap.start/end_id ).
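The parallel sweep described above can be sketched roughly like this (illustrative Python, not from the thread; gaps is a list of (start_id, end_id) pairs sorted by start, ids is the sorted list of data ids, and the ranges are assumed non-overlapping):

```python
def sweep_join(gaps, ids):
    """gaps: sorted list of (start_id, end_id); ids: sorted list of data ids.
    Yields (data_id, (start_id, end_id)) for every id that falls inside a gap."""
    it = iter(gaps)
    gap = next(it, None)
    for i in ids:
        # advance the gap stream past gaps that end before this id
        while gap is not None and gap[1] < i:
            gap = next(it, None)
        if gap is None:
            break                  # no gaps left: remaining ids can't match
        if gap[0] <= i:            # id is inside the current gap
            yield i, gap

pairs = list(sweep_join([(1, 3), (7, 9)], [2, 3, 5, 8]))
print(pairs)  # -> [(2, (1, 3)), (3, (1, 3)), (8, (7, 9))]
```

Each stream is consumed once, so this is O(n + m) over the two sorted inputs, versus the bitmap scan repeated 3.8 million times in the posted plan.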

> Also is it possible to see the schema definitions for the two tables?

My bet is on something like data.id being a ~serial primary key, with
gap.start/end_id foreign keys to that.

Francisco Olarte.


From:
Francisco Olarte
Date:

On Sat, Feb 20, 2016 at 7:37 PM, Tom Lane <> wrote:
> It looks like the bitmap heap scan generally returns exactly one row for
> each outer row, which makes me wonder if the BETWEEN couldn't be replaced
> with some sort of equality.

Mm, I'm not good at reading explains, but that seems to confirm my
suspicion that the gaps partition the id range into non-overlapping ranges.

> But that might take some rethinking of the data.

If id is a series and gap defines a range, he can do something with an
auxiliary table, like

select start as a, 0 as b from gaps where status = 'GP'
union all
select id as a, 1 as b from data
union all
select end - 1 as a, 2 as b from gaps where status = 'GP'  -- end-1 to make intervals half open
order by a, b

which would give all the ids in a with b=1  surrounded by (0,2) when
valid and by (2,0) when invalid.

and then, with a creative window clause or a small function, filter
that and join with data.id. I suppose adding a third c column, null on
b=1 and =b on b=0/2, and selecting the previous non-null in the
sequence could do it, but it's somewhat above my window-fu; I'm more of
a code guy and would do it with two nested loops in a function.
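The sentinel idea can be sketched in code like this (illustrative Python, not from the thread; b=0 opens an interval, b=1 is a candidate data id, b=2 closes an interval; the close marker is placed at end+1 here so the interval is half-open and ids equal to end still count, and the gaps are assumed non-overlapping):

```python
def ids_inside_gaps(gaps, ids):
    """gaps: list of inclusive (start, end) pairs; ids: candidate data ids.
    Returns the ids that fall inside some gap, via one merged sorted stream."""
    events = sorted(
        [(s, 0) for s, _ in gaps]        # b=0: interval opens at start
        + [(i, 1) for i in ids]          # b=1: candidate data id
        + [(e + 1, 2) for _, e in gaps]  # b=2: half-open close at end+1
    )
    inside, out = False, []
    for a, b in events:
        if b == 0:
            inside = True
        elif b == 2:
            inside = False
        elif inside:                     # b == 1 while an interval is open
            out.append(a)
    return out

print(ids_inside_gaps([(1, 3), (7, 9)], [2, 3, 5, 8]))  # -> [2, 3, 8]
```

This is the loop-based equivalent of filtering the UNION ALL stream: each id with b=1 is kept exactly when the most recent marker before it was an open (b=0) rather than a close (b=2).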

Francisco Olarte.


From:
Adrian Klaver
Date:

On 02/20/2016 10:39 AM, Francisco Olarte wrote:
> On Sat, Feb 20, 2016 at 7:13 PM, Adrian Klaver
> <> wrote:
> .....
>> FROM
>>      sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
>>      AND d.data_id BETWEEN g.start_id
>>      AND g.end_id
> .....
>> The thing that stands out to me is that I do not see that sym_data and
>> sym_data_gp are actually joined on anything.
>
> Yes they are, although the formatting hid it somehow.
>
> It is a classic,  data_gap defines intervals via start+end id over
> data, he wants to join every data with the corresponding gap. It is  a
> hard optimization problem without knowing more of the data
> distributions, maybe the interval types and ginindexes can help him.
> When faced with this kind of structure, depending on the data
> distribution, I've solved it via two paralell queries ( gap sorted by
> start plus end, data sorted by id, sweep them in paralell joining by
> code, typical tape-update problem, works like a charm for
> non-overlapping ranges and even for some overlapping ones with a
> couple of queues  ) . And he seems to want all of the data ( sometime
> this goes faster if you can add a couple of range conditions for
> data.id / gap.start/end_id.

Thanks to you and Tom for enlightening me. I am going to have to spend
some time puzzling this out to convert what you have shown into
something that I can wrap my head around.

>
>> Also is it possible to see the schema definitions for the two tables?
>
> My bet is on somethink like data.id ~serial primary key,
> gap.start/end_id foreign key to that.
>
> Francisco Olarte.
>


--
Adrian Klaver



From:
Jeff Janes
Date:

On Sat, Feb 20, 2016 at 8:46 AM, tuanhoanganh <> wrote:
> Hello
>
> I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram

9.0 is no longer supported.  You should work toward upgrading to a
newer version.  It might not solve this problem, but it would give you
better tools for diagnosing the problem.  Which is a pretty good step
toward solving it.

> When I check taskmanager, I found postgres process is user 4-5MB

Other people have explained the details of how the query is being run
and why it is being run that way.  But I would like to take a step
back from that, and tell you that the reason that PostgreSQL is not
using more memory, is that it doesn't think that using more memory
would help.

Cheers,

Jeff


From:
tuanhoanganh
Date:

Thanks for everyone's help.

I have changed effective_cache_size to 24GB and it now runs well.

Tuan Hoang Anh