Thread: Re: [HACKERS] Out of memory problem (forwarded bug report)

Re: [HACKERS] Out of memory problem (forwarded bug report)

From
"Vladimír Beneš"
Date:
Hi,
   thank You very much, You helped me :-)

   Well, I will use function date() and not date_trunc() in my views.
   I tryed change type of "bytes" column from int8 to int4. Range 0 .. +2
147 483 647 will suffice here. But I must retype this type to int8 in sum
function because overflow can occur:

collector=> create table flow_sums (
collector-> primary_collector varchar(50) not null,
collector-> start datetime not null,
collector-> end_period datetime not null,
collector-> dead_time_rel float4 not null,
collector-> src_name varchar(50) not null,
collector-> dst_name varchar(50) not null,
collector-> bytes  int4 not null,
collector-> packets int4 not null);
CREATE
collector=> insert into
flow_sums(primary_collector,start,end_period,dead_time_rel,src_name,dst_name
,bytes,packets)
collector-> select
primary_collector,start,end_period,dead_time_rel,src_name,dst_name,bytes,pac
kets
collector-> from flow_sums_200002;
INSERT 0 3198588
collector=> create unique index flow_sums_pk on
flow_sums(primary_collector,start, src_name, dst_name);
CREATE
collector=>
collector=> select sum(bytes) from flow_sums;       sum
-----------
-1712976144
(1 row)
collector=> select sum(int8(bytes)) from flow_sums;        sum
------------
603877412592
(1 row)

collector=>
   So my views will retype attributes in sums to int8 and they will use
function date to round datetime attribute.
   Thank You very much and please forward this mail to
pgsql-hackers@postgreSQL.org becouse the robot will bounce my CC.
       Thanks, V. Benes


-----Původní zpráva-----
Od: Tom Lane <tgl@sss.pgh.pa.us>
Komu: Vladimír Beneš <Vladimir.Benes@pvt.cz>
Kopie: pgsql-hackers@postgreSQL.org <pgsql-hackers@postgreSQL.org>;
Mühlpachr Michal <michalm@pvt.net>
Datum: 24. února 2000 6:46
Předmět: Re: [HACKERS] Out of memory problem (forwarded bug report)


>Vladimir,
>  Thanks for the details.  I think you are undoubtedly running into
>expression evaluation memory leaks.  Basically, any expression that
>yields a non-pass-by-value data type consumes memory that is not
>reclaimed until end of statement --- so when you process a few million
>rows, that memory starts to add up.  (Yes, I realize this is a horrible
>misfeature.  It's on our TO-DO list to fix it, but it probably won't
>happen until 7.1 or 7.2.)  In the meantime the best I can offer you
>is workarounds.
>
>  I think the major problems here are coming from the
>"date_trunc('day',start)" calculation (because its datetime result is
>pass-by-reference) and to a lesser extent from the sum(bytes)
>calculation (because int8 is pass-by-reference).  You could easily
>replace "date_trunc('day',start)" with "date(start)"; since date is
>a pass-by-value type, that won't leak memory, and it should give
>equivalent results.  The int8 sum is not quite so easy to fix.
>I assume you can't get away with switching to int4 --- probably
>your sum would overflow an int4?  It may be that just fixing the
>inefficient date_trunc calc will reduce your memory requirements
>enough to get by.  If not, the only good news I have is that release
>7.0 does fix the memory-leak problem for internal calculations of
>aggregate functions like sum().  You can get the first beta release
>for 7.0 now.
>
> regards, tom lane
>
>
>"Vladimír Beneš" <Vladimir.Benes@pvt.cz> writes:
>> -----Původní zpráva-----
>> Od: Tom Lane <tgl@sss.pgh.pa.us>
>> Komu: Oliver Elphick <olly@lfix.co.uk>
>> Kopie: pgsql-hackers@postgreSQL.org <pgsql-hackers@postgreSQL.org>;
>> Vladimir.Benes@pvt.cz <Vladimir.Benes@pvt.cz>
>> Datum: 22. února 2000 18:06
>> Předmět: Re: [HACKERS] Out of memory problem (forwarded bug report)
>
>
>>> "Oliver Elphick" <olly@lfix.co.uk> writes:
>>>> Can someone advise, please, how to deal with this problem in 6.5.3?
>>>
>
>>> My guess is that the cause is memory leaks during expression evaluation;
>>> but without seeing the complete view definitions and underlying table
>>> definitions, it's impossible to know what processing is being invoked
>>> by this query...
>>>
>>> regards, tom lane
>
>
>
>>     Well, I will append views and underlying table definition:
>
>> 1) Once again - failure query:
>> select comm_type,name,tot_bytes,tot_packets
>> from flow_sums_days_send_200002_view
>> where day='2000-02-21' and name not like '@%'
>> union all
>> select comm_type,name,tot_bytes,tot_packets
>> from flow_sums_days_receive_200002_view
>> where day='2000-02-21' and name not like '@%'
>
>> 2) views definition:
>> create view flow_sums_days_send_200002_view as
>> select
>>   'send'::varchar as comm_type, date_trunc('day',start) as day,
>>   src_name as name, sum(bytes) as tot_bytes, sum(packets) as tot_packets
>> from flow_sums_200002
>> group by day, src_name
>
>> create view flow_sums_days_receive_200002_view as
>> select
>>   'receive'::varchar as comm_type, date_trunc('day',start) as day,
>>   dst_name as name, sum(bytes) as tot_bytes, sum(packets) as tot_packets
>> from flow_sums_200002
>> group by day, dst_name
>
>
>> I wanted create only one usefull view:
>
>> create view flow_sums_days_200002_view as
>> select
>>   'send'::varchar as comm_type, date_trunc('day',start) as day,
>>   src_name as name, sum(bytes) as tot_bytes, sum(packets) as tot_packets
>> from flow_sums_200002
>> group by day, src_name
>> UNION ALL
>> select
>>   'receive'::varchar as comm_type, date_trunc('day',start) as day,
>>   dst_name as name, sum(bytes) as tot_bytes, sum(packets) as tot_packets
>> from flow_sums_200002
>> group by day, dst_name
>
>> ...but Postgres cann't use clause UNION ALL at view definition. So I
created
>> two views mentioned above and I wanted use this ones with UNION ALL
clause
>> only.
>
>> 3) underlaying table definition:
>> create table flow_sums_200002 (
>>     primary_collector varchar(50) not null,
>>     start datetime not null,
>>     end_period datetime not null,
>>     dead_time_rel float4 not null,
>>     src_name varchar(50) not null,
>>     dst_name varchar(50) not null,
>>     bytes int8 not null,
>>     packets int4 not null
>> )
>
>>     Today this table has about 3 000 000 rows and the select command
>> mentioned above returns 190 + 255 rows.
>
>
>>     Now I don't use clause "UNION ALL" and the program executes two
queryes
>> and then adds both result to new result. I reduced time increment of
number
>> rows to flow_sums_200002 table (three times less). This table contains
data
>> of February 2000 and the program will create table flow_sums_200003 with
>> relevant views next month.
>>     Well, now this solution solve my problem but always depends on number
of
>> rows - I only moved limit of rows count.
>
>
>>             Thank You, V. Benes
>
>> P.S.: I append part of top on my system while the query is running:
>
>> CPU states: 98.6% user,  1.3% system,  0.0% nice,  0.0% idle
>> Mem:  127256K av, 124316K used,   2940K free,  29812K shrd,   2620K buff
>> Swap: 128516K av,  51036K used,  77480K free                  7560K
cached
>
>>   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME
COMMAND
>> 2942 postgres  20   0  141M  99M 17348 R       0 99.0 80.4   1:22
postmaster
>
>> => postmaster later took 80 - 95% of memory, free memory decressed to 2
MB,
>> CPU was overloaded (0% idle and 99% by user process of postmaster). Have
You
>> ever seen something similar :-) ?



Re: [HACKERS] Out of memory problem (forwarded bug report)

From
Tom Lane
Date:
"Vladimír Beneš" <Vladimir.Benes@pvt.cz> writes:
> collector=> select sum(int8(bytes)) from flow_sums;

That will not help, because you're still invoking the int8 flavor of
sum().  Might as well leave it alone and update to 7.0 beta.
        regards, tom lane