Thread: Memory usage on subselect

Memory usage on subselect

From
"Alexander M. Pravking"
Date:
Hello, PostgreSQL users and developers.
I've got a memory usage problem when I try to do a subselect on the same
table as the main select (PostgreSQL 7.3.4 on FreeBSD). Here's my query:

SELECT  sreq(s1.id, 'ipacct_ip', now()), s1.*
FROM    services s1
WHERE   EXISTS (               SELECT  1               FROM    services s2               WHERE   s2.id != s1.id
     AND     sreq(s2.id, 'ipacct_ip', now()) = sreq(s1.id, 'ipacct_ip', now())               AND     s2.sdate < now()
ANDs2.edate > now()       )
 
AND     s1.sdate < now() AND s1.edate > now();

I.e. I want to find all records from services which have equal values of
sreq(...) for them (additionally filtering only those which are actual
now).

The "services" table is indexed only on "id" column and has about a
thousand tuples. sreq(integer, text, timestamptz) is a strict immutable
function written in SQL.
EXPLAIN says the following:
Seq Scan on services s1  (cost=0.00..38628.80 rows=38 width=55)  Filter: ((sdate < now()) AND (edate > now()) AND
(subplan)) SubPlan    ->  Seq Scan on services s2  (cost=0.00..56.08 rows=1 width=0)          Filter: ((id <> $0) AND
(sreq(id,'ipacct_ip'::text, now()) = sreq($0, 'ipacct_ip'::text, now())) AND (sdate < now()) AND (edate > now()))
 

I see no evil here (of course, the query is going to be slow), but the
postgres process begins to consume a lot of memory (I cancelled a query
after ~500M).

Am I doing something wrong or is it expected behavour?
I never seen this before, so I'd think it's me who mistaken,
but I can't find anything wrong for a few hours :)

Here's subquerie's EXPLAIN ANALYZE for a sample (existing) s1.id:

EXPLAIN ANALYZE               SELECT  1               FROM    services s2               WHERE   s2.id != 561
  AND     sreq(s2.id, 'ipacct_ip', now()) = sreq(561, 'ipacct_ip', now())               AND     s2.sdate < now() AND
s2.edate> now();
 
Seq Scan on services s2  (cost=0.00..56.08 rows=1 width=0) (actual time=177.01..177.01 rows=0 loops=1)  Filter: ((id <>
561)AND (sreq(id, 'ipacct_ip'::text, now()) = sreq(561, 'ipacct_ip'::text, now())) AND (sdate < now()) AND (edate >
now()))Totalruntime: 177.05 msec
 

I can provide other details, if needed. Thanks in advance.

-- 
Fduch M. Pravking


Re: Memory usage on subselect

From
Tom Lane
Date:
"Alexander M. Pravking" <fduch@antar.bryansk.ru> writes:
> I've got a memory usage problem when I try to do a subselect on the same
> table as the main select (PostgreSQL 7.3.4 on FreeBSD). Here's my query:
> ...
> The "services" table is indexed only on "id" column and has about a
> thousand tuples. sreq(integer, text, timestamptz) is a strict immutable
> function written in SQL.

IIRC, there were intraquery memory leaks associated with SQL-language
functions until fairly recently.  Can you try your problem case on 7.4?
Or see if you can rewrite the sreq function in plpgsql.
        regards, tom lane


Re: Memory usage on subselect

From
"Alexander M. Pravking"
Date:
On Sun, May 23, 2004 at 12:28:18PM -0400, Tom Lane wrote:
> "Alexander M. Pravking" <fduch@antar.bryansk.ru> writes:
> > I've got a memory usage problem when I try to do a subselect on the same
> > table as the main select (PostgreSQL 7.3.4 on FreeBSD). Here's my query:
> > ...
> > The "services" table is indexed only on "id" column and has about a
> > thousand tuples. sreq(integer, text, timestamptz) is a strict immutable
> > function written in SQL.
> 
> IIRC, there were intraquery memory leaks associated with SQL-language
> functions until fairly recently.  Can you try your problem case on 7.4?
> Or see if you can rewrite the sreq function in plpgsql.

Thanks, Tom, both 7.4.1 and plpgsql function on 7.3.4 work well. However,
plpgsql function is a little slower, but that's another story.

I also tried to turn off IMMUTABLE for the function, but it did not
affect memory usage on 7.3.4.

BTW, after cancelling the original query postgres freed all the memory,
and used ~7M again, so the leak was not "forever".


Good luck, thansk again ;)

-- 
Fduch M. Pravking


Re: Memory usage on subselect

From
Dan Field
Date:
On 23 May 2004, at 19:32, Alexander M. Pravking wrote:
> BTW, after cancelling the original query postgres freed all the memory,
> and used ~7M again, so the leak was not "forever".
>

I have a similar problem with just one of my queries (although it isn't
a sub select):

SELECT
    da_records.*
FROM
    da_records, lu_dewey, da_dewey_list
WHERE
    (da_records.RECORD_ID = da_dewey_list.RECORD_ID) AND
    (lu_dewey.DEWEY_ID = da_dewey_list.DEWEY_ID) AND
    (lu_dewey.DEWEY_HUNDREDS = 7) AND
    (lu_dewey.DEWEY_TENS = 0) AND
    (lu_dewey.DEWEY_ONES = 8) AND
    (lu_dewey.DEWEY_LANG = 'en') AND
    (lu_dewey.DEWEY_TYPE = 't') AND
    (lu_dewey.DEWEY_ARCHIVE IS NOT TRUE) AND
    (lu_dewey.dewey_point_ones IS NULL) AND
    (lu_dewey.dewey_point_tens IS NULL) AND
    (lu_dewey.dewey_point_hundreds IS NULL) AND
    (lu_dewey.dewey_point_thousands IS NULL) AND
    (lu_dewey.dewey_point_tenthousands IS NULL) AND
    (lu_dewey.dewey_point_hundredthousands IS NULL)

EXPLAIN results:

Hash Join  (cost=57.58..82.89 rows=25 width=661)
    Hash Cond: ("outer".record_id = "inner".record_id)
    ->  Seq Scan on da_records  (cost=0.00..20.00 rows=1000 width=649)
    ->  Hash  (cost=57.56..57.56 rows=5 width=12)
          ->  Hash Join  (cost=32.50..57.56 rows=5 width=12)
                Hash Cond: ("outer".dewey_id = "inner".dewey_id)
                ->  Seq Scan on da_dewey_list  (cost=0.00..20.00
rows=1000 width=8)
                ->  Hash  (cost=32.50..32.50 rows=1 width=4)
                      ->  Seq Scan on lu_dewey  (cost=0.00..32.50 rows=1
width=4)
                            Filter: (((dewey_hundreds)::text =
'7'::text) AND ((dewey_tens)::text = '0'::text) AND ((dewey_ones)::text
= '8'::text) AND (dewey_lang = 'en'::bpchar) AND (dewey_type =
't'::bpchar) AND (dewey_archive IS NOT TRUE) AND (dewey_point_ones IS
NULL) AND (dewey_point_tens IS NULL) AND (dewey_point_hundreds IS NULL)
AND (dewey_point_thousands IS NULL) AND (dewey_point_tenthousands IS
NULL) AND (dewey_point_hundredthousands IS NULL))
(10 rows)

--
Dan Field <dof@llgc.org.uk> - Support Programmer: Cymru ar y we
cy_GB: http://www.cymruarywe.org
en_GB: http://www.walesontheweb.org

Re: Memory usage on subselect

From
Tom Lane
Date:
Dan Field <dof@llgc.org.uk> writes:
> I have a similar problem with just one of my queries (although it isn't 
> a sub select):

You really ought to vacuum and/or analyze occasionally.  The EXPLAIN
results show that the planner hasn't got any non-default statistics
for any of these tables.
        regards, tom lane


Re: Memory usage on subselect

From
Dan Field
Date:
On 24 May 2004, at 14:37, Tom Lane wrote:

> Dan Field <dof@llgc.org.uk> writes:
>> I have a similar problem with just one of my queries (although it 
>> isn't
>> a sub select):
>
> You really ought to vacuum and/or analyze occasionally.  The EXPLAIN
> results show that the planner hasn't got any non-default statistics
> for any of these tables.

Wow, thanks for that. I'd been pulling my hair out for a couple of days 
wondering where I was going wrong.
I went from 45 second queries down to sub second query lengths  after a 
simple vacuum full analyze.

I've now added nightly and monthly cron jobs to do this for me in 
future.

Out of curiosity, why is this deemed a DBA task rather than an 
automated postgres task?

Once again, many thanks.

-- 
Dan Field <dof@llgc.org.uk> - Support Programmer: Cymru ar y we
cy_GB: http://www.cymruarywe.org
en_GB: http://www.walesontheweb.org



Re: Memory usage on subselect

From
Gaetano Mendola
Date:
Dan Field wrote:

> 
> On 24 May 2004, at 14:37, Tom Lane wrote:
> 
>> Dan Field <dof@llgc.org.uk> writes:
>>
>>> I have a similar problem with just one of my queries (although it isn't
>>> a sub select):
>>
>>
>> You really ought to vacuum and/or analyze occasionally.  The EXPLAIN
>> results show that the planner hasn't got any non-default statistics
>> for any of these tables.
> 
> 
> Wow, thanks for that. I'd been pulling my hair out for a couple of days 
> wondering where I was going wrong.
> I went from 45 second queries down to sub second query lengths  after a 
> simple vacuum full analyze.
> 
> I've now added nightly and monthly cron jobs to do this for me in future.
> 
> Out of curiosity, why is this deemed a DBA task rather than an automated 
> postgres task?
> 
> Once again, many thanks.

You have to use the pg_autovacuum demon.

Run the vacuum full and the reindex once in a week.


Regards
Gaetano Mendola






Re: The PostgreSQL

From
sad
Date:
Dear developers,

many tricks were discussed here,
and many feature requests generated.

Every discussed problem ALREADY have solution in the current version of the
SQL. Most of the problems are forced by incorrect design of a database !

I ask you:
do not burden the PostgreSQL with features !!! Please !!!

So well known Oracle is now sinking in the ocean of own features.

Thnx.