Thread: Re: [postgis-users] Union as an aggregate

Re: [postgis-users] Union as an aggregate

From
strk
Date:
dblasby wrote:
> strk wrote:
> > Dave, I made no special 'unite' code, just sql create command:
> > CREATE AGGREGATE unite (
> >     sfunc = GeomUnion,
> >     basetype = geometry,
> >     stype = geometry
> > );
>
> Hum - this shouldnt do anything.
>
> The aggregate should have "null" as its initial value.
>
> GeomUnion(null,<geometry>) --> null.
>
> So, you'll end up with null *very* quickly because
> GeomUnion(null,<geometry>) returns null without any actual execution.

It looks like postgres is behaving differently since I always get
a not-null result.

>
> Are you running this on a big table?  At worst I'd expect this to use up
> memory the same size all the geometries in the table.
>

Geometry set size is about 2MB:
  gis=# select sum(mem_size(the_geom)) from world where gid < 50;
  2148316

Initial postmaster process status is:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  566 pgsql     15   0  3288 3288  2684 S     0.0  0.4   0:00 postmaster

After query:
  gis=# select box(unite(the_geom)) from plmshp02_1 where gid < 50;
  (180,83.1138763427734),(-180,-90)

The postmaster grew of about 66M (having touched a 78MB upper limit):
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  566 pgsql     19   0 71472  69M  3396 S     0.0  9.2   3:35 postmaster

If I run that again, *exactly the same query*:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  566 pgsql     14   0  126M 126M  3396 S     0.0 16.8   7:13 postmaster

It looks like someone is leaking memory, either postgres, postgis or geos.
Does it sound correct ?

PS: postgres is 7.3.4, postgis and geos are latest cvs.
--strk;

Re: [postgis-users] Union as an aggregate

From
Tom Lane
Date:
strk <strk@keybit.net> writes:
> dblasby wrote:
>> Hum - this shouldnt do anything.
>> The aggregate should have "null" as its initial value.
>> GeomUnion(null,<geometry>) --> null.
>> So, you'll end up with null *very* quickly because
>> GeomUnion(null,<geometry>) returns null without any actual execution.

> It looks like postgres is behaving differently since I always get
> a not-null result.

Yes.  There's a special definition for aggregate behavior when the
transition function is strict and the initial value is null.  As Dave
observes, such a combination would have no useful purpose if we did
not special-case it.  The special definition is "take the first
non-null input as the initial transition value, and then apply the
transition function for each subsequent non-null input".  This does
exactly what's needed for MAX and MIN, and is probably about what
you'd want for a geometric union operator too.

> If I run that again, *exactly the same query*:
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>   566 pgsql     14   0  126M 126M  3396 S     0.0 16.8   7:13 postmaster

> It looks like someone is leaking memory, either postgres, postgis or geos.

On some platforms top's report of memory used can be misleading, because
it begins to count each page of shared memory against a process when the
process first touches that page.  So if you have a big scan that touches
more and more of the shared buffers, the reported process size goes up
--- but there's really no memory leak.  Try a plain "select count(*)"
against your table and see if you see the same change in reported size.
Alternatively, if the reported size continues to increase well beyond
your shared memory allocation, then I'd believe that as evidence of a
leak.

            regards, tom lane

Re: [postgis-users] Union as an aggregate

From
strk
Date:
tgl wrote:
> strk <strk@keybit.net> writes:
> > If I run that again, *exactly the same query*:
> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
> >   566 pgsql     14   0  126M 126M  3396 S     0.0 16.8   7:13 postmaster
>
> > It looks like someone is leaking memory, either postgres, postgis or geos.
>
> On some platforms top's report of memory used can be misleading, because
> it begins to count each page of shared memory against a process when the
> process first touches that page.  So if you have a big scan that touches
> more and more of the shared buffers, the reported process size goes up
> --- but there's really no memory leak.  Try a plain "select count(*)"
> against your table and see if you see the same change in reported size.

No changes in size with count(*).
Testing platform is FreeBSD.

> Alternatively, if the reported size continues to increase well beyond
> your shared memory allocation, then I'd believe that as evidence of a
> leak.
>
>             regards, tom lane

thanks for you answer.

--strk;