Thread: Large index operation crashes postgres

Large index operation crashes postgres

From
Frans Hals
Date:
Hi,

I'm running a geo-database restored from a dump, where one of the most
important indexes is still missing, so searching is slow.
Whenever I try to add the following index to the table "placex", one of
the postmaster processes dies and the server restarts.

I try:
CREATE INDEX idx_placex_sector ON placex USING btree
(geometry_sector(geometry), rank_address, osm_type, osm_id);

The table contains around 50.000.000 rows.
The first 20.000.000 are indexed in 20-30 minutes. Nice!
Then indexing becomes slower and slower, at first taking ten minutes
per 100.000 rows and steadily losing speed from there.
When the job reaches somewhere around row 25.000.000, postgres goes down:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.

I have checked the RAM and swapped the hard disk, with no success.
Experimenting with a lot of different memory settings in the conf file
didn't help either.
Has anybody else experienced this and found a way to create this index?
Server is postgres 8.3.9 with 4 GB dedicated RAM.
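
For reference, the settings that most directly affect memory use during a
large CREATE INDEX can also be tried per session rather than in the conf
file; the values below are purely illustrative:

SET maintenance_work_mem = '512MB';  -- sort memory used by CREATE INDEX
SET work_mem = '32MB';               -- per-sort memory for ordinary queries
CREATE INDEX idx_placex_sector ON placex USING btree
  (geometry_sector(geometry), rank_address, osm_type, osm_id);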

The geometry_sector function looks like this (postgis):

DECLARE
  NEWgeometry geometry;
BEGIN
  -- RAISE WARNING '%',place;
  NEWgeometry := place;
  IF ST_IsEmpty(NEWgeometry) OR NOT ST_IsValid(NEWgeometry)
     OR ST_X(ST_Centroid(NEWgeometry))::text IN ('NaN','Infinity','-Infinity')
     OR ST_Y(ST_Centroid(NEWgeometry))::text IN ('NaN','Infinity','-Infinity') THEN
    NEWgeometry := ST_Buffer(NEWgeometry, 0);
    IF ST_IsEmpty(NEWgeometry) OR NOT ST_IsValid(NEWgeometry)
       OR ST_X(ST_Centroid(NEWgeometry))::text IN ('NaN','Infinity','-Infinity')
       OR ST_Y(ST_Centroid(NEWgeometry))::text IN ('NaN','Infinity','-Infinity') THEN
      RETURN NULL;
    END IF;
  END IF;
  RETURN (500 - ST_X(ST_Centroid(NEWgeometry))::integer) * 1000
         + (500 - ST_Y(ST_Centroid(NEWgeometry))::integer);
END;

The ST_Centroid called inside it is a postgis C function located in
/usr/lib/postgresql/8.3/lib/liblwgeom.
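
As a worked example of the RETURN expression: a geometry whose centroid
sits at lon 13, lat 52 (an arbitrary illustrative point) maps to sector
487448:

SELECT (500 - ST_X(c)::integer) * 1000
     + (500 - ST_Y(c)::integer) AS sector
FROM (SELECT ST_Centroid(ST_SetSRID(ST_MakePoint(13, 52), 4326)) AS c) AS g;
-- (500 - 13) * 1000 + (500 - 52) = 487448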


Does anybody out there have an idea what's happening, or better, how to
reach the 50.000.000?

Thanks
Frans

Re: Large index operation crashes postgres

From
Tom Lane
Date:
Frans Hals <fhals7@googlemail.com> writes:
> Whenever I try to add the following index to the table "placex", one of
> the postmaster processes dies and the server restarts.

Can you provide a stack trace from the crash?

> I try:
> CREATE INDEX idx_placex_sector ON placex USING btree
> (geometry_sector(geometry), rank_address, osm_type, osm_id);

> The table counts around 50.000.000 rows.
> The first 20.000.000 are indexed in 20-30 minutes. Nice!
> Then indexing becomes slow and slower, first taking 100.000 rows in
> ten minutes while further consequently decreasing speed.
> When the job reaches something around row 25.000.000 postgres goes down:

On what exactly do you base these statements about number of rows
processed?  CREATE INDEX doesn't provide any such feedback that I'm
aware of.

            regards, tom lane

Re: Large index operation crashes postgres

From
Paul Ramsey
Date:
Can you do?

alter table placex add column geometry_sector integer;
update placex set geometry_sector = geometry_sector(geometry);

P.
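
A read-only variant that exercises the same code path without rewriting
the whole table might be (just a sketch along the same lines, not part of
Paul's suggestion):

SELECT count(st_x(st_centroid(geometry))) FROM placex;

Watching the backend's memory while this runs should show whether the
growth is tied to the centroid/X evaluation alone.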

On Wed, Mar 24, 2010 at 1:15 PM, Frans Hals <fhals7@googlemail.com> wrote:
> Hi,
>
> running a geo-database from a dump restore where still one of the most
> important indexes is missing and so the search is slow.
> Whenever I try to add the following index to the table "placex", one of
> the postmaster processes dies and the server restarts.
>
> I try:
> CREATE INDEX idx_placex_sector ON placex USING btree
> (geometry_sector(geometry), rank_address, osm_type, osm_id);
>
> The table counts around 50.000.000 rows.
> The first 20.000.000 are indexed in 20-30 minutes. Nice!
> Then indexing becomes slow and slower, first taking 100.000 rows in
> ten minutes while further consequently decreasing speed.
> When the job reaches something around row 25.000.000 postgres goes down:
>
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
>
> I have checked RAM and changed the HD with no success.
> Experimenting with a lot of different memory settings in the conf-file
> didn't help either.
> Is there anybody else who experienced this and found a way to create this index?
> Server is postgres 8.3.9 with 4 GB dedicated RAM.
>
> The geometry_sector function looks like this (postgis):
>
> DECLARE
>  NEWgeometry geometry;
> BEGIN
> -- RAISE WARNING '%',place;
>  NEWgeometry := place;
>  IF ST_IsEmpty(NEWgeometry) OR NOT ST_IsValid(NEWgeometry) OR
> ST_X(ST_Centroid(NEWgeometry))::text in ('NaN','Infinity','-Infinity')
> OR ST_Y(ST_Centroid(NEWgeometry))::text in
> ('NaN','Infinity','-Infinity') THEN
>    NEWgeometry := ST_buffer(NEWgeometry,0);
>    IF ST_IsEmpty(NEWgeometry) OR NOT ST_IsValid(NEWgeometry) OR
> ST_X(ST_Centroid(NEWgeometry))::text in ('NaN','Infinity','-Infinity')
> OR ST_Y(ST_Centroid(NEWgeometry))::text in
> ('NaN','Infinity','-Infinity') THEN
>      RETURN NULL;
>    END IF;
>  END IF;
>  RETURN (500-ST_X(ST_Centroid(NEWgeometry))::integer)*1000 +
> (500-ST_Y(ST_Centroid(NEWgeometry))::integer);
> END;
>
> The subcalled St_Centroid is a postgis C-function located in
> /usr/lib/postgresql/8.3/lib/liblwgeom.
>
>
> Anybody out there has an idea what happens or better how to reach the
> 50.000.000?
>
> Thanks
> Frans
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Five postmaster processes taking memory:

93.3 %
18.7 %
 0.3 %
 0.2 %
 0.0 %
--------
112.5%

Looks like somebody is living beyond their means?

Frans

2010/3/24 Tom Lane <tgl@sss.pgh.pa.us>:

> Hm.  I wonder about a memory leak in there somewhere.  Have you checked
> the process size while this is going on?
>
>                        regards, tom lane
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Tom,

I ran a CREATE INDEX against the gdb-operated postmaster.
Nothing new due to missing debugging libraries, so this might not help:

Reading symbols from /usr/bin/postmaster...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install
postgresql-server-8.3.9-1PGDG.f12.x86_64
(gdb) run
Starting program: /usr/bin/postmaster
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 2869.
LOG:  database system was interrupted; last known up at 2010-03-24 05:51:12 PDT
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at CF/D482490
LOG:  unexpected pageaddr CE/AD490000 in log file 207, segment 13,
offset 4784128
LOG:  redo done at CF/D48FB80
Detaching after fork from child process 2870.
Detaching after fork from child process 2871.
Detaching after fork from child process 2872.
LOG:  database system is ready to accept connections
Detaching after fork from child process 2996.
Detaching after fork from child process 3682.
Detaching after fork from child process 3685.
Detaching after fork from child process 3687.
Detaching after fork from child process 3688.
Detaching after fork from child process 3690.
Detaching after fork from child process 3691.
Detaching after fork from child process 3696.
Detaching after fork from child process 3698.
Detaching after fork from child process 3702.
Detaching after fork from child process 3706.
Detaching after fork from child process 3710.
LOG:  background writer process (PID 2870) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
LOG:  statistics collector process (PID 2872) was terminated by signal 9: Killed
Detaching after fork from child process 5595.
FATAL:  the database system is in recovery mode

Is there anything meaningful in this for you?
Does it make sense to install the debuginfos to catch the problem? If
so, I will do that.

Paul asked me to check another function of the postgis set for a
memory leak. I will try that now.

Thanks & regards
Frans

2010/3/24 Tom Lane <tgl@sss.pgh.pa.us>:

> Can you provide a stack trace from the crash?
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Paul,

I have started the operation right now after a fresh reboot of the machine.
<update placex set leakcheck = st_x(st_centroid(geometry));>
 Please give me some time to collect the results...
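
(The UPDATE above presumes a scratch column for the result; if placex
doesn't already have one, something along these lines would be needed
first, leakcheck being just the name used above:)

ALTER TABLE placex ADD COLUMN leakcheck double precision;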

Thanks & regards
Frans

2010/3/25 Paul Ramsey <pramsey@cleverelephant.ca>:
> If you build an index, or try to update a column, using just the
>
> ST_X(ST_Centroid())
>
> without all the extra pl/pgsql wrapping, do you still see the memory
> footprint grow substantially?
>
> P.

Re: Large index operation crashes postgres

From
Frans Hals
Date:
The operation has now been running for around 13 hrs.
Two postmaster processes with more than 1% memory usage are running.

One of them stays constant at 26.5% of memory.
The other one keeps growing:
After 1 hr        25%
After 9 hrs      59%
After 13 hrs    64%

Thanks & regards
Frans


2010/3/25 Frans Hals <fhals7@googlemail.com>:
> Paul,
>
> I have started the operation right now after a fresh reboot of the machine.
> <update placex set leakcheck = st_x(st_centroid(geometry));>
>  Please give me some time to collect the results...
>
> Thanks & regards
> Frans
>
> 2010/3/25 Paul Ramsey <pramsey@cleverelephant.ca>:
>> If you build an index, or try to update a column, using just the
>>
>> ST_X(ST_Centroid())
>>
>> without all the extra pl/pgsql wrapping, do you still see the memory
>> footprint grow substantially?
>>
>> P.
>

Re: Large index operation crashes postgres

From
Tom Lane
Date:
Frans Hals <fhals7@googlemail.com> writes:
> Operation is now running for around 13 hrs.
> Two postmaster processes above 1% memory usage are running.

> One of them uses constantly 26.5% of memory.
> The other one is growing:
> After 1 hr        25%
> After 9 hrs      59%
> After 13 hrs    64%

Well, it's pretty clear that you've found a memory leak, but that was
what we thought before; this data doesn't move us any closer to a fix.
In particular it's not possible to guess whether the leak should be
blamed on Postgres or Postgis code.  Even if we knew that, I'm not
sure we could fix the leak without tracing through actual execution.

Can you generate a self-contained test case that exhibits similar bloat?
I would think it's probably not very dependent on the specific data in
the column, so a simple script that constructs a lot of random data
similar to yours might be enough, if you would rather not show us your
real data.

            regards, tom lane
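
A minimal self-contained sketch along those lines, with the table name,
row count, and random two-point lines all being illustrative assumptions
rather than anything taken from the real data:

CREATE TABLE leak_test AS
  SELECT ST_SetSRID(
           ST_MakeLine(ST_MakePoint(random()*360 - 180, random()*180 - 90),
                       ST_MakePoint(random()*360 - 180, random()*180 - 90)),
           4326) AS geometry
    FROM generate_series(1, 1000000);

CREATE INDEX idx_leak_test ON leak_test USING btree
  (ST_X(ST_Centroid(geometry)));

If the backend's memory grows without bound while the second statement
runs, the bloat is reproducible independently of the original dataset.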

Re: Large index operation crashes postgres

From
Paul Ramsey
Date:
Occam's razor says it's PostGIS. However, I'm concerned about how old
the code being run is. In particular, the library underneath PostGIS,
GEOS, had a *lot* of memory work done on it over the last year. I'd
like to see if things improve if you upgrade to GEOS 3.2.

On Fri, Mar 26, 2010 at 9:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Frans Hals <fhals7@googlemail.com> writes:
>> Operation is now running for around 13 hrs.
>> Two postmaster processes above 1% memory usage are running.
>
>> One of them uses constantly 26.5% of memory.
>> The other one is growing:
>> After 1 hr        25%
>> After 9 hrs      59%
>> After 13 hrs    64%
>
> Well, it's pretty clear that you've found a memory leak, but that was
> what we thought before; this data doesn't move us any closer to a fix.
> In particular it's not possible to guess whether the leak should be
> blamed on Postgres or Postgis code.  Even if we knew that, I'm not
> sure we could fix the leak without tracing through actual execution.
>
> Can you generate a self-contained test case that exhibits similar bloat?
> I would think it's probably not very dependent on the specific data in
> the column, so a simple script that constructs a lot of random data
> similar to yours might be enough, if you would rather not show us your
> real data.
>
>                        regards, tom lane
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Tom,

I'm pretty new to memory debugging, so please be patient if I'm not as
precise as you expect me to be.
For the record, I have run a valgrind postmaster session, starting my
initial indexing routine and letting it run until it crashed postgres.
If you think this might be enlightening for you, I'll send you the
transcript. It's too long for the list.

I'm not sure what you have in mind for a self-contained test that
exhibits similar bloat.
I have started an index creation on my data without calling any postgis
functions, just to keep the server busy:
<CREATE INDEX idx_placex_sector ON placex USING btree
(substring(geometry,1,100), rank_address, osm_type, osm_id);>
This is now running against the 50.000.000 rows in placex. I will
keep you updated on its memory usage.

The data itself isn't a secret. I need your experience to find and fix
the problem. For my part, I'll try all the necessary steps you suggest.

Kind regards
Frans

2010/3/26 Tom Lane <tgl@sss.pgh.pa.us>:
> Well, it's pretty clear that you've found a memory leak, but that was
> what we thought before; this data doesn't move us any closer to a fix.
> In particular it's not possible to guess whether the leak should be
> blamed on Postgres or Postgis code.  Even if we knew that, I'm not
> sure we could fix the leak without tracing through actual execution.
>
> Can you generate a self-contained test case that exhibits similar bloat?
> I would think it's probably not very dependent on the specific data in
> the column, so a simple script that constructs a lot of random data
> similar to yours might be enough, if you would rather not show us your
> real data.
>
>                        regards, tom lane
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
The index mentioned below was created in a few minutes without problems.
I dropped it and created it again. It uses around 36% of memory while
creating; after completion the postmaster stays at 26%.


> I'm not sure, what you're thinking about generating a self-contained
> test that exhibits similar bloat.
> I have started an index creation using my data without calling postgis
> functions. Just to make it busy:
> <CREATE INDEX idx_placex_sector ON placex USING btree
> (substring(geometry,1,100), rank_address, osm_type, osm_id);>
> This is now running against the 50.000.000 rows in placex. I will
> update you about the memory usage it takes.
>
>> Can you generate a self-contained test case that exhibits similar bloat?
>> I would think it's probably not very dependent on the specific data in
>> the column, so a simple script that constructs a lot of random data
>> similar to yours might be enough, if you would rather not show us your
>> real data.
>>
>>                        regards, tom lane
>>
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Paul,

I was kindly given the following information about the table data
(quoting here; the columns appear to be geometry type, number of points,
and row count):

>It changes as it goes down the table, it's a right mixture.

 ST_LineString   |            2 |  5398548
 ST_LineString   |            3 |  2877681
 ST_LineString   |            4 |  2160809
 ST_LineString   |            5 |  1696900
 ST_LineString   |            6 |  1362231
 ST_LineString   |            7 |  1107733
 ST_LineString   |            8 |   915616
 ST_LineString   |            9 |   766904
 ST_LineString   |           10 |   646150
 ST_LineString   |           11 |   550356
 ST_LineString   |           12 |   473357
 ST_LineString   |           13 |   410038
 ST_LineString   |           14 |   358185
 ST_LineString   |           15 |   313985
 ST_LineString   |           16 |   278846
 ST_LineString   |           17 |   248253
 ST_LineString   |           18 |   220736
 ST_LineString   |           19 |   198809
 ST_LineString   |           20 |   179552
 ST_LineString   |           21 |   162140
 ST_LineString   |           22 |   147957
 ST_LineString   |           23 |   134321
 ST_LineString   |           24 |   123805
 ST_LineString   |           25 |   113805
 ST_LineString   |           26 |   105329
 ST_LineString   |           27 |    96809
 ST_LineString   |           28 |    90105
 ST_LineString   |           29 |    83137
 ST_LineString   |           30 |    77846
 ST_LineString   |           31 |    72963
 ST_LineString   |           32 |    67830
 ST_LineString   |           33 |    63849
 ST_LineString   |           34 |    60241
 ST_LineString   |           35 |    56312
 ST_LineString   |           36 |    52805
 ST_LineString   |           37 |    49919
 ST_LineString   |           38 |    47402
 ST_LineString   |           39 |    44860
 ST_LineString   |           40 |    41987
 ST_LineString   |           41 |    40055
 ST_LineString   |           42 |    38173
 ST_LineString   |           43 |    36649
 ST_LineString   |           44 |    34464
 ST_LineString   |           45 |    32637
 ST_LineString   |           46 |    31695
 ST_LineString   |           47 |    29851
 ST_LineString   |           48 |    28546
 ST_LineString   |           49 |    27419
 ST_LineString   |           50 |    26784
....
 ST_LineString   |         1993 |        2
 ST_LineString   |         1995 |        1
 ST_LineString   |         1997 |        6
 ST_LineString   |         1998 |        5
 ST_LineString   |         1999 |        3
 ST_LineString   |         2000 |        9
 ST_Point        |              | 20648939
 ST_MultiPolygon |              |     6188
 ST_Polygon      |              |  8054680


>One thing you could try is indexing each geometry type in turn and
>watch the memory usage.

If indexing proceeds sequentially from the beginning to the end, we
reach ST_Point around row 22.000.000. Judging by the count of geometry
operations, that is exactly where the slowdown begins.
Does this point us to the leak?
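
One way to test a single geometry type in isolation is a partial index
restricted to that type; a sketch using the same expression as the failing
index (the index name is arbitrary):

CREATE INDEX idx_placex_sector_lines ON placex USING btree
  (geometry_sector(geometry), rank_address, osm_type, osm_id)
  WHERE ST_GeometryType(geometry) = 'ST_LineString';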

Thanks & regards
Frans

2010/3/26 Paul Ramsey <pramsey@cleverelephant.ca>:

>
> Could you
> - put in your version information
> - tell me what kind of spatial objects you have? polygons of > 100
> vertices? lines of two vertices? etc. That will help me pick similar
> data for the memory testing.
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Paul,

do you know if Postgis 1.3.6 will survive if I install Geos 3.2 over
it? Currently, Geos 3.1.1 is installed.
I remember I had to downgrade Postgis/Geos to be compatible with
Postgres 8.3.9. As I "inherited" the database from an 8.3.9 dump, I
had to adjust my installation before restoring the dump.

Thanks
Frans

2010/3/26 Paul Ramsey <pramsey@cleverelephant.ca>:
> Occams razor says it's PostGIS. However, I'm concerned about how old
> the code being run is. In particular, the library underneath PostGIS,
> GEOS, had a *lot* of memory work done on it over the last year. I'd
> like to see if things improve if you upgrade to GEOS 3.2.

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Paul,

I took your advice and installed Geos 3.2.0.
The index build has now been running for 14 hrs and the postmaster is
taking all the RAM.
Sadly, it looks like the Geos update didn't save me.

Regards
Frans

2010/3/28 Paul Ramsey <pramsey@cleverelephant.ca>:
> GEOS 3.2 is backwards compatible, so you can install it on top of 3.1
> and things will still work.
>
> P

>>
>> 2010/3/26 Paul Ramsey <pramsey@cleverelephant.ca>:
>>> Occams razor says it's PostGIS. However, I'm concerned about how old
>>> the code being run is. In particular, the library underneath PostGIS,
>>> GEOS, had a *lot* of memory work done on it over the last year. I'd
>>> like to see if things improve if you upgrade to GEOS 3.2.
>>
>

Re: Large index operation crashes postgres

From
Frans Hals
Date:
Paul,

I have checked the different kinds of data in the table for their memory usage.
ST_LineString is the one that's leaking; the other types complete
indexing without leakage.
The update to Geos 3.2.0 didn't improve the operation.

Kind regards
Frans

2010/3/28 Paul Ramsey <pramsey@cleverelephant.ca>:
> Might be random, might be a clue, we'll see. So it looks like much of
> the table is two-point lines and points.
>
> P
>
> On Sat, Mar 27, 2010 at 1:16 PM, Frans Hals <fhals7@googlemail.com> wrote:
>
>>  ST_Point        |              | 20648939
>>  ST_MultiPolygon |              |     6188
>>  ST_Polygon      |              |  8054680

Re: Large index operation crashes postgres

From
strk
Date:
On Mon, Mar 29, 2010 at 09:39:35AM +0200, Frans Hals wrote:
> Paul,
>
> took your advice and installed Geos 3.2.0.
> Index is now running for 14 hrs, postmaster is taking all the RAM.
> Sadly it looks like the Geos update didn't save me.

Frans, could you try installing GEOS 3.2.2?
A leak was fixed there. You should need no action
on the postgis side to get the fix.

--strk;

  ()   Free GIS & Flash consultant/developer
  /\   http://strk.keybit.net/services.html
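
To confirm which GEOS version the PostGIS installation is actually linked
against after such an upgrade, PostGIS can report it directly:

SELECT postgis_full_version();
-- or, more narrowly:
SELECT postgis_geos_version();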