Thread: Postgres 7.2.2 Segment Error

Postgres 7.2.2 Segment Error

From

"Christopher Kings-Lynne"

Date:

19 September 2002, 03:16:08

I just saw this in my logs:

2002-09-18 12:13:10 ERROR:  cannot open segment 1 of relation users_sessions
(target block 1342198864): No such file or directory

This query caused it:

DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
minutes'::interval)  AND name = 'fhnid';

However, I cannot repeat the error now.  Is this a bug in postgres
somewhere.

Also, what should I do to fix the table properly.  I haven't vacuumed it or
anything yet in case someone wants to analyze it.

Chris

Re: Postgres 7.2.2 Segment Error

From

Tom Lane

Date:

19 September 2002, 10:14:03

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> I just saw this in my logs:
> 2002-09-18 12:13:10 ERROR:  cannot open segment 1 of relation users_sessions
> (target block 1342198864): No such file or directory

> This query caused it:

> DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> minutes'::interval)  AND name = 'fhnid';

What does EXPLAIN show as the plan for that query?  I'm guessing an
indexscan, and that the error was caused by reading a broken item
pointer from the index.  (1342198864 = hex 50005450, which sure looks
like the upper 5 shouldn't be there ... how big is the table, anyway?)

> However, I cannot repeat the error now.  Is this a bug in postgres
> somewhere.

If the broken item pointer were indeed in the index, I'd expect it to be
100% repeatable.  I'm wondering about flaky memory or some such.  Have
you run any hardware diagnostics?
        regards, tom lane

Re: Postgres 7.2.2 Segment Error

From

"Christopher Kings-Lynne"

Date:

19 September 2002, 22:05:30

> > DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> > minutes'::interval)  AND name = 'fhnid';
>
> What does EXPLAIN show as the plan for that query?  I'm guessing an
> indexscan, and that the error was caused by reading a broken item
> pointer from the index.  (1342198864 = hex 50005450, which sure looks
> like the upper 5 shouldn't be there ... how big is the table, anyway?)

NOTICE:  QUERY PLAN:

Index Scan using users_sessions_cha_name_idx on users_sessions
(cost=0.00..12738.07 rows=1275 width=6) (actual time=231.74..239.39 rows=2
loops=1)
Total runtime: 239.81 msec

EXPLAIN

The size of the table:

canaveral# ls -al 44632
-rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632

The size of the index:

canaveral# ls -al 7331245
-rw-------  1 pgsql  pgsql  8151040 Sep 19 18:51 7331245

Holy crap - that table is huge.  It's like it's never had a vacuum full sort
of thing.  Going select count(*) takes _ages_ even though there's only 1451
rows in it - and not particularly large rows.  Actually, the longest text
entry is 3832 characters and the average is 677.

The sessions table holds normal site session data, like a uid, username,
some other stuff, etc.  However entries older than two hours or so get
deleted.  We VACUUM everynight, so why is the on-disk relation growing so
huge?

> > However, I cannot repeat the error now.  Is this a bug in postgres
> > somewhere.
>
> If the broken item pointer were indeed in the index, I'd expect it to be
> 100% repeatable.  I'm wondering about flaky memory or some such.  Have
> you run any hardware diagnostics?

No - the thought occured to me that there might be something wacky going on.
We've had problems with users_sessions before.  Remember when I mailed about
vacuum failing on it before?  You suggested doing a select for update on the
relation and that fixed it.

Chris

Re: Postgres 7.2.2 Segment Error

From

Gavin Sherry

Date:

19 September 2002, 23:42:27

On Fri, 20 Sep 2002, Christopher Kings-Lynne wrote:

> > > DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> > > minutes'::interval)  AND name = 'fhnid';
> >
> > What does EXPLAIN show as the plan for that query?  I'm guessing an
> > indexscan, and that the error was caused by reading a broken item
> > pointer from the index.  (1342198864 = hex 50005450, which sure looks
> > like the upper 5 shouldn't be there ... how big is the table, anyway?)
> 
> NOTICE:  QUERY PLAN:
> 
> Index Scan using users_sessions_cha_name_idx on users_sessions
> (cost=0.00..12738.07 rows=1275 width=6) (actual time=231.74..239.39 rows=2
> loops=1)
> Total runtime: 239.81 msec
> 
> EXPLAIN
> 
> The size of the table:
> 
> canaveral# ls -al 44632
> -rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632

This seems remarkably large. Does pg_filedump reveal anything of interest?

Gavin

Re: Postgres 7.2.2 Segment Error

From

"Christopher Kings-Lynne"

Date:

20 September 2002, 00:18:48

> > Index Scan using users_sessions_cha_name_idx on users_sessions
> > (cost=0.00..12738.07 rows=1275 width=6) (actual
> time=231.74..239.39 rows=2
> > loops=1)
> > Total runtime: 239.81 msec
> >
> > EXPLAIN
> >
> > The size of the table:
> >
> > canaveral# ls -al 44632
> > -rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632
>
> This seems remarkably large. Does pg_filedump reveal anything of interest?

Where on earth do I find that?

BTW - I want to vacuum full this table but I'm holding off until someone
like Tom tells me there's nothing more to be gained from it...

Chris

Re: Postgres 7.2.2 Segment Error

From

Tom Lane

Date:

20 September 2002, 10:26:24

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> The sessions table holds normal site session data, like a uid, username,
> some other stuff, etc.  However entries older than two hours or so get
> deleted.  We VACUUM everynight, so why is the on-disk relation growing so
> huge?

FSM not big enough, perhaps?  Try doing a vacuum full, then looking to
see how big the table is (in physical blocks) after one day's normal
usage.  You need at least enough FSM space for that many blocks
... unless you want to vacuum it more often.

> However, I cannot repeat the error now.

If you can't reproduce the error then I'm pretty well convinced that
there is no problem in the stored data itself.  This was either a
hardware glitch or a software bug causing a memory stomp on the top byte
of an item pointer retrieved from the index.  Although I can't rule out
the latter, I find it unlikely given that we don't have similar reports
from other people.

You may as well do the VACUUM FULL --- I doubt we can learn anything
from examining the table.
        regards, tom lane