Multicolun index creation never completes on 9.0.1/solaris - Mailing list pgsql-bugs

From Josh Berkus
Subject Multicolun index creation never completes on 9.0.1/solaris
Date
Msg-id 4D3F2D10.1020905@agliodbs.com
Whole thread Raw
Responses Re: Multicolun index creation never completes on 9.0.1/solaris  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Note: I have this running now on a test box.  If someone responds in the
next couple hours, I can run whatever diagnostics you want on it.
Otherwise I'll kill it off and start over with debug logging turned on.

Version: 9.0.1
Platform: Solaris 10u8 / ZFS over DAS 7-drive array
Severity: Serious
Reproduceable? Always, on this machine
Description:

I've been running tests for a client on their old production machine.
Part of the test setup involves restoring a 70GB table (part of a 400GB
database).  Restoring this table always fails to complete (we terminate
it after 3 days).

Watching it closer this time, I've noticed that it's hanging on the
creation of a multicolumn index*:

"write_log_accounttime_idx" btree (account_id, event_time)

Table def*:

                   Table "public.write_log"
     Column     |           Type           |       Modifiers
----------------+--------------------------+------------------------
 log_id         | character varying        | not null
 account_id     | character varying        |
 write_id       | character varying        |
 event_time     | timestamp with time zone | not null default now()
 event_loc      | character varying(1024)  |
 source         | character varying(1024)  |
 detail         | character varying(1024)  |
 source_ip      | inet                     |
 event_length   | integer                  |
 keyword        | character varying        |

account_id is usually a 32-character hex string (a UUID).

When trying to restore this index ... or, indeed, do *anything* else
with in including ANALYZE or REINDEX ... takes longer than we're willing
to monitor in order to complete (I once waited 30 hours).  (I was able
to test Analyze/reindex because the first time I converted the database
using pg_upgrade).

Interestingly, on the production database (where the index was
successfully created on 8.4.4., that index has no entries in the stats
table, i.e. pg_stat_user_indexes does not have a row for that index.
However, this database *did* restore on the same platform in 8.4.4.

The PK for this table restores successfully.  I haven't been able to
test other single-column indexes.

Watching the process of creating this index, it's certainly busy:

read(224, " o02\0\0E0C9 ]D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 P\f ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 P88E7D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\010 . ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0B0 r ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\010 q ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0C8F9 ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 hBE aD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0C8D6 ^D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 `F8 ^D101\0\0\0".., 8192)   = 8192
write(219, " e\0\0\0\0\0\0\08082D4 V".., 8192)  = 8192
read(224, " o02\0\08085 xD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0E8 ] _D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 ( _ _D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0B080 _D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0F8A2 _D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 xB5EAD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 @E4 _D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0  \b `D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\090 ) `D101\0\0\0".., 8192)   = 8192
write(219, "\0 OA197E1 !01\0 ,\0\0\0".., 8192)  = 8192
read(224, " o02\0\0B0 K `D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\01002 bD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0188E `D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 HB0 `D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 HD2 `D101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 H U\vD201\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 x14 aD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0C8B9 aD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\010BB aD101\0\0\0".., 8192)   = 8192
read(224, " o02\0\0 h v aD101\0\0\0".., 8192)   = 8192
write(219, " ,\0\0\0F4\0E8 *05\0 ( @".., 8192)  = 8192

This is also the kind of activity I'd expect to see for an index build.
 But ... it goes on forever.  The current index build run has been going
for almost 100 hours.

More detail:
    mainentance_work_mem = 1GB (out of 32GB RAM available)
    9.0.1 was built from source, using Sun CC
    for the current test, I'm running with fsync off
        (but I've had the same issue with fsync on)

Ideas?

(* object names have been changed to preserve confidentiality)

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #5845: Postgres does not seem to handle unquoted upper cased object identifiers