Re: Multicolun index creation never completes on 9.0.1/solaris - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Multicolun index creation never completes on 9.0.1/solaris
Date
Msg-id 23139.1296069017@sss.pgh.pa.us
Whole thread Raw
In response to Re: Multicolun index creation never completes on 9.0.1/solaris  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Multicolun index creation never completes on 9.0.1/solaris
List pgsql-bugs
Josh Berkus <josh@agliodbs.com> writes:
> ... I'll point out that the elapsed times which trace_sort is giving me
> are clearly not clock times; I started this index run at 7pm PST
> yesterday and it's been 15 hours, not 2 as the elapsed time would suggest.

[ raised eyebrow... ]  Those numbers are the differences between two
gettimeofday() readings.  It's really really hard to believe that that's
wrong, unless there's something seriously wrong with your machine.

Now, the starting time is the time at which tuplesort_begin_common was
called, but that should be more or less immediately at the start of a
CREATE INDEX command.

FWIW, I did a test last night on the time to sort some random
(varchar, timestamptz) data using git HEAD.  I didn't have enough disk
space to sort 1.4 billion rows, but I tested with 200 million rows and
1GB maintenance_work_mem, and was able to create an index in 2424
seconds (~40 minutes) --- this on a pretty generic desktop machine.
I don't see a reason to think the runtime for 1.4 billion would have
been over 5 hours.  The log output for my test looked like

LOG:  begin index sort: unique = f, workMem = 1048576, randomAccess = f
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  switching to external sort with 3745 tapes: CPU 1.43s/4.85u sec elapsed 8.17 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 1 to tape 0: CPU 5.33s/127.78u sec elapsed 145.15 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 2 to tape 1: CPU 9.97s/273.53u sec elapsed 311.10 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 3 to tape 2: CPU 14.83s/422.90u sec elapsed 483.71 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 4 to tape 3: CPU 19.77s/572.60u sec elapsed 653.36 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 5 to tape 4: CPU 24.46s/722.20u sec elapsed 825.30 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 6 to tape 5: CPU 29.37s/871.97u sec elapsed 996.62 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 7 to tape 6: CPU 34.21s/1020.82u sec elapsed 1168.62 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 8 to tape 7: CPU 38.92s/1169.84u sec elapsed 1337.46 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 9 to tape 8: CPU 43.48s/1319.35u sec elapsed 1508.72 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 10 to tape 9: CPU 48.31s/1468.19u sec elapsed 1678.14 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 11 to tape 10: CPU 53.08s/1617.21u sec elapsed 1847.99 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  performsort starting: CPU 55.80s/1709.01u sec elapsed 1953.27 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing run 12 to tape 11: CPU 56.29s/1748.25u sec elapsed 1993.31 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  finished writing final run 13 to tape 12: CPU 56.59s/1768.89u sec elapsed 2014.48 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  performsort done (except 13-way final merge): CPU 57.02s/1771.57u sec elapsed 2018.39 sec
STATEMENT:  create index ii on big (account_id, event_time);
LOG:  external sort ended, 1073478 disk blocks used: CPU 81.28s/2104.60u sec elapsed 2423.59 sec
STATEMENT:  create index ii on big (account_id, event_time);

Your machine seems to be dumping a run about once every 250-300 seconds,
which is about half the speed of mine, which is a bit odd if it's big
iron.  (I wonder whether you have a non-C locale selected ...)  But that
would still extrapolate to no worse than 10 hours to perform the sort,
assuming the runs are about the same size as mine, which they ought to
be given the same maintenance_work_mem setting.

It might be worth setting log_line_prefix = %m so you can check whether
(a) the "begin index sort" message comes out right away, and (b) whether
the trace_sort measurements really differ from clock time or not.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Multicolun index creation never completes on 9.0.1/solaris
Next
From: Tom Lane
Date:
Subject: Re: NO DATA error message in Frontend when querying large datasets