Re: Index creation takes for ever - Mailing list pgsql-hackers

From ohp@pyrenet.fr
Subject Re: Index creation takes for ever
Date
Msg-id Pine.UW2.4.53.0309011347420.23865@server.pyrenet.fr
Whole thread Raw
In response to Re: Index creation takes for ever  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Index creation takes for ever
List pgsql-hackers
Tom,
Let me come back on this one:
I initdb --locale=C and reloaded all bases including this one. Index
creation wassn't too bad. So I thought that was it.

Yesturday evening I decided to make a test so I pg_dump'd that database,
created a test db and reloaded evrything from the pg_dump.

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;

I wonder if it's a configuration matter.
PGSQL is VERY solicited here (10's thousand connections/day , multiple
queries/connection

Here's my postgresql.conf FWIW, Any advice ?

#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".


#========================================================================


#
#    Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 64
#superuser_reserved_connections = 2

port = 5432
hostname_lookup = true
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777    # octal

#virtual_host = ''

#krb_server_keyfile = ''


#
#    Shared Memory Size
#
shared_buffers = 10000        # min max_connections*2 or 16, 8KB each
max_fsm_relations = 1000    # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000        # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64    # min 10
#wal_buffers = 8        # min 4, typically 8KB each

#
#    Non-shared Memory Sizes
#
sort_mem = 10240        # min 64, size in KB
#vacuum_mem = 8192        # min 1024, size in KB


#
#    Write-ahead log (WAL)
#
#checkpoint_segments = 3    # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300    # range 30-3600, in seconds
#
#commit_delay = 0        # range 0-100000, in microseconds
#commit_siblings = 5        # range 1-1000
#
#fsync = true
#wal_sync_method = fsync    # the default varies across platforms:
#                # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0            # range 0-16


#
#    Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

effective_cache_size = 10000    # typically 8KB each
#random_page_cost = 4        # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01        # (same)
#cpu_index_tuple_cost = 0.001    # (same)
#cpu_operator_cost = 0.0025    # (same)

#default_statistics_target = 10    # range 1-1000

#
#    GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0    # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0        # default based on tables in statement,            # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1        # auto-compute seed


#
#    Message display
#
#server_min_messages = notice    # Values, in order of decreasing detail:            #   debug5, debug4, debug3,
debug2,debug1,            #   info, notice, warning, error, log, fatal,            #   panic
 
#client_min_messages = notice    # Values, in order of decreasing detail:            #   debug5, debug4, debug3,
debug2,debug1,            #   log, info, notice, warning, error
 
#silent_mode = false

log_connections = true
log_pid = true
log_statement = true
log_duration = true
#log_timestamp = false

#log_min_error_statement = error # Values in order of increasing severity:             #   debug5, debug4, debug3,
debug2,debug1,             #   info, notice, warning, error, panic(off)
 

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true


#
#    Syslog
#
syslog = 2            # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


#
#    Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false


#
#    Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
stats_command_string = true
stats_row_level = true
#stats_block_level = false


#
#    Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0


#
#    Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
datestyle = 'postgres, european'
#timezone = unknown        # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii    # actually, defaults to database encoding
#authentication_timeout = 60    # 1-600, in seconds
#deadlock_timeout = 1000    # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000        # min 10
#max_files_per_process = 1000    # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0        # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Local Settings
LC_MESSAGES = 'fr_FR'
LC_MONETARY = 'fr_FR'


The machine has 1 G DDR ECC Registred RAM, 2 1,8 GZ XEON running uw713
PGversion is 7.3.4, macine is not swaping.

Those index creation took 100% of 1 CPU.

Regards
On Thu, 28 Aug 2003, Tom Lane wrote:

> Date: Thu, 28 Aug 2003 10:13:21 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Index creation takes for ever
>
> ohp@pyrenet.fr writes:
> > I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
> > 7.4b
>
> > Both are still running after more than 30 minutes of CPU (100% cpu taken)
> > creating the levt_lu_ligne_evt_key.
>
> That's hard to believe.  I get
>
> regression=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
>  levt_lu | count
> ---------+--------
>  N       |  49435
>  O       | 181242
> (2 rows)
>
> Time: 6927.28 ms
> regression=# create index levt_lu_ligne_evt_key on ligne_evt (levt_lu);
> CREATE INDEX
> Time: 14946.74 ms
>
> on a not-very-fast machine ... and it seems to be mostly I/O bound.
>
> What platform are you on?  I could believe that the local qsort() is
> incredibly inefficient with many equal keys.  Another possibility is
> that you're using a non-C locale and strcoll() is horribly slow.
>
>             regards, tom lane
>

-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)


pgsql-hackers by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Is it a memory leak in PostgreSQL 7.4beta?
Next
From: ohp@pyrenet.fr
Date:
Subject: Re: pg_dump bug?