Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit - Mailing list pgsql-performance

From Craig Ringer
Subject Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date
Msg-id 47D505AF.4040101@postnewspapers.com.au
Responses Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
List pgsql-performance
Hi all

I'm encountering an odd issue with a bulk import using PostgreSQL
8.3. After a 400,000-row import into a just-truncated table `booking', a
sequential scan run on the table in the same transaction is incredibly
slow, taking ~166738.047 ms. After a:
    `COMMIT; BEGIN;'
the same query runs in 712.615 ms, with almost all the time difference
being in the sequential scan of the `booking' table [schema at end of post].

The table is populated by a complex pl/pgsql function that draws from
several other tables to convert data from another app's format. After
that function runs, here's what happens if I do a simple sequential
scan, then what happens after I commit and run it again:

craig=# explain analyze select * from booking;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139)
     (actual time=0.017..166644.697 rows=341481 loops=1)
 Total runtime: 166738.047 ms
(2 rows)

craig=# commit; begin;
COMMIT
BEGIN
craig=# explain analyze select * from booking;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139)
     (actual time=0.022..624.492 rows=341481 loops=1)
 Total runtime: 712.615 ms
(2 rows)


SET client_min_messages = 'debug5';

does not shed any more light; the only extra output is, e.g.:

craig=# select count(distinct id) from booking;
DEBUG:  StartTransactionCommand
DEBUG:  CommitTransactionCommand
 count
--------
 341481
(1 row)

... which took 164558.961 ms to run: 341,481 rows in about 165 seconds,
or roughly 2,000 tuples per second. [Table schema at end of post]. By
comparison, after commit the same query read the table at roughly
500,000 tuples/second.

This issue appears to affect any query that results in a sequential scan
on the newly populated table, and also affects maintenance operations
like ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY that perform
sequential scans. ANALYZE is also really slow. I'm not yet sure if index
scans are affected.
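
One way to check whether index scans are affected too (just a sketch,
which I haven't run against the problem session, and the id range below
is arbitrary) would be to force an index scan on the primary key:

SET enable_seqscan = off;
-- index scan via booking_pkey; pick an id range that actually exists
-- in the freshly imported data
EXPLAIN ANALYZE SELECT * FROM booking WHERE id BETWEEN 1 AND 10000;
RESET enable_seqscan;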

I'm not using WAL logging.

It doesn't matter whether I truncate the table before the import using a
separate transaction or the same one that does the import.
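
For clarity, the two variants look roughly like this (`import_bookings()'
is just a stand-in name for the real conversion function, which is much
more involved):

-- Variant 1: truncate and import in the same transaction
BEGIN;
TRUNCATE booking;
SELECT import_bookings();
-- sequential scans here are extremely slow

-- Variant 2: truncate committed separately beforehand
BEGIN;
TRUNCATE booking;
COMMIT;
BEGIN;
SELECT import_bookings();
-- sequential scans here are just as slow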

I see essentially identical results, and runtimes, with other more
complex queries, but it seems to boil down to extremely slow sequential
scans.

The host these queries run on is Linux 2.6.22 with PostgreSQL 8.3. It
has 4GB of RAM and shmmax set to 512MB. Tweaking the PostgreSQL memory
parameters seems to make little difference, but the ones I adjusted away
from the defaults to see if this was a resource issue are:

shared_buffers = 32MB
temp_buffers = 128MB
work_mem = 128MB
maintenance_work_mem = 1024MB           # min 1MB
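
For reference, the settings actually in effect can be double-checked
with something like:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'temp_buffers',
               'work_mem', 'maintenance_work_mem');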

(There are relatively few clients of this database, but they work it hard.)

Is this huge speed difference in sequential scans expected behavior? Any
idea what might be causing it?

I'm presently working around it by just committing the transaction after
the bulk import - but there's lots more work to do after that and it
leaves the database in a rather messy interim state.
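
The workaround, in outline (again with `import_bookings()' standing in
for the real conversion function):

BEGIN;
TRUNCATE booking;
SELECT import_bookings();
COMMIT;   -- committing here restores normal sequential scan speed
BEGIN;
-- remaining post-import work continues in a new transaction,
-- which is what leaves the database in a messy interim state
-- if anything later fails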




Here's the table's schema, pasted as a quote to stop Thunderbird
mangling it. There are no rules on this table except those that might be
created internally by PostgreSQL.

> craig=# \d booking
>                                           Table "public.booking"
>          Column         |           Type           |                      Modifiers
> ------------------------+--------------------------+------------------------------------------------------
>  id                     | integer                  | not null default nextval('booking_id_seq'::regclass)
>  customer_id            | integer                  | not null
>  edition_id             | integer                  | not null
>  description            | character varying(255)   | not null
>  position               | integer                  | not null
>  loading_applied        | boolean                  | not null default false
>  loading_ratio          | numeric(16,4)            | not null
>  size_type              | integer                  | not null
>  size_length            | numeric(16,4)            |
>  base_price             | numeric(16,4)            | not null
>  gst_factor             | numeric(16,8)            | not null default gst_factor()
>  page_number            | integer                  |
>  invoiced               | timestamp with time zone |
>  contract_id            | integer                  |
>  old_customer_id        | integer                  | not null
>  booked_time            | timestamp with time zone | not null
>  booked_by              | character varying(80)    | not null
>  cancelled              | boolean                  | not null default false
>  art_supplied           | boolean                  | not null default false
>  repeat_group           | integer                  |
>  notes                  | text                     |
>  originally_from_system | character(1)             |
>  pe_booking_id          | integer                  |
>  pe_customer_id         | integer                  |
> Indexes:
>     "booking_pkey" PRIMARY KEY, btree (id)
> Check constraints:
>     "base_price_nonnegative" CHECK (base_price >= 0::numeric)
>     "gst_factor_nonnegative_and_sane" CHECK (gst_factor >= 0::numeric AND gst_factor < 9::numeric)
>     "loading_ratio_sane" CHECK (loading_ratio > 0::numeric AND loading_ratio < 9::numeric)
>     "page_no_sane" CHECK (page_number IS NULL OR page_number > 0 AND page_number <= 500)
>     "size_length_nonnegative" CHECK (size_length IS NULL OR size_length >= 0::numeric)
> Foreign-key constraints:
>     "booking_contract_id_fkey" FOREIGN KEY (contract_id) REFERENCES contract(id)
>     "booking_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(id)
>     "booking_edition_id_fkey" FOREIGN KEY (edition_id) REFERENCES edition(id) ON DELETE CASCADE
>     "booking_old_customer_id_fkey" FOREIGN KEY (old_customer_id) REFERENCES customer(id)
>     "booking_position_fkey" FOREIGN KEY ("position") REFERENCES booking_position(id)
>     "booking_repeat_group_fkey" FOREIGN KEY (repeat_group) REFERENCES booking_repeat(id) ON DELETE SET NULL
>     "booking_size_type_fkey" FOREIGN KEY (size_type) REFERENCES booking_size_type(id)
> Triggers:
>     booking_after_insert_update AFTER INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_after_trigger()
>     booking_audit AFTER UPDATE ON booking FOR EACH ROW EXECUTE PROCEDURE booking_audit_trigger()
>     booking_before_insert BEFORE INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_before_trigger()

--
Craig Ringer
