Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit - Mailing list pgsql-performance
From | Craig Ringer
---|---
Subject | Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Msg-id | 47D505AF.4040101@postnewspapers.com.au
Responses | Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
List | pgsql-performance
Hi all

I'm encountering an odd issue with a bulk import query using PostgreSQL 8.3. After a 400,000 row import into a just-truncated table `booking', a sequential scan run on the table in the same transaction is incredibly slow, taking ~166738.047 ms. After a `COMMIT; BEGIN;' the same query runs in 712.615 ms, with almost all of the time difference being in the sequential scan of the `booking' table [schema at end of post].

The table is populated by a complex pl/pgsql function that draws from several other tables to convert data from another app's format. After that function runs, here's what happens if I do a simple sequential scan, then what happens after I commit and run it again:

craig=# explain analyze select * from booking;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139) (actual time=0.017..166644.697 rows=341481 loops=1)
 Total runtime: 166738.047 ms
(2 rows)

craig=# commit; begin;
COMMIT
BEGIN
craig=# explain analyze select * from booking;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139) (actual time=0.022..624.492 rows=341481 loops=1)
 Total runtime: 712.615 ms
(2 rows)

SET client_min_messages = 'debug5'; does not shed any more light; the only extra output is (e.g.):

craig=# select count(distinct id) from booking;
DEBUG:  StartTransactionCommand
DEBUG:  CommitTransactionCommand
 count
--------
 341481
(1 row)

... which took 164558.961 ms to run, or about 2 tuples per second. By comparison, after the commit the same query read about 500 tuples/second.

This issue appears to affect any query that results in a sequential scan on the newly populated table, and it also affects maintenance operations like ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY that perform sequential scans. ANALYZE is also really slow. I'm not yet sure whether index scans are affected.

I'm not using WAL logging. It doesn't matter whether I truncate the table before the import in a separate transaction or in the same one that does the import. I see essentially identical results, and runtimes, with other more complex queries, but it seems to boil down to extremely slow sequential scans.

The Linux 2.6.22 host these queries are running on runs PostgreSQL 8.3. It has 4GB of RAM and shmmax set to 512MB. Tweaking the PostgreSQL memory parameters seems to make little difference, but the ones I adjusted away from the defaults to check whether this was a resource issue are:

shared_buffers = 32MB
temp_buffers = 128MB
work_mem = 128MB
maintenance_work_mem = 1024MB    # min 1MB

(There are relatively few clients to this database, but they work it hard.)

Is this huge speed difference in sequential scans expected behavior? Any idea what might be causing it? I'm presently working around it by just committing the transaction after the bulk import - but there's lots more work to do after that, and it leaves the database in a rather messy interim state.
Here's the table's schema, pasted as a quote to stop Thunderbird mangling it. There are no rules on this table except those that might be created internally by postgresql.

> craig=# \d booking
>                                         Table "public.booking"
>          Column         |           Type           |                       Modifiers
> ------------------------+--------------------------+------------------------------------------------------
>  id                     | integer                  | not null default nextval('booking_id_seq'::regclass)
>  customer_id            | integer                  | not null
>  edition_id             | integer                  | not null
>  description            | character varying(255)  | not null
>  position               | integer                  | not null
>  loading_applied        | boolean                  | not null default false
>  loading_ratio          | numeric(16,4)            | not null
>  size_type              | integer                  | not null
>  size_length            | numeric(16,4)            |
>  base_price             | numeric(16,4)            | not null
>  gst_factor             | numeric(16,8)            | not null default gst_factor()
>  page_number            | integer                  |
>  invoiced               | timestamp with time zone |
>  contract_id            | integer                  |
>  old_customer_id        | integer                  | not null
>  booked_time            | timestamp with time zone | not null
>  booked_by              | character varying(80)   | not null
>  cancelled              | boolean                  | not null default false
>  art_supplied           | boolean                  | not null default false
>  repeat_group           | integer                  |
>  notes                  | text                     |
>  originally_from_system | character(1)             |
>  pe_booking_id          | integer                  |
>  pe_customer_id         | integer                  |
> Indexes:
>     "booking_pkey" PRIMARY KEY, btree (id)
> Check constraints:
>     "base_price_nonnegative" CHECK (base_price >= 0::numeric)
>     "gst_factor_nonnegative_and_sane" CHECK (gst_factor >= 0::numeric AND gst_factor < 9::numeric)
>     "loading_ratio_sane" CHECK (loading_ratio > 0::numeric AND loading_ratio < 9::numeric)
>     "page_no_sane" CHECK (page_number IS NULL OR page_number > 0 AND page_number <= 500)
>     "size_length_nonnegative" CHECK (size_length IS NULL OR size_length >= 0::numeric)
> Foreign-key constraints:
>     "booking_contract_id_fkey" FOREIGN KEY (contract_id) REFERENCES contract(id)
>     "booking_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(id)
>     "booking_edition_id_fkey" FOREIGN KEY (edition_id) REFERENCES edition(id) ON DELETE CASCADE
>     "booking_old_customer_id_fkey" FOREIGN KEY (old_customer_id) REFERENCES customer(id)
>     "booking_position_fkey" FOREIGN KEY ("position") REFERENCES booking_position(id)
>     "booking_repeat_group_fkey" FOREIGN KEY (repeat_group) REFERENCES booking_repeat(id) ON DELETE SET NULL
>     "booking_size_type_fkey" FOREIGN KEY (size_type) REFERENCES booking_size_type(id)
> Triggers:
>     booking_after_insert_update AFTER INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_after_trigger()
>     booking_audit AFTER UPDATE ON booking FOR EACH ROW EXECUTE PROCEDURE booking_audit_trigger()
>     booking_before_insert BEFORE INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_before_trigger()

--
Craig Ringer