Thread: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Craig Ringer
Date:
Hi all

I'm encountering an odd issue with a bulk import query using PostgreSQL
8.3. After a 400,000 row import into a just-truncated table `booking', a
sequential scan run on the table in the same transaction is incredibly
slow, taking ~ 166738.047 ms. After a:

`COMMIT; BEGIN;'

the same query runs in 712.615 ms, with almost all the time difference
being in the sequential scan of the `booking' table [schema at end of post].

The table is populated by a complex pl/pgsql function that draws from
several other tables to convert data from another app's format. After
that function runs, here's what happens if I do a simple sequential scan,
then what happens after I commit and run it again:

craig=# explain analyze select * from booking;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139) (actual time=0.017..166644.697 rows=341481 loops=1)
 Total runtime: 166738.047 ms
(2 rows)

craig=# commit; begin;
COMMIT
BEGIN
craig=# explain analyze select * from booking;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139) (actual time=0.022..624.492 rows=341481 loops=1)
 Total runtime: 712.615 ms
(2 rows)

SET client_min_messages = 'debug5';

does not shed any more light; the only extra output is (eg):

craig=# select count(distinct id) from booking;
DEBUG:  StartTransactionCommand
DEBUG:  CommitTransactionCommand
 count
--------
 341481
(1 row)

... which took 164558.961 ms to run, or about 2 tuples per second.
[Table schema at end of post]. By comparison, after commit the same query
read about 500 tuples/second.

This issue appears to affect any query that results in a sequential scan
on the newly populated table, and also affects maintenance operations like
ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY that perform sequential
scans. ANALYZE is also really slow. I'm not yet sure if index scans are
affected.

I'm not using WAL logging.

It doesn't matter whether I truncate the table before the import using a
separate transaction or the same one that does the import. I see
essentially identical results, and runtimes, with other more complex
queries, but it seems to boil down to extremely slow sequential scans.

The Linux 2.6.22 host these queries are running on runs PostgreSQL 8.3.
It has 4GB of RAM and shmmax set to 512MB. Tweaking the postgresql memory
parameters seems to make little difference, but the ones that I adjusted
away from defaults to see if this was a resource issue are:

shared_buffers = 32MB
temp_buffers = 128MB
work_mem = 128MB
maintenance_work_mem = 1024MB        # min 1MB

(There are relatively few clients to this database, but they work it hard).

Is this huge speed difference in sequential scans expected behavior? Any
idea what might be causing it?

I'm presently working around it by just committing the transaction after
the bulk import - but there's lots more work to do after that and it
leaves the database in a rather messy interim state.

Here's the table's schema, pasted as a quote to stop Thunderbird mangling
it. There are no rules on this table except those that might be created
internally by postgresql.
> craig=# \d booking
>                                        Table "public.booking"
>          Column         |           Type           |                      Modifiers
> ------------------------+--------------------------+------------------------------------------------------
>  id                     | integer                  | not null default nextval('booking_id_seq'::regclass)
>  customer_id            | integer                  | not null
>  edition_id             | integer                  | not null
>  description            | character varying(255)  | not null
>  position               | integer                  | not null
>  loading_applied        | boolean                  | not null default false
>  loading_ratio          | numeric(16,4)            | not null
>  size_type              | integer                  | not null
>  size_length            | numeric(16,4)            |
>  base_price             | numeric(16,4)            | not null
>  gst_factor             | numeric(16,8)            | not null default gst_factor()
>  page_number            | integer                  |
>  invoiced               | timestamp with time zone |
>  contract_id            | integer                  |
>  old_customer_id        | integer                  | not null
>  booked_time            | timestamp with time zone | not null
>  booked_by              | character varying(80)   | not null
>  cancelled              | boolean                  | not null default false
>  art_supplied           | boolean                  | not null default false
>  repeat_group           | integer                  |
>  notes                  | text                     |
>  originally_from_system | character(1)             |
>  pe_booking_id          | integer                  |
>  pe_customer_id         | integer                  |
> Indexes:
>     "booking_pkey" PRIMARY KEY, btree (id)
> Check constraints:
>     "base_price_nonnegative" CHECK (base_price >= 0::numeric)
>     "gst_factor_nonnegative_and_sane" CHECK (gst_factor >= 0::numeric AND gst_factor < 9::numeric)
>     "loading_ratio_sane" CHECK (loading_ratio > 0::numeric AND loading_ratio < 9::numeric)
>     "page_no_sane" CHECK (page_number IS NULL OR page_number > 0 AND page_number <= 500)
>     "size_length_nonnegative" CHECK (size_length IS NULL OR size_length >= 0::numeric)
> Foreign-key constraints:
>     "booking_contract_id_fkey" FOREIGN KEY (contract_id) REFERENCES contract(id)
>     "booking_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(id)
>     "booking_edition_id_fkey" FOREIGN KEY (edition_id) REFERENCES edition(id) ON DELETE CASCADE
>     "booking_old_customer_id_fkey" FOREIGN KEY (old_customer_id) REFERENCES customer(id)
>     "booking_position_fkey" FOREIGN KEY ("position") REFERENCES booking_position(id)
>     "booking_repeat_group_fkey" FOREIGN KEY (repeat_group) REFERENCES booking_repeat(id) ON DELETE SET NULL
>     "booking_size_type_fkey" FOREIGN KEY (size_type) REFERENCES booking_size_type(id)
> Triggers:
>     booking_after_insert_update AFTER INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_after_trigger()
>     booking_audit AFTER UPDATE ON booking FOR EACH ROW EXECUTE PROCEDURE booking_audit_trigger()
>     booking_before_insert BEFORE INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_before_trigger()

--
Craig Ringer
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
"Heikki Linnakangas"
Date:
Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
> `COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must be having an exception handler block in that pl/pgsql function,
which implicitly creates a new subtransaction on each invocation of the
exception handler block, so you end up with hundreds of thousands of
committed subtransactions.

For each row in the seq scan, the list of subtransactions is scanned to
see if the transaction that inserted the row is part of the current
top-level transaction. That's fine for a handful of subtransactions, but
it gets really slow with large numbers of them, as you've seen. It's an
O(n^2) operation, where n is the number of rows inserted, so you'll be in
even more trouble if the number of rows increases.

As a work-around, avoid using exception handlers, or process more than
one row per function invocation. Or COMMIT the transaction, as you did.

For 8.4, it would be nice to improve that. I tested that on my laptop
with a similarly-sized table, inserting each row in a pl/pgsql function
with an exception handler, and I got very similar run times. According to
oprofile, all the time is spent in TransactionIdIsInProgress. I think it
would be pretty straightforward to store the committed subtransaction ids
in a sorted array, instead of a linked list, and binary search. Or to use
a hash table. That should eliminate this problem, though there are still
other places where a large number of subtransactions will hurt
performance.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
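A minimal sketch of the two shapes described above. The table and function
names (demo_target, demo_insert_one, demo_insert_batch) are invented for
illustration; they are not from Craig's import or Heikki's test:

-- Illustrative sketch only; names are invented.
CREATE TABLE demo_target (id int PRIMARY KEY, t text);

-- Per-row pattern: the mere presence of the EXCEPTION clause makes every
-- call enter (and commit) its own subtransaction, one per imported row.
CREATE OR REPLACE FUNCTION demo_insert_one(p_id int, p_t text) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO demo_target VALUES (p_id, p_t);
EXCEPTION WHEN unique_violation THEN
    NULL;  -- swallow duplicates and keep going
END;
$$;

-- Batch pattern: the whole loop runs inside a single exception scope, so
-- one invocation costs at most one subtransaction, at the price of losing
-- the entire batch's work if any row fails.
CREATE OR REPLACE FUNCTION demo_insert_batch(n int) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. n LOOP
        INSERT INTO demo_target VALUES (i, repeat('a', 110));
    END LOOP;
EXCEPTION WHEN unique_violation THEN
    NULL;  -- a duplicate anywhere aborts the whole batch, not just one row
END;
$$;

Calling demo_insert_one 400,000 times in one transaction leaves 400,000
committed subtransactions behind; one call to demo_insert_batch(400000)
leaves one.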
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Craig Ringer
Date:
Thanks for the extremely helpful response. I don't think I would've
spotted that one in a hurry.

> You must be having an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.

Aah - yes, there is. I didn't realize it'd have such an impact. I can
work around the need for it by explicitly checking the table constraints
in the function - in which case an uncaught exception will terminate the
transaction, but should only arise when I've missed a constraint check.

> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql
> function with an exception handler, and I got very similar run times.
> According to oprofile, all the time is spent in
> TransactionIdIsInProgress. I think it would be pretty straightforward
> to store the committed subtransaction ids in a sorted array, instead
> of a linked list, and binary search. Or to use a hash table. That
> should eliminate this problem, though there are still other places
> where a large number of subtransactions will hurt performance.

That does sound interesting - and it would be nice to be able to use
exception handlers this way without too huge a performance hit. In the
end, though, it's something that can be designed around once you're aware
of it - and I'm sure that other ways of storing that data have their own
costs and downsides.

What might also be nice, and simpler, would be a `notice', `log', or even
`debug1' level warning telling the user they've reached an absurd number
of subtransactions that'll cripple PostgreSQL's performance - say
100,000. There's precedent for this in the checkpoint frequency warning
8.3 produces if checkpoints are becoming too frequent - and, like that
warning, it could be configurable for big sites. If you think that's sane
I might have a go at it - though I mostly work in C++ so the result
probably won't be too pretty initially.

--
Craig Ringer
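A sketch of what that explicit-checking approach could look like. The
function name and parameters are hypothetical; only the conditions are
copied from the booking CHECK constraints shown earlier:

-- Hypothetical helper: validate up front instead of catching the error,
-- so the import function needs no EXCEPTION block and therefore no
-- per-row subtransaction.
CREATE OR REPLACE FUNCTION booking_row_is_sane(
    p_base_price    numeric,
    p_loading_ratio numeric,
    p_page_number   integer
) RETURNS boolean
LANGUAGE plpgsql AS $$
BEGIN
    -- Mirrors base_price_nonnegative, loading_ratio_sane and page_no_sane.
    RETURN p_base_price >= 0
       AND p_loading_ratio > 0 AND p_loading_ratio < 9
       AND (p_page_number IS NULL
            OR (p_page_number > 0 AND p_page_number <= 500));
END;
$$;

The import loop can then skip (or log) a row that fails this test and
carry on; a row that still slips past raises an uncaught error and aborts
the whole transaction, exactly as described above.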
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Craig Ringer
Date:
Heikki Linnakangas wrote:
> You must be having an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.

I've just confirmed that that was indeed the issue, and coding around the
begin block dramatically cuts the runtimes of commands executed after the
big import function.

Thanks again!

--
Craig Ringer
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > For 8.4, it would be nice to improve that. I tested that on my laptop > with a similarly-sized table, inserting each row in a pl/pgsql function > with an exception handler, and I got very similar run times. According > to oprofile, all the time is spent in TransactionIdIsInProgress. I think > it would be pretty straightforward to store the committed subtransaction > ids in a sorted array, instead of a linked list, and binary search. I think the OP is not complaining about the time to run the transaction that has all the subtransactions; he's complaining about the time to scan the table that it emitted. Presumably, each row in the table has a different (sub)transaction ID and so we are thrashing the clog lookup mechanism. It only happens once because after that the XMIN_COMMITTED hint bits are set. This probably ties into the recent discussions about eliminating the fixed-size allocations for SLRU buffers --- I suspect it would've run better if it could have scaled up the number of pg_clog pages held in memory. regards, tom lane
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Craig Ringer
Date:
Tom Lane wrote: > "Heikki Linnakangas" <heikki@enterprisedb.com> writes: >> For 8.4, it would be nice to improve that. I tested that on my laptop >> with a similarly-sized table, inserting each row in a pl/pgsql function >> with an exception handler, and I got very similar run times. According >> to oprofile, all the time is spent in TransactionIdIsInProgress. I think >> it would be pretty straightforward to store the committed subtransaction >> ids in a sorted array, instead of a linked list, and binary search. > > I think the OP is not complaining about the time to run the transaction > that has all the subtransactions; he's complaining about the time to > scan the table that it emitted. Yes, but only in succeeding statements in the same transaction as the procedure that creates all the subtransactions. Table scan times return to normal after that transaction commits. > Presumably, each row in the table has a > different (sub)transaction ID and so we are thrashing the clog lookup > mechanism. It only happens once because after that the XMIN_COMMITTED > hint bits are set. It seems to happen with every statement run in the same transaction as, and after, the procedure with all the subtransactions. As soon as a COMMIT is executed, operations return to normal speed. There's no significant delay on the first statement after COMMIT as compared to subsequent statements, nor do successive statements before the COMMIT get faster. In other words, if I repeatedly run one of the statements I used in testing for my initial post, like: EXPLAIN ANALYZE SELECT * FROM booking; ... after running the problem stored procedure, it takes just as long for the second and third and so on runs as for the first. As soon as I commit the transaction, the exact same statement returns to running in less than a second, and doesn't significantly change in runtime for subsequent executions. I'll bang out a couple of examples at work tomorrow to see what I land up with, since this is clearly something that can benefit from a neat test case. In any case, avoding the use of an exception block per record generated worked around the performance issues, so it's clearly something to do with the vast numbers of subtransactions - as Heikki Linnakangas suggested and tested. -- Craig Ringer
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
"Heikki Linnakangas"
Date:
Tom Lane wrote: > "Heikki Linnakangas" <heikki@enterprisedb.com> writes: >> For 8.4, it would be nice to improve that. I tested that on my laptop >> with a similarly-sized table, inserting each row in a pl/pgsql function >> with an exception handler, and I got very similar run times. According >> to oprofile, all the time is spent in TransactionIdIsInProgress. I think >> it would be pretty straightforward to store the committed subtransaction >> ids in a sorted array, instead of a linked list, and binary search. > > I think the OP is not complaining about the time to run the transaction > that has all the subtransactions; he's complaining about the time to > scan the table that it emitted. If you read the original post carefully, he complained that the seq scan was slow when executed within the same transaction as populating the table, and fast if he committed in between. > Presumably, each row in the table has a > different (sub)transaction ID and so we are thrashing the clog lookup > mechanism. It only happens once because after that the XMIN_COMMITTED > hint bits are set. > > This probably ties into the recent discussions about eliminating the > fixed-size allocations for SLRU buffers --- I suspect it would've run > better if it could have scaled up the number of pg_clog pages held in > memory. I doubt that makes any noticeable difference in this case. 300000 transaction ids fit on < ~100 clog pages, and the xmins on heap pages are nicely in order. Getting rid of the fixed-size allocations would be nice for other reasons, of course. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> It seems to happen with every statement run in the same transaction as,
> and after, the procedure with all the subtransactions. As soon as a
> COMMIT is executed, operations return to normal speed.

Ah. I misread your post as saying that it happened only once.

        regards, tom lane
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
"Heikki Linnakangas"
Date:
Craig Ringer wrote:
> I'll bang out a couple of examples at work tomorrow to see what I land
> up with, since this is clearly something that can benefit from a neat
> test case.

Here's what I used to reproduce this:

postgres=# BEGIN;
BEGIN
postgres=# CREATE TABLE foo (id int4, t text);
CREATE TABLE
postgres=# CREATE OR REPLACE FUNCTION insertfunc() RETURNS void
LANGUAGE plpgsql AS $$
  begin
    INSERT INTO foo VALUES ( 1, repeat('a',110));
  exception when unique_violation THEN
  end;
$$;
CREATE FUNCTION
postgres=# SELECT COUNT(insertfunc()) FROM generate_series(1,300000);
 count
--------
 300000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13595.93..13595.94 rows=1 width=0) (actual time=239535.904..239535.906 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..11948.34 rows=659034 width=0) (actual time=0.022..239133.898 rows=300000 loops=1)
 Total runtime: 239535.974 ms
(3 rows)

The oprofile output is pretty damning:

samples  %        symbol name
42148    99.7468  TransactionIdIsCurrentTransactionId

If you put a COMMIT right before "EXPLAIN ANALYZE..." it runs in < 1s.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Craig Ringer
Date:
Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>> It seems to happen with every statement run in the same transaction as,
>> and after, the procedure with all the subtransactions. As soon as a
>> COMMIT is executed, operations return to normal speed.
>
> Ah. I misread your post as saying that it happened only once.

No worries - it's best to be sure. Thanks for looking into it.

--
Craig Ringer
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > The oprofile output is pretty damning: > samples % symbol name > 42148 99.7468 TransactionIdIsCurrentTransactionId Oh, I have no doubt that that could eat a lot of cycles inside the originating transaction ;-). I just misread Craig's complaint as being about the cost of the first table scan *after* that transaction. Getting rid of the linked-list representation would be a win in a couple of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h, and xactGetCommittedChildren would go away. OTOH AtSubCommit_childXids would get more expensive. regards, tom lane
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Rainer Pruy
Date:
We experienced a similar degradation when heavily using savepoints within
a single transaction. However, we have not yet had enough time to really
investigate the issue. It also was not directly reproducible using a
(small) set of statements from a script.

As the overall scenario "bulk loads with sub-transactions" is close to
the scenario we do run, it might come down to the same reason, though.

Thus take my vote for a solution that does not end up with "don't use
(sub-)transactions".

Regards,
Rainer

Craig Ringer schrieb:
> Thanks for the extremely helpful response. I don't think I would've
> spotted that one in a hurry.
>
>> You must be having an exception handler block in that pl/pgsql
>> function, which implicitly creates a new subtransaction on each
>> invocation of the exception handler block, so you end up with hundreds
>> of thousands of committed subtransactions.
>
> Aah - yes, there is. I didn't realize it'd have such an impact. I can
> work around the need for it by explicitly checking the table constraints
> in the function - in which case an uncaught exception will terminate the
> transaction, but should only arise when I've missed a constraint check.
>
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql
>> function with an exception handler, and I got very similar run times.
>> According to oprofile, all the time is spent in
>> TransactionIdIsInProgress. I think it would be pretty straightforward
>> to store the committed subtransaction ids in a sorted array, instead
>> of a linked list, and binary search. Or to use a hash table. That
>> should eliminate this problem, though there are still other places
>> where a large number of subtransactions will hurt performance.
>
> That does sound interesting - and it would be nice to be able to use
> exception handlers this way without too huge a performance hit. In the
> end, though, it's something that can be designed around once you're aware
> of it - and I'm sure that other ways of storing that data have their own
> costs and downsides.
>
> What might also be nice, and simpler, would be a `notice', `log', or
> even `debug1' level warning telling the user they've reached an absurd
> number of subtransactions that'll cripple PostgreSQL's performance - say
> 100,000. There's precedent for this in the checkpoint frequency warning
> 8.3 produces if checkpoints are becoming too frequent - and, like that
> warning, it could be configurable for big sites. If you think that's sane
> I might have a go at it - though I mostly work in C++ so the result
> probably won't be too pretty initially.
>
> --
> Craig Ringer

--
Rainer Pruy
Geschäftsführer

Acrys Consult GmbH & Co. KG
Untermainkai 29-30, D-60329 Frankfurt
Tel: +49-69-244506-0 - Fax: +49-69-244506-50
Web: http://www.acrys.com - Email: office@acrys.com
Handelsregister: Frankfurt am Main, HRA 31151
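The savepoint-heavy workload described above exercises the same mechanism:
every SAVEPOINT starts a subtransaction, just as an entered PL/pgSQL
EXCEPTION block does. A minimal, hypothetical sketch of that shape (the
table name is invented; in practice the savepoints would be issued from a
client-side loop):

BEGIN;
CREATE TABLE sp_demo (id int PRIMARY KEY);

-- Each released savepoint leaves one more committed subtransaction
-- behind in the top-level transaction.
SAVEPOINT sp;
INSERT INTO sp_demo VALUES (1);
RELEASE SAVEPOINT sp;

SAVEPOINT sp;
INSERT INTO sp_demo VALUES (2);
RELEASE SAVEPOINT sp;

-- ...after a few hundred thousand iterations, scans of sp_demo run
-- before this COMMIT slow down just as in the bulk-import case above.
COMMIT;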
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
Simon Riggs
Date:
On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
> According
> to oprofile, all the time is spent in TransactionIdIsInProgress.

I recently submitted a patch to optimise this. Your comments would be
welcome on the patch.

--
Simon Riggs
2ndQuadrant   http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
"Pavan Deolasee"
Date:
On Mon, Mar 10, 2008 at 4:31 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> According
> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
> it would be pretty straightforward to store the committed subtransaction
> ids in a sorted array, instead of a linked list, and binary search.

Assuming that in most of the cases there will be many committed and few
aborted subtransactions, how about storing the list of *aborted*
subtransactions?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB   http://www.enterprisedb.com
Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From
"Heikki Linnakangas"
Date:
Simon Riggs wrote:
> On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
>> According
>> to oprofile, all the time is spent in TransactionIdIsInProgress.
>
> I recently submitted a patch to optimise this. Your comments would be
> welcome on the patch.

You mean this one:
http://archives.postgresql.org/pgsql-patches/2008-02/msg00008.php ?

Unfortunately that patch won't help in this case.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com