Thread: Very slow (~2,000 tuples/second) sequential scan after bulk insert; speed returns to ~500,000 tuples/second after commit

Hi all

I'm encountering an odd issue with a bulk import query using PostgreSQL
8.3. After a 400,000 row import into a just-truncated table `booking', a
sequential scan run on the table in the same transaction is incredibly
slow, taking ~ 166738.047 ms. After a:
    `COMMIT; BEGIN;'
the same query runs in 712.615 ms, with almost all the time difference
being in the sequential scan of the `booking' table [schema at end of post].

The table is populated by a complex pl/pgsql function that draws from
several other tables to convert data from another app's format. After
that function runs, here's what happens if I do a simple sequential
scan, then what happens after I commit and run it again:

craig=# explain analyze select * from booking;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139)
     (actual time=0.017..166644.697 rows=341481 loops=1)
 Total runtime: 166738.047 ms
(2 rows)

craig=# commit; begin;
COMMIT
BEGIN
craig=# explain analyze select * from booking;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on booking  (cost=0.00..9871.60 rows=320160 width=139)
     (actual time=0.022..624.492 rows=341481 loops=1)
 Total runtime: 712.615 ms
(2 rows)


SET client_min_messages = 'debug5';

does not shed any more light; the only extra output is (e.g.):

craig=# select count(distinct id) from booking;
DEBUG:  StartTransactionCommand
DEBUG:  CommitTransactionCommand
 count
--------
 341481
(1 row)

... which took 164558.961 ms to run, or about 2,000 tuples per second.
[Table schema at end of post]. By comparison, after commit the same
query read about 480,000 tuples/second.

This issue appears to affect any query that results in a sequential scan
on the newly populated table, and also affects maintenance operations
like ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY that perform
sequential scans. ANALYZE is also really slow. I'm not yet sure if index
scans are affected.

I'm not using WAL archiving.

It doesn't matter whether I truncate the table before the import using a
separate transaction or the same one that does the import.

I see essentially identical results, and runtimes, with other more
complex queries, but it seems to boil down to extremely slow sequential
scans.

The host these queries run on is a Linux 2.6.22 box running PostgreSQL
8.3. It has 4GB of RAM and shmmax set to 512MB. Tweaking the PostgreSQL
memory parameters seems to make little difference, but the ones I
adjusted away from the defaults to rule out a resource issue are:

shared_buffers = 32MB
temp_buffers = 128MB
work_mem = 128MB
maintenance_work_mem = 1024MB           # min 1MB

(There are relatively few clients to this database, but they work it hard).

Is this huge speed difference in sequential scans expected behavior? Any
idea what might be causing it?

I'm presently working around it by just committing the transaction after
the bulk import - but there's lots more work to do after that and it
leaves the database in a rather messy interim state.




Here's the table's schema, pasted as a quote to stop Thunderbird
mangling it. There are no rules on this table other than any that
PostgreSQL might create internally.

> craig=# \d booking
>                                           Table "public.booking"
>          Column         |           Type           |                      Modifiers
> ------------------------+--------------------------+------------------------------------------------------
>  id                     | integer                  | not null default nextval('booking_id_seq'::regclass)
>  customer_id            | integer                  | not null
>  edition_id             | integer                  | not null
>  description            | character varying(255)   | not null
>  position               | integer                  | not null
>  loading_applied        | boolean                  | not null default false
>  loading_ratio          | numeric(16,4)            | not null
>  size_type              | integer                  | not null
>  size_length            | numeric(16,4)            |
>  base_price             | numeric(16,4)            | not null
>  gst_factor             | numeric(16,8)            | not null default gst_factor()
>  page_number            | integer                  |
>  invoiced               | timestamp with time zone |
>  contract_id            | integer                  |
>  old_customer_id        | integer                  | not null
>  booked_time            | timestamp with time zone | not null
>  booked_by              | character varying(80)    | not null
>  cancelled              | boolean                  | not null default false
>  art_supplied           | boolean                  | not null default false
>  repeat_group           | integer                  |
>  notes                  | text                     |
>  originally_from_system | character(1)             |
>  pe_booking_id          | integer                  |
>  pe_customer_id         | integer                  |
> Indexes:
>     "booking_pkey" PRIMARY KEY, btree (id)
> Check constraints:
>     "base_price_nonnegative" CHECK (base_price >= 0::numeric)
>     "gst_factor_nonnegative_and_sane" CHECK (gst_factor >= 0::numeric AND gst_factor < 9::numeric)
>     "loading_ratio_sane" CHECK (loading_ratio > 0::numeric AND loading_ratio < 9::numeric)
>     "page_no_sane" CHECK (page_number IS NULL OR page_number > 0 AND page_number <= 500)
>     "size_length_nonnegative" CHECK (size_length IS NULL OR size_length >= 0::numeric)
> Foreign-key constraints:
>     "booking_contract_id_fkey" FOREIGN KEY (contract_id) REFERENCES contract(id)
>     "booking_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(id)
>     "booking_edition_id_fkey" FOREIGN KEY (edition_id) REFERENCES edition(id) ON DELETE CASCADE
>     "booking_old_customer_id_fkey" FOREIGN KEY (old_customer_id) REFERENCES customer(id)
>     "booking_position_fkey" FOREIGN KEY ("position") REFERENCES booking_position(id)
>     "booking_repeat_group_fkey" FOREIGN KEY (repeat_group) REFERENCES booking_repeat(id) ON DELETE SET NULL
>     "booking_size_type_fkey" FOREIGN KEY (size_type) REFERENCES booking_size_type(id)
> Triggers:
>     booking_after_insert_update AFTER INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_after_trigger()
>     booking_audit AFTER UPDATE ON booking FOR EACH ROW EXECUTE PROCEDURE booking_audit_trigger()
>     booking_before_insert BEFORE INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_before_trigger()

--
Craig Ringer

Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
>     `COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must have an exception handler block in that pl/pgsql function,
which implicitly creates a new subtransaction on each invocation of the
exception handler block, so you end up with hundreds of thousands of
committed subtransactions. For each row in the seq scan, the list of
subtransactions is scanned to see whether the transaction that inserted
the row is part of the current top-level transaction. That's fine for a
handful of subtransactions, but it gets really slow with large numbers
of them, as you've seen. It's an O(n^2) operation, where n is the number
of rows inserted, so you'll be in even more trouble if the number of
rows increases.

As a work-around, avoid using exception handlers, or process more than 1
row per function invocation. Or COMMIT the transaction, as you did.
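
As a rough sketch of the set-based approach (the table and column names
below are made up for illustration, not taken from your schema), a
single INSERT ... SELECT avoids the per-row exception block entirely:

-- One statement loads everything, so no per-row subtransactions are
-- created. Rows that would violate a constraint are filtered out up
-- front instead of being caught one at a time in an exception handler.
INSERT INTO booking_new (customer_id, description, base_price)
SELECT s.customer_id, s.description, s.base_price
FROM   legacy_booking s
WHERE  s.base_price >= 0
  AND  s.description IS NOT NULL;

If a bad row still slips through, the whole statement aborts, but no
subtransaction is created per row.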

For 8.4, it would be nice to improve that. I tested that on my laptop
with a similarly-sized table, inserting each row in a pl/pgsql function
with an exception handler, and I got very similar run times. According
to oprofile, all the time is spent in TransactionIdIsInProgress. I think
it would be pretty straightforward to store the committed subtransaction
ids in a sorted array, instead of a linked list, and binary search. Or
to use a hash table. That should eliminate this problem, though there
are still other places where a large number of subtransactions will
hurt performance.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Thanks for the extremely helpful response. I don't think I would've
spotted that one in a hurry.

> You must have an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.

Aah - yes, there is. I didn't realize it'd have such an impact. I can
work around the need for it by explicitly checking the table constraints
in the function - in which case an uncaught exception will terminate the
transaction, but should only arise when I've missed a constraint check.
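
To illustrate the sort of change I have in mind, here's a cut-down
sketch with a dummy table and trivial functions rather than the real
`booking' schema and import function:

-- Dummy table, for illustration only.
CREATE TABLE demo_booking (
    id         serial PRIMARY KEY,
    base_price numeric(16,4) NOT NULL CHECK (base_price >= 0)
);

-- Old approach: let the INSERT fail and catch the error, which creates
-- a subtransaction on every call.
CREATE OR REPLACE FUNCTION import_row_old(p_price numeric) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO demo_booking (base_price) VALUES (p_price);
EXCEPTION WHEN check_violation OR not_null_violation THEN
    RAISE NOTICE 'skipped bad price: %', p_price;
END;
$$;

-- New approach: test the condition explicitly, so no exception block
-- (and no subtransaction) is needed.
CREATE OR REPLACE FUNCTION import_row_new(p_price numeric) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    IF p_price IS NOT NULL AND p_price >= 0 THEN
        INSERT INTO demo_booking (base_price) VALUES (p_price);
    ELSE
        RAISE NOTICE 'skipped bad price: %', p_price;
    END IF;
END;
$$;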

> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql
> function with an exception handler, and I got very similar run times.
> According to oprofile, all the time is spent in
> TransactionIdIsInProgress. I think it would be pretty straightforward
> to store the committed subtransaction ids in a sorted array, instead
> of a linked list, and binary search. Or to use a hash table. That
> should eliminate this problem, though there are still other places
> where a large number of subtransactions will hurt performance.

That does sound interesting - and it would be nice to be able to use
exception handlers this way without too huge a performance hit. In the
end though it's something that can be designed around once you're aware
of it - and I'm sure that other ways of storing that data have their own
different costs and downsides.

What might also be nice, and simpler, would be a `notice', `log', or
even `debug1' level warning telling the user they've reached an absurd
number of subtransactions that'll cripple PostgreSQL's performance - say
100,000. There's precedent for this in the checkpoint frequency warning
8.3 produces if checkpoints are becoming too frequent - and like that
warning it could be configurable for big sites. If you think that's sane
I might have a go at it - though I mostly work in C++ so the result
probably won't be too pretty initially.

--
Craig Ringer

Heikki Linnakangas wrote:
> You must have an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.
I've just confirmed that that was indeed the issue: coding around the
exception block dramatically cuts the runtime of commands executed
after the big import function.

Thanks again!

--
Craig Ringer

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql function
> with an exception handler, and I got very similar run times. According
> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
> it would be pretty straightforward to store the committed subtransaction
> ids in a sorted array, instead of a linked list, and binary search.

I think the OP is not complaining about the time to run the transaction
that has all the subtransactions; he's complaining about the time to
scan the table that it emitted.  Presumably, each row in the table has a
different (sub)transaction ID and so we are thrashing the clog lookup
mechanism.  It only happens once because after that the XMIN_COMMITTED
hint bits are set.
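
As an aside, for anyone who wants to watch this happen: the pageinspect
contrib module (if installed) can show the hint bits directly.
HEAP_XMIN_COMMITTED is the 0x0100 bit of t_infomask, so something like
this, run against the OP's table, shows whether the bit has been set on
each tuple of a given page:

-- true once the "xmin committed" hint bit has been set on the tuple
SELECT lp, (t_infomask & 256) <> 0 AS xmin_committed_hint
FROM heap_page_items(get_raw_page('booking', 0));

Running it before and after the first post-commit scan should show the
bits getting set.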

This probably ties into the recent discussions about eliminating the
fixed-size allocations for SLRU buffers --- I suspect it would've run
better if it could have scaled up the number of pg_clog pages held in
memory.

            regards, tom lane

Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql function
>> with an exception handler, and I got very similar run times. According
>> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
>> it would be pretty straightforward to store the committed subtransaction
>> ids in a sorted array, instead of a linked list, and binary search.
>
> I think the OP is not complaining about the time to run the transaction
> that has all the subtransactions; he's complaining about the time to
> scan the table that it emitted.

Yes, but only in succeeding statements in the same transaction as the
procedure that creates all the subtransactions. Table scan times return
to normal after that transaction commits.

> Presumably, each row in the table has a
> different (sub)transaction ID and so we are thrashing the clog lookup
> mechanism.  It only happens once because after that the XMIN_COMMITTED
> hint bits are set.

It seems to happen with every statement run in the same transaction as,
and after, the procedure with all the subtransactions. As soon as a
COMMIT is executed, operations return to normal speed. There's no
significant delay on the first statement after COMMIT as compared to
subsequent statements, nor do successive statements before the COMMIT
get faster.

In other words, if I repeatedly run one of the statements I used in
testing for my initial post, like:

EXPLAIN ANALYZE SELECT * FROM booking;

... after running the problem stored procedure, it takes just as long
for the second and third and so on runs as for the first.

As soon as I commit the transaction, the exact same statement returns to
running in less than a second, and doesn't significantly change in
runtime for subsequent executions.

I'll bang out a couple of examples at work tomorrow to see what I land
up with, since this is clearly something that can benefit from a neat
test case.

In any case, avoiding the use of an exception block per generated record
worked around the performance issues, so it's clearly something to do
with the vast numbers of subtransactions - as Heikki Linnakangas
suggested and tested.

--
Craig Ringer

Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql function
>> with an exception handler, and I got very similar run times. According
>> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
>> it would be pretty straightforward to store the committed subtransaction
>> ids in a sorted array, instead of a linked list, and binary search.
>
> I think the OP is not complaining about the time to run the transaction
> that has all the subtransactions; he's complaining about the time to
> scan the table that it emitted.

If you read the original post carefully, he complained that the seq scan
was slow when executed within the same transaction as populating the
table, and fast if he committed in between.

>  Presumably, each row in the table has a
> different (sub)transaction ID and so we are thrashing the clog lookup
> mechanism.  It only happens once because after that the XMIN_COMMITTED
> hint bits are set.
>
> This probably ties into the recent discussions about eliminating the
> fixed-size allocations for SLRU buffers --- I suspect it would've run
> better if it could have scaled up the number of pg_clog pages held in
> memory.

I doubt that makes any noticeable difference in this case. 300,000
transaction ids fit in fewer than 100 clog pages, and the xmins on the
heap pages are nicely in order.

Getting rid of the fixed-size allocations would be nice for other
reasons, of course.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Craig Ringer <craig@postnewspapers.com.au> writes:
> It seems to happen with every statement run in the same transaction as,
> and after, the procedure with all the subtransactions. As soon as a
> COMMIT is executed, operations return to normal speed.

Ah.  I misread your post as saying that it happened only once.

            regards, tom lane

Craig Ringer wrote:
> I'll bang out a couple of examples at work tomorrow to see what I land
> up with, since this is clearly something that can benefit from a neat
> test case.

Here's what I used to reproduce this:

postgres=# BEGIN;
BEGIN
postgres=# CREATE TABLE foo (id int4, t text);
CREATE TABLE
postgres=# CREATE OR REPLACE FUNCTION insertfunc() RETURNS void LANGUAGE
plpgsql  AS $$
   begin
     INSERT INTO foo VALUES ( 1, repeat('a',110));
   exception when unique_violation THEN end;
$$;
CREATE FUNCTION
postgres=# SELECT COUNT(insertfunc()) FROM generate_series(1,300000);
count
--------
  300000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;
                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13595.93..13595.94 rows=1 width=0)
     (actual time=239535.904..239535.906 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..11948.34 rows=659034 width=0)
       (actual time=0.022..239133.898 rows=300000 loops=1)
 Total runtime: 239535.974 ms
(3 rows)


The oprofile output is pretty damning:

samples  %        symbol name
42148    99.7468  TransactionIdIsCurrentTransactionId

If you put a COMMIT right before "EXPLAIN ANALYZE..." it runs in < 1s.
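
That is, roughly:

postgres=# SELECT COUNT(insertfunc()) FROM generate_series(1,300000);
...
postgres=# COMMIT;  -- end the transaction that accumulated the subtransactions
COMMIT
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;  -- now completes in < 1s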

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>
>> It seems to happen with every statement run in the same transaction as,
>> and after, the procedure with all the subtransactions. As soon as a
>> COMMIT is executed, operations return to normal speed.
>>
>
> Ah.  I misread your post as saying that it happened only once.
No worries - it's best to be sure.

Thanks for looking into it.

--
Craig Ringer


"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> The oprofile output is pretty damning:

> samples  %        symbol name
> 42148    99.7468  TransactionIdIsCurrentTransactionId

Oh, I have no doubt that that could eat a lot of cycles inside the
originating transaction ;-).  I just misread Craig's complaint as
being about the cost of the first table scan *after* that transaction.

Getting rid of the linked-list representation would be a win in a couple
of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
and xactGetCommittedChildren would go away.  OTOH AtSubCommit_childXids
would get more expensive.

            regards, tom lane

We experienced a similar degradation when making heavy use of savepoints
within a single transaction. However, we have not yet had enough time to
really investigate the issue. It also was not directly reproducible with
a (small) set of statements from a script. As the overall scenario of
"bulk loads with sub-transactions" is close to the one we run, it may
well come down to the same cause.

Thus take my vote for a solution that does not end up as "don't use
(sub-)transactions".

Regards,
Rainer

--
Rainer Pruy
Managing Director

Acrys Consult GmbH & Co. KG
Untermainkai 29-30, D-60329 Frankfurt
Tel: +49-69-244506-0 - Fax: +49-69-244506-50
Web: http://www.acrys.com -  Email: office@acrys.com
Commercial register: Frankfurt am Main, HRA 31151

On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
> According
> to oprofile, all the time is spent in TransactionIdIsInProgress.

I recently submitted a patch to optimise this. Your comments would be
welcome on the patch.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


On Mon, Mar 10, 2008 at 4:31 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> According
>  to oprofile, all the time is spent in TransactionIdIsInProgress. I think
>  it would be pretty straightforward to store the committed subtransaction
>  ids in a sorted array, instead of a linked list, and binary search.

Assuming that in most cases there will be many committed and few aborted
subtransactions, how about storing the list of *aborted* subtransactions
instead?


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Simon Riggs wrote:
> On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
>> According
>> to oprofile, all the time is spent in TransactionIdIsInProgress.
>
> I recently submitted a patch to optimise this. Your comments would be
> welcome on the patch.

You mean this one:

http://archives.postgresql.org/pgsql-patches/2008-02/msg00008.php

? Unfortunately that patch won't help in this case.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com