Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit - Mailing list pgsql-patches

From Tom Lane
Subject Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Msg-id 5827.1205269597@sss.pgh.pa.us
In response to Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> I initially thought that using a single palloc'd array to hold all the
> XIDs would introduce a new limit on the number of committed
> subtransactions, thanks to MaxAllocSize, but that's not the case.
> Without the patch, we actually allocate an array like that anyway in
> xactGetCommittedChildren.

Right.

> Elsewhere in our codebase where we use arrays that are enlarged as
> needed, we keep track of the "allocated" size and the "used" size of the
> array separately, and only call repalloc when the array fills up, and
> repalloc a larger than necessary array when it does. I chose to just
> call repalloc every time instead, as repalloc is smart enough to fall
> out quickly if the chunk the allocation was made in is already larger
> than the new size. There might be some gain avoiding the repeated
> repalloc calls, but I doubt it's worth the code complexity, and calling
> repalloc with a larger than necessary size can actually force it to
> unnecessarily allocate a new, larger chunk instead of reusing the old
> one. Thoughts on that?

Seems like a pretty bad idea to me, as the behavior you're counting on
only applies to chunks up to 8K or thereabouts.  In a situation where
you are subcommitting lots of XIDs one at a time, this is likely to have
quite awful behavior (or at least, you're at the mercy of the local
malloc library as to how bad it is).  I'd go with the same
double-it-each-time-needed approach we use elsewhere.
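
For illustration, the double-it-each-time-needed approach might look roughly
like the sketch below. It is a standalone example that uses plain realloc
rather than repalloc, and the names XidArray and xid_array_append, the
stand-in TransactionId typedef, and the initial size of 8 are all
hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's TransactionId; an assumption for this sketch. */
typedef uint32_t TransactionId;

/* Hypothetical growable array: "used" and "allocated" are tracked separately. */
typedef struct XidArray
{
    TransactionId *xids;        /* NULL until the first append */
    size_t      used;           /* number of entries currently stored */
    size_t      allocated;      /* number of entries the buffer can hold */
} XidArray;

/*
 * Append one XID, doubling the buffer only when it is full.
 * Returns false if memory cannot be obtained; the array is left unchanged.
 */
static bool
xid_array_append(XidArray *a, TransactionId xid)
{
    assert(a->used <= a->allocated);

    if (a->used == a->allocated)
    {
        size_t      newsize;
        TransactionId *newxids;

        /* Refuse to grow if doubling would overflow the byte count. */
        if (a->allocated > SIZE_MAX / 2 / sizeof(TransactionId))
            return false;

        newsize = (a->allocated == 0) ? 8 : a->allocated * 2;
        newxids = realloc(a->xids, newsize * sizeof(TransactionId));
        if (newxids == NULL)
            return false;

        a->xids = newxids;
        a->allocated = newsize;
    }

    a->xids[a->used++] = xid;
    return true;
}
```

With this shape, appending n XIDs one at a time calls the allocator only
about log2(n) times, so the cost does not depend on how the underlying
allocator handles repeated small enlargements of a large chunk.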

            regards, tom lane
