Thread: a question for the way-back machine

From
Ben
Date:
I'm using an ancient version of postgresql (7.3 - don't ask) and I'm
experiencing a problem where many inserts into an empty table slow down
over time, even if I analyze in the middle of the insertions.
pg_stat_user_tables shows lots and lots of full scans, which explains the
slowdown, but leads me to wonder why the full scans. Each insert is its
own transaction. This table has at least one unique index on it, and a
couple other multi-column indexes, so I was wondering if maybe that was
what was causing the full scans. But an explain shows that a select for
that unique column will use the index, so now I'm wondering if maybe:

- 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
- I have to start a new session to use new planner stats, even though each
insert is in its own transaction?

Something I haven't verified for myself yet but have been told by others
is that if I start these inserts quickly after installing a new database
cluster (which probably translates as: before running "vacuum analyze" for
the first time), then these slowdowns do not occur. I can't figure out why
that might be.

Any other thoughts?

Re: a question for the way-back machine

From
Martijn van Oosterhout
Date:
On Wed, Dec 13, 2006 at 01:10:44PM -0800, Ben wrote:
> I'm using an ancient version of postgresql (7.3 - don't ask) and I'm
> experiencing a problem where many inserts into an empty table slow down
> over time, even if I analyze in the middle of the insertions.
> pg_stat_user_tables shows lots and lots of full scans, which explains the
> slowdown, but leads me to wonder why the full scans. Each insert is its
> own transaction. This table has at least one unique index on it, and a
> couple other multi-column indexes, so I was wondering if maybe that was
> what was causing the full scans. But an explain shows that a select for
> that unique column will use the index, so now I'm wondering if maybe:
>
> - 7.3 isn't smart enough to use an index on an insert? Seems unlikely.

This question makes no sense; you don't need an index to insert.

Are you sure it's not due to some foreign key check?

BTW, separate inserts are the worst way to load data. At least put them
within a single transaction, or use COPY.
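As a sketch of the two alternatives (the table and column names here are invented for illustration):

```sql
-- One transaction around many inserts avoids a commit per row:
BEGIN;
INSERT INTO events (id, payload) VALUES (1, 'a');
INSERT INTO events (id, payload) VALUES (2, 'b');
-- ... many more rows ...
COMMIT;

-- Faster still, a single COPY from a server-side data file:
COPY events (id, payload) FROM '/tmp/events.dat';
```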

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: a question for the way-back machine

From
Ben
Date:

On Wed, 13 Dec 2006, Martijn van Oosterhout wrote:

>> - 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
>
> This question makes no sense, you don't need an index to insert.

Wouldn't it need to check the unique constraint (an index on the table)
before the insert can succeed? It seems like it would be better to check
the index than to do a full table scan to try to satisfy that constraint.

> Are you sure it's not due to some foreign key check?

No, but it seems unlikely, given that the vast majority of activity is
inserts into a single table, and that this table has massive amounts of
sequential scans according to pg_stat_user_tables.

> BTW, separate inserts are the worst way to load data. At least put them
> within a single transaction, or use COPY.

Oh, I know. It's not my choice, and not (yet) changeable.

Re: a question for the way-back machine

From
Martijn van Oosterhout
Date:
On Wed, Dec 13, 2006 at 02:01:46PM -0800, Ben wrote:
>
>
> On Wed, 13 Dec 2006, Martijn van Oosterhout wrote:
>
> >>- 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
> >
> >This question makes no sense, you don't need an index to insert.
>
> Wouldn't it need to check the unique constraint (an index on the table)
> before the insert can succeed? It seems like it would be better to check
> the index than to do a full table scan to try to satisfy that constraint.

When you insert a tuple, it needs to be inserted into the index, yes. There
is no way an insert can cause a sequential scan, except by some trigger
defined on the table.

> >Are you sure it's not due to some foreign key check?
>
> No, but it seems unlikely, given that the vast majority of activity is
> inserts into a single table, and that this table has massive amounts of
> sequential scans according to pg_stat_user_tables.

You're not doing a select within the insert statement are you?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: a question for the way-back machine

From
Ben
Date:
> When you insert a tuple, it needs to be inserted into the index, yes. There
> is no way an insert can cause a sequential scan, except by some trigger
> defined on the table.

Actually, as it happens, there *is* a trigger defined on the table to fire
before insert, but it too uses an index scan, at least according to
explain. Though, you'd think if it actually was using an index scan, that
would be showing up in pg_stat_user_tables, which it isn't. Might the fact
that the trigger is a plpgsql function be throwing it off and keeping it
from using more recent planner stats?

> You're not doing a select within the insert statement are you?

No, just within the trigger.

Re: a question for the way-back machine

From
Tom Lane
Date:
Ben <bench@silentmedia.com> writes:
> Wouldn't it need to check the unique constraint (an index on the table)
> before the insert can succeed? It seems like it would be better to check
> the index than to do a full table scan to try to satisfy that constraint.

Postgres contains no code that would even consider doing a seqscan for a
uniqueness check; it could not work because of race-condition considerations,
never mind performance.

You have not given us any clue about where the seqscans are really
coming from ... have you got any user-defined triggers, SQL functions in
CHECK constraints, stuff like that?

            regards, tom lane

Re: a question for the way-back machine

From
Richard Huxton
Date:
Ben wrote:
>> When you insert a tuple, it needs to be inserted into the index, yes.
>> There
>> is no way an insert can cause a sequential scan, except by some trigger
>> defined on the table.
>
> Actually, as it happens, there *is* a trigger defined on the table to
> fire before insert, but it too uses an index scan, at least according to
> explain. Though, you'd think if it actually was using an index scan,
> that would be showing up in pg_stat_user_tables, which it isn't. Might
> the fact that the trigger is a plpgsql function be throwing it off and
> keeping it from using more recent planner stats?

The query plan for the function will be compiled the first time it is
called. From that point on, it is fixed. It seems that is the source of
your seq-scans.

You can use the EXECUTE statement to construct a dynamic version of the
query, which will be planned every time it is run.
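A hypothetical before-insert trigger function in the old 7.3 style (doubled single quotes around the body; all table and column names invented) might use a FOR-IN-EXECUTE loop so the lookup is re-planned on every call:

```sql
-- Sketch only; "lookup", "key", "lookup_id" are made-up names.
CREATE OR REPLACE FUNCTION check_lookup() RETURNS trigger AS '
DECLARE
    r RECORD;
BEGIN
    -- A static query here would be planned once per session.
    -- An EXECUTEd string is planned fresh each time:
    FOR r IN EXECUTE ''SELECT id FROM lookup WHERE key = ''
                     || quote_literal(NEW.key) LOOP
        NEW.lookup_id := r.id;
    END LOOP;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';
```

The trade-off is a planning cost on every invocation, in exchange for plans that track the current statistics.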

--
   Richard Huxton
   Archonet Ltd

Re: a question for the way-back machine

From
Ben
Date:
Interesting. Is that plan cached for the life of the session doing the
inserts, the life of the trigger, or until the database is restarted?

I guess I'm trying to figure out how to get the plan to re-cache, without
making it entirely dynamic.

On Thu, 14 Dec 2006, Richard Huxton wrote:

> Ben wrote:
>>> When you insert a tuple, it needs to be inserted into the index, yes.
>>> There
>>> is no way an insert can cause a sequential scan, except by some trigger
>>> defined on the table.
>>
>> Actually, as it happens, there *is* a trigger defined on the table to fire
>> before insert, but it too uses an index scan, at least according to
>> explain. Though, you'd think if it actually was using an index scan, that
>> would be showing up in pg_stat_user_tables, which it isn't. Might the fact
>> that the trigger is a plpgsql function be throwing it off and keeping it
>> from using more recent planner stats?
>
> The query-plan for the function will be compiled first time it is called.
> From that point on, it is fixed. It seems that is the source of your
> seq-scans.
>
> You can use the EXECUTE statement to construct a dynamic version of the
> query, which will be planned every time it is run.
>
> --
>  Richard Huxton
>  Archonet Ltd
>

Re: a question for the way-back machine

From
Martijn van Oosterhout
Date:
On Thu, Dec 14, 2006 at 09:35:47AM -0800, Ben wrote:
> Interesting. Is that plan cached for the life of the session doing the
> inserts, the life of the trigger, or until the database is restarted?

Duration of a session; there is no support to cache a plan for any other
period.

> I guess I'm trying to figure out how to get the plan to re-cache, without
> making it entirely dynamic.

I don't think you can. Restarting the connection should be enough.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: a question for the way-back machine

From
Ben
Date:
OK, thanks.

On Thu, 14 Dec 2006, Martijn van Oosterhout wrote:

> On Thu, Dec 14, 2006 at 09:35:47AM -0800, Ben wrote:
>> Interesting. Is that plan cached for the life of the session doing the
>> inserts, the life of the trigger, or until the database is restarted?
>
> Duration of a session, there is no support to cache a plan for any other
> period.
>
>> I guess I'm trying to figure out how to get the plan to re-cache, without
>> making it entirely dynamic.
>
> I don't think you can. Restarting the connection should be enough.
>
> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>> From each according to his ability. To each according to his ability to litigate.
>

Re: a question for the way-back machine

From
"Gurjeet Singh"
Date:
On 12/14/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Dec 14, 2006 at 09:35:47AM -0800, Ben wrote:
> > Interesting. Is that plan cached for the life of the session doing the
> > inserts, the life of the trigger, or until the database is restarted?
>
> Duration of a session, there is no support to cache a plan for any other
> period.

Is it session level!? I think it is query level; don't we discard the plan after each query? AFAIK, only plpgsql and the other PL sisters cache the plans for queries inside their code blocks. I am under the impression that if a query is fired multiple times, it will be planned every time, unless we use PREPAREd statements.

Please correct me if I am wrong.
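The distinction can be seen at the SQL level (table name invented for illustration); PREPARE has been available since 7.3:

```sql
-- A plain statement is planned each time it is issued:
SELECT * FROM events WHERE id = 42;

-- A PREPAREd statement is planned once and reused for the
-- rest of the session:
PREPARE get_event (int) AS SELECT * FROM events WHERE id = $1;
EXECUTE get_event (42);
```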

> > I guess I'm trying to figure out how to get the plan to re-cache, without
> > making it entirely dynamic.
>
> I don't think you can. Restarting the connection should be enough.
>
> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to litigate.


--
gurjeet[.singh]@ EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

Re: a question for the way-back machine

From
Martijn van Oosterhout
Date:
On Fri, Dec 15, 2006 at 02:29:48PM +0530, Gurjeet Singh wrote:
> Is it session level !!??? I think it is query level; don't we discard the
> plan after each query? AFAIK, only the plpgsql and other pl/ sisters cache
> the plan for the queries inside their code-blocks. I am under the impression
> that if a query is fired multiple times, it will be planned every time,
> unless we use PREPAREd statements.

In the OP's case, he's talking about a query fired from a trigger written
in pl/pgsql, so it's cached for the session. Normal queries sent by the
client are not cached, of course. In his example it's not the INSERT
that's being cached; it's the query in the trigger.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
