Thread: a question for the way-back machine
I'm using an ancient version of PostgreSQL (7.3 - don't ask) and I'm experiencing a problem where many inserts into an empty table slow down over time, even if I analyze in the middle of the insertions. pg_stat_user_tables shows lots and lots of full scans, which explains the slowdown, but leads me to wonder why the full scans. Each insert is its own transaction. This table has at least one unique index on it, and a couple of other multi-column indexes, so I was wondering if maybe that was what was causing the full scans. But an EXPLAIN shows that a select on that unique column will use the index, so now I'm wondering if maybe:

- 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
- I have to start a new session to use new planner stats, even though each insert is in its own transaction?

Something I haven't verified for myself yet, but have been told by others, is that if I start these inserts quickly after installing a new database cluster (which probably translates as: before running "vacuum analyze" for the first time), then these slowdowns do not occur. I can't figure out why that might be. Any other thoughts?
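For reference, the counters in question can be watched with a query along these lines (the table name "mytable" is a stand-in for the real one):

    -- Compare cumulative sequential vs. index scans on the table:
    SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';

A seq_scan count that keeps climbing while the load runs is what points at a sequential scan happening per insert.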
On Wed, Dec 13, 2006 at 01:10:44PM -0800, Ben wrote:
> I'm using an ancient version of PostgreSQL (7.3 - don't ask) and I'm experiencing a problem where many inserts into an empty table slow down over time, even if I analyze in the middle of the insertions. pg_stat_user_tables shows lots and lots of full scans, which explains the slowdown, but leads me to wonder why the full scans. Each insert is its own transaction. This table has at least one unique index on it, and a couple of other multi-column indexes, so I was wondering if maybe that was what was causing the full scans. But an EXPLAIN shows that a select on that unique column will use the index, so now I'm wondering if maybe:
>
> - 7.3 isn't smart enough to use an index on an insert? Seems unlikely.

This question makes no sense; you don't need an index to insert. Are you sure it's not due to some foreign key check?

BTW, separate inserts are the worst way to load data. At least put them within a single transaction, or use COPY.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
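To illustrate the loading advice, a minimal sketch with placeholder table and column names:

    -- One transaction around the whole batch instead of one per row:
    BEGIN;
    INSERT INTO mytable (id, val) VALUES (1, 'a');
    INSERT INTO mytable (id, val) VALUES (2, 'b');
    -- ... many more rows ...
    COMMIT;

    -- Or bulk-load from a tab-separated file, which is faster still:
    COPY mytable FROM '/tmp/data.tsv';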
On Wed, 13 Dec 2006, Martijn van Oosterhout wrote:
>> - 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
>
> This question makes no sense; you don't need an index to insert.

Wouldn't it need to check the unique constraint (an index on the table) before the insert can succeed? It seems like it would be better to check the index than to do a full table scan to try to satisfy that constraint.

> Are you sure it's not due to some foreign key check?

No, but it seems unlikely, given that the vast majority of activity is inserts into a single table, and that this table has massive amounts of sequential scans according to pg_stat_user_tables.

> BTW, separate inserts are the worst way to load data. At least put them within a single transaction, or use COPY.

Oh, I know. It's not my choice, and not (yet) changeable.
On Wed, Dec 13, 2006 at 02:01:46PM -0800, Ben wrote:
> On Wed, 13 Dec 2006, Martijn van Oosterhout wrote:
>>> - 7.3 isn't smart enough to use an index on an insert? Seems unlikely.
>>
>> This question makes no sense; you don't need an index to insert.
>
> Wouldn't it need to check the unique constraint (an index on the table) before the insert can succeed? It seems like it would be better to check the index than to do a full table scan to try to satisfy that constraint.

When you insert a tuple, it needs to be inserted into the index, yes. There is no way an insert can cause a sequential scan, except by some trigger defined on the table.

>> Are you sure it's not due to some foreign key check?
>
> No, but it seems unlikely, given that the vast majority of activity is inserts into a single table, and that this table has massive amounts of sequential scans according to pg_stat_user_tables.

You're not doing a select within the insert statement, are you?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
> When you insert a tuple, it needs to be inserted into the index, yes. There is no way an insert can cause a sequential scan, except by some trigger defined on the table.

Actually, as it happens, there *is* a trigger defined on the table to fire before insert, but it too uses an index scan, at least according to EXPLAIN. Though you'd think that if it actually was using an index scan, that would be showing up in pg_stat_user_tables, which it isn't. Might the fact that the trigger is a plpgsql function be throwing it off and keeping it from using more recent planner stats?

> You're not doing a select within the insert statement, are you?

No, just within the trigger.
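The kind of check being described, sketched with hypothetical names:

    -- Confirm the lookup inside the trigger can use the unique index:
    EXPLAIN SELECT 1 FROM mytable WHERE unique_col = 'some-value';

Worth noting: an EXPLAIN run by hand at the prompt is planned with the current statistics, which is not necessarily the plan a plpgsql function cached when it was first called.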
Ben <bench@silentmedia.com> writes:
> Wouldn't it need to check the unique constraint (an index on the table) before the insert can succeed? It seems like it would be better to check the index than to do a full table scan to try to satisfy that constraint.

Postgres contains no code that would even consider doing a seqscan for a uniqueness check; it could not work because of race-condition considerations, never mind performance. You have not given us any clue about where the seqscans are really coming from ... have you got any user-defined triggers, SQL functions in CHECK constraints, stuff like that?

regards, tom lane
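One way to enumerate such suspects is a catalog query along these lines (table name hypothetical; \d mytable in psql shows much of the same):

    -- List user-defined triggers on the table and the functions they call,
    -- skipping the internal foreign-key constraint triggers:
    SELECT t.tgname, p.proname
    FROM pg_trigger t JOIN pg_proc p ON p.oid = t.tgfoid
    WHERE t.tgrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable')
      AND NOT t.tgisconstraint;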
Ben wrote:
>> When you insert a tuple, it needs to be inserted into the index, yes. There is no way an insert can cause a sequential scan, except by some trigger defined on the table.
>
> Actually, as it happens, there *is* a trigger defined on the table to fire before insert, but it too uses an index scan, at least according to EXPLAIN. Though you'd think that if it actually was using an index scan, that would be showing up in pg_stat_user_tables, which it isn't. Might the fact that the trigger is a plpgsql function be throwing it off and keeping it from using more recent planner stats?

The query plan for the function will be compiled the first time it is called. From that point on, it is fixed. It seems that is the source of your seq-scans.

You can use the EXECUTE statement to construct a dynamic version of the query, which will be planned every time it is run.

--
Richard Huxton
Archonet Ltd
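A sketch of the dynamic form being suggested, in 7.3-era syntax (quoted function body, since dollar-quoting arrived later); the names and the duplicate check are illustrative, not the OP's actual trigger:

    CREATE OR REPLACE FUNCTION check_before_insert() RETURNS trigger AS '
    DECLARE
        r RECORD;
    BEGIN
        -- EXECUTE takes the query as a string, so it is planned afresh
        -- on every call instead of being cached on first use.
        FOR r IN EXECUTE ''SELECT 1 FROM mytable WHERE unique_col = ''
                         || quote_literal(NEW.unique_col) LOOP
            RAISE EXCEPTION ''duplicate value: %'', NEW.unique_col;
        END LOOP;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER check_ins BEFORE INSERT ON mytable
        FOR EACH ROW EXECUTE PROCEDURE check_before_insert();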
Interesting. Is that plan cached for the life of the session doing the inserts, the life of the trigger, or until the database is restarted? I guess I'm trying to figure out how to get the plan to re-cache, without making it entirely dynamic.

On Thu, 14 Dec 2006, Richard Huxton wrote:
> The query plan for the function will be compiled the first time it is called. From that point on, it is fixed. It seems that is the source of your seq-scans.
>
> You can use the EXECUTE statement to construct a dynamic version of the query, which will be planned every time it is run.
On Thu, Dec 14, 2006 at 09:35:47AM -0800, Ben wrote:
> Interesting. Is that plan cached for the life of the session doing the inserts, the life of the trigger, or until the database is restarted?

Duration of a session; there is no support for caching a plan for any other period.

> I guess I'm trying to figure out how to get the plan to re-cache, without making it entirely dynamic.

I don't think you can. Restarting the connection should be enough.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
OK, thanks.

On Thu, 14 Dec 2006, Martijn van Oosterhout wrote:
>> Interesting. Is that plan cached for the life of the session doing the inserts, the life of the trigger, or until the database is restarted?
>
> Duration of a session; there is no support for caching a plan for any other period.
>
>> I guess I'm trying to figure out how to get the plan to re-cache, without making it entirely dynamic.
>
> I don't think you can. Restarting the connection should be enough.
On 12/14/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Dec 14, 2006 at 09:35:47AM -0800, Ben wrote:
>> Interesting. Is that plan cached for the life of the session doing the inserts, the life of the trigger, or until the database is restarted?
>
> Duration of a session; there is no support for caching a plan for any other period.

Is it session level!? I think it is query level; don't we discard the plan after each query? AFAIK, only plpgsql and the other PL/ sisters cache the plans for the queries inside their code blocks. I am under the impression that if a query is fired multiple times, it will be planned every time, unless we use PREPAREd statements.

Please correct me if I am wrong.

>> I guess I'm trying to figure out how to get the plan to re-cache, without making it entirely dynamic.
>
> I don't think you can. Restarting the connection should be enough.

--
gurjeet[.singh]@ EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
On Fri, Dec 15, 2006 at 02:29:48PM +0530, Gurjeet Singh wrote:
> Is it session level!? I think it is query level; don't we discard the plan after each query? AFAIK, only plpgsql and the other PL/ sisters cache the plans for the queries inside their code blocks. I am under the impression that if a query is fired multiple times, it will be planned every time, unless we use PREPAREd statements.

In the OP's case, he's talking about a query fired from a trigger written in plpgsql, so it's cached for the session. Normal queries sent by the client are not cached, of course. In his example it's not the INSERT that's being cached, it's a query in the trigger.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.