Thread: update functions locking tables
I have almost completed one web site migration from mysql to pgsql. It is already running totally pgsql. But there is one last conversion problem. Most of the queries use tables populated every 3 hours. In mysql, for the site to not be unavailable during updating i was creating temporary tables, populating them, dropping the main tables and then renaming the temp tables to the main tables. The updating is not trivial (for me) and needs some coding effort to be done. Since pgsql has MVCC I wanted to eliminate the table rotation step and use a transaction to update the tables. But what is happening is that the plpgsql update functions are locking the tables and this is what the web clients are getting (from ps ax): 26873 ? S 0:18 postgres: user database 127.0.0.1(38946) SELECT waiting 23973 ? S 0:22 postgres: user database 127.0.0.1(43396) SELECT waiting 31212 ? S 0:02 postgres: user database 127.0.0.1(60619) SELECT waiting 21601 ? S 0:00 postgres: user database 127.0.0.1(46770) SELECT waiting I've been trying for 3 days to figure out what is happening to no avail. What am i missing about transactions and MVCC? What could make a plpgsql update function lock a table? The indexes are default btree. Otherwise the functions are behaving exactly as expected. Regards, Clodoaldo Pinto
Clodoaldo Pinto <clodoaldo.pinto@gmail.com> writes: > I've been trying for 3 days to figure out what is happening to no > avail. What am i missing about transactions and MVCC? What could make > a plpgsql update function lock a table? What is the function doing to the table, exactly? DDL changes generally take exclusive locks ... regards, tom lane
On Mon, Aug 29, 2005 at 09:41:21PM -0300, Clodoaldo Pinto wrote: > > 26873 ? S 0:18 postgres: user database 127.0.0.1(38946) SELECT waiting > 23973 ? S 0:22 postgres: user database 127.0.0.1(43396) SELECT waiting > 31212 ? S 0:02 postgres: user database 127.0.0.1(60619) SELECT waiting > 21601 ? S 0:00 postgres: user database 127.0.0.1(46770) SELECT waiting > > I've been trying for 3 days to figure out what is happening to no > avail. What am i missing about transactions and MVCC? What could make > a plpgsql update function lock a table? The indexes are default btree. In general, writers shouldn't block readers. Have you examined pg_locks? Do you know exactly what the blocked queries are, or can you find out from pg_stat_activity (stats_command_string must be enabled)? Are you doing any explicit locking (LOCK statement)? -- Michael Fuhr
2005/8/29, Michael Fuhr <mike@fuhr.org>: > > In general, writers shouldn't block readers. Have you examined > pg_locks? Do you know exactly what the blocked queries are, or can > you find out from pg_stat_activity (stats_command_string must be > enabled)? Are you doing any explicit locking (LOCK statement)? > This is one of the blocked queries: select count (*) from times_producao where pontos_0 - pontos_7 > 0; These selects were done during the updating: select * from pg_locks; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-------+---------------------+--------- 1813938 | 1813868 | | 7040 | AccessShareLock | t 1813938 | 1813868 | | 7040 | RowExclusiveLock | t 1813938 | 1813868 | | 7040 | ShareLock | t 1813938 | 1813868 | | 7040 | AccessExclusiveLock | t 1813939 | 1813868 | | 7040 | AccessShareLock | t 1813939 | 1813868 | | 7040 | RowExclusiveLock | t 1813939 | 1813868 | | 7040 | ShareLock | t 1813939 | 1813868 | | 7040 | AccessExclusiveLock | t 1813914 | 1813868 | | 24012 | AccessShareLock | f 1813892 | 1813868 | | 7040 | AccessShareLock | t 1813892 | 1813868 | | 7040 | RowExclusiveLock | t 1813914 | 1813868 | | 7040 | AccessShareLock | t 1813914 | 1813868 | | 7040 | RowExclusiveLock | t 1813914 | 1813868 | | 7040 | ShareLock | t 1813914 | 1813868 | | 7040 | AccessExclusiveLock | t 1813896 | 1813868 | | 7040 | AccessShareLock | t 16839 | 1813868 | | 12751 | AccessShareLock | t 2314110 | 1813868 | | 26871 | AccessShareLock | f 1813914 | 1813868 | | 26844 | AccessShareLock | f | | 288553 | 26844 | ExclusiveLock | t | | 288561 | 24012 | ExclusiveLock | t | | 288548 | 7040 | ExclusiveLock | t | | 288558 | 26871 | ExclusiveLock | t 1813914 | 1813868 | | 31212 | AccessShareLock | f 2314110 | 1813868 | | 7040 | AccessShareLock | t 2314110 | 1813868 | | 7040 | RowExclusiveLock | t 2314110 | 1813868 | | 7040 | ShareLock | t 2314110 | 1813868 | | 7040 | AccessExclusiveLock | t | | 288556 | 31212 | ExclusiveLock | t | | 288562 | 12751 | ExclusiveLock | t 1813887 | 1813868 | | 7040 | AccessShareLock | t 2314112 | 1813868 | | 7040 | ShareLock | t 2314112 | 1813868 | | 7040 | AccessExclusiveLock | t 1813907 | 1813868 | | 7040 | AccessShareLock | t 1813911 | 1813868 | | 7040 | AccessShareLock | t (35 rows) select * from pg_stat_user_tables as a inner join pg_locks as b on a.relid = b.relation ; relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | relation | database | transaction | pid | mode | granted ---------+------------+-------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+----------+----------+-------------+-------+---------------------+--------- 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 24012 | AccessShareLock | f 1813892 | public | last_date | 0 | 0 | | | 0 | 0 | 0 | 1813892 | 1813868 | | 7040 | AccessShareLock | t 1813892 | public | last_date | 0 | 0 | | | 0 | 0 | 0 | 1813892 | 1813868 | | 7040 | RowExclusiveLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 7040 | AccessShareLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 7040 | RowExclusiveLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 7040 | ShareLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 7040 | AccessExclusiveLock | t 1813896 | public | times | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813896 | 1813868 | | 7040 | AccessShareLock | t 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 31383 | AccessShareLock | f 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 12351 | AccessShareLock | f 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 26871 | AccessShareLock | f 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 26844 | AccessShareLock | f 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 24021 | AccessShareLock | f 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 31212 | AccessShareLock | f 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 7040 | AccessShareLock | t 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 7040 | RowExclusiveLock | t 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 7040 | ShareLock | t 2314110 | public | times_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2314110 | 1813868 | | 7040 | AccessExclusiveLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 26872 | AccessShareLock | f 1813887 | public | datas | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813887 | 1813868 | | 7040 | AccessShareLock | t 1813914 | public | usuarios_producao | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813914 | 1813868 | | 8875 | AccessShareLock | f 1813907 | public | usuarios | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813907 | 1813868 | | 7040 | AccessShareLock | t 1813911 | public | usuarios_indice | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1813911 | 1813868 | | 7040 | AccessShareLock | t (23 rows) I had just enabled stats_command_string and in about 15 hours i will be able to post pg_stat_activity. Regards, Clodoaldo Pinto
2005/8/29, Tom Lane <tgl@sss.pgh.pa.us>: > > What is the function doing to the table, exactly? DDL changes generally > take exclusive locks ... This is the transaction: begin; select update_last_date(); truncate times_producao; select kstime(), insert_times_producao(), kstime(); select kstime(), update_ranking_times(), kstime(); truncate usuarios_producao; select kstime(), insert_usuarios_producao(), kstime(); analyze usuarios_producao; select kstime(), update_ranking_usuarios(), kstime(); select kstime(), update_ranking_usuarios_time(), kstime(); select kstime(), update_team_active_members(), kstime(); commit; This is one of the functions: CREATE OR REPLACE FUNCTION update_ranking_usuarios() RETURNS void AS $BODY$declare linha record; rank integer; begin rank := 0; for linha in select usuario from usuarios_producao order by pontos_0 desc, pontos_7 desc, pontos_24 desc loop rank := rank + 1; update usuarios_producao set rank_0 = rank where usuario = linha.usuario ; end loop; -- ---------------------------------------------------------- rank := 0; for linha in select usuario from usuarios_producao order by pontos_0 + (pontos_7 / 7) desc, pontos_0 desc loop rank := rank + 1; update usuarios_producao set rank_24 = rank where usuario = linha.usuario ; end loop; -- ---------------------------------------------------------- rank := 0; for linha in select usuario from usuarios_producao order by pontos_0 + pontos_7 desc, pontos_0 desc loop rank := rank + 1; update usuarios_producao set rank_7 = rank where usuario = linha.usuario ; end loop; -- ---------------------------------------------------------- rank := 0; for linha in select usuario from usuarios_producao order by pontos_0 + (pontos_7 * 30 / 7) desc, pontos_0 desc loop rank := rank + 1; update usuarios_producao set rank_30 = rank where usuario = linha.usuario ; end loop; return; end;$BODY$ LANGUAGE 'plpgsql' VOLATILE STRICT; There is no DDL inside the functions. Regards, Clodoaldo Pinto
On Tue, Aug 30, 2005 at 08:39:52AM -0300, Clodoaldo Pinto wrote: > > begin; > select update_last_date(); > truncate times_producao; TRUNCATE acquires an AccessExclusiveLock, which conflicts with all other lock types. Locks are held until the transaction completes, so once this lock is acquired no other transactions will be able to access the table until this transaction commits or rolls back. DELETE is slower than TRUNCATE but it won't block readers in other transactions. -- Michael Fuhr
2005/8/30, Michael Fuhr <mike@fuhr.org>: > > TRUNCATE acquires an AccessExclusiveLock, which conflicts with all > other lock types. Locks are held until the transaction completes, > so once this lock is acquired no other transactions will be able > to access the table until this transaction commits or rolls back. > > DELETE is slower than TRUNCATE but it won't block readers in other > transactions. > I think it is of great help. I will change it and let you know what happened. Regards, Clodoaldo Pinto
Clodoaldo Pinto <clodoaldo.pinto@gmail.com> writes: > 2005/8/29, Tom Lane <tgl@sss.pgh.pa.us>: > > > > What is the function doing to the table, exactly? DDL changes generally > > take exclusive locks ... > > This is the transaction: > > begin; > select update_last_date(); > truncate times_producao; I think truncate takes a table lock. Just change it to "delete from times_producao". Also, if consider doing a "vacuum full" or "cluster" after the batch job to clear up the free space (not in a large transaction). That will still take a table lock but it may be a small enough downtime to be worth the speed increase the rest of the day. -- greg
30 Aug 2005 09:10:51 -0400, Greg Stark <gsstark@mit.edu>: > > I think truncate takes a table lock. > Just change it to "delete from times_producao". Thanks, i will try it. > > Also, if consider doing a "vacuum full" or "cluster" after the batch job to > clear up the free space (not in a large transaction). That will still take a > table lock but it may be a small enough downtime to be worth the speed > increase the rest of the day. > I'm already doing a vacuum (not full) once a day. A vacuum full or a cluster is totally out of reach since each take about one hour. The biggest table is 170 million rows long. Regards, Clodoaldo Pinto
On Tue, Aug 30, 2005 at 10:18:20AM -0300, Clodoaldo Pinto wrote: > 30 Aug 2005 09:10:51 -0400, Greg Stark <gsstark@mit.edu>: > > Also, if consider doing a "vacuum full" or "cluster" after the batch job to > > clear up the free space (not in a large transaction). That will still take a > > table lock but it may be a small enough downtime to be worth the speed > > increase the rest of the day. > > I'm already doing a vacuum (not full) once a day. > > A vacuum full or a cluster is totally out of reach since each take > about one hour. Even if you cluster/vacuum only the just-loaded table? > The biggest table is 170 million rows long. I hope this is not the one you are loading daily ... -- Alvaro Herrera <alvherre[]alvh.no-ip.org> Architect, www.EnterpriseDB.com "El destino baraja y nosotros jugamos" (A. Schopenhauer)
Clodoaldo Pinto <clodoaldo.pinto@gmail.com> writes: > I'm already doing a vacuum (not full) once a day. > > A vacuum full or a cluster is totally out of reach since each take > about one hour. The biggest table is 170 million rows long. Well a regular vacuum will mark the free space for reuse. If you insert or update any records the new ones will go into those spots. Make sure you set the fsm_* parameters high enough to cover all the updates and inserts for the entire day (or repeat the vacuum periodically even if there are no deletes or updates going on to create more free space). You should realize that what's going on here is that the old records are still in your table, marked as deleted. So any sequential scan will take twice as long as otherwise. I think even index scans could take twice as long too depending on the distribution of values. I'm not saying that's untenable. If all your queries are fast enough then you're set and it's just a cost of having no downtime. -- greg
2005/8/30, Alvaro Herrera <alvherre@alvh.no-ip.org>: > On Tue, Aug 30, 2005 at 10:18:20AM -0300, Clodoaldo Pinto wrote: > > A vacuum full or a cluster is totally out of reach since each take > > about one hour. > > Even if you cluster/vacuum only the just-loaded table? > No, that would be much faster. The biggest just updated is about 600 thousand rows. I will consider it. > > The biggest table is 170 million rows long. > > I hope this is not the one you are loading daily ... > I load daily 8 times 700+ thousand rows.
30 Aug 2005 10:35:31 -0400, Greg Stark <gsstark@mit.edu>: > > Well a regular vacuum will mark the free space for reuse. If you insert or > update any records the new ones will go into those spots. Make sure you set > the fsm_* parameters high enough to cover all the updates and inserts for the > entire day (or repeat the vacuum periodically even if there are no deletes or > updates going on to create more free space). I will check those fsm_* parameters. > > You should realize that what's going on here is that the old records are still > in your table, marked as deleted. So any sequential scan will take twice as > long as otherwise. I think even index scans could take twice as long too > depending on the distribution of values. > > I'm not saying that's untenable. If all your queries are fast enough then > you're set and it's just a cost of having no downtime. > > -- > greg > >
On Tue, Aug 30, 2005 at 08:13:15AM -0300, Clodoaldo Pinto wrote: > 2005/8/29, Michael Fuhr <mike@fuhr.org>: > > > > In general, writers shouldn't block readers. Have you examined > > pg_locks? Do you know exactly what the blocked queries are, or can > > you find out from pg_stat_activity (stats_command_string must be > > enabled)? Are you doing any explicit locking (LOCK statement)? > > > > This is one of the blocked queries: > select count (*) from times_producao where pontos_0 - pontos_7 > 0; FWIW, that where clause might be more efficient as WHERE pontos_0 > pontos_7. Some databases would be able to use indexes to answer that (not sure if PostgreSQL could), plus it removes an operator. It also seems to be cleaner code to me. :) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com 512-569-9461
2005/8/30, Jim C. Nasby <jnasby@pervasive.com>: > > FWIW, that where clause might be more efficient as > WHERE pontos_0 > pontos_7. Some databases would be able to use indexes > to answer that (not sure if PostgreSQL could), plus it removes an > operator. It also seems to be cleaner code to me. :) > -- Done, thanks. Regards, Clodoaldo Pinto