Thread: performance "tests"
Hi,

Having read on this list (some time ago) that inserts can become slow when there are foreign key constraints, I wanted to test it. So I created a DB with 5 tables (T1 -> T5) where, for 0 < i < j < 6, Tj has a foreign key from Ti. More precisely:

create table T1(k1 integer NOT NULL PRIMARY KEY,
                k2 integer, k3 integer, k4 integer, k5 integer);
create table T2(k2 integer NOT NULL PRIMARY KEY,
                k1 integer, k3 integer, k4 integer, k5 integer,
                FOREIGN KEY(k1) REFERENCES T1 ON DELETE CASCADE);
create table T3(k3 integer NOT NULL PRIMARY KEY,
                k1 integer, k2 integer, k4 integer, k5 integer,
                FOREIGN KEY(k1) REFERENCES T1 ON DELETE CASCADE,
                FOREIGN KEY(k2) REFERENCES T2 ON DELETE CASCADE);
create table T4(k4 integer NOT NULL PRIMARY KEY,
                k1 integer, k2 integer, k3 integer, k5 integer,
                FOREIGN KEY(k1) REFERENCES T1 ON DELETE CASCADE,
                FOREIGN KEY(k2) REFERENCES T2 ON DELETE CASCADE,
                FOREIGN KEY(k3) REFERENCES T3 ON DELETE CASCADE);
create table T5(k5 integer NOT NULL PRIMARY KEY,
                k1 integer, k2 integer, k3 integer, k4 integer,
                FOREIGN KEY(k1) REFERENCES T1 ON DELETE CASCADE,
                FOREIGN KEY(k2) REFERENCES T2 ON DELETE CASCADE,
                FOREIGN KEY(k3) REFERENCES T3 ON DELETE CASCADE,
                FOREIGN KEY(k4) REFERENCES T4 ON DELETE CASCADE);

I also wrote a Python script to populate this DB (see below). I certainly don't pretend this test reflects reality, but I wanted to ask for an explanation of one of the things that happens.

I now have 300000 rows in each table. The database fills quite fast, with the postmaster taking something like 25% of the CPU. But at times it seems to hang: it doesn't insert any rows for more than 10 seconds, and during that time the postmaster process is at 0%. Why is that?

I would have expected the postmaster to need more CPU to insert a few rows as the DB fills up, but that's not what happens: when rows are inserted, it happens at more or less the same speed as initially (when the DB was empty), and when rows are inserted more slowly, it's because the postmaster process is using less CPU.
What's strange is that everything else hangs too! Could that be due to the CPU??

Thanks for your help.

Raph.

Some info:

Debian GNU/Linux

cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 6
model       : 6
model name  : AMD Athlon(tm) XP
stepping    : 2
cpu MHz     : 1050.052
cache size  : 256 KB

dpkg -l postgresql
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  postgresql     7.1.3-7        Object-relational SQL database, descended fr

uname -r
2.4.17

cat /etc/debian_version
3.0

Here's the script:

import random
import sys
from pyPgSQL import libpq

dbname = 'test'
random.seed()
cnx = libpq.PQconnectdb('host=localhost user=rb password=linuxxxx dbname=%s' % dbname)

# INSERT FIRST 5 ENTRIES IN EACH TABLE
for i in range(1, 6):       # insert 5 rows
    for j in range(1, 6):   # into the 5 tables
        res = cnx.query('INSERT INTO T' + str(j) + ' VALUES(' + str(i) + ',' + str(i) + ',' + str(i) + ',' + str(i) + ',' + str(i) + ')')

N = [5, 5, 5, 5, 5]  # number of rows in each table
while N[0] < 1000000:
    t = random.randrange(1, 6, 1)  # table in which to insert the next row
    k = random.randrange(1, 6, 1)  # table to which we link the inserted row
    r = [random.randrange(1, N[0], 1), random.randrange(1, N[1], 1),
         random.randrange(1, N[2], 1), random.randrange(1, N[3], 1),
         random.randrange(1, N[4], 1)]
    r[t-1] = N[t-1] + 1  # in table Tt, the field kt is the primary key and has the value N[t]+1
    sqlquery = "insert into T" + str(t) + " (k1,k2,k3,k4,k5) values ('" + str(r[0]) + "','" + str(r[1]) + "','" + str(r[2]) + "','" + str(r[3]) + "','" + str(r[4]) + "')"
    print sqlquery
    res = cnx.query(sqlquery)
    N[t-1] = N[t-1] + 1

del cnx, res
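[Editor's note: a minimal sketch, not part of the original thread, of the kind of comparison the script above is driving at. It uses Python's sqlite3 as a stand-in database (since the 7.1-era PostgreSQL setup isn't reproducible here) and times inserts into a child table with and without a FOREIGN KEY constraint; the table names and row counts are illustrative.]

```python
# Sketch: per-row cost of foreign-key checks, using sqlite3 as a stand-in.
import sqlite3
import time

def build(db, with_fk):
    db.execute("PRAGMA foreign_keys = ON")  # sqlite3 needs this per connection
    db.execute("CREATE TABLE t1(k1 INTEGER NOT NULL PRIMARY KEY)")
    fk = ", FOREIGN KEY(k1) REFERENCES t1 ON DELETE CASCADE" if with_fk else ""
    db.execute("CREATE TABLE t2(k2 INTEGER NOT NULL PRIMARY KEY, k1 INTEGER%s)" % fk)

def time_inserts(with_fk, n=2000):
    """Time n inserts into t2; with_fk toggles the referential check."""
    db = sqlite3.connect(":memory:")
    build(db, with_fk)
    db.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(n)])
    start = time.perf_counter()
    db.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, i) for i in range(n)])
    db.commit()
    return time.perf_counter() - start

print("no FK: %.4fs, with FK: %.4fs" % (time_inserts(False), time_inserts(True)))
```

The difference measured this way reflects only the lookup in the parent table per inserted row, not PostgreSQL-specific trigger overhead, but it isolates the effect the thread is testing.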
On Wed, 2002-04-10 at 14:00, Raphael Bauduin wrote:
...
> dpkg -l postgresql
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
> |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> ||/ Name           Version        Description
> +++-==============-==============-============================================
> ii  postgresql     7.1.3-7        Object-relational SQL database, descended fr

7.2.1 is available in Debian unstable. It might be better to run tests on the latest version.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                                 http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
     "But seek ye first the kingdom of God, and his righteousness;
      and all these things shall be added unto you."  Matthew 6:33
Raphael Bauduin <raphael@be.easynet.net> writes:
> At some times, it seems to hang: it doesn't insert any rows for more
> than 10 seconds. At that time, the postmaster process takes 0%. Why is
> that?

At a guess, you're seeing the syncer daemon flushing a lot of dirty kernel disk buffers out to disk, and thereby monopolizing disk I/O. I haven't experimented too much with Linux, but on HPUX it's not difficult for a sync() call to bring the system to its knees for many seconds, if you've got application programs that have written a whole lot of pages since the last sync.

> What's strange is that everything else hangs also! Would that be due
> to the CPU??

The CPU is free, but everything that wants to access disk is starved for disk bandwidth ...

> ii postgresql 7.1.3-7 Object-relational SQL database, descended fr

If you're going to run tests on foreign-key performance, please use 7.2.*.

			regards, tom lane
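[Editor's note: the stall Tom describes, where buffered writes return instantly and a later sync absorbs all the real I/O time, can be reproduced in miniature. This sketch is not from the thread; the file sizes are arbitrary.]

```python
# Sketch: dirty many kernel buffers cheaply, then pay for them at fsync time.
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
try:
    chunk = b"x" * (1 << 20)             # 1 MiB per write
    t0 = time.perf_counter()
    for _ in range(32):                  # dirty ~32 MiB of kernel page cache
        os.write(fd, chunk)              # returns as soon as data is buffered
    write_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    os.fsync(fd)                         # force the dirty pages to disk
    sync_time = time.perf_counter() - t0

    print("buffered writes: %.3fs, fsync: %.3fs" % (write_time, sync_time))
finally:
    os.close(fd)
    os.remove(path)
```

On a spinning disk of that era, the fsync() line is where the "postmaster at 0% CPU" pauses would show up: the process is simply blocked waiting for the disk.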
From my many years of Informix experience, we noticed that checkpoints during high-activity times could take a long time, because a checkpoint locked the shared memory segment. We found that setting the checkpoint knobs to flush almost constantly was, overall, much better for performance.

Looking in postgresql.conf, it seems that some tweaking of CHECKPOINT_SEGMENTS and CHECKPOINT_TIMEOUT is in order.

I also see some interesting items in the WAL_* configuration parameters, and would look at these as well. Again, in Informix-speak, we were able to control when the buffers flushed to disk, with parameters like:

    start flushing buffers when they are X% full,
    and keep flushing until they are Y% full.

Overall, having TONS of buffers helped benchmark performance, but could have slowed down checkpoints had we not continually flushed to disk.

At 11:37 AM 4/10/02 -0400, Tom Lane wrote:
>Raphael Bauduin <raphael@be.easynet.net> writes:
> > At some times, it seems to hang: it doesn't insert any rows for more
> > than 10 seconds. At that time, the postmaster process takes 0%. Why is
> > that?
>
>At a guess, you're seeing the syncer daemon flushing a lot of dirty
>kernel disk buffers out to disk, and thereby monopolizing disk I/O.
>I haven't experimented too much with Linux, but on HPUX it's not
>difficult for a sync() call to bring the system to its knees for many
>seconds, if you've got application programs that have written a whole
>lot of pages since the last sync.

-- 
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
602-604-3100 ext 242
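[Editor's note: for readers unfamiliar with these knobs, here is a sketch of what the relevant postgresql.conf section looked like in the 7.2 era. The values shown are illustrative, not recommendations, and the exact defaults should be checked against the documentation for the version in use.]

```
# postgresql.conf -- checkpoint/WAL parameters mentioned above
checkpoint_segments = 6     # WAL segments between checkpoints (default 3);
                            # more segments = less frequent checkpoints
checkpoint_timeout = 300    # maximum seconds between forced checkpoints
wal_buffers = 8             # WAL disk-page buffers in shared memory
wal_sync_method = fsync     # how WAL writes are forced out to disk
```

More segments and a longer timeout make checkpoints rarer but bigger; the Informix-style "flush almost constantly" approach corresponds to pushing the knobs the other way.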
Naomi Walker wrote:
> From my many years of Informix knowledge, we noticed that checkpoints,
> during high activity times, did take a long time, because it locked the
> shared memory segment. We found that setting the checkpoint knobs to flush
> almost constantly, overall, was much better for performance.
>
> Looking in postgresql.conf, it seems that some tweaking of :
> CHECKPOINT_SEGMENTS and CHECKPOINT_TIMEOUT are in order.
>
> I also see some interesting items in the WAL_* configuration parameters,
> and would look at these as well. Again, in Informix-speak, we were able to
> control when the buffers flushed to disk, with parameters like:
> Start flushing buffers when they are X% full
> and keep flushing until they are X% full
>
> Overall, having TONS of buffers helped benchmark performance, but could
> have slowed down checkpoints had we not continually flushed to disk.

Actually, we don't lock shared memory like Informix does. As I remember, other backends can write to the WAL while we are doing the checkpoint. In fact, there is code in there that expects the WAL file to grow during checkpointing.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
> Raphael Bauduin <raphael@be.easynet.net> writes:
> > At some times, it seems to hang: it doesn't insert any rows for more
> > than 10 seconds. At that time, the postmaster process takes 0%. Why is
> > that?
>
> At a guess, you're seeing the syncer daemon flushing a lot of dirty
> kernel disk buffers out to disk, and thereby monopolizing disk I/O.
> I haven't experimented too much with Linux, but on HPUX it's not
> difficult for a sync() call to bring the system to its knees for many
> seconds, if you've got application programs that have written a whole
> lot of pages since the last sync.

Some BSDs implement trickle sync for this very reason.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Raphael Bauduin <raphael.bauduin@be.easynet.net> writes:
> I just installed 7.2.1. I run 7.1.3 in parallel. after I run the script
> to introduce 1000 rows in T1 in each version, deleting is much faster
> with 7.1.3 than with 7.2.1 (I took the same configuration parameters for
> both servers...). When running the delete query in parallel, 7.1.3
> finishes really faster.

Uh, what was the test query again? And what does EXPLAIN say about it on each version?

			regards, tom lane
On Thu, Apr 11, 2002 at 10:34:40AM -0400, Tom Lane wrote:
> Raphael Bauduin <raphael.bauduin@be.easynet.net> writes:
> > I just installed 7.2.1. I run 7.1.3 in parallel. after I run the script
> > to introduce 1000 rows in T1 in each version, deleting is much faster
> > with 7.1.3 than with 7.2.1 (I took the same configuration parameters for
> > both servers...). When running the delete query in parallel, 7.1.3
> > finishes really faster.
>
> Uh, what was the test query again? And what does EXPLAIN say about it
> on each version?

I just want to delete all rows in the table T1. As all other tables have foreign keys coming from T1, all rows in all tables are deleted.

For example, from T1:

test=# select * from T1 where k1>50 and k1<55;
 k1 | k2 | k3 | k4 | k5
----+----+----+----+----
 51 |  3 | 42 | 26 |  7
 52 | 15 | 32 | 35 |  2
 53 |  4 |  8 | 11 | 12
 54 |  3 | 33 | 38 | 26
(4 rows)

For each Tj, kj is the primary key. For 0 < i < j < 6, Tj has a foreign key (ki) coming from Ti (ki being the primary key of Ti). So Tj has j-1 foreign keys (T2 has one from T1; T3 has 2, from T1 and T2; ...).

The tables don't all have exactly the same number of rows, but it's more or less the same (rows are inserted randomly into the tables).

From 7.1.3:

test=# select count(*) from T1;
 count
-------
  1000
(1 row)

test=# explain delete from T1;
NOTICE:  QUERY PLAN:

Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=6)

EXPLAIN

From 7.2.1:

test=# select count(*) from T1;
 count
-------
  1000
(1 row)

test=# explain delete from T1;
NOTICE:  QUERY PLAN:

Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=6)

EXPLAIN

So it says exactly the same.
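[Editor's note: the identical plans are no surprise, since EXPLAIN only shows the scan on T1; the work done by the ON DELETE CASCADE constraints happens in the referential-integrity machinery and is invisible to the plan. A sketch, again using sqlite3 as a stand-in for the thread's schema (with a shortened two-level chain), shows the cascade the DELETE triggers:]

```python
# Sketch: DELETE on t1 cascades through the dependent tables,
# even though a query plan would only ever mention t1.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE t1(k1 INTEGER NOT NULL PRIMARY KEY)")
db.execute("""CREATE TABLE t2(k2 INTEGER NOT NULL PRIMARY KEY, k1 INTEGER,
              FOREIGN KEY(k1) REFERENCES t1 ON DELETE CASCADE)""")
db.execute("""CREATE TABLE t3(k3 INTEGER NOT NULL PRIMARY KEY,
              k1 INTEGER, k2 INTEGER,
              FOREIGN KEY(k1) REFERENCES t1 ON DELETE CASCADE,
              FOREIGN KEY(k2) REFERENCES t2 ON DELETE CASCADE)""")
for i in range(1, 6):
    db.execute("INSERT INTO t1 VALUES (?)", (i,))
    db.execute("INSERT INTO t2 VALUES (?, ?)", (i, i))
    db.execute("INSERT INTO t3 VALUES (?, ?, ?)", (i, i, i))

db.execute("DELETE FROM t1")  # cascades through t2 and t3
counts = [db.execute("SELECT count(*) FROM %s" % t).fetchone()[0]
          for t in ("t1", "t2", "t3")]
print(counts)  # -> [0, 0, 0]
```

The per-row trigger work behind that cascade is exactly the cost that differs between the two PostgreSQL versions but never appears in EXPLAIN output.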
> If you're going to run tests on foreign-key performance, please use
> 7.2.*.

I just installed 7.2.1, and I'm running 7.1.3 in parallel. After running the script to insert 1000 rows into T1 in each version, deleting is much faster with 7.1.3 than with 7.2.1 (I used the same configuration parameters for both servers...). When running the delete query on both in parallel, 7.1.3 finishes much faster.

For inserts, 7.2.1 is slightly faster for the first 3000 inserts; from 3000 to 5000 inserts it's more or less equivalent.

I'll dig further into the documentation to see if I forgot something, but if you have any hints, they're welcome :-)

Raph