Thread: File Fragmentation
Hi,

I have a PostgreSQL database with 50 tables. Every minute, a batch job sequentially loads 10,000 rows of 250 bytes into each table with a COPY. After a day, I have a database with 50 tables, each holding 1,440 sets of 10,000 rows. The tables are cleanly and naturally clustered by the insert timestamp, but each table's data file gains 1,440 fragments per day.

There is also a partition rotation script that drops old tables when a size limit is reached. Suppose this script runs and drops only one table holding a few days of data, then recreates a new empty one. The disk free space becomes very fragmented, since it is the space that was used by the rotated table. Then the COPY jobs insert new data: the surviving tables keep appending to their already fragmented files, and the new table starts growing from the beginning of that free space and ends up more fragmented than ever. The rows are still clustered inside each table, but after a few more rotations all the tables are heavily fragmented, and even though the data is clustered inside the files, the file fragments are spread all over the drive.

After a few days I see IO wait grow and grow, even though the size of the database stabilises thanks to the rotation script. I suspect the heavily fragmented files are the cause of the growing IO wait (PostgreSQL on Windows). How can I cope with that? It seems I cannot pre-grow a file in PostgreSQL.

I found a trick: if I create an empty table, insert dummy data, then insert the good data, then delete the dummy data and vacuum the table (but not a VACUUM FULL), I get a large file with free space at the beginning of the file. If all the files were created with that trick and made larger than the maximum amount of data COPYed into them, I would have no fragmented files.

JG
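For concreteness, the pre-grow trick described above might look roughly like the sketch below. The table definition, the NULL-timestamp sentinel for dummy rows, the file path and the row count (one day of batches, i.e. 1,440 x 10,000 rows) are illustrative assumptions, not details from the actual setup.

-- Minimal sketch of the pre-grow trick; names and sizes are assumptions.
CREATE TABLE samples_1 (
    ts      timestamptz,   -- real rows carry a timestamp
    payload text
);

-- 1. Pre-grow the data file with dummy rows (ts stays NULL) so the
--    filesystem allocates the space in one mostly contiguous chunk.
INSERT INTO samples_1 (payload)
SELECT repeat('x', 250)
FROM generate_series(1, 14400000);   -- roughly one day: 1,440 batches x 10,000 rows

-- 2. Load the real data as usual.
COPY samples_1 (ts, payload) FROM '/path/to/batch.csv' WITH (FORMAT csv);

-- 3. Remove the dummy rows and make their space reusable. A plain VACUUM
--    (not VACUUM FULL) only truncates empty pages at the end of the file,
--    so the freed space at the beginning is kept and reused by later COPYs.
DELETE FROM samples_1 WHERE ts IS NULL;
VACUUM samples_1;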
On Wed, Mar 20, 2013 at 7:13 AM, jg <jg@rilk.com> wrote:
> Now, there is a partition rotation script that drops old tables when a size limit is reached.
> Suppose this script runs and drops only one table holding a few days of data, then recreates a new empty one.
It sounds like you are using partitioned tables. Your partitions should be divided up so that they help optimize your queries, that is, minimize the number of partitions you need to scan for any given query.
That said, try to make it so that this cleanup script purges whole partitions rather than deleting individual rows. That way new data will fill in space without fragmentation.
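As a rough illustration of what purging a whole partition means here (the partition names are made up for the example, not taken from the thread):

-- Illustrative only: rotate by dropping a whole child table, so its data
-- file disappears in one step instead of leaving half-empty pages behind
-- inside a surviving file.
DROP TABLE IF EXISTS samples_1_20130313;

-- Recreate an empty partition to receive the next batches.
CREATE TABLE samples_1_20130321 (ts timestamptz, payload text);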
Hi,

> It sounds like you are using partitioned tables. Your partitions should be
> divided up so that they help optimize your queries, that is, minimize the
> number of partitions you need to scan for any given query.
>
> That said, try to make it so that this cleanup script purges whole
> partitions rather than deleting individual rows. That way new data will
> fill in space without fragmentation.

The rotation script, as explained, just drops tables and creates empty ones. There are only COPY and SELECT in this database. The problem seems to be that the IO pattern creates highly fragmented files: I have files gaining 1,440 fragments a day.

JG
On Wed, Mar 20, 2013 at 9:53 AM, jg <jg@rilk.com> wrote:
> The rotation script, as explained, just drops tables and creates empty ones.
That doesn't make sense then, to have fragmentation if you are creating new tables with fresh data copied into them. The files should be pretty much sequentially written.
Ohhhh.... I see. You're using Windows. Maybe you need some OS with a better file system that doesn't fragment like that?
On 03/20/2013 07:14 AM, Vick Khera wrote:
> On Wed, Mar 20, 2013 at 9:53 AM, jg <jg@rilk.com> wrote:
>
>> The rotation script, as explained, just drops tables and creates
>> empty ones.
>
> That doesn't make sense then, to have fragmentation if you are creating
> new tables with fresh data copied into them. The files should be pretty
> much sequentially written.

I think the problem is here:

"Every minute, a batch job sequentially loads 10,000 rows of 250 bytes into each table with a COPY. After a day, I have a database with 50 tables, each holding 1,440 sets of 10,000 rows."

So if I am reading it right, the table starts with 10,000 rows and then another 10,000 rows are added to it each minute during the day.

> Ohhhh.... I see. You're using Windows. Maybe you need some OS with a
> better file system that doesn't fragment like that?

--
Adrian Klaver
adrian.klaver@gmail.com
Hi,

> That doesn't make sense then, to have fragmentation if you are creating new
> tables with fresh data copied into them. The files should be pretty much
> sequentially written.
>
> Ohhhh.... I see. You're using Windows. Maybe you need some OS with a better
> file system that doesn't fragment like that?

I know an OS other than Windows would behave better. But I think that on any OS I would run into some trouble because of the IO pattern: each minute, each of the 50 tables has 10,000 new rows COPYed into it. Normally the filesystem tries to keep the file underlying a table contiguous, with few fragments, but with this pattern that seems really difficult to achieve.

Do you have any idea how to mitigate the problem on Windows?

JG
jg, 20.03.2013 12:13:
> I suspect the heavily fragmented files are the cause of the growing IO
> wait (PostgreSQL on Windows).
>
> How can I cope with that?

I would first verify that it is *really* the fragmentation. As a database does a lot of random IO anyway, fragmentation isn't such a big issue. You could use e.g. contig [1] from SysInternals to de-fragment the data files and then check whether that really improves performance.

Thomas

[1] http://technet.microsoft.com/de-de/sysinternals/bb897428
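For reference, such a check might look roughly like the following from a Windows command prompt; the data-directory path and the relfilenode number are placeholders, not values from the thread.

rem Analyze how fragmented one table's data file currently is.
contig -a "C:\pgdata\base\16384\24586"

rem De-fragment that file in place (safest with the PostgreSQL service
rem stopped or idle), then compare IO wait before and after.
contig -v "C:\pgdata\base\16384\24586"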
Hi,

I created a test case on Linux:

postgres=# create table a (v int);
postgres=# create table b (v int);

Then I ran a while(true) loop over the following script, where 24577 and 24580 are the data files of tables a and b:

#!/bin/sh
psql test -c 'insert into a select generate_series(1,100000,1);'
psql test -c 'insert into b select generate_series(1,100000,1);'
psql test -c 'checkpoint;'
/usr/sbin/filefrag -v 24577 24580
ls -lh 24577 24580

After a few minutes, I had about 100 extents per file.

So the file fragmentation happens on Linux as well as on Windows.

I am not sure that the IO wait on Windows is related to the file fragmentation; I am trying to find a way to analyse the situation.

JG
Hi,

After 30 minutes on my Linux computer, with the 2 files being filled one after the other, I got fragmented files with many backward steps:

# /usr/sbin/filefrag -v 24586
Filesystem type is: ef53
File size of 24586 is 822231040 (200740 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0   2263040              2048
   1     2048   2271232   2265087   2048
   2     4096   2277376   2273279   2048
   3     6144   2289664   2279423   2048
   4     8192   2306048   2291711    658
   5     8850   2306707   2306705   1390
   6    10240   2316288   2308096   2048
   7    12288   2308097   2318335    102
   8    12390   2328576   2308198   1946
   9    14336   2336768   2330521   2048
  10    16384   2347008   2338815   4096
  11    20480   2357248   2351103   2048
  12    22528   2385920   2359295   4096
  13    26624   2416640   2390015   2048
  14    28672   2424832   2418687   4096
  15    32768   2439168   2428927   2048
  16    34816   2582528   2441215   2048
  17    36864   2940928   2584575   2048
  18    38912   3045376   2942975   2048
  19    40960   1845248   3047423   2048
  20    43008   1910784   1847295   2048
  21    45056   2017280   1912831   2048
  22    47104   2029568   2019327   2048
  23    49152   2146304   2031615   2048
  24    51200   2213888   2148351   2048
  25    53248   3096576   2215935   2048
  26    55296     40960   3098623   2048
  27    57344     90112     43007   2048
  28    59392    124928     92159   2048
  29    61440    102400    126975   2048
  30    63488    161792    104447   2048
  31    65536    164609    163839    680
  32    66216    243712    165288   1368
  33    67584    307200    245079   2048
  34    69632    372736    309247   2048
  35    71680    448512    374783   2048
  36    73728    495616    450559   2048
  37    75776    577536    497663   2048
  38    77824    649216    579583   2048
  39    79872    724992    651263   2048
  40    81920    757760    727039   2048
  41    83968    849920    759807   2048
  42    86016    909312    851967   2048
  43    88064    929792    911359   2048
  44    90112    972800    931839   2048
  45    92160    968704    974847    604
  46    92764   1040384    969307   1444
  47    94208   1081344   1041827   2048
  48    96256   1134592   1083391   2048
  49    98304   1171456   1136639   2048
  50   100352   1165312   1173503   2048
  51   102400   1202176   1167359   2048
  52   104448   1234944   1204223   2048
  53   106496   1267712   1236991   2048
  54   108544   1298432   1269759   2048
  55   110592   1325056   1300479   2048
  56   112640   1372160   1327103   2048
  57   114688   1384448   1374207   2048
  58   116736   1433600   1386495   2048
  59   118784   1452032   1435647   2048
  60   120832   1499136   1454079   2048
  61   122880   1529856   1501183   2048
  62   124928   1560576   1531903   2048
  63   126976   1687552   1562623   2048
  64   129024   2125824   1689599   2048
  65   131072    534560   2127871   2048
  66   133120    544800    536607   2048
  67   135168   1056800    546847   2048
  68   137216   2629789   1058847   6144
  69   143360   2867200   2635932   2048
  70   145408   2887680   2869247   2048
  71   147456     75776   2889727   2048
  72   149504   2990080     77823   2048
  73   151552   3014656   2992127   2048
  74   153600   3094528   3016703   2048
  75   155648   3117056   3096575   2048
  76   157696     63488   3119103   2048
  77   159744    190464     65535   2048
  78   161792    215040    192511   2048
  79   163840    284672    217087   2048
  80   165888    378880    286719   2048
  81   167936    419840    380927   2048
  82   169984    432128    421887   2048
  83   172032    501760    434175   2048
  84   174080    598016    503807   2048
  85   176128    659456    600063   2048
  86   178176    700416    661503   2048
  87   180224    772096    702463   2048
  88   182272    829440    774143   2048
  89   184320    864256    831487   2048
  90   186368    903168    866303   2048
  91   188416   1030144    905215   2048
  92   190464   1255424   1032191   2048
  93   192512   1431552   1257471   2048
  94   194560   1542144   1433599   2048
  95   196608   1732608   1544191   2048
  96   198656   1740800   1734655   2048
  97   200704   1787904   1742847     36  eof
24586: 98 extents found

You can see 3 backward steps, at extents 26, 71 and 76. Just imagine 50 files like this over a few days: the files become heavily fragmented with many backward steps, so a scan that is sequential from PostgreSQL's point of view is in fact random from the filesystem's point of view.
JG

--
Regards,
Jean-Gérard Pailloncy
On 2013-03-20, jg <jg@rilk.com> wrote:
> Hi,
>
> I have a PostgreSQL database with 50 tables. Every minute, a batch job
> sequentially loads 10,000 rows of 250 bytes into each table with a COPY.
[...]
> I found a trick: if I create an empty table, insert dummy data, then
> insert the good data, then delete the dummy data and vacuum the table
> (but not a VACUUM FULL), I get a large file with free space at the
> beginning of the file. If all the files were created with that trick and
> made larger than the maximum amount of data COPYed into them, I would
> have no fragmented files.

I assume you're using generate_series to produce the dummy data in a single insert.

It might be faster to insert the dummy data through one connection, then insert the first good data through a second connection, then roll back the insert of the dummy data and vacuum the table.

--
⚂⚃ 100% natural
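A minimal sketch of that two-connection variant, assuming the same illustrative table and row count as in the earlier sketch (not actual details from the thread):

-- Session 1: reserve space with dummy rows, but leave the transaction open.
BEGIN;
INSERT INTO samples_1 (payload)
SELECT repeat('x', 250)
FROM generate_series(1, 14400000);
-- do not commit yet

-- Session 2: load the real data; the uncommitted dummy pages are full,
-- so the new rows are appended after them.
COPY samples_1 (ts, payload) FROM '/path/to/batch.csv' WITH (FORMAT csv);

-- Session 1: discard the dummy rows without a DELETE pass.
ROLLBACK;

-- Finally, mark the dead dummy tuples as reusable. A plain VACUUM only
-- truncates empty pages at the end of the file, so the pre-grown space at
-- the beginning stays allocated and is reused by later loads.
VACUUM samples_1;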