Thread: How to estimate size of a row and therefore how much progress this query has made
How to estimate size of a row and therefore how much progress this query has made
From
Greg Stark
Date:
I'm running a long running query; it's been running since this morning. It's inserting data into a table. I'm trying to estimate how many rows it has inserted so far and how much longer it'll run. For one thing I'm running out of space on this disk... How do I measure the amount of space a record takes in a table? The table structure it's inserting into looks like this: Column | Type | Modifiers -------------------+------------------------+----------- str | character varying(100) | foo_id | integer | bar_id | integer | dist | double precision | geom2 | box | The last two columns are actually being populated with nulls. The first three columns are being populated with data. The string in the first column is actually always 12 characters. Incidentally, if I set the field to a maximum of 12 will it save space? (It doesn't in Oracle for example.) And are there any table parameters that allow postgres to pack data more densely or avoid problems if I fill in the null fields later with updates? So far it's inserted quite a bit of data: -rw------- 1 postgres postgres 1.0G Feb 1 13:35 /var/lib/postgres/data/base/992336/7992140 -rw------- 1 postgres postgres 1.0G Feb 1 14:51 /var/lib/postgres/data/base/992336/7992140.1 -rw------- 1 postgres postgres 1.0G Feb 1 16:14 /var/lib/postgres/data/base/992336/7992140.2 -rw------- 1 postgres postgres 1.0G Feb 1 17:27 /var/lib/postgres/data/base/992336/7992140.3 -rw------- 1 postgres postgres 1.0G Feb 1 18:45 /var/lib/postgres/data/base/992336/7992140.4 -rw------- 1 postgres postgres 1.0G Feb 1 20:23 /var/lib/postgres/data/base/992336/7992140.5 -rw------- 1 postgres postgres 1.0G Feb 1 21:34 /var/lib/postgres/data/base/992336/7992140.6 -rw------- 1 postgres postgres 1.0G Feb 1 22:47 /var/lib/postgres/data/base/992336/7992140.7 -rw------- 1 postgres postgres 577M Feb 1 23:30 /var/lib/postgres/data/base/992336/7992140.8 I think it'll insert a total of 107,535,604 records. What I'm trying to figure out is, a) How much space will I need to make sure I have for this table and b) will it finish within the next 24 hours or should I just give up and rethink my plan. I guess I have other questions like are there any tips on making inserting lots of records fast. (Does turning off fsync help on big inserts or only when you have lots of small queries?) I'm also going to have to make at least one index on this table. I'm dreading that. How do I estimate how much space the index will take up? -- greg
Re: How to estimate size of a row and therefore how much progress this query has made
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > How do I measure the amount of space a record takes in a table? > The table structure it's inserting into looks like this: > Column | Type | Modifiers > -------------------+------------------------+----------- > str | character varying(100) | > foo_id | integer | > bar_id | integer | > dist | double precision | > geom2 | box | > The last two columns are actually being populated with nulls. The first three > columns are being populated with data. The string in the first column is > actually always 12 characters. Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or something older? Let's assume older --- 24+32+4 bytes per row, plus the null bitmap and alignment padding; say 64 bytes per row. > So far it's inserted quite a bit of data: [8.5 Gig] > I think it'll insert a total of 107,535,604 records. Should be finishing pretty darn soon; in fact, I'd venture you already had a good bit of data in the table. You don't have much over 6 gig accounted for here. > I'm also going to have to make at least one index on this table. I'm dreading > that. How do I estimate how much space the index will take up? As above, except the header overhead is much less (I think 8+4 bytes per row). regards, tom lane
Re: How to estimate size of a row and therefore how much progress this query has made
From
Greg Stark
Date:
Thanks a lot for your help. Tom Lane <tgl@sss.pgh.pa.us> writes: > > The last two columns are actually being populated with nulls. The first three > > columns are being populated with data. The string in the first column is > > actually always 12 characters. > > Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or > something older? Let's assume older --- 24+32+4 bytes per row, plus > the null bitmap and alignment padding; say 64 bytes per row. It's 7.3 > > So far it's inserted quite a bit of data: [8.5 Gig] > > > I think it'll insert a total of 107,535,604 records. > > Should be finishing pretty darn soon; in fact, I'd venture you already > had a good bit of data in the table. You don't have much over 6 gig > accounted for here. Nope, the table was empty previously. I created it just before the insert. Hm, I guess I have a bit of a puzzle. Do OIDs consume space too? This is a INSERT INTO foo () (select ..) style query so the records get OIDs no? I probably should get in the habit of creating all my tables without OIDs since I don't use them. I haven't thought about them previously. -- greg
Re: How to estimate size of a row and therefore how much progress this query has made
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: >> Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or >> something older? Let's assume older --- 24+32+4 bytes per row, plus >> the null bitmap and alignment padding; say 64 bytes per row. > It's 7.3 In that case it should be 4 or so bytes less per row. How sure are you about your estimated row count? You are way over what I figure the table size ought to be. > Do OIDs consume space too? In 7.3, yes. (Before that, you paid for the space whether you wanted 'em or not.) You could shave another 4 bytes off if you created the table without OIDs. regards, tom lane
Re: How to estimate size of a row and therefore how much progress this query has made
From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > > So far it's inserted quite a bit of data: [8.5 Gig] > > > I think it'll insert a total of 107,535,604 records. Sorry, I found the problem. I was overflowing Emacs's integer datatype doing my calculations for how many rows to expect. Postgres's analyze had the correct answer. It's actually going to insert more than that, 375,971,060 records. So that makes it 24G of data which won't fit on the partition. I'll have to look for another way to do this. Thanks again for your help. -- greg