Tom Lane wrote:
> Dirk Riehle <dirk@riehle.org> writes:
>> For one analysis, I created a table of about 15 columns, and then ran an
>> insert with a subselect on the table, where the select was drawing from
>> three other tables, merging over ids. One of the other tables has about
>> 60M rows.
>
>> The insert ran for about 18h before running out of disk space. Before
>> the query, there had been about 1TB free on the disk.
>
>> After the insert query failed, the disk space was not made available
>> again; PostgreSQL did not free it up (or would not free it up quickly).
>
> What that part sounds like is you mistyped the insert such that it
> was inserting a huge number of rows. It's not too hard to do if
> you get the join condition wrong --- what you meant to be a sane
> join can easily turn into a Cartesian product. In theory vacuum
> could reclaim the space eventually, but it'd take awhile.
Hmm... here is the insert:
insert into t select ... from commits c, diffs d, sloc_metrics sm, where
d.commit_id = c.id and sm.diff_id = d.id;
sm has about 46M rows, d has about 60M rows. I don't know enough about
database engines to understand why it wouldn't be able to do the matching
incrementally, but might instead end up with a Cartesian product.
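If it helps, I guess I could check for that before rerunning it by asking the
planner for its row estimate; roughly something like the following, using the
table and column names from the query above, with c.id just standing in for
the real select list:

  -- EXPLAIN only plans the query, it does not execute it or write any rows.
  explain
  select c.id
  from commits c, diffs d, sloc_metrics sm
  where d.commit_id = c.id
    and sm.diff_id = d.id;

  -- If a join condition is missing or compares the wrong columns, the row
  -- estimate on the top join node jumps to roughly 46M * 60M instead of
  -- something on the order of the individual table sizes.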
In any case, I pared down the tables and it ran through quickly without
further problems. So maybe I had a typo in there.
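Regarding the space that wasn't freed: if I understand the vacuum point
correctly, the dead rows from the failed insert sit in the target table t, and
a plain VACUUM would only mark that space as reusable inside the files. So to
actually give the disk space back to the OS, something like this should work
(assuming t holds nothing else I need to keep):

  truncate table t;  -- drops all rows and releases the table's files immediately
  -- or, if t already holds data worth keeping:
  vacuum full t;     -- rewrites the table; needs an exclusive lock and enough
                     -- free space for the new copy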
>> I rebooted soon thereafter.
>
>> During boot, fsck (must have been fsck) gave me tons of freeing up inode
>> messages.
>
> And this part is a filesystem bug; it cannot possibly be Postgres'
> fault that the filesystem got corrupted.
One would think so. But the reboot should have shut PostgreSQL down cleanly.
Thanks for the help!
Dirk
--
Phone: +1 (650) 215 3459
Blog: http://dirkriehle.com
http://twitter.com/dirkriehle