Thanks everyone. Sorry for the late reply.
Do you have indexes on all the referencing columns?
I had thought so, but it turns out no, and this appears to be the main cause of the slowness. After adding a couple of extra indexes in the bigger tables, things are going much more smoothly.
write the whole thing into a new SQL schema
This is a really interesting approach I hadn't thought of! We can currently afford a little bit of downtime, but it's helpful to keep this in mind if we ever do this kind of thing again in future.
The two changes we've made are:
- Add a few indexes so that the cascades operate more efficiently
- Move some of the tables (whose ID values don't matter so much to our app) into a separate migration, which can be run before we take down the site. Then only the tables whose IDs matter to the app/user are done while the site is down.
With those changes it looks like we can fit the downtime into the window we have. Thanks for all the advice, much appreciated!