On Tuesday 09 December 2014 01:58:11 Vincent de Phily wrote:
> On Monday 08 December 2014 10:17:37 Jeff Janes wrote:
> > You can `strace` for the lseek command to see which file handles it is
> > currently working on, and
> > use lsof to turn those into names. You want to look at where it is in the
> > table files, not the index files.
>
> Neat trick, thanks. I'll try it in the morning when I'm fresh. Are the files
> read in C-sorting order ?
I did this (stracing the first 1000 read() and write() every 30s) and kept an
eye on progress throughout the day. It follows a (to me) surprising pattern,
which looks unnecessarily time-consuming :
It reads about 8G of the table (often doing a similar number of writes, but
not always), then starts reading the pkey index and the second index (only 2
indexes on this table), reading both of them fully (some writes as well, but
not as many as for the table), which takes around 8h.
And the cycle apparently repeats: process a few more GB of the table, then go
reprocess both indexes fully. A rough estimate is that it spends ~6x more time
(re)processing the indexes as it does processing the table (looking at data
size alone the ratio would be 41x, but the indexes go faster). I'm probably
lucky to only have two indexes on this table.
Is that the expected behaviour ? Why ? I can imagine that it skips some pages
and needs to go back, but then it should only do this once at the end of the
process, or it should only revisit a handfull of pages. Is that something that
can be improved by throwing more maintenance_work_mem at it ?
--
Vincent de Phily