Re: Massive parallel queue table causes index deterioration, butREINDEX fails with deadlocks. - Mailing list pgsql-performance

From Corey Huinker
Subject Re: Massive parallel queue table causes index deterioration, butREINDEX fails with deadlocks.
Date
Msg-id CADkLM=eMcZkjuj9mcU3BU1XOzt4BnUtP5p+zBRsRdGLUw8N0sw@mail.gmail.com
Whole thread Raw
In response to Re: Massive parallel queue table causes index deterioration, butREINDEX fails with deadlocks.  (Gunther <raj@gusw.net>)
List pgsql-performance


Anyway, I think the partitioned table is the right and brilliant solution, because an index really isn't required. The actual pending partition will always remain quite small, and being a queue, it doesn't even matter how big it might grow, as long as new rows are inserted at the end and not in the middle of the data file and still there be some way of fast skip over the part of the dead rows at the beginning that have already been processed and moved away.

Good thing is, I don't worry about maintenance window.  I have the leisure to simply tear down my design now and make a better design. What's 2 million transactions if I can re-process them at a rate of 80/s? 7 hours max. I am still in development. So, no need to worry about migration / transition acrobatics. So if I take Corey's steps and envision the final result, not worrying about the transition steps, then I understand this:

1. Create the Queue table partitioned on the pending column, this creates the partition with the pending jobs (on which I set the fillfactor kind of low, maybe 50) and the default partition with all the rest. Of course that allows people with a constant transaction volume to also partition on jobId or completionTime and move chunks out to cold archive storage. But that's beside the current point.

I'm guessing there's a fairly insignificant difference in performance between one true partition and one false partition vs one true partition and a default partition, but I don't have insight into which one is better.  


One question I have though: I imagine our pending partition heap file to now be essentially sequentially organized as a queue. New jobs are appended at the end, old jobs are at the beginning. As pending jobs become completed (pending = false) these initial rows will be marked as dead. So, while the number of live rows will remain small in that pending partition, sequential scans will have to skip over the dead rows in the beginning.

That's basically true, but vacuums are erasing deleted rows, and that space gets re-used. So the table builds up to a working-set size, and I envision it looking like a clock sweep, where your existing rows are at 11pm to 7pm, your new rows are inserting into space at 8pm that was vacuumed clean a while ago, and 9pm and 10pm have deleted rows that haven't been vacuumed yet. Where the empty spot is just keeps cycling through the table.

Of course vacuum eliminates dead rows, but unless I do vacuum full, it will not re-pack the live rows, and that requires an exclusive table lock. So, what is the benefit of vacuuming that pending partition? What I don't want is insertion of new jobs to go into open slots at the beginning of the file. I want them to be appended (in Oracle there is an INSERT /*+APPEND*/ hint for that. How does that work in PostgreSQL? 

See above, the db (tries to) reuse the space space before new space is allocated.

I don't know of an append equivalent for pgsql. If memory servers, the big win of /*+ APPEND */ was that raw data blocks were assembled out-of-band and then just written to disk.
 
Ultimately that partition will amass too many dead rows, then what do I do? I don't think that the OS has a way to truncate files physically from the head, does it? I guess it could set the file pointer from the first block to a later block. But I don't know of an IOCTL/FCNTL command for that. On some OS there is a way of making blocks sparse again, is that how PostgreSQL might do it? Just knock out blocks as sparse from the front of the file?

See clock sweep analogy above.
 

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Query slow for new participants
Next
From: Michael Lewis
Date:
Subject: Re: Aggregate and many LEFT JOIN