Re: All Taxi Services need Index Clustered Heap Append - Mailing list pgsql-hackers

From David Rowley
Subject Re: All Taxi Services need Index Clustered Heap Append
Date
Msg-id CAKJS1f-Gim0cLBidKTpkGfRr-guWvELNt7JoO7pryCYsY63c7A@mail.gmail.com
Whole thread Raw
In response to Re: All Taxi Services need Index Clustered Heap Append  (Ants Aasma <ants.aasma@eesti.ee>)
List pgsql-hackers
On 4 March 2018 at 23:05, Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Sat, Mar 3, 2018 at 4:53 PM, David Rowley
>> It's a good job someone invented HASH partitioning then.
>>
>> It would be interesting to hear how your benchmarks go using current
>> master + the faster partition pruning patchset [1].  Currently, HASH
>> partitioning does exist in master, just there's no partition pruning
>> for the non-matching partitions, which is why you need [1].
>>
>> I think trying with something like 500-1000 partitions might be a good
>> place to start.
>
> I don't think that will actually help much. 1000 partitions means each
> partition gets data from ~50 vehicles. A 60 tuples per page each page
> in the partitioned able will contain on average 1.2 interesting
> tuples. So you still have almost one page read per row.

hmm, I missed that part about only 60 tuples per page.

It may be worth an experiment with two table, one to hold the day's
worth of data, and a holding table which stores about 1-2 minutes of
data. Each minute or two the holding table could be flushed like:

WITH del AS (DELETE FROM driver_pos_holding RETURNING *)
INSERT INTO driver_pos SELECT * FROM del ORDER BY id,ts;

then perhaps a manual VACUUM of driver_pos_holding... or leave it up
to auto-vacuum...

both tables could be inherited by a single parent to allow queries to
return all rows, or be wrapped up in a UNION ALL view, although an
inherited table should provide better plans than the view in some
cases. Although using an inherited parent would disallow you to use
partitioning if you ever wanted to partition by ts to make the job of
removing old data easier.

Hopefully having 60-120 seconds of driver data will in the holding
table will mean that the tuples for each driver only span 2-3 pages
for that 1-2 minute period in the main table You might then have not
much more than 240 pages to load for a driver after a 4-hour run.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: 2018-03 Commitfest Summary (Andres #1)
Next
From: Fabien COELHO
Date:
Subject: Re: 2018-03 Commitfest Summary (Andres #1)