Re: Why we lost Uber as a user - Mailing list pgsql-hackers

From Alfred Perlstein
Subject Re: Why we lost Uber as a user
Date
Msg-id 20e8f6de-f4db-e96a-3c54-71ace3c8dbbc@freebsd.org
In response to Re: Why we lost Uber as a user  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-hackers

On 8/2/16 10:02 PM, Mark Kirkwood wrote:
> On 03/08/16 02:27, Robert Haas wrote:
>>
>> Personally, I think that incremental surgery on our current heap
>> format to try to fix this is not going to get very far.  If you look
>> at the history of this, 8.3 was a huge release for timely cleanup of
>> dead tuples.  There was also significant progress in 8.4 as a result of
>> 5da9da71c44f27ba48fdad08ef263bf70e43e689.   As far as I can recall, we
>> then made no progress at all in 9.0 - 9.4.  We made a very small
>> improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but
>> that's pretty niche.  In 9.6, we have "snapshot too old", which I'd
>> argue is potentially a large improvement, but it was big and invasive
>> and will no doubt pose code maintenance hazards in the years to come;
>> also, many people won't be able to use it or won't realize that they
>> should use it.  I think it is likely that further incremental
>> improvements here will be quite hard to find, and the amount of effort
>> will be large relative to the amount of benefit.  I think we need a
>> new storage format where the bloat is cleanly separated from the data
>> rather than intermingled with it; every other major RDBMS works that
>> way.  Perhaps this is a case of "the grass is greener on the other
>> side of the fence", but I don't think so.
>>
> Yeah, I think this is a good summary of the state of play.
>
> The only other new db development to use a non-overwriting design like
> ours that I know of was Jim Starkey's Falcon engine for (ironically)
> MySQL 6.0. Not sure if anyone is still progressing that at all now.
>
> I do wonder if Uber could have successfully tamed dead tuple bloat 
> with aggressive per-table autovacuum settings (and if in fact they 
> tried), but as I think Robert said earlier, it is pretty easy to come 
> up with an update-heavy (or insert + delete) workload that makes for a
> pretty ugly bloat component even with really aggressive autovacuuming.
I also wonder whether they considered a "star schema", which to my
understanding would mean replacing the single table that has multiple
indexes with several tables, to work around the write amplification
problem in PostgreSQL.
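
To make the write amplification point concrete, here is a rough sketch. It is a toy model, not a measurement of PostgreSQL. It assumes that an update which cannot use HOT writes one new heap tuple plus one new entry in every index on the table, and that after the split an update touches only one of the narrower tables. The function name and the index counts are made up for illustration.

```python
# Toy model of per-update write amplification; all numbers are hypothetical.

def writes_per_update(n_indexes: int, hot: bool = False) -> int:
    """Count tuple-level writes caused by updating one row.

    Assumption: a non-HOT update writes one new heap tuple and one new
    entry in each index; a HOT update writes only the heap tuple.
    """
    heap_writes = 1
    index_writes = 0 if hot else n_indexes
    return heap_writes + index_writes


# Hypothetical layouts: one wide table carrying 8 indexes, versus the
# same columns split so that the updated table carries only 2 of them.
wide_table = writes_per_update(8)
split_table = writes_per_update(2)

print("wide table: ", wide_table, "writes per update")
print("split table:", split_table, "writes per update")
```

The model ignores WAL volume, page-level effects and the extra joins the split layout would need at read time, so it only shows the direction of the effect, not its size.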

>
> Cheers
>
> Mark
>
>
>



