Re: DBT-3 with SF=20 got failed - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: DBT-3 with SF=20 got failed |
| Date | |
| Msg-id | CA+TgmoZPVzBrbvqf5rxBWGmaGUdGuOO836ccFrjv_gp8yhgq6A@mail.gmail.com |
| In response to | Re: DBT-3 with SF=20 got failed (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
| Responses | Re: DBT-3 with SF=20 got failed |
| List | pgsql-hackers |
On Tue, Sep 8, 2015 at 5:02 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> Also, I'm not sure what other places you have in mind (could you list
> some examples?) but I'd bet we limit the allocation to 1GB because of
> the palloc() limit and not because of fear of over-estimates.

I don't really think those two things are different from each other. The palloc() limit is a means of enforcing a general policy of limiting all allocations to 1GB, except in places where we've made a very conscious decision to allow a specific exception. That limit happens to dovetail nicely with the varlena size limit, so in many cases it is exactly the correct limit for that reason alone. But even when, as here, that's not at issue, it's still a useful limit, because there are many ways for a garbage value to get passed to palloc() -- a bad planner estimate, a corrupted tuple, a bug in some other part of our code. And at least on my old MacBook Pro (I haven't tested the current one), passing a sufficiently large value to malloc() causes a kernel panic. That's probably a particularly bad bug, but there are lots of systems where "accidentally" allocating an unreasonable amount of space has all kinds of unpleasant consequences.

So I believe that palloc()'s limit considerably improves the overall stability of the system, even if it causes occasional annoyance. Most of the time you can just palloc() and not worry much about whether you're going to blow up the machine: you won't, because you aren't going to allocate more than 1GB. Any place that wants to allocate more than that needs to be somewhere we can be pretty sure we're not going to accidentally request some completely unreasonable amount of memory, like, say, 1TB. Nothing in this discussion convinces me that this is such a place.

Note that tuplesort.c and tuplestore.c, the only existing callers of repalloc_huge, allocate such large amounts of memory only when they actually have enough tuples to justify it: the size is always driven by the actual number of tuples, never by an estimate. I think that would be a sound principle here, too. Resizing the hash table to a very large size based on the actual load factor is quite reasonable; starting out at such a large size is much less so.
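For reference, the cap in question sits right at the top of the allocation path. Roughly, condensed from src/include/utils/memutils.h and src/backend/utils/mmgr/mcxt.c (a simplified sketch, not the verbatim source):

```c
/*
 * Simplified sketch of the allocation-size guard, condensed from
 * src/include/utils/memutils.h and src/backend/utils/mmgr/mcxt.c;
 * not the verbatim source.
 */

/* 1GB - 1: the largest request an ordinary palloc() will accept */
#define MaxAllocSize        ((Size) 0x3fffffff)
#define AllocSizeIsValid(size)  ((Size) (size) <= MaxAllocSize)

/* even "huge" requests are bounded, at roughly SIZE_MAX / 2 */
#define MaxAllocHugeSize    ((Size) -1 >> 1)

void *
palloc(Size size)
{
    /*
     * Any garbage value -- a bad planner estimate, a corrupted tuple,
     * a bug elsewhere -- fails here with an error instead of being
     * handed to malloc().
     */
    if (!AllocSizeIsValid(size))
        elog(ERROR, "invalid memory alloc request size %zu", size);

    /* the real function open-codes this for speed */
    return MemoryContextAlloc(CurrentMemoryContext, size);
}
```

Anything larger has to go through the *_huge variants explicitly -- that is the "very conscious decision" escape hatch.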
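To make the "actual tuples, never an estimate" principle concrete, here is a hypothetical sketch in the spirit of grow_memtuples() in tuplesort.c; the names (grow_buckets, HashBucket, ntuples_actual) are invented for illustration and are not from the patch under discussion:

```c
#include "postgres.h"
#include "utils/memutils.h"

/* stand-in for the real bucket type; invented for illustration */
typedef struct HashBucket
{
    void       *head;
} HashBucket;

/*
 * grow_buckets is a hypothetical helper, not code from the patch:
 * enlarge the bucket array in response to the number of tuples actually
 * inserted, never the planner's estimate.  Overflow checks omitted.
 */
static HashBucket *
grow_buckets(HashBucket *buckets, Size *nbuckets, Size ntuples_actual)
{
    while (ntuples_actual > *nbuckets)
    {
        Size        newsize = (*nbuckets * 2) * sizeof(HashBucket);

        /* cross the 1GB line only when real tuples force us to */
        if (newsize <= MaxAllocSize)
            buckets = (HashBucket *) repalloc(buckets, newsize);
        else
            buckets = (HashBucket *) repalloc_huge(buckets, newsize);

        *nbuckets *= 2;
    }
    return buckets;
}
```

Growing this way costs a few extra copies compared with guessing right up front, but the 1GB boundary gets crossed only when the actual tuple count demands it, which is exactly the property tuplesort.c relies on.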
Admittedly, 512MB is an arbitrary point, and if the limit happened to be 256MB or 1GB or 128MB or even 2GB, I wouldn't advocate changing it just for fun. But you're saying we should remove that limit altogether, and I think that's clearly unreasonable. Do you really want to start out with a TB- or even PB-sized hash table when the actual number of tuples is, say, one? That may sound crazy, but I've seen enough bad query plans to know that, yes, we are sometimes off by nine orders of magnitude. This is not a hypothetical problem.

>> More importantly, removing the cap on the allocation size makes the
>> problem a lot worse. You might be sad if a bad planner estimate causes
>> the planner to allocate 1GB when 64MB would have been enough, but on
>> modern systems that is not likely to be an enormous problem. If a
>> similar mis-estimation causes the planner to allocate 16GB rather than
>> 1GB, the opportunity for you to be sad is magnified considerably.
>> Therefore, I don't really see the over-estimation bug fix as being
>> separate from this one.
>
> Perhaps. But if you want to absolutely prevent such sadness then maybe
> you should not set work_mem that high?

I think that's a red herring, for a couple of reasons. One, the allocation for the hash buckets is only a small portion of the total memory. Two, the fact that you are OK with the hash table growing to a certain size does not mean that you want it to start out that big on the strength of a frequently flawed planner estimate.

> Anyway, I do see this as a rather orthogonal problem - an independent
> improvement, mostly unrelated to the bugfix. Even if we decide to
> redesign it like this (and I'm not particularly opposed to that,
> assuming someone takes the time to measure how expensive the additional
> resize actually is), we'll still have to fix the repalloc().
>
> So I still fail to see why we shouldn't apply this fix.

In all seriousness, that is fine. I respect your opinion; I'm just telling you mine, which happens to be different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company