Re: On Scalability - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: On Scalability |
Date | |
Msg-id | AANLkTinL2Y7-HdfmQzKF-+XxwFV=c4OUVRhJY1KNuNQs@mail.gmail.com Whole thread Raw |
In response to | Re: On Scalability (Vincenzo Romano <vincenzo.romano@notorand.it>) |
List | pgsql-hackers |
On Fri, Jul 30, 2010 at 3:50 PM, Vincenzo Romano <vincenzo.romano@notorand.it> wrote: > 2010/7/30 Josh Berkus <josh@agliodbs.com>: >> >>> Is there anyone who knows whether those algorithms are linear or not? >> >> Read the code? It's really very accessible, and there's lots and lots >> of comments. While the person who wrote the code is around, isn't it >> better to see the real implementation? > > If the programmer(s) who wrote that part is around, a simple hint would suffice. > Even an hint to where look into the code would be very appreciated: the query > planner is not as simple as the "ls" command (which is not that simple any > more, though). > > It looks like I need to go the hard way ... > Starting from postgresql-8.4.4/src/backend/optimizer I think you're approaching this in the wrong way. You've repeatedly said you don't want to do all the work of setting up a test, but trying to search the code for algorithms that might not be linear is not going to be easier. I've been reading this thread and I'm fairly familiar with this code, and I even understand the algorithms pretty well, and I don't know whether they're going to be linear for what you want to do or not. Certainly, the overall task of join planning is exponential-time in the number of *tables*, but if you're just doing SELECT statements on a single table, will that be linear? Tough to say. Certainly, there are a lot of linked lists in there, so if we have any place where we have two nested loops over the list of indices, it won't be linear. I can't think of a place where we do that, but that doesn't mean there isn't one. And even if they are linear, or n log n or something, the coefficients could still be lousy. Theoretical computer science is one of my favorite subjects, but, it doesn't always tell you what you want to know about the real world. It doesn't seem like it should be very hard to figure this out empirically. Just create a big database full of random data. Maybe you could populate one of the columns with something like (random() * 1000)::int. Then you could create partial indices ON (some_other_column) WHERE that_column = <blat> for <blat> in 0..999. Then you could run some test queries and see how you make out. Or, I mean, you can read the source code. That's fine, too. It's just... I've already read the source code quite a few times, and I still don't know the answer. Admittedly, I wasn't trying to answer this specific question, but still - I don't think it's an easy question to answer that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
pgsql-hackers by date: