Re: Controlling complexity in queries - Mailing list pgsql-general
From: Jay Levitt
Subject: Re: Controlling complexity in queries
Msg-id: 4EE7D158.7070109@gmail.com
In response to: Re: Controlling complexity in queries (Merlin Moncure <mmoncure@gmail.com>)
Responses: Re: Controlling complexity in queries
List: pgsql-general
Merlin Moncure wrote:
> Breaking your large queries into functions OTOH can make significant
> changes to the plan, often to the worse.

As an end-user, I think this is an area where PostgreSQL could really stand out (that and the moon launch). In Rails-land, you don't have The DBA who writes queries. You have a developer and an ORM, and when they step outside that ORM to do the cool things SQL can do, all their DRY habits fall apart, because it's 1990 again and you can either write clear code or fast code, but not both.

Postgres has made huge inroads with its optimizer. A few years ago, for web startups picking their relational DB, the performant answer was "MySQL for fast, simple, mostly-read queries, PostgreSQL when you grow up." That's become "Probably Postgres", and with 9.2's faster COUNT(*) (thanks to index-only scans), it's become "Definitely Postgres". But having to write one big query for performance feels exactly like having to write one big C function with unrolled loops. I'm currently taking a well-factored, function-based query and turning it INTO what Robert James is trying to get OUT of: a monolithic query.

> In the end, the performance of your queries is going to be directly
> related to how well you map the problem into relational logic

It's not just that, though; it's quite possible to think relationally and still fall down. There are plenty of cases where the human eye can see that a modular function can be inlined, but the optimizer can't. I have a pathological case: a query against a database with just a few thousand users takes 1.5 seconds on fast hardware, because it ends up scanning a Cartesian product to get 16 rows, even before you get to the nested loops. To be fair, most of the time the optimizer does a great job of inlining all my set-returning functions, once 9.0.6/9.1.2 rolled out.

I can see at least three equally daunting pieces that would have to happen to allow DRY, composable SQL:

1. Optional optimization of non-recursive WITH
2. Optional pushdown of WHERE clauses into GROUP BY [1]
3. LATERAL

AFAIK, none of these are on anyone's short-term to-do list, and I'm sure none are easy.

[1] Since this is my current favorite problem, the pathological case is:

    select questions.id
    from questions
    join (
      select u.id
      from users as u
      group by u.id
    ) as s on s.id = questions.user_id
    where questions.id = 1;

With users.id as a primary key, it's obvious that this can return only one row, but it has to scan the users table to get there. See the "Subquery in a JOIN not getting restricted?" thread on pgsql-performance for Tom's explanation of why that's a hard problem to solve.
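For what it's worth, the rewrite a human does by hand here is just to notice that grouping users by its primary key is a no-op and collapse the derived table into a plain join. A sketch of that hand-inlined version, which is exactly the transformation the planner can't currently derive on its own:

    -- Hand-inlined equivalent of the query above: since users.id is the
    -- primary key, "group by u.id" returns each user once, so the derived
    -- table collapses into an ordinary join.
    select questions.id
    from questions
    join users as u on u.id = questions.user_id
    where questions.id = 1;

This version can walk the primary-key indexes on both tables and touch at most one row in each, instead of scanning users.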
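For contrast, a minimal sketch of the kind of composition the planner already handles well (the set-returning-function inlining mentioned above): a single-SELECT, STABLE SQL function gets expanded into the calling query, so the caller's restrictions still reach the base table. The active_users() function and the deleted_at column are made up for illustration:

    -- Hypothetical example: a simple set-returning SQL function that the
    -- planner can inline into the calling query.
    create function active_users() returns setof users
    language sql stable as $$
      select * from users where deleted_at is null
    $$;

    -- After inlining, this behaves like joining users directly, with the
    -- deleted_at filter and the q.id = 1 restriction applied together.
    select q.id
    from questions q
    join active_users() u on u.id = q.user_id
    where q.id = 1;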