Thread: Solving sudoku using SQL
Hi,

I found an interesting article which explains how to solve "sudoku" using SQL (unfortunately written in Japanese):
http://codezine.jp/article/detail/1629?p=2

The example sudoku in the article is:
http://static.shoeisha.jp/cz/static/images/article/1629/problem4.gif

To solve the example you create the following table:

CREATE TABLE nums (n INT NOT NULL PRIMARY KEY);
INSERT INTO nums VALUES (1);
INSERT INTO nums VALUES (2);
INSERT INTO nums VALUES (3);
INSERT INTO nums VALUES (4);
INSERT INTO nums VALUES (5);
INSERT INTO nums VALUES (6);
INSERT INTO nums VALUES (7);
INSERT INTO nums VALUES (8);
INSERT INTO nums VALUES (9);

Then execute the huge SELECT:
http://codezine.jp/static/images/article/1629/html/sql.html

On that page the first SELECT takes effectively infinite time on PostgreSQL 9.0.1. The next one executes very quickly because the join order is explicitly specified by CROSS JOIN syntax. This seems to be a limitation of the PostgreSQL optimizer and I would like it to be removed. Comments?

BTW, according to the article, Oracle executes the first SELECT (the one PostgreSQL takes effectively infinite time on) in 10 seconds. It seems Oracle's optimizer is superior in this regard.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
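For readers who cannot read the article: the two SELECTs it presents differ only in join syntax. A minimal sketch with just three of the 81 "cell" relations (the aliases and constraints below are illustrative only, not the article's actual query):

-- Flat form: relations listed with commas, so the planner has to search
-- the join-order space itself (with 81 relations it falls back to GEQO,
-- as discussed later in this thread).
SELECT c11.n, c12.n, c13.n
  FROM nums c11, nums c12, nums c13
 WHERE c11.n <> c12.n
   AND c11.n <> c13.n
   AND c12.n <> c13.n;

-- CROSS JOIN form: once the chain is longer than join_collapse_limit,
-- the planner keeps the order in which the joins are written.
SELECT c11.n, c12.n, c13.n
  FROM nums c11
 CROSS JOIN nums c12
 CROSS JOIN nums c13
 WHERE c11.n <> c12.n
   AND c11.n <> c13.n
   AND c12.n <> c13.n;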
On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Then execute the huge SELECT:
> http://codezine.jp/static/images/article/1629/html/sql.html
>
> On that page the first SELECT takes effectively infinite time on
> PostgreSQL 9.0.1.

benchmark what you've got against this (ported to postgres by marcin mank):
http://www.pastie.org/684163

merlin
On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> On that page the first SELECT takes effectively infinite time on
> PostgreSQL 9.0.1. The next one executes very quickly because the join
> order is explicitly specified by CROSS JOIN syntax. This seems to be a
> limitation of the PostgreSQL optimizer and I would like it to be
> removed. Comments?

It's not easy to make the optimizer degrade gracefully when confronted with a very large number of tables, but I agree it would be worthwhile if we could figure out how to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> On that page the first SELECT takes effectively infinite time on
>> PostgreSQL 9.0.1. The next one executes very quickly because the join
>> order is explicitly specified by CROSS JOIN syntax. This seems to be a
>> limitation of the PostgreSQL optimizer and I would like it to be
>> removed. Comments?

> It's not easy to make the optimizer degrade gracefully when confronted
> with a very large number of tables, but I agree it would be worthwhile
> if we could figure out how to do it.

There is something funny going on there; it's not just that the planner is slower with a large flat search space. It is slower, but only maybe 5x or so. What I'm seeing is that it actually finds a much worse plan (very much larger estimated cost as well as actual runtime) when given the flat problem. That seems like a bug: a constrained search ought never find a better solution than an unconstrained search.

It may be that the search heuristics in joinrels.c are failing. If so, it may be impractical to fix, ie making this better would slow down more-typical planning problems enormously. But it'd be nice to understand exactly where it's going off the rails.

			regards, tom lane
I wrote:
> There is something funny going on there; it's not just that the planner
> is slower with a large flat search space. It is slower, but only maybe
> 5x or so. What I'm seeing is that it actually finds a much worse plan
> (very much larger estimated cost as well as actual runtime) when given
> the flat problem. That seems like a bug: a constrained search ought
> never find a better solution than an unconstrained search.

Oh, wait: the problem of course is that it's switching into GEQO mode and hence *not* doing a complete search. Doh. If you turn GEQO off then planning takes ~ forever with the flat version of the query.

We could "fix" that by forcibly breaking up the search problem in the same fashion that join_collapse_limit does, but I'm sure we'd get complaints about that approach. The real fix in my mind is to replace GEQO search with something smarter. I wonder what happened to the SA patch that was reported on at PGCon.

			regards, tom lane
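For anyone reproducing this at home, the planner behaviour discussed above is governed by a handful of settings (the defaults shown in the comments are believed correct for 9.0, but check your own installation):

-- GEQO takes over once a flat FROM list reaches geqo_threshold relations,
-- so the 81-way comma-join is planned heuristically, not exhaustively.
SHOW geqo;                 -- on
SHOW geqo_threshold;       -- 12

-- Turning it off forces an exhaustive search, which is what makes the
-- flat query take "~ forever" to plan.
SET geqo = off;

-- The CROSS JOIN version stays fast because explicit join chains longer
-- than join_collapse_limit are planned in the order they are written.
SHOW join_collapse_limit;  -- 8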
On 08/12/10 18:45, Tom Lane wrote:
> The real fix in my mind is to replace GEQO search with something
> smarter. I wonder what happened to the SA patch that was reported
> on at PGCon.

I got distracted with other things :( I'll try to plan the two queries with SA and see what the results are. If they're good it'll certainly raise my motivation to finish up the module and propose it.

Cheers,
Jan

PS: https://github.com/wulczer/saio, although it's still ugly as hell :(
On 08/12/10 19:02, Jan Urbański wrote:
> On 08/12/10 18:45, Tom Lane wrote:
>> The real fix in my mind is to replace GEQO search with something
>> smarter. I wonder what happened to the SA patch that was reported
>> on at PGCon.
>
> I got distracted with other things :( I'll try to plan the two queries
> with SA and see what the results are. If they're good it'll certainly
> raise my motivation to finish up the module and propose it.

I'm pleasantly surprised: with the SA code as it stands today, setting the equilibrium factor to 8 and the temperature reduction factor to 0.4, the query takes 1799.662 ms in total. With the default values it runs forever, but I long ago discovered that the defaults taken from the original paper are not well suited to my PG implementation (I could plug my MSc thesis here, but I'm way too shy for that). 8/0.4 are the values where I got better results than GEQO for Andres' monster-query.

Maybe it actually has some value after all... Let's see if I can untangle myself from plpython in time to clean up that code before January.

Cheers,
Jan
Jan Urbański <wulczer@wulczer.org> writes:
> I'm pleasantly surprised: with the SA code as it stands today, setting
> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
> the query takes 1799.662 ms in total.

Cool.

> With the default values it runs forever, but I long ago discovered that
> the defaults taken from the original paper are not well suited to my PG
> implementation (I could plug my MSc thesis here, but I'm way too shy for
> that). 8/0.4 are the values where I got better results than GEQO for
> Andres' monster-query.

Hmmm ... "runs forever" is a bit scary. One of the few good things I can say about GEQO is that it will terminate in a reasonable amount of time for even quite large problems. I would like to think that SA will also have that property. I thought that the annealing approach was sure to terminate in a fixed number of steps? Or did you mean that the planner terminated, but produced a horrid plan?

			regards, tom lane
On 08/12/10 21:18, Tom Lane wrote:
> Jan Urbański <wulczer@wulczer.org> writes:
>> I'm pleasantly surprised: with the SA code as it stands today, setting
>> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
>> the query takes 1799.662 ms in total.
>
> Cool.
>
> Hmmm ... "runs forever" is a bit scary. One of the few good things I
> can say about GEQO is that it will terminate in a reasonable amount of
> time for even quite large problems. I would like to think that SA will
> also have that property. I thought that the annealing approach was sure
> to terminate in a fixed number of steps? Or did you mean that the
> planner terminated, but produced a horrid plan?

It finishes after a bounded number of steps, but with high values of the temperature reduction factor it takes a long time for the temperature to fall low enough to consider the system "frozen", so that number of steps is big.

With SA you start with a temperature that's linearly dependent on the size of the query, and back off exponentially. Each step means work that also depends on the size of the query, so big queries can mean expensive steps. With q=0.9 and initial temperature=<very-big> it takes too much time to plan.

The good thing is that it's trivial to implement a hard cut-off value, which would stop annealing after a fixed number of steps (regardless of the current temperature) and serve as a safety valve.

Jan
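To put rough numbers on that (the starting temperature of 1000 and the "frozen" threshold of 1 below are made up purely for illustration, not values from the patch): with geometric cooling T(k+1) = q * T(k), reaching the frozen point takes about ln(T0/Tfrozen) / ln(1/q) reductions. For T0 = 1000 that is roughly 8 reductions at q = 0.4 but roughly 66 at q = 0.9, and each reduction repeats per-step work that itself grows with the size of the query.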
Jan Urbański <wulczer@wulczer.org> writes:
> On 08/12/10 21:18, Tom Lane wrote:
>> Hmmm ... "runs forever" is a bit scary.

> With SA you start with a temperature that's linearly dependent on the
> size of the query, and back off exponentially. Each step means work that
> also depends on the size of the query, so big queries can mean expensive
> steps. With q=0.9 and initial temperature=<very-big> it takes too much
> time to plan.

> The good thing is that it's trivial to implement a hard cut-off value,
> which would stop annealing after a fixed number of steps (regardless of
> the current temperature) and serve as a safety valve.

Well, let's wait and see whether experience says we need that. A hard-wired cutoff risks returning a pretty bad plan, and we have no experience yet with how fail-soft SA is.

Something that might be more useful is an escape that quits as soon as the best plan's estimated cost is less than something-or-other. There's no point in expending more planner time to improve the plan than you can hope to recoup at execution.

			regards, tom lane
Excerpts from Jan Urbański's message of Wed Dec 08 17:11:44 -0300 2010:
> I'm pleasantly surprised: with the SA code as it stands today, setting
> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
> the query takes 1799.662 ms in total.

That's 5x better than Oracle :-)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Merlin Moncure <mmoncure@gmail.com> writes:
> On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Then execute the huge SELECT:
>> http://codezine.jp/static/images/article/1629/html/sql.html
>
> benchmark what you've got against this (ported to postgres by marcin mank):
> http://www.pastie.org/684163

Is that this one?

http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On Thu, Dec 9, 2010 at 11:56 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> benchmark what you've got against this (ported to postgres by marcin mank):
>> http://www.pastie.org/684163
>
> Is that this one?
>
> http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com

Sure is -- I missed the formatted version. That query is also originally an Oracle one. If you remove the 'where ind = 0', you can watch the database solve the puzzle, which is pretty neat.

merlin