Thread: Solving sudoku using SQL
Hi,

I found an interesting article which explains how to solve "sudoku" using SQL (unfortunately written in Japanese):
http://codezine.jp/article/detail/1629?p=2

The example sudoku in the article is:
http://static.shoeisha.jp/cz/static/images/article/1629/problem4.gif

To solve the example you create the following table:

CREATE TABLE nums (n INT NOT NULL PRIMARY KEY);
INSERT INTO nums VALUES (1);
INSERT INTO nums VALUES (2);
INSERT INTO nums VALUES (3);
INSERT INTO nums VALUES (4);
INSERT INTO nums VALUES (5);
INSERT INTO nums VALUES (6);
INSERT INTO nums VALUES (7);
INSERT INTO nums VALUES (8);
INSERT INTO nums VALUES (9);

Then execute the huge SELECT:
http://codezine.jp/static/images/article/1629/html/sql.html

On that page the first SELECT takes effectively infinite time on PostgreSQL 9.0.1. The next one executes very quickly because the join order is explicitly specified by CROSS JOIN syntax. This seems to be a limitation of the PostgreSQL optimizer and I would like it to be removed. Comments?

BTW, according to the article, Oracle executes the first SELECT (the one PostgreSQL takes effectively infinite time on) in 10 seconds. It seems Oracle's optimizer is superior in this regard.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
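For readers who cannot read the article: the two SELECTs it presents differ only in join syntax. A minimal sketch with just three of the 81 "cell" relations (the aliases and constraints below are illustrative only, not the article's actual query):

-- Flat form: relations listed with commas, so the planner has to search
-- the join-order space itself (with 81 relations it falls back to GEQO,
-- as discussed later in this thread).
SELECT c11.n, c12.n, c13.n
  FROM nums c11, nums c12, nums c13
 WHERE c11.n <> c12.n
   AND c11.n <> c13.n
   AND c12.n <> c13.n;

-- CROSS JOIN form: once the chain is longer than join_collapse_limit,
-- the planner keeps the order in which the joins are written.
SELECT c11.n, c12.n, c13.n
  FROM nums c11
 CROSS JOIN nums c12
 CROSS JOIN nums c13
 WHERE c11.n <> c12.n
   AND c11.n <> c13.n
   AND c12.n <> c13.n;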
On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Then execute the huge SELECT:
> http://codezine.jp/static/images/article/1629/html/sql.html
>
> On that page the first SELECT takes effectively infinite time on
> PostgreSQL 9.0.1.

benchmark what you've got against this (ported to postgres by marcin mank):
http://www.pastie.org/684163

merlin
On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> On that page the first SELECT takes effectively infinite time on
> PostgreSQL 9.0.1. The next one executes very quickly because the join
> order is explicitly specified by CROSS JOIN syntax. This seems to be a
> limitation of the PostgreSQL optimizer and I would like it to be
> removed. Comments?

It's not easy to make the optimizer degrade gracefully when confronted with a very large number of tables, but I agree it would be worthwhile if we could figure out how to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> On that page the first SELECT takes effectively infinite time on
>> PostgreSQL 9.0.1. The next one executes very quickly because the join
>> order is explicitly specified by CROSS JOIN syntax. This seems to be a
>> limitation of the PostgreSQL optimizer and I would like it to be
>> removed. Comments?

> It's not easy to make the optimizer degrade gracefully when confronted
> with a very large number of tables, but I agree it would be worthwhile
> if we could figure out how to do it.

There is something funny going on there; it's not just that the planner is slower with a large flat search space. It is slower, but only maybe 5x or so. What I'm seeing is that it actually finds a much worse plan (very much larger estimated cost as well as actual runtime) when given the flat problem. That seems like a bug: a constrained search ought never find a better solution than an unconstrained search.

It may be that the search heuristics in joinrels.c are failing. If so, it may be impractical to fix, ie making this better would slow down more-typical planning problems enormously. But it'd be nice to understand exactly where it's going off the rails.

			regards, tom lane
I wrote:
> There is something funny going on there; it's not just that the planner
> is slower with a large flat search space. It is slower, but only maybe
> 5x or so. What I'm seeing is that it actually finds a much worse plan
> (very much larger estimated cost as well as actual runtime) when given
> the flat problem. That seems like a bug: a constrained search ought
> never find a better solution than an unconstrained search.

Oh, wait: the problem of course is that it's switching into GEQO mode and hence *not* doing a complete search. Doh. If you turn GEQO off then planning takes ~ forever with the flat version of the query.

We could "fix" that by forcibly breaking up the search problem in the same fashion that join_collapse_limit does, but I'm sure we'd get complaints about that approach. The real fix in my mind is to replace GEQO search with something smarter. I wonder what happened to the SA patch that was reported on at PGCon.

			regards, tom lane
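For anyone reproducing this at home, the planner behaviour discussed above is governed by a handful of settings (the defaults shown in the comments are believed correct for 9.0, but check your own installation):

-- GEQO takes over once a flat FROM list reaches geqo_threshold relations,
-- so the 81-way comma-join is planned heuristically, not exhaustively.
SHOW geqo;                 -- on
SHOW geqo_threshold;       -- 12

-- Turning it off forces an exhaustive search, which is what makes the
-- flat query take "~ forever" to plan.
SET geqo = off;

-- The CROSS JOIN version stays fast because explicit join chains longer
-- than join_collapse_limit are planned in the order they are written.
SHOW join_collapse_limit;  -- 8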
On 08/12/10 18:45, Tom Lane wrote:
> The real fix in my mind is to replace GEQO search with something
> smarter. I wonder what happened to the SA patch that was reported
> on at PGCon.

I got distracted with other things :( I'll try to plan the two queries with SA and see what the results are. If they're good it'll certainly raise my motivation to finish up the module and propose it.

Cheers,
Jan

PS: https://github.com/wulczer/saio, although it's still ugly as hell :(
On 08/12/10 19:02, Jan Urbański wrote:
> On 08/12/10 18:45, Tom Lane wrote:
>> The real fix in my mind is to replace GEQO search with something
>> smarter. I wonder what happened to the SA patch that was reported
>> on at PGCon.
>
> I got distracted with other things :( I'll try to plan the two queries
> with SA and see what the results are. If they're good it'll certainly
> raise my motivation to finish up the module and propose it.

I'm pleasantly surprised: with the SA code as it stands today, setting the equilibrium factor to 8 and the temperature reduction factor to 0.4, the query takes 1799.662 ms in total. With the default values it runs forever, but I long ago discovered that the defaults taken from the original paper are not well suited to my PG implementation (I could plug my MSc thesis here, but I'm way too shy for that). 8/0.4 are the values where I got better results than GEQO for Andres' monster-query.

Maybe it actually has some value after all... Let's see if I can untangle myself from plpython in time to clean up that code before January.

Cheers,
Jan
Jan Urbański <wulczer@wulczer.org> writes:
> I'm pleasantly surprised: with the SA code as it stands today, setting
> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
> the query takes 1799.662 ms in total.

Cool.

> With the default values it runs forever, but I long ago discovered that
> the defaults taken from the original paper are not well suited to my PG
> implementation (I could plug my MSc thesis here, but I'm way too shy for
> that). 8/0.4 are the values where I got better results than GEQO for
> Andres' monster-query.

Hmmm ... "runs forever" is a bit scary. One of the few good things I can say about GEQO is that it will terminate in a reasonable amount of time for even quite large problems. I would like to think that SA will also have that property. I thought that the annealing approach was sure to terminate in a fixed number of steps? Or did you mean that the planner terminated, but produced a horrid plan?

			regards, tom lane
On 08/12/10 21:18, Tom Lane wrote:
> Jan Urbański <wulczer@wulczer.org> writes:
>> I'm pleasantly surprised: with the SA code as it stands today, setting
>> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
>> the query takes 1799.662 ms in total.
>
> Cool.
>
> Hmmm ... "runs forever" is a bit scary. One of the few good things I
> can say about GEQO is that it will terminate in a reasonable amount of
> time for even quite large problems. I would like to think that SA will
> also have that property. I thought that the annealing approach was sure
> to terminate in a fixed number of steps? Or did you mean that the
> planner terminated, but produced a horrid plan?

It finishes after a bounded number of steps, but with high values of the temperature reduction factor it takes a long time for the temperature to fall low enough to consider the system "frozen", so that number of steps is big.

With SA you start with a temperature that's linearly dependent on the size of the query, and back off exponentially. Each step means work that also depends on the size of the query, so big queries can mean expensive steps. With q=0.9 and initial temperature=<very-big> it takes too much time to plan.

The good thing is that it's trivial to implement a hard cut-off value, which would stop annealing after a fixed number of steps (regardless of the current temperature) and serve as a safety valve.

Jan
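To put rough numbers on that (the starting temperature of 1000 and the "frozen" threshold of 1 below are made up purely for illustration, not values from the patch): with geometric cooling T(k+1) = q * T(k), reaching the frozen point takes about ln(T0/Tfrozen) / ln(1/q) reductions. For T0 = 1000 that is roughly 8 reductions at q = 0.4 but roughly 66 at q = 0.9, and each reduction repeats per-step work that itself grows with the size of the query.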
Jan Urbański <wulczer@wulczer.org> writes:
> On 08/12/10 21:18, Tom Lane wrote:
>> Hmmm ... "runs forever" is a bit scary.

> With SA you start with a temperature that's linearly dependent on the
> size of the query, and back off exponentially. Each step means work that
> also depends on the size of the query, so big queries can mean expensive
> steps. With q=0.9 and initial temperature=<very-big> it takes too much
> time to plan.

> The good thing is that it's trivial to implement a hard cut-off value,
> which would stop annealing after a fixed number of steps (regardless of
> the current temperature) and serve as a safety valve.

Well, let's wait and see whether experience says we need that. A hard-wired cutoff risks returning a pretty bad plan, and we have no experience yet with how fail-soft SA is.

Something that might be more useful is an escape that quits as soon as the best plan's estimated cost is less than something-or-other. There's no point in expending more planner time to improve the plan than you can hope to recoup at execution.

			regards, tom lane
Excerpts from Jan Urbański's message of Wed Dec 08 17:11:44 -0300 2010:
> I'm pleasantly surprised: with the SA code as it stands today, setting
> the equilibrium factor to 8 and the temperature reduction factor to 0.4,
> the query takes 1799.662 ms in total.

That's 5x better than Oracle :-)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Merlin Moncure <mmoncure@gmail.com> writes:
> On Wed, Dec 8, 2010 at 8:57 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Then execute the huge SELECT:
>> http://codezine.jp/static/images/article/1629/html/sql.html
>
> benchmark what you've got against this (ported to postgres by marcin mank):
> http://www.pastie.org/684163

Is that this one?

http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On Thu, Dec 9, 2010 at 11:56 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> benchmark what you've got against this (ported to postgres by marcin mank):
>> http://www.pastie.org/684163
>
> Is that this one?
>
> http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com

Sure is -- I missed the formatted version. That query is also originally an Oracle one. If you remove the 'where ind = 0', you can watch the database solve the puzzle, which is pretty neat.

merlin