Re: Searchable chess positions in a Postgress DB - Mailing list pgsql-general

From Gavin Flower
Subject Re: Searchable chess positions in a Postgress DB
Date
Msg-id 4F874E2D.4030709@archidevsys.co.nz
Whole thread Raw
In response to Re: Searchable chess positions in a Postgress DB  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Responses Re: Searchable chess positions in a Postgress DB
List pgsql-general
On 11/04/12 21:24, Gavin Flower wrote:
On 11/04/12 19:15, Sidney Cadot wrote:
Dear all,

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

I have written code to extract these positions, and now I want to put
them into a Postgres database. Specifically, I want to do this in a
way that allows *fast* lookups of positions, e.g. "give me all
positions that have a White King on c4 and either a Black Bishop or
White Knight on f7".

Currently, my "Positions" table looks like this:
     Column       |  Type   | Modifiers
-------------------+---------+-----------gameindex         | integer | not nullplyindex          | integer | not nullpseudofenboard    | text    | not nullfenside           | text    | not nullfencastling       | text    | not nullfenenpassant      | text    | not nullpossiblemovecount | integer | not nullisincheck         | boolean | not null
Indexes:   "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
Foreign-key constraints:   "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
games(gameindex)

The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:

"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"

This design allows me to formulate the kind of positional queries that
I want (by using regular expression matching), but executing them will
involve a slow, linear traversal of the 400M table rows, which is not
desirable.

I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.

CREATE TABLE Position2 (   GameIndex INTEGER NOT NULL,   PlyIndex  INTEGER NOT NULL,   a1        "char"  NOT NULL,   a2        "char"  NOT NULL,                                   -- (60 fields defs omitted)   h7        "char"  NOT NULL,   h8        "char"  NOT NULL
);

This would allow the creation of indices on each of the 64 fields
separately, which should help to achieve near-instantaneous position
query performance, especially after gathering proper statistics for
all the field-specific indices.

I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.

Also, above I use the 1-byte "char" type. Is this the only type in
PostGres that is guaranteed to be just a single byte, or are there
better alternatives? A 13-state enum would be best (listing the 6
white pieces, 6 black pieces, and 'empty' states for every square on
the board) but as I understand from the documentation, enums always up
take 4 bytes per entry.

Any ideas for improvement would be greatly appreciated.

How aboutsomething like the following (game and postion would have more fields in practice, like comments and where played)?


DROP TABLE IF EXISTS game CASCADE;

CREATE TABLE game
(
    id int PRIMARY KEY,
    name_white text,
    name_black text,
    played timestamptz
);

CREATE TABLE position
(
  
id int PRIMARY KEY,
  
game_id int REFERENCES game (id),
  
ply int
);

CREATE TABLE piece
(
    id int PRIMARY KEY,
    position_id int REFERENCES position (id),
    rank char, -- 1...8 from white's perspective
    file char, -- a...h
    white boolean,
    type char -- P.R,N,B,K,Q
);

CREATE UNIQUE INDEX square ON piece (rank, file, type, white);


SELECT
   p.position_id
FROM
    piece p
WHERE
   (       p.white
       AND p.type = 'K'
       AND p.file = 'c'
       AND p.rank = '4'
   )
   AND
   (
       ((NOT p.white AND p.type = 'B') OR (p.white AND p.type = 'K'))
       AND p.file = 'f'
       AND p.rank = '7'
   );


Cheers,
Gavin

There was a blatantly obvious flaw in the above query: the pices checked, should belong to the same position!

That I only discovered the flaw when I mentally checked the SQL on the way to work the folowing day.

The following, hopefully, fixes the problem

SELECT
    p1.position_id
FROM
    piece AS p1 JOIN piece AS p2 USING (position_id)
WHERE
    (
        p1.white
        AND p1.type = 'K'
        AND p1.file = 'c'
        AND p1.rank = '4'
    )
    AND
    (
        ((NOT p2.white AND p2.type = 'B') OR (p2.white AND p2.type = 'K'))
        AND p2.file = 'f'
        AND p2.rank = '7'       
    );
   

pgsql-general by date:

Previous
From: Bret Stern
Date:
Subject: Installer Questions (NSIS)
Next
From: Michael Nolan
Date:
Subject: Re: Searchable chess positions in a Postgress DB