pov and tsort - Mailing list pgsql-hackers

From Joel Jacobson
Subject pov and tsort
Date
Msg-id AANLkTinxm99N5X6-am=Y4_5EkKfU+_EytNkLG89oOeoR@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Greetings hackers!

I've renamed the project fsnapshot to pov, PostgreSQL Object Version
control system. :)

I've also created a quite nifty SQL command line based utility to
perform graph/tree sorting algorithms on data in the database, without
the need to export the data to some external application.

The utility is named tsort and behaves exactly like the GNU coreutils
tool tsort (or the Perl Power Tools tool tcsort).

Since tree sorting is a quite common task in computer science, perhaps
some of you will find it useful.

Source code: https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/tsort.pl

Also, if I have understood everything correctly, the toposort
algorithm in pg_dump_sort.c does not does its best ensuring the
objects are created in the same order, if the oids would change or if
objects would be dropped/added.

The algorithm is probably a simple DFS (depth-first sorting) or BFS
(breadth-first sorting) without any effort made to do sorting when
selecting the successor objects.

It's probably a quite challenging task to implement the extra
necessary sorting in C, on top of the standard DFS or BFS algorithm.

I put together a small quite nifty plperl utility to do the task, as a
simple proof-of-concept and also because I thought it makes sense to
provide such a method accessible from the SQL prompt,
since tree/graph sorting is probably one of the most commonly
performed tasks in many applications dealing.

It is quite powerful with all the basic tree sorting features I could think of.

I hope you find it useful and perhaps even a bit fun :)

Here is a small example on how to use it to analyze pg_depend:

create table a ( id int not null, primary key (id));
create table aa ( id int not null, primary key (id), foreign key (id)
references a(id));
create table ab ( id int not null, primary key (id), foreign key (id)
references a(id));
create table aaa ( id int not null, primary key (id), foreign key (id)
references aa(id));
create table aab ( id int not null, primary key (id), foreign key (id)
references aa(id));
create table aba ( id int not null, primary key (id), foreign key (id)
references ab(id));
create table abb ( id int not null, primary key (id), foreign key (id)
references ab(id));

Instead of using oid, one should use unique names for each object
(composed differently based on the object type).

If doing so, the example below would render exactly the same result
even in two databases with completely different oids for the same
schema.

glue=# select oid,relname from pg_class where relname ~ '^a.?.?$'; oid  | relname
-------+---------52354 | a52359 | aa52399 | aba52369 | ab52389 | aab52379 | aaa52409 | abb
(7 rows)

Find root objects (source objects, no predecessors), sort numerically:
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','), ',', 0, 'sub {$a <=> $b}', 'SOURCE') FROM pg_depend;



tsort




--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------------------------------{0,10939,10940,10942,10943,10945,10946,10948,10949,10951,10952,10955,10956,10958,10959,10962,10963,10966,10967,10970,10971,10973,10974,10976,10977,10980,10981,10983,10984,10985,10986,10988,10989,10991,10992,10994,10995,10998,10999,11001,11002,11004,11005,11008,11009,11011,11012,11014,11015,11018,11019,11021,11022,11024,11025,11028,11029,11031,11032,11034,1103

5,11037,11038,11040,11041,11043,11044,11046,11047,11049,11050,11053,11054,11056,11057,11074,11076,11078,11080,11082,11084,11086,11088,11090,11092,11094,11096,11098,11100,11102,11104,11106,11108,11110,11112,11114,11116,11118,11120,11122,11124,11126,11128,11130,11132,11134,11136,11138,11140,11142,11144,11146,11148,11150,11152,11154,11156,11158,11160,11162,11164,

11166,11168,11170,11172,11174,11176,11178,11180,11182,11184,11186,11188,11190,11192,11193,11194,11195,11196,11197,11198,11199,11200,11201,11202,11203,11204,11205,11206,11207,11208,11209,11210,11211,11212,11214,11216,11218,11220,11222,11224,11226,11228,11230,11232,11234,11236,11238,11240,11241,11242,11243,11244,11245,11246,11247,11248,11249,11250,11251,11252,11

253,11254,11255,11256,11257,11258,11259,11260,11261,11262,11263,11264,11266,11268,11270,11272,11274,11276,11278,11280,11282,11284,11286,11288,11290,11292,11297,11299,11301,11303,11305,11307,11309,11311,11313,11315,11317,11319,11321,11323,11325,11340,11344,11345,11348,11349,11352,11353,11355,11356,11359,11360,11363,11364,11367,11368,11371,11372,11375,11376,1137

9,11380,11383,11384,11387,11388,11391,11392,11395,11396,11398,11399,11402,11403,11405,11406,11409,11410,11413,11414,11417,11418,11421,11422,11425,11426,11429,11430,11433,11434,11437,11438,11441,11442,11444,11445,11448,11450,11451,11453,11455,11456,11458,11460,11461,11463,11465,11466,11468,11470,11471,11473,11475,11476,11478,11480,11481,11483,11484,11487,11488,

11491,11492,11495,11496,11498,11499,11502,11503,11506,11507,11510,11511,11514,11515,11518,11519,11522,11523,11526,11527,11530,11531,11533,11534,11536,11537,11539,11540,11542,11543,11545,11546,11548,11549,11551,11552,11555,11556,52267,52342,52344,52345,52346,52347,52348,52349,52350,52351,52355,52360,52365,52366,52367,52368,52370,52375,52376,52377,52378,52380,52

382,52385,52386,52387,52388,52390,52392,52395,52396,52397,52398,52400,52402,52405,52406,52407,52408,52410,52412,52415,52416,52417,52418}
(1 row)


Find connected objects to the tables we created above and sort like
pg_dump_sort.c (DFS/BFS):
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','),',',0,'DFS','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH')
FROM pg_depend;

                                                     tsort


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------{52398,52405,52386,52390,52391,52400,52401,52387,52380,52381,52406,52375,52396,52418,52407,52417,52397,52410,52411,52408,52404,52385,52395,52394,52365,52415,52378,52355,52356,52376,52402,52403,52399,52416,52414,52372,52373,52377,52374,52366,52370,52371,52369,52412,52413,52409,52392,52393,52389,52360,52361,52388,52384,52362,52363,52367,52382,52383,52379,52368,
52364,52359,52357,52358,52354,2200}
(1 row)


Find connected objects to the tables we created above and sort each
lexically ascending, preserving the order:
glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' ||
refobjid), ','),',',0,'sub {$a cmp
$b}','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH')
FROM pg_depend;

                                                     tsort


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------{52355,52356,52360,52361,52365,52366,52367,52368,52364,52370,52371,52375,52376,52377,52378,52374,52357,52358,52354,52380,52381,52382,52383,52385,52386,52387,52388,52384,52379,52390,52391,52392,52393,52395,52396,52397,52398,52394,52362,52363,52359,52389,52400,52401,52402,52403,52405,52406,52407,52408,52404,52399,52410,52411,52412,52413,52415,52416,52417,52418,
52414,52372,52373,52369,52409,2200}
(1 row)


Calling the utility with no arguments shows the help:

glue=# SELECT tsort();                                     tsort
----------------------------------------------------------------------------------
DESCRIPTIONtsort - return a tree's nodes in topological order

SYNOPSIStsort(edges text, delimiter text, debug integer, algorithm text,selection text, operator text, nodes text,
directiontext); 
OUTPUT    nodes       text[]  Array of nodes in topologically sorted order
INPUT PARAMETERS    Parameter   Type    Regex     Description
============================================================================   edges       text    ^.+$        Node
pairs,separated by [delimiter]. 
    delimiter   text    ^.*$        Node separator in [edges],                                    default is ' ', i.e.
singleblank space. 
    debug       integer             Print debug information using RAISE DEBUG:                        0           no
debug(default)                        1           some debug                        2           verbose debug 
    algorithm   text                Sorting algorithm:                        DFS         depth-first (default)
                          explores as far as possible along each                                    branch before
backtracking.                       BFS         breadth-first                                    explores all the
neighboringnodes,                                    then for each of those nearest nodes,
     it explores their unexplored neighbor                                    nodes, and so on.
^sub       sort using perl subroutine.                                    examples:
#sort numerically ascending                                    sub {$a <=> $b} 
                                    # sort numerically descending                                    sub {$b <=> $a}
                                    #sort lexically ascending                                    sub {$a cmp $b}
                                    # sort lexically descending                                    sub {$b cmp $a}
                                    # sort case-insensitively                                    sub {uc($a) cmp
uc($b)}
                                    For more examples, please goto:
http://perldoc.perl.org/functions/sort.html
    The following options will not affect the order of the nodes in the result,    they only control which nodes are
includedin the result: 
    Parameter   Type    Regex     Description
============================================================================   selection   text
Selectionof nodes, used by [operator]:                        ALL         select all nodes (default)
   ISOLATED    select nodes without any                                    successors nor predecessors
     SOURCE      select nodes with successors                                    but no predecessors
   SINK        select nodes with predecessors                                    but no successors
 CONN_INCL   select nodes connected to [nodes],                                    including [nodes]
   CONN_EXCL   select nodes connected to [nodes],                                    excluding [nodes]
                 separated by [delimiter] 
    operator    text                Include or exclude nodes in [selection]:                        INCLUDE     include
nodes(default)                        EXCLUDE     exclude nodes 
    The following options are only applicable if,    [selection] is CONN_INCL or CONN_EXCL
    Parameter   Type    Regex     Description
============================================================================
    nodes       text                select nodes connected to [nodes]                        NULL        not applicable
(default)                       [nodes]     the start nodes, separated by [delimiter] 

    direction   text                direction to look for connected nodes                        BOTH        traverse
bothsuccessors and                                    predecessors (default)                        UP          only
traversepredecessors                        DOWN        only traverse successors 








--
Best regards,

Joel Jacobson
Glue Finance


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Streaming base backups
Next
From: Itagaki Takahiro
Date:
Subject: Re: SQL/MED - file_fdw