pov and tsort - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | pov and tsort |
Date | |
Msg-id | AANLkTinxm99N5X6-am=Y4_5EkKfU+_EytNkLG89oOeoR@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
Greetings hackers! I've renamed the project fsnapshot to pov, PostgreSQL Object Version control system. :) I've also created a quite nifty SQL command line based utility to perform graph/tree sorting algorithms on data in the database, without the need to export the data to some external application. The utility is named tsort and behaves exactly like the GNU coreutils tool tsort (or the Perl Power Tools tool tcsort). Since tree sorting is a quite common task in computer science, perhaps some of you will find it useful. Source code: https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/tsort.pl Also, if I have understood everything correctly, the toposort algorithm in pg_dump_sort.c does not does its best ensuring the objects are created in the same order, if the oids would change or if objects would be dropped/added. The algorithm is probably a simple DFS (depth-first sorting) or BFS (breadth-first sorting) without any effort made to do sorting when selecting the successor objects. It's probably a quite challenging task to implement the extra necessary sorting in C, on top of the standard DFS or BFS algorithm. I put together a small quite nifty plperl utility to do the task, as a simple proof-of-concept and also because I thought it makes sense to provide such a method accessible from the SQL prompt, since tree/graph sorting is probably one of the most commonly performed tasks in many applications dealing. It is quite powerful with all the basic tree sorting features I could think of. I hope you find it useful and perhaps even a bit fun :) Here is a small example on how to use it to analyze pg_depend: create table a ( id int not null, primary key (id)); create table aa ( id int not null, primary key (id), foreign key (id) references a(id)); create table ab ( id int not null, primary key (id), foreign key (id) references a(id)); create table aaa ( id int not null, primary key (id), foreign key (id) references aa(id)); create table aab ( id int not null, primary key (id), foreign key (id) references aa(id)); create table aba ( id int not null, primary key (id), foreign key (id) references ab(id)); create table abb ( id int not null, primary key (id), foreign key (id) references ab(id)); Instead of using oid, one should use unique names for each object (composed differently based on the object type). If doing so, the example below would render exactly the same result even in two databases with completely different oids for the same schema. glue=# select oid,relname from pg_class where relname ~ '^a.?.?$'; oid | relname -------+---------52354 | a52359 | aa52399 | aba52369 | ab52389 | aab52379 | aaa52409 | abb (7 rows) Find root objects (source objects, no predecessors), sort numerically: glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' || refobjid), ','), ',', 0, 'sub {$a <=> $b}', 'SOURCE') FROM pg_depend; tsort -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -----------------------------------------------------------------------------------------------------------------------------------------{0,10939,10940,10942,10943,10945,10946,10948,10949,10951,10952,10955,10956,10958,10959,10962,10963,10966,10967,10970,10971,10973,10974,10976,10977,10980,10981,10983,10984,10985,10986,10988,10989,10991,10992,10994,10995,10998,10999,11001,11002,11004,11005,11008,11009,11011,11012,11014,11015,11018,11019,11021,11022,11024,11025,11028,11029,11031,11032,11034,1103 5,11037,11038,11040,11041,11043,11044,11046,11047,11049,11050,11053,11054,11056,11057,11074,11076,11078,11080,11082,11084,11086,11088,11090,11092,11094,11096,11098,11100,11102,11104,11106,11108,11110,11112,11114,11116,11118,11120,11122,11124,11126,11128,11130,11132,11134,11136,11138,11140,11142,11144,11146,11148,11150,11152,11154,11156,11158,11160,11162,11164, 11166,11168,11170,11172,11174,11176,11178,11180,11182,11184,11186,11188,11190,11192,11193,11194,11195,11196,11197,11198,11199,11200,11201,11202,11203,11204,11205,11206,11207,11208,11209,11210,11211,11212,11214,11216,11218,11220,11222,11224,11226,11228,11230,11232,11234,11236,11238,11240,11241,11242,11243,11244,11245,11246,11247,11248,11249,11250,11251,11252,11 253,11254,11255,11256,11257,11258,11259,11260,11261,11262,11263,11264,11266,11268,11270,11272,11274,11276,11278,11280,11282,11284,11286,11288,11290,11292,11297,11299,11301,11303,11305,11307,11309,11311,11313,11315,11317,11319,11321,11323,11325,11340,11344,11345,11348,11349,11352,11353,11355,11356,11359,11360,11363,11364,11367,11368,11371,11372,11375,11376,1137 9,11380,11383,11384,11387,11388,11391,11392,11395,11396,11398,11399,11402,11403,11405,11406,11409,11410,11413,11414,11417,11418,11421,11422,11425,11426,11429,11430,11433,11434,11437,11438,11441,11442,11444,11445,11448,11450,11451,11453,11455,11456,11458,11460,11461,11463,11465,11466,11468,11470,11471,11473,11475,11476,11478,11480,11481,11483,11484,11487,11488, 11491,11492,11495,11496,11498,11499,11502,11503,11506,11507,11510,11511,11514,11515,11518,11519,11522,11523,11526,11527,11530,11531,11533,11534,11536,11537,11539,11540,11542,11543,11545,11546,11548,11549,11551,11552,11555,11556,52267,52342,52344,52345,52346,52347,52348,52349,52350,52351,52355,52360,52365,52366,52367,52368,52370,52375,52376,52377,52378,52380,52 382,52385,52386,52387,52388,52390,52392,52395,52396,52397,52398,52400,52402,52405,52406,52407,52408,52410,52412,52415,52416,52417,52418} (1 row) Find connected objects to the tables we created above and sort like pg_dump_sort.c (DFS/BFS): glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' || refobjid), ','),',',0,'DFS','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH') FROM pg_depend; tsort -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------{52398,52405,52386,52390,52391,52400,52401,52387,52380,52381,52406,52375,52396,52418,52407,52417,52397,52410,52411,52408,52404,52385,52395,52394,52365,52415,52378,52355,52356,52376,52402,52403,52399,52416,52414,52372,52373,52377,52374,52366,52370,52371,52369,52412,52413,52409,52392,52393,52389,52360,52361,52388,52384,52362,52363,52367,52382,52383,52379,52368, 52364,52359,52357,52358,52354,2200} (1 row) Find connected objects to the tables we created above and sort each lexically ascending, preserving the order: glue=# SELECT tsort(array_to_string(array_agg(distinct objid || ',' || refobjid), ','),',',0,'sub {$a cmp $b}','CONN_INCL','INCLUDE','52354,52359,52399,52369,52389,52379,52409','BOTH') FROM pg_depend; tsort -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------{52355,52356,52360,52361,52365,52366,52367,52368,52364,52370,52371,52375,52376,52377,52378,52374,52357,52358,52354,52380,52381,52382,52383,52385,52386,52387,52388,52384,52379,52390,52391,52392,52393,52395,52396,52397,52398,52394,52362,52363,52359,52389,52400,52401,52402,52403,52405,52406,52407,52408,52404,52399,52410,52411,52412,52413,52415,52416,52417,52418, 52414,52372,52373,52369,52409,2200} (1 row) Calling the utility with no arguments shows the help: glue=# SELECT tsort(); tsort ---------------------------------------------------------------------------------- DESCRIPTIONtsort - return a tree's nodes in topological order SYNOPSIStsort(edges text, delimiter text, debug integer, algorithm text,selection text, operator text, nodes text, directiontext); OUTPUT nodes text[] Array of nodes in topologically sorted order INPUT PARAMETERS Parameter Type Regex Description ============================================================================ edges text ^.+$ Node pairs,separated by [delimiter]. delimiter text ^.*$ Node separator in [edges], default is ' ', i.e. singleblank space. debug integer Print debug information using RAISE DEBUG: 0 no debug(default) 1 some debug 2 verbose debug algorithm text Sorting algorithm: DFS depth-first (default) explores as far as possible along each branch before backtracking. BFS breadth-first explores all the neighboringnodes, then for each of those nearest nodes, it explores their unexplored neighbor nodes, and so on. ^sub sort using perl subroutine. examples: #sort numerically ascending sub {$a <=> $b} # sort numerically descending sub {$b <=> $a} #sort lexically ascending sub {$a cmp $b} # sort lexically descending sub {$b cmp $a} # sort case-insensitively sub {uc($a) cmp uc($b)} For more examples, please goto: http://perldoc.perl.org/functions/sort.html The following options will not affect the order of the nodes in the result, they only control which nodes are includedin the result: Parameter Type Regex Description ============================================================================ selection text Selectionof nodes, used by [operator]: ALL select all nodes (default) ISOLATED select nodes without any successors nor predecessors SOURCE select nodes with successors but no predecessors SINK select nodes with predecessors but no successors CONN_INCL select nodes connected to [nodes], including [nodes] CONN_EXCL select nodes connected to [nodes], excluding [nodes] separated by [delimiter] operator text Include or exclude nodes in [selection]: INCLUDE include nodes(default) EXCLUDE exclude nodes The following options are only applicable if, [selection] is CONN_INCL or CONN_EXCL Parameter Type Regex Description ============================================================================ nodes text select nodes connected to [nodes] NULL not applicable (default) [nodes] the start nodes, separated by [delimiter] direction text direction to look for connected nodes BOTH traverse bothsuccessors and predecessors (default) UP only traversepredecessors DOWN only traverse successors -- Best regards, Joel Jacobson Glue Finance
pgsql-hackers by date: