Thread: Row ordering after CREATE TABLE AS...SELECT regexp_split_to_table(source_text, regexp) AS new_column

This is a two-part question:

1) I have a source_text that I want to divide into smaller subunits
that will be contained in rows in a column in a new table.  Is it
absolutely certain that the initial order of the rows in the resultant
table after this  operation:

CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
E'regexp') as subunits FROM source_table;

will be the same as the order of these subunits in the original text?
Emphasis *initial order*.

2) I would like to be able to create a serial-type column during
CREATE TABLE AS in the new table that "memorizes" this order so that I
can reconstruct the original text using ORDER BY on that serial
column.  However, I am stumped how to do that.  I do not see how to
put the name of that column into my SELECT statement which generates
the table, and I do not see where else to put it.  Please forgive my
stupidity.

The "work-around" to this problem has been to ALTER my table after its
creation with a new serial-type column.  But this assumes that the
answer to Question 1) above is always "Yes".

Thanking you for your understanding,

John


On Wed, Feb 24, 2010 at 07:51:54AM +0100, John Gage wrote:
> This is a two-part question:
>
> 1) I have a source_text that I want to divide into smaller subunits
> that will be contained in rows in a column in a new table.  Is it
> absolutely certain that the initial order of the rows in the
> resultant table after this  operation:
>
> CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
> E'regexp') as subunits FROM source_table;
>
> will be the same as the order of these subunits in the original
> text?  Emphasis *initial order*.

I'd put money on not; this is not what databases are designed for.

>
> 2) I would like to be able to create a serial-type column during
> CREATE TABLE AS in the new table that "memorizes" this order so that
> I can reconstruct the original text using ORDER BY on that serial
> column.  However, I am stumped how to do that.  I do not see how to
> put the name of that column into my SELECT statement which generates
> the table, and I do not see where else to put it.  Please forgive my
> stupidity.

Pre- or append an increasing serial number to the data, and use that
as a column named "initial_order" or something else that will make
it clear to you and other users what it is, and then import.

But if you have the original data, in order, why do you need to be
able to reconstruct it from a database dump? It just looks like
adding a step to add a step, to me.

>
> The "work-around" to this problem has been to ALTER my table after
> its creation with a new serial-type column.  But this assumes that
> the answer to Question 1) above is always "Yes".
>
> Thanking you for your understanding,
>
> John
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

John Gage <jsmgage@numericable.fr> writes:
> This is a two-part question:
> 1) I have a source_text that I want to divide into smaller subunits
> that will be contained in rows in a column in a new table.  Is it
> absolutely certain that the initial order of the rows in the resultant
> table after this  operation:

> CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
> E'regexp') as subunits FROM source_table;

> will be the same as the order of these subunits in the original text?

If you have a version new enough to have synchronize_seqscans, you'd
need to turn that off.  Otherwise should be OK.

> 2) I would like to be able to create a serial-type column during
> CREATE TABLE AS in the new table that "memorizes" this order so that I
> can reconstruct the original text using ORDER BY on that serial
> column.  However, I am stumped how to do that.

I think the trick is to get the SRF to be expanded before the serial
values are assigned.  There's more than one way to do it, but I think
(too tired to experiment) this would work:

  CREATE TABLE new_table (id serial, subunits text);

  INSERT INTO new_table(subunits) SELECT regexp_split_to_table(source_text,
    E'regexp') FROM source_table;

            regards, tom lane

Thank you very much for this explanation/reply.  It precisely answers
my question.

Unfortunately, it prompts a new question.  I am using 8.4.2 which I
assume is new enough to trigger a "yes" response to "If you have a
version new enough to have synchronize_seqscans...".  I have
absolutely no idea how to turn that off.  Perhaps the best thing would
be to direct me to the documentation where turning it off is described
so that I can become more autonomous.  However, accompanying that with
explicit directions would be welcome too.

I am in Greenwich +1 timezone, but I fear you are in the 2AM time
zone.  Thank you again,

John


On Feb 24, 2010, at 8:06 AM, Tom Lane wrote:

> John Gage <jsmgage@numericable.fr> writes:
>> This is a two-part question:
>> 1) I have a source_text that I want to divide into smaller subunits
>> that will be contained in rows in a column in a new table.  Is it
>> absolutely certain that the initial order of the rows in the
>> resultant
>> table after this  operation:
>
>> CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
>> E'regexp') as subunits FROM source_table;
>
>> will be the same as the order of these subunits in the original text?
>
> If you have a version new enough to have synchronize_seqscans, you'd
> need to turn that off.  Otherwise should be OK.
>
>> 2) I would like to be able to create a serial-type column during
>> CREATE TABLE AS in the new table that "memorizes" this order so
>> that I
>> can reconstruct the original text using ORDER BY on that serial
>> column.  However, I am stumped how to do that.
>
> I think the trick is to get the SRF to be expanded before the serial
> values are assigned.  There's more than one way to do it, but I think
> (too tired to experiment) this would work:
>
>  CREATE TABLE new_table (id serial, subunits text);
>
>  INSERT INTO new_table(subunits) SELECT
> regexp_split_to_table(source_text,
>    E'regexp') FROM source_table;
>
>             regards, tom lane


John Gage wrote:

> Unfortunately, it prompts a new question.  I am using 8.4.2 which I
> assume is new enough to trigger a "yes" response to "If you have a
> version new enough to have synchronize_seqscans...".  I have
> absolutely no idea how to turn that off.  Perhaps the best thing
> would be to direct me to the documentation where turning it off is
> described so that I can become more autonomous.  However,
> accompanying that with explicit directions would be welcome too.

See postgresql.conf, but you probably want to leave it turned on in
general and turn it off only for the specific case of this usage.
(Using the SET command, or ALTER ROLE, or ALTER DATABASE).

See here
http://www.postgresql.org/docs/8.4/static/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-VERSION

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support