Thread: "slicing" records

"slicing" records

From
"Claudio Lapidus"
Date:
Hello

I have a table with objects' descriptions:

   id    | length
---------+--------
 object1 |     40
 object2 |     66
 object3 |     12
 object4 |    107
 object5 |    220

But I need to export data to a legacy system that doesn't handle lengths
greater than 50 (don't ask me why...). Instead, it expects the data in this
format:

   id    | length | fragment | offst
---------+--------+----------+-------
 object1 |     40 | whole    |     0
 object2 |     50 | start    |     0
 object2 |     16 | end      |    50
 object3 |     12 | whole    |     0
 object4 |     50 | start    |     0
 object4 |     50 | middle   |    50
 object4 |      7 | end      |   100
 object5 |     50 | start    |     0
 object5 |     50 | middle   |    50
 object5 |     50 | middle   |   100
 object5 |     50 | middle   |   150
 object5 |     20 | end      |   200

So when length becomes greater, it is break up in as many pieces as
necessary, each of max allowed length except the last one, in such a way
that the sum of partial lengths equals the original one.

Now I couldn't manage to get a query capable of doing this. If anybody has
an idea, I'll be very much appreciated.

TIA,
cl.

Re: "slicing" records

From
Richard Huxton
Date:
On Saturday 11 October 2003 06:00, Claudio Lapidus wrote:
> Hello
>
> I have a table with objects' descriptions:
>
>    id    | length
> ---------+--------
>  object1 |     40
>  object2 |     66
>  object3 |     12
>  object4 |    107
>  object5 |    220
>
> But I need to export data to a legacy system that doesn't handle lengths
> greater than 50 (don't ask me why...). Instead, it expects the data in this
> format:
>
>    id    | length | fragment | offst
> ---------+--------+----------+-------
>  object1 |     40 | whole    |     0
>  object2 |     50 | start    |     0
>  object2 |     16 | end      |    50
>  object3 |     12 | whole    |     0
>  object4 |     50 | start    |     0
>  object4 |     50 | middle   |    50
>  object4 |      7 | end      |   100
>  object5 |     50 | start    |     0
>  object5 |     50 | middle   |    50
>  object5 |     50 | middle   |   100
>  object5 |     50 | middle   |   150
>  object5 |     20 | end      |   200

Simplest way is probably to write either a plpgsql function within PG or a
perl script outside it to split up the data.

If doing it within PG, you might find Stephan Szabo's article on set-returning
functions useful (http://techdocs.postgresql.org)

--
  Richard Huxton
  Archonet Ltd

Re: "slicing" records

From
Jan Wieck
Date:
Claudio Lapidus wrote:

> Hello
>
> I have a table with objects' descriptions:
>
>    id    | length
> ---------+--------
>  object1 |     40
>  object2 |     66
>  object3 |     12
>  object4 |    107
>  object5 |    220
>
> But I need to export data to a legacy system that doesn't handle lengths
> greater than 50 (don't ask me why...). Instead, it expects the data in this
> format:

Oh, it's one of these _don't ask me why_ things ... well, then "what is
the target legacy system?" ... hehe.

>
>    id    | length | fragment | offst
> ---------+--------+----------+-------
>  object1 |     40 | whole    |     0
>  object2 |     50 | start    |     0
>  object2 |     16 | end      |    50
>  object3 |     12 | whole    |     0
>  object4 |     50 | start    |     0
>  object4 |     50 | middle   |    50
>  object4 |      7 | end      |   100
>  object5 |     50 | start    |     0
>  object5 |     50 | middle   |    50
>  object5 |     50 | middle   |   100
>  object5 |     50 | middle   |   150
>  object5 |     20 | end      |   200
>

If there is a total upper maximum for the object length and it's not way
too obscenely large, then you can create a view that get's you this:

select id, length(data), data from t1;
  id | length |                      data
----+--------+-------------------------------------------------
   1 |      6 | 123456
   2 |     10 | 1234567890
   3 |     15 | 123456789012345
   4 |     20 | 12345678901234567890
   5 |     27 | 123456789012345678901234567
   6 |     47 | 12345678901234567890123456789012345678901234567
(6 rows)

select * from t1_sliced order by id, fragoffset;
  id | fragoffset | fraglength | fragtype |  fragdata
----+------------+------------+----------+------------
   1 |          0 |          6 | whole    | 123456
   2 |          0 |         10 | whole    | 1234567890
   3 |          0 |         10 | start    | 1234567890
   3 |         10 |          5 | end      | 12345
   4 |          0 |         10 | start    | 1234567890
   4 |         10 |         10 | end      | 1234567890
   5 |          0 |         10 | start    | 1234567890
   5 |         10 |         10 | middle   | 1234567890
   5 |         20 |          7 | end      | 1234567
   6 |          0 |         10 | start    | 1234567890
   6 |         10 |         10 | middle   | 1234567890
   6 |         20 |         10 | middle   | 1234567890
   6 |         30 |         10 | middle   | 1234567890
   6 |         40 |          7 | end      | 1234567
(14 rows)


See attached sample script. I didn't know if you really wanted this
fancy "whole|start|middle|end" string or if that was supposed to be the
data of the fragment itself. Please notice that the view in the sample
is "configured" for data sized up to 100 characters.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
drop view t1_sliced;
drop table t1;
drop sequence t1_id_seq;
drop table slice_config;
drop function slice_length (integer, integer, integer);
drop function slice_type (integer, integer, integer);

create table t1 (
    id        serial primary key,
    data    text
);

insert into t1 (data) values ('123456');
insert into t1 (data) values ('1234567890');
insert into t1 (data) values ('123456789012345');
insert into t1 (data) values ('12345678901234567890');
insert into t1 (data) values ('123456789012345678901234567');
insert into t1 (data) values ('12345678901234567890123456789012345678901234567');

create table slice_config (
    s_off        integer primary key,
    s_len        integer
);
insert into slice_config (s_off, s_len) values (0, 10);
insert into slice_config (s_off, s_len) values (10, 10);
insert into slice_config (s_off, s_len) values (20, 10);
insert into slice_config (s_off, s_len) values (30, 10);
insert into slice_config (s_off, s_len) values (40, 10);
insert into slice_config (s_off, s_len) values (50, 10);
insert into slice_config (s_off, s_len) values (60, 10);
insert into slice_config (s_off, s_len) values (70, 10);
insert into slice_config (s_off, s_len) values (80, 10);
insert into slice_config (s_off, s_len) values (90, 10);

create function slice_length (integer, integer, integer) returns integer
as '
declare
    data_size    alias for $1;
    slice_off    alias for $2;
    slice_len    alias for $3;
    frag_len    integer;
begin
    frag_len = data_size - slice_off;
    if frag_len > slice_len then
        return slice_len;
    end if;
    return frag_len;
end;
' language plpgsql;

create function slice_type (integer, integer, integer) returns text
as '
declare
    data_size    alias for $1;
    slice_off    alias for $2;
    slice_len    alias for $3;
begin
    if slice_off = 0 then
        if data_size <= slice_len then
            return ''whole'';
        end if;
        return ''start'';
    end if;
    if data_size <= slice_off + slice_len then
        return ''end'';
    end if;
    return ''middle'';
end;
' language plpgsql;

create view t1_sliced as
    select T.id, C.s_off as fragoffset,
        slice_length (length(T.data), C.s_off, C.s_len) as fraglength,
        slice_type (length(T.data), C.s_off, C.s_len) as fragtype,
        substr (T.data, C.s_off + 1, C.s_len) as fragdata
        from t1 T, slice_config C
        where C.s_off = 0 or length(T.data) > C.s_off;

select id, length(data), data from t1;

select * from t1_sliced order by id, fragoffset;

Re: "slicing" records

From
"Claudio Lapidus"
Date:
Jan Wieck wrote:
> Oh, it's one of these _don't ask me why_ things ... well, then "what is
> the target legacy system?" ... hehe.
>
Of course, "don't ask me why" is my own way of saying "I don't know why!"
:-)

> If there is a total upper maximum for the object length and it's not way
> too obscenely large, then you can create a view that get's you this:
>
[snip]
> See attached sample script. I didn't know if you really wanted this
> fancy "whole|start|middle|end" string or if that was supposed to be the
> data of the fragment itself. Please notice that the view in the sample
> is "configured" for data sized up to 100 characters.

No, the destination system actually needs the labels as a flag of the
fragment position or if it's a fragment at all (i.e. not 'whole'). Actually,
your view/functions seem to almost fit my original need, I think they'll
just need minor touch up. Thanks a lot Jan, really nice code.

cheers
cl.