Re: html to postgres... - Mailing list pgsql-general

From Mitch Vincent
Subject Re: html to postgres...
Msg-id 007701c10e0d$721691d0$1251000a@Mitch
In response to html to postgres...  (Tony Grant <tony@animaproductions.com>)
List pgsql-general
> Yes I was vague - the heat is coming back...

Still somewhat vague, though we're getting there!

> These are film and director pages in a movie site. I am looking at
> HTML->XML tools then with a parser I should be able to create a tab
> delimited text file.

    OK, it sounds like you'll have to write something to do the inserts
yourself, since I assume you're creating a custom schema. A few thousand web
pages with a few fields each isn't much data, so the import shouldn't take
long at all. Something quick in Perl, C, or even PHP would work once you have
the individual HTML files parsed into your tab-delimited file.
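
A rough sketch of that import step, written in Python purely for
illustration (Perl, C, or PHP would do just as well). The `films` table, its
three columns, and the function names are all made up for the example; the
code only assumes a DB-API style connection, so the parameter placeholder is
passed in ("%s" for a PostgreSQL driver such as psycopg2).

```python
import csv
import io


def read_rows(tsv_file):
    """Yield (title, director, description) tuples from a tab-delimited file.

    The three-column layout is an assumption made for this example.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip lines that don't have exactly three fields
        yield tuple(field.strip() for field in row)


def insert_rows(conn, rows, placeholder="%s"):
    """Insert rows into a hypothetical `films` table via a DB-API connection.

    Values are passed as parameters, never pasted into the SQL string.
    """
    sql = (
        "INSERT INTO films (title, director, description) "
        "VALUES ({0}, {0}, {0})".format(placeholder)
    )
    cur = conn.cursor()
    cur.executemany(sql, list(rows))
    conn.commit()
```

With the file open as `f` and a connection `conn`, the whole import is
`insert_rows(conn, read_rows(f))`.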

Assuming you want to parse these pages into fields (name, description,
whatever else), that seems to me to be the hardest part, especially if the
pages weren't written with that in mind. Which (XML?) tool do you intend to
use to parse out these fields, and how will it know what goes in which
field? Have you written the pages in such a way that you can
programmatically decide everything you need to?
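
To make the question concrete, here is a minimal sketch of what "knowing
what goes in which field" could look like if the pages happened to mark each
field with a class attribute. That markup convention, the class names, and
the `FieldExtractor` name are assumptions for the example, not anything I
know about your pages. It uses Python's standard html.parser module.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class names a wanted field.

    Simplification: an element nested inside a field element with the same
    tag name would end the capture early.
    """

    def __init__(self, field_classes):
        super().__init__()
        self.field_classes = set(field_classes)
        self.fields = {}
        self._current = None  # field currently being captured
        self._tag = None      # tag name that opened that field

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # already inside a field; keep collecting its text
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.field_classes:
                self._current, self._tag = name, tag
                self.fields.setdefault(name, "")
                break

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data
```

After `p = FieldExtractor(["title", "director"])` and `p.feed(page_html)`,
the extracted values are in `p.fields`. If the pages have no such markers,
the rules for finding each field have to come from somewhere else, which is
exactly the hard part.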

    I probably didn't tell you anything you didn't already know... Sorry if
it wasn't any help :-)

> The objective is now that we will be moving from hundreds to thousands
> of pages a database generated site seem more reasonable...

    I see. Well, it looks like you'll probably have to write something to do
the parsing yourself; once that's done, inserting the data into the database
is a piece of cake.

    Good luck!

-Mitch



