Re: US Census database (Tiger 2004FE) - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: US Census database (Tiger 2004FE)
Date
Msg-id 22892.24.91.171.78.1123160487.squirrel@mail.mohawksoft.com
In response to Re: US Census database (Tiger 2004FE)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
> * Mark Woodward (pgsql@mohawksoft.com) wrote:
>> > How big dumped & compressed?  I may be able to host it depending on how
>> > big it ends up being...
>>
>> It's been running for about an hour now, and it is up to 3.3G.
>
> Not too bad.  I had 2003 (iirc) loaded into 7.4 at one point.

Cool.

>
>> pg_dump tiger | gzip > tiger.pgz
>
> What db version are you using, how did you load it (ogr2ogr?), is it in
> postgis form?  Fun questions, all of them. :)

8.0.3, in simple pg_dump form.

I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed-width text file to PG utility. It takes a "control" file that
describes the fields: field widths, types, and field names. It creates a
SQL "create table" statement, and also reads all the records from the data
file into a PostgreSQL copy command. A control file looks something like:

# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I     RT
4:I     VERSION
10:T    TLID
3:S     RTSQ
4:Z     ZIP4L
4:Z     ZIP4R


The first number is the field width in chars. The second is an optional
type (there are a few: 'I' means ignore, 'Z' means zipcode, etc.); if no
type is given, varchar is assumed. Last is the column name.
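
For anyone who wants to play with the format, here is a rough Python
sketch of how a control file like the one above could be turned into a
"create table" statement and tab-separated rows for COPY. It is not the
actual utility. The function names, the choice to map every kept field to
varchar(width), the handling of blank fields as NULL, and the sample
record are all assumptions made for illustration; only 'I' (ignore) is
treated specially, since the meaning of the other type codes is not
spelled out here.

```python
# Sample control file, copied from the message above.
CONTROL = """\
# Zip+4 codes
# Tiger 2003 Record Conversion File
TABLE RTZ
1:I     RT
4:I     VERSION
10:T    TLID
3:S     RTSQ
4:Z     ZIP4L
4:Z     ZIP4R
"""


def parse_control(text):
    """Return (table_name, fields); fields is a list of (width, type_code, name)."""
    table = None
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.upper().startswith("TABLE "):
            table = line.split()[1]
            continue
        spec, name = line.split()
        # "10:T" -> width "10", type "T"; a bare "10" leaves the type empty.
        width, _, type_code = spec.partition(":")
        fields.append((int(width), type_code, name))
    return table, fields


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement, skipping fields marked 'I' (ignore)."""
    # Assumption: every kept field becomes varchar(width), whatever its type code.
    cols = [f"{name} varchar({width})" for width, code, name in fields if code != "I"]
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(cols) + "\n);"


def record_to_copy_row(record, fields):
    """Slice one fixed-width record into a tab-separated row for COPY ... FROM STDIN."""
    values = []
    pos = 0
    for width, code, _name in fields:
        chunk = record[pos:pos + width]
        pos += width
        if code == "I":
            continue
        value = chunk.strip()
        if not value:
            values.append("\\N")  # assumption: blank field is loaded as NULL
        else:
            # Escape backslashes and tabs, as COPY text format expects.
            values.append(value.replace("\\", "\\\\").replace("\t", "\\t"))
    return "\t".join(values)


table, fields = parse_control(CONTROL)
print(create_table_sql(table, fields))
# Made-up 26-character record, laid out as 1+4+10+3+4+4 characters.
print(record_to_copy_row("Z1006  75951123  112345678", fields))
```

The rows produced this way would be fed to something like "COPY RTZ FROM
STDIN", one line per record, followed by the usual "\." terminator.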


>
>> I'll let you know. Hopefully, it will fit on  DVD.
>
> I guess your upload pipe isn't very big?  snail-mail is slow... :)

Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrassing.
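
The math, as a small Python sketch. The numbers are assumptions picked for
illustration: single-layer DVDs at a nominal 4.7 GB each and a 24-hour
door-to-door delivery. Plug in your own disc count and shipping time, then
compare the result with your upload speed.

```python
def sneakernet_mbps(discs, bytes_per_disc=4.7e9, hours=24.0):
    """Effective throughput in megabits per second of shipping discs."""
    total_bits = discs * bytes_per_disc * 8
    seconds = hours * 3600
    return total_bits / seconds / 1e6


# Assumed example: three single-layer DVDs delivered overnight.
print(f"{sneakernet_mbps(3):.2f} Mbit/s")
```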

>
>> You know, ... maybe pg_dump needs a progress bar? (How would it do that,
>> I wonder?)
>
> Using the new functions in 8.1 which provide size-on-disk of things,
> hopefully there's also a function to give a tuple-size or similar as
> well.  It'd be a high estimate due to dead tuples but should be
> sufficient for a progress bar.
>
>     Thanks,
>
>         Stephen
>


