Re: Looking for a large database for testing - Mailing list pgsql-performance

From Sebastian Hennebrueder
Subject Re: Looking for a large database for testing
Date
Msg-id 430261A0.3040708@laliluna.de
Whole thread Raw
In response to Re: Looking for a large database for testing  (Sebastian Hennebrueder <usenet@laliluna.de>)
List pgsql-performance
Sebastian Hennebrueder schrieb:

>Tino Wildenhain schrieb:
>
>
>
>
>>You can download the wikipedia content. Just browse the wikimedia site.
>>Its some work to change the data to be able to import into postgres,
>>but at least you have a lot real world data - in many languages.
>>
>>
>
>I have just found it. Here there is a link
>http://download.wikimedia.org/
>They have content in multiple languages and dumps up to 20 GB.
>
>
>
Just if anybody wants to import the wikipedia data. I had considerable
problems to get the proper encoding working. I downloaded the german
content from wikipedia, which is a dump of a unicode encoded database of
mysql (utf8)

I used MySql 4.1 on Windows 2000 to read the dump and then copied the
data with a small application to postgreSQL
In
mysql.ini you should configure the setting
max_allowed_packet = 10M
I set it to 10, wich worked out. Else you can not import the dump into
mysql. The error message was something like lost connection ....
The default encoding of mysql was latin1 which worked.

Then I imported the dump
mysql -uYourUserName -pPassword --default-character-set=utf8 database <
downloadedAndUnzippedFile
The default-character-set is very important

Create table in postgres (not with all the columns)
CREATE TABLE content
(
  cur_id int4 NOT NULL DEFAULT nextval('public.cur_cur_id_seq'::text),
  cur_namespace int2 NOT NULL DEFAULT (0)::smallint,
  cur_title varchar(255) NOT NULL DEFAULT ''::character varying,
  cur_text text NOT NULL,
  cur_comment text,
  cur_user int4 NOT NULL DEFAULT 0,
  cur_user_text varchar(255) NOT NULL DEFAULT ''::character varying,
  cur_timestamp varchar(14) NOT NULL DEFAULT ''::character varying
) ;

After this I copied the data from mySql to postgres with a small Java
application. The code is not beautiful.

    private void copyEntries() throws Exception {
        Class.forName("org.postgresql.Driver");
        Class.forName("com.mysql.jdbc.Driver");
        Connection conMySQL = DriverManager.getConnection(
                "jdbc:mysql://localhost/wikidb", "root", "mysql");
        Connection conPostgreSQL = DriverManager.getConnection(
                "jdbc:postgresql://localhost/wiki", "postgres", "p");
        Statement selectStatement = conMySQL.createStatement();
        StringBuffer sqlQuery = new StringBuffer();
        sqlQuery.append("insert into content (");
        sqlQuery
                .append("cur_id, cur_namespace, cur_title, cur_text,
cur_comment, cur_user, ");
        sqlQuery.append("cur_user_text , cur_timestamp) ");
        sqlQuery.append("values (?,?,?,?,?,?,?,?)");

        PreparedStatement insertStatement = conPostgreSQL
                .prepareStatement(sqlQuery.toString());

        // get total rows
        java.sql.ResultSet resultSet = selectStatement
                .executeQuery("select count(*) from cur");
        resultSet.next();
        int iMax = resultSet.getInt(1);


        int i = 0;
        while (i < iMax) {
            resultSet = selectStatement
                    .executeQuery("select * from cur limit "+i +", 2000");
            while (resultSet.next()) {
                i++;
                if (i % 100 == 0)
                    System.out.println("" + i + " von " + iMax);
                insertStatement.setInt(1, resultSet.getInt(1));
                insertStatement.setInt(2, resultSet.getInt(2));
                insertStatement.setString(3, resultSet.getString(3));
                insertStatement.setString(4, resultSet.getString(4));
// this blob field is utf-8 encoded
                byte comment[] = resultSet.getBytes(5);

                insertStatement.setString(5, new String(comment, "UTF-8"));
                insertStatement.setInt(6, resultSet.getInt(6));
                insertStatement.setString(7, resultSet.getString(7));
                insertStatement.setString(8, resultSet.getString(8));
                insertStatement.execute();
            }
        }
    }

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----

http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB

Get support, education and consulting for these technologies.


pgsql-performance by date:

Previous
From: Alex Turner
Date:
Subject: Re: choosing RAID level for xlogs
Next
From: "Gregory S. Williamson"
Date:
Subject: Re: choosing RAID level for xlogs