Encoding problem with JDBC and MySQL

Developer https://www.devze.com 2022-12-30 10:53 Source: Internet
I'm grabbing data from RSS channels, sanitizing it, and saving it in the database. I use Java, Tidy, MySQL, and JDBC.

Steps:

  1. I grab the RSS records. This works fine.
  2. I sanitize the HTML with Tidy. One transformation happens here: Tidy automatically converts strings like "So it&#8217;s unlikely" to "So it’s unlikely", so the string now contains a literal U+2019 character.
  3. I save this string to the table.

The MySQL schema is:

CREATE TABLE IF NOT EXISTS `rss_item_safe_texts` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) NOT NULL,
  `link` varchar(255) NOT NULL,
  `description` mediumtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The JDBC connection URL is:

connUrl = "jdbc:mysql://" + host + "/" + database + "?user=" + username + "&password=" + password + "&useUnicode=true&characterEncoding=UTF-8";
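As a minimal sketch, assuming MySQL Connector/J, the same settings could also be passed through a java.util.Properties object instead of being concatenated into the URL. That avoids a broken URL when the password contains characters such as & or ?. The class and method names here (ConnectionSettings, baseUrl, connectionProperties) are hypothetical and not part of the original code.

```java
import java.util.Properties;

class ConnectionSettings {
    // Builds the URL without credentials or options (hypothetical helper).
    static String baseUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // Collects the same options the original URL carried as query parameters.
    static Properties connectionProperties(String username, String password) {
        Properties props = new Properties();
        props.setProperty("user", username);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }
}
```

The two results could then be handed to DriverManager.getConnection(baseUrl(host, database), connectionProperties(username, password)).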

The Java code is:

PreparedStatement updateSafeTextSt = conn.prepareStatement("UPDATE `rss_item_safe_texts` SET `title` = ?, `link` = ?, `description` = ? WHERE `id` = ?");
updateSafeTextSt.setString(1, EscapingUtils.escapeXssInjection(title));
updateSafeTextSt.setString(2, link);
updateSafeTextSt.setString(3, EscapingUtils.escapeXssInjection(description));
updateSafeTextSt.setInt(4, itemId);
updateSafeTextSt.execute();
updateSafeTextSt.close();

As a result, I see broken characters in the database, like "So it'? unlikely". I see the same thing when I output the text on a web page (a UTF-8 page).
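One plausible cause of a "?" like this, though not confirmed for this setup, is that the text passes through a single-byte charset somewhere between the application and the table. The sketch below uses only the JDK to show that U+2019 survives a UTF-8 round trip but is replaced by "?" when encoded as ISO-8859-1. The class name EncodingDemo and the helper roundTrip are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes the text to bytes in one charset, then decodes those bytes in another.
    static String roundTrip(String text, Charset encodeAs, Charset decodeAs) {
        return new String(text.getBytes(encodeAs), decodeAs);
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark that Tidy leaves in the string.
        String original = "So it\u2019s unlikely";

        // UTF-8 can represent U+2019, so the string comes back unchanged.
        String viaUtf8 = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.equals(original)); // prints "true"

        // ISO-8859-1 has no U+2019, so getBytes substitutes '?'.
        String viaLatin1 = roundTrip(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1); // prints "So it?s unlikely"
    }
}
```

If the stored value looks like the second output, the loss probably happens before or while the data reaches MySQL, not on the web page.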


Don't forget that there are many other places where the encoding can be set differently. Check, for example, whether your database, table, and columns have the correct encoding to begin with. Also, I usually set everything I can to utf8 in MySQL:

mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
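The same check can be run from the Java side, over the JDBC connection the application actually uses, since the session values may differ from what the mysql client shows. This is a rough sketch under assumptions: the class CharsetCheck, its method names, and the choice of which variables to inspect are hypothetical, and any value starting with "utf8" (including utf8mb4) is treated as acceptable.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CharsetCheck {
    // Variables that affect text data; character_set_filesystem (binary)
    // and character_sets_dir (a path) are deliberately left out.
    static final List<String> TEXT_VARIABLES = Arrays.asList(
            "character_set_client",
            "character_set_connection",
            "character_set_database",
            "character_set_results",
            "character_set_server");

    // Reads the session's character-set variables over an open connection.
    static Map<String, String> readCharsetVariables(Connection conn) throws SQLException {
        Map<String, String> vars = new LinkedHashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE '%char%'")) {
            while (rs.next()) {
                vars.put(rs.getString(1), rs.getString(2));
            }
        }
        return vars;
    }

    // Returns the names of text-related variables that are missing or not utf8*.
    static List<String> findNonUtf8(Map<String, String> vars) {
        List<String> mismatches = new ArrayList<>();
        for (String name : TEXT_VARIABLES) {
            String value = vars.get(name);
            if (value == null || !value.startsWith("utf8")) {
                mismatches.add(name);
            }
        }
        return mismatches;
    }
}
```

Calling findNonUtf8(readCharsetVariables(conn)) right after opening the connection would return an empty list for the configuration shown in the table above.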