I'm building a GWT app that stores the text of random webpages in a datastore Text field. The text is often UTF-8 encoded. All the files of my app are saved as UTF-8, and when I run the application on my local machine the entire process works fine: UTF-8 text is stored as such and can be retrieved from the local App Engine development server as UTF-8. However, when I deploy the app to Google App Engine, somewhere between storing the text and retrieving it, the text stops being UTF-8, which causes non-ASCII characters to be displayed as ?.
When I view the datastore in the App Engine control panel, all the special characters appear as ?, which leads me to believe that the problem happens when writing to the database.
Does anyone know how to fix this?
The app itself is a little big. Here's some pseudocode:
Text webPageText = new Text(<STRING THAT CONTAINS UNICODE CHARACTERS>);
/*Some Code to store Text object on datastore
Specifically I'm using javax.jdo.PersistenceManager to do this.
Some Code to retrieve text from datastore. */
String retrievedText = webPageText.getValue();
The problem is that retrievedText comes back with ? instead of Unicode characters.
Here's a similar problem in Python that I found: Trying to store Utf-8 data in datastore getting UnicodeEncodeError. My app, however, isn't raising any errors.
Unfortunately, I can't find any code that lets me declare a String explicitly as UTF-8. (Java strings are actually UTF-16 internally; a charset only comes into play when a String is converted to or from bytes.)
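A minimal JDK-only sketch (Java 7+ for StandardCharsets; the class and method names are illustrative, not from the original app) of where a charset actually applies: the String itself just holds characters, and encoding only happens when it is turned into bytes. Encoding to a charset that cannot represent a character silently substitutes ?, which matches the symptom described above.

```java
import java.nio.charset.StandardCharsets;

class StringEncodingDemo {
    // Number of UTF-16 code units the String holds internally.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes produced when the String is encoded explicitly as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encoding to US-ASCII replaces every unrepresentable character with '?'.
    static String viaAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo \u65e5\u672c"; // "héllo 日本", written with escapes so the source file's encoding doesn't matter
        System.out.println(utf16Units(text));    // 8
        System.out.println(utf8ByteCount(text)); // 13 (é takes 2 bytes, each CJK character 3)
        System.out.println(viaAscii(text));      // h?llo ??
    }
}
```

No exception is thrown when characters are lost this way, which is consistent with the app "not getting any errors".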
Edit: I've now built a small webapp that takes in Unicode text, stores it in the datastore, and then retrieves it with no problems. I still have no idea where the problem is in my original source code, but I'm going to change the way my code handles webpage retrieval to match the smaller app I just built. Thank you, everyone, for your help.
I fixed the same issue by setting both the request and the response encoding to UTF-8. Setting the request encoding makes a valid string get stored in the datastore; without it, values are stored as "????...".
Requests: if you use Apache HTTP Client, this is done in the following way:
Get request:
NameValuePair[] params; // e.g. passed in as a NameValuePair... varargs parameter
...
String url = urlBase + URLEncodedUtils.format(Arrays.asList(params), "UTF-8"); // assumes urlBase already ends with "?"
HttpGet httpGet = new HttpGet(url);
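To see why the "UTF-8" argument in the GET snippet above matters, here is a small sketch that uses the JDK's URLEncoder/URLDecoder as a stand-in (not HttpClient itself; requires Java 10+ for the Charset overloads, and the class name is illustrative). Percent-encoding turns each character into bytes of a chosen charset, and the receiving side only gets the original text back if it decodes with the same charset.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryEncodingDemo {
    // Percent-encode one query value using the bytes of the given charset.
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Decode a percent-encoded value with the given charset, as the receiving server would.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String title = "\u00e9"; // "é"
        String sent = encode(title, StandardCharsets.UTF_8);
        System.out.println(sent); // %C3%A9
        System.out.println(decode(sent, StandardCharsets.UTF_8).equals(title)); // true: both sides agree
        System.out.println(decode(sent, StandardCharsets.ISO_8859_1)); // "Ã©": charset mismatch
    }
}
```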
Post request:
NameValuePair[] params; // e.g. passed in as a NameValuePair... varargs parameter
...
HttpPost httpPost = new HttpPost(url);
httpPost.setEntity(new UrlEncodedFormEntity(Arrays.asList(params), "UTF-8"));
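If Apache HttpClient isn't available, a rough JDK-only equivalent of the POST snippet above might look like the following sketch. It is not HttpClient's implementation: `formBody` and `post` are hypothetical helper names, and it assumes Java 10+ for `URLEncoder.encode(String, Charset)`. The key points are encoding every key and value as UTF-8 and declaring that charset in the Content-Type header.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormPostSketch {
    // Build "k1=v1&k2=v2", percent-encoding each key and value as UTF-8.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // POST the form, declaring the charset in Content-Type; returns the HTTP status code.
    static int post(URL url, Map<String, String> params) throws IOException {
        // Percent-encoded output contains only ASCII characters.
        byte[] body = formBody(params).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("title", "\u65e5\u672c"); // "日本"
        params.put("note", "a b");
        System.out.println(formBody(params)); // title=%E6%97%A5%E6%9C%AC&note=a+b
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the body predictable; the server does not depend on that order.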
Response: if you build your response in an HttpServlet, this is done in the following way:
HttpServletResponse resp;
...
resp.setContentType("text/html; charset=utf-8");
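Why the response charset matters: when no charset is set, a servlet response's character encoding defaults to ISO-8859-1, and characters outside that charset are written as ?. The sketch below imitates a response writer with a plain PrintWriter over an OutputStreamWriter (the JDK classes, not the servlet API; `render` is a hypothetical helper name) to show the bytes produced under each charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseCharsetDemo {
    // Write text through a PrintWriter using the given charset and return the raw bytes,
    // roughly what a response writer would send to the browser.
    static byte[] render(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(bytes, charset))) {
            out.print(text);
        } // closing the writer flushes it into the byte buffer
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c"; // "日本"
        // ISO-8859-1 cannot represent these characters, so each one becomes '?'.
        byte[] latin1 = render(text, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
        // With UTF-8 used for writing (and declared to the browser), the text survives.
        byte[] utf8 = render(text, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

In a real servlet, call setContentType (or setCharacterEncoding) before resp.getWriter(); once the writer has been obtained, changing the charset no longer takes effect.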
I tried converting the String to a byte array and storing it as a datastore Blob.
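//Save the String as a Blob, encoding it explicitly as UTF-8
//(the no-argument getBytes() uses the platform's default charset)
Blob webPageText = new Blob(<STRING THAT CONTAINS UNICODE CHARACTERS>.getBytes("UTF-8"));
//Retrieve the Blob as a String, decoding it with the same charset
String retrievedText = new String(webPageText.getBytes(), "UTF-8");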
I originally thought this had solved the problem, but I had mistakenly tested it only on my local server. This code still returns ? instead of Unicode characters, which leads me to believe that the problem isn't in the datastore but in the transfer from App Engine to the client.
Encoding solution: the browser uses the "8859_1" (ISO-8859-1) charset.
Before: I converted the charset before saving to the datastore:
new String(req.getParameter("title").getBytes("8859_1"),"utf-8")
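The "8859_1" conversion above only helps when the container has decoded UTF-8 request bytes as ISO-8859-1. A small JDK-only sketch (Java 7+; `misdecode` and `reinterpret` are hypothetical names) shows both sides: the conversion repairs a mis-decoded value, but applied to a value that was already decoded correctly it destroys non-Latin characters. That is one plausible explanation, not a confirmed one, for code that works locally and breaks after deployment.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulate a container decoding the browser's UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] sentByBrowser = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    // The "Before" conversion: recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
    static String reinterpret(String parameter) {
        return new String(parameter.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "\u65e5\u672c"; // "日本"
        String garbled = misdecode(title);
        System.out.println(reinterpret(garbled).equals(title)); // true: repairs a mis-decoded value
        System.out.println(reinterpret(title));                 // ??: corrupts a correctly decoded value
    }
}
```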
This worked fine when I ran the application on my local machine, but when I deployed it, I ran into the same issue you saw. I solved the problem by changing the code that saves to the datastore:
After:
new String(req.getParameter("title").getBytes("utf-8"),"utf-8")
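Note that the "After" line encodes and then decodes with the same charset. For any well-formed String this returns an equal String, so in effect it uses req.getParameter("title") unchanged; what likely fixed the problem was dropping the ISO-8859-1 reinterpretation. A quick check (JDK only, Java 7+; `roundTrip` is a hypothetical name):

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // The "After" conversion: encode as UTF-8, then decode as UTF-8.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "h\u00e9llo \u65e5\u672c"; // "héllo 日本"
        System.out.println(roundTrip(title).equals(title)); // true
        // So for a parameter the container already decoded correctly, this is equivalent:
        // String value = req.getParameter("title");
    }
}
```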
These links may prove useful after all:
How to set Google App Engine java Content-Type to UTF-8
http://code.google.com/appengine/docs/python/tools/webapp/buildingtheresponse.html