How to match Arabic Unicode characters in a string with Java?

Developer https://www.devze.com 2023-02-20 00:59 Source: web

Greetings all,

I have a desktop Java application that prints the following output to the console window:

[
{
"ew" : "ana"
"hws" : [
"\u0623\u0646\u0627"
]
}
]

I would like to separate the string "\u0623\u0646\u0627" from the rest of the output so that I can do further processing on that string only.

I don't know how to do that. One idea is to use a regex, but how could I do that?

Could you help me?


Given the additional information:

"The output should be Arabic letters, not \u064A... etc. My idea was to search the output for the \u064A... sequences and convert them to Arabic. Do you get my point? I don't know how to solve this; I am a beginner in Java. Sorry for the confusion, and thank you for your response."

and given that the input comes from http://www.google.com/transliterate/arabic?tlqt=1&langpair=en|ar&text=ana,masry&&tl_app=1, you can solve it like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL googleUrl = new URL("http://www.google.com/transliterate/arabic?tlqt=1&langpair=en|ar&text=ana,masry&&tl_app=1");
        URLConnection googleUrlc = googleUrl.openConnection();
        // A quoted word made only of backslash-u escapes, and a single escape within it.
        // The digits are restricted to hex (0-9, a-f) so that parsing below cannot fail.
        Pattern wordRegex = Pattern.compile("\"(\\\\u[\\da-f]{4})+\"", Pattern.CASE_INSENSITIVE);
        Pattern charRegex = Pattern.compile("\\\\u([\\da-f]{4})", Pattern.CASE_INSENSITIVE);
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(googleUrlc.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                Matcher wordMatch = wordRegex.matcher(inputLine);
                while (wordMatch.find()) {
                    StringBuilder arabicBuffer = new StringBuilder();
                    Matcher charMatch = charRegex.matcher(wordMatch.group());
                    while (charMatch.find()) {
                        // Parse the four hex digits and append the character they encode.
                        arabicBuffer.appendCodePoint(Integer.parseInt(charMatch.group(1), 16));
                    }
                    if (arabicBuffer.length() > 0) {
                        System.out.println(arabicBuffer);
                    }
                }
            }
        }
    }
}
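
If you want to try the regex part without calling the web service, here is a minimal sketch that applies the same two-pattern idea to a string you already have in memory. The class name EscapedWordDecoder, the method decodeWords, and the sample string are made up for illustration; the sample only imitates the console output from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class EscapedWordDecoder {
    // A double-quoted run of backslash-u escapes; group 1 is the run without the quotes.
    private static final Pattern WORD =
            Pattern.compile("\"((?:\\\\u[0-9a-fA-F]{4})+)\"");
    // One escape; group 1 holds its four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Returns every escaped word found in the text, decoded to real characters.
    static List<String> decodeWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher word = WORD.matcher(text);
        while (word.find()) {
            StringBuilder decoded = new StringBuilder();
            Matcher escape = ESCAPE.matcher(word.group(1));
            while (escape.find()) {
                // Each escape encodes one UTF-16 code unit.
                decoded.append((char) Integer.parseInt(escape.group(1), 16));
            }
            words.add(decoded.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed sample: the escapes are literal backslash-u text, as printed to the console.
        String sample = "[ { \"ew\" : \"ana\" \"hws\" : [ \"\\u0623\\u0646\\u0627\" ] } ]";
        for (String word : decodeWords(sample)) {
            System.out.println(word);
        }
    }
}
```

Plain quoted strings such as "ana" are skipped because they contain no escapes. Whether the Arabic letters display correctly when printed depends on your console's encoding and font, so a console that shows question marks does not necessarily mean the decoding failed.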