Parse URL without DNS queries in Java

Developer https://www.devze.com 2023-01-23 03:08 Source: the web
I'm parsing squid logs with Java. Using the URL class seemed appropriate. This class, however, makes a DNS request, which slows parsing down dramatically. Are there other easy ways to extract the hostname and port from a URL?

Conditions

  • the URL scheme may be omitted in squid logs
  • when the port is absent, the default port should be derived for the ftp, http and https protocols

Log example:

1288763851.129    295 10.10.100.10 TCP_MISS/200 435 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
1288763881.110    275 10.10.100.10 TCP_MISS/200 434 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
1288763883.093  60001 10.10.102.202 TCP_MISS/503 0 CONNECT www.update.microsoft.com:443 - DIRECT/- -
1288763884.301      0 10.10.102.202 NONE/400 3506 GET / - NONE/- text/html
1288763911.194    359 10.10.100.10 TCP_MISS/200 435 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
1288763941.097    264 10.10.100.10 TCP_MISS/200 434 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
1288763944.094  59777 10.10.102.202 TCP_MISS/503 0 CONNECT www.update.microsoft.com:443 - DIRECT/- -
1288763971.123    289 10.10.100.10 TCP_MISS/200 434 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
1288764002.257   1421 10.10.100.10 TCP_MISS/200 435 GET http://win.mail.ru/cgi-bin/checknew? - DIRECT/217.69.128.52 text/plain
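These lines are in squid's native access.log format, where the requested URL is the seventh whitespace-separated field, right after the request method. A minimal sketch for pulling that field out before any URL parsing, assuming every line uses this format; `SquidLogLine` and `urlField` are illustrative names, not from the original post:

```java
import java.util.Optional;

public class SquidLogLine {
    // Native squid format: time elapsed client code/status bytes method URL ident hierarchy/peer type
    static final int URL_FIELD = 6; // zero-based index of the URL field

    /** Returns the URL field of a native-format squid log line, if the line has one. */
    static Optional<String> urlField(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length > URL_FIELD ? Optional.of(fields[URL_FIELD]) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "1288763883.093  60001 10.10.102.202 TCP_MISS/503 0 CONNECT "
                + "www.update.microsoft.com:443 - DIRECT/- -";
        System.out.println(urlField(line).orElse("<none>")); // prints www.update.microsoft.com:443
    }
}
```

Note that the extracted field can be a full URL (`http://win.mail.ru/...`), a bare `host:port` for CONNECT, or just `/` for a malformed request, so whatever parses it next has to cope with all three.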

EDIT: I had to write my own parser class for this task. The idea is to use InetAddress if the string contains an IP address, and a plain string for host names.
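The poster's class isn't shown. What follows is a hypothetical sketch of the same idea, assuming URLs shaped like the ones in the log above: the scheme is optional, `user:password@` is dropped, the ftp/http/https default ports fill in a missing port, and an IPv4 literal becomes an InetAddress. It uses `InetAddress.getByAddress(byte[])`, which only wraps the bytes, rather than `InetAddress.getByName`, which falls back to a DNS lookup for anything it doesn't recognize as a literal. Bracketed IPv6 literals are not handled. `SquidUrl` and its fields are invented names; Java 9+ is assumed for `Map.of`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for URLs found in squid logs; it performs no DNS lookups. */
public class SquidUrl {
    // Default ports for the schemes named in the question.
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("ftp", 21, "http", 80, "https", 443);

    final String scheme;        // null when the log entry omits it
    final String host;          // empty when there is no authority, e.g. "GET /"
    final int port;             // -1 when neither present nor derivable
    final InetAddress address;  // non-null only for IPv4 literals

    private SquidUrl(String scheme, String host, int port, InetAddress address) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.address = address;
    }

    static SquidUrl parse(String url) {
        String scheme = null;
        String rest = url;
        int sep = url.indexOf("://");
        // Only treat the prefix as a scheme if it looks like one (letters, digits, + . -).
        if (sep > 0 && url.substring(0, sep).matches("[A-Za-z][A-Za-z0-9+.-]*")) {
            scheme = url.substring(0, sep).toLowerCase(Locale.ROOT);
            rest = url.substring(sep + 3);
        }
        // The authority ends at the first '/', '?' or '#'.
        int end = rest.length();
        for (char c : new char[] {'/', '?', '#'}) {
            int i = rest.indexOf(c);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        String authority = rest.substring(0, end);
        authority = authority.substring(authority.lastIndexOf('@') + 1); // drop user:password@

        String host = authority;
        int port = -1;
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            host = authority.substring(0, colon);
            String digits = authority.substring(colon + 1);
            if (digits.matches("\\d{1,5}")) {
                port = Integer.parseInt(digits);
            }
        }
        if (port == -1 && scheme != null) {
            port = DEFAULT_PORTS.getOrDefault(scheme, -1);
        }
        return new SquidUrl(scheme, host, port, ipv4Literal(host));
    }

    /** Builds an InetAddress from a dotted-quad literal, or returns null for a host name. */
    private static InetAddress ipv4Literal(String host) {
        String[] parts = host.split("\\.", -1);
        if (parts.length != 4) {
            return null;
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            if (!parts[i].matches("\\d{1,3}") || Integer.parseInt(parts[i]) > 255) {
                return null;
            }
            bytes[i] = (byte) Integer.parseInt(parts[i]);
        }
        try {
            return InetAddress.getByAddress(bytes); // wraps the bytes; never queries DNS
        } catch (UnknownHostException e) {
            return null; // only thrown for an illegal array length, impossible here
        }
    }
}
```

For the CONNECT line, `SquidUrl.parse("www.update.microsoft.com:443")` gives host `www.update.microsoft.com`, port 443 and a null scheme; for `http://win.mail.ru/cgi-bin/checknew?` the port is derived as 80.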


You could try Restlet's Reference class.


Use the java.net.URI class. It only parses the string and never resolves the host name.
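A sketch of that approach applied to the log lines above, assuming Java 9+ for `Map.of` and `Map.entry`; `UriHostPort` and `hostAndPort` are made-up names. (For context: `java.net.URL`'s parsing does not resolve hosts either; its `equals` and `hashCode`, used for example when URLs are HashMap or HashSet keys, are what compare hosts by resolving them.) Two adjustments are needed. A scheme-less CONNECT target such as `www.update.microsoft.com:443` would otherwise be read as scheme `www.update.microsoft.com` with the opaque part `443`, so a leading `//` is added to mark it as an authority. A missing port is then looked up from the scheme. URI is also stricter than squid: a host name containing an underscore makes `getHost()` return null, and illegal characters raise `URISyntaxException`, which this sketch turns into a null result.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

public class UriHostPort {
    // Default ports for the schemes named in the question.
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("ftp", 21, "http", 80, "https", 443);

    /**
     * Returns (host, port) for a URL taken from a squid log, or null when the
     * entry has no host (e.g. "GET /") or is not a valid URI. The port is -1 when
     * it is neither present nor derivable from the scheme. No DNS lookup happens.
     */
    static Map.Entry<String, Integer> hostAndPort(String raw) {
        // "host:443" would otherwise parse as scheme "host" plus opaque part "443";
        // a leading "//" makes URI read it as an authority instead.
        String s = (raw.contains("://") || raw.startsWith("/")) ? raw : "//" + raw;
        URI uri;
        try {
            uri = new URI(s);
        } catch (URISyntaxException e) {
            return null; // characters that squid logged but URI rejects
        }
        String host = uri.getHost();
        if (host == null) {
            return null;
        }
        int port = uri.getPort();
        if (port == -1 && uri.getScheme() != null) {
            port = DEFAULT_PORTS.getOrDefault(uri.getScheme().toLowerCase(Locale.ROOT), -1);
        }
        return Map.entry(host, port);
    }
}
```

With this, `hostAndPort("http://win.mail.ru/cgi-bin/checknew?")` yields `win.mail.ru` and 80, and `hostAndPort("www.update.microsoft.com:443")` yields `www.update.microsoft.com` and 443.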