HTML source does not show all visible data

Developer https://www.devze.com 2022-12-27 09:20 Source: Internet
If you go here:

http://whois.domaintools.com/iconplc.com

and view the source, why can't you see the registrant data in the HTML source?

Is it at all possible to get this data through the HTML source?

This data is not in the HTML source:

Registrant:
ICON Clinical Research
   212 Church Road
   North Wales, PA 19454
   US

   Domain Name: ICONPLC.COM

   Administrative Contact, Technical Contact:
      ICON Clinical Research                
      212 Church Road
      North Wales, PA 19454
      US
      215-616-3359 fax: 123 123 1234

   Record expires on 08-Sep-2019.
   Record created on 12-Dec-2007.

   Domain servers in listed order:

   UDNS1.ULTRADNS.NET           
   UDNS2.ULTRADNS.NET

Even after I save the web page as .html, I am still unable to find the email address.


You can use the Selenium C# client driver to write code that checks for this CSS locator: css=div.whois_record. You can then write code to scrape the content under that particular div. The email address shown on the page is an image, so you would have to save it.
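
As a rough illustration of the same idea, here is a minimal sketch that uses Selenium's Python bindings rather than the C# client mentioned above. The function name fetch_whois_text and the constant WHOIS_SELECTOR are made up for this example, and it assumes Selenium 4, a local Chrome installation, and that the page still renders the record inside div.whois_record.

```python
# Hypothetical helper names; only the CSS selector comes from the answer above.
WHOIS_SELECTOR = "div.whois_record"


def fetch_whois_text(url: str, timeout: float = 10.0) -> str:
    """Load the page in a real browser and return the rendered record text."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and a matching driver exist
    try:
        driver.get(url)
        # Wait until the script-generated div is present in the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, WHOIS_SELECTOR))
        )
        return element.text
    finally:
        driver.quit()
```

Calling fetch_whois_text("http://whois.domaintools.com/iconplc.com") would return only the visible text of the div. Because the email address is an image, it would not be part of that text.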


If you look at the source, they have linked to an AJAX application. My guess is that they pull the data down after the HTML has loaded, so the information isn't visible when you view the source.

Here is a link talking about how to scrape ajax sites:

How do you scrape AJAX pages?


Looks like the page is put together with AJAX. Firebug in Firefox, or Developer tools in IE should help you get to it.


Because it is generated with JavaScript. Grep the source for whois_data


I have the Chrome browser, and it shows the content you want, but in a different format, like this:

ajaxUpdate("3","Registrant:
ICON Clinical Research
   212 Church Road
   North Wales, PA 19454
   US

   Domain Name: ICONPLC.COM

   Administrative Contact, Technical Contact:
      ICON Clinical Research                
      212 Church Road
      North Wales, PA 19454
      US
      215-616-3359 fax: 123 123 1234

   Record expires on 08-Sep-2019.
   Record created on 12-Dec-2007.

   Domain servers in listed order:

   UDNS1.ULTRADNS.NET           
   UDNS2.ULTRADNS.NET")
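
If the call really does appear in the page source in the form quoted above, a regular expression may be enough to pull the record out without running a browser. The sketch below assumes both arguments are double-quoted string literals. The names AJAX_CALL and extract_ajax_payloads are invented for this example, and the sample string is a shortened copy of the output above.

```python
import re

# Matches ajaxUpdate("<id>","<payload>"). The payload may span several lines
# and may contain backslash-escaped characters.
AJAX_CALL = re.compile(
    r'ajaxUpdate\(\s*"([^"]*)"\s*,\s*"((?:[^"\\]|\\.)*)"\s*\)',
    re.DOTALL,
)


def extract_ajax_payloads(page_source: str) -> dict[str, str]:
    """Map each ajaxUpdate id to the raw text passed as its second argument."""
    return {m.group(1): m.group(2) for m in AJAX_CALL.finditer(page_source)}


sample = '''<script>ajaxUpdate("3","Registrant:
ICON Clinical Research
   Domain Name: ICONPLC.COM")</script>'''

payloads = extract_ajax_payloads(sample)
print(payloads["3"])
```

The pattern is deliberately narrow: if the site builds the string differently (single quotes, concatenation, or a separate request), it will simply find nothing.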


I just looked at the source and the text you mention is there; the only difference is that it has &nbsp; entities instead of spaces.

<div class='whois_record'>Registrant:<br/>ICON&nbsp;Clinical&nbsp;Research<br/>&nbsp;&nbsp;&nbsp;212&nbsp;Church&nbsp;Road<br/>&nbsp;&nbsp;&nbsp;North&nbsp;Wales,&nbsp;PA&nbsp;19454<br/>&nbsp;&nbsp;&nbsp;US<br/><br/>&nbsp;&nbsp;&nbsp;Domain&nbsp;Name:&nbsp;ICONPLC.COM<br/><br/>&nbsp;&nbsp;&nbsp;Administrative&nbsp;Contact,&nbsp;Technical&nbsp;Contact:<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ICON&nbsp;Clinical&nbsp;Research&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; etc.

Also, as already mentioned, extra text can always be added to a page at a later time by client-side scripts.
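
To turn the whois_record markup quoted in this answer back into plain text, something like the following sketch could work. It assumes the fragment contains only br tags and &nbsp; entities, as in the excerpt. The function name whois_markup_to_text is made up for this example, and the regex-based tag stripping is a shortcut, not a general HTML parser.

```python
import html
import re


def whois_markup_to_text(markup: str) -> str:
    """Convert <br/>-separated, &nbsp;-padded markup into plain text lines."""
    # Each <br>, <br/> or <br /> becomes a newline.
    text = re.sub(r"<br\s*/?>", "\n", markup, flags=re.IGNORECASE)
    # Drop any remaining tags, such as the surrounding div.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities; &nbsp; becomes U+00A0, which is then mapped to a space.
    text = html.unescape(text)
    return text.replace("\xa0", " ")


sample = (
    "<div class='whois_record'>Registrant:<br/>"
    "ICON&nbsp;Clinical&nbsp;Research<br/>"
    "&nbsp;&nbsp;&nbsp;212&nbsp;Church&nbsp;Road<br/></div>"
)

record = whois_markup_to_text(sample)
print(record)
```

Searching the cleaned text for a string such as "Registrant:" then works the way the question expected a search of the raw source to work.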
