Background: I need to download web pages with their resources for offline viewing. As part of this I have to "rewrite" the URLs of links within the HTML page so that they work offline. This is fine for the standard types of links, but I'm now realizing that some links are created dynamically by JavaScript.
Question: What approach (or even existing library) could I use to convert a web page whose links are generated dynamically by JavaScript into a web page with normal, static links? Then I could do the URL rewriting I need.
Notes:
- It's almost as if I need a JavaScript interpreter library that I pass the page HTML to, and which then spits out the HTML the scripts generate. Then I could rewrite the links as I wish (the result would no longer rely on JavaScript to create them).
- The context is a C# WinForms application (.NET 3.5).
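If the script output can be captured as plain HTML, as the first note suggests, the rewriting step itself might be sketched as follows. This is a minimal, regex-based sketch, not a full HTML parser. It assumes the offline copy is served under a local prefix of the form `http://localhost:3000/sites/<original absolute URL>`, which is the pattern visible in the rewritten `img src` of the question's third example. `OfflineRewriter`, `ToOffline` and `RewriteAttributes` are hypothetical names. Each `href`/`src` value is resolved against the page URL, while `javascript:`, `mailto:` and fragment-only links are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites href/src attributes in static HTML so they point
// at an offline copy. Assumption: the local server maps
//   http://localhost:3000/sites/<absolute URL>  ->  saved copy of <absolute URL>
public static class OfflineRewriter
{
    // Assumed local prefix (pattern taken from the rewritten img src in the question).
    public const string OfflineRoot = "http://localhost:3000/sites/";

    // Detects a leading URI scheme such as "http:", "javascript:" or "mailto:".
    private static readonly Regex SchemePattern =
        new Regex(@"^([a-zA-Z][a-zA-Z0-9+.\-]*):");

    // Matches href="..." or src='...' (quoted values only; unquoted ones are skipped).
    private static readonly Regex AttributePattern = new Regex(
        @"\b(href|src)\s*=\s*([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the offline form of one link, or the link unchanged when it is not
    // an http(s) resource or has already been rewritten.
    public static string ToOffline(Uri pageUrl, string link)
    {
        string trimmed = link.Trim();
        if (trimmed.Length == 0 || trimmed.StartsWith("#")
            || trimmed.StartsWith(OfflineRoot, StringComparison.OrdinalIgnoreCase))
            return link; // empty, in-page anchor, or already offline

        if (trimmed.StartsWith("//"))
            trimmed = pageUrl.Scheme + ":" + trimmed; // protocol-relative URL

        Uri absolute;
        Match scheme = SchemePattern.Match(trimmed);
        if (scheme.Success)
        {
            string name = scheme.Groups[1].Value.ToLowerInvariant();
            if (name != "http" && name != "https")
                return link; // javascript:, mailto:, ...
            if (!Uri.TryCreate(trimmed, UriKind.Absolute, out absolute))
                return link;
        }
        else
        {
            // Parse explicitly as relative so "/path" is never taken for a file path.
            Uri relative;
            if (!Uri.TryCreate(trimmed, UriKind.Relative, out relative)
                || !Uri.TryCreate(pageUrl, relative, out absolute))
                return link;
        }
        return OfflineRoot + absolute.AbsoluteUri;
    }

    // Rewrites every quoted href/src value; whitespace around '=' is dropped.
    public static string RewriteAttributes(Uri pageUrl, string html)
    {
        return AttributePattern.Replace(html, m =>
            m.Groups[1].Value + "=" + m.Groups[2].Value
            + ToOffline(pageUrl, m.Groups[3].Value) + m.Groups[2].Value);
    }
}
```

For example, with a page URL of `http://qheps.health.qld.gov.au/index.asp`, the value `/_includes/images/but_srch_go.gif` becomes `http://localhost:3000/sites/http://qheps.health.qld.gov.au/_includes/images/but_srch_go.gif`, while `javascript:checksubmit(this);` is returned untouched.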
Thanks
PS. Some examples:
<script type="text/javascript">
<!--
document.write("<a href=\"/home.asp\" onMouseOver=\"MM_swapImage('tab_home','','/_includes/images/tab_home_.gif',1)\" onMouseOut=\"MM_swapImgRestore()\"><img src=\"/includes/images/tab_home.gif\" alt=\"Home\" name=\"tab_home\" width=\"45\" height=\"18\" border=\"0\" id=\"tab_home\"><\/a>");
if (window.document.location.pathname.indexOf("mysite.asp") != "-1") {
document.write("<a href=\"/mysite.asp\" onMouseOver=\"MM_swapImage('tab_my_site','','/_includes/images/tab_my_site_.gif',1)\" onMouseOut=\"MM_swapImgRestore()\"><img src=\"/_includes/images/tab_my_site_.gif\" alt=\"My Site\" name=\"tab_my_site\" width=\"76\" height=\"18\" border=\"0\" id=\"tab_my_site\"><\/a>");
}
else {
document.write("<a href=\"/mysite.asp\" onMouseOver=\"MM_swapImage('tab_my_site','','/_includes/images/tab_my_site_.gif',1)\" onMouseOut=\"MM_swapImgRestore()\"><img src=\"/_includes/images/tab_my_site.gif\" alt=\"My Site\" name=\"tab_my_site\" width=\"76\" height=\"18\" border=\"0\" id=\"tab_my_site\"><\/a>");
}
//-->
</script>
and
<script type="text/javascript">
var fo = new FlashObject("/homepage/ia/flash/hero/banner.swf?q=1", "hero", "642", "250", "8", "#ffffff");
fo.addParam("wmode", "transparent");
fo.addParam("allowScriptAccess", "always");
fo.addParam("base", "/homepage/ia/flash/hero/");
fo.write("flashContent");
</script>
and
<td width="1%">
<a href="javascript:checksubmit(this);"
onmouseover="MM_swapImage('but_srch_go','','/_includes/images/but_srch_go_.gif',1)"
onmouseout="MM_swapImgRestore()">
<img src="http://localhost:3000/sites/http://qheps.health.qld.gov.au/_includes/images/but_srch_go.gif" alt="Go" name="but_srch_go" width="57" height="40" border="0">
</a>
</td>
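In the first two examples the URLs happen to appear as literal text inside the script (`\"/home.asp\"`, `'/_includes/images/tab_home_.gif'`, `"/homepage/ia/flash/hero/banner.swf?q=1"`), so a purely textual pass over inline script bodies may be enough for cases like these, without running any JavaScript. The sketch below is a heuristic under that assumption: it prefixes every quoted value that starts with a single `/`, including the escaped `\"...\"` attribute values inside `document.write` strings, using the same hypothetical `http://localhost:3000/sites/` prefix seen in the third example. `ScriptLiteralRewriter` is a hypothetical name. It cannot see URLs assembled at run time (for example by string concatenation), which is where actually executing the script becomes necessary.

```csharp
using System;
using System.Text.RegularExpressions;

// Heuristic sketch (assumption): rewrite root-relative paths that appear right
// after a quote character in inline script text, e.g. "/home.asp",
// '/_includes/...', or the escaped \"/home.asp\" inside a document.write string.
public static class ScriptLiteralRewriter
{
    // A quote, then a single '/' (not "//", which would be protocol-relative),
    // then everything up to the next quote, backslash or whitespace.
    private static readonly Regex RootRelative =
        new Regex(@"([""'])/(?!/)([^""'\\\s]*)");

    // siteRoot: the original site, e.g. "http://qheps.health.qld.gov.au".
    // offlineRoot: the hypothetical local prefix, e.g. "http://localhost:3000/sites/".
    public static string Rewrite(string script, string offlineRoot, string siteRoot)
    {
        string prefix = offlineRoot + siteRoot.TrimEnd('/') + "/";
        // Keep the original quote character and append the rest of the path.
        return RootRelative.Replace(script, m =>
            m.Groups[1].Value + prefix + m.Groups[2].Value);
    }
}
```

Applied to `fo.addParam("base", "/homepage/ia/flash/hero/");`, this yields `fo.addParam("base", "http://localhost:3000/sites/http://qheps.health.qld.gov.au/homepage/ia/flash/hero/");`. Conditional branches such as the `indexOf("mysite.asp")` test are left in place, since both branches simply get their literal URLs rewritten.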
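If you're not using the WebBrowser control, you might be able to use the JScriptEvaluate method in JScript.NET, but chances are you'll need to evaluate more than just a simple expression: scripts like the ones above call document.write and read window.document.location, and JScript.NET provides no browser DOM. The WebBrowser control is certainly the easier route, since it runs the page's scripts against a real DOM.
If you are using the WebBrowser control, you can run script in the page from C# fairly easily by calling the window's execScript method late-bound through its DispID (1165), which evaluates a string of script much like eval: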
// Requires: using System; using System.Reflection; using System.Windows.Forms;

/// <summary>
/// Handles the Navigated event of the browser control.
/// </summary>
/// <param name="sender">The source of the event.</param>
/// <param name="e">The <see cref="T:WebBrowserNavigatedEventArgs"/> instance containing the
/// event data.</param>
private void browser_Navigated( object sender, WebBrowserNavigatedEventArgs e )
{
    string codeToEval = "window.alert('blah')";
    if ( browser.Document != null ) {
        // The underlying COM window object (IHTMLWindow2).
        object window = browser.Document.Window.DomWindow;
        if ( window != null ) {
            Type windowType = window.GetType();
            BindingFlags flags = BindingFlags.InvokeMethod | BindingFlags.Instance;
            // execScript arguments: the code to run and the script language.
            string[] args = { codeToEval, "JScript" };
            // Late-bound call through IDispatch using execScript's DispID.
            windowType.InvokeMember( "[DispID=1165]", flags, null, window, args );
        } // if
    } // if
}
There is a third option too: download the HTML pages as-is, without rewriting the URLs. Then, in the code that presents the HTML to the user, trap clicks on links, cancel the navigation, and navigate to the corresponding "offline" link instead.
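With the WebBrowser control, this interception can hang off its Navigating event: the handler sets `e.Cancel = true` and navigates to the offline address instead. Because it happens at navigation time, it also catches links that JavaScript generated, since by then they are ordinary URLs. The sketch below shows only the mapping, again assuming the hypothetical `http://localhost:3000/sites/` prefix (`OfflineNavigation` and `MapToOffline` are hypothetical names). The important detail is that an address that is already offline must map to `null`; otherwise the redirected navigation would raise Navigating again and be redirected forever. The event wiring is shown as a comment because it needs a WinForms form.

```csharp
using System;

public static class OfflineNavigation
{
    // Hypothetical local prefix under which the saved pages are served.
    public const string OfflineRoot = "http://localhost:3000/sites/";

    // Returns the offline address for an online http(s) URL, or null when the
    // navigation should proceed unchanged (already offline, javascript:, about:, ...).
    public static Uri MapToOffline(Uri requested)
    {
        if (requested == null)
            return null;
        if (requested.Scheme != Uri.UriSchemeHttp && requested.Scheme != Uri.UriSchemeHttps)
            return null;
        if (requested.AbsoluteUri.StartsWith(OfflineRoot, StringComparison.OrdinalIgnoreCase))
            return null; // already offline: redirecting again would loop forever
        return new Uri(OfflineRoot + requested.AbsoluteUri);
    }

    // Wiring inside the form (WinForms, not compiled here):
    //
    // browser.Navigating += delegate(object sender, WebBrowserNavigatingEventArgs e)
    // {
    //     Uri offline = OfflineNavigation.MapToOffline(e.Url);
    //     if (offline != null)
    //     {
    //         e.Cancel = true;           // stop the online request
    //         browser.Navigate(offline); // load the saved copy instead
    //     }
    // };
}
```

The trade-off compared with rewriting is that the saved HTML stays untouched, but the pages only work offline when viewed through this application.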