I'm retrieving the HTML of many webpages (saved earlier) from SQL Server. My goal is to modify an img's src attribute. There is only one img tag in the HTML, and its markup looks like this:
...
<td colspan="3" align="center">
<img src="/crossword/13cnum1.gif" height="360" width="360" border="1"><br></td>
...
I need to change the /crossword/13cnum1.gif to http://www.nostrotech.com/crossword/13cnum1.gif
Code:
private void ReplaceTest() {
String currentCode = string.Empty;
Cursor saveCursor = Cursor.Current;
try {
Cursor.Current = Cursors.WaitCursor;
foreach (WebData oneWebData in DataContext.DbContext.WebDatas.OrderBy(order => order.PuzzleDate)) {
if (oneWebData.Status == "Done" ) {
currentCode = oneWebData.Code;
#region Setup Agility
HtmlAgilityPack.HtmlDocument AgilityHtmlDocument = new HtmlAgilityPack.HtmlDocument {
OptionFixNestedTags = true
};
AgilityHtmlDocument.LoadHtml(oneWebData.PageData);
#endregion
#region Image and URL
var imageOnPage = from imgTags in AgilityHtmlDocument.DocumentNode.Descendants()
where imgTags.Name == "img" &&
imgTags.Attributes["height"] != null &&
imgTags.Attributes["width"] != null
select new {
Url = imgTags.Attributes["src"].Value,
tag = imgTags.Attributes["src"],
Text = imgTags.InnerText
};
if (imageOnPage == null) {
continue;
}
imageOnPage.FirstOrDefault().tag.Value = "http://www.nostrotech.com" + imageOnPage.FirstOrDefault().Url;
#endregion
}
}
}
catch (Exception ex) {
XtraMessageBox.Show(String.Format("Exception: " + currentCode + "!{0}Message: {1}{0}{0}Details:{0}{2}", Environment.NewLine, ex.Message, ex.StackTrace), Text, MessageBoxButtons.OK, MessageBoxIcon.Error);
}
finally {
Cursor.Current = saveCursor;
}
}
I need help as the markup is NOT updated this way and I need to store the modified markup back to the DB. Thanks.
XPath is much more concise than all this XLinq jargon, IMHO. Here is how to do it:
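HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myHtml); // assumes myHtml is the markup string; Load() expects a file path or stream
// SelectNodes returns null when nothing matches, so guard before iterating
HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src and @height and @width]");
if (images != null)
{
    foreach (HtmlNode img in images)
    {
        img.SetAttributeValue("src", "http://www.nostrotech.com" + img.GetAttributeValue("src", null));
    }
}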
This code searches for img tags that have src, height and width attributes. Then, it replaces the src attribute value.
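Since the question also asks how to get the modified markup back for storage, here is a minimal sketch of the full round trip: parse the string, rewrite the src, and serialize the document again through DocumentNode.OuterHtml. The class and method names (ImgSrcRewriter, PrefixImageSources) are made up for illustration, and the check for a leading "/" is an assumption about which src values should be prefixed.

```csharp
using System;
using HtmlAgilityPack;

static class ImgSrcRewriter
{
    // Hypothetical helper: returns the markup with root-relative img src
    // values prefixed by baseUrl. Not part of HtmlAgilityPack itself.
    public static string PrefixImageSources(string html, string baseUrl)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string, not from a file

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images =
            doc.DocumentNode.SelectNodes("//img[@src and @height and @width]");
        if (images == null)
        {
            return html; // nothing to change, hand back the input untouched
        }

        foreach (HtmlNode img in images)
        {
            string src = img.GetAttributeValue("src", string.Empty);

            // Assumption: only root-relative paths such as "/crossword/x.gif"
            // need the prefix; absolute and protocol-relative URLs are skipped.
            bool rootRelative = src.StartsWith("/", StringComparison.Ordinal)
                                && !src.StartsWith("//", StringComparison.Ordinal);
            if (rootRelative)
            {
                img.SetAttributeValue("src", baseUrl + src);
            }
        }

        // The edits live in the in-memory document; serialize it to get them back.
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        string html =
            "<td colspan=\"3\" align=\"center\">" +
            "<img src=\"/crossword/13cnum1.gif\" height=\"360\" width=\"360\" border=\"1\"><br></td>";

        Console.WriteLine(PrefixImageSources(html, "http://www.nostrotech.com"));
    }
}
```

In the loop from the question, the returned string would then be assigned back to oneWebData.PageData and persisted with whatever save or submit method the data context exposes. That write-back is the step missing from the original code: changing the attribute only updates the in-memory HtmlDocument.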