Parse controls in an aspx file and convert it to xml

Developer https://www.devze.com 2023-01-02 00:36 Source: web
I need to parse an aspx file (the file on disk, not the page rendered in the browser), make a list of all the server-side ASP.NET controls present on the page, and then create an XML file from that list. What would be the best way to do it? Also, are there any libraries available for this?

For example, if my aspx file contains

<asp:label ID="lbl1" runat="server" Text="Hi"></asp:label>

my xml file would be

<controls>
  <ID>lbl1</ID>
  <runat>server</runat>
  <Text>Hi</Text>
</controls>


XML parsers won't understand the ASP.NET directives and code blocks: <%@, <%=, etc.

You're probably best off using regular expressions for this, likely in three stages:

  1. Match any tag elements from the entire page.
  2. For each tag, match the tag prefix and control type.
  3. For each tag that matches (2), match any attributes.

So, starting from the top, we can use the following regex:

(?<tag><[^%/](?:.*?)>)

This will match any tag that doesn't start with <% or </, and it does so lazily (a greedy expression would run past the first closing > and swallow everything up to the last one on the line). The following could be matched:

<asp:Content ID="ph_PageContent" ContentPlaceHolderID="ph_MainContent" runat="server">
<asp:Image runat="server" />
<img src="/test.png" />
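
As a minimal sketch of this first stage, the pattern can be wrapped in a small helper. The class and method names (TagScanner, FindTags) are hypothetical and only serve the illustration; the regex itself is the one shown above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagScanner
{
    // Stage 1 pattern from the text: '<' not followed by '%' or '/',
    // then a lazy run of characters up to the first '>'.
    private static readonly Regex TagExpr = new Regex(
        "(?<tag><[^%/](?:.*?)>)",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // Returns every opening (or self-closing) tag found in the markup,
    // skipping <% ... %> blocks and closing tags.
    public static List<string> FindTags(string markup)
    {
        var tags = new List<string>();
        foreach (Match match in TagExpr.Matches(markup))
        {
            tags.Add(match.Groups["tag"].Value);
        }
        return tags;
    }
}
```

Calling TagScanner.FindTags on a page returns both server tags and plain HTML tags; the directive line and closing tags are skipped. Because . does not match a newline by default, a tag that spans several lines is not picked up by this version.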

For each of those captured tags, we want to then extract the tag and type:

<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)

Named capture groups make it easy to extract the tag prefix and the type. This expression only matches server tags (those with a prefix:type name), so standard HTML tags are dropped at this point.

<asp:Content ID="ph_PageContent" ContentPlaceHolderID="ph_MainContent" runat="server">

Will yield:

{ tag = "asp", type = "Content" }
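
The second stage can be sketched in the same way. ServerTagParser and TryParse are hypothetical names, and the digit class is written as 0-9 here so that names containing a zero are accepted too.

```csharp
using System.Text.RegularExpressions;

public static class ServerTagParser
{
    // Stage 2 pattern: '<', a prefix, ':', then the control type.
    private static readonly Regex ServerTagExpr = new Regex(
        "<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // Returns true and fills in prefix/type when tagText is a prefixed
    // server tag; returns false (with empty strings) for plain HTML tags.
    public static bool TryParse(string tagText, out string prefix, out string type)
    {
        Match match = ServerTagExpr.Match(tagText);
        prefix = match.Success ? match.Groups["tag"].Value : string.Empty;
        type = match.Success ? match.Groups["type"].Value : string.Empty;
        return match.Success;
    }
}
```

A plain tag such as <img src="/test.png" /> has no prefix:type pair, so TryParse returns false for it and the caller can skip it.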

With that same tag, we can then match any attributes:

(?<name>\S+)=["']?(?<value>(?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

Which yields:

{ name = "ID", value = "ph_PageContent" },
{ name = "ContentPlaceHolderID", value = "ph_MainContent" },
{ name = "runat", value = "server" }
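
A small sketch of the third stage, assuming the tag text has already been isolated by the first expression. AttributeReader and ReadAttributes are hypothetical names; the pattern is the attribute expression shown above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttributeReader
{
    // Stage 3 pattern: a name, '=', an optional quote, then the value up to
    // the closing quote, the next attribute, or the end of the tag.
    private static readonly Regex AttributeExpr = new Regex(
        "(?<name>\\S+)=[\"']?(?<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // Returns the attributes of a single tag as name/value pairs, in document order.
    public static List<KeyValuePair<string, string>> ReadAttributes(string tagText)
    {
        var attributes = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributeExpr.Matches(tagText))
        {
            attributes.Add(new KeyValuePair<string, string>(
                match.Groups["name"].Value,
                match.Groups["value"].Value));
        }
        return attributes;
    }
}
```

A list of pairs is used rather than a dictionary so that the original attribute order is preserved and a duplicated attribute name does not throw.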

So putting that all together, we can create a quick function that can create an XmlDocument for us:

// Requires: using System; using System.Text.RegularExpressions; using System.Xml;
public XmlDocument CreateDocumentFromMarkup(string content)
{
  if (string.IsNullOrEmpty(content))
    throw new ArgumentException("'content' must have a value.", "content");

  RegexOptions options = RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.IgnoreCase;
  Regex tagExpr = new Regex("(?<tag><[^%/](?:.*?)>)", options);
  Regex serverTagExpr = new Regex("<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)", options);
  Regex attributeExpr = new Regex("(?<name>\\S+)=[\"']?(?<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?", options);

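  XmlDocument document = new XmlDocument();
  XmlElement root = document.CreateElement("controls");
  document.AppendChild(root); // attach the root, otherwise the returned document is empty

  // The lambda parameter is named 'doc' so that it does not clash with the local 'document'.
  Func<XmlDocument, string, string, XmlElement> creator = (doc, name, value) => {
    XmlElement element = doc.CreateElement(name);
    element.InnerText = value;

    return element;
  };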

  foreach (Match tagMatch in tagExpr.Matches(content)) {
    Match serverTagMatch = serverTagExpr.Match(tagMatch.Value);

    if (serverTagMatch.Success) {
      XmlElement controlElement = document.CreateElement("control");

      controlElement.AppendChild(
        creator(document, "tag", serverTagMatch.Groups["tag"].Value));
      controlElement.AppendChild(
        creator(document, "type", serverTagMatch.Groups["type"].Value));


      XmlElement attributeElement = document.CreateElement("attributes");

      foreach (Match attributeMatch in attributeExpr.Matches(tagMatch.Value)) {
        if (attributeMatch.Success) {
          attributeElement.AppendChild(
            creator(document, attributeMatch.Groups["name"].Value, attributeMatch.Groups["value"].Value));
        }
      }

      controlElement.AppendChild(attributeElement);
      root.AppendChild(controlElement);
    }
  }  

  return document;
}

The resultant document could look like this:

<controls>
  <control>
    <tag>asp</tag>
    <type>Content</type>
    <attributes>
      <ID>ph_PageContent</ID>
      <ContentPlaceHolderID>ph_MainContent</ContentPlaceHolderID>
      <runat>server</runat>
    </attributes>
  </control>
</controls>

Hope that helps!


I used the three regular expressions below with the code above, and they match HTML tags as well. They also let me capture the text between the opening and closing tags.

Regex tagExpr = new Regex("(?<tag><[^%/](?:.*?)>[^/<]*)", options);
Regex serverTagExpr = new Regex("<(?<type>[a-z][a-z0-9:]*)[^>/]*(?:/>|[>/])(?<value>[^</]*)", options);
Regex attributeExpr = new Regex("(?<name>\\S+)=[\"']?(?<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?", options);


Func<XmlDocument, string, string, XmlElement> creator = (doc, name, value) => {
  XmlElement element = doc.CreateElement(name);
  element.InnerText = value;

  return element;
};

The generic Func delegate above only works on .NET 3.5 and later, so if you are using an earlier version, create a method like this instead:

public XmlElement creator(XmlDocument document, string name, string value)
{
    XmlElement element = document.CreateElement(name);
    element.InnerText = value;

    return element;
}

This will work.


If your ASPX files are well-formed XML (which means they contain no <%@ ... %> directives or <% ... %> blocks, or you strip those first), XSLT may be a good solution. The W3Schools site has a good introduction and reference. You could then call this XSLT from a simple program that picks the required file(s).

Alternatively, you could use LINQ to XML to load the ASPX file(s) and iterate over the controls in a LINQ style.
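
A rough sketch of the LINQ to XML route, under two assumptions that are not part of the answer: the <% ... %> blocks are stripped first, and the remaining markup is well-formed XML (matching tag case, quoted attributes, no HTML entities such as &nbsp;). The names LinqControlLister and ListControls, and the namespace URI, are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class LinqControlLister
{
    // Hypothetical namespace URI; it exists only so that the XML parser
    // accepts the "asp" prefix.
    private static readonly XNamespace Asp = "urn:aspnet-controls";

    public static XElement ListControls(string markup)
    {
        // Drop <% ... %> directives and code blocks, which are not XML.
        string cleaned = Regex.Replace(markup, "<%.*?%>", "", RegexOptions.Singleline);

        // Wrap the page in a root element that declares the asp prefix.
        XElement wrapper = XElement.Parse(
            "<root xmlns:asp=\"" + Asp.NamespaceName + "\">" + cleaned + "</root>");

        // Keep only elements in the asp namespace and copy their attributes.
        IEnumerable<XElement> controls = wrapper.Descendants()
            .Where(e => e.Name.Namespace == Asp)
            .Select(e => new XElement("control",
                new XElement("type", e.Name.LocalName),
                new XElement("attributes",
                    e.Attributes().Select(a => new XElement(a.Name.LocalName, a.Value)))));

        return new XElement("controls", controls);
    }
}
```

XElement.Parse throws an XmlException on markup that is not well-formed, which in practice limits this approach to tidy pages; the regular-expression approach is more forgiving.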


If the markup for a tag is written across more than one line, extracting the tag data can fail. To avoid that, I collapse the tab and newline characters in the source string (content) before passing it to the function above:

// Replace with a space rather than nothing, so attributes on separate lines do not run together.
string contentRemovedNewLines = Regex.Replace(content, @"[\t\r\n]+", " ");

Then we can use contentRemovedNewLines instead of content.
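
Another option, offered here only as a sketch and not something the answers above use, is to leave the source untouched and add RegexOptions.Singleline to the tag expression so that . also matches newlines. MultiLineTagScanner and FindTags are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultiLineTagScanner
{
    // Same stage 1 pattern, but with Singleline so '.' matches '\n' and a
    // tag that is split across several lines is captured as one match.
    private static readonly Regex TagExpr = new Regex(
        "(?<tag><[^%/](?:.*?)>)",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> FindTags(string markup)
    {
        var tags = new List<string>();
        foreach (Match match in TagExpr.Matches(markup))
        {
            tags.Add(match.Groups["tag"].Value);
        }
        return tags;
    }
}
```

The captured tag text then still contains the line breaks, which the attribute expression tolerates because it separates attributes with \s+.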

The code above works as I wanted. One more thing can be added: call the method as shown below and save the result as an XML file, so that you can check whether it contains the expected output.

XmlDocument xmlDocWithWebContent = CreateDocumentFromMarkup(sourceToRead);

string xmlfileLocation = Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "tempXmlOutputFileOfWebSource.xml");

xmlDocWithWebContent.Save(xmlfileLocation);

To do that, the XML document must have a root element attached to it:

XmlDocument document = new XmlDocument();
XmlDeclaration declaration = document.CreateXmlDeclaration("1.0", "utf-8", null);
document.AppendChild(declaration);
XmlElement root = document.CreateElement("controls");
document.AppendChild(root);

I used the fix above for that.
