I would really appreciate any help with the following problem:
I'm processing large amounts of XML data using PowerShell. The XML is stored in .txt files; my PowerShell script reads each file and writes its content into the database.
I would like to filter out some XML nodes if they do not have a proper "signatureNumber" (verified either by length or, preferably, with a regular expression).
Below is the XML structure:
<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
<Obj RefId="0">
<TN RefId="0">
<T>WebServiceProxy.TestOutputElement</T>
<T>System.Object</T>
</TN>
<ToString>WebServiceProxy.TestOutputElement</ToString>
<Props>
<DT N="declarationDate">2011-08-29T10:28:17</DT>
<B N="declarationDateSpecified">true</B>
<Nil N="testDate" />
<B N="testDateSpecified">true</B>
<S N="XMLdocument"><?xml S>
<I32 N="id">1359569</I32>
<B N="idSpecified">true</B>
<I32 N="decisionCode">5</I32>
<B N="decisionCodeSpecified">true</B>
<S N="documentStatus">issued</S>
<S N="incidentSignature">Nc-e 491993/11</S>
<S N="signatureNumber">11111111111/222222/33</S> <----- signature length (21) is OK! We want the whole <Obj>
</Props>
</Obj>
<Obj RefId="1">
<TNRef RefId="0" />
<ToString>WebServiceProxy.TestOutputElement</ToString>
<Props>
<DT N="declarationDate">2011-08-29T10:28:18</DT>
<B N="declarationDateSpecified">true</B>
<Nil N="testDate" />
<B N="testDateSpecified">true</B>
<S N="XMLdocument"><?xml D__x000A_</S>
<I32 N="id">1359570</I32>
<B N="idSpecified">true</B>
<I32 N="decisionCode">5</I32>
<B N="decisionCodeSpecified">true</B>
<S N="documentStatus">issued</S>
<S N="incidentSignature">Nc-e 491923/11</S>
<S N="signatureNumber">test</S> <----- wrong signature! <Obj> should be filtered out!
</Props>
</Obj>
The content is read in loops using code similar to this:
$filedata = Get-Content ("C:\EXPORT\MyData"+$pageNumber+".txt")
Right after each file is read, the XML is written to the database:
$Command.CommandText = "INSERT INTO dbo.ImportXml (MethodName,XmlData) VALUES ('"+$methodName+"','"+ $filedata+ "')"
$Command.ExecuteNonQuery() >> $log_message
The goal is to filter out all <Obj> elements from the $filedata variable if they contain a "signatureNumber" whose length is not 21. Everything must be done before the INSERT.
I would really appreciate any advice!
UPDATE: Just to clarify: in my example, <Obj RefId="0"> is OK and should be inserted, and <Obj RefId="1"> should be completely removed from the XML.
Since you are loading the XML into the database as text, I think you will have to resort to some ugly regex:
$filedata = [System.IO.File]::ReadAllText("C:\EXPORT\MyData"+$pageNumber+".txt")
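# \b keeps "<Obj" from also matching the root "<Objs" element
$re=[regex] '(?s)<Obj\b.*?<S N="signatureNumber">(.*?)</S>.*?</Obj>'
$m = $re.Matches($filedata)
# remove every <Obj> block whose signature is not exactly 21 characters long
$m | ?{ $_.Groups[1].Value.Length -ne 21} | %{ $filedata = $filedata.Replace($_.Value,"") }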
$filedata
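Since the question prefers validating the signature with a regular expression rather than by length, here is a minimal sketch of the same idea with a format check. The pattern for a valid signature (11 digits, a slash, 6 digits, a slash, 2 digits) is only an assumption inferred from the single valid example in the question. The names $validSignature, $objPattern and $filtered, and the inline sample data, are made up for illustration. It also assumes that <Obj> elements are not nested inside each other.

```powershell
# Hypothetical sample data; real input would come from ReadAllText as above.
$filedata = @'
<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
<Obj RefId="0"><Props><S N="signatureNumber">11111111111/222222/33</S></Props></Obj>
<Obj RefId="1"><Props><S N="signatureNumber">test</S></Props></Obj>
</Objs>
'@

# ASSUMPTION: a valid signature is 11 digits, "/", 6 digits, "/", 2 digits.
$validSignature = [regex] '^\d{11}/\d{6}/\d{2}$'

# \b stops "<Obj" from matching the root "<Objs"; the negative lookahead keeps
# the search for the signature from running past the current </Obj>.
$objPattern = [regex] '(?s)<Obj\b(?:(?!</Obj>).)*?<S N="signatureNumber">(.*?)</S>.*?</Obj>'

# Keep a block when its signature is valid, otherwise replace it with nothing.
$filtered = $objPattern.Replace($filedata, [System.Text.RegularExpressions.MatchEvaluator] {
    param($match)
    if ($validSignature.IsMatch($match.Groups[1].Value)) { $match.Value } else { '' }
})
$filtered
```

Doing the replacement in a single pass with a match evaluator avoids calling Replace on the whole string once per rejected block, which may matter for large files.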
If you were using the XML in Powershell, I would have suggested something like this:
$fileXml = [xml]$filedata
$filedata = foreach ($obj in $fileXml.Objs.Obj){
    $obj.Props.S | ?{ $_.N -eq "signatureNumber"} | %{if( $_."#text".length -eq 21) {$obj}}
}
$filedata
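The loop above returns XML element objects rather than text. If a string is still needed for the INSERT, one possible variation is to remove the rejected nodes from the document and serialize what is left. This is only a sketch: it assumes the real files are well-formed XML, the sample data is made up, and $filtered is a hypothetical variable name.

```powershell
# Hypothetical sample data standing in for the content of one export file.
$filedata = @'
<Objs xmlns="http://schemas.microsoft.com/powershell/2004/04" Version="1.1.0.1">
<Obj RefId="0"><Props><S N="signatureNumber">11111111111/222222/33</S></Props></Obj>
<Obj RefId="1"><Props><S N="signatureNumber">test</S></Props></Obj>
</Objs>
'@
$fileXml = [xml]$filedata

# Copy the nodes into an array first, so that removing nodes
# does not disturb the enumeration of the live child list.
foreach ($obj in @($fileXml.Objs.Obj)) {
    $signature = $obj.Props.S | Where-Object { $_.N -eq 'signatureNumber' }
    # Drop the <Obj> when the signature is missing or not 21 characters long.
    if ($null -eq $signature -or $signature.'#text'.Length -ne 21) {
        [void]$obj.ParentNode.RemoveChild($obj)
    }
}

# Serialize the remaining document back to a string for the database.
$filtered = $fileXml.OuterXml
$filtered
```

Note that the [xml] cast does not preserve insignificant whitespace, so the serialized string will not have the original line breaks between elements.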