Creating a parser in Perl that extracts XML tags from source code?

Developer https://www.devze.com 2023-04-06 13:00 Source: Internet
I have to extract XML comments from C code. I tried using a Perl regexp, but I am unable to extract the comments. Can anyone help me? My code is shown below.

   Dima_chkTimeValidation(&dacl_ts_pumpPWMLowNoDos_str,
                       &dacl_ti_pumpPWMLowNoDos_U16,
                       ti_valid_U16,
                       ti_inval_U16,
                       (tB)(dacl_r_pumpPwmResidualFilt_S16 < r_testlimit_S16),
                       (tB)((testCond_B == TRUE) && (dosingActive_B == FALSE)),
                       TRUE);
  /*****************************************/
  /*xml comments*/
  /****************************************/

 <DTC>
  <TroubleCode>1101</TroubleCode> 
  <Classification>FAULT</Classification> 
  <SelfHealing>No selfhealing</SelfHealing> 
  <WarningLamp>No Warning Lamp</WarningLamp> 
  <DirectDegradation>No Action</DirectDegradation> 
  <Order>PRIMARY</Order> 
   </DTC>
     /*******************************/
  /* Dosing clogg test           */
  /*******************************/
  /* special test when run i sequence test mode SMHD_DOSVALVE_E */
  if ((s_seqTestCtrlStatus_E == SMHD_RUNNING_E) && (s_seqTestMainState_SMHD_DOSVALVE_E))
  {
    /* Use result from DDOS test */
    Dima_chkValidation(&dacl_ts_pumpPWMLowDos_str,
                       (tB)(s_dosValveTest_E == SMHD_TESTFAILED_E),
                       (tB)(s_dosValveTest_E != SMHD_TESTNOTFINISHED_E));
   }

As shown above, there are many lines of C code before and after the XML comments; I have posted only a small part of it. I added some comments to the C code and need to extract them exactly as they are. Can anybody help me extract them using Perl?


Your data is bizarre, to say the least. I'm making two assumptions here: the ' is the starting delimiter of the example string, and you want to extract the stuff between the angle brackets (which are neither XML nor XML comments according to, you know, the standard). No guarantee against misparsing embedded C code.

use 5.010;
use Data::Dumper qw(Dumper);

say Dumper \%+ while
'<dtcnumber>1223<dtcnumber>
 <discription>battery short circuited<discription>
   <cause>due to unproper connections<cause>
  main();
  {
   ..........
   ...
   c code.
   ...
    };' =~ /<(?<key>[^>]+)>(?<value>[^<]+)<\g{key}>/g;

Output

$VAR1 = {
          'value' => '1223',
          'key' => 'dtcnumber'
        };

$VAR1 = {
          'value' => 'battery short circuited',
          'key' => 'discription'
        };

$VAR1 = {
          'value' => 'due to unproper connections',
          'key' => 'cause'
        };
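
The sample in the question closes its elements with a slash (for example </TroubleCode>), which the pattern above does not expect. A minimal sketch of the same named-capture idea adapted to that form follows. The sub name extract_pairs and the sample string are made up for illustration, and it assumes simple elements whose content contains no further tags; it is not a real XML parser.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: returns a list of [tag, text] pairs for every
# <tag>text</tag> element whose content contains no nested tags.
sub extract_pairs {
    my ($source) = @_;
    my @pairs;
    # key:   tag name without spaces, slashes or angle brackets
    # value: text up to the next '<', with surrounding whitespace trimmed
    while ($source =~ m!<(?<key>[^<>/\s]+)>\s*(?<value>[^<]*?)\s*</\g{key}>!g) {
        push @pairs, [ $+{key}, $+{value} ];
    }
    return @pairs;
}

# Made-up input: a little C code around a DTC block
my $sample = <<'END';
  TRUE);
 <DTC>
  <TroubleCode>1101</TroubleCode>
  <Classification>FAULT</Classification>
 </DTC>
  if ((a < b) && (c > d))
END

say "$_->[0] = $_->[1]" for extract_pairs($sample);
```

Requiring a non-space character right after "<" keeps C comparisons such as a < b from being picked up, but comparisons written without spaces could still be misparsed.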


It's not a good idea to write the whole code for you, but I am doing it anyway so that you can get an idea of how to approach this kind of problem.

Here I am giving you the simplest approach (it might not be the most efficient).

1. Keep your input data simple and make your life simpler. Choose a distinctive pattern that your code can use to identify the beginning and end of the XML.

Dima_chkTimeValidation(&dacl_ts_pumpPWMLowNoDos_str,
                       &dacl_ti_pumpPWMLowNoDos_U16,
                       ti_valid_U16,
                       ti_inval_U16,
                       (tB)(dacl_r_pumpPwmResidualFilt_S16 <  r_testlimit_S16),
                       (tB)((testCond_B == TRUE) && (dosingActive_B == FALSE)),
                       TRUE);
  /*****************************************/

  /*[[[ Start XML  

 < DTC >
  < TroubleCode > 1101 < /TroubleCode > 
  < Classification > FAULT < /Classification > 
  < SelfHealing > No selfhealing < /SelfHealing > 
  < WarningLamp > No Warning Lamp < /WarningLamp > 
  < DirectDegradation > No Action < /DirectDegradation > 
  < Order > PRIMARY < /Order > 
   < /DTC >

   End XML]]]*/

  /*******************************/


  /* special test when run i sequence test mode SMHD_DOSVALVE_E */
  if ((s_seqTestCtrlStatus_E == SMHD_RUNNING_E) && (s_seqTestMainState_SMHD_DOSVALVE_E))
  {
    /* Use result from DDOS test */
    Dima_chkValidation(&dacl_ts_pumpPWMLowDos_str,
                       (tB)(s_dosValveTest_E == SMHD_TESTFAILED_E),
                       (tB)(s_dosValveTest_E != SMHD_TESTNOTFINISHED_E));
   }

Here you can see the markers I have used to detect the start and the end of the XML.

2. Next is the code. I have tried to write it in as C-like a way as possible, except for the regex.

#!/usr/bin/perl
use strict;
use warnings;

## Three-argument open with a lexical filehandle
open(my $fh, '<', 'Code.cpp') or die "unable to open file: $!\n";

my $start_xml = 0 ; ## 0 indicates false condition, i.e. either XML not started or XML ended
                    ## 1 means XML has started.

while (my $line = <$fh>) {

        chomp($line);

        ## Handling only single-line comments

        if ($start_xml == 0 && $line =~ m/\[\[\[\s*start\s*xml/i) {  ## Check if start xml pattern found

                $start_xml = 1;
                next;     ## equivalent to continue in C
        }

        if ($start_xml == 1 && $line =~ m/end\s*xml\s*\]\]\]/i) {  ## Check if end xml pattern found

                $start_xml = 0;
                last;   ## equivalent to break in C; only the first XML block is read
        }

        ## Between the start and end markers you already know the data is XML,
        ## so this match is not strictly necessary, but in some cases you might need it.
        ## It accepts opening and closing tags, with or without spaces inside the brackets.
        ## You can add further characters to the class if they may occur.
        if ($start_xml == 1 && $line =~ m/<\s*\/?\s*[-a-z0-9 &!\@]+\s*>/i) {

                print "$line\n";  ## I am printing it out, but you may write it to a file
        }
}
close $fh;

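NOTE: The real tags have no space after "<" or before ">". The spaces shown inside the angle brackets in the sample text (and any that appear inside "<...>" in the code) are only there so that the tags display on this page, so remove them before running the code.

The marker pattern used to detect the XML is borrowed from Python cog :)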
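
If the source file cannot be changed to add start and end markers, a similar line-by-line scan can key on the outermost element itself. Here is a rough sketch, assuming every block opens on a line containing <DTC> and closes on a line containing </DTC>, as in the question's sample. The sub name xml_block_lines and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line from one containing <DTC>
# through the next one containing </DTC>, inclusive, for each block.
sub xml_block_lines {
    my (@lines) = @_;
    my @kept;
    for (@lines) {
        # The range (flip-flop) operator turns true on a line matching the
        # left pattern and stays true through the line matching the right one.
        push @kept, $_ if /<DTC>/ .. /<\/DTC>/;
    }
    return @kept;
}

# Made-up input resembling the question's file
my @code = (
    "  TRUE);\n",
    " <DTC>\n",
    "  <TroubleCode>1101</TroubleCode>\n",
    "   </DTC>\n",
    "  /* Dosing clogg test */\n",
);

print xml_block_lines(@code);
```

Unlike the marker version above, this keeps scanning after the first block, so several DTC blocks in one file are all collected. The flip-flop keeps its state between calls, so a block left unclosed at the end of one input would carry over into the next call.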
