How do I edit an XML file with Perl?

Developer https://www.devze.com 2022-12-31 17:51 Source: web
I have a movie collection catalogue with local links to folders and files for easy access. I recently reorganized my entire hard disk, so I need to update the links, and I'm trying to do that automatically with Perl.

I can export the data to an XML file and import it again. I can extract the new file paths with File::Find, but I'm stuck on two problems. First, I have no idea how to connect the $title from the new filepath with the corresponding $title from the XML file. Second, this is my first time dealing with such files and I don't know how to proceed with the replacement. Here is what I've done so far:

use strict; 
use warnings; 
use File::Basename;
use File::Find; 
use File::Spec;
use XML::Simple;
use Data::Dumper;



my $dir_target = 'D:/Movies/';
my %titles_locations = ();

find(\&file_handler, $dir_target);
sub file_handler {
   /\.iso$/ or return;       

   my $fn = $File::Find::name;
   $fn =~ s/\//\\/g;
   $fn =~ /(.*\\)(.*)/;
   my $path = $1;
   my $filename = $2;

   my $title = (File::Spec->splitdir($fn))[2];
   $title =~ s/(.*?)\s\(\d+\)$/$1/;
   $title =~ s/~/:/;
   $title =~ s/`/?/;

   my $link_local = '<link><description>Folder</description><url>'.$path.'</url><urltype>Movie</urltype></link><link><description>'.$filename.'</description><url>'.$fn.'</url><urltype>Movie</urltype></link>' unless $title eq '';

   $titles_locations{$title} = {'filename'=>$filename, 'path'=>$path };
}

   my $xml_in = XMLin('somepath/test.xml', ForceArray => 1, KeepRoot => 1);

   my $title = {'key1' => 'title', 'key2' => 'links'};

   foreach my $link (keys %$title) {
   }

   print Data::Dumper->Dump([$title]);

   my $xml_out = XMLout($xml_in, OutputFile => 'somepath/test_out.xml', KeepRoot=>1);       

And here is a snippet of the data I need to edit. If an IMDB or DVD Empire link is found, it should not be touched. If local links are found, they should be replaced; otherwise they should be inserted. I'm willing to complete the code myself but need some direction on how to proceed. Thanks.

<title>$title</title>
.......

<links>
<link>
<description>IMDB</description> 
<url>http://www.imdb.com/title/VARIABLE</url> 
<urltype>URL</urltype> 
</link>
<link>
<description>DVD Empire</description> 
<url>http://www.dvdempire.com/VARIABLE</url> 
<urltype>URL</urltype> 
</link>
<link>
<description>Folder</description>
<url>OLD_FOLDERPATH</url>
<urltype>Movie</urltype>
</link>
<link>
<description>OLD_FILENAME</description>
<url>OLD_FILENAMEPATH</url>
<urltype>Movie</urltype>
</link>
</links>


Get rid of XML::Simple and use XML::Twig which is made just for this sort of task. The traversal and element operations are built into Twig. There is a lot less to think about when Twig does most of the work.

As far as connecting old paths to new paths, there's not much to go on with the data that you have. If they are the same filenames but in different folders, that could be the way that you match up the new and old paths if they are unique filenames. Here's everything except getting all of the new paths to populate %new_paths:

#!perl

use strict;
use warnings;

use File::Basename qw(basename);
use XML::Twig;

my %new_paths = (
         # filename => new_path
         ...
         ); 

my $twig = XML::Twig->new(
    twig_handlers => 
      {
      link   => \&rewrite_link,
      },
    pretty_print => 'indented',
    );

$twig->parse( *DATA );
$twig->flush;

sub rewrite_link
    {
    my( $link ) = $_;

    return unless $link->field( 'urltype' ) eq 'Movie';

    # this is from the old file
    my $basename = basename( $link->field( 'url' ) );

    unless( exists $new_paths{ $basename } )
        {
        warn "Didn't find a new location for $basename!\n";
        return;
        }

    $link->first_child( 'url' )->set_text( $new_paths{ $basename } );
    }

__END__
<titles>
<entry>
    <title>$title</title>
    <links>
        <link>
            <description>IMDB</description> 
            <url>http://www.imdb.com/title/VARIABLE</url> 
            <urltype>URL</urltype> 
        </link>
        <link>
            <description>DVD Empire</description> 
            <url>http://www.dvdempire.com/VARIABLE</url> 
            <urltype>URL</urltype> 
        </link>
        <link>
            <description>Folder</description>
            <url>OLD_FOLDERPATH</url>
            <urltype>Movie</urltype>
        </link>
        <link>
            <description>OLD_FILENAME</description>
            <url>OLD_FILENAMEPATH</url>
            <urltype>Movie</urltype>
        </link>
    </links>
</entry>
</titles>
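
The missing piece, populating %new_paths, could be sketched as follows. This assumes that every .iso filename is unique across the new directory tree, which is the same assumption the basename matching above relies on. collect_new_paths is a hypothetical helper name, and D:/Movies is the directory from the question; neither comes from the answer itself.

```perl
use strict;
use warnings;
use File::Find;

# Walk $root and build a hash of filename => full new path for
# every .iso file. Warns and keeps the first hit when two files
# share a name, because the basename is then ambiguous as a key.
sub collect_new_paths {
    my ($root) = @_;
    my %new_paths;
    find(
        sub {
            # Inside the callback, $_ is the bare filename.
            return unless -f $_ && /\.iso\z/i;
            if ( exists $new_paths{$_} ) {
                warn "Duplicate filename $_, keeping $new_paths{$_}\n";
                return;
            }
            $new_paths{$_} = $File::Find::name;
        },
        $root,
    );
    return \%new_paths;
}

# Assumption: the reorganized collection lives under D:/Movies,
# unless another directory is given on the command line.
my $root      = shift(@ARGV) // 'D:/Movies';
my $new_paths = collect_new_paths($root);
printf "%s => %s\n", $_, $new_paths->{$_} for sort keys %$new_paths;
```

File::Find reports paths with forward slashes, so if the catalogue expects backslashes the values would still need the s/\//\\/g substitution from the question. The Folder link is a separate case: its URL is a directory, so it would probably need a second lookup keyed by folder name rather than by filename.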


I'll provide a plausible approach - please comment if you'd like it fleshed out more.

  1. Declare a hash my %titles_locations = (); at the beginning.

  2. You should move your XML handling out of sub a (and please call it something readable, like sub file_handler :)

    What the file handler should do is:

    • Build the $title and $link_local as you do now

    • Store them in a %titles_locations hash with $title being the key and the value a hashref containing {'filename'=>$filename, 'path'=>$path }

  3. Now, in your code, after calling find(), you will call XMLin. $xml_in should become an array of hashrefs (or a hashref mapping your "root" key to an array of hashrefs). Each hashref in the array will represent 1 title.

  4. After that, you will loop over that arrayref of titles.

    Each element (call it $title) of the arrayref will be a hashref with 2 keys, "title" and "links".

    From the value of the "title" key, find the new path and filename from %titles_locations hash.

    The value of "links" key will be a hashref mapping "link" to an array of hashrefs. I won't bother detailing the data structure here but it's trivial to see it by printing Data::Dumper->Dump([$title]);

    You will then loop over those link hashrefs. For each of them (call it $link):

    • If $link->{urltype} ne "Movie", leave it alone (next;)
    • If $link->{description} eq "Folder", replace the $link->{url} value with new path you found from %titles_locations hash.
    • Else, it's a file, replace the $link->{url} value with new filepath you found from %titles_locations hash.

    Maybe add some error handling in case $title is not in the %titles_locations hash.

  5. After all the looping is done, then simply take your $xml_in (that now contains updated info) and pass to XMLout()

DONE
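
The loop in step 4 might look roughly like the sketch below. It works on a hand-written data structure instead of calling XMLin, so the shape shown is an assumption: with ForceArray => 1, as in the question, every child element should come back as an array ref, which means text values sit at index 0 (for example $link->{urltype}[0] rather than $link->{urltype}). Print your real structure with Data::Dumper before relying on it. update_links and the sample movie data are hypothetical names for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample of what the file handler would have stored.
my %titles_locations = (
    'Some Movie' => {
        filename => 'movie.iso',
        path     => 'D:\\Movies\\Some Movie (1999)\\',
    },
);

# Assumed shape of one title entry after XMLin(..., ForceArray => 1).
my $entry = {
    title => ['Some Movie'],
    links => [ { link => [
        { description => ['IMDB'],
          url         => ['http://www.imdb.com/title/VARIABLE'],
          urltype     => ['URL'] },
        { description => ['Folder'],
          url         => ['OLD_FOLDERPATH'],
          urltype     => ['Movie'] },
        { description => ['OLD_FILENAME'],
          url         => ['OLD_FILENAMEPATH'],
          urltype     => ['Movie'] },
    ] } ],
};

# Rewrite the local links of one entry in place.
# Returns 1 on success, 0 if the title has no known new location.
sub update_links {
    my ( $entry, $locations ) = @_;
    my $name = $entry->{title}[0] // '';
    my $new  = $locations->{$name};
    unless ($new) {
        warn "No new location for '$name'\n";
        return 0;
    }
    for my $link ( @{ $entry->{links}[0]{link} || [] } ) {
        # Leave IMDB, DVD Empire and other web links alone.
        next if $link->{urltype}[0] ne 'Movie';
        if ( $link->{description}[0] eq 'Folder' ) {
            $link->{url}[0] = $new->{path};
        }
        else {
            # Anything else of type Movie is treated as the file link.
            $link->{url}[0] = $new->{path} . $new->{filename};
        }
    }
    return 1;
}

update_links( $entry, \%titles_locations );
print "$_->{description}[0]: $_->{url}[0]\n" for @{ $entry->{links}[0]{link} };
```

This sketch only replaces existing local links. Inserting links for a title that has none would mean pushing new hashrefs of the same shape onto the link array before calling XMLout.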
