It would really make my work easier if someone could help me write a script in Python or Perl that retrieves from a given file all sentences like:
[LANG::...]
(... means anything)
for example:
[LANG::Sample text with digits 0123]
and writes them to a file, each on a single line.
Thanks very much for the help.
EDIT:
Thanks for the help, and now something more advanced:
if it finds something like [:ANG:: ...], please write only ... without the brackets and the LANG:: tag.
Thanks guys, you are awesome :)
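import re

# Open the input for reading (mode 'w' would truncate the file and make read() fail).
with open('input.txt') as f:
    text = f.read()
#text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

# Write every non-greedy [LANG::...] match on its own line.
with open('output.txt', 'w') as f:
    for match in re.findall(r'\[LANG::.*?\]', text):
        f.write(match + '\n')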
outputs:
[LANG::First text 1]
[LANG::Second text 2]
Second part of the question: if it finds something like [:ANG:: ...] please write only ... without brackets and LANG:: tag.
Change the last part to:
with open('output.txt', 'w') as f:
    for match in re.findall(r'\[.ANG::.*?\]', text):
        if match.startswith('[:ANG'):
            f.write(match[7:-1] + '\n')
        else:
            f.write(match + '\n')
Adjust the slice match[7:-1] to your needs: 7 is the length of the '[:ANG::' prefix, and -1 drops the closing bracket.
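As an alternative to slicing by a fixed offset, the tag and its content can be captured separately with regex groups. This is only a minimal sketch under some assumptions: the helper name extract is made up for illustration, only the two tag spellings [LANG:: and [:ANG:: are recognised (the '.ANG' pattern above accepts any first character), the content is assumed not to contain ']', and surrounding whitespace is stripped from the bare content.

```python
import re

# Assumption: tags are spelled either [LANG::...] or [:ANG::...]
# and their content never contains a closing bracket.
TAG = re.compile(r'\[(LANG|:ANG)::([^\]]*)\]')

def extract(text):
    """Return one output line per tag found in text (hypothetical helper)."""
    lines = []
    for m in TAG.finditer(text):
        if m.group(1) == ':ANG':
            # write only the content, without brackets and tag
            lines.append(m.group(2).strip())
        else:
            # keep the whole tag, brackets included
            lines.append(m.group(0))
    return lines

sample = 'Intro [LANG::First text 1] goes on [:ANG:: Second text 2] and finishes.'
print('\n'.join(extract(sample)))
```

Because the content is taken from a capture group, nothing needs to change if the tag prefix ever gets longer or shorter.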
Perl version (prints each match on its own line, rather than the whole line that contains it):
perl -lne "print for /\[LANG::.+?\]/g" infile > outfile
Perl version (edited to get input from file):
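#!/usr/bin/perl
use strict;
use warnings;

open(my $in, '<', 'input.txt') or die "Cannot open input.txt: $!";
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";
while ( <$in> ) {
    # collect every [LANG::...] tag on the current line
    my @found = /\[LANG::.*?\]/g;
    print $out "$_\n" for @found;
}
close($out) or die "Cannot close output.txt: $!";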
Perl
$ perl -nE'say $1 while /\[LANG::([^]]+)\]/g' input.txt >output.txt
Python
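#!/usr/bin/env python
import fileinput, re

# Read the files named on the command line (or stdin) line by line.
for line in fileinput.input():
    # The capture group keeps only the text between 'LANG::' and ']'.
    for match in re.findall(r'\[LANG::([^]]+)\]', line):
        print(match)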
Usage: $ print-lang input.txt >output.txt
input.txt
井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive of the ocean [LANG::English] терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]
output.txt
Japanese
English
Russian
English
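Since the sample input mixes Japanese, Russian and English, it may be safer to state the file encoding explicitly instead of relying on the platform default. The following is a rough Python 3 sketch, assuming the files are UTF-8; the names iter_lang and convert are hypothetical and not part of the answers above.

```python
import re

# Assumption: the tag content never contains ']'.
LANG = re.compile(r'\[LANG::([^\]]+)\]')

def iter_lang(lines):
    """Yield the content of every [LANG::...] tag, line by line."""
    for line in lines:
        for content in LANG.findall(line):
            yield content

def convert(src, dst, encoding='utf-8'):
    """Copy every tag content from src to dst, one per line (assumes UTF-8 by default)."""
    with open(src, encoding=encoding) as fin, \
         open(dst, 'w', encoding=encoding) as fout:
        for content in iter_lang(fin):
            fout.write(content + '\n')

sample = [
    '井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well [LANG::English]',
    'терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]',
]
print('\n'.join(iter_lang(sample)))
```

Calling convert('input.txt', 'output.txt') would then process a file lazily, without loading it into memory at once.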