
Regex script in Python or Perl

Developer https://www.devze.com 2023-02-18 02:52 Source: Internet
It would really make my work easier if someone could help me write a script in Python or Perl that retrieves, from a given file, all sentences like:

[LANG::...]
  • ... means anything

For example:

[LANG::Sample text with digits 0123]

and writes them to a file, each on a single line.

Thanks very much for the help.

EDIT:

Thanks for the help. Now something more advanced:

If it finds something like [:ANG:: ...], please write only the ... part, without the brackets and the LANG:: tag.

Thanks, guys, you are awesome :)


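import re

# Read the whole input file ('r' mode; 'w' would truncate it and read() would fail)
with open('input.txt') as f:
    text = f.read()
#text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

# Write each non-greedy [LANG::...] match on its own line
with open('output.txt', 'w') as f:
    for match in re.findall(r'\[LANG::.*?\]', text):
        f.write(match + '\n')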

Output (for the sample text in the commented-out line):

[LANG::First text 1]
[LANG::Second text 2]
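The answer relies on the non-greedy .*? to get one match per tag. A small comparison on the same sample string shows what would happen with a greedy .* instead (the variable names are just for illustration):

```python
import re

# Sample string from the commented-out line in the answer above
text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

# Non-greedy: '.*?' stops at the first ']' it can, giving one match per tag
lazy = re.findall(r'\[LANG::.*?\]', text)

# Greedy: '.*' runs to the last ']' in the text, merging both tags into one match
greedy = re.findall(r'\[LANG::.*\]', text)

print(lazy)    # ['[LANG::First text 1]', '[LANG::Second text 2]']
print(greedy)  # ['[LANG::First text 1] goes on [LANG::Second text 2]']
```

Because the whole file is read into one string, the greedy version can swallow everything between the first and last tag in the file, not just on one line.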

Second part of the question: if it finds something like [:ANG:: ...], please write only the ... part, without the brackets and the LANG:: tag.

Change the last part to:

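with open('output.txt', 'w') as f:
    # '.' matches any single character, so this catches [LANG::...] and [:ANG::...]
    for match in re.findall(r'\[.ANG::.*?\]', text):
        if match.startswith('[:ANG'):
            # Drop the 7-character '[:ANG::' prefix and the closing ']'
            f.write(match[7:-1] + '\n')
        else:
            f.write(match + '\n')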

Adjust the slice match[7:-1] to your needs (for example, it keeps the space after "::" in "[:ANG:: text]").
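One way to avoid the hard-coded slice offset is to let capture groups split the tag from its body. This is a minimal sketch, assuming the only two tag spellings are LANG:: and :ANG:: as in the question; the sample text is made up for illustration:

```python
import re

# Hypothetical sample: one tag to keep whole, one whose body should be kept alone
text = 'A [LANG::keep me whole] and [:ANG:: just this part] end.'

# Group 1 captures which tag it is, group 2 the body between '::' and ']'
pattern = re.compile(r'\[(LANG|:ANG)::(.*?)\]')

lines = []
for m in pattern.finditer(text):
    if m.group(1) == ':ANG':
        # Keep only the body, trimming the space that may follow '::'
        lines.append(m.group(2).strip())
    else:
        # Keep the full [LANG::...] match, as the answer above does
        lines.append(m.group(0))

for line in lines:
    print(line)
```

Unlike the '.ANG' pattern, this accepts only the two listed spellings, so something like [XANG::...] is not picked up. Writing lines to output.txt works the same way as in the answer.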


Perl version

perl -lne "print for /\[LANG::.+?\]/g" infile > outfile


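Perl version (edited to read input from a file):

#!/usr/bin/perl

use strict;
use warnings;

# Open input and output files, dying with the OS error if either fails
open(my $in,  '<', 'input.txt')  or die "Cannot open input.txt: $!";
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";

while ( <$in> ) {
    # In list context, //g returns every non-greedy [LANG::...] match on the line
    my @found = /\[LANG::.*?\]/g;
    print $out "$_\n" for @found;
}

close($out);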


Perl

$ perl -nE'say $1 while /\[LANG::([^]]+)\]/g' input.txt >output.txt

Python

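#!/usr/bin/env python3
import fileinput
import re

# Read each line from the files named on the command line (or stdin)
for line in fileinput.input():
    # Capture the text between 'LANG::' and the next ']'
    for match in re.findall(r'\[LANG::([^]]+)\]', line):
        print(match)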

Usage (with the script saved as an executable named print-lang): $ print-lang input.txt >output.txt

input.txt

井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive 
of the ocean [LANG::English]

терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]

output.txt

Japanese
English
Russian
English
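The fileinput script matches line by line, so a tag broken across two lines would be missed. A minimal sketch that searches the whole text at once instead, checked against the sample above (the function name lang_names is hypothetical):

```python
import re

# Sample text copied from input.txt above (the first entry wraps onto a second line)
sample = (
    "井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive \n"
    "of the ocean [LANG::English]\n"
    "\n"
    "терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]\n"
)

def lang_names(text):
    # Capture everything after 'LANG::' up to the next ']'; [^]] also matches
    # newlines, so a tag split across lines is found when the whole text is searched
    return re.findall(r'\[LANG::([^]]+)\]', text)

for name in lang_names(sample):
    print(name)
```

For a real file, pass the full contents, e.g. lang_names(open('input.txt', encoding='utf-8').read()); giving the encoding explicitly matters here because the sample mixes Japanese and Cyrillic text.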