I have a string that may contain an arbitrary number of single-letters separated by spaces. I am looking for a regex (in Perl) that will remove spaces between all (unknown number) of single letters.
For example:
ab c d
should become ab cd
a bcd e f gh
should become a bcd ef gh
a b c
should become abc
and
abc d
should be unchanged (because there are no single letters followed by or preceded by a single space).
Thanks for any ideas.
Your description doesn't really match your examples. It looks to me like you want to remove any space that is (1) preceded by a letter which is not itself preceded by a letter, and (2) followed by a letter which is not itself followed by a letter. Those conditions can be expressed precisely as nested lookarounds:
/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))/
tested:
use strict;
use warnings;
use Test::Simple tests => 4;
sub clean {
(my $x = shift) =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g;
$x;
}
ok(clean('ab c d') eq 'ab cd');
ok(clean('a bcd e f gh') eq 'a bcd ef gh');
ok(clean('a b c') eq 'abc');
ok(clean('ab c d') eq 'ab cd');
output:
1..4
ok 1
ok 2
ok 3
ok 4
I'm assuming you really meant one space character (U+0020); if you want to match any whitespace, you might want to replace the space with \s+
.
You can do this with lookahead and lookbehind assertions, as described in perldoc perlre:
use strict;
use warnings;
use Test::More;
is(tran('ab c d'), 'ab cd');
is(tran('a bcd e f gh'), 'a bcd ef gh');
is(tran('a b c'), 'abc');
is(tran('abc d'), 'abc d');
sub tran
{
my $input = shift;
(my $output = $input) =~ s/(?<![[:lower:]])([[:lower:]]) (?=[[:lower:]])/$1/g;
return $output;
}
done_testing;
Note the current code fails on the second test case, as the output is:
ok 1
not ok 2
# Failed test at test.pl line 7.
# got: 'abcd efgh'
# expected: 'a bcd ef gh'
ok 3
ok 4
1..4
# Looks like you failed 1 test of 4.
I left it like this as your second and third examples seem to contradict each other as to how leading single characters should be handled. However, this framework should be enough to allow you to experiment with different lookaheads and lookbehinds to get the exact results you are looking for.
This piece of code
#!/usr/bin/perl
use strict;
my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
foreach my $string (@strings) {
print "$string --> ";
$string =~ s/\b(\w)\s+(?=\w\b)/$1/g; # the only line that actually matters
print "$string\n";
}
prints this:
a b c --> abc
ab c d --> ab cd
a bcd e f gh --> a bcd ef gh
abc d --> abc d
I think/hope this is what you're looking for.
This should do the trick:
my $str = ...;
$str =~ s/ \b(\w) \s+ (\w)\b /$1$2/gx;
That removes the space between all single nonspace characters. Feel free to replace \S
with a more restrictive character class if needed. There also may be some edge cases related to punctuation characters that you need to deal with, but I can't guess that from the info you have provided.
As Ether helpfully points out, this fails on one case. Here is a version that should work (though not quite as clean as the first):
s/ \b(\w) ( (?:\s+ \w\b)+ ) /$1 . join '', split m|\s+|, $2/gex;
I liked Ether's test based approach (imitation is the sincerest form of flattery and all):
use warnings;
use strict;
use Test::Magic tests => 4;
sub clean {
(my $x = shift) =~ s{\b(\w) ((?: \s+ (\w)\b)+)}
{$1 . join '', split m|\s+|, $2}gex;
$x
}
test 'space removal',
is clean('ab c d') eq 'ab cd',
is clean('a bcd e f gh') eq 'a bcd ef gh',
is clean('a b c') eq 'abc',
is clean('abc d') eq 'abc d';
returns:
1..4
ok 1 - space removal 1
ok 2 - space removal 2
ok 3 - space removal 3
ok 4 - space removal 4
It's not a regex but since I am lazy by nature I would it do this way.
#!/usr/bin/env perl
use warnings;
use 5.012;
my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
for my $string ( @strings ) {
my @s; my $t = '';
for my $el ( split /\s+/, $string ) {
if ( length $el > 1 ) {
push @s, $t if $t;
$t = '';
push @s, $el;
} else { $t .= $el; }
}
push @s, $t if $t;
say "@s";
}
OK, my way is the slowest:
no_regex 130619/s -- -60% -61% -63%
Alan_Moore 323328/s 148% -- -4% -8%
Eric_Storm 336748/s 158% 4% -- -5%
canavanin 352654/s 170% 9% 5% --
I didn't include Ether's code because ( as he has tested ) it returns different results.
Now I have the slowest and the fastest.
#!/usr/bin/perl
use 5.012;
use warnings;
use Benchmark qw(cmpthese);
my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
cmpthese( 0, {
Eric_Storm => sub{ for my $string (@strings) { $string =~ s{\b(\w) ((?: \s+ (\w)\b)+)}{$1 . join '', split m|\s+|, $2}gex; } },
canavanin => sub{ for my $string (@strings) { $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; } },
Alan_Moore => sub{ for my $string (@strings) { $string =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g; } },
keep_uni => sub{ for my $string (@strings) { $string =~ s/\PL\pL\K (?=\pL(?!\pL))//g; } },
keep_asc => sub{ for my $string (@strings) { $string =~ s/[^a-zA-Z][a-zA-Z]\K (?=[a-zA-Z](?![a-zA-Z]))//g; } },
no_regex => sub{ for my $string (@strings) { my @s; my $t = '';
for my $el (split /\s+/, $string) {if (length $el > 1) { push @s, $t if $t; $t = ''; push @s, $el; } else { $t .= $el; } }
push @s, $t if $t;
#say "@s";
} },
});
.
Rate no_regex Alan_Moore Eric_Storm canavanin keep_uni keep_asc
no_regex 98682/s -- -64% -65% -66% -81% -87%
Alan_Moore 274019/s 178% -- -3% -6% -48% -63%
Eric_Storm 282855/s 187% 3% -- -3% -46% -62%
canavanin 291585/s 195% 6% 3% -- -45% -60%
keep_uni 528014/s 435% 93% 87% 81% -- -28%
keep_asc 735254/s 645% 168% 160% 152% 39% --
This will do the job.
(?<=\b\w)\s(?=\w\b)
Hi I have written simple javascript to do this it's simple and you can convert into any language.
function compressSingleSpace(source){
let words = source.split(" ");
let finalWords = [];
let tempWord = "";
for(let i=0;i<words.length;i++){
if(tempWord!='' && words[i].length>1){
finalWords.push(tempWord);
tempWord = '';
}
if(words[i].length>1){
finalWords.push(words[i]);
}else{
tempWord += words[i];
}
}
if(tempWord!=''){
finalWords.push(tempWord);
}
source = finalWords.join(" ");
return source;
}
function convertInput(){
let str = document.getElementById("inputWords").value;
document.getElementById("firstInput").innerHTML = str;
let compressed = compressSingleSpace(str);
document.getElementById("finalOutput").innerHTML = compressed;
}
label{
font-size:20px;
margin:10px;
}
input{
margin:10px;
font-size:15px;
padding:10px;
}
input[type="button"]{
cursor:pointer;
background: #ccc;
}
#firstInput{
color:red;
font-size:20px;
margin:10px;
}
#finalOutput{
color:green;
font-size:20px;
margin:10px;
}
<label for="inputWords">Enter your input and press Convert</label><br>
<input id="inputWords" value="check this site p e t z l o v e r . c o m thanks">
<input type="button" onclick="convertInput(this.value)" value="Convert" >
<div id="firstInput">check this site p e t z l o v e r . c o m thanks</div>
<div id="finalOutput">check this site petzlover.com thanks</div>
精彩评论