I have text such as:
http://pastebin.com/H8zTbG54
The text is a set of rules separated by "OR" at the end of each line.
I need to put sets of lines (rules) into buckets (Bash array members), but each array member is limited to 1024 characters.
So each array member should contain a set of rules, and its character count cannot exceed 1024.
Suppose the rule text looks like: a OR b OR c OR d OR e OR f OR g OR h
The output should be: array member 1 = a OR b
array member 2 = c OR d OR e
array member 3 = f OR g
array member 4 = h
Can anybody help me do that?
I am working on a Solaris 10 server.
This is not entirely trivial and would need a bit more clarification, but basically you first split the text by OR/AND (and maybe some other patterns, depending on your needs) and then recursively split any chunks that are still larger than 1024 characters.
P.S. This seems like one of those cases where a fully-fledged scripting language such as Perl, Python or PHP would achieve the result more conveniently.
E.g. a basic version in PHP (not sure if it is completely correct, I haven't done PHP in a while) could go like this:
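function splitByOr($input)
{
    // Split the rule text on the " OR " separator.
    $tokens = explode(" OR ", $input);
    foreach ($tokens as $t) {
        // A chunk that is still too long has to be split on another pattern
        // (e.g. " AND "); splitting it on " OR " again would recurse forever.
        if (strlen($t) > 1024)
            trigger_error("chunk longer than 1024 characters", E_USER_WARNING);
    }
    return $tokens;
}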
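For readers who want to stay in Bash, the first step described above (splitting on " OR " and noting which chunks are still too long) could be sketched roughly as follows. The function name split_on_or and the array name parts are made up for this illustration, and it assumes the separator is exactly " OR " with single spaces.

```shell
#!/bin/bash
# Hypothetical helper: split "$1" on the literal separator " OR "
# and store the pieces in the global array "parts".
split_on_or() {
    local rest=$1
    parts=()
    while [[ $rest == *" OR "* ]]; do
        parts+=("${rest%%" OR "*}")   # text before the first separator
        rest=${rest#*" OR "}          # drop that text and the separator
    done
    parts+=("$rest")                  # whatever is left is the last piece
}

split_on_or "a OR b OR c"
for p in "${parts[@]}"; do
    if (( ${#p} > 1024 )); then
        # Still too long: this piece would need another split pattern.
        echo "needs further splitting: ${#p} characters" >&2
    fi
    echo "$p"
done
```

Pieces that are reported as too long would still have to be split on some other pattern, which depends on what the rules look like.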
None of the individual rules in the samplerule file exceeds 148 characters in length, far less than the 1024-character limit. You don't say what should be done with rules that do exceed that limit.
This is a very simple Bash script that splits your sample on the literal "\n" into an array called "rules". It skips lines that exceed 1024 characters and prints an error message:
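#!/bin/bash
# Convert each literal "\n" into a real newline, then read the result line by line.
while read -r line
do
    (( count++ ))
    if (( ${#line} > 1024 ))
    then
        echo "Line length limit of 1024 characters exceeded: Length: ${#line} Line no.: $count"
        echo "$line"
        continue
    fi
    # Quote the expansion so each line is stored as one array member.
    rules+=("$line")
done < <(echo -e "$(<samplerule)")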
This variation will truncate the line length without regard to the consequences:
#!/bin/bash
while read -r line
do
    # Keep only the first 1024 characters; quoted so the line stays one array member.
    rules+=("${line:0:1024}")
done < <(echo -e "$(<samplerule)")
If the literal "\n" is not actually in the file and you need to use Bash arrays rather than coding this entirely in AWK, change the line in either version above that says this:
done < <(echo -e "$(<samplerule)")
to say this:
done < <(awk 'BEGIN {RS="OR"} {print $0,"OR"}' samplerule)
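if [[ "${rules[${#rules[@]}-1]}" == "OR" ]]
then
    unset "rules[${#rules[@]}-1]"
fi
which will split the lines on the "OR". Note that this needs an awk that treats a multi-character RS as the separator, such as gawk; POSIX leaves that case unspecified.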
Edit: Added a command to remove an extra "OR" at the end.
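Once the rules are in an array, packing them into members of at most 1024 characters (what the question asks for) could be sketched like this. It is a greedy approach under some assumptions: pack_rules and buckets are hypothetical names, rules are rejoined with " OR ", a trailing " OR" on a rule is dropped first, and a single rule longer than the limit still ends up alone in an oversized member.

```shell
#!/bin/bash
# Hypothetical helper: pack the global array "rules" into the global array
# "buckets", joining rules with " OR " so that no bucket grows past the
# limit given as $1 (default 1024).
pack_rules() {
    local limit=${1:-1024} current="" rule
    buckets=()
    for rule in "${rules[@]}"; do
        rule=${rule% OR}                  # drop a trailing " OR" from the input line
        if [[ -z $rule ]]; then
            continue                      # ignore empty entries
        fi
        if [[ -z $current ]]; then
            current=$rule                 # first rule of a new bucket
        elif (( ${#current} + 4 + ${#rule} <= limit )); then
            current+=" OR $rule"          # still fits; 4 is the length of " OR "
        else
            buckets+=("$current")         # bucket is full, start the next one
            current=$rule
        fi
    done
    if [[ -n $current ]]; then
        buckets+=("$current")             # flush the last, partly filled bucket
    fi
    return 0
}

# Small demonstration with a 6-character limit instead of 1024.
rules=("a OR" "b OR" "c OR" "d OR" "e")
pack_rules 6
printf '%s\n' "${buckets[@]}"
```

Calling pack_rules with no argument uses the 1024-character limit.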