Here is another spoj problem that asks how to find the number of distinct subsequences of a string ?
For example,
Input
AAA ABCDEFG CODECRAFTOutput
4 128 496
How can I solve this pr开发者_如何学运维oblem ?
It's a classic dynamic programming problem.
Let:
dp[i] = number of distinct subsequences ending with a[i]
sum[i] = dp[1] + dp[2] + ... + dp[i]. So sum[n] will be your answer.
last[i] = last position of character i in the given string.
A null string has one subsequence, so dp[0] = 1
.
read a
n = strlen(a)
for i = 1 to n
dp[i] = sum[i - 1] - sum[last[a[i]] - 1]
sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i
return sum[n]
Explanation
dp[i] = sum[i - 1] - sum[last[a[i]] - 1]
Initially, we assume we can append a[i]
to all subsequences ending on previous characters, but this might violate the condition that the counted subsequences need to be distinct. Remember that last[a[i]]
gives us the last position a[i]
appeared on until now. The only subsequences we overcount are those that the previous a[i]
was appended to, so we subtract those.
sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i
Update these values as per their definition.
If your indexing starts from 0, use a[i - 1]
wherever I used a[i]
. Also remember to wrap your computations in a mod
function if you're going to submit code. This should be implemented like this:
mod(x) = (x % m + m) % m
In order to correctly handle negative values in some languages (such as C/C++).
There exists an easier solution to this problem.
The idea is: If all character of the string are distinct, total number of subsequences is 2^n.
Now, if we find any character that have already occurred before, we should consider its last occurrence only (otherwise sequence won't be distinct). So we have to subtract the number of subsequences due to its previous occurrence.
My implementation is like this:
read s
dp[0] = 1
len = strlen(s)
last[s.length()] = {-1} //declaring `last` array with same as length of string `s` and all elements initialized with -1.
for (i = 1; i <= len; i++)
{
dp[i] = (dp[i - 1] * 2)
if (last[s[i]] > 0) dp[i] = (dp[i] - dp[last[s[i]] - 1])
last[s[i]] = i
}
Here is my CODE:
#include<iostream>
typedef long long ll;
ll fun(std::string s,ll visited[256],ll n,ll L[]){
ll ans=0;
if(n<0){
return 1;
}
//std::cout<<s.substr(0,n+1)<<" "<<n<<endl;
ans=fun(s,visited,n-1,L);
L[n]=ans;
ans=ans*2;
if(visited[int(s[n])]>=0){
ans -= L[visited[int(s[n])]];
}
visited[int(s[n])]=n;
return ans;
}
int main(){
std::string s;
std::cin>>s;
ll n=s.length();
ll visited[256];
ll L[n];
memset(visited,-1,sizeof(visited));
memset(L,-1,sizeof(L));
std::cout<<fun(s,visited,n-1,L);
return 0;
}
Explanation :
I scan from the back of a string ie- from the last element to the first and therefore send the first n-1
characters for further scanning in the recursion.
Once n==-1 or n<0(both are same)
, I reach on the empty string and return 1 because no. of subsequences of an empty string is 1.
So, on returning back from recursion, we know that adding the current non-duplicate character to the previous string doubles the no. of subsequences. Doubling happens because now I can add this character at the end of all the previous subsequences. So, with
and without
this character means double of all previous subsequences.
Assuming that the current character is not a duplicate, I multiply the previous no. of subsequences with 2.
After the total no. of subsequences of the first n-1
characters has been computed, we double them for the first n
characters.
But, suppose the character currently encountered(nth character) has already been present in the first n-1
characters before(ie - found within the string s[0....n-1] (Note: s[n] is the current character)), then we have to subtract those no. of subsequences possible from up to (excluding) that part of s when the last time this current character was encountered and which has already been computed and stored in L['this particular character'].
ie - BACA
- for the given string, the 4th A
has already been encountered before(while returning from the recursion, we first encounter B
, then A
, then C
and at last A
) and so we deduct the no. of subsequences calculated upto (excluding) the 2nd A
(which is 2 (because no. of subseq. before A
is 2)).
So, every time we have calculated the no. of subsequences for the first n-1
characters, we store them in the array L.
Notice : L[k] store the no. of subsequences before the kth index.
I've used the visited array in order to check whether the given character that I'm currently present at has already been scanned through or not.
On encountering the current character, I update the visited array with the position of current position as n
. This need to be done because we have to exclude the duplicate sequences.
Note: visited[]
is initialized with all -1 because the position of any character in the string s
is non-negative (0 based indexing).
Summary:
How do you arrive at the number of duplicates? Let's say the last occurrence of current character at i, was at j'th position. Then, we will have duplicate subsequences: consider starting with i'th character and then all subsequences possible from [0,j-1] vs. starting at j'th character and then all subsequences possible from [0,j-1]. So, to eliminate this, you subtract the number of subsequences possible from upto (excluding) j with L[0]=1 mean that upto(excluding 0), no. of subseq are 1(empty string has 1 subsequence).
///i get wa
int finding_dist_subs(int len,char data[])
{
dp[0]=1;
for(int i=1;i<len;i++)
{
dp[i]=(dp[i-1]*2+1)%1000000007;
for(int j=i-1;j>=0;j--)
{
if(data[i]==data[j])
{
if(j!=0)
dp[i]=(dp[i]-(dp[j-1])-1)%1000000007;
else dp[i]=(dp[i]-1)%1000000007;
break;
}
}
}
return dp[len-1];
}
精彩评论