You can do it in O(n + m) where m is the number of characters in SEA and n the number of characters in SRC.
You can do it in O(n + m), where m is the number of characters in SEA, and n the number of characters in SRC: last = 1 for I = 1 to m do while SRClast! = SEAi ++last print last ++last (skip this match) Basically, for each character in SEA, find its position in SRC, but only scan after the position where you found the previous character. For instance; if the source string is BRRBRBR (N=7) and the search string is BBB Then: find B in SRC: found at last = 1 print 1, set last = 2.
Find B in SRC: found at last = 4, print 4, set last = 5. Find B in SRC: found at last = 6, print 6, set last = 7. Done.As for the complexity of your algorithm, I'm not able to provide a very formal analysis, but I'll try to explain how I'd go about it.
Assume that all characters are equal in both SRC and SEA and between them. Therefore we can eliminate the condition in your while loop. Also note that your while loop executes n times.
Note that for the first character you will call find(1, 1), ... find(m, n). But these will also start their while loops and make their own recursive calls. Each find(i, j) will make O(m) recursive calls that in its while loop, for I = 1 to n.
But these in turn will make more recursive calls themselves, resulting in a sort of "avalanche effect" that causes exponential complexity. So you have: I = 1: calls find(2, 2), find(3, 3), ..., find(m, n) find(2, 2) calls find(3, 3), ..., find(m, n) find(3, 3) calls find(4, 4), ..., find(m, n) find(4, 4) calls find(5, 5), ..., find(m, n) ... total calls: O(m^m) I = 2: same, but start from find(2, 3).... I = n: same Total complexity thus looks like O(n*m^m). I hope this makes sense and I haven't made any mistakes.
IVlad, thanks for the answer but there's one thing I don't get. The first 'B' is found at last=1, and the other two B's are found at the remaining part of SRC. OK, but I want to find more than one occurrence of the subsequence.
Say SRC was BRRBRBRB (one more 'B' added). Then I would like the algorithm to say that "'B' can be at multiple places. Similarly, the second and the third 'B' can also be at multiple places." and return -1,-1,-1.
So, you would have to not only check the first occurrences, but perhaps all occurrences to be sure. How can we change your algorithm to adapt this situation? – Murat Jan 28 at 16:01 @Murat - sorry, I haven't read your question properly it seems.
However, this should be easy to handle. Reverse SRC and SEA and run the same algorithm again. Print -1 for the positions where you get different values between the two runs.
This should work because the algorithm always finds the first possible match, going from left to right. So if you reverse the string and get different matches, you know where there are multiple possibilities. For example, ran on the reversed strings you'd get 1, 3, 5.
Accounting for the inversions this translates easily to 4 6 8 => -1 -1 -1. – IVlad Jan 28 at 16:10 I think this is a decent approach, IVIlad, it takes linear time as you mentioned. Thank you very much.
I'm posting a Python implementation of the algorithm you proposed. Doesn't even take a second to complete with N=58 and the search string longer than 20 characters! – Murat Jan 28 at 18:32 By the way, do you happen to know how can I derive the T(n,m) equation or compute the big-oh complexity of the method I first wrote (the code above)?
Just out of curiosity... – Murat Jan 28 at 18:42 @Murat - I attempted to do so. I hope it makes sense. – IVlad Jan 28 at 20:25.
This is simply the longest common subsequence problem. This can be implemented with dynamic programming to get a lot less than exponential time. In your case when LCS returns the length of SEA then you know that the sequence SEA exists in SRC, saving their indexes is a trivial thing when modifying the algorithm.
Here's a link to a good explanation. en.wikipedia.org/wiki/Longest_common_sub....
Murat: your current algorithm will run in O(2^n) (if I'm not mistaken or misunderstanding your code) time which is unacceptable in contest problem solving environments. LCS runs in O(n * m) which is the length of the first string times the length of the 2nd or O(n^2) worst case when both strings are of equal length. One further optimization is to realize that you only need 2 rows of the DP table to solve the problem instead of the whole DP matrix – Jesus Ramos Jan 28 at 15:11 This is overkill.
There's no need to use LCS when this has nothing to do with the concept of "longest" in any way. – IVlad Jan 28 at 15:50 @|V|ad: You're correct, I was reading the code and assumed a little too much in this case it's solvable in O(n) where n is the length of the longer string. I will edit my answer to reflect this, thanks again for pointing this out.
– Jesus Ramos Jan 28 at 16:23.
I believe creating a suffix array for your source string would be a good idea. Constructing the suffix array has O(nlogn) complexity. Locating a substring has O(logn) time complexity.
– Murat Jan 28 at 14:56 @Murat: you can find a match with binary search. Once you locate a match, you can scan forward and backward from that position (sequentially) to match more substrings. – Nick Dandoulakis Jan 28 at 14:59 Even more overkill.
Also, you are talking about substrings when the OP wants subsequence. The two are not the same. -1.
– IVlad Jan 28 at 15:51 @IVlad: it's not overkill in case we're going to search the same string for many substrings. But you're right. The OP wants subsequence.
Oops. – Nick Dandoulakis Jan 28 at 16:43.
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.