Similarity String Comparison in Java?

Yes, there are many well-documented algorithms, such as:

- Cosine similarity
- Jaccard similarity
- Dice's coefficient
- Matching similarity
- Overlap similarity

Alternatively, you can also check these projects:

- dcs.shef.ac.uk/~sam/simmetrics.html
- jtmt.sourceforge.net
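To make two of the measures listed above concrete, here is a minimal Java sketch of Jaccard similarity and Dice's coefficient. It assumes the strings are compared as sets of character bigrams, which is only one possible tokenization; the class and method names are illustrative and do not come from simmetrics or jtmt.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch: set-based similarity over character bigrams. */
final class SetSimilaritySketch {

    /** Splits a string into its set of overlapping two-character substrings. */
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    /** Jaccard similarity: |A intersect B| / |A union B|. */
    static double jaccard(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // assumption: two strings with no bigrams count as identical
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        int unionSize = x.size() + y.size() - common.size();
        return (double) common.size() / unionSize;
    }

    /** Dice's coefficient: 2 * |A intersect B| / (|A| + |B|). */
    static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // same assumption as in jaccard
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }
}
```

For "night" and "nacht" the only shared bigram is "ht", so Jaccard gives 1/7 and Dice gives 0.25. Both scores lie between 0 and 1, which makes them easier to threshold than a raw edit count.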

The simmetrics site doesn't seem active anymore. However, I found the code on SourceForge: sourceforge.net/projects/simmetrics. Thanks for the pointer. – Michael Merchant Dec 22 '11 at 21:06

You could use Levenshtein distance to calculate the difference between two strings. en.wikipedia.org/wiki/Levenshtein_distance.
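A minimal Java sketch of the Levenshtein distance, assuming the standard unit costs (1 for each insertion, deletion, or substitution). It keeps only two rows of the dynamic-programming table instead of the full matrix; the class and method names are illustrative.

```java
/** Illustrative sketch: Levenshtein distance with two rolling rows. */
final class LevenshteinSketch {

    /** Counts the single-character edits needed to turn a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Reuse the two arrays: the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `LevenshteinSketch.distance("kitten", "sitting")` is 3. The running time is proportional to the product of the two lengths, so comparing every pair in a large list gets expensive quickly.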

Levenshtein is great for a few strings, but will not scale to comparisons between a large number of strings. – spender Jun 5 '09 at 10:00

I've used Levenshtein in Java with some success. I haven't done comparisons over huge lists, so there may be a performance hit. Also, it's a bit simple and could use some tweaking to raise the threshold for shorter words (like 3 or 4 chars), which tend to be seen as more similar than they should (it's only 3 edits from cat to dog). Note that the edit distances suggested below are pretty much the same thing: Levenshtein is a particular implementation of edit distance. – Rhubarb Jun 5 '09 at 10:27
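One possible way to handle the short-word problem mentioned in the comment above is to divide the edit distance by the length of the longer string, so the score no longer depends on absolute length. This is a sketch of that idea, not something the commenter proposed; the class name, method names, and the `minSimilarity` parameter are all hypothetical, and the edit distance is assumed to be computed elsewhere.

```java
/** Illustrative sketch: turning a raw edit distance into a 0..1 similarity. */
final class NormalizedSimilaritySketch {

    /**
     * Returns 1.0 for identical strings and 0.0 when every character
     * of the longer string would have to change.
     */
    static double similarity(int editDistance, int lengthA, int lengthB) {
        int longest = Math.max(lengthA, lengthB);
        if (longest == 0) {
            return 1.0; // assumption: two empty strings count as identical
        }
        return 1.0 - (double) editDistance / longest;
    }

    /** Hypothetical helper: accepts a pair only above a caller-chosen cutoff. */
    static boolean isMatch(int editDistance, int lengthA, int lengthB, double minSimilarity) {
        return similarity(editDistance, lengthA, lengthB) >= minSimilarity;
    }
}
```

With this scaling, cat and dog (3 edits, length 3) score 0.0, while kitten and sitting (also 3 edits, longest length 7) score about 0.57, so the same raw distance no longer makes short words look deceptively close.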

This is typically done using an edit distance measure. Searching for "edit distance java" turns up a number of libraries, like this one.

Sounds like a plagiarism finder to me if your string turns into a document. Maybe searching with that term will turn up something good. "Programming Collective Intelligence" has a chapter on determining whether two documents are similar.

The code is in Python, but it's clean and easy to port.

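I translated the Levenshtein distance algorithm into JavaScript (en.wikipedia.org/wiki/Levenshtein_distance):

    String.prototype.LevenshteinDistance = function (s2) {
        // array[i][j] holds the distance between the first i chars of this and the first j chars of s2
        var array = new Array(this.length + 1);
        for (var i = 0; i < this.length + 1; i++)
            array[i] = new Array(s2.length + 1);

        // Base cases: converting to or from an empty prefix
        for (var i = 0; i < this.length + 1; i++)
            array[i][0] = i;
        for (var j = 0; j < s2.length + 1; j++)
            array[0][j] = j;

        for (var i = 1; i < this.length + 1; i++) {
            for (var j = 1; j < s2.length + 1; j++) {
                if (this[i - 1] == s2[j - 1])
                    array[i][j] = array[i - 1][j - 1];
                else {
                    array[i][j] = Math.min(array[i][j - 1] + 1, array[i - 1][j] + 1);
                    array[i][j] = Math.min(array[i][j], array[i - 1][j - 1] + 1);
                }
            }
        }
        return array[this.length][s2.length];
    };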

Theoretically, you can compare edit distances.

