It really depends on what you mean by 'programmatically'. Part of English works on easy-to-understand rules, and part doesn't. It has to do mainly with frequency: the irregular forms tend to be the most frequently used words.
For a brief overview, you can read Pinker's "Words and Rules", but do yourself a favor and don't take the whole generative theory of linguistics entirely to heart. There's a lot more empiricism there than that school of thought really lends to the pursuit. A lot of English can be statistically lemmatized. By the way, stemming or lemmatization is the term you're looking for.
One of the most effective lemmatizers which work off of statistical rules bootstrapped with frequency-based exceptions is the Morpha Lemmatizer. You can give this a shot if you have a project that requires this type of simplification of strings which represent specific terms in English. There are even more naive approaches that accomplish much with respect to normalizing related terms.
Take a look at the Porter Stemmer, which is effective enough to cluster together most terms in English.
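To make the "rules bootstrapped with exceptions" idea concrete, here is a minimal sketch in Python. It is neither Morpha nor the Porter algorithm: the exception table, the suffix rules and the names `naive_stem` and `cluster` are all made up for illustration, and a real stemmer has many more rules and conditions.

```python
from collections import defaultdict

# Hypothetical exception table: frequent irregular forms are looked up directly.
EXCEPTIONS = {"geese": "goose", "mice": "mouse", "children": "child"}

# Ordered (suffix, replacement) rules; the first rule that matches wins.
# These are illustrative only and much cruder than Porter's rules.
SUFFIX_RULES = [
    ("sses", "ss"),  # losses -> loss
    ("ies", "y"),    # families -> family
    ("ss", "ss"),    # class stays class
    ("ing", ""),     # walking -> walk
    ("ed", ""),      # walked -> walk
    ("s", ""),       # walks -> walk
]

def naive_stem(word):
    """Return a crude stem: exceptions first, then suffix rules."""
    w = word.lower()
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least two letters of stem so words like "is" survive.
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: len(w) - len(suffix)] + replacement
    return w

def cluster(words):
    """Group words that share the same crude stem."""
    groups = defaultdict(list)
    for w in words:
        groups[naive_stem(w)].append(w)
    return dict(groups)
```

Even something this naive puts walk, walks, walked and walking in one group, which is often all you need for normalizing related terms. It also gets plenty wrong (for example, it turns "bus" into "bu"), which is where a larger exception list or a real stemmer comes in.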
+1 for morpha. I had the same problem, and morpha did a really good job at solving it. – adi92 Sep 4 '09 at 3:25
Another stemmer option is the UEA stemmer, for which there are Ruby, Java, Perl, and Scala implementations: github.com/ealdent/uea-stemmer/tree/master, uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming, github.com/DRMacIver/uea-stemmer-scala/tree/master – ealdent Sep 7 '09 at 0:35
No - English isn't a language which sticks to many rules. I think your best bet is either to use a dictionary of common words and their plurals (or group them by their plural rule, e.g. group words where you just add an S, words where you add ES, words where you drop a Y and add IES...), or to rethink your application.
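A rough sketch of the dictionary-grouped-by-rule idea, in Python. The word lists, the group names and the function names are hypothetical; a real list would be far longer.

```python
# Hypothetical word lists, grouped by the rule that forms their plural.
WORDS_BY_RULE = {
    "add_s": ["cat", "pie", "base"],
    "add_es": ["box", "dish", "bus"],
    "y_to_ies": ["family", "city"],
}

# Hypothetical irregulars that fit none of the groups above.
IRREGULAR = {"goose": "geese", "mouse": "mice"}

def build_plural_table(words_by_rule, irregular):
    """Map each known singular to its plural."""
    table = dict(irregular)
    for word in words_by_rule["add_s"]:
        table[word] = word + "s"
    for word in words_by_rule["add_es"]:
        table[word] = word + "es"
    for word in words_by_rule["y_to_ies"]:
        table[word] = word[:-1] + "ies"
    return table

def build_singular_table(plural_table):
    """Invert the table so plurals can be looked up directly."""
    return {plural: singular for singular, plural in plural_table.items()}

SINGULAR_TO_PLURAL = build_plural_table(WORDS_BY_RULE, IRREGULAR)
PLURAL_TO_SINGULAR = build_singular_table(SINGULAR_TO_PLURAL)
```

With this, `PLURAL_TO_SINGULAR.get("boxes")` gives `"box"`, and anything not in the list (a name like "marius", say) comes back as `None` instead of being mangled. Note that the plain inversion silently keeps only one singular if two words share a plural.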
Yea, after discovering that list I linked, my hopes plummeted, but I was still curious. – Matthew Scharley Sep 4 '09 at 3:12
English plurals are actually pretty regular. Far more so than, say, German or French. – cletus Sep 4 '09 at 3:20
AmE "ax" = BrE "axe". Similarly, is "ellipses" the plural of an "ellipse" (an oval shape) or an "ellipsis" (…)? Is "bases" the plural of a "base" or a "basis"?
Is "taxes" the plural of a "tax", or a "taxis" (as in biology)? Other examples, anyone? – ShreevatsaR Sep 4 '09 at 4:27
How about which of indexes or indices is the plural of "index"? – JB King Sep 4 '09 at 23:22
You can take a look at Inflector.Net - my port of Rails' inflection class.
Going from singular to plural, English plural form is actually pretty regular compared to some other European languages I have a passing familiarity with. In German, for example, working out the plural form is really complicated (e.g. Land -> Länder). I think there are roughly 20-30 exceptions and the rest follow a fairly simple ruleset:
-y -> -ies (family -> families)
-us -> -i (cactus -> cacti)
-s -> -ses (loss -> losses)
otherwise add -s
That being said, plural to singular form becomes that much harder because the reverse cases have ambiguities. For example:
pies: is it py or pie?
ski: is it singular or plural for 'skus'?
molasses: is it singular or plural for 'molasse' or 'molass'?
So it can be done but you're going to have a much larger list of exceptions and you're going to have to store a lot of false positives (ie things that appear plural but aren't).
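The singular-to-plural direction could be sketched like this in Python. It follows the four rules listed in this answer plus a small exception table; the table contents and the function name are made up, and the consonant check on the -y rule is an added assumption that the answer does not spell out.

```python
# Hypothetical exception table; the answer estimates roughly 20-30 such entries.
EXCEPTIONS = {"goose": "geese", "mouse": "mice", "bus": "buses"}

VOWELS = "aeiou"

def pluralize(word):
    """Pluralize using an exception table, then the four simple rules."""
    w = word.lower()
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    # -us -> -i (cactus -> cacti); checked before the plain -s rule.
    if w.endswith("us"):
        return w[:-2] + "i"
    # -s -> -ses (loss -> losses)
    if w.endswith("s"):
        return w + "es"
    # -y -> -ies (family -> families).
    # Added assumption: only after a consonant, so that day -> days.
    if w.endswith("y") and len(w) > 1 and w[-2] not in VOWELS:
        return w[:-1] + "ies"
    # otherwise add -s
    return w + "s"
```

Sibilant endings such as box or dish are not covered by these four rules, so they would need either another rule or an entry in the exception table.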
You could have the de-pluralizer guess which one is right, then check a dictionary to see if the word it guesses exists, but even this is going to get it wrong sometimes. – Chris Lutz Sep 4 '09 at 4:21
If you're using a dictionary anyway, you have access to all the plurals so there's no need for an algorithm. – cletus Sep 4 '09 at 5:59
In English: box -> boxes (not boxs), dish -> dishes (not dishs), etc. – Robert L Sep 15 '09 at 12:08
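The guess-then-check idea from the first comment might look roughly like this in Python. The reverse rules, the tiny vocabulary and the function names are all hypothetical; a real vocabulary would come from a proper word list.

```python
def singular_candidates(word):
    """Generate possible singulars by reversing the simple plural rules."""
    w = word.lower()
    candidates = []
    if w.endswith("ies"):
        candidates.append(w[:-3] + "y")   # families -> family
    if w.endswith("i"):
        candidates.append(w[:-1] + "us")  # cacti -> cactus
    if w.endswith("es"):
        candidates.append(w[:-2])         # losses -> loss
    if w.endswith("s"):
        candidates.append(w[:-1])         # pies -> pie
    candidates.append(w)                  # the word may already be singular
    return candidates

def singularize(word, vocabulary):
    """Keep only the candidates that exist in the vocabulary."""
    return [c for c in singular_candidates(word) if c in vocabulary]

# Hypothetical vocabulary of known singular words.
VOCABULARY = {"pie", "family", "cactus", "loss", "ski", "molasses", "ax", "axe"}
```

Here `singularize("pies", VOCABULARY)` returns `["pie"]` and `singularize("ski", VOCABULARY)` returns `["ski"]`, so the dictionary check removes the false guesses. It cannot resolve real ambiguity though: `singularize("axes", VOCABULARY)` returns `["ax", "axe"]`, and a singular like "axis" is never even guessed by these rules.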
Probably not, seeing as English uses pluralization rules from multiple languages. In addition to that, no rule will ever let you know that goose is the singular form of geese or that octopus is the singular form of octopi.
Goose...Geese
Mouse...Mice
Octopus...Octopi
Actually, octopi is incorrect. The classically proper form is octopodes, but the accepted form is octopuses. – WCWedin Sep 5 '09 at 14:30
Actually it's not... "There are three forms of the plural of octopus; namely, octopuses, octopi, and octopodes. Currently, octopuses is the most common form in the US as well as the UK; octopodes is rare, and octopi is often objectionable" – gshauger Sep 5 '09 at 16:00
It is not possible, as nickf has already said. It would be simple for the classes of words you have described, but what about all the words that end with s naturally? My name, Marius, for example, is not the plural of Mariu. Same with bus, I guess. Pluralization of words in English is a one-way function (like a hash function), and you usually need the rest of the sentence or paragraph for context.
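A small Python illustration of why the mapping cannot be inverted in general: several singulars collide on the same plural. The pairs are ones mentioned elsewhere in this thread plus one ordinary word, and the names `PAIRS` and `invert` are made up for the example.

```python
from collections import defaultdict

# (singular, plural) pairs; the ambiguous ones come from the comments in this thread.
PAIRS = [
    ("ellipse", "ellipses"), ("ellipsis", "ellipses"),
    ("base", "bases"), ("basis", "bases"),
    ("tax", "taxes"), ("taxis", "taxes"),
    ("cat", "cats"),
]

def invert(pairs):
    """Map each plural to the set of singulars that produce it."""
    inverse = defaultdict(set)
    for singular, plural in pairs:
        inverse[plural].add(singular)
    return inverse

# Plurals with more than one possible singular cannot be reversed without context.
AMBIGUOUS = {
    plural: sorted(singulars)
    for plural, singulars in invert(PAIRS).items()
    if len(singulars) > 1
}
```

`AMBIGUOUS` ends up holding "ellipses", "bases" and "taxes", each with two candidate singulars, while "cats" reverses cleanly. Picking between the candidates is where the surrounding sentence comes in.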
For my intended purpose, I can (relatively) safely assume that the word I am looking at is a plural, i.e. in the context it wouldn't make sense elsewise. – Matthew Scharley Sep 4 '09 at 3:18