How to intelligently parse last name?

Probably the best answer here is not to try. Names are individual and idiosyncratic and, even limiting yourself to the Western tradition, you can never be sure that you'll have thought of all the edge cases. A friend of mine legally changed his name to be a single word, and he's had a hell of a time dealing with various institutions whose procedures can't deal with this.

You're in a unique position of being the one creating the software that implements a procedure, and so you have an opportunity to design something that isn't going to annoy the crap out of people with unconventional names. Think about why you need to be parsing out the last name to begin with, and see if there's something else you could do.

That being said, as a purely technical matter, the best way would probably be to trim off specifically the strings " Jr", ", Jr", ", Jr.", "III", ", III", etc. from the end of the string containing the name, and then take everything from the last space in the string to the new end (that is, the end after Jr, etc. has been removed).

This wouldn't get, say, "Del La Hoya" from your example, but you can't even really count on a human to get that. I'm making an educated guess that John Mark Del La Hoya's last name is "Del La Hoya" and not "Mark Del La Hoya" because I'm a native English speaker and I have some intuition about what Spanish last names look like. If the name were, say, "Gauthip Yeidze Ka Illunyepsi", I would have absolutely no idea whether to count that "Ka" as part of the last name, because I have no idea what language that's from.
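For what it's worth, here is a rough Python sketch of the suffix-trimming heuristic described above. The function name `guess_last_name` and the particular suffix list are illustrative assumptions of mine, not anything standard, and the list is certainly incomplete. It only takes the last word, so it has exactly the multi-word surname limitation discussed above.

```python
import re

# Trailing generational suffixes to strip, optionally preceded by a comma.
# This list is an assumption for illustration and is far from complete.
_SUFFIX_RE = re.compile(r",?\s+(?:Jr\.?|Sr\.?|II|III|IV)$")


def guess_last_name(full_name: str) -> str:
    """Heuristic guess: drop one trailing suffix, then take the last word."""
    name = full_name.strip()
    # Remove at most one suffix such as " Jr", ", Jr." or ", III" from the end.
    name = _SUFFIX_RE.sub("", name).rstrip(" ,")
    # Everything after the last space; a single-word name comes back whole.
    return name.rsplit(" ", 1)[-1]


if __name__ == "__main__":
    for example in ["John Smith, Jr.", "Martin Luther King III",
                    "John Mark Del La Hoya", "Cher"]:
        print(example, "->", guess_last_name(example))
```

Note that a single-word name is returned unchanged rather than rejected, and that "John Mark Del La Hoya" comes back as just "Hoya". If you go this route, treat the result as a guess that the user can correct, not as the truth.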

I'm seconding Tnekutippa here, but you should check out named entity recognition. It might help automate some of the process. As noted, though, this is quite a difficult problem.

I'm not quite sure if the Stanford NER can extract first and last names out of the box, but a machine learning approach could prove very useful for this task. The Stanford NER could be a nice starting point, or you could try to make your own classifiers and training corpora.
