The way I do it is less elegant than what it looks like you are shooting for: I preprocess the documents with a named entity recognizer and save all of the entities to a separate file. Then, when publishing to Solr, I read the entities from that file and populate the entity fields (separate fields for people, locations, and organizations). This could be simplified, but since I had already done the parsing for other work, it was easier to just reuse what already existed.
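A minimal sketch of that two-step flow, assuming a JSON entities file keyed by document id and hypothetical Solr field names `person`, `location`, and `organization` (the actual Solr publish call is omitted; in practice you would hand the resulting dict to a client such as pysolr or SolrJ):

```python
import json

# Hypothetical output of the NER preprocessing step: one entry per
# document, listing the entities found, grouped by entity type.
ENTITIES_JSON = """
{
  "doc-1": {
    "PERSON": ["Ada Lovelace"],
    "LOCATION": ["London"],
    "ORGANIZATION": ["Analytical Society"]
  }
}
"""

# Map NER labels to the (hypothetical) Solr field names.
FIELD_FOR_LABEL = {
    "PERSON": "person",
    "LOCATION": "location",
    "ORGANIZATION": "organization",
}

def build_solr_doc(doc_id, text, entities_by_doc):
    """Build the dict that would be sent to Solr for one document."""
    doc = {"id": doc_id, "text": text}
    for label, field in FIELD_FOR_LABEL.items():
        values = entities_by_doc.get(doc_id, {}).get(label, [])
        if values:
            doc[field] = values  # multi-valued entity field
    return doc

entities = json.loads(ENTITIES_JSON)
doc = build_solr_doc("doc-1", "Ada Lovelace worked in London...", entities)
print(doc["person"])  # entities come from the saved file, not re-parsed
```

The point of the split is that the expensive NER pass runs once, and the publish step only does cheap lookups in the saved file.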
Here's an idea I think would work in Lucene, but I have no idea if it's possible in Solr. You could tokenize the string outside the typical TokenStream chain, as you suggest, and then manually add the tokens to the document using the NOT_ANALYZED option. You have to add each token separately with document.add(...), and Lucene will treat them as a single field for searching.
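In older Lucene Java APIs this would look roughly like `doc.add(new Field("entity", token, Field.Store.YES, Field.Index.NOT_ANALYZED))`, called once per token (the exact signature depends on the Lucene version). A toy Python sketch of the idea itself, with a hypothetical in-memory index: tokenization happens outside any analysis chain, and each token is added as a separate, unanalyzed value of the same field, so matching is exact:

```python
import re
from collections import defaultdict

def my_tokenize(text):
    """Stand-in for tokenization done outside the analyzer chain,
    e.g. output of an external NER or custom tokenizer."""
    return re.findall(r"[A-Za-z]+", text)

class ToyIndex:
    """Minimal model of one multi-valued, unanalyzed field: each
    added value is indexed verbatim, so search is exact match."""
    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of doc ids

    def add(self, doc_id, field_value):
        # Analogous to document.add(...) with NOT_ANALYZED: the
        # value is indexed as-is, with no further analysis applied.
        self.postings[field_value].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term, set()))

index = ToyIndex()
for token in my_tokenize("Solr and Lucene"):
    index.add("doc-1", token)  # one add(...) call per token

print(index.search("Lucene"))   # exact match hits doc-1
print(index.search("lucene"))   # no lowercasing was applied, so no hit
```

Note the consequence of NOT_ANALYZED: because no analyzer runs at index time, queries only match if they use the exact same token form you added.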
I can't really give you an answer, but what I can give you is a way to a solution: you have to find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.