Extending your own Markov chain generator is probably your best bet if you want "random" text; generating something with real context is an open research problem. Try (if you haven't): tokenise punctuation separately, or include punctuation in your chain if you're not already. This includes paragraph marks.
If you're using a 2- or 3-history Markov chain, try resetting to a 1-history one when you encounter full stops or newlines. Alternatively, you could use WordNet in two passes over your corpus: first, analyse sentences to determine common sequences of word types, i.e. nouns, verbs, adjectives, and adverbs.
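The punctuation-aware chain with a history reset at sentence boundaries could be sketched like this in Java. The class and method names (`MarkovSketch`, `train`, `generate`) are illustrative, not from any library, and the tokeniser is deliberately crude:

```java
import java.util.*;
import java.util.regex.*;

// Sketch of an order-2 Markov text generator that treats punctuation as
// separate tokens and drops back to 1-history context after a full stop.
public class MarkovSketch {
    private final Map<List<String>, List<String>> chain = new HashMap<>();
    private final Random rng;

    public MarkovSketch(long seed) { this.rng = new Random(seed); }

    // Split words and punctuation into separate tokens, so "mat." becomes
    // ["mat", "."] and sentence boundaries show up in the chain.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|[.,!?;:]").matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Record both order-1 and order-2 transitions from the corpus.
    public void train(String text) {
        List<String> toks = tokenize(text);
        for (int i = 1; i < toks.size(); i++) {
            chain.computeIfAbsent(List.of(toks.get(i - 1)), k -> new ArrayList<>())
                 .add(toks.get(i));
            if (i >= 2) {
                chain.computeIfAbsent(List.of(toks.get(i - 2), toks.get(i - 1)), k -> new ArrayList<>())
                     .add(toks.get(i));
            }
        }
    }

    public String generate(String start, int maxTokens) {
        List<String> out = new ArrayList<>(List.of(start));
        while (out.size() < maxTokens) {
            String last = out.get(out.size() - 1);
            List<String> key = (out.size() < 2 || last.equals("."))
                ? List.of(last)                              // reset to 1-history at sentence end
                : List.of(out.get(out.size() - 2), last);    // normal 2-history context
            List<String> nexts = chain.get(key);
            if (nexts == null) nexts = chain.get(List.of(last)); // back off to order-1
            if (nexts == null) break;
            out.add(nexts.get(rng.nextInt(nexts.size())));
        }
        return String.join(" ", out); // punctuation stays space-separated in this sketch
    }

    public static void main(String[] args) {
        MarkovSketch m = new MarkovSketch(7);
        m.train("The cat sat on the mat. The cat ran to the dog. The dog sat.");
        System.out.println(m.generate("The", 20));
    }
}
```

Because "." is its own token, the generator learns which words tend to start sentences, and the reset keeps it from locking onto one long memorised sequence across sentence boundaries.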
WordNet includes these. Everything else (pronouns, conjunctions, and so on) is excluded, but you could essentially pass those straight through. This would turn "The quick brown fox jumps over the lazy dog" into "The adjective adjective noun verb(s) over the adjective noun". Then reproduce sentences by randomly choosing a template sentence and replacing the adjective, noun, and verb slots with actual adjectives, nouns, and verbs.
There are quite a few problems with this approach too: for example, you need context from the surrounding words to know which homonym to choose. Looking up "quick" in WordNet yields the sense about being fast, but also the bit of your fingernail. I know this doesn't solve your requirement for a library or a tool, but it might give you some ideas.
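A toy version of the two-pass template idea might look like the sketch below. The hand-made `posOf` map is a stand-in for a real WordNet part-of-speech lookup (an assumption; a real implementation would query WordNet through a library such as JWNL), and it ignores the homonym problem entirely:

```java
import java.util.*;

// Minimal sketch of the two-pass template idea: pass 1 turns sentences into
// templates, pass 2 refills templates with random words of the same type.
public class TemplateSketch {
    // Stand-in for a WordNet POS lookup -- hard-coded, illustrative only.
    static final Map<String, String> posOf = Map.of(
        "quick", "adjective", "brown", "adjective", "lazy", "adjective",
        "fox", "noun", "dog", "noun", "jumps", "verb");

    // Pass 1: replace content words with their type, pass everything else through.
    static List<String> toTemplate(String sentence) {
        List<String> template = new ArrayList<>();
        for (String word : sentence.toLowerCase().split("\\s+")) {
            template.add(posOf.getOrDefault(word, word));
        }
        return template;
    }

    // Pass 2: fill each slot by drawing a random word of that type.
    static String fill(List<String> template, Map<String, List<String>> lexicon, Random rng) {
        StringBuilder sb = new StringBuilder();
        for (String slot : template) {
            List<String> choices = lexicon.get(slot);
            String word = (choices == null) ? slot : choices.get(rng.nextInt(choices.size()));
            if (sb.length() > 0) sb.append(' ');
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> template = toTemplate("The quick brown fox jumps over the lazy dog");
        System.out.println(String.join(" ", template));
        // -> the adjective adjective noun verb over the adjective noun
        Map<String, List<String>> lexicon = Map.of(
            "adjective", List.of("sly", "old"),
            "noun", List.of("cat", "river"),
            "verb", List.of("leaps", "strolls"));
        System.out.println(fill(template, lexicon, new Random(1)));
    }
}
```

In a real version the lexicon would also come from the corpus (pass 1 can collect the actual adjectives, nouns, and verbs it sees), so the refilled sentences stay in the corpus's vocabulary.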
There are links to several APIs as well.
Very similar, but I'm looking for one that can consume a corpus of text and generate random but similar text. I apologize; I should have been clearer in the question. – Carl Summers Nov 3 '09 at 23:45
I've used many data sets for this purpose, including wikinews articles. I extracted text from them using this tool: alas.matf.bg.ac.rs/~mr04069/WikiExtracto....
I'm looking for tools for generating random but realistic text. I've implemented a Markov Chain text generator myself and while the results were promising, my attempts at improving them haven't yielded any great successes. I'd be happy with tools that consume a corpus or that operate based on a context-sensitive or context-free grammar.
I'd like the tool to be suitable for inclusion into another project. Most of my recent work has been in Java so a tool in that language is preferred, but I'd be OK with C#, C, C++, or even JavaScript.