Word Generation

A few months ago I had an idea for generating fake words from dictionaries, and I built it. I then completely forgot that it existed and only just remembered, so I’m posting it here.

To generate fake words, the method I came up with (almost definitely not the best) is to build a tree of depth n. Each node is a single character with a probability value (0-1). The probability of a root node x is the fraction of dictionary words that begin with x. The probability of a child node y (which in turn has its own child nodes) is the chance that y comes after x. The probabilities of a node’s children add up to 1. Then, to generate a word, we walk the tree recursively, picking a node at each level according to its probability and appending its character to the word.
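
Since the source isn’t public, here is a minimal Python sketch of how a tree like this could be built and walked. It is an illustration under assumptions, not the actual implementation: the names `build_tree`, `child_probabilities` and `generate` are made up, the tree is stored as a dictionary keyed by prefix rather than as linked nodes, the walk is a loop rather than recursion, and the `END` marker for words shorter than n is an addition the description doesn’t mention.

```python
import random
from collections import defaultdict

END = None  # hypothetical end-of-word marker (an assumption, not from the post)


def build_tree(words, depth):
    """Count, for every prefix shorter than `depth`, which character follows it.

    counts[prefix][ch] is the number of dictionary words in which `ch`
    comes right after `prefix`. The empty prefix "" is the root level,
    so counts[""][ch] counts the words that begin with `ch`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = list(word[:depth])
        if len(word) < depth:
            chars.append(END)  # the word stops before the tree runs out
        for i, ch in enumerate(chars):
            counts[word[:i]][ch] += 1
    return counts


def child_probabilities(counts, prefix):
    """Turn the raw counts under `prefix` into probabilities that sum to 1."""
    children = counts.get(prefix, {})
    total = sum(children.values())
    return {ch: n / total for ch, n in children.items()} if total else {}


def generate(counts, depth, rng=random):
    """Walk from the root, picking one child per level by its probability."""
    word = ""
    while len(word) < depth:
        probs = child_probabilities(counts, word)
        if not probs:
            break  # empty dictionary or nothing recorded for this prefix
        ch = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        if ch is END:
            break
        word += ch
    return word


if __name__ == "__main__":
    sample = ["super", "supper", "sulfur", "hippo", "uncle"]  # toy dictionary
    tree = build_tree(sample, depth=5)
    print(child_probabilities(tree, "su"))
    print(generate(tree, depth=5))
```

One thing this sketch makes visible: because each level is conditioned on the whole prefix so far, a single walk can only produce the first n characters (or less) of some dictionary word. Getting new-looking words out of it would need something extra, for example joining the results of several walks, but that is a guess and not something the description above states.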

Sometimes the words it generates are a bit weird or really weird, and sometimes they are just taken verbatim from the dictionary used.

The source isn’t currently public, but I might make it public sometime.

Some output (cherry-picked, n = 5):

  • English
    • uncon
    • querus
    • superhippo
    • sulfiscava
  • German
    • analydamen
    • leserrügte
    • krankferti
    • vertrrüstu
    • bürstrostz
  • Finnish
    • vapaak
    • paraasorto
    • pitkähyvät
    • puolipiiri
    • räntäaukko
    • holiskiist
  • Telugu
    • దళగిం (dhaLagiM)
    • అనిరు (aniru)
    • విద్య (vidhya)
    • వ్రతన (vrathana)
    • పైటగి (paitagi)
  • Korean
    • 건지다조카설 (geonjidajokaseol)
    • 평양찌꺼기여 (pyeong-yangjjikkeogiyeo)
    • 보행자 (bohaengja)
    • 싸우다주민개인적권리 (ssaudajumingaeinjeoggwonli)

Note: Google Translate seems to always show an actual translation for the generated Korean words. I have no idea why, but all four were completely randomly generated (even the second one, which I’ll keep). The Korean generations were the only ones not cherry-picked, since I can’t read Hangeul or assess how Korean they sound (unlike English and Telugu, where I can, and German and Finnish, where I somewhat can).
