Python NLTK module and its download() function

I am trying to build a small machine learning service that uses the Python NLTK library. As it is a pet project, and a very small one, I’ve decided to use Heroku for hosting. Because I am using NLTK, I needed to download models and corpora by calling its download() method, so that I could parse punctuation and do some other textual tricks.

Heroku doesn’t allow that method to execute when it is called without arguments, as it then requires interactive (GUI) input.

Initially I got a bit lost. There is a way to bypass the GUI by providing a specific list of NLTK corpora to download, but for certain missing models the error message gave a model name that was not compatible with

Luckily, I found a page listing all available corpora and their associated download IDs.
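
As a rough illustration of the non-interactive route, here is a minimal sketch that downloads packages by ID. The helper name download_nltk_packages is made up for this example, and it assumes nltk.download() returns a truthy value on success. Which IDs you actually need depends on your project.

```python
import nltk


def download_nltk_packages(package_ids, download_dir=None):
    """Download each NLTK package by ID without the interactive downloader.

    Returns the list of IDs that failed to download.
    (Hypothetical helper, not part of NLTK itself.)
    """
    failed = []
    for package_id in package_ids:
        # Passing an explicit ID keeps nltk.download() non-interactive;
        # quiet=True keeps progress output out of the build logs.
        ok = nltk.download(package_id, download_dir=download_dir, quiet=True)
        if not ok:
            failed.append(package_id)
    return failed
```

Calling download_nltk_packages(["punkt", "stopwords"]) at build or startup time would then fetch those two packages; the IDs here are only examples.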


Generators in Python!

Python generators code tutorial:
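
For a quick taste of the topic, here is a small sketch of two generators chained together. The function names are invented for this example.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, one value per request."""
    current = start
    while current > 0:
        yield current  # execution pauses here until the next value is requested
        current -= 1


def squares(numbers):
    """Lazily yield the square of each number from any iterable."""
    for n in numbers:
        yield n * n


# Generators compose: nothing is computed until list() pulls the values.
result = list(squares(countdown(3)))
```

Neither function builds an intermediate list; values flow through the chain one at a time.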

UUID3 cross compatible in Java and Python

Sometimes, in the world of microservices, you want to generate consistent hashes across several languages. I faced the problem of getting the same UUID in Java and Python, and wanted to share how it can be done.
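
One way to line the two up, as a sketch rather than the only option: Java’s UUID.nameUUIDFromBytes hashes just the raw name bytes with MD5, while Python’s uuid.uuid3 prepends a namespace before hashing, so the two disagree by default. The helper below (java_compatible_uuid3 is a name made up for this example) reproduces the Java behaviour on the Python side, assuming the Java code encodes the name as UTF-8.

```python
import hashlib
import uuid


def java_compatible_uuid3(name: str) -> uuid.UUID:
    """Mimic Java's UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8))."""
    # Java hashes only the raw name bytes with MD5; no namespace is prepended.
    digest = hashlib.md5(name.encode("utf-8")).digest()
    # version=3 overwrites the version and variant bits, as Java does.
    return uuid.UUID(bytes=digest, version=3)
```

The other direction also works: keep uuid.uuid3 in Python and have the Java side hash the namespace bytes followed by the name bytes.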




PyMongo insert_many with overwrite

I’ve recently needed to insert lots of objects into a Mongo collection. The only problem was that some of the objects would have an ‘_id’ key pre-set and would conflict with existing objects in the database.

PyMongo’s insert_many operation doesn’t support overwriting out of the box, so here is a workaround using the bulk API:

