I am trying to build a small machine learning service that uses the Python NLTK library. As it is a pet project, and a very small one, I’ve decided to host it on Heroku. And because I am using NLTK, I needed to download models and corpora by calling nltk.download() so that I could parse punctuation and do a few other textual tricks.
Heroku doesn’t allow that method to execute as is: called without arguments, it opens an interactive downloader that requires GUI interaction.
Initially I got a bit lost. There is a way to bypass the GUI by passing nltk.download() the specific corpora to download, but for certain missing models the error message gave a model name that nltk.download() did not accept.
Luckily, I found a page listing all available corpora and their associated download IDs.
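
To make the non-interactive route concrete, here is a minimal sketch of how the download step could look once you know the right IDs. The helper name download_nltk_packages, the NLTK_PACKAGES list, and the downloader parameter are my own illustrative choices, not part of NLTK; the IDs "punkt" and "stopwords" are only examples, so swap in whatever IDs your own project turns out to need.

```python
# Hypothetical list of download IDs. These are the short IDs that
# nltk.download() accepts, which may differ from the resource name
# shown in a "resource not found" error message.
NLTK_PACKAGES = ["punkt", "stopwords"]


def download_nltk_packages(packages, download_dir=None, downloader=None):
    """Download each package by ID without opening the interactive downloader.

    Returns the list of IDs whose download reported failure.
    """
    if downloader is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk
        downloader = nltk.download

    failed = []
    for package_id in packages:
        # Passing an explicit ID skips the interactive prompt;
        # nltk.download() returns True on success and False otherwise.
        ok = downloader(package_id, download_dir=download_dir, quiet=True)
        if not ok:
            failed.append(package_id)
    return failed
```

Calling download_nltk_packages(NLTK_PACKAGES) once at startup (or in a build step) should fetch everything without any GUI, and the returned list tells you which IDs still need a second look.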