Ryuzo and the Seven Henchmen

Recently watched another masterpiece from Takeshi Kitano.

Plot in Twitter format: an old yakuza gathers his very old yakuza friends to do some brutal yakuza stuff and compete for the turf of a small Japanese town.

The movie is full of humor and subtle and gentle moments that reflect on the matters of our short lives and temporary friendships. It is shot in a calm, very Kitano manner and gives a great, pacifying visual experience.

If you want a break from yet another Beauty and the Beast or King Kong, check it out.

http://www.imdb.com/title/tt4176776/

My rating:

8/10, yanking two points for poor special effects.

MXNet im2rec.py

MXNet provides a Python utility that creates RecordIO packages of data, which are supported by the framework’s data iterators.

https://github.com/dmlc/mxnet/blob/master/tools/im2rec.py

One small detail: im2rec.py doesn’t support .png files unless you manually edit the script and add .png to its list of allowed extensions.

In order to create a rec file, you first need to make a list of files using this command:

python \\mxnet-master\tools\im2rec.py  --list 1 --recursive 1 --num-thread 4 --train-ratio 0.7 --test-ratio 0.2 prefix frames
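--list 1 ==> tells the script to create the .lst files
--recursive 1 ==> walks the subfolders of the image folder and gives the images in each subfolder their own label
--num-thread 4 ==> runs the script in parallel
--train-ratio 0.7 ==> puts 70 percent of the data into the train list; the data set is split between several list files
--test-ratio 0.2 ==> makes sure that 20 percent of the data is used in the test set; the remainder goes to the validation list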
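
To make the two ratio flags more concrete, here is a minimal sketch of a ratio-based split. It is not the code from im2rec.py: the function name `split_listing`, the fixed seed and the sample file names are made up for illustration, and the real script may shuffle and order the chunks differently.

```python
import random


def split_listing(paths, train_ratio, test_ratio, seed=0):
    """Shuffle the paths and split them into train, test and validation chunks.

    Hypothetical helper for illustration only, not part of im2rec.py.
    """
    items = list(paths)
    random.Random(seed).shuffle(items)
    n_train = int(round(len(items) * train_ratio))
    n_test = int(round(len(items) * test_ratio))
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # whatever is left over
    return train, test, val


# 100 made-up file names standing in for the contents of the 'frames' folder
frames = ["frames/img_%03d.jpg" % i for i in range(100)]
train, test, val = split_listing(frames, train_ratio=0.7, test_ratio=0.2)
print("train=%d test=%d val=%d" % (len(train), len(test), len(val)))
```

With 0.7 and 0.2 the leftover 10 percent ends up in the validation chunk, which is why three list files appear.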

After you get the .lst files, let’s create a record set.

The record files will be about the same size as the data used to build them.

python \\mxnet-master\tools\im2rec.py --num-thread 4 --quality 80 prefix frames

Here we leave out the --list argument, so the script works on the .rec files.

At the end of its execution, it should display something like

time: 0.0130000114441 count: 0
time: 3.01699995995 count: 1000
time: 2.83000016212 count: 2000

and in the folder where you have your dataset you will see files

prefix_test.rec

prefix_test.lst
prefix_test.val

and same for train and validation sets.

 

Enjoy!

Installing Apache MXNet Python library on Windows 10

Hello, hello.

I wanted to share a quick how-to on installing MXNet on Windows 10 64-bit.

The solution here is a bit opinionated as I am using Anaconda to manage Python environments.

I will start from the beginning – Anaconda installation.

We will need Anaconda 4.3. You should use the 64-bit Anaconda/Python 2.7 combination, as the environment we will create will be based on Python 2.7 64-bit in order to support MXNet. You can get Anaconda here:

https://www.continuum.io/downloads

Follow the installation instructions and, once it is done, go to the Windows command line and type this:

conda create -n mxnet python=2.7 anaconda

Having the ‘anaconda’ argument in there ensures we get the common data analytics libraries, which are not necessary for MXNet but are nice to have. Also, we want to use Python 2.7 in this case, as MXNet has compatibility issues with Python 3.

Once the environment creation is complete, activate it by executing:

activate mxnet

That is all for the terminal for now.

Let’s get the pre-compiled MXNet DLLs!

You will need to download several packages from

https://github.com/yajiedesign/mxnet/releases

First, get the prebuilt VC14 (Visual Studio 2015 runtime) base package. It doesn’t contain any MXNet binaries, but it provides third-party libraries and helps to set all the environment variables that are necessary to run MXNet. In my case I got prebuildbase_win10_x64_vc14.

Once you have downloaded the VC14 base archive, extract its contents somewhere. A good candidate would be an ‘mxnet’ folder in the root of C: or any other drive. Let’s imagine that you have created a ‘D:\mxnet’ folder.

Open a terminal in that folder and run

setupenv.cmd

It should finish quickly. You can verify that it was successful by going to System -> Advanced System Settings -> Environment Variables and making sure the MXNET_HOME variable is set to “D:\mxnet”.
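
The same check can be done from Python. This is a minimal sketch: the helper name `mxnet_home_status` is made up for this post, and it assumes that setupenv.cmd sets MXNET_HOME as described above. You will probably need to open a new terminal first, so that it picks up the new variable.

```python
import os


def mxnet_home_status(env=None):
    """Describe the state of the MXNET_HOME variable (hypothetical helper)."""
    env = os.environ if env is None else env
    home = env.get("MXNET_HOME")
    if not home:
        return "MXNET_HOME is not set"
    if not os.path.isdir(home):
        return "MXNET_HOME points to a missing folder: %s" % home
    return "MXNET_HOME looks fine: %s" % home


print(mxnet_home_status())
```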

Now, let’s download the GPU or CPU version of pre-compiled MXNet from the same page where you got the base package. At the time of writing it is 20170702_mxnet_x64_vc14_gpu.7z

We won’t need the source code of mxnet, as we are not going to compile it from scratch, so you can IGNORE Source code (zip)

Extract the contents of the archive to “D:\mxnet”. It might overwrite some of the folders, which is OK.

Optional STEP:

If you chose the GPU version of MXNet, the last part is to get NVIDIA’s cuDNN library for Windows. You will have to register to get it, but it is free.

cuDNN is available here:

https://developer.nvidia.com/rdp/cudnn-download

Once you download the archive, extract its contents to “D:\mxnet\3rdparty\cudnn”.

END OF Optional STEP.

Now it is time to install the Python bindings, so you can import mxnet inside a Python project.

Go to the terminal where you had your MXNet Anaconda environment activated and jump to the location: “D:\mxnet\python”. Once you are in the correct location, run

python setup.py install

If it ran without any errors, then we are ready to do our final test. Go back to the terminal where you activated the mxnet environment, start the Python interpreter, and then run

import mxnet

You should be able to import the library.
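
If you would rather run this check from a script than type it into the interpreter, here is a minimal sketch. The helper name `can_import` is made up, and catching OSError is my assumption about how a missing DLL might surface on Windows.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError: the package is not installed in this environment.
        # OSError: assumed to cover a native library that fails to load.
        return False
    return True


print("mxnet importable: %s" % can_import("mxnet"))
```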

Enjoy!

 

Python NLTK module and its download() function

I am trying to build a small machine learning service that uses the Python NLTK library. As it is a pet project, and a very small one, I’ve decided to use Heroku for the hosting. And as I am using the NLTK library, I needed to download models and corpora by calling the nltk.download() method to parse punctuation and do some other textual tricks.

Heroku doesn’t allow that method to execute, as it requires GUI interaction.

Initially I got a bit lost. There is a way to bypass the GUI by giving nltk.download() a specific list of NLTK corpora to download, but for certain missing models the error message gave a model name that was not compatible with nltk.download().

Luckily, I’ve found the page with a list of all available corpora and associated download IDs.

http://www.nltk.org/nltk_data/
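
Once you know the download IDs, the GUI-free download can be sketched like this. The helper `download_nltk_packages` is made up for this post, and the IDs in the example call are only examples; take the ones you need from the list above. It assumes that nltk.download() returns a false value when a download fails.

```python
def download_nltk_packages(package_ids, download=None):
    """Download each package by ID and return the IDs that failed.

    Hypothetical helper: `download` defaults to nltk.download and can be
    swapped for a stand-in function when NLTK is not installed.
    """
    if download is None:
        import nltk  # imported lazily so the helper loads without NLTK
        download = nltk.download
    failed = []
    for package_id in package_ids:
        if not download(package_id):
            failed.append(package_id)
    return failed


# Example call (needs NLTK installed and network access):
# download_nltk_packages(["punkt", "stopwords"])
```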

Enjoy.

RE7 – Biohazard

I want to congratulate all the fans of the series. Finally there is an episode that has an old house.

An episode with puzzles in an old house. Also, an episode where you have to save bullets when you are trying to kill these very, very scary mutations and family members.

I am loving the gameplay, but I cannot play it alone, I get scared. A true horror game as it should be. I am playing with my mom, who is about to finish the game on her own. We are both fascinated by the visuals of the game, which push the boundaries of what our old PS4 can deliver. Truly beautiful and scary.

I am happy there will be new episodes coming later as DLCs. Not a big fan of the DLC concept, but I will make an exception.

A strong return in my opinion.

Ideology

Ideology – is a system where other people can wreck your life.

https://en.wikipedia.org/wiki/Ernst_Kolman

Generators in Python!

Python generators code tutorial:

import string
import inspect

# Informal tutorial on the generators. We will cover very briefly:
## 1) What are the generators (PEP 255).
## 2) What is yield (also PEP 255).


# A generator in Python is a fancy object that maintains its state between calls.
# In simple words, a generator knows what to return to you on the next call.
# And it doesn't need to store all of its values in memory.
# Think of it as computation on demand. Lazy, yeah.
# A generator implements the iterator protocol.
# Thus it is common to say that every generator is an iterator.
# It just means we can get the next value from the generator via a next() call.
# You get a generator by calling a function that contains 'yield'.

# Theoretically, you can compute something blah blah to
# infinity using generators, and have a very small memory footprint.

# The 'yield' keyword in a function will cause the function
# to return a generator-iterator object instead of a value.


# Let's demonstrate!

# Let's define a function that will return a generator-iterator object
# when we call it
def english_alphabet_generator_function():
    for char in string.ascii_lowercase:
        yield char


print english_alphabet_generator_function
# <function english_alphabet_generator_function at 0x1004c7320>, it is still a function


# Let's call english_alphabet_generator_function to get the actual generator object
alphabet_generator = english_alphabet_generator_function()
# calling the generator function returns a generator-iterator object.

print type(alphabet_generator) 
# <type 'generator'>, not a function anymore, we got generator object

# Now let's examine the function locals - variables defined in the function namespace
print alphabet_generator.gi_frame.f_locals
# {}, nothing yet

# Now we can use the iterator protocol and print a few characters by calling the generator's next() method.
print alphabet_generator.next() # prints 'a'
print alphabet_generator.next() # prints 'b'
print alphabet_generator.next() # prints 'c'


print alphabet_generator.gi_frame.f_locals 
# prints {'char': 'c'}


# Let's exhaust our generator and see what happens.
# It prints the characters from d to z; the next() call after 'z'
# raises StopIteration, which we catch to end the loop.
empty = False
while not empty:
    try:
        print alphabet_generator.next()
    except StopIteration:
        empty = True
    
# Another not very useful example:
def _my_gen():
    for x in xrange(200, 400):
        yield 1

gen = _my_gen()
print("Sum:")
print(sum(gen)) # will print 200! :-) 


# Please refer to
# https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/ for an in-depth explanation with
# very interesting examples. Thank you!
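
The "compute to infinity with very little memory" remark above can be shown with a small sketch. This example is my own addition, not part of the original tutorial: a Fibonacci generator that never ends, from which itertools.islice takes only the first ten values. It is written so that it runs on both Python 2.7 and Python 3.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever, keeping only two values in memory."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice stops after ten values, so the infinite generator is never exhausted
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Note that the built-in next(gen) works on both Python 2.6+ and Python 3, while the gen.next() method used in the tutorial is Python 2 only.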

UUID3 cross compatible in Java and Python

Sometimes, in the world of microservices you want to generate consistent hashes across several languages. I’ve faced a problem of getting the same UUID in Java and Python, and wanted to share how it can be done.

Java

import java.nio.charset.StandardCharsets;
import java.util.UUID;
UUID id = UUID.nameUUIDFromBytes("name".getBytes(StandardCharsets.UTF_8));
// b068931c-c450-342b-a3f5-b3d276ea4297

Python

import uuid

# Java hashes only the name bytes, so we pass an empty namespace
class NULL_NAMESPACE:
    bytes = b''

uuid.uuid3(NULL_NAMESPACE, "name")
# b068931c-c450-342b-a3f5-b3d276ea4297
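
To show why the empty namespace trick works, here is a minimal sketch that builds the same UUID by hand: take the MD5 digest of the name bytes and let uuid.UUID set the version and variant bits. The function name `java_compatible_uuid3` is made up for this post, and the sketch assumes the Java side encodes the name as UTF-8.

```python
import hashlib
import uuid


def java_compatible_uuid3(name):
    """Build a version 3 UUID from the MD5 of the name alone (no namespace)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    # version=3 overwrites the version and variant bits of the 16 digest bytes
    return uuid.UUID(bytes=digest, version=3)


result = java_compatible_uuid3("name")
print(result)  # b068931c-c450-342b-a3f5-b3d276ea4297
```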

Enjoy!

PyMongo insert_many with overwrite

I’ve recently needed to insert lots of objects into the Mongo collection. The only problem was that some of the objects would have an ‘_id’ key pre-set and would conflict with existing objects in the database.

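PyMongo’s insert_many operation doesn’t support this out of the box, so here is a workaround using the bulk API:

from pymongo.errors import BulkWriteError

# Note: this legacy bulk API was removed in PyMongo 4.0, where
# collection.bulk_write() with ReplaceOne(..., upsert=True) replaces it.
try:
    bulk = collection.initialize_unordered_bulk_op()  # or initialize_ordered_bulk_op()
    # objects_to_insert: prepare a generator for the objects that need to be saved
    for one in objects_to_insert:
        # replace the document with the same _id, or insert it if there is none
        bulk.find({"_id": one["_id"]}).upsert().replace_one(one)
    bulk.execute()
except BulkWriteError as exc:
    # exc.details is available for more information
    pass  # do something here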

Keep H1B

Just a short point. Conservatives talk about H1B visas being used to bring in workers who don’t really match the criteria defined in the H1B terms. It is possible, and I agree that it sometimes happens.

But you need to understand that cancelling H1B will just mean that companies switch to a remote, online workforce. There will be no way to tax that income, and the money paid to a person will not end up in the pocket of a local hotel, grocery store or barber.

You cannot protect technology jobs inside the US by not letting people into the country. People will just work remotely. Remote work is possible for designers, engineers and basically anyone who uses a computer to produce some unit of work.

Wake up.