Monthly Archives: December 2017

My first independent implementation of an NLP research model…

A tiny, tiny milestone… But still some lessons learned for next time.

Key take-aways:

  • It is not that hard. After finding a code base, completing the model takes roughly 20 hours of work, including going back and forth to revisit basic concepts.
  • You cannot cheat at programming. If I stop to think, it usually means I don’t really understand the model.
  • When faced with a large, difficult task (a thesis, a big program), you will feel lost in the beginning. Two strategies:
    • Focus on getting started, so you can start early.
    • Work on what you do understand now, even if it seems irrelevant. Work on the exact next step. Don’t solve a more general problem for a specific case! (e.g. do not read the whole book or the whole documentation… that’s procrastination too!)

Experiments for a research project consist of the following parts:

  • Preprocessing and data cleaning: my teammates have kindly done this for me.
    • The key questions here are:
      • What should the final data structure be?
      • What should the input of the model be?
      • What should the output of the model be?
      • The answers need to be very specific.
        • Store the data in a list, a dictionary, a pandas DataFrame, a numpy array, or a PyTorch tensor?
        • Someone told me that for my current computational needs, these choices are not that significant.
    • The best strategy is to learn by doing. Up-front designs usually change later.
  • Implementing the models. This is the most intellectually challenging part.
    • First, make sure you really understand the model.
      • e.g. What is the size of each matrix? What is passed to the next layer? How is attention calculated?
    • Second, when the model is really complex, it is difficult to digest all the parts at once!
      • Strategy: it’s frustrating when you don’t understand, but don’t be discouraged or distracted. Instead, work on what you can understand now, and slowly move forward.
    • The training process usually consists of nested loops: in batch gradient descent, for example, there are batches, and there are epochs. Basically there is a hierarchical structure.
      • Key question:
        • What is the smallest working unit? Isolate it and test it on its own.

Other tips:

  • Know what you don’t understand. It’s usually a missing piece of understanding that makes people stop and get stuck. Write it down.
  • When debugging, duplicate the notebook. Create a new notebook for each unit you want to test! Do not drag along the whole notebook.
  • After cleaning up some code, duplicate the notebook and delete the test-process code.
  • For each class, write the code step by step first, then wrap it into the class later.

What is a recurrent neural network (RNN)

This post explains what a recurrent neural network – or RNN model – is in machine learning.

An RNN is a model that works with input sequences. The difference between sequences and ordinary data is that for a sequence, order matters, somewhat like a time series but in discrete time. This also means the inputs are not independent of each other.

An RNN takes as input a sequence, e.g. of words. At each step, it produces a hidden state vector and an output.

The hidden state vector is a function of the previous hidden state s_{t-1} and the current input x_t. It is the memory of the network. The formula is s_t = f(Ux_t + Ws_{t-1}).

The output is a probability distribution across the vocabulary: o_t = \text{softmax}(Vs_t).

Here are the sizes of the parameters:

  • U is an m × n matrix, where m is the size of the hidden state vector and n is the size of the input vector.
  • W is an m × m matrix, where m is the size of the hidden state vector.
  • V is an h × m matrix, where h is the size of the output vector and m is the size of the hidden state vector.
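The equations above can be sketched in a few lines of numpy. This is a minimal illustration, not a trained model: the dimensions n, m, h are arbitrary example sizes, the weights are random, and tanh stands in for the generic nonlinearity f.

```python
import numpy as np

# Hypothetical example sizes: input n, hidden m, output (vocab) h.
n, m, h = 4, 3, 5

rng = np.random.default_rng(0)
U = rng.standard_normal((m, n))  # input -> hidden, m x n
W = rng.standard_normal((m, m))  # hidden -> hidden, m x m
V = rng.standard_normal((h, m))  # hidden -> output, h x m

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

def rnn_step(x_t, s_prev):
    """One RNN step: s_t = tanh(U x_t + W s_{t-1}), o_t = softmax(V s_t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = softmax(V @ s_t)
    return s_t, o_t

# Run over a toy sequence of three one-hot input vectors,
# threading the hidden state through each step.
s = np.zeros(m)
for x in np.eye(n)[:3]:
    s, o = rnn_step(x, s)

# o is now a probability distribution over the h output classes.
```

Note how the same U, W, V are reused at every step; that weight sharing across time is what makes the network "recurrent".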

Depending on the application, the output of an RNN can also be a single vector called the context vector. This will be discussed in the next post.


Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs

What is cross-validation?

This post explains the concept of cross-validation in machine learning.

Cross-validation is a way to do model validation when only a limited amount of data is available.

Model validation is a step in model selection, whose goal is to select the best model that fits the data.

Model selection is a two-stage process.

  • 1) Select a family of hypotheses/models (a hypothesis space), e.g. a neural network with 2 fixed-size hidden layers vs. a neural network with 5 fixed-size hidden layers vs. a decision tree…
  • 2) Then, for each hypothesis family, select the optimal set of parameters.

The training set and validation set take care of stage 2). That is, for every hypothesis family of interest:

  • First, use the training set to iteratively search for parameter estimates that minimize the empirical loss over the training set.
  • Then (every few iterations), use the current parameter estimates to calculate the empirical loss over the validation set.
  • The optimal parameter estimates are chosen at the point where the empirical loss over the validation set stops decreasing (early stopping).

The test set is used for stage 1). For every tuned hypothesis, use the test set to do a meta-comparison and select the one with the best evaluation metrics.
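The early-stopping rule above can be sketched in plain Python. This is a hypothetical illustration: the list of validation losses stands in for actually training and evaluating a model after each round, and `patience` (how many non-improving checks to tolerate) is an assumed knob, not something from the post.

```python
def train_with_early_stopping(losses_val, patience=2):
    """Return (best_round, best_loss), stopping once the validation
    loss has failed to improve for `patience` consecutive checks.

    losses_val: validation loss after each training round (a stand-in
    for training a real model and evaluating it on the validation set).
    """
    best, best_round, waited = float("inf"), -1, 0
    for i, loss in enumerate(losses_val):
        if loss < best:
            best, best_round, waited = loss, i, 0  # improvement: reset
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss stopped decreasing: early stop
    return best_round, best

# Validation loss falls, then starts rising; training stops shortly
# after the minimum, and the parameters from round 2 would be kept.
round_, loss = train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])
```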

Usually the whole dataset is split 7:2:1 into training / validation / test sets.

But when the dataset is small, such a split leaves little data available for training. With cross-validation, there is no need for a separate validation set.

The steps of cross-validation are the following:

  • Partition the data into a test set and a training set. The test set is left untouched until the very end.
  • Divide the training set into K folds.
    • For each fold i, train the model on the other K-1 folds of the training set.
    • Validate the model on the held-out fold i.
  • After completing the K validation runs, you have K accuracy numbers. Take their average as the final validation result.
  • Select models based on these validation results.
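The steps above can be sketched in pure Python. The `train` and `evaluate` callables here are hypothetical stand-ins for a real model; the toy example "trains" by taking the mean of the training data and scores with negative squared error.

```python
def k_fold_scores(data, k, train, evaluate):
    """Average validation score over k folds of `data`."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        # Fold i is held out for validation; the rest is for training.
        val = data[i * fold_size:(i + 1) * fold_size]
        train_part = data[:i * fold_size] + data[(i + 1) * fold_size:]
        model = train(train_part)
        scores.append(evaluate(model, val))
    # Average of the K results is the final validation number.
    return sum(scores) / len(scores)

# Toy example: the "model" is the mean of the training data, and
# "evaluate" is negative squared error against the fold mean.
data = list(range(10))
avg = k_fold_scores(
    data, k=5,
    train=lambda d: sum(d) / len(d),
    evaluate=lambda m, v: -((m - sum(v) / len(v)) ** 2),
)
```

In practice a library routine such as scikit-learn's `KFold` handles the splitting (including shuffling and uneven fold sizes), but the loop structure is the same.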

Python *args and **kwargs

(A new post, in the spirit of always be jabbing, always be firing, always be shipping. )

This post deals with Python *args and **kwargs. Here args and kwargs are just naming conventions; the actual syntax is * and **.


*args defines a variable-length list of positional parameters for a function. In plain English, the number of arguments is not known beforehand.


def test_var_args(f_arg, *argv):
    print("first normal arg: ", f_arg)
    for arg in argv:
        print("another arg through *argv :", arg)

test_var_args('Yasoob', 'python', 'eggs', 'test')

The output is

first normal arg: Yasoob
another arg through *argv : python
another arg through *argv : eggs
another arg through *argv : test


**kwargs is similar to *args in that it enables variable-length inputs, but differs in that the inputs can be named (keyword arguments).


def table_things(**kwargs):
    for name, value in kwargs.items():
        print('{0} = {1}'.format(name, value))

table_things(apple='fruit', cabbage='vegetable')

The output is

apple = fruit
cabbage = vegetable
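The same * and ** syntax also works in the opposite direction: when calling a function, it unpacks a sequence into positional arguments and a dict into keyword arguments. The `describe` function and its arguments below are a made-up example, not from the posts above.

```python
def describe(name, kind, color):
    return "{0} is a {1} {2}".format(name, color, kind)

args = ("apple", "fruit")        # unpacked into positional arguments
kwargs = {"color": "red"}        # unpacked into keyword arguments

# Equivalent to: describe("apple", "fruit", color="red")
result = describe(*args, **kwargs)
```

So * and ** are symmetric: in a function definition they collect extra arguments; at a call site they spread a collection out into arguments.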