# Re-reading "Hierarchical Attention Networks for Document Classification"

• Feed the word vectors of each sentence into a GRU (see the last post for an explanation of what a GRU is), and collect the hidden state output at every step (so we cannot simply call PyTorch's nn.GRU; instead we write a small for-loop and store each step's output).
• Pass all the hidden states through an MLP to obtain a representation of each word.
• Randomly initialize a context vector, whose intended meaning is "what the sentence is about". Take its dot product with each word representation as the attention score. The higher the score, the more similar the two vectors, meaning this word carries more significance in this sentence, so it should receive a higher attention weight.
• Feed the attention scores through a softmax to get the weights, then take the weighted sum of the original hidden-state sequence to obtain a representation of the whole sentence.
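The score, softmax, weighted-sum pipeline in the bullets above can be sketched in plain Python. This is a toy sketch with made-up 2-D hidden states and a made-up context vector; a real implementation would of course use PyTorch tensors.

```python
import math

def attention_pool(hidden_states, context):
    """Summarize a list of hidden-state vectors into one vector.

    Scores each hidden state by its dot product with the context
    vector, normalizes the scores with softmax, and returns the
    weighted sum of the hidden states.
    """
    # attention scores: dot product of each hidden state with the context
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context))
              for h in hidden_states]
    # softmax over the scores (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of the original hidden states
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# toy example: two 2-D hidden states and a made-up context vector
h = [[1.0, 0.0], [0.0, 1.0]]
context = [10.0, 0.0]          # strongly "prefers" the first state
sentence_vec = attention_pool(h, context)
```

With this context vector, nearly all the attention weight lands on the first hidden state, so the sentence vector comes out close to it.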

• We train only two sets of GRU cells in total: one summarizes sentences, the other summarizes paragraphs. Therefore all sentences must have the same length, and all paragraphs must contain the same number of sentences. During preprocessing, sentences that are too long are truncated and those that are too short are padded. The specific lengths can be chosen by looking at the length distribution of the training data and picking a quantile.
• Once the data is regularized this way, first obtain each sentence's representation with the method above.
• Then, for each paragraph, feed all its sentence representations into the second GRU to obtain the paragraph representation.
• Finally, use that representation for classification.

# 2D convolution with tiling technique on GPU

This post describes how to write CUDA C code to perform 2D convolution on the GPU with the tiling technique. It is brief, covering only the basic concepts, but with links to code examples. We are not going to use cuDNN, only bare-bones CUDA. Here is an excellent post if you want to use cuDNN, though for this particular task cuDNN is slower than the code I present below.

You can find the source code for this post here.

This post will consist of the following sections:

• Installing CUDA
• Basic GPU programming model
• Vanilla convolution on GPU
• Constant memory in GPU
• Tiling technique and indexing

Installing CUDA

First, make sure you have an NVIDIA GPU on your machine. Here is how. Or just search the model online and ask on reddit 🙂

Next, follow the official NVIDIA guide here to download the CUDA Toolkit. Note that not every card supports every version of the toolkit.

Note also the download path, as you will need to specify it to the compiler later. If you are using Linux you can type whereis nvcc to find it. nvcc is the NVIDIA C compiler.

It’s easy to get this wrong. Just follow every step in the official installation guide carefully. In particular, don’t forget the post-installation set-up! A wrong PATH environment variable can be a real pain.

Basic CUDA Programming model

The most prominent characteristic of CUDA programming is parallelization. When you write one line of code in a CUDA kernel (explained later), it will be executed by thousands of threads on different data. This is called the SIMD (single instruction, multiple data) parallel paradigm.

With this idea in mind, here is what a typical CUDA C program looks like:

• Set up the program on the host (CPU) by initializing and populating variables. This part is ordinary sequential code.
• Prepare for GPU work by allocating device (GPU) variables that reside in device memory, and copy the contents of the corresponding host variables into them.
• Write a GPU kernel function for the parallel part of the computation (e.g. the convolution).
• Set up the parallel configuration by specifying how many blocks and grids the kernel uses. These two settings determine how many threads are launched for parallel execution.
• Call the kernel from host code, storing the results in a device variable.
• Copy the results from device back to host.

Some concepts to clarify:

• How many threads the GPU uses is controlled by the grid size and the block size. A grid consists of many blocks, and both grids and blocks are 3D structures. For example, a grid of dimension <2, 2, 2> has 8 blocks in total, and a block of dimension <2, 2, 2> has 8 threads in total. A block cannot have more than 1024 threads.

Here is a basic vector addition example:

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// function declarations
__global__ void vecAddKernel(float * a, float * b, float * c, unsigned int N);

// main function
int main()
{
    int N = 10;    // length of the vectors
    float * a, * b, * c;  // a and b are the input vectors; c holds the result
    unsigned int size = N * sizeof(float);  // number of bytes to allocate
    a = (float *)calloc(N, sizeof(float));
    b = (float *)calloc(N, sizeof(float));

    int i = 0;
    float sum = 0;
    for (i = 0; i < N; i++){
        a[i] = (float)i / 0.23 + 1;
        b[i] = (float)i / 5.89 + 9;
        sum += a[i] + b[i];
    }

    c = (float *)malloc(size);

    // 1. allocate memory on the device
    float * d_a, * d_b, * d_c;   // device memory
    cudaError_t err1 = cudaMalloc((void **) &d_a, size);
    cudaError_t err2 = cudaMalloc((void **) &d_b, size);
    cudaError_t err3 = cudaMalloc((void **) &d_c, size);
    if (err1 != cudaSuccess){
        printf("%s in %s at line %d\n", cudaGetErrorString(err1), __FILE__, __LINE__);
        exit(EXIT_FAILURE);
    }
    if (err2 != cudaSuccess){
        printf("%s in %s at line %d\n", cudaGetErrorString(err2), __FILE__, __LINE__);
        exit(EXIT_FAILURE);
    }
    if (err3 != cudaSuccess){
        printf("%s in %s at line %d\n", cudaGetErrorString(err3), __FILE__, __LINE__);
        exit(EXIT_FAILURE);
    }

    // copy the inputs to the device
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // 2. launch the kernel: enough 256-thread blocks to cover N elements
    vecAddKernel<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // 3. copy the results back to the host
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess)
    {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(error));
        exit(-1);
    }

    float cuda_res = 0;
    for (i = 0; i < N; i++){
        printf("%.2f\t", c[i]);
        cuda_res += c[i];
    }

    printf("Result from host  : %.2f\n", sum);
    printf("Result from device: %.2f\n", cuda_res);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(a);
    free(b);
    free(c);

    return 0;
}

__global__
void vecAddKernel(float * a, float * b, float * c, unsigned int N){
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < N)  c[i] = a[i] + b[i];
}


Vanilla convolution on GPU

Where do we parallelize in a convolution? We let every thread correspond to one element of the output: all threads run the same convolution operation, each on a different chunk of the input.
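To make that thread mapping concrete, here is a plain-Python stand-in (not CUDA): the two loops in conv2d play the role of the thread grid, and the body of conv2d_at is exactly the work one CUDA thread would do for its own (row, col). The function names and the zero-padding convention are my own choices for this sketch.

```python
def conv2d_at(image, kernel, row, col):
    """Work of a single 'thread': compute one output element.

    Applies the kernel centered at (row, col), treating out-of-bounds
    pixels as zero (so-called ghost cells).
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    acc = 0.0
    for i in range(kh):
        for j in range(kw):
            r = row + i - kh // 2
            c = col + j - kw // 2
            if 0 <= r < h and 0 <= c < w:   # boundary check, as in the kernel's if-guard
                acc += image[r][c] * kernel[i][j]
    return acc

def conv2d(image, kernel):
    # On the GPU these two loops vanish: each (row, col) pair is one thread.
    return [[conv2d_at(image, kernel, r, c) for c in range(len(image[0]))]
            for r in range(len(image))]

img = [[1, 2], [3, 4]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
out = conv2d(img, identity)
```

Convolving with the identity kernel returns the image unchanged, which is a handy sanity check for the indexing.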

Constant memory in GPU

Constant memory variables are shared by all threads in all blocks. This memory is small but fast, so the filter belongs there: every thread needs it, and it is small.

Tiling technique and indexing

Tiling makes use of the shared memory on the GPU. Shared memory is shared by all threads in the same block; it is small but fast. Threads in a block operate on input regions that overlap, and we exploit that: we copy the small region of the larger image that these threads need into shared memory once, then work on it from there.

All threads in the block cooperate to copy these data elements. Here is a further illustration of the tiling idea.
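The fiddly part of tiling is the index arithmetic. As a sketch (plain Python with made-up names; the real kernel derives the same numbers from blockIdx and threadIdx), here is the input region a block must stage into shared memory, assuming a square output tile of width tile and a square mask of width mask_width:

```python
def tile_load_region(block_row, block_col, tile, mask_width, height, width):
    """Input region (with halo) one block must stage into shared memory.

    A block computing a tile x tile patch of output needs
    (tile + mask_width - 1)^2 input elements: the tile itself plus a
    halo of mask_width // 2 on every side. Returns the clipped
    (row_start, row_end, col_start, col_end) half-open ranges;
    anything cut off by the clipping is a ghost cell and reads as zero.
    """
    halo = mask_width // 2
    row_start = block_row * tile - halo
    col_start = block_col * tile - halo
    row_end = row_start + tile + mask_width - 1
    col_end = col_start + tile + mask_width - 1
    # clip to the image boundaries
    return (max(row_start, 0), min(row_end, height),
            max(col_start, 0), min(col_end, width))
```

For a 4-wide tile and a 3-wide mask, an interior block loads a 6 x 6 region, while blocks on the image border lose part of their halo to clipping.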

Code example

Here.

Here is the tiling organization:

# Understanding GRUs

This post follows this post last year about vanilla Recurrent Neural Network structures.

One of the ultimate goals of a recurrent neural network is to summarize a sequence of vectors into a single vector representing all the information.

In the simplest case, imagine that for the past month, every day you and your friend did some push-ups. Denote your count on day $i$ as $x_i$, and your friend's as $y_i$. Now a month has passed and you two want to compare who is more keen on working out.

So what do you do? Directly comparing the two sequences is hard, because there is no obvious "greater-than" relation across dimensions. One way is to calculate the simple average of each sequence, i.e. on average you did $\bar{x}$ push-ups per day and your friend did $\bar{y}$. Whichever number is larger, you can bet that person is the better athlete!

In this example time order does not really matter, because we treat the number from 30 days ago the same as yesterday's number. An example where time does matter is interest on bank savings. Suppose every year you deposit $b_i$ yuan into the bank, and the bank pays interest at the end of each year, once per year. How much will your total savings be at the end of the fifth year? The oldest money receives the most interest, because of compounding.

With the same logic, we can apply this idea to summarizing sentences. Suppose every word can be represented as an $n$-dimensional vector in a semantic space. How do we produce one single vector that represents the meaning of the whole sentence?

Taking the average is one way, but note that every word influences the words after it, e.g. in "this burger is delicious", the word "burger" constrains the word choices after "is". That is why we need a recurrent neural network: at every step, the hidden vector (which contains information about previous words) is combined with the current word to produce the next hidden vector. This way every word has a lasting influence on the final output.

However, a simple RNN has a vanishing (or exploding) gradient problem. Mathematically, the older a word is, the higher the power of the weight matrix it gets multiplied by. In the first-order gradient the weight matrix appears with a power of $n-1$. If the relevant factor is larger than 1, this power approaches infinity as $n$ grows; if it is smaller than 1, it approaches zero and the model "stops learning" (i.e. the weights no longer update).

Let’s formulate the intuition above more formally. For a simple RNN, every update we have

$h_t =g (W x_t + U h_{t-1} + b)$

Here $g$ can be any non-linear activation function, e.g. a ReLU or a sigmoid. Now consider the gradient of $h_t$ with respect to $W$.

Write

$h_t = g (O_t)$

and

$O_t =W x_t + U h_{t-1} + b$

Using the chain rule we have:

$\frac{\partial h_t}{\partial W} =\frac{\partial h_t}{\partial O_t} \frac{\partial O_t}{\partial W}$

We also know:

$\frac{\partial O_t}{\partial W} = x_t + U \frac{\partial h_{t-1}}{\partial W}$

Plugging the second equation above into the first, we have:

$\frac{\partial h_t}{\partial W} = {\partial g} \cdot (x_t + U \frac{\partial h_{t-1}}{\partial W})$

We can already see a recurrent relation between $\frac{\partial h_t}{\partial W}$ and $\frac{\partial h_{t-1}}{\partial W}$. To make things clear, write $\frac{\partial h_t}{\partial W} = \alpha_t$, and expand the equation a few more steps:

$\alpha_t = {\partial g} \cdot (x_t + U \alpha_{t-1})$

$\alpha_t = {\partial g} \cdot (x_t + U \cdot \partial g \cdot (x_{t-1} + U \alpha_{t-2}) )$

$\ldots$

$\alpha_t = C + (\partial g \cdot U)^{n-1} \alpha_0$

Here $C$ represents the other terms (those involving lower-order $\alpha$'s) that we omit for simplicity. So we can see that as $n$ increases, if the norm of $\partial g \cdot U$ is greater than 1 this term approaches infinity, and if it is less than 1 it approaches zero.
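A tiny numeric check of this claim, with made-up factors standing in for the norm of $\partial g \cdot U$:

```python
# Repeatedly multiplying by the same factor, as in (dg * U)^(n-1):
def power(factor, n):
    out = 1.0
    for _ in range(n):
        out *= factor
    return out

shrinking = power(0.9, 50)   # factor < 1: the gradient vanishes
growing = power(1.1, 50)     # factor > 1: the gradient explodes
```

After only 50 steps the first product is already below 0.01 and the second above 100, which is the vanishing/exploding behavior in miniature.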

The root cause is that the same term is multiplied $n-1$ times, i.e. information always flows at the same rate at every step. For a sentence this does not really make sense: a word's meaning or importance does not decrease (or increase) exponentially with its distance from the end of the sentence. The first word might very well be important (e.g. a female name that determines later pronoun choices, "he" or "she"), but the vanilla RNN structure is too rigid to capture that. We need more dynamic structures that let information flow at different rates between time steps.

The gated recurrent unit solves the problem by adapting the update at every step. There are two key ideas:

• Introduce an external "memory" vector to capture long-distance dependencies.
• Allow error signals to flow at different strengths depending on the inputs.

It achieves this by first computing two “gates”:

update gate : $z_t = \sigma (W_z x_t + U_z h_{t-1} + b_z)$

reset gate: $r_t = \sigma (W_r x_t + U_r h_{t-1} + b_r)$

They are both continuous vectors of the same length as the hidden state, constructed by passing the current word $x_t$ and the last hidden vector $h_{t-1}$ through an MLP. The sigmoid makes every element lie between 0 and 1, so when used in element-wise multiplication $\circ$, these two gates control how much information "flows" through.

There is also a candidate vector representing “new memory content”:

$\tilde{h}_t = \tanh (W_h x_t + r_t \circ (U_h h_{t-1} + b_h))$

The use of $\tanh$ and $W_h x_t$ is standard, as in other units. But note the role of the reset gate $r_t$ here: if $r_t$ is close to zero, we "forget" all the previous information in $h_{t-1}$ and use only the current word's information.

An example where the reset gate is useful is sentiment analysis of a movie review, where the author spends many sentences describing and summarizing the plot, but concludes with "But the movie is really boring." After this sentence, all the previous information becomes useless; the reset gate helps the model "forget" the earlier summaries.

Finally, the new hidden vector is computed as:

$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t$

You can see the update gate controls how much information is retained from the old state and how much is taken from the new memory content.
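A scalar (1-dimensional hidden state) version of the three update equations makes the gating concrete. The weights below are made-up numbers chosen only to push the update gate toward 1, the "remember everything" case:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU update for a scalar input x and scalar hidden state h_prev."""
    z = sigmoid(Wz * x + Uz * h_prev + bz)                 # update gate
    r = sigmoid(Wr * x + Ur * h_prev + br)                 # reset gate
    h_tilde = math.tanh(Wh * x + r * (Uh * h_prev + bh))   # candidate memory
    return z * h_prev + (1 - z) * h_tilde                  # new hidden state

# A large positive bias on the update gate drives z toward 1, so the old
# state passes through almost unchanged, regardless of the new input.
h_kept = gru_step(x=1.0, h_prev=0.5,
                  Wz=0.0, Uz=0.0, bz=50.0,
                  Wr=0.1, Ur=0.1, br=0.0,
                  Wh=0.1, Uh=0.1, bh=0.0)
```

With $z \approx 1$ the new state equals the old one almost exactly, which is precisely the gradient-preserving path discussed below.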

A caveat: why do we need the reset gate at all, now that we have the update gate? Don't we already have a gate controlling how much old information is retained? Doesn't the update gate also have the ability to eliminate all the historical information embedded in $h_{t-1}$? I did not find a satisfactory answer to this point online.

Now consider why the GRU solves the vanishing gradient problem. The update gate allows us to retain all previous information by setting every element of $z_t$ to 1, in which case $h_t$ is exactly $h_{t-1}$. Along that path no weight matrix multiplies the state, so the gradient does not vanish.

For example, suppose in the same movie review the author says "I love this movie so much" and then goes on to explain the plot. The first sentence is really important, but with a vanilla recurrent network its content gets washed out as time passes. The update gate lets the model retain the first sentence without exponentially decaying its content.

Units with short-term memories usually have very active reset gates (values close to 0), and thus forget most of the past…

If the reset gate is all 1's and the update gate is all 0's, we recover the vanilla RNN.

Why tanh and sigmoid?

• In theory the tanh could be replaced by a ReLU, but in practice tanh just performs better here.
• Sigmoid squashes values between 0 and 1 so they can act as gates, but again there is no formal justification.

Finally, here is an illustration that pictures what’s going on in a GRU unit:


# Understanding the PointNet paper

PointNet is the key paper in this series, published in 2016. It argues that no such preprocessing is needed: the point cloud can be fed directly into a multi-layer perceptron. The steps are as follows:

1. Suppose the batch size is 32 and each point cloud has 1024 points. Then each input batch has shape 32 * 1024 * 3.
2. First, multiply each cloud in the batch by a 3 * 3 transform matrix. Concretely, the 32 * 1024 * 3 tensor is multiplied by a 32 * 3 * 3 tensor with tf.matmul, giving a 32 * 1024 * 3 result. The physical meaning is a coordinate transformation of every point. The motivation comes from the spatial transformer paper, which I haven't read yet.
• The transform matrix is not randomly generated but produced by a fairly complex T-net. Its architecture mirrors the overall network: it lifts the input through several dimension-raising nonlinear transformations (plus bias), then projects back down to a matrix. So although different clouds share network parameters, they are not multiplied by the same matrix.
• Why such a complex network? It shares the same spirit as networks like the GRU: the current data is used both to generate weights and to be processed by those weights.
3. After the transform, the shape is still 32 * 1024 * 3. Next comes the first dimension-raising step: for each point, take a weighted sum of its three coordinates to obtain a scalar, and repeat this 64 times to get a 64-dimensional vector. This is simply multiplying a length-3 vector by a 3 * 64 matrix, giving data of shape 32 * 1024 * 64. Add a bias and apply batch normalization; the shape is unchanged.
• In the author's source code this step does not stack 64 linear layers; it uses TensorFlow's conv2d, with the channel count standing in for 64 fully connected layers.
4. Next is another dimension-raising step like the previous one, but the output dimension stays 64.
5. The data shape is now 32 * 1024 * 64. Next comes the feature transform: the same kind of network produces a 32 * 64 * 64 tensor, and tf.matmul transforms every point again, on the same principle as step 2.
6. After that transform, the features are lifted again: first to 64 dimensions, then 128, finally 1024. We end up with data of shape 32 * 1024 * 1024.
• In the author's source code the shape is 32 * 1024 * 1 * 1024, because the conv2d operation is used.
7. Now the key step: summarize all the points in each cloud into a single vector carrying the global information. This is done with max pooling: for each channel, take the maximum over all points in the cloud, i.e. element-wise max pooling.
• The source code implements this with tf.max_pool, taking the 32 * 1024 * 1 * 1024 data as input with kernel size [1, num_points (1024), 1, 1], producing output of shape 32 * 1 * 1 * 1024.
• Why not sum or average? The paper establishes theoretical properties of the max-pooling operation, which I haven't read yet.
8. After all this, we finally have a global vector representing each cloud. The next step is to reshape it into a 32 * 1024 block, dropping the two redundant dimensions.
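The max-pooling step (7) is easy to sketch in plain Python. This toy version uses a 3-point, 3-channel cloud instead of 1024 points and 1024 channels, but the operation is the same per-channel max:

```python
def global_feature(points):
    """Element-wise max pool over a point cloud.

    points: a list of per-point feature vectors (after the MLP layers).
    Returns one vector: the maximum over all points, channel by channel.
    Because max is symmetric, the result is invariant to the order of
    the points, which is exactly why PointNet can consume raw,
    unordered point clouds.
    """
    n_channels = len(points[0])
    return [max(p[c] for p in points) for c in range(n_channels)]

cloud = [[0.1, 5.0, 2.0],
         [3.0, 1.0, 2.5],
         [2.0, 4.0, 0.0]]
feat = global_feature(cloud)
```

Shuffling the points leaves the global feature unchanged, which is the order-invariance property the paper relies on.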

• classification: only the global representation is needed. Pass it through three fully connected layers to produce a length-40 vector (there are 40 classes), turn it into probabilities with a softmax layer, use the mean cross entropy as the loss function, and optimize the parameters numerically.
• segmentation: the author's approach is to classify every point: each point is concatenated with the global feature vector and then classified.
• semantic segmentation: I don't quite understand this part, so I'll skip it.

# Writing Python tests: doctest, unittest, and pytest

So I set myself another task the other day: writing tests for an open-source project, a baby step towards software engineering! This post describes the tools and practices I've learnt.

The project I've committed to is here. The goal is to write tests for an experiment class, whose job is to save the state of machine learning experiments and quickly retrieve it.

There are two sets of things to learn: 1) conceptually, what is a test and how do you write one? 2) practically, what tools does Python offer for writing and organizing test code, and how do you use them? I will answer both sets of questions below.

What is software testing and how to write it?

Testing is just writing code to check that a piece of software behaves as expected. If it does not, there is a bug, and bugs are bad. Most of the online materials and books on this are kind of fluffy, describing concepts that are hard to define and unnecessarily convoluted. Really, I just understand a test to be: this piece of software is supposed to output 1 when I input 3; does it?

There are also edge cases to be tested and taken care of accordingly.

So the real issue seems to be how to write good tests. This is a very broad topic with no definite answer, and I’ll omit discussion in this post. For this project, I’m following this guide.

Testing tools in Python

1. doctest
2. unittest
3. pytest

doctest is packaged with Python. The usage is the following:

• Open a Python interpreter and interactively test a bunch of inputs against the function to be tested.
• Copy and paste the output of the interactive session into the docstring of that function.
• Add an if __name__ == "__main__" guard to the script under test, to detect whether it is imported or executed standalone. If it is executed standalone, call doctest.testmod() so doctest runs the tests.

This module parses the docstring to get the expected result for every input. It then compares the actual output with the expected output and reports errors when they do not match. If everything passes, the module outputs nothing.

unittest: the biggest difference from doctest is that test cases are no longer defined in the original script. Instead, we create a new script, import the old script and its methods, and write all our test cases there. The good news is that the script is now separated from its tests; the bad news is that we need to write more code.

How to use the unittest module?

• First create a new Python script and import both unittest and the original script in it. This script will be run on its own, so specify the shebang line as well.
• Create a unittest.TestCase subclass. The syntax is class MyTestCase(unittest.TestCase), i.e. MyTestCase is a subclass of unittest.TestCase.
• The real question is which methods we can use on unittest.TestCase. These methods assert that an expected value equals the actual result. Here is a link to the useful parts of its documentation. Basically, the testing process is a series of assertions.
• The most naive way is to define a bunch of assertTrue() calls in the class one by one. A more mature way is to use the setUp and tearDown methods to put all test cases into a data structure, e.g. a list, and in the test method iterate through it with a for loop. Sample code here: https://www.python-course.eu/python3_tests.php

pytest is a third-party package. It is easier to use than unittest because it does not require all the class and method setup.

• In the simplest case, first create a Python script exercise.py containing the function to be tested and a function that tests it with assert, e.g. assert func(4) == 5.
• Then run pytest exercise.py from the command line in the folder containing the script. You do not even need to import pytest in exercise.py!
• On the other hand, if we do not give pytest an argument, it runs the tests in every file of the form *_test.py or test_*.py in the current directory. This is its convention for discovering test files. Also see this link.
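A sketch of what such an exercise.py might look like (func and its expected values are made up):

```python
# exercise.py -- run with:  pytest exercise.py
# Note: no pytest import is needed in this file.

def func(x):
    """Hypothetical function under test: adds one."""
    return x + 1

def test_func():
    # pytest collects any function named test_*; a bare assert is enough
    assert func(4) == 5

def test_func_negative():
    assert func(-1) == 0
```

If you rename the file to test_exercise.py, running a bare pytest in the directory will pick it up automatically via the discovery convention above.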

Finally, adding some good resources on pytest that I just found:

• The accompanying git repo: https://github.com/okken/pycharm_2018_Feb

Some updates on using pytest with PyCharm

• If you want to run a single test instead of the whole script, see this SO discussion. The pipeline might be broken though.
• PyCharm Professional has a test-runner tab with auto-test, which re-runs a test function after you modify it. Link here.
• To add the verbose option for pytest: under "edit configurations", pytest, additional arguments, specify -v.
• Put related tests that always need to run together in a class, so you can run them as a group.
• Auto-test: you can have tests run continuously as you type.

Happy testing!

# Using ROS with Conda

For the past two weeks I’ve been playing around with ROS. All was fine until I tried to embed tensorflow and “deep learning” stuff into ROS with conda.

Things went south immediately. Pipelines broke; what used to work no longer worked. roslaunch complained it could not find a package; Python imports failed even though I had just pip-installed the package…

This post aims to clarify what happened, why my system became a mess, and how to fix it.

The first problem, errors when importing modules in Python, is caused by ROS conflicting with Conda: ROS sneakily changes the Python path in the ~/.bashrc file. According to this post from the Udacity NanoDegree github:

The problem here is that Python is looking for a package in /opt/ros/kinetic/lib/python2.7/dist-packages/ when it should be looking in the Anaconda installation directories. The reason for this is that with your ROS install, your .bashrc file contains the line source /opt/ros/kinetic/setup.bash, which sets your PYTHONPATH variable to the /opt/ros/kinetic/... location.

The solution is to unset your PYTHONPATH variable in your ~/.bashrc file, right after sourcing ROS but before sourcing your Anaconda environment, with the command unset PYTHONPATH.
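A quick way to see which interpreter and module search path are actually in effect is to run a snippet like this inside each environment you are debugging; with ROS sourced, you would typically see an entry like /opt/ros/kinetic/lib/python2.7/dist-packages near the front of the list, shadowing the Anaconda site-packages:

```python
import sys

# Which interpreter is actually running?
print(sys.executable)

# Where will `import` look, in order?
for p in sys.path:
    print(p)
```

Comparing this output before and after the unset PYTHONPATH fix makes it obvious whether the ROS path has stopped shadowing your conda packages.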

The second issue is that I used many package installers to install Python packages (and possibly Python itself). Be careful with online tutorials, my dude! I ended up with tens of Python paths scattered across different virtual environments, and things became very convoluted.

Package managers I used include:

• pip (system-wide and within conda; I have five pips…)
• apt (did not figure out what it did until it was too late)
• conda

Where do these package managers install packages?

• pip: https://stackoverflow.com/questions/29980798/where-does-pip-install-its-packages
• When used outside a conda environment, pip installs into the base Python directory.
• pip when used with virtualenv will generally install packages in the path <virtualenv_name>/lib/<python_ver>/site-packages. For example, I created a test virtualenv named venv_test with Python 2.7, and the django folder is in venv_test/lib/python2.7/site-packages/django.
• conda installs into the environment's Python site-packages.
• If the environment has no Python 2 but we want a Python 2 package, conda seems to install it into the system-wide Python site-packages.

A detailed comparison about pip and apt:

pip is used to download and install packages directly from PyPI. PyPI is hosted by Python Software Foundation. It is a specialized package manager that only deals with python packages.

apt-get is used to download and install packages from Ubuntu repositories which are hosted by Canonical.

Some of the differences between installing python packages from apt-get and pip are as follows:

• Canonical only provides packages for selected python modules. Whereas, PyPI hosts a much broader range of python modules. So, there are a lot of python modules which you won’t be able to install using apt-get.
• Canonical only hosts a single version of any package (generally the latest or the one released in recent past). So, with apt-get we cannot decide the version of python-package that we want. pip helps us in this situation. We can install any version of the package that has previously been uploaded on PyPI. This is extremely helpful in case of conflict in dependencies.
• apt-get installs python modules in system-wide location. We cannot just install modules in our project virtualenv. pip solves this problem for us. If we are using pip after activating the virtualenv, it is intelligent enough to only install the modules in our project virtualenv. As mentioned in previous point, if there is a version of a particular python package already installed in system-wide location, and one of our project requires an older version of the same python package, in such situations we can use virtualenv and pip to install that older version of python package without any conflicts.
• As @Radu Rădeanu pointed out in this answer, there would generally be difference in names of packages as well. Canonical usually names Python 2 packages as python-<package_name> and Python 3 packages as python3-<package_name>. Whereas for pip we generally just need to use <package_name> for both Python 2 as well as Python3 packages.

Which one should you use?

Both apt-get and pip are mature package managers which automatically install any other package dependency while installing. You may use anyone as you like. However, if you need to install a particular version of python-package, or install the package in a virtualenv, or install a package which is only hosted on PyPI; only pip would help you solve that issue. Otherwise, if you don’t mind installing the packages in system-wide location it doesn’t really matter whether you use apt-get or pip.

Also, this link that summarizes this problem perfectly: https://stackoverflow.com/questions/14117945/too-many-different-python-versions-on-my-system-and-causing-problem

The third issue is that I confused Python packages with ROS packages somewhere in my projects. This is more conceptual, but it did misguide my actions. Here are some clarifications:

ROS

• ROS is a programming ecosystem/framework providing middleware for modules (packages) to communicate, especially for robotics systems.
• ROS provides distributions in Python and C++.
• ROS code is organized in packages: folders containing src, data, other libraries…
• ROS has $ROS_PACKAGE_PATH, an environment variable indicating where ROS finds packages. For example, echo $ROS_PACKAGE_PATH gives the output /opt/ros/kinetic/share. https://answers.ros.org/question/241528/what-does-environment-set-up-do/

Python

• Python path: an environment variable controlling where import finds packages (the module search path)… https://realpython.com/python-modules-packages/
• Modules: individual scripts.
• Package: a collection of modules. https://realpython.com/python-modules-packages/#the-dir-function
• Given this structure, if the pkg directory resides in a location where it can be found (in one of the directories contained in sys.path), you can refer to the two modules with dot notation (pkg.mod1, pkg.mod2) and import them with the syntax you are already familiar with.
• The __init__.py file in a package imports the modules into the package namespace, so that imported modules can be used directly from the package.

Finally, some useful scripts for inspection and cleaning:

• Find all Pythons: sudo find / -type f -executable -iname 'python*' -exec file -i '{}' \; | awk -F: '/x-executable; charset=binary/ {print $1}' | xargs readlink -f | sort -u | xargs -I % sh -c 'echo -n "%: "; % -V'
• whereis pip finds all pips

# Use binary representation of int to represent subsets of n elements

For the past two months I have spent many hours solving algorithmic problems. It's fun but very time-consuming. The majority of the time is spent on finding accurate solutions and systematic approaches to problem solving. I also spent a few weeks (!) grasping data-structure fundamentals.

Anyway, this post describes a technique I came across when reading this book (page 73): using integers to enumerate all possible subsets of a list of length n. Example Leetcode problem: https://leetcode.com/problems/subsets/description/

The key insights are:

• Every int is represented in a machine's memory as a binary sequence of 0's and 1's.
• The ints from 0 to 2**n - 1 cover all possible combinations of 0's and 1's of length n.
• Think of each position's bit (0 or 1) as an indicator of whether the element at that position appears or not.
• So for every int from 0 to 2**n - 1, we only need to extract the positions that are 1 to get the combination of list elements this int represents.
• With the bitwise operator & and the shift operator <<, we can easily check whether the i-th bit of an int is 1 (on) or 0 (off): try x & (1 << i).
• Looping through positions 0 to n-1 gives all the set bits of this int; we append the corresponding elements to the answer list for this int.
• Finally, we collect every answer list into one big list and return it.
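The bullets above translate almost line-for-line into Python:

```python
def subsets(nums):
    """All subsets of nums, via the binary representations of 0 .. 2**n - 1."""
    n = len(nums)
    result = []
    for mask in range(2 ** n):          # each int encodes one subset
        subset = [nums[i] for i in range(n)
                  if mask & (1 << i)]   # bit i on => nums[i] is included
        result.append(subset)
    return result
```

For a 3-element list this yields all 8 subsets, from the empty list (mask 0) to the full list (mask 7, binary 111).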

Very smart and interesting strategy. Code snippet here: https://codepad.co/snippet/Tm6Z5ZZP

This strategy can be used for other exhaustive-search problems that require looping through all possible combinations of n elements, with exponential time complexity (2**n). The very fast bit manipulation helps it stay within the time limit.

# The jianbing guozi cook at Jinmen Grand Hotel went home today

(Another draft. Putting it on the blog feels a bit like git add: public enough to count as a milestone, but not on WeChat, where it would be too embarrassing. Writer's block can probably only be cured by shamelessness; I'll come back and expand this choppy piece when I have time…)

# Xiao Wang, yet another lost STEM student in North America, prepares to become a code monkey

Anyway, for us science-and-engineering youth in North America these days, the shattering of ideals usually comes with the following behaviors:
– Ranting about the original major to friends, family, and internet strangers, and discussing in depth the possibility of switching to CS
– Buying all kinds of programming books, e.g. Learn Python the Hard Way, Head First Java, Problem Solving With C++
– Subscribing to a LeetCode VIP account
– Registering a 1point3acres account, checking other Chinese engineers' offers every day, and grinding through interview-experience posts