This post explains the concept of soft classifiers in its simplest form and offers examples in sklearn.

Soft classifiers

In classification problems, a hard classifier outputs the predicted class directly.

A soft classifier instead outputs a probability estimate over all classes. A prediction can then be made by applying a threshold to those probabilities. This also opens the door to multi-label classification.
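As a minimal sketch of this idea (with made-up probabilities, not output from any real model), the hard prediction is the most probable class, while a custom threshold on the positive-class probability gives a different decision rule:

```python
import numpy as np

# Hypothetical probability estimates for 4 samples over 2 classes.
proba = np.array([[0.90, 0.10],
                  [0.30, 0.70],
                  [0.55, 0.45],
                  [0.20, 0.80]])

# Hard prediction: pick the most probable class.
hard = np.argmax(proba, axis=1)

# Soft prediction with a custom threshold on the positive class:
# lowering the threshold trades precision for recall.
threshold = 0.4
positive = (proba[:, 1] >= threshold).astype(int)

print(hard)      # [0 1 0 1]  -- class with the highest probability
print(positive)  # [0 1 1 1]  -- positive wherever P(class 1) >= 0.4
```

Note that the third sample flips from class 0 to class 1 once the threshold drops below its positive-class probability of 0.45.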

Code in sklearn:

This is a sample program in Python using a KNN classifier.

import numpy as np
import pandas
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
# change to binary classification
new_class = np.random.randint(0, 2, len(dataset), dtype='l')
dataset['class'] = new_class

Now build the classifier and make both kinds of predictions.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# hold out a validation set
X = dataset.values[:, 0:4].astype(float)
Y = dataset.values[:, 4].astype(int)
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.2, random_state=7)
# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)  # hard
predictions_prob = knn.predict_proba(X_validation)  # soft

Results of the hard classifier:

predictions
array([ 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0.,
0., 1., 1., 1.])

Results of the soft classifier:

array([[ 0.2, 0.8],
[ 0.2, 0.8],
[ 0.6, 0.4],
[ 0.6, 0.4],
[ 0.4, 0.6],
[ 0.6, 0.4],
[ 0.6, 0.4],
[ 0.4, 0.6],
[ 0.8, 0.2],
[ 0.6, 0.4],
[ 0.8, 0.2],
[ 0.6, 0.4],
[ 0.8, 0.2],
[ 0.4, 0.6],
[ 0.2, 0.8],
[ 0.4, 0.6],
[ 0.4, 0.6],
[ 0.4, 0.6],
[ 0.6, 0.4],
[ 0.4, 0.6],
[ 0.6, 0.4],
[ 0.4, 0.6],
[ 0.4, 0.6],
[ 0.4, 0.6],
[ 0.6, 0.4],
[ 0.6, 0.4],
[ 0.8, 0.2],
[ 0.4, 0.6],
[ 0.4, 0.6],
[ 0.2, 0.8]])
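The probabilities above are all multiples of 0.2 because the default `KNeighborsClassifier` uses 5 neighbours, so each estimate is a neighbour count divided by 5. A small self-contained sketch (with synthetic data, not the iris split above) showing this, and that the hard prediction is simply the argmax of the soft probabilities:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data, assumed here for a self-contained example.
rng = np.random.RandomState(0)
X = rng.rand(40, 2)
y = rng.randint(0, 2, 40)

knn = KNeighborsClassifier()  # n_neighbors=5 by default
knn.fit(X, y)

hard = knn.predict(X)
soft = knn.predict_proba(X)

# With 5 neighbours each probability is a count / 5, hence the
# multiples of 0.2 seen in the output above.
print(sorted(set(soft.ravel())))

# The hard prediction picks the column with the highest probability.
print((hard == soft.argmax(axis=1)).all())  # True
```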

This affects the ROC curve produced. For the hard classifier, the ROC curve is piecewise linear with a single interior point; for the soft classifier, it has many points and looks continuous. Note the input for the soft classifier: `pred = predictions_prob[:, 1]` extracts the probability of the positive class, which is what the ROC function expects.

import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import roc_curve, auc
plt.figure(figsize = (12, 8))
truth = Y_validation
pred = predictions
fpr, tpr, thresholds = roc_curve(truth, pred)
roc_auc = auc(fpr, tpr)
c = (np.random.rand(), np.random.rand(), np.random.rand())
plt.plot(fpr, tpr, color=c, label= 'HARD'+' (AUC = %0.2f)' % roc_auc)
truth = Y_validation
pred = predictions_prob[:, 1]
fpr, tpr, thresholds = roc_curve(truth, pred)
roc_auc = auc(fpr, tpr)
c = (np.random.rand(), np.random.rand(), np.random.rand())
plt.plot(fpr, tpr, color=c, label= 'SOFT'+' (AUC = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC')
plt.legend(loc="lower right")
plt.show()
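The difference shows up directly in the number of points `roc_curve` returns. A small sketch with hypothetical labels and scores (not the validation data above): hard 0/1 labels contain only two distinct "scores", so only one interior point is possible, while probabilities yield many thresholds and therefore many points.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical ground truth, assumed for illustration.
truth = np.array([0, 1, 0, 1, 1, 0])

# Hard labels: only two distinct score values (0 and 1), so the
# curve has just three points: (0, 0), one interior point, (1, 1).
hard = np.array([0, 1, 1, 1, 0, 0])
fpr_h, tpr_h, thr_h = roc_curve(truth, hard)
print(len(fpr_h))  # 3

# Probabilities: many distinct scores -> many thresholds -> a curve.
soft = np.array([0.1, 0.9, 0.6, 0.7, 0.3, 0.4])
fpr_s, tpr_s, thr_s = roc_curve(truth, soft)
print(len(fpr_s))  # more points than the hard version
```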

Why is this the case?

This has to do with the input parameters in sklearn. From the documentation:

sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)

Parameters:

y_score : array, shape = [n_samples]

Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

But still, why is the soft ROC curve continuous? To answer that, we need to look at what an ROC curve actually is.

ROC = receiver operating characteristic

For an ROC graph, the x-axis is the false positive rate, the y-axis is the true positive rate.

Each point corresponds to a particular classification strategy.

(0, 0) = classify all instances as negative.

(1, 1) = classify all instances as positive.

A ranking model produces a set of points in ROC space. Each point corresponds to the result of one threshold: every threshold yields a different point in ROC space. Thus a soft classifier is, in effect, equivalent to many classification strategies at once.
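This threshold sweep can be sketched directly, with a hypothetical truth vector and positive-class scores (not the validation data above). Each threshold turns the scores into hard labels and contributes one (FPR, TPR) point:

```python
import numpy as np

# Hypothetical ground truth and positive-class scores.
truth = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.2, 0.4, 0.4, 0.6, 0.6, 0.8])

# Sweep the decision threshold from high to low; each threshold
# defines one classification strategy and one ROC point.
points = []
for t in sorted(set(scores), reverse=True):
    pred = (scores >= t).astype(int)
    tpr = ((pred == 1) & (truth == 1)).sum() / (truth == 1).sum()
    fpr = ((pred == 1) & (truth == 0)).sum() / (truth == 0).sum()
    points.append((fpr, tpr))

# The points walk from near (0, 0) up and across to (1, 1).
print(points)
```

Lowering the threshold only adds predicted positives, so both rates increase monotonically along the sweep, which is why the resulting points trace a curve.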

Final plot: