Django for Data Scientists Part I: Serving A Machine Learning Model through a RESTful API

AI
Last updated: November 9, 2023
10 mins read
Leon Wei

Introduction

This tutorial will teach you how to productionize a machine learning model by serving it through a web API server with Django.

This is the first article in our Django for data scientists tutorial series, which aims to help data scientists become more ‘full-stack’ and stand out among their peers.

We've created a crash course that teaches everything covered in this article in 8 video lectures totaling about one hour. Feel free to check out the course here.

Here are a few reasons why web development skills can help your data science career:

  1. Web dev skills make your resume stand out among other candidates. They can be a killer skill for hiring managers who struggle to secure engineering resources to productize a machine learning model;
  2. Very often, decision-makers in a company are not very technical. If you can develop a prototype and let them interact with it in a web browser, you gain an edge in helping them realize the power of your model and getting the green light for your project;
  3. If you have a revolutionary idea about using AI to make a dent in the universe, you can quickly turn it into a web product, make it available over the internet, and let the whole world know about it.

Our Goals:

I will show you how to productize a machine learning model and create a web service hosting your model in 3 steps:

  1. Set up your Django project with Cookiecutter, a great tool to jump-start your Django project;
  2. Train a sentiment classification model on a dataset of 2,000 movie reviews;
  3. Productize your model locally by setting up your web service API.

Let’s jump right in.

1. Setting up the Django project and a Python virtual environment

  1. I am assuming you are using a Mac; the same steps should apply to Unix/Linux-based operating systems;
  2. Follow the step-by-step instructions to install Cookiecutter Django, a great package to jumpstart your project;
pip install "cookiecutter>=1.4.0"
cookiecutter https://github.com/pydanny/cookiecutter-django

 

Leons-iMac:projects leon$ cookiecutter https://github.com/pydanny/cookiecutter-django
You've downloaded /Users/leon/.cookiecutters/cookiecutter-django before. Is it okay to delete and re-download it? [yes]:
project_name [My Awesome Project]: Classification Project
project_slug [classification_project]:
description [Behold My Awesome Project!]: My Classification Project
author_name [Daniel Roy Greenfeld]: leon
domain_name [example.com]:
email [[email protected]]: [email protected]
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]: n
use_pycharm [n]: y
use_docker [n]: n
Select postgresql_version:
1 - 10.5
2 - 10.4
3 - 10.3
4 - 10.2
5 - 10.1
6 - 9.6
7 - 9.5
8 - 9.4
9 - 9.3
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]: 1
Select js_task_runner:
1 - None
2 - Gulp
Choose from 1, 2 [1]: 1
custom_bootstrap_compilation [n]: n
use_compressor [n]: y
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: y
use_heroku [n]: y
use_travisci [n]: n
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [WARNING]: Cookiecutter Django does not support Python 2. Stability is guaranteed with Python 3.6+ only, are you sure you want to proceed (y/n)?
y
 [SUCCESS]: Project initialized, keep up the good work!

Once the Django project is created, let’s create a virtual environment with Python 3, so that this project’s Python libraries stay isolated and we avoid conflicts with other projects.

Leons-iMac:projects leon$ cd classification_project/
Leons-iMac:classification_project leon$ ls
Procfile                              docs                                  pytest.ini                            setup.cfg
README.rst                            locale                                requirements                          utility
classification_project                manage.py                             requirements.txt
config                                merge_production_dotenvs_in_dotenv.py runtime.txt
Leons-iMac:classification_project leon$ virtualenv -p python3 venv
Leons-iMac:classification_project leon$ ls
Procfile                              docs                                  pytest.ini                            setup.cfg
README.rst                            locale                                requirements                          utility
classification_project                manage.py                             requirements.txt                      venv
config                                merge_production_dotenvs_in_dotenv.py runtime.txt

Notice the newly created venv folder, which contains all the necessary files for your virtual environment.

Then we enter the virtual environment.

leons-iMac:classification_project leon$ source venv/bin/activate
(venv) leons-iMac:classification_project leon$

Notice the (venv) prefix in front of your shell prompt, which indicates that you are now inside the virtual environment; to leave the virtual env, run deactivate on the command line.

Install all the Django libraries needed for your local dev environment:

(venv) leons-iMac:classification_project leon$ pip install -r requirements/local.txt

Now start your Django development server:

(venv) leons-iMac:classification_project leon$ python manage.py runserver

And you will see this error message:

django.db.utils.OperationalError: FATAL: database “classification_project” does not exist

That is simply because Django tries to access the default Postgres database which does not exist yet. Let’s fix that.

(optional) If you have not installed Postgres on your computer, you can install it with Homebrew.

mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew

Then install Postgres

brew install postgres

Now that your Postgres server is installed, we can create a database for this project.

createdb classification_project

Then start the Django dev server again:

(venv) leons-iMac:classification_project leon$ python manage.py runserver
Performing system checks...
System check identified no issues (0 silenced).
You have 23 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.
January 28, 2019 - 23:05:20
Django version 2.0.10, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Open your web browser (e.g., Chrome), then go to http://127.0.0.1:8000/

Voilà, your Django website is initialized and up and running. Congratulations!

Now we run the migration command so that Django will create the first set of tables to provision the database.

(venv) leons-iMac:classification_project leon$ python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, sessions, sites, socialaccount, users
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying users.0001_initial... OK
  Applying account.0001_initial... OK
  Applying account.0002_email_max_length... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying sessions.0001_initial... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying sites.0003_set_site_domain_and_name... OK
  Applying socialaccount.0001_initial... OK
  Applying socialaccount.0002_token_max_lengths... OK
  Applying socialaccount.0003_extra_data_default_dict... OK

With the local Django dev project created, now we move on to build our model.

Since the focus of this article is hosting a machine learning model, we will not go into too much detail about tuning the model’s parameters. Still, the same model-serving method can be applied to other models.

2. Start a Django app for modeling

(venv) leons-iMac:classification_project leon$ django-admin startapp modeling
(venv) leons-iMac:classification_project leon$ cd modeling/
(venv) leons-iMac:modeling leon$ ls
__init__.py admin.py    apps.py     migrations  models.py   tests.py    views.py

After that, add ‘modeling’ to the installed apps in the project settings file: config/settings/base.py.

Notice that the app currently sits directly in our project root directory. Many of you may prefer to have the Django app inside the project_slug directory (classification_project/classification_project instead of classification_project/). To achieve that, follow these 3 simple steps:

1. Move the entire app directory into classification_project/classification_project/ and update the path:

mv modeling classification_project/
cd classification_project/modeling/

2. Open apps.py and change `name = 'modeling'` to `name = 'classification_project.modeling'`.
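
For reference, here is a minimal sketch of what the updated apps.py might look like (ModelingConfig is the class name that startapp generates by default):

# classification_project/modeling/apps.py
from django.apps import AppConfig


class ModelingConfig(AppConfig):
    # use the dotted path now that the app lives inside the project package
    name = 'classification_project.modeling'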

3. Make sure to add classification_project.modeling.apps.ModelingConfig to the installed apps section in the settings file: config/settings/base.py.
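
In cookiecutter-django projects, local apps typically go into the LOCAL_APPS list in config/settings/base.py. A sketch (your generated file may list slightly different entries):

# config/settings/base.py (sketch; other entries generated by cookiecutter-django omitted)
LOCAL_APPS = [
    'classification_project.users.apps.UsersConfig',
    # Your stuff: custom apps go here
    'classification_project.modeling.apps.ModelingConfig',
]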

Now we need to install the scikit-learn libraries to train the model and predict an incoming sample.

(venv) leons-iMac:modeling leon$ pip install scikit-learn==0.20.2

We should also include scikit-learn in the requirements file to ensure it will be installed when deploying to production.

echo 'scikit-learn==0.20.2' >> requirements/base.txt

Now we can download the movie review dataset, which includes two preprocessed sets: positive reviews and negative reviews.

(venv) leons-iMac:classification_project leon$ cd modeling/
(venv) leons-iMac:modeling leon$ python
Python 3.7.2 (default, Jan 13 2019, 12:50:01)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Then execute the following script. The original script can be found on scikit-learn’s official GitHub page:

https://github.com/scikit-learn/scikit-learn/blob/master/doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py

import os
import tarfile
from contextlib import closing
try:
    from urllib import urlopen   # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

URL = ("https://djangoml.s3-us-west-1.amazonaws.com/static/datasets/txt_sentoken.tar.gz")
ARCHIVE_NAME = URL.rsplit('/', 1)[1]
DATA_FOLDER = "txt_sentoken"

# download the archive if needed, extract it, then remove the tarball
if not os.path.exists(DATA_FOLDER):
    if not os.path.exists(ARCHIVE_NAME):
        print("Downloading dataset from %s (3 MB)" % URL)
        opener = urlopen(URL)
        with open(ARCHIVE_NAME, 'wb') as archive:
            archive.write(opener.read())
    print("Decompressing %s" % ARCHIVE_NAME)
    with closing(tarfile.open(ARCHIVE_NAME, "r:gz")) as archive:
        archive.extractall(path='.')
    os.remove(ARCHIVE_NAME)

Now we exit Python and get back to the command line.

(venv) leons-iMac:modeling leon$ ls
__init__.py        apps.py            model.file         poldata.README.2.0 txt_sentoken
admin.py           migrations         models.py          tests.py           views.py

There is a new folder, txt_sentoken, which contains 2,000 preprocessed movie review files under two folders: pos (positive reviews) and neg (negative reviews).

Next, we train the model and save it into a pickle file. Relaunch Python and paste the following code.

The original script and detailed explanations can be found here

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# load the 2000 reviews; the folder names (neg, pos) become the target labels
movie_reviews_data_folder = 'txt_sentoken'
dataset = load_files(movie_reviews_data_folder, shuffle=False)
print("n_samples: %d" % len(dataset.data))

# hold out 25% of the documents for testing
docs_train, docs_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=None)

# TF-IDF features followed by a linear SVM classifier
pipeline = Pipeline([
    ('vect', TfidfVectorizer(min_df=3, max_df=0.95)),
    ('clf', LinearSVC(C=1000)),
])

# grid-search over unigrams vs. unigrams + bigrams
parameters = {
    'vect__ngram_range': [(1, 1), (1, 2)],
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1)

grid_search.fit(docs_train, y_train)

Now that the model is trained via grid search, we need to save the best estimator so it can serve incoming requests and make predictions. Run the following in Python:

from sklearn.externals import joblib
joblib.dump(grid_search.best_estimator_, 'model.file', compress = 1)
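
Optionally, before saving, you can sanity-check the best model on the held-out test split (a quick sketch; it assumes docs_test and y_test from the train_test_split above are still in the same Python session):

# optional sanity check on the 25% held-out test documents
from sklearn import metrics

y_predicted = grid_search.best_estimator_.predict(docs_test)
print(metrics.classification_report(y_test, y_predicted,
                                     target_names=dataset.target_names))
print(metrics.confusion_matrix(y_test, y_predicted))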

You have now saved your classifier model into the binary file model.file. That’s a lot of code; stay with me, we are almost there. Now let’s serve the model by creating a local API.

Open views.py from the modeling app, and add the following code.

import os

from django.http import JsonResponse

from sklearn.externals import joblib

# load the trained pipeline once, when the module is first imported
CURRENT_DIR = os.path.dirname(__file__)
model_file = os.path.join(CURRENT_DIR, 'model.file')
model = joblib.load(model_file)


# Create your views here.
def api_sentiment_pred(request):
    # read the review text from the query string, e.g. ?review=This movie is great
    review = request.GET['review']
    # load_files labels the folders alphabetically: neg -> 0, pos -> 1
    result = 'Positive' if model.predict([review])[0] else 'Negative'
    return JsonResponse(result, safe=False)

Now that we have a prediction view, we need to bind it to a URL. In the modeling folder, create a urls.py file and enter the following code:

from django.urls import path
from .views import api_sentiment_pred
urlpatterns = [
    path('api/predict/', api_sentiment_pred, name='api_sentiment_pred'),    
]

Now we need to include this URL configuration in the project.

Open the classification_project/config/urls.py file and add the following:

# Your stuff: custom urls includes go here
path('model/', include('classification_project.modeling.urls'))
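
In context, the urlpatterns list might look roughly like this (a sketch; the other routes generated by cookiecutter-django are omitted):

# config/urls.py (sketch)
from django.urls import include, path

urlpatterns = [
    # ... routes generated by cookiecutter-django ...
    # Your stuff: custom urls includes go here
    path('model/', include('classification_project.modeling.urls')),
]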

Now let’s start the server. From the project root, enter the following:

python manage.py runserver

After the Django server is up and running, it might take a few seconds for it to load the model. Go to your browser and enter the following URL:

 
http://localhost:8000/model/api/predict/?review=This movie is great

If everything is running as expected, you will see the predicted result:

"Positive"

You can also try a few more examples such as:

 
http://localhost:8000/model/api/predict/?review=I really liked this movie
http://localhost:8000/model/api/predict/?review=This movie is long and boring

Alternatively, you can use cURL to submit a web request on your command line:

curl -G "http://localhost:8000/model/api/predict/" --data-urlencode "review=This movie sucks"
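
You could also call the endpoint from Python, for example with the requests library (a small sketch, assuming requests is installed in your environment):

# minimal Python client for the local prediction endpoint
import requests

resp = requests.get(
    'http://localhost:8000/model/api/predict/',
    params={'review': 'This movie is great'},
)
print(resp.json())  # expected output: "Positive"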

Congratulations! Now you have successfully created a web server to host your machine learning model on your local machine. Read on to learn how to productionize this machine learning model and deploy it to a cloud service.

 

Conclusion

In this tutorial:

  1. We used Cookiecutter to jumpstart a Django project;
  2. We trained a classification model on 2,000 movie reviews;
  3. We created a local HTTP server that handles web traffic, takes a review text, and outputs a predicted sentiment analysis result.

We’ve accomplished a lot in this tutorial. If you have followed each step and seen a predicted result, you can proudly say that you have hosted your machine learning model and converted it into an HTTP service.

In the next tutorial, Django for Data Scientists Part 2: Deploy A Machine Learning Model RESTful API to the Cloud (Heroku), we will show you how to let the whole internet use your machine learning service. Stay tuned.

We've created a crash course that teaches everything covered in this article in 8 video lectures totaling about one hour. Feel free to check out the course here.

If you or your team have any questions about using Django for your machine learning service, please feel free to book a Django machine learning consultation with Leon here.


