1. Abstract

In this report, we'll be exploring a dataset consisting of app interaction data from a popular online food delivery platform and using it to answer some business questions.

import sys
import time
import warnings
import pylab as P
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from __future__ import division

pd.set_option('display.float_format', lambda x: '%.0f' % x)
cols = ["id", "platform", "timestamp", "event_type"]
sv_cols = ["id", "platform", "timestamp", "event_type", "screenname"]
sns.set(style="darkgrid")
warnings.filterwarnings("ignore")

#reading in all of the event data
data = pd.read_csv("/data.csv")[cols].sort_values(by = ['id', 'timestamp'])

#reading in all of 'Screen Visited' events' data
svdata = pd.read_csv("/svdata.csv")[sv_cols].sort_values(by = ['id', 'timestamp'])
svdata = svdata.loc[svdata['screenname'].notnull()]

#Lets first see what our data looks like.
print data.head()
print "\n"
print svdata.head()

         id platform      timestamp      event_type
16218     6  Android 20161001073000    App Launched
1348891   6  Android 20161001073000          Splash
958609    6  Android 20161001073100  Screen Visited
16249     6  Android 20161001073500    App Launched
16256     6  Android 20161001073500    App Launched


        id platform      timestamp      event_type       screenname
55849    6  Android 20161001073100  Screen Visited  Select location
53074   33  Android 20161001171700  Screen Visited     Menu catalog
163871  36  Android 20161002123700  Screen Visited             Cart
163872  36  Android 20161002123700  Screen Visited     Menu catalog
356218  54  Android 20161003150100  Screen Visited     Menu catalog

2. Data Exploration

Where do all these app interaction events come from? Lets find out. From the plot below we can see that most of the app interaction events are triggered on the android application of our e-commerce portal.

plot = ''
plot = sns.countplot(x="platform", data=data)
sns.plt.show()

#Number of unique users who logged in on each platform.
print dict(data[["id", "platform"]].drop_duplicates().groupby(by=['platform']).size())

#Number of unique users who made a transaction on each platform.
print dict(data.loc[data.event_type=="Charged"][["id", "platform"]].drop_duplicates().groupby(by=['platform']).size())

{'Web': 53196, 'Android': 30884, 'iOS': 11433}
{'Web': 4940, 'Android': 4001, 'iOS': 2487}

Where are all the unique users coming from and how many of them actually make a purchase on our website? We noticed from the data that for a time period of two days (from 10-01-2016 to 10-03-2016), the vast majority of unique users (53,196) accessed the app on Web. Of these users, 4940 users had placed orders and were charged. The second most popular platform is Android, with a unique active user count of 30,884, of which 4001 users were charged for their orders.

from collections import OrderedDict
active = data[["id", "platform"]].drop_duplicates().groupby(by=['platform']).size()
charged = data.loc[data.event_type=="Charged"][["id", "platform"]].drop_duplicates().groupby(by=['platform']).size()
sales = OrderedDict([ ('Platform', ['Android', 'iOS', 'Web']), ('Charged', [4001, 2487, 4940]), ('Active',  [30884, 11433, 53196])])
df = pd.DataFrame.from_dict(sales).sort_values(by = ['Active', 'Charged'], ascending = [False, False])
sns.set_color_codes("pastel")
ax = ''
ax = sns.barplot(x="Platform", y="Active", data=df, label="Active", color="b")
sns.set_color_codes("muted")
ax = sns.barplot(x="Platform", y="Charged", data=df, label="Charged", color="b")
ax.set(xlabel='Platform', ylabel='Number of Users')
ax.legend(ncol=2, loc="upper right", frameon=True)
sns.despine(left=True, bottom=True)
sns.plt.show()

There are over 60 different types of events that can be tracked through the app. Which ones occur most frequently? We see that “Screen Visited” seems to be the most frequently occurring event on our app, followed by the event “Menu Category Changed”. We can also observe that over the course of two days less than 15000 transactions have occurred (the event “Charged” has a frequency of 14,417).

plot = ''
plot = sns.countplot(x="event_type", data=data)
plot.set_xticklabels(plot.get_xticklabels(), rotation=-90)
sns.plt.show()

categories = ['american', 'asian', 'chinese', 'continental', 'european', 'fusion', 
              'indian', 'italian', 'latin-american', 'mediterranean', 'mexican', 
              'middle-east', 'oriental', 'pan-asian', 'thai', 'universal']

for category in categories:    
    svdata.loc[svdata.screenname.str.startswith('/'+category), 'category'] = category

svdata.head()

category_df = svdata.loc[svdata.category.notnull()]
category_df.loc[:, 'isveg'] = np.where(category_df['screenname'].str.contains('vegetarian'), 'Veg', 'Nonveg')

category_df.head()

Users often browse through food from different cuisines before making a final choice on what to purchase. Lets find out which cuisines are getting the most screen views. Over here, we observe that Continental with a count of 5620, is the food category with the highest number of user views. Latin-American dishes were the least popular (with a view count of only 1).

plot = ''
plot = sns.countplot(x="category", data=category_df)
plot.set_xticklabels(plot.get_xticklabels(), rotation=-90)
sns.plt.show()

What is the ratio of vegetarian food views to non vegetarian food views withing each cuisine? The plot below shows us that Pan-Asian seems to be the most popular choice among people who’re interested in vegetarian food. And for non vegetarians, the most popular is Continental.

plot = ''
category_count = category_df[["id", "category", "isveg"]].groupby(by = ['category', 'isveg']).count().reset_index().sort_values(by = ['category', 'id'], ascending=[True, False])
plot = sns.barplot(x = "id", y = "category", data = category_count, hue = "isveg")
plot.set(xlabel='Number of Views', ylabel='Food Category')
plt.title("#Total Views by Category")
sns.plt.show()

def getplot(category):
    ax0=''
    category = category
    product_df = category_df[["id", "screenname", "category", "isveg"]].groupby(by = ['screenname', 'category', 'isveg']).count().sort_values(by = ['id'], ascending=False).reset_index()
    ax0 = sns.barplot(x = "id", y = "screenname", data = product_df.loc[product_df.category==category])
    ax0.set(xlabel='Number of Views', ylabel=category)
    return(ax0)

In terms of screen views, which dishes are more popular for a given cuisine? Over here we drill down to the product level for two of the most popular categories, continental and pan-asian. We can observe that number of continental food items that are viewed far exceed that of pan-asian food items.

getplot('continental')
plt.title("#Product Views by Category")
sns.plt.show()
print "\n"
getplot('pan-asian')
plt.title("#Product Views by Category")
sns.plt.show()

	id	platform	timestamp	event_type	screenname	category
55849	6	Android	20161001073100	Screen Visited	Select location	NaN
53074	33	Android	20161001171700	Screen Visited	Menu catalog	NaN
163871	36	Android	20161002123700	Screen Visited	Cart	NaN
163872	36	Android	20161002123700	Screen Visited	Menu catalog	NaN
356218	54	Android	20161003150100	Screen Visited	Menu catalog	NaN

	id	platform	timestamp	event_type	screenname	category	isveg
273657	147790	Web	20161002220200	Screen Visited	/chinese--non-veg--add-ons--chicken--oys	chinese	Nonveg
283821	147810	Web	20161002233300	Screen Visited	/pan-asian--vegetarian--add-ons--bell-pe	pan-asian	Veg
283827	147810	Web	20161002233300	Screen Visited	/italian--vegetarian--mains--fusilli-pas	italian	Veg
283834	147810	Web	20161002233300	Screen Visited	/universal--vegetarian--jars--caramel--b	universal	Veg
286666	147810	Web	20161003020400	Screen Visited	/mexican--vegetarian--starter-appetizers	mexican	Veg