<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Sachin Shrestha</title>
    <description>Data Scientist / Analyst</description>
    <link>http://sachinshrestha.github.io</link>
    <atom:link href="http://sachinshrestha.github.io/feed.xml" rel="self" type="application/rss+xml" />
    
      <item>
        <title>Crime in NSW - How safe is your suburb?</title>
        <description>&lt;head&gt;
  &lt;style&gt;
    h5{
      font-size:90%;
      font-weight: normal;
      color: Gray;
    }
    
    p.small {
    line-height: 70%;
}
  &lt;/style&gt;
&lt;/head&gt;
&lt;p&gt;&lt;img src=&quot;http://sachinshrestha.github.io/images/crime.jpg&quot; alt=&quot;Crime&quot; /&gt;&lt;/p&gt;

&lt;h5&gt;&lt;i&gt;Author: Sachin Shrestha&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Data Source: NSW Bureau of Crime Statistics and Research&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Image source: www.nsw.gov.au&lt;/i&gt;&lt;/h5&gt;

&lt;p&gt;&lt;br /&gt;
How safe is your suburb?
&lt;br /&gt;
&lt;br /&gt;
Find the crime trend in the suburb you live in. Has crime increased over the years? Or is your suburb getting safer?
&lt;br /&gt;
&lt;br /&gt;
How many home break-ins, robberies, motor vehicle thefts, store thefts or drug-related offences were reported in your suburb in the years 2011 - 2015? 
&lt;br /&gt;
&lt;br /&gt;
What do the monthly crime numbers look like? Is there a pattern? In which months do crimes happen the most? When do they occur the least?
&lt;br /&gt;
&lt;br /&gt;
Your answers are in the following dashboard.
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;


&lt;iframe style=&quot;border: 0px;&quot; src=&quot;https://public.tableau.com/views/Crime_55/Dashboard1?:embed=y&amp;amp;:display_count=yes&amp;amp;:toolbar=no&amp;amp;:origin=viz_share_link&quot; scrolling=&quot;no&quot; width=&quot;800px&quot; height=&quot;800px&quot;&gt;
&lt;/iframe&gt;
&lt;hr /&gt;

</description>
        <pubDate>Thu, 17 Nov 2016 00:00:00 +0000</pubDate>
        <link>http://sachinshrestha.github.io/Crime/</link>
        <guid isPermaLink="true">http://sachinshrestha.github.io/Crime/</guid>
      </item>
    
      <item>
        <title>How do people travel to work in New South Wales?</title>
        <description>&lt;p&gt;&lt;img src=&quot;http://sachinshrestha.github.io/images/JTW.jpg&quot; alt=&quot;Journey to Work&quot; /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1495501105795&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;Number of People Travelling to Work by Various Modes of Transport &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;JT&amp;#47;JTW-ModeofTransport&amp;#47;JTW-ModeofTransport&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;
&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;JTW-ModeofTransport&amp;#47;JTW-ModeofTransport&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;JT&amp;#47;JTW-ModeofTransport&amp;#47;JTW-ModeofTransport&amp;#47;1.png&quot; /&gt; 
&lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
var divElement = document.getElementById('viz1495501105795');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477441390667&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;People travelling by from suburb &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;JT&amp;#47;JTW-Numberofpersonsfromorigin-bysuburb&amp;#47;Peopletravellingbyfromsuburb&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;
&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;JTW-Numberofpersonsfromorigin-bysuburb&amp;#47;Peopletravellingbyfromsuburb&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;JT&amp;#47;JTW-Numberofpersonsfromorigin-bysuburb&amp;#47;Peopletravellingbyfromsuburb&amp;#47;1.png&quot; /&gt; 
&lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;                    
var divElement = document.getElementById('viz1477441390667');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477452262751&quot; style=&quot;position: relative&quot;&gt;&lt;noscript&gt;&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;Top 10 suburbs from where people travel to work by train &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byTrain&amp;#47;Train&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;Top10byTrain&amp;#47;Train&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byTrain&amp;#47;Train&amp;#47;1.png&quot; /&gt; 
&lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;                    
var divElement = document.getElementById('viz1477452262751');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477452282936&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;&lt;img alt=&quot;Top 10 suburbs from where people travel to work by bus &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byBus&amp;#47;Bus&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;
&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;Top10byBus&amp;#47;Bus&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byBus&amp;#47;Bus&amp;#47;1.png&quot; /&gt; &lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;                    
var divElement = document.getElementById('viz1477452282936');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477452305064&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;Top 10 suburbs from where people travel to work by car &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byCar&amp;#47;Car&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;
&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;Top10byCar&amp;#47;Car&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byCar&amp;#47;Car&amp;#47;1.png&quot; /&gt; &lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;                    
var divElement = document.getElementById('viz1477452305064');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477452323232&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;Top 10 suburbs from where people bike to work &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byBicycle&amp;#47;Bicycle&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;
&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt; 
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;Top10byBicycle&amp;#47;Bicycle&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10byBicycle&amp;#47;Bicycle&amp;#47;1.png&quot; /&gt; 
&lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
var divElement = document.getElementById('viz1477452323232');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;tableauPlaceholder&quot; id=&quot;viz1477452335105&quot; style=&quot;position: relative&quot;&gt;
&lt;noscript&gt;
&lt;a href=&quot;#&quot;&gt;
&lt;img alt=&quot;Top 10 suburbs from where people walk to work &quot; src=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10onFoot&amp;#47;Walked&amp;#47;1_rss.png&quot; style=&quot;border: none&quot; /&gt;&lt;/a&gt;
&lt;/noscript&gt;
&lt;object class=&quot;tableauViz&quot; style=&quot;display:none;&quot;&gt;
&lt;param name=&quot;host_url&quot; value=&quot;https%3A%2F%2Fpublic.tableau.com%2F&quot; /&gt;
&lt;param name=&quot;site_root&quot; value=&quot;&quot; /&gt;
&lt;param name=&quot;name&quot; value=&quot;Top10onFoot&amp;#47;Walked&quot; /&gt;
&lt;param name=&quot;tabs&quot; value=&quot;no&quot; /&gt;
&lt;param name=&quot;toolbar&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;static_image&quot; value=&quot;https:&amp;#47;&amp;#47;public.tableau.com&amp;#47;static&amp;#47;images&amp;#47;To&amp;#47;Top10onFoot&amp;#47;Walked&amp;#47;1.png&quot; /&gt; 
&lt;param name=&quot;animate_transition&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_static_image&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_spinner&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_overlay&quot; value=&quot;yes&quot; /&gt;
&lt;param name=&quot;display_count&quot; value=&quot;yes&quot; /&gt;
&lt;/object&gt;
&lt;/div&gt;
&lt;script type=&quot;text/javascript&quot;&gt;             
var divElement = document.getElementById('viz1477452335105');
var vizElement = divElement.getElementsByTagName('object')[0];
vizElement.style.width='100%';
vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
&lt;/script&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;hr /&gt;

</description>
        <pubDate>Fri, 08 Jan 2016 00:00:00 +0000</pubDate>
        <link>http://sachinshrestha.github.io/JTW/</link>
        <guid isPermaLink="true">http://sachinshrestha.github.io/JTW/</guid>
      </item>
    
      <item>
        <title>Market Segmentation for Frequent Flyer Program</title>
        <description>&lt;p&gt;&lt;img src=&quot;http://sachinshrestha.github.io/images/frequentFlyer.jpg&quot; alt=&quot;Frequent Flyer&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Market segmentation&lt;sup&gt;1&lt;/sup&gt; is a marketing strategy which involves dividing a broad target market into subsets of consumers, businesses, or countries that have, or are perceived to have, common needs, interests, and priorities, and then designing and implementing strategies to target them. Market segmentation strategies are generally used to identify and further define the target customers, and provide supporting data for marketing plan elements such as positioning to achieve certain marketing plan objectives. Businesses may develop product differentiation strategies, or an undifferentiated approach, involving specific products or product lines depending on the specific demand and attributes of the target segment.&lt;/p&gt;

&lt;p&gt;In this project, I will analyse data from an airline’s frequent flyer program to group its customers into different market segments. In particular, I will use a &lt;i&gt;clustering algorithm&lt;/i&gt; to segment the airline’s market into different clusters.&lt;/p&gt;

&lt;p&gt;The data is sourced from  &lt;i&gt;www.dataminingbook.com&lt;/i&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;i&gt;airlines&lt;/i&gt; dataframe contains 7 variables, all of which are integers. The variables and their descriptions are shown in the following table.&lt;/p&gt;

&lt;table border=&quot;1&quot; style=&quot;background-color:#FFFFCC;border-collapse:collapse;border:1px;color:#000000;width:100%&quot; cellpadding=&quot;5&quot; cellspacing=&quot;3&quot;&gt;
	&lt;tr&gt;
		&lt;th&gt;Name of Variable&lt;/th&gt;
		&lt;th&gt;Variable Type&lt;/th&gt;
		&lt;th&gt;Description&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;Balance&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Frequent flyer points earned so far&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;QualMiles&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Number of qualifying miles&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;BonusMiles&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Miles earned from non-flight transactions&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;BonusTrans&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Number of non-flight transactions&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;FlightMiles&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Miles earned from flight transactions&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;FlightTrans&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Number of flight transactions&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;DaysSinceEnroll&lt;/td&gt;
		&lt;td&gt;Integer&lt;/td&gt;
		&lt;td&gt;Number of days since joining the frequent flyer program&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
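&lt;p&gt;To make the idea concrete, here is a minimal sketch of one common clustering method, k-means, written in Python rather than the R used for the project. It is an illustration only: the choice of k-means, the six customer rows, the use of just three of the variables, and the helper names &lt;i&gt;normalise&lt;/i&gt;, &lt;i&gt;nearest&lt;/i&gt; and &lt;i&gt;kmeans&lt;/i&gt; are all assumptions, not the actual method or data of the project. The variables are rescaled first so that large values such as Balance do not dominate the distance calculation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import math

# Hypothetical customers: (Balance, BonusMiles, FlightMiles).
# The values are made up for illustration, not taken from the dataset.
customers = [
    (20000, 500, 0),
    (25000, 800, 100),
    (22000, 300, 0),
    (120000, 40000, 500),
    (110000, 45000, 300),
    (130000, 42000, 800),
]


def normalise(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in rows
    ]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded with k evenly spaced points from the input."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(c) for c in zip(*members))
    return labels


labels = kmeans(normalise(customers), k=2)
print(labels)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these made-up rows, the three low-balance customers fall into one cluster and the three high-balance customers into the other. On real data the number of clusters has to be chosen, for example by comparing the results for several values of k.&lt;/p&gt;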

&lt;p&gt;The details of the modelling process and the R code for this project are available &lt;a href=&quot;http://sachinshrestha.github.io/frequentFlyerCode/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Results from the model show that the airline’s customers (those who are members of its frequent flyer program) can be categorised into five distinct groups. Analysis of each group shows that the groups may be broadly described as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Infrequent but loyal customers&lt;/li&gt;
&lt;li&gt;Customers with a large number of miles, mostly from flight transactions&lt;/li&gt;
&lt;li&gt;Customers with a large number of miles, mostly from non-flight transactions&lt;/li&gt;
&lt;li&gt;New customers accumulating miles from non-flight transactions&lt;/li&gt;
&lt;li&gt;New and infrequent customers&lt;/li&gt;  
&lt;/ol&gt;

&lt;p&gt;Here’s the complete R code for the &lt;a href=&quot;http://sachinshrestha.github.io/frequentFlyerCode/&quot;&gt;Market Segmentation for Frequent Flyer Customers Project&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt;Wikipedia&lt;/p&gt;

</description>
        <pubDate>Wed, 25 Nov 2015 00:00:00 +0000</pubDate>
        <link>http://sachinshrestha.github.io/frequentFlyer/</link>
        <guid isPermaLink="true">http://sachinshrestha.github.io/frequentFlyer/</guid>
      </item>
    
      <item>
        <title>What determines whether someone earns more than $50K?</title>
        <description>&lt;p&gt;&lt;img src=&quot;http://sachinshrestha.github.io/images/census.jpg&quot; alt=&quot;Census&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;
We keep wondering what factors play a role in a person’s annual income. We may think of hundreds of factors, but are there that many?&lt;/p&gt;

&lt;p&gt;Here I use publicly available census data (source: UCI Machine Learning Repository) to find which factors actually determine whether someone earns more than 50K annually.&lt;/p&gt;

&lt;p&gt;The census dataframe used for this project contains 13 variables - 12 independent variables (&lt;i&gt;features&lt;/i&gt;) and one dependent variable (&lt;i&gt;label&lt;/i&gt;).  The variables are shown in the table below.&lt;/p&gt;

&lt;p&gt;The variable &lt;i&gt;fiftyKPlus&lt;/i&gt; is the variable to be predicted. So, it is the dependent variable. The remaining 12 variables are used to predict the dependent variable.&lt;/p&gt;

&lt;table border=&quot;1&quot; style=&quot;background-color:#FFFFCC;border-collapse:collapse;border:1px;color:#000000;width:100%&quot; cellpadding=&quot;5&quot; cellspacing=&quot;3&quot;&gt;
	&lt;tr&gt;
		&lt;th&gt;Name of Variable&lt;/th&gt;
		&lt;th&gt;Variable Type&lt;/th&gt;
		&lt;th&gt;Values&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;age&lt;/td&gt;
		&lt;td&gt;continuous&lt;/td&gt;
		&lt;td&gt;17 - 90 years&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;workclass&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Federal-gov, Local-gov, Never-worked, &lt;br /&gt;Private, Self-emp-inc, Self-emp-not-inc, &lt;br /&gt;State-gov, Without-pay 
    &lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;education&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Preschool, 1st-4th, 5th-6th, 7th-8th, 9th, 10th, 11th, 12th, &lt;br /&gt;HS-Grad, Assoc-acdm, Assoc-voc, Prof-school, Some-college,           &lt;br /&gt;Bachelors, Masters, Doctorate
		&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;maritalstatus&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Divorced, Married-AF-spouse, Married-civ-spouse, &lt;br /&gt;Married-spouse-absent, Never-married, &lt;br /&gt;Separated, Widowed
&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;occupation&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt; Adm-clerical, Armed-Forces, Craft-repair, &lt;br /&gt;Exec-managerial, Farming-fishing, Handlers-cleaners, &lt;br /&gt;  Other-service, Priv-house-serv, Prof-specialty, &lt;br /&gt;Protective-serv, Sales, Tech-support, Transport-moving
		&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;relationship&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Unmarried, Husband, Wife, Not-in-family, Other-relative&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;race&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;White, Black, Amer-Indian-Eskimo, Asian-Pac-Islander, Other
		&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;sex&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Female, Male&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;capitalgain&lt;/td&gt;
		&lt;td&gt;continuous&lt;/td&gt;
		&lt;td&gt;0 - $100,000&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;capitalloss&lt;/td&gt;
		&lt;td&gt;continuous&lt;/td&gt;
		&lt;td&gt;0 - $4,356&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;hoursperweek&lt;/td&gt;
		&lt;td&gt;continuous&lt;/td&gt;
		&lt;td&gt;1 - 99 hours&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;nativecountry&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Cambodia, Canada, China, Columbia, Cuba, &lt;br /&gt;Dominican-Republic, Ecuador, El-Salvador, &lt;br /&gt;England, France, Germany etc.
&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;fiftyKPlus&lt;/td&gt;
		&lt;td&gt;categorical&lt;/td&gt;
		&lt;td&gt;Less than or equal to $50,000, Greater than $50,000&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The dependent variable &lt;i&gt;fiftyKPlus&lt;/i&gt; has two classes: &lt;em&gt;&amp;lt;=50K&lt;/em&gt; or &lt;em&gt;&amp;gt;50K&lt;/em&gt;. Since this is a classification problem, I have developed the following models in R to predict whether or not a person earns more than 50K annually.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A CART (Classification and Regression Tree) model&lt;/li&gt;
  &lt;li&gt;A CART model with cross-validation&lt;/li&gt;
  &lt;li&gt;A Random Forest model&lt;/li&gt;
  &lt;li&gt;A Logistic Regression model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The details of the modelling process and the R code for this project are available &lt;a href=&quot;http://sachinshrestha.github.io/censusCode/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To decide which model is best suited to predict whether or not a person earns more than 50K a year, the performance of each model has to be evaluated.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Performance of the models:&lt;/b&gt;&lt;/p&gt;
&lt;table border=&quot;1&quot; style=&quot;background-color:#FFFFCC;border-collapse:collapse;border:1px;color:#000000;width:100%&quot; cellpadding=&quot;5&quot; cellspacing=&quot;3&quot;&gt;
	&lt;tr&gt;
		&lt;th&gt;Type of Model&lt;/th&gt;
		&lt;th&gt;Baseline Accuracy&lt;/th&gt;
		&lt;th&gt;Model Accuracy&lt;/th&gt;
		&lt;th&gt;AUC&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;CART&lt;/td&gt;
		&lt;td&gt;&lt;/td&gt;
		&lt;td&gt;0.849582&lt;/td&gt;
		&lt;td&gt;0.846746&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;CART with cross validation&lt;/td&gt;
		&lt;td&gt;&lt;/td&gt;
		&lt;td&gt;0.862403&lt;/td&gt;
		&lt;td&gt;0.871933&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;Random Forest&lt;/td&gt;
		&lt;td&gt;&lt;/td&gt;
		&lt;td&gt;0.825580&lt;/td&gt;
		&lt;td&gt;NA&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;Logistic Regression&lt;/td&gt;
		&lt;td&gt;0.759362&lt;/td&gt;
		&lt;td&gt;0.850989&lt;/td&gt;
		&lt;td&gt;0.905733&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
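
&lt;p&gt;The two accuracy columns can be computed along the following lines. This is a hedged Python sketch, not the R code from the project: the labels and predictions are made up, &quot;yes&quot; stands for earning more than 50K, and it assumes that baseline accuracy means the accuracy of always predicting the most frequent class.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from collections import Counter

# Hypothetical test-set labels and model predictions (made up for illustration).
actual = ["no", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no"]
predicted = ["no", "no", "yes", "yes", "no", "no", "no", "no", "yes", "no"]


def baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(labels, predictions):
    """Share of cases where the prediction matches the actual label."""
    hits = sum(1 for a, p in zip(labels, predictions) if a == p)
    return hits / len(labels)


print(baseline_accuracy(actual))    # share of the majority class
print(accuracy(actual, predicted))  # share of correct predictions
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A model is only useful if its accuracy is clearly above this baseline, which is why the two figures are reported side by side.&lt;/p&gt;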

&lt;p&gt;&lt;br /&gt;
Since the accuracy of the &lt;em&gt;random forest model&lt;/em&gt; is the lowest of the four models, it may not be the best choice for predicting the &lt;em&gt;label&lt;/em&gt; in this project.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;logistic regression model&lt;/em&gt; does a little better than the &lt;em&gt;CART model&lt;/em&gt;. However, the &lt;em&gt;logistic regression model&lt;/em&gt; is not easily interpretable. Although the model coefficients may be used to determine the significance of &lt;em&gt;features&lt;/em&gt;, they do not offer a simple explanation of how a decision is made. For example, look at the following results for the variable &lt;em&gt;occupation&lt;/em&gt; from the summary of the &lt;em&gt;logistic regression model&lt;/em&gt;.&lt;/p&gt;
&lt;section&gt;
&lt;pre&gt;&lt;code&gt;&lt;font size=&quot;2&quot;&gt;
occupation Adm-clerical                  -7.306e-02  1.283e-01  -0.569 0.569102
occupation Armed-Forces                  -1.393e+01  9.937e+02  -0.014 0.988812
occupation Craft-repair                   4.995e-02  1.092e-01   0.457 0.647523
occupation Exec-managerial                7.267e-01  1.133e-01   6.414 1.42e-10 ***
occupation Farming-fishing               -1.214e+00  1.851e-01  -6.558 5.46e-11 ***
occupation Handlers-cleaners             -7.398e-01  1.879e-01  -3.938 8.22e-05 ***
occupation Machine-op-inspct             -3.752e-01  1.393e-01  -2.693 0.007089 **
occupation Other-service                 -9.234e-01  1.655e-01  -5.579 2.42e-08 ***
occupation Priv-house-serv               -1.390e+01  2.166e+02  -0.064 0.948837
occupation Prof-specialty                 3.865e-01  1.213e-01   3.186 0.001442 **
occupation Protective-serv                6.512e-01  1.718e-01   3.790 0.000150 ***
occupation Sales                          1.892e-01  1.170e-01   1.617 0.105965
occupation Tech-support                   5.251e-01  1.532e-01   3.428 0.000608 ***
&lt;/font&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/section&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Observe how some sub-categories of the variable have been marked as significant (with three asterisks) whereas others are not marked as significant at all. A similar pattern can be seen with other variables too. This is complicated! The model is difficult to use to quickly make a prediction for a new case. In summary, the &lt;em&gt;logistic regression model&lt;/em&gt; is not easily interpretable.&lt;/p&gt;
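
&lt;p&gt;One way to read these numbers, assuming the first numeric column is the coefficient estimate on the log-odds scale (the column headers are not shown in the output), is to exponentiate each estimate into an odds ratio. The short Python sketch here does that for three of the rows. The dictionary names and the choice of rows are illustrative, and the reference occupation that the ratios are measured against is not shown in the output.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import math

# Estimates copied from the summary output (first numeric column,
# assumed to be the log-odds coefficient).
estimates = {
    "Exec-managerial": 7.267e-01,
    "Farming-fishing": -1.214e+00,
    "Tech-support": 5.251e-01,
}

# exp(coefficient) is the multiplicative change in the odds of earning
# more than 50K relative to the reference occupation (not shown in the
# output), holding the other variables fixed.
odds_ratios = {name: math.exp(beta) for name, beta in estimates.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.2f}")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even in this form, each ratio only compares one occupation with the reference category while the other variables are held fixed, which is part of why the model takes more effort to interpret than a tree.&lt;/p&gt;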

&lt;p&gt;On the other hand, the &lt;em&gt;CART model&lt;/em&gt; is more easily interpretable. A &lt;em&gt;CART model&lt;/em&gt; is also preferable because it does not assume a linear relationship, as a &lt;em&gt;logistic regression model&lt;/em&gt; does.&lt;/p&gt;

&lt;p&gt;As seen in the plot of the &lt;em&gt;CART model&lt;/em&gt; (see &lt;a href=&quot;http://sachinshrestha.github.io/censusCode/&quot;&gt;code&lt;/a&gt;), the features that split the tree are &lt;em&gt;relationship&lt;/em&gt;, &lt;em&gt;capitalgain&lt;/em&gt; and &lt;em&gt;education&lt;/em&gt;. The &lt;em&gt;CART model&lt;/em&gt; is more interpretable in the sense that it shows that these three features are strong predictors of whether or not a person earns more than 50K annually.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;CART model with cross-validation&lt;/em&gt; has the highest accuracy of the models. Although the plot of the &lt;em&gt;CART model with cross-validation&lt;/em&gt; presents a tree that is more complex than that of the &lt;em&gt;CART model without cross-validation&lt;/em&gt;, a closer look shows that both the trees have been split by the same three features: &lt;em&gt;relationship&lt;/em&gt;, &lt;em&gt;capitalgain&lt;/em&gt; and &lt;em&gt;education&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;We may thus conclude that, out of the four models, the &lt;em&gt;CART model with cross-validation&lt;/em&gt; best predicts whether or not a person earns more than $50K a year.&lt;/p&gt;

&lt;p&gt;And what factors are the strong predictors? Well, as we just saw, the following three factors most strongly indicate whether a person earns more than $50K every year:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;em&gt;relationship&lt;/em&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;capitalgain&lt;/em&gt; and&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;education&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s the complete R code for the &lt;a href=&quot;http://sachinshrestha.github.io/censusCode/&quot;&gt;Census Project&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;

</description>
        <pubDate>Mon, 09 Nov 2015 00:00:00 +0000</pubDate>
        <link>http://sachinshrestha.github.io/census/</link>
        <guid isPermaLink="true">http://sachinshrestha.github.io/census/</guid>
      </item>
    
  </channel>
</rss>