An exploratory and iterative process of asking many questions and finding answers from data in order to build better hypotheses for Explanation, Prediction, and Control.
2 Principal Questions for EDA
Q. How does Monthly Income vary?
A. Manager’s Monthly Income range seems to be higher while Sales Rep & Research Scientist are lower.
Q. Are there any differences in the variation of Monthly Income between Male and Female? If so, how?
A. There is no clear difference between Female and Male. (It's even easier to see with a Density Plot.)
Q. Are there any differences in the variation of Monthly Income among Job Roles? If so, how?
A. Manager’s Monthly Income range seems to be higher, while Sales Rep & Research Scientist are lower. (It’s hard to see the differences among the Job Roles in a Histogram because they are drawn on top of each other. A Density Plot helps!)
It’s easier to see the differences among Job Roles with a Density Plot than with a Histogram.
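To make the Density Plot idea concrete, here is a minimal sketch of a Gaussian kernel density estimate, which is one common way such curves are computed. The income values, the `gaussian_kde` helper, and the bandwidth rule are all assumptions made for illustration; the workshop's tool may use different settings.

```python
from math import exp, pi, sqrt
from statistics import stdev


def gaussian_kde(values, bandwidth=None):
    """Return a function that estimates the density of `values` at a point x."""
    n = len(values)
    if bandwidth is None:
        # Normal reference rule of thumb; an assumption, not the workshop's setting.
        bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)

    def density(x):
        # Average of Gaussian bumps centered on each observed value.
        total = sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density


# Hypothetical monthly incomes per job role; values are made up.
incomes = {
    "Manager": [16000, 17500, 18200, 19000, 17000],
    "Research Scientist": [2500, 3100, 2800, 4000, 3300],
}

# One smooth curve per role, evaluated on a shared income grid.
curves = {role: gaussian_kde(v) for role, v in incomes.items()}
for x in range(0, 20001, 4000):
    row = "  ".join(f"{role}: {curves[role](x):.6f}" for role in incomes)
    print(f"income {x:>6}  {row}")
```

Because each role gets its own smooth curve, overlapping groups stay readable as lines instead of stacked bars.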
Q. How does Job Role vary?
A. Sales Executive, Research Scientist, and Laboratory Technician have the largest headcounts, together adding up to 60%.
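The headcount shares behind an answer like this can be sketched with a simple count per category. The role list below is made up and `headcount_share` is a hypothetical helper, so the numbers do not reproduce the 60% figure above.

```python
from collections import Counter

# Hypothetical job-role column; the values are made up for illustration.
job_roles = [
    "Sales Executive", "Sales Executive", "Sales Executive",
    "Research Scientist", "Research Scientist",
    "Laboratory Technician", "Laboratory Technician",
    "Manager", "Human Resources", "Sales Representative",
]


def headcount_share(values):
    """Return (category, count, share) tuples sorted by count, largest first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(role, n, n / total) for role, n in counts.most_common()]


shares = headcount_share(job_roles)
top3_share = sum(share for _, _, share in shares[:3])

for role, n, share in shares:
    print(f"{role:<22} {n:>3} {share:6.1%}")
print(f"Top 3 roles combined: {top3_share:.0%}")
```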
Q. Are there any differences in the variation of Job Role between Male and Female? If so, how?
Q. How is the variation in Monthly Income associated with Job Role?
A. With a Box Plot we can see three ranges of Monthly Income across Job Roles.
Q. How is the variation in Monthly Income associated with Job Role?
A. With a Violin Plot we can see three ranges of Monthly Income across Job Roles and also the distribution (density) of Income for each Job Role.
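A Box Plot is essentially a picture of a five-number summary per group. The sketch below computes that summary with made-up incomes; `five_number_summary` is a hypothetical helper, and the quartile method is an assumption (charting tools differ in how they interpolate quartiles and draw whiskers).

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (job_role, monthly_income) rows; values are made up.
rows = [
    ("Manager", 16000), ("Manager", 17500),
    ("Manager", 18200), ("Manager", 19000),
    ("Sales Executive", 5200), ("Sales Executive", 6100),
    ("Sales Executive", 7400), ("Sales Executive", 9000),
    ("Sales Representative", 2100), ("Sales Representative", 2500),
    ("Sales Representative", 2900), ("Sales Representative", 3300),
]


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a basic box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Group incomes by job role, then summarize each group.
by_role = defaultdict(list)
for role, income in rows:
    by_role[role].append(income)

summary = {role: five_number_summary(v) for role, v in by_role.items()}
for role, stats in summary.items():
    print(f"{role:<22}", stats)
```

A Violin Plot adds a density curve per group on top of this summary, which is why it also shows where incomes concentrate within each range.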
Q. How is the variation in Monthly Income correlated with the variation in Age?
A. Income increases as age increases. With a Scatter Plot and a Linear Regression Trend Line we can see a correlation of 0.49.
Q. How is the variation in Monthly Income correlated with the variation in Total Working Years?
A. Income increases as working years increase. With a Scatter Plot and a Linear Regression Trend Line we can see a strong correlation of 0.77.
Q. How is the variation in Monthly Income correlated with the variation in Total Working Years?
A. There is a huge jump in Monthly Income after the 20th year of working in this company. Up to the 16th year, income increases as the working years increase, but after the 20th year there is no obvious correlation between working years and income.
Q. Are there any differences in the correlation between Monthly Income and Total Working Years among Job Roles? Which Job Roles have a similar kind of correlation?
Strong correlation: Healthcare Representative, Human Resources, Manufacturing, Research Director (work longer, paid better).
Middle correlation: Sales Executive, Manager, Research Scientist.
Weak correlation: Laboratory Technician, Sales Representative.
Q. How is the variation in Monthly Income associated with Gender?
A. There is no clear association.
Q. How is Job Role associated with Education?
A. Life Science is the majority among all Job Roles. Sales Executive/Representative has more people with a background in Marketing. Human Resources has more people with a background in its own specialty.
Q. How is Job Role associated with Attrition?
A. Sales Representative has the highest turnover rate, at up to 40%.
Q. How is the Job Role associated with the Monthly Income?
Airbnb Listing Data for New York City https://exploratory.io/data/kanaugust/Airbnb-Listing-Data-for-New-York-City-OPV5XdB9uw
Historical US Stock Prices Data of Tech Giants
Q. What is the trend in the number of Airbnb properties over the years? Is it steadily increasing? Are there any change points in the trend?
A. The total number of hosted properties increased very rapidly, especially from 2013 to 2016, and then growth slowed somewhat.
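A trend like this can be checked numerically by counting new properties per year and accumulating them. The sketch below is built on made-up data and assumes each property has a "first listed" year; the source does not say which column the chart was built from.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical first-listed years, one per property; values are made up.
first_listed = (
    [2011] * 2 + [2012] * 4 + [2013] * 9 + [2014] * 15
    + [2015] * 18 + [2016] * 16 + [2017] * 8 + [2018] * 6
)

new_per_year = Counter(first_listed)
years = sorted(new_per_year)
new_counts = [new_per_year[y] for y in years]
totals = list(accumulate(new_counts))  # running total of properties

# A change from positive to negative in the yearly change of new
# properties is a rough hint of where growth starts to slow.
for i, year in enumerate(years):
    change = new_counts[i] - new_counts[i - 1] if i else 0
    print(f"{year}: new={new_counts[i]:>3} total={totals[i]:>3} change={change:+d}")

peak_year = years[new_counts.index(max(new_counts))]
print(f"Year with the most new properties: {peak_year}")
```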
Q. Is the trend the same across regions? Which regions contributed to the rapid growth?
Q. Is the trend the same across neighborhoods? Which neighborhoods contributed to the rapid growth?
A.
* Williamsburg (Light Blue) and Bedford-Stuyvesant (Blue) are the clear top 2. The number of properties started picking up as early as 2011.
* Bedford-Stuyvesant (Blue) caught up with Williamsburg (Light Blue) in 2019.
* The growth of Williamsburg seems to have leveled off after 2016.
Q. Are regions growing rapidly in some particular property types?
A. Apartment has significant growth among all.
To compare means or ratios, we should take into account the variance in the data and the size of the data.
A 95% Confidence Interval gives a range that is likely to contain the 'True Mean'. If we drew many samples and computed a confidence interval from each 'Sample Mean', about 95% of those intervals should include the 'True Mean' of the population.
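The "95% of the intervals include the True Mean" reading can be checked with a small simulation. This is a sketch under stated assumptions: the population is a made-up normal distribution, `mean_confidence_interval` and `coverage` are hypothetical helpers, and the interval uses the normal approximation (for small samples a t-distribution would be more appropriate).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    center = mean(values)
    margin = z * stdev(values) / sqrt(len(values))
    return center - margin, center + margin


def coverage(true_mean, true_sd, sample_size, trials, seed=0):
    """Share of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = mean_confidence_interval(sample)
        hits += low <= true_mean <= high
    return hits / trials


# Hypothetical population: mean income 6500, standard deviation 2000.
share = coverage(true_mean=6500, true_sd=2000, sample_size=50, trials=1000)
print(f"Intervals containing the true mean: {share:.1%}")
```

The printed share should land close to 95%. Larger samples or lower variance narrow each interval, which is why both the variance and the size of the data matter when comparing groups.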
An Error Bar chart is your friend if you want to compare means or ratios together with their confidence intervals.
Comparing the Attrition Rates among Job Roles.
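The numbers behind an Error Bar chart of attrition rates can be sketched as a rate plus a confidence interval per Job Role. The counts below are made up, `rate_interval` is a hypothetical helper, and the interval is the simple normal-approximation (Wald) interval for a proportion, which is an assumption about how the bars might be computed.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical (job_role, employees, leavers) counts; values are made up.
counts = [
    ("Sales Representative", 80, 32),
    ("Research Scientist", 290, 46),
    ("Manager", 100, 5),
]


def rate_interval(leavers, total, level=0.95):
    """Attrition rate with a normal-approximation (Wald) confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = leavers / total
    margin = z * sqrt(rate * (1 - rate) / total)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


intervals = {role: rate_interval(left, n) for role, n, left in counts}
for role, (rate, low, high) in intervals.items():
    print(f"{role:<22} rate={rate:5.1%}  95% CI=({low:5.1%}, {high:5.1%})")
```

When the intervals of two roles do not overlap, the difference in their rates is unlikely to be due to sampling noise alone; overlapping intervals call for more caution.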
Online Seminar - Data Visualization Workshop https://exploratory.io/note/kanaugust/Data-Visualization-Workshop-YAZ6azM0MU