Introduction to Statistics - Brain Mentors
Statistics is one of the major branches of mathematics and one of the biggest reasons behind the success of Data Science, because the methodologies and techniques it provides help Data Scientists in their daily work. Statistical analysis plays a major role in the life cycle of data science.
Definition: Statistics deals with the methods that help us gather, analyze, review, and draw conclusions from data. It is used when the data set is a sample of a larger population: the analyst can then develop interpretations about the population based on statistical outcomes from the sample, such as the mean, median, mode and range.
It comes into play when a user wants to gain insight into data or find hidden patterns, and it is used in almost every field and department, such as:
· Weather reports
· Sports, to show the performance of players and teams
· TV channels, to analyze TRP
· Stock market
· Product-based companies
· Diseases and their impact
From very small to very large, every company needs statistics to evaluate its growth and how its products perform in the market. Let’s see a few examples of statistics:
So, these were a few examples of how statistics is presented to us with the help of graphs, and graphs are one of the best ways to show data to users. Before we talk about statistics, we first need to understand data and its different types.
Data and its types:
Categorical or Qualitative : Categorical data represents categories. The data is divided into two or more categories such as gender, language, caste or religion. Categories can also be coded as numbers (Example : 1 for male and 0 for female), but here the numbers carry no mathematical meaning.
Nominal : Data that has two or more categories without any specific order. Nominal values represent discrete units and are used to label variables that have no quantitative value.
Examples :
· Gender – Male and Female
· Languages – Hindi, English, Chinese
· Exams – Pass, Fail
Ordinal : Data that has two or more categories with a specific order. Ordinal values represent discrete, ordered units, so ordinal data is similar to nominal data except that the order of the categories matters.
Examples :
· Movie Ratings : Flop, Average, Hit, Superhit
· Scale : Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
· Grades : A, B, C, D (letter grades have a natural order, so they are ordinal rather than nominal)
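To make the difference concrete, here is a minimal Python sketch using only the standard library. The sample lists and variable names are made up for illustration. For nominal data we can only count labels, while for ordinal data we can also sort and pick a middle value once the order is stated.

```python
from collections import Counter

# Nominal data: labels with no inherent order, so counting (and the mode) is what makes sense.
languages = ["Hindi", "English", "Hindi", "Chinese", "Hindi", "English"]
language_counts = Counter(languages)
most_common_language = language_counts.most_common(1)[0][0]

# Ordinal data: labels with a meaningful order, which we state explicitly.
rating_order = ["Flop", "Average", "Hit", "Superhit"]
rank = {label: position for position, label in enumerate(rating_order)}

ratings = ["Hit", "Flop", "Superhit", "Average", "Hit"]
sorted_ratings = sorted(ratings, key=rank.get)

# With an odd number of values, the median is the middle item of the sorted list.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(most_common_language)  # Hindi
print(sorted_ratings)        # ['Flop', 'Average', 'Hit', 'Hit', 'Superhit']
print(median_rating)         # Hit
```

Sorting the languages alphabetically would also run, but the resulting order would carry no statistical meaning, which is exactly what separates nominal from ordinal data.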
Numerical or Quantitative : Numerical data has a mathematical meaning and is measured as a quantity. It can be discrete (counts) or continuous (measurements).
Interval : Interval data is measured on a scale whose values are equally spaced, so the difference between two values is meaningful. It is similar to ordinal data, but here the gaps between values are equal and the values can come from any continuous range. An interval scale has no natural zero point.
Examples :
· Temperature in degrees Celsius – the difference between 10 and 20 degrees is the same as the difference between 20 and 30 or between 30 and 40
Ratio : It is interval data with a natural zero point. When the value of a ratio variable is 0.0, it means there is none of that quantity. A temperature of 0 degrees Celsius is a valid reading and does not mean there is no temperature, so temperature in Celsius is interval data and not ratio data. A height of 0 ft, on the other hand, means there is no height at all, so height is ratio data.
Examples :
· Height and weight
· Distance and speed – a value of 0 means no distance or no speed
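A small Python sketch of why the zero point matters. The numbers are illustrative only. On an interval scale such as Celsius, differences are meaningful but ratios are not, whereas on a ratio scale both are.

```python
# Interval scale: Celsius has an arbitrary zero, so differences are meaningful but ratios are not.
temp_a_celsius = 10.0
temp_b_celsius = 20.0
difference = temp_b_celsius - temp_a_celsius   # a meaningful 10-degree difference
naive_ratio = temp_b_celsius / temp_a_celsius  # 2.0, but 20 degrees C is not "twice as hot"

# Converting to Kelvin (a ratio scale with a true zero) shows the actual ratio.
kelvin_ratio = (temp_b_celsius + 273.15) / (temp_a_celsius + 273.15)

# Ratio scale: height has a true zero, so ratios can be read directly.
height_ratio = 6.0 / 3.0  # 6 ft really is twice 3 ft

print(difference, naive_ratio, round(kelvin_ratio, 3), height_ratio)
```

The Kelvin ratio comes out only slightly above 1, which shows that the "2.0" computed from the Celsius values was an artefact of where the scale puts its zero.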
Statistics is also divided into 2 major categories :
· Descriptive Statistics – Presenting, organizing and summarizing data
· Inferential Statistics – Drawing conclusions about a population based on data observed in a sample
Descriptive Statistics
Descriptive Statistics helps us summarize data and find the value that best describes the data set. It also tells us how widely the data is spread around its average (mean) value, and it gives the minimum, maximum and range of the data.
Descriptive Statistics is broken down into :
Measure of Central Tendency (Mean, Median, Mode)
Measure of Variability / Spread (Standard Deviation, Variance, Range), usually discussed together with the shape measures Kurtosis and Skewness
Measure of Central Tendency
Here we describe the whole dataset with a single value that represents the center of its distribution. There are 3 main measures of central tendency : Mean, Median and Mode.
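As a quick illustration, the three measures can be computed with Python's built-in statistics module. The scores list is made-up sample data.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]

mean_score = statistics.mean(scores)      # sum of the values divided by their count: 30 / 6
median_score = statistics.median(scores)  # middle of the sorted data: average of 3 and 5
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)  # 5 4.0 3
```

Note that the three values differ here: the single large score of 10 pulls the mean above the median.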
Most of you are probably already familiar with the simple arithmetic mean, but there are a few more types of mean that you should learn about. Different types of mean :
· Arithmetic Mean
· Weighted Mean
· Geometric Mean
· Harmonic Mean
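The four means listed above can be sketched in Python as follows. This assumes Python 3.8 or later, where statistics.geometric_mean is available. The values, marks and weights are invented for illustration, and the weighted mean is computed by hand.

```python
import statistics

values = [2, 4, 8]

arithmetic_mean = statistics.mean(values)            # (2 + 4 + 8) / 3
geometric_mean = statistics.geometric_mean(values)   # cube root of 2 * 4 * 8
harmonic_mean = statistics.harmonic_mean(values)     # 3 / (1/2 + 1/4 + 1/8)

# Weighted mean: each value counts in proportion to its weight.
marks = [80, 90, 70]
weights = [0.5, 0.3, 0.2]
weighted_mean = sum(m * w for m, w in zip(marks, weights)) / sum(weights)

print(arithmetic_mean, geometric_mean, harmonic_mean, weighted_mean)
```

For these positive values the arithmetic mean is the largest and the harmonic mean the smallest, with the geometric mean in between.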
Relationship between AM, GM and HM
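As a reminder of the standard result, stated here for positive values x_1, ..., x_n (the symbols are our own notation, not taken from the original post): the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, and all three are equal only when every value is the same.

```latex
\[
\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{GM} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}, \qquad
\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
\]
\[
\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM}, \qquad
\text{and for two values } a, b:\quad \mathrm{GM}^2 = \mathrm{AM} \times \mathrm{HM}
\]
```

For example, with a = 2 and b = 8 we get AM = 5, GM = 4 and HM = 3.2, and indeed 4 × 4 = 5 × 3.2.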
Measure of Variability / Spread
A measure of variability describes how spread out a set of data is. A large value of the measure means the data is widely scattered, while a small value means the data is tightly clustered. It tells us how much the data points vary from one another and gives a clear idea of the distribution.
The spread of data is described by a set of descriptive statistics that includes variance, standard deviation, range and interquartile range. The spread can also be shown in graphs such as boxplots, dot plots, and stem and leaf plots. In simple terms, a measure of variability tells us how far the data lies from its center point, or average value.
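The spread statistics mentioned above can be sketched with the statistics module. The data list is invented, and the population versions of variance and standard deviation are used, which is an assumption made for this example. Quartiles can be defined in several slightly different ways, so the interquartile range may differ a little between tools.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)      # distance between the largest and smallest value
variance = statistics.pvariance(data)   # average squared distance from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

# Quartiles split the sorted data into four parts; the IQR is the width of the middle half.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, variance, std_dev)  # 7 4 2.0
print(q1, q2, q3, iqr)
```

The mean of this data is 5, and the standard deviation of 2.0 says that values typically sit about 2 units away from it.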
Note : We are not going into depth on measures of central tendency or measures of spread here. There will soon be a separate blog on these topics. This blog is only an introduction to statistics.
Probability Distributions
You have probably heard the term probability many times before and may have studied it in school or college. Common examples used when learning probability are the probability of getting a head or a tail when we toss a coin, or the probability of getting a 6 when we roll a die.
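These classic examples can be worked out by counting outcomes, assuming a fair coin and a fair six-sided die so that every outcome is equally likely. The helper function below is a hypothetical name introduced just for this sketch.

```python
from fractions import Fraction

coin_outcomes = ["Head", "Tail"]
die_outcomes = [1, 2, 3, 4, 5, 6]

def probability(event, outcomes):
    """Probability of an event when every outcome is equally likely."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_head = probability(lambda o: o == "Head", coin_outcomes)
p_six = probability(lambda o: o == 6, die_outcomes)
p_even = probability(lambda o: o % 2 == 0, die_outcomes)

print(p_head, p_six, p_even)  # 1/2 1/6 1/2
```

Using Fraction keeps the results exact, so 1/6 is not rounded to a decimal.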
Definition : Probability distributions are mathematical functions that give the probabilities of the various possible outcomes of an experiment. There are different types of probability distributions, such as :
• Bernoulli Distribution
• Uniform Distribution
• Binomial Distribution
• Normal Distribution
• Poisson Distribution
• Exponential Distribution
• T-Distribution
• Chi-Squared Distribution
I have listed only a few of the most popular ones. There are several more types of distributions.
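To give a feel for what "a function that gives the probability of each outcome" looks like, here is a minimal sketch of the binomial distribution for 10 tosses of a fair coin. The function name and parameters are our own. The Bernoulli distribution is the special case of a single trial.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses = 10
p_head = 0.5

# One probability for each possible number of heads, from 0 to 10.
distribution = [binomial_pmf(k, n_tosses, p_head) for k in range(n_tosses + 1)]

# Bernoulli distribution: a binomial distribution with a single trial.
bernoulli = [binomial_pmf(k, 1, p_head) for k in (0, 1)]

print(distribution[5])    # probability of exactly 5 heads: 252 / 1024
print(sum(distribution))  # the probabilities of all outcomes add up to 1
print(bernoulli)          # [0.5, 0.5]
```

The list is symmetric around 5 heads, which is the most likely outcome for a fair coin.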
Note : We are not going into the details of these distributions right now, because each distribution needs a separate blog. In upcoming blogs we will look at these distributions one by one. This blog is only an introduction to statistics.
Inferential Statistics
Inferential Statistics is used to draw conclusions from data. Generally, we take a random sample from the population and use it to describe and make inferences about the whole population.
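The idea of learning about a population from a random sample can be sketched as follows. The population here is simply the numbers 1 to 10,000, and the seed and sample size are arbitrary choices made so that the example is repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is repeatable

# A made-up population whose true mean we happen to know.
population = list(range(1, 10001))
population_mean = statistics.mean(population)

# Draw a random sample without replacement and use its mean as an estimate.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

print(population_mean)  # 5000.5
print(sample_mean)      # an estimate that should land close to the population mean
```

In real work the population mean is unknown, and the sample mean is the best available estimate of it.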
Inferential statistics is used a lot in the field of data analysis. We conduct different types of tests on random samples from a given set of data to learn about the effect of a product. Inferential statistics uses statistical models to help you compare your sample data to other samples or to previous research. Most research uses statistical models that belong to the General Linear Model, which includes :
· Student’s t-test
· ANOVA (Analysis of variance)
· Regression Analysis
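As a rough illustration of the first item, the sketch below computes a two-sample t statistic by hand in Welch's form, which does not assume equal variances. The two groups are invented data and welch_t is a hypothetical helper name. In practice a library such as SciPy is normally used, since it also reports the p-value.

```python
import math
import statistics

# Made-up measurements for two groups we want to compare.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.6]

def welch_t(x, y):
    """Two-sample t statistic: difference in means divided by its standard error."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)  # sample variances
    standard_error = math.sqrt(var_x / len(x) + var_y / len(y))
    return (mean_x - mean_y) / standard_error

t_statistic = welch_t(group_a, group_b)
print(round(t_statistic, 2))
```

A t statistic far from 0 suggests that the difference between the group means is large compared to the variation within the groups. Deciding whether it is significant requires comparing it with a t-distribution, which is left out of this sketch.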
Inferential Statistics includes :
· Hypothesis Testing
· Binomial Theorem
· Normal Distributions
· T-Distributions
· Central Limit Theorem
· Confidence Intervals
· Regression Analysis / Linear Regression
· Comparison of Mean
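To make one of these topics concrete, here is a minimal sketch of an approximate 95% confidence interval for a mean. The sample values are invented, and the normal approximation is an assumption. For a sample this small, a t-distribution would usually be preferred.

```python
import math
import statistics

# Made-up sample of 10 measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# z value that leaves 2.5% in each tail of the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(round(sample_mean, 2))            # 12.05
print(round(lower, 2), round(upper, 2))
```

The interval is centered on the sample mean, and it gets narrower as the sample grows or as the data becomes less spread out.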
So this was an introduction to statistics and its different types.