T Test
Today, Im exploring T-test and its significance. T test is a type of hypothesis testing and its a very important tool in data science. A hypothesis is any testable assupmtion about the data set and hypothesis testing allows us to validate these assumptions
T-test is predominantly used to understand whether the differrence in means of two datasets have any statistical significance. For T test to provide any meaningful insights, the datasets has to satisfy the following conditions
-
-
- The data sets must be normally distributed, i.e, the shape must resemble a bell curve to an extent
- are independent and continuous, i.e., the measurement scale for data should follow a continuous pattern.
- Variance of data in both sample groups is similar, i.e., samples have almost equal standard deviation
-
Hypotheses:
-
- H0: There is a significant differrence between the means of the data sets
- H1: There is no significant differrence between the means of the data sets
T Test code
Results:
Reject the null hypothesis. There is a significant difference between the datasets. T-statistic: -8.586734600367794 P-value: 1.960253729590773e-17
Note:
This T-test does not provide any meaningful insights as two of the requisite conditions are violates
-
- The datasets are not normally distributes
- the variances of the datasets are not quite similar