Comparative Statistics


 


The main goal of this report is to determine whether the crime statistics of Texas are related to the crime statistics of the entire United States, and to measure the difference between the two sets of figures. The statistical methods used are the Paired Samples T-Test and correlation analysis. The strength of the linear association between two variables is quantified by the correlation coefficient.


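Given a set of observations (x1, y1), (x2, y2), …, (xn, yn), the formula for computing the correlation coefficient is given by:

\[ r_{xy} = \frac{N \sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}} \]

            Where:

$r_{xy}$ = Correlation between X and Y

$\sum X$ = Sum of Variable X

$\sum Y$ = Sum of Variable Y

$\sum XY$ = Sum of the product X and Y

$N$ = Number of Cases

$\sum X^2$ = Sum of squared X score

$\sum Y^2$ = Sum of squared Y score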
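As a minimal sketch of the raw-sums formula, the Python function below computes the correlation coefficient from the five sums and N. The function name `pearson_r` is hypothetical, and treating the nine offense rows of Table 1 (Violent through Vehicle Theft) as the paired observations is an assumption, since the report does not state which rows were used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the raw-sums (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 cases")
    n = len(xs)                                   # N: number of cases
    sum_x = sum(xs)                               # sum of X
    sum_y = sum(ys)                               # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products XY
    sum_x2 = sum(x * x for x in xs)               # sum of squared X
    sum_y2 = sum(y * y for y in ys)               # sum of squared Y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Assumed observations: the nine offense rows of Table 1, in table order.
texas = [121091, 990293, 1407, 8511, 35790, 75383, 219828, 677042, 93423]
united_states = [1390695, 10166159, 16692, 93934, 417122,
                 862947, 2154126, 6776807, 1235226]

r = pearson_r(texas, united_states)
print(f"r = {r:.6f}")
```

Under that assumption the result can be compared with the correlation reported in Table 2.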


 


Furthermore, the correlation coefficient always takes a value between -1 and 1, with 1 or -1 indicating perfect correlation (all points would lie along a straight line in this case). A positive correlation indicates a positive association between the variables (increasing values in one variable correspond to increasing values in the other variable), while a negative correlation indicates a negative association between the variables (increasing values in one variable correspond to decreasing values in the other variable). A correlation value close to 0 indicates no linear association between the variables.


Since the formula for calculating the correlation coefficient standardizes the variables, changes in scale or units of measurement will not affect its value. For this reason, the correlation coefficient is often more useful than a graphical depiction in determining the strength of the association between two variables.
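The scale-invariance claim can be checked with a short Python sketch. Here the Table 1 counts are converted to rates per 100,000 population, which is a pure change of units, and the correlation is computed both ways. The helper name `pearson_r` and the choice of the nine offense rows as observations are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation in deviation-from-mean form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Assumed observations: the nine offense rows of Table 1 (raw counts).
texas_counts = [121091, 990293, 1407, 8511, 35790, 75383, 219828, 677042, 93423]
us_counts = [1390695, 10166159, 16692, 93934, 417122,
             862947, 2154126, 6776807, 1235226]

# Populations from Table 1, used only to change the unit of measurement.
TEXAS_POPULATION = 22_859_968
US_POPULATION = 296_410_404

texas_rates = [c / TEXAS_POPULATION * 100_000 for c in texas_counts]
us_rates = [c / US_POPULATION * 100_000 for c in us_counts]

r_counts = pearson_r(texas_counts, us_counts)
r_rates = pearson_r(texas_rates, us_rates)
print(r_counts, r_rates)  # the two values agree up to floating-point error
```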


In addition, if the computed rxy is not a perfect correlation, the following categorization is suggested (Guilford, J.P. and B. Fruchter, 1973):

rxy                              Indication

between ± 0.80 to ± 1.00    :    High Correlation
between ± 0.60 to ± 0.79    :    Moderately High Correlation
between ± 0.40 to ± 0.59    :    Moderate Correlation
between ± 0.20 to ± 0.39    :    Low Correlation
between ± 0.01 to ± 0.19    :    Negligible Correlation
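The categorization above can be sketched as a small lookup in Python. The function name `guilford_category` is hypothetical. Two choices are assumptions not found in the source: the magnitude is rounded to two decimals so that values falling between the listed bands (such as 0.795) land in one of them, and magnitudes below 0.01 are labeled "No Correlation".

```python
def guilford_category(r: float) -> str:
    """Map a correlation coefficient to the descriptive bands listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Assumption: round to two decimals, matching the precision of the bands.
    magnitude = round(abs(r), 2)
    if magnitude >= 0.80:
        return "High Correlation"
    if magnitude >= 0.60:
        return "Moderately High Correlation"
    if magnitude >= 0.40:
        return "Moderate Correlation"
    if magnitude >= 0.20:
        return "Low Correlation"
    if magnitude >= 0.01:
        return "Negligible Correlation"
    # Assumption: the source gives no label below 0.01.
    return "No Correlation"


print(guilford_category(0.999430))  # correlation reported in Table 2
```

The sign of r is ignored because the bands are stated for both positive and negative values.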


 


The Paired Samples T-Test is used to compare the means of two variables (here, the crime statistics of Texas and of the United States). It calculates the difference between the two variables for each case and tests whether the average difference is significantly different from zero.
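The procedure just described can be sketched in Python as follows. This is the textbook paired t statistic (mean difference divided by its standard error, with n - 1 degrees of freedom), not necessarily the exact routine used to produce the report's figures. The function name `paired_t` is hypothetical, and the nine offense rows of Table 1 are assumed to be the paired cases.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(xs, ys):
    """Return (t, degrees of freedom, mean difference) for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 cases")
    diffs = [y - x for x, y in zip(xs, ys)]   # per-case difference
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)                    # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / sqrt(n))  # mean difference over its standard error
    return t_stat, n - 1, mean_diff


# Assumed paired cases: the nine offense rows of Table 1 (Texas, United States).
texas = [121091, 990293, 1407, 8511, 35790, 75383, 219828, 677042, 93423]
united_states = [1390695, 10166159, 16692, 93934, 417122,
                 862947, 2154126, 6776807, 1235226]

t_stat, df, mean_diff = paired_t(texas, united_states)
print(f"t = {t_stat:.3f}, df = {df}, mean difference = {mean_diff:.1f}")
```

The null hypothesis of a zero average difference is rejected when the absolute t value exceeds the critical value for the given degrees of freedom.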


The following table shows the crime statistics of Texas and the United States in 2005.

Table 1

Crime Statistics 2005

                      Texas          United States
Population            22,859,968     296,410,404
Index                 1,111,384      11,556,854
Violent               121,091        1,390,695
Property              990,293        10,166,159
Murder                1,407          16,692
Forcible Rape         8,511          93,934
Robbery               35,790         417,122
Aggravated Assault    75,383         862,947
Burglary              219,828        2,154,126
Larceny-Theft         677,042        6,776,807
Vehicle Theft         93,423         1,235,226

 


Table 2

Data Analysis

Statistic                        Texas Crime Statistics    United States
Mean                             246974.222222             2.56819e+6
Variance                         1.07650e+11               1.09937e+13
Standard Deviation               328101.080628             3.31568e+6
Correlation                      0.999430
T-Test                           78.312437
Critical 2-sided T-value (5%)    2.365000
2-sided p-value                  0.000000
Critical 1-sided T-value (5%)    1.895000
1-sided p-value                  0.000000
Degrees of Freedom               7
Observations                     9
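The summary figures in Table 2 can be approximately reproduced with the Python sketch below. It assumes the nine observations are the offense rows of Table 1 from Violent through Vehicle Theft (excluding Population and Index), and that the variance is the population variance (divided by N); both assumptions are inferred from the reported numbers rather than stated in the source. The reported t value and its 7 degrees of freedom appear to match the usual test statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), which is what `correlation_t` computes. All function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev, pvariance

# Assumed observations: Violent through Vehicle Theft from Table 1.
texas = [121091, 990293, 1407, 8511, 35790, 75383, 219828, 677042, 93423]
united_states = [1390695, 10166159, 16692, 93934, 417122,
                 862947, 2154126, 6776807, 1235226]


def summarize(values):
    """Mean, population variance, and population standard deviation."""
    return {
        "mean": fmean(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation in deviation-from-mean form."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


tx = summarize(texas)
us = summarize(united_states)
r = pearson_r(texas, united_states)
t_stat = correlation_t(r, len(texas))

print(tx)
print(us)
print(f"r = {r:.6f}, t = {t_stat:.3f}, df = {len(texas) - 2}")
```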


 


 


 


Figure 1 (image not reproduced)


The correlations table displays Pearson correlation coefficients, significance values, and the number of cases with non-missing values. Significance tests for the Pearson correlation coefficient assume the data are normally distributed. The Pearson correlation coefficient is a measure of linear association between two variables.


Basically, the values of the correlation coefficient range from -1 to 1. The sign of the correlation coefficient indicates the direction of the relationship (positive or negative). The absolute value of the correlation coefficient indicates the strength, with larger absolute values indicating stronger relationships. The correlation coefficients on the main diagonal are always 1.0, because each variable has a perfect positive linear relationship with itself. Correlations above the main diagonal are a mirror image of those below.


Analysis shows that the crime statistics of Texas and the United States have a strong positive correlation. This means that crime figures for Texas move closely together with those of the entire United States, although correlation alone does not establish that one affects the other. For the Paired Samples T-Test, the t value, degrees of freedom, and significance of the data are computed.


Correlation                      0.999430
Critical 2-sided T-value (5%)    2.365000
2-sided p-value                  0.000000
Critical 1-sided T-value (5%)    1.895000
1-sided p-value                  0.000000
Degrees of Freedom               7


 


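The computed t value is 78.312437, which exceeds the critical two-sided value of 2.365

We have 7 degrees of freedom

The correlation is 0.999430 and the two-sided p-value is reported as 0.000000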


 


The significance of each correlation coefficient is also displayed in the correlation table. The significance level (or p-value) is the probability of obtaining results as extreme as the one observed if there were in fact no linear relationship. If the significance level is very small (less than 0.05) then the correlation is significant and the two variables are taken to be linearly related. If the significance level is relatively large, for example 0.50, then the correlation is not significant, and there is no evidence that the two variables are linearly related.


The correlation coefficient for Texas (independent) and United States (dependent) is 0.999. The p-value is reported as 0.000, which is well below the 0.05 threshold. This low p-value indicates that the crime statistics of Texas (independent) and the crime statistics of the United States (dependent) are significantly positively correlated.


 



Credit:ivythesis.typepad.com

