Writing Assignment 2A (Data Scientist at Amazon)

Due Date: Thursday, May 13

You are currently interviewing for the job of Data Scientist at Amazon. As part of the interview process, Amazon has asked you to analyze the data set Crime Multivariable Data Set.xlsx, which contains crime statistics from 50 American cities, and to present your findings in the form of a statistics report. The data set contains seven columns labeled X1 through X7; each row holds the data for one city. The source of the data is the book Life in America’s Cities by G.S. Thomas. Here is a description of each column:

(1) X1 = total  overall reported  crime rate  per 1 million residents

(2) X2 = reported  violent crime rate  per 100,000 residents

(3) X3 = annual  police funding in $/resident

(4) X4 = % of people 25 years + with 4 years of high school

(5) X5 = % of 16 to 19 year olds not in high school and not high school graduates

(6) X6 = % of 18 to 24 year olds in college

(7) X7 = % of people 25 years + with at least 4 years of college

In this project, Amazon would like you to treat  X3 through X7 as independent variables (i.e. as x-variables)  and X1 and X2 as dependent variables (i.e.  as y-variables).  Amazon would like you to study the effects of X3 through  X7 on X1 and X2 respectively.  Here is the format for your report:

1. Introduction: Here you should describe the problem you are investigating and give a detailed description of the data set you are studying (in this case, crime statistics from 50 American cities). Specifically, state the independent variables in this data set and the dependent variables that you are trying to explain or predict. In this section, formulate hypotheses (or educated guesses) on the impact of X3 through X7 on X1 and X2 respectively. For example, do you expect a positive or negative correlation between (a) X3 and X1, (b) X3 and X2, (c) X4 and X1, (d) X4 and X2, and so on? Also, which three pairs of variables do you expect to have the strongest correlation? Explain your reasoning.

In formulating your hypotheses, discuss what the literature has to say on the relationship between (1) crime and police funding and (2) crime and education. Use a minimum of 4 references. The references may be books or journal articles.

2. Analysis and Results: As the name indicates, this is the section where you will do your analysis and report your results. Specifically, calculate the correlation coefficients between (a) X3 and X1, (b) X3 and X2, (c) X4 and X1, (d) X4 and X2, and so on. Organize your results in the form of a table. Which three pairs of variables displayed the strongest correlations? Do your results agree with your hypotheses? Propose an explanation for the results that you obtain. For each of the three pairs of variables with the strongest correlations, compute the equation of the regression line and plot the scatterplot together with the regression line on one graph. (Hence, your report will contain three graphs in total.) In addition, compute the relative squared error for each regression line. The relative squared error is defined as follows:

where ŷ is the estimate given by your regression line. Which of the three regression lines had the smallest relative squared error? Discuss.
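The calculations requested in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not a required method. The function names are hypothetical, and the numbers in `toy` are made-up placeholders, not values from the crime data set. Because the formula for the relative squared error is not reproduced above, the sketch assumes one common definition, the sum of (ŷ - y)^2 divided by the sum of (ȳ - y)^2, where ȳ is the mean of the observed y values. Check this against the definition given in class before relying on it.

```python
import math

def _centered_sums(xs, ys):
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def relative_squared_error(xs, ys, slope, intercept):
    """ASSUMED definition: sum((y_hat - y)^2) / sum((y_bar - y)^2)."""
    my = sum(ys) / len(ys)
    residual = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    baseline = sum((my - y) ** 2 for y in ys)
    return residual / baseline

# Made-up placeholder values, NOT taken from the crime data set.
toy = {
    "X3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X4": [5.0, 3.0, 4.0, 1.0, 2.0],
    "X1": [2.0, 4.0, 5.0, 4.0, 5.0],
}

# One row of the results table per (x-variable, y-variable) pair.
for x_name in ("X3", "X4"):
    xs, ys = toy[x_name], toy["X1"]
    r = pearson_r(xs, ys)
    slope, intercept = fit_line(xs, ys)
    rse = relative_squared_error(xs, ys, slope, intercept)
    print(f"{x_name} vs X1: r = {r:+.3f}, "
          f"y = {slope:.3f}x + {intercept:.3f}, RSE = {rse:.3f}")
```

With the real data, the same loop would run over X3 through X7 against both X1 and X2. Under the assumed definition, the relative squared error of a least-squares line equals 1 - r^2, which is a useful check on your numbers.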

3. Conclusions:  Here you will give a summary  of your  results  and  discuss possible directions for future  research.

4. References: Here you will list any references that  you used.  Here is one commonly used format for citing research papers:

E. Parzen, “Maximum Entropy Interpretation of Autoregressive Spectral Densities”, Statistics and Probability, 1, pp. 2-6 (1982).

Here is one format for citing books:

M. B. Priestley, Spectral Analysis and Time Series, Academic Press (1981).

Alternatively, you may use the APA format for citing references.

Note:  If you use any books or articles  for your report,  do NOT plagiarize!  Plagiarism  is easily detected  using software.  Explain everything  in your own words.