Economics 140 Fall 2015 Problem Set #4

| November 24, 2016

California State University, Sacramento Fall 2015
Quantitative Economic Analysis Economics 140
Problem Set #4
Section 2 Due in hard copy at beginning of lecture on Monday, 10/26/15
Section 3 Due in hard copy at beginning of lecture on Tuesday, 10/27/15
Please follow all the instructions, or your problem set will receive no points. Perform all of the steps
given below. You will only turn in the answer to Part c. It is to be typed in 12 points Times New Roman
font. Problem sets are due at the beginning of lecture and will not be accepted late. I strongly encourage
you to form study groups and work together on this problem set. Many of you will get frustrated as what
I am asking you to do is tedious. (It took me two attempts to successfully download and clean the data
myself.) This is the nature of dealing with data though. For this problem set, I will grade everyone’s
problem set based solely on the summary statistics. (I provide one check-value below.) If you need help,
you will need to come to my office hours as I will not attempt to answer questions by email. While you
are free to work together, realize that if you just blindly copy someone else’s work and they are wrong,
you get no points. Also, you will only be cheating yourself if you free-ride. You need to be able to
handle data to complete your project in this class and in Econ 145.
Problem sets should NOT be left in my Economics Department mailbox or under my office door. I
am not responsible for problem sets turned in any way other than to me at the beginning of lecture.
No electronic submissions will be accepted.
I will discuss this problem set briefly in class on Wednesday and Thursday. Yiou should go ahead
and start as soon as possible though!
1. Constructing a Data Set
For this assignment, you will download a data set from the University of Michigan Panel Survey Of
Income Dynamics. Note that during this process you will need to create an account. To access the PSID,
go to .isr.umich.edu/”>http://psidonline.isr.umich.edu/. (If you are using Windows Explorer, make sure you are
compatibility mode.) From the home page you will select Data, and then Data Center under the section
labeled Public Use. Once you get to the Data Center, select File. From this page, expand PSID FamilyLevel
and PSID Main Family Data. Then expand the 2005 survey from which you will select your
variables. Following the instructions given on the screen, select the following variables.
ER25017 AGE OF HEAD
ER25018 SEX OF HEAD
ER25020 # CHILDREN IN FU
ER25023 HEAD MARITAL STATUS
ER25128 BC21 MAIN IND FOR JOB 1: 2000 CODE (HD)
ER25160 BC41 YRS PRES EMP (H-E)
ER25161 BC41 MOS PRES EMP (H-E)
ER25910 G13 WAGES/SALARY OF HEAD
ER27393 L40 RACE OF HEAD-MENTION 1
ER27418 L55 HGHST COLLEGE DEGREE RECD-HD
ER27931 LABOR INCOME OF HEAD-2004
ER28003 HEAD WAGE RATE-2006
ER28047 COMPLETED ED-HD
After you add the data to your cart, go to your cart and complete the check-out process. Select PDF for
the Code Book type and Microsoft Excel Spreadsheet for the data output type. Save both in a safe place.
The first thing you must do after downloading your data is to clean it. It will be easiest to do this by using
the filter feature in Excel and deleting rows as appropriate. A separate handout will be posted for those of
you who do not know how to use this.
a. We need to eliminate from our sample each household for which we are missing data. Typically,
this will be a case where a respondent has refused to answer or does not know the answer.
Eliminate observations as indicated in the table below:
Variable Eliminate Observations with Code
AGE OF HEAD 999
HEAD MARITAL STATUS 8, 9
BC21 MAIN IND FOR JOB 1: 2000 CODE (HD) 0, 999
BC41 YRS PRES EMP (H-E) 98, 99*
BC41 MOS PRES EMP (H-E) 98, 99
*
G13 WAGES/SALARY OF HEAD 0, 9,999,998, 9,999,999
L40 RACE OF HEAD-MENTION 1 0, 9
L55 HGHST COLLEGE DEGREE RECD-HD 8, 97, 98, 99
LABOR INCOME OF HEAD-2004 0
HEAD WAGE RATE-2004 0
COMPLETED ED-HD 99
*For these two variables you will need to filter twice so as to delete all observations for which both
variables take the value of zero. These will be the people we will consider to have no time on their
current job and hence not working. Note here that for the sake of simplicity we are assuming away
some of the complexity of the actual data set. It is not strictly correct to eliminate some of these
people from our data set. Doing so will not significantly affect our results though.
Once you correctly complete this process you will have 4,256 observations. Before you do anything
else, save your data set with a unique name. You should always save this version as it is. Save
the version you create in Part b under a different name.
b. A number of these variables are categorical variables. Convert the following into dummy
variables as instructed:
i. “Head Marital Status” into a single binary variable of married = 1 for married and zero
otherwise. Not married is the benchmark.
ii. “Race of Head-Mention 1” into a single binary variable of white or non-white with white as
the benchmark. White=1 if race is white and zero otherwise.
iii. “Sex of Head” into a single binary variable of female=1 for female and zero for male. In
other words male is the benchmark.
iii. “Highest College Degree Recd-HD” into dummy variables for Associate Degree, Bachelor’s
Degree and Graduate Degree using no degree (code 0) as your benchmark.
iv. Convert the categorical variable INDUSTRY into a series of dummy variables as indicated in
the table below.
Note that for convenience you are creating a dummy variable for every category. Clearly, you can’t
include all of them in a regression as they will then be perfectly multi-collinear. Whichever variable we
leave out of a regression will be the benchmark for that particular regression.
c. You will also need to combine years and months of employment. To do this, you will need to
divide months of employment by 12 and add the resulting decimal number to years of
employment. You can complete this step either in Excel or in Eviews. Note that PSID also
includes length of employment in days. To get an accurate number for length of employment we
would combine years, months and days. For simplicity, we are only using years and months.
Label this new variable “Tenure”
d. Name your variables in a logical and intuitive manner (you are welcome to use my names, but are
not required to) and compute summary statistics for each. Report only the mean, median and
standard deviation for each variable. Report the sample size for your overall data set. For each of
your dummy variable, interpret the mean. What is it telling you? The table below provides a
check value for the mean some variables.
Variable Mean Variable Mean Variable Mean
AGE 41.73167 INFR WHITE
FEMALE PFSRVCS ASSOC
CHILDREN HSPENT BACHELORS
MARRIED OSRVCS GRADUATE
RUC PUBADMIN LABOR INCOME OF HD 44,256.72
MFCTR TENURE 8.2883 WAGE_RATE
TRDTRSP 0.20113 WAGES YEARS_EDUCATION 13.32
Save all three versions of your data set. You should have the original data
from PSID, the data set that you have cleaned up in part a, and the data set
with all the dummy variables as separate files.
All that you need to turn in is a single clean table showing the mean of all your
variables and the standard deviation for all variables except the dummy
variables.
Dummy Variable Category Variable Name Codes in Original Data
Resources, Utilities and Construction RUC 17-77
Manufacturing MFCTR 107-399
Wholesale and Retail Trade, Transport and Warehousing TRDTRSP 407-639
Information, Finance and Real Estate INFR 647-719
Professional Services PFSRVCS 727-847
Hospitality and Entertainment HSPENT 856-869
Other Services OSRVCS 877-929
Public Admin and Military PUBADMIN 937-987

Get a 30 % discount on an order above $ 5
Use the following coupon code:
CHRISTMAS
Order your essay today and save 30% with the discount code: CHRISTMASOrder Now
Positive SSL