
Department of Economics
California State University, Sacramento
Quantitative Economic Analysis

M. Dowell
Fall 2015
Economics 140
Problem Set #4

Section 2 Due in hard copy at beginning of lecture on Monday, 10/26/15
Section 3 Due in hard copy at beginning of lecture on Tuesday, 10/27/15
Please follow all the instructions, or your problem set will receive no points. Perform all of the steps
given below. You will only turn in the answer to Part d. It is to be typed in 12-point Times New Roman
font. Problem sets are due at the beginning of lecture and will not be accepted late. I strongly encourage
you to form study groups and work together on this problem set. Many of you will get frustrated as what
I am asking you to do is tedious. (It took me two attempts to successfully download and clean the data
myself.) This is the nature of dealing with data though. For this problem set, I will grade everyone’s
problem set based solely on the summary statistics. (I provide one check-value below.) If you need help,
you will need to come to my office hours as I will not attempt to answer questions by email. While you
are free to work together, realize that if you just blindly copy someone else’s work and they are wrong,
you get no points. Also, you will only be cheating yourself if you free-ride. You need to be able to
handle data to complete your project in this class and in Econ 145.
Problem sets should NOT be left in my Economics Department mailbox or under my office door. I
am not responsible for problem sets turned in any way other than to me at the beginning of lecture.
No electronic submissions will be accepted.
I will discuss this problem set briefly in class on Wednesday and Thursday. You should go ahead
and start as soon as possible though!

1. Constructing a Data Set
For this assignment, you will download a data set from the University of Michigan Panel Study of
Income Dynamics (PSID). Note that during this process you will need to create an account. To access the PSID,
go to http://psidonline.isr.umich.edu/. (If you are using Internet Explorer, make sure you are in
compatibility mode.) From the home page, select Data, and then Data Center under the section
labeled Public Use. Once you get to the Data Center, select File. From this page, expand PSID Family-Level
and PSID Main Family Data. Then expand the 2005 survey, from which you will select your
variables. Following the instructions given on the screen, select the following variables.
Variable    Label
ER25017     AGE OF HEAD
ER25018     SEX OF HEAD
ER25020     # CHILDREN IN FU
ER25023     HEAD MARITAL STATUS
ER25128     BC21 MAIN IND FOR JOB 1: 2000 CODE (HD)
ER25160     BC41 YRS PRES EMP (H-E)
ER25161     BC41 MOS PRES EMP (H-E)
ER25910     G13 WAGES/SALARY OF HEAD
ER27393     L40 RACE OF HEAD-MENTION 1
ER27418     L55 HGHST COLLEGE DEGREE RECD-HD
ER27931     LABOR INCOME OF HEAD-2004
ER28003     HEAD WAGE RATE-2004
ER28047     COMPLETED ED-HD

After you add the data to your cart, go to your cart and complete the check-out process. Select PDF for
the Code Book type and Microsoft Excel Spreadsheet for the data output type. Save both in a safe place.

The first thing you must do after downloading your data is to clean it. It will be easiest to do this by using
the filter feature in Excel and deleting rows as appropriate. A separate handout will be posted for those of
you who do not know how to use this.
a. We need to eliminate from our sample each household for which we are missing data. Typically,
this will be a case where a respondent has refused to answer or does not know the answer.
Eliminate observations as indicated in the table below:
Variable                                   Eliminate Observations with Code
AGE OF HEAD                                999
HEAD MARITAL STATUS                        8, 9
BC21 MAIN IND FOR JOB 1: 2000 CODE (HD)    0, 999
BC41 YRS PRES EMP (H-E)                    98, 99*
BC41 MOS PRES EMP (H-E)                    98, 99*
G13 WAGES/SALARY OF HEAD                   0, 9,999,998, 9,999,999
L40 RACE OF HEAD-MENTION 1                 0, 9
L55 HGHST COLLEGE DEGREE RECD-HD           8, 97, 98, 99
LABOR INCOME OF HEAD-2004                  0
HEAD WAGE RATE-2004                        0
COMPLETED ED-HD                            99

* For these two variables you will need to filter twice so as to delete all observations for which both
variables take the value of zero. These will be the people we will consider to have no time on their
current job and hence not working. Note here that for the sake of simplicity we are assuming away
some of the complexity of the actual data set. It is not strictly correct to eliminate some of these
people from our data set. Doing so will not significantly affect our results though.
Once you correctly complete this process you will have 4,256 observations. Before you do anything
else, save your data set with a unique name. You should always save this version as it is. Save
the version you create in Part b under a different name.
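
If you would like to double-check your Excel filtering, the elimination rules in Part a can be sketched in a few lines of Python. This is only a rough illustration, not the required method. It assumes each observation has already been read into a dictionary keyed by the PSID variable codes listed earlier, with numeric values; the example rows are made up and are not actual PSID data.

```python
# Codes to eliminate for each variable, copied from the table in Part a.
MISSING_CODES = {
    "ER25017": {999},                  # AGE OF HEAD
    "ER25023": {8, 9},                 # HEAD MARITAL STATUS
    "ER25128": {0, 999},               # BC21 MAIN IND FOR JOB 1: 2000 CODE (HD)
    "ER25160": {98, 99},               # BC41 YRS PRES EMP (H-E)
    "ER25161": {98, 99},               # BC41 MOS PRES EMP (H-E)
    "ER25910": {0, 9999998, 9999999},  # G13 WAGES/SALARY OF HEAD
    "ER27393": {0, 9},                 # L40 RACE OF HEAD-MENTION 1
    "ER27418": {8, 97, 98, 99},        # L55 HGHST COLLEGE DEGREE RECD-HD
    "ER27931": {0},                    # LABOR INCOME OF HEAD-2004
    "ER28003": {0},                    # HEAD WAGE RATE-2004
    "ER28047": {99},                   # COMPLETED ED-HD
}


def keep_row(row):
    """Return True if the observation survives every rule in Part a."""
    for column, codes in MISSING_CODES.items():
        if row[column] in codes:
            return False
    # Footnote rule: zero years AND zero months means no time on the current job.
    if row["ER25160"] == 0 and row["ER25161"] == 0:
        return False
    return True


def clean(rows):
    """Keep only the observations that pass keep_row."""
    return [row for row in rows if keep_row(row)]


# Made-up rows for illustration only.
example_row = {
    "ER25017": 40, "ER25018": 1, "ER25020": 2, "ER25023": 1,
    "ER25128": 770, "ER25160": 5, "ER25161": 6, "ER25910": 50000,
    "ER27393": 1, "ER27418": 0, "ER27931": 50000, "ER28003": 24.5,
    "ER28047": 12,
}
sample = [
    example_row,
    {**example_row, "ER25017": 999},               # age missing
    {**example_row, "ER25160": 0, "ER25161": 0},   # no time on current job
]
print(len(clean(sample)), "of", len(sample), "rows kept")
```

Applied to the full download, the number of rows kept should match the observation count given above if the rules were applied correctly.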
b. A number of these variables are categorical variables. Convert the following into dummy
variables as instructed:
i. “Head Marital Status” into a single binary variable of married = 1 for married and zero
otherwise. Not married is the benchmark.
ii. “Race of Head-Mention 1” into a single binary variable of white or non-white with non-white as
the benchmark. White=1 if race is white and zero otherwise.
iii. “Sex of Head” into a single binary variable of female=1 for female and zero for male. In
other words male is the benchmark.
iv. “Highest College Degree Recd-HD” into dummy variables for Associate Degree, Bachelor’s
Degree and Graduate Degree using no degree (code 0) as your benchmark.
v. Convert the categorical variable INDUSTRY into a series of dummy variables as indicated in
the table below.
Dummy Variable Category                                  Variable Name    Codes in Original Data
Resources, Utilities and Construction                    RUC              17-77
Manufacturing                                            MFCTR            107-399
Wholesale and Retail Trade, Transport and Warehousing    TRDTRSP          407-639
Information, Finance and Real Estate                     INFR             647-719
Professional Services                                    PFSRVCS          727-847
Hospitality and Entertainment                            HSPENT           856-869
Other Services                                           OSRVCS           877-929
Public Admin and Military                                PUBADMIN         937-987

Note that for convenience you are creating a dummy variable for every category. Clearly, you can’t
include all of them in a regression as they will then be perfectly multi-collinear. Whichever variable we
leave out of a regression will be the benchmark for that particular regression.
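
The industry dummies can be sketched in Python as follows. This is an illustration under assumptions, not the required approach: the code ranges and variable names come from the table above, while the function name industry_dummies and the example code 770 are made up for the example.

```python
# Inclusive code ranges for each industry dummy, copied from the table above.
INDUSTRY_RANGES = {
    "RUC": (17, 77),
    "MFCTR": (107, 399),
    "TRDTRSP": (407, 639),
    "INFR": (647, 719),
    "PFSRVCS": (727, 847),
    "HSPENT": (856, 869),
    "OSRVCS": (877, 929),
    "PUBADMIN": (937, 987),
}


def industry_dummies(code):
    """Return a dict of 0/1 dummies, one per industry category."""
    return {
        name: int(low <= code <= high)
        for name, (low, high) in INDUSTRY_RANGES.items()
    }


# A made-up industry code that falls in the Professional Services range.
dummies = industry_dummies(770)
print(dummies)

# Leaving one dummy out of a regression makes that category the benchmark.
regressors = [name for name in INDUSTRY_RANGES if name != "RUC"]
print(regressors)
```

A code that falls outside every range in the table would get zeros for all eight dummies, so it is worth confirming that each observation has exactly one dummy equal to 1.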
c. You will also need to combine years and months of employment. To do this, you will need to
divide months of employment by 12 and add the resulting decimal number to years of
employment. You can complete this step either in Excel or in EViews. Note that PSID also
includes length of employment in days. To get an accurate number for length of employment we
would combine years, months and days. For simplicity, we are only using years and months.
Label this new variable “Tenure”.
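
As a quick check of the arithmetic, the tenure calculation can be written as a one-line Python function. The function name tenure_years is made up for this sketch, and the inputs are illustrative values, not PSID data.

```python
def tenure_years(years, months):
    """Combine years and months of employment into decimal years."""
    return years + months / 12


# 5 years and 6 months on the job is 5.5 years of tenure.
print(tenure_years(5, 6))
```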
d. Name your variables in a logical and intuitive manner (you are welcome to use my names, but are
not required to) and compute summary statistics for each. Report only the mean, median and
standard deviation for each variable. Report the sample size for your overall data set. For each of
your dummy variables, interpret the mean. What is it telling you? The table below provides
check values for the mean of some variables.
Variable
AGE
FEMALE
CHILDREN
MARRIED
RUC
MFCTR
TRDTRSP

Mean
41.73167

0.20113

Variable
INFR
PFSRVCS
HSPENT
OSRVCS
PUBADMIN
TENURE
WAGES

Mean

8.2883

Variable
WHITE
ASSOC
BACHELORS
GRADUATE
LABOR INCOME OF HD
WAGE_RATE
YEARS_EDUCATION

Mean

44,256.72
13.32

Save all three versions of your data set. You should have the original data
from PSID, the data set that you have cleaned up in part a, and the data set
with all the dummy variables as separate files.
All that you need to turn in is a single clean table showing the mean of all your
variables and the standard deviation for all variables except the dummy
variables.
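
The summary statistics in Part d can be cross-checked with Python's standard statistics module. This sketch assumes the sample standard deviation (dividing by n - 1) is the one wanted; the function name summarize and the small lists of values are made up for illustration and are not PSID data.

```python
import statistics


def summarize(values):
    """Return the mean, median and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Made-up values for a continuous variable and for a 0/1 dummy variable.
age = [30, 40, 50]
female = [1, 0, 0, 0, 1]

age_stats = summarize(age)
female_stats = summarize(female)
print(age_stats)
print(female_stats)
```

For a 0/1 dummy, the mean is the share of observations with the value 1. In the made-up list above, 2 of the 5 heads are coded female, so the mean is the proportion of female heads in that toy sample.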
