Model

Data Generating Mechanism

The outcome variable in our model is the natural logarithm of income (log_income), modeled as a linear function of limited English proficiency (lep), education level (educ_level), and gender (gender). The fitted model is:

\[ \begin{aligned} \log(\text{income})_i =\;& 9.62 \\ &- 0.0385 \cdot \text{LEP}_i \\ &+ 0.227 \cdot \text{HighSchool}_i \\ &+ 0.274 \cdot \text{SomeCollege}_i \\ &+ 0.402 \cdot \text{Bachelors}_i \\ &+ 1.14 \cdot \text{Graduate}_i \\ &+ 0.419 \cdot \text{Male}_i \\ &+ \varepsilon_i \end{aligned} \]

Where:

  • LEP = 1 if the individual has limited English proficiency, 0 otherwise
  • HighSchool, SomeCollege, Bachelors, Graduate are indicator (dummy) variables, each equal to 1 if the individual's education level is that category and 0 otherwise
  • Male = 1 if the individual is male, 0 if female
  • The reference category (all indicators equal to 0) is English-proficient women with less than a high school education
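The dummy coding above can be made concrete with a short Python sketch that builds the linear predictor for one individual from the point estimates in the equation. The string labels (such as "LessThanHS" for the reference level) and the function name `predict_log_income` are hypothetical choices for illustration, not names taken from the original analysis.

```python
# Point estimates from the fitted log-income equation above.
COEFS = {
    "intercept": 9.62,
    "lep": -0.0385,
    "HighSchool": 0.227,
    "SomeCollege": 0.274,
    "Bachelors": 0.402,
    "Graduate": 1.14,
    "male": 0.419,
}

# Hypothetical labels for educ_level; "LessThanHS" is the reference
# category and therefore has no dummy variable of its own.
EDUC_LEVELS = ("LessThanHS", "HighSchool", "SomeCollege", "Bachelors", "Graduate")


def predict_log_income(lep: bool, educ_level: str, male: bool) -> float:
    """Return the model's predicted log income for one individual."""
    if educ_level not in EDUC_LEVELS:
        raise ValueError(f"unknown education level: {educ_level!r}")
    log_income = COEFS["intercept"]
    log_income += COEFS["lep"] * int(lep)
    # Exactly one education dummy can be 1; the reference level adds nothing.
    if educ_level != "LessThanHS":
        log_income += COEFS[educ_level]
    log_income += COEFS["male"] * int(male)
    return log_income


# A male with limited English proficiency and a bachelor's degree:
# 9.62 - 0.0385 + 0.402 + 0.419
print(predict_log_income(lep=True, educ_level="Bachelors", male=True))
```

For the reference group every indicator is 0, so the prediction reduces to the intercept, 9.62.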

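To convert a prediction back to dollars, we exponentiate the predicted log income:

\[ \widehat{\text{income}}_i = \exp\big(\widehat{\log(\text{income})}_i\big) \]

Because \(\exp\) is nonlinear, this back-transformed value estimates conditional median income (assuming normally distributed errors \(\varepsilon_i\)), not conditional mean income.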
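A short Python sketch of the back-transformation, using the point estimates from the equation. The two profiles compared here are illustrative choices. The sketch also shows that on the dollar scale the coefficients act multiplicatively: the ratio of two predicted incomes equals \(\exp\) of the difference in their linear predictors.

```python
import math

# Predicted log income for two illustrative profiles:
# the reference group (English-proficient woman, less than high school)
# and an English-proficient man with a graduate degree.
log_ref = 9.62
log_grad_male = 9.62 + 1.14 + 0.419

# Back-transform to dollars (point predictions only).
income_ref = math.exp(log_ref)
income_grad_male = math.exp(log_grad_male)

# Multiplicative effect: ratio == exp(1.14 + 0.419).
ratio = income_grad_male / income_ref

print(f"reference group:  ${income_ref:,.0f}")
print(f"graduate, male:   ${income_grad_male:,.0f}")
print(f"income ratio:     {ratio:.2f}")
```

Working with the ratio rather than the dollar difference keeps the interpretation independent of the baseline profile.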

Model Estimates

Linear Regression Results: Log Income Model
Variable                     Coefficient  Std. Error  p-value  95% CI
Intercept                          9.620       0.007   <0.001  [9.61, 9.64]
Limited English Proficiency       -0.038       0.010   <0.001  [-0.058, -0.019]
High School                        0.227       0.007   <0.001  [0.212, 0.241]
Some College                       0.274       0.009   <0.001  [0.256, 0.292]
Bachelor's Degree                  0.402       0.008   <0.001  [0.387, 0.416]
Graduate Degree                    1.140       0.007   <0.001  [1.12, 1.15]
Male Gender                        0.419       0.003   <0.001  [0.413, 0.426]

Note: Coefficients are on the log-income scale. Reference categories: less than high school education, female gender.
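The intervals in the table are consistent with the usual normal-approximation (Wald) interval, coefficient ± 1.96 × standard error; the exact method used to produce them is an assumption here. The Python sketch below recomputes them and also converts each log-scale coefficient to a percent difference in predicted income, \(100(\exp(\beta) - 1)\). Coefficients come from the model equation and standard errors from the table; because the tabled values are rounded, recomputed bounds can differ in the last digit.

```python
import math

# (coefficient, standard error): coefficients from the model equation,
# standard errors from the table (both rounded).
ESTIMATES = {
    "Limited English Proficiency": (-0.0385, 0.010),
    "High School": (0.227, 0.007),
    "Some College": (0.274, 0.009),
    "Bachelor's Degree": (0.402, 0.008),
    "Graduate Degree": (1.14, 0.007),
    "Male Gender": (0.419, 0.003),
}

Z_95 = 1.96  # two-sided 95% standard normal critical value


def wald_ci(coef: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Approximate 95% Wald confidence interval: coef +/- z * se."""
    return coef - z * se, coef + z * se


def pct_difference(coef: float) -> float:
    """Percent difference in predicted income implied by a log-scale coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


for name, (coef, se) in ESTIMATES.items():
    lo, hi = wald_ci(coef, se)
    print(f"{name:30s} CI [{lo:.3f}, {hi:.3f}]  {pct_difference(coef):+.1f}%")
```

For example, the Male Gender coefficient of 0.419 corresponds to roughly 52% higher predicted income than for otherwise similar women, and the LEP coefficient to roughly 3.8% lower.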