Model

Data Generating Mechanism

The outcome is the natural logarithm of income (log_income), modeled as a linear function of limited English proficiency (lep), education level (educ_level), and gender (gender). The fitted model is:

\[ \begin{aligned} \log(\text{income})_i =\;& 9.62 \\ &- 0.0385 \cdot \text{LEP}_i \\ &+ 0.227 \cdot \text{HighSchool}_i \\ &+ 0.274 \cdot \text{SomeCollege}_i \\ &+ 0.402 \cdot \text{Bachelors}_i \\ &+ 1.14 \cdot \text{Graduate}_i \\ &+ 0.419 \cdot \text{Male}_i \\ &+ \varepsilon_i \end{aligned} \]

Where:

LEP = 1 if the individual has limited English proficiency, 0 otherwise
HighSchool, SomeCollege, Bachelors, and Graduate are dummy variables for education level, each equal to 1 if the individual falls in that category and 0 otherwise
Male = 1 if the individual is male, 0 if female
\(\varepsilon_i\) is the error term for individual i
The reference category is female, English-proficient individuals with less than a high school education
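
The fitted equation can be evaluated by hand for any combination of predictors. The following is a minimal sketch in Python, assuming the coefficients are copied from the equation above; the function name `predict_log_income` and the category label `"LessThanHS"` are illustrative and do not come from the original analysis.

```python
# Coefficients copied from the fitted equation (log-income scale).
INTERCEPT = 9.62
LEP_COEF = -0.0385
MALE_COEF = 0.419

# Education dummies; the reference category contributes nothing.
# "LessThanHS" is a hypothetical label for the reference level.
EDUC_COEFS = {
    "LessThanHS": 0.0,
    "HighSchool": 0.227,
    "SomeCollege": 0.274,
    "Bachelors": 0.402,
    "Graduate": 1.14,
}


def predict_log_income(lep, educ_level, male):
    """Return predicted log income for one individual.

    lep and male are booleans (or 0/1); educ_level is a key of EDUC_COEFS.
    """
    return (
        INTERCEPT
        + LEP_COEF * int(lep)
        + EDUC_COEFS[educ_level]
        + MALE_COEF * int(male)
    )


# Reference individual: female, English-proficient, less than high school.
reference = predict_log_income(lep=False, educ_level="LessThanHS", male=False)

# Male, limited English proficiency, bachelor's degree.
example = predict_log_income(lep=True, educ_level="Bachelors", male=True)

print(reference, example)
```

For the reference individual every dummy is 0, so the prediction reduces to the intercept.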

To recover predicted income in dollars, we exponentiate the predicted log income:

\[ \widehat{\text{income}}_i = \exp\left(\widehat{\log(\text{income})}_i\right) \]
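
As a rough illustration of the back-transformation, the sketch below exponentiates two predicted log incomes built from the coefficients in the fitted equation. The helper name `log_to_dollars` is an assumption made for this example. One caveat worth keeping in mind: because the model is fit on the log scale, the exponentiated prediction estimates the geometric mean of income rather than the arithmetic mean, so it will generally sit below the average income of a group.

```python
import math


def log_to_dollars(log_income):
    """Back-transform a predicted log income to the dollar scale."""
    return math.exp(log_income)


# Intercept only: female, English-proficient, less than high school.
baseline_log = 9.62
baseline_dollars = log_to_dollars(baseline_log)

# Same individual but male: add the Male coefficient on the log scale.
male_log = baseline_log + 0.419
male_dollars = log_to_dollars(male_log)

# Adding on the log scale is multiplying on the dollar scale.
ratio = male_dollars / baseline_dollars

print(f"baseline: {baseline_dollars:,.0f} dollars")
print(f"male:     {male_dollars:,.0f} dollars")
print(f"ratio:    {ratio:.3f}")
```

The baseline comes out at roughly 15,000 dollars, and the ratio equals exp(0.419), which is about 1.52 regardless of the other predictors.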

Model Estimates

Linear Regression Results: Log Income Model
Variable	Coefficient	Std Error	p-value	95% CI
Intercept	9.620	0.007	<0.001	[9.61, 9.64]
Limited English Proficiency	-0.038	0.010	<0.001	[-0.058, -0.019]
High School	0.227	0.007	<0.001	[0.212, 0.241]
Some College	0.274	0.009	<0.001	[0.256, 0.292]
Bachelor's Degree	0.402	0.008	<0.001	[0.387, 0.416]
Graduate Degree	1.140	0.007	<0.001	[1.12, 1.15]
Male Gender	0.419	0.003	<0.001	[0.413, 0.426]
Note: Reference categories are less than high school education, female, and English-proficient.
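
Because the outcome is on the log scale, each coefficient b can be read as an approximate percent difference in income via (exp(b) - 1) × 100, holding the other predictors fixed. The sketch below applies that transformation to the estimates and confidence limits copied from the table; the names `ESTIMATES`, `percent_difference`, and `PERCENT` are illustrative only. Since the transformation is monotone, applying it to the interval endpoints gives a valid interval on the percent scale.

```python
import math

# (coefficient, CI lower, CI upper) copied from the results table.
ESTIMATES = {
    "Limited English Proficiency": (-0.038, -0.058, -0.019),
    "High School": (0.227, 0.212, 0.241),
    "Some College": (0.274, 0.256, 0.292),
    "Bachelor's Degree": (0.402, 0.387, 0.416),
    "Graduate Degree": (1.140, 1.12, 1.15),
    "Male Gender": (0.419, 0.413, 0.426),
}


def percent_difference(beta):
    """Convert a log-scale coefficient to a percent difference in income."""
    return (math.exp(beta) - 1.0) * 100.0


# Transform the point estimate and both interval endpoints.
PERCENT = {
    name: tuple(percent_difference(value) for value in values)
    for name, values in ESTIMATES.items()
}

for name, (estimate, lower, upper) in PERCENT.items():
    print(f"{name}: {estimate:+.1f}% (95% CI {lower:+.1f}% to {upper:+.1f}%)")
```

Under this reading, limited English proficiency is associated with income a little under 4 percent lower, and being male with income roughly 52 percent higher, relative to the respective reference categories.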