I have been deepening my understanding of data fitting, and in particular linear regression models. While there are other approaches that don’t square the residuals when trying to minimize them, most of the literature uses the least-squares criterion for finding the best-fit line. So one question that kept popping into my head while studying was: why is it important to square the residuals?

So far, I’ve understood two separate reasons.

- We want to penalize the points that deviate most from our best-fit line. Smaller residuals are more favorable because they lie closer to the fit, so larger residuals should be penalized disproportionately rather than linearly: with squaring, a residual twice as large contributes four times as much to the total.
- We want to make all the residual values non-negative. Squaring the residuals guarantees that positive and negative residuals are treated the same way. It also prevents positive and negative residuals from offsetting each other in the sum. Taking the absolute value would achieve this as well.
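
To make both points concrete, here is a minimal Python sketch. The residual values are made up purely for illustration (they do not come from any real fit), and the variable names are my own. It compares the plain sum, the sum of absolute values, and the sum of squares, and then checks how much of each total comes from the two largest residuals.

```python
# Hypothetical residuals (observed minus fitted), chosen only for illustration.
residuals = [-4.0, -1.0, 1.0, 4.0]

# Plain sum: positive and negative residuals cancel each other out.
raw_sum = sum(residuals)

# Both of these remove the sign, so nothing cancels.
abs_sum = sum(abs(r) for r in residuals)
sq_sum = sum(r * r for r in residuals)

# Fraction of each total contributed by the two largest residuals (|r| == 4).
abs_share_large = sum(abs(r) for r in residuals if abs(r) == 4.0) / abs_sum
sq_share_large = sum(r * r for r in residuals if abs(r) == 4.0) / sq_sum

print("sum of residuals:       ", raw_sum)
print("sum of |residuals|:     ", abs_sum)
print("sum of squared residuals:", sq_sum)
print("share from large residuals (absolute):", abs_share_large)
print("share from large residuals (squared): ", sq_share_large)
```

The plain sum comes out to zero even though no point lies on the line, which is the cancellation problem from the second bullet. Both the absolute and squared sums avoid that, but the large residuals account for a bigger share of the squared total than of the absolute total, which is the disproportionate penalty from the first bullet.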