Improving R-Squared in Salary Survey Analysis: How Can I Tweak My Database?

yoyi
Hi all,

I am doing an analysis of a salary survey. The regression presents an R-squared value of 0.8, and the adjusted R-squared value is 0.5. How can I trim the database to improve the R-squared value?

Thanks for any help.
nashbramhall
Dear Yovi, Please provide more details about the nature of the data you have collected. What do you mean by "Trimming the database"? How many variables does your database contain? The R-Squared value is for a single explanatory variable, and the Adjusted R-Squared is for multiple explanatory variables. Have a nice day.

Regards, Simhan
Autumn Jane
Common Variables in Salary Regression

The two common variables in salary regression can be:

- Salary & Job Points (if jobs are evaluated)
- Salary & Age
- Salary & Tenure, etc.

Whichever variables are used, you are trying to establish the correlation between the two.

Trimming the Database in Salary Surveys

In conducting a salary survey, "trimming the database" refers to identifying "data outliers" or "extreme data points" and excluding them from the analysis. Including these data will skew the results either upwards or downwards, and trends/norms will not be able to be established. For example, if I have five Production Workers, with four of them receiving a salary within the range of $1000 to $2000 but the fifth one receiving $5000, the fifth is considered an "outlier" because including this data point will skew the analysis.

To identify "outliers," you need to perform a Standard Deviation Analysis (use Excel), set the desired Deviation step (e.g., 1, 2, or 3), and run the analysis. The Deviation step is anchored on the size of your data sample.

It is a good practice to run two sets of regression salary - one before the "trimming" to depict the current situation and another after the "trimming" to depict the desired situation.

"Trimming" is not just for show or presentation; it indicates an area of concern for the company that must eventually be addressed.

Please see the sample attachment.

Understanding Correlation in Regression

When R=1, you have a "perfect" correlation, but this is rarely the case in real life. To conclude whether two variables are "relatively correlated," the minimum is at R=0.8 (but it also depends very much on your desired standard).

Regards,
Autumn Jane
1 Attachment(s) [Login To View]

nashbramhall
Thank you, Autumn Jane, for a clear explanation of what "Data Trimming" is in this context. As the R-Squared value shown was 0.8 (quite high), I did not think about the outliers. There is a good explanation of this at http://www.statisticaloutsourcingser...m/Outlier2.pdf. However, it is not with respect to pay analysis.

Have a nice day.

Regards,
Simhan
Autumn Jane
Dear Simhan, you are absolutely correct in stating that a 0.8 R-squared value is indeed quite high. However, for salary analysis, just one outlier, whether it is above or below the expected range, can lead to significant morale issues throughout the organization. Therefore, the tighter the control over the spread of data, the more reliable and valid the analysis becomes.

Have a nice day.

Regards,
Autumn Jane
chintant
Understanding R-Squared in Salary Analysis

Getting a 0.5 R-square is not worthless. It is telling you that the factors you have chosen as the independent variables are not really the sole ones determining salary, and you are missing some important factors. Are you doing only a single-factor correlation, or are you doing multivariate analysis? If you are doing multivariate analysis, then you should also worry about the interdependence between the selected independent variables. If I remember my statistics correctly, it has to do with Pearson's correlation coefficient.

Regards
yoyi
Thanks, guys! Your replies are really meaningful to me!

Market Survey Analysis

First of all, I am doing a market survey analysis. All I have are the P25, P50, and P75. I remember that my teacher told me the R-squared would be acceptable if it is >= 0.95, which means the market data has high validity. If the R-squared is low, we should "trim" the original data. In this case, I only use the annual salary and the job grade to derive the regression line.

Dear Autumn Jane, your explanation is terrific, but I don't understand why the Internal Equity Analysis can determine the number of pay structures an organization should have. Could you give any further explanation?
nashbramhall
You are using univariate analysis; however, as there are people with variable experience and qualifications, you should be using multivariate analysis as pay does not depend solely on the grade.
raheelumair
Understanding R-Squared Value in Regression Analysis

R-squared value is 0.8, which means your independent variable is explaining 80% of the correct relationship between the dependent variable. An R-squared value equal to 0.5 or greater is acceptable. Remember one thing, R-squared can never be equal to 1. If it is 1, that means a 100% relationship, which is not a part of the statistical model. The value of 1 or 100% is only a part of a mathematical expression.

If you have more than one independent variable, you should also check tolerance and variance inflation factor (VIF). Basically, in technical terms, R-squared explains the variance (variation) of the independent variable in the dependent variable.

Just suppose you are checking the relationship between salary and age. Salary is your dependent variable, and age is the independent variable. If the R-squared value explains 0.8, then we can interpret that 80% of the variation in salary is explained by age.
Autumn Jane
Dear Yoyi, As promised, attached please find a perfect example of using an Internal Equity Analysis to determine whether an organization should have more than one pay structure.

Internal Equity Analysis Overview

In the first chart, it shows internal equity at its original practice, i.e., one pay structure after a job evaluation exercise. Even though the R-squared was 0.94 (exceeded the minimum of 0.8 and pretty close to 1.0), there are other "tell-tale" signs that indicate "unknowingly" or "unintentionally," the company was adopting two pay structures.

Signs of Multiple Pay Structures

- **Job Clusters:** One cluster at Job Points from 50 to 250 and another cluster at Job Points 350 all the way to 1050.
- **Break in Grades:** There are no jobs in between job points from 251 to 349.
- **Background Information:** Beyond the visual, background information about this company confirmed that the 1st cluster belongs to non-executive jobs (blue-collar & clerical) while the 2nd cluster belongs to executive jobs (professional & managerial). From the chart, it is confirmed that the company is paying more aggressively for their executive jobs vs the non-executive jobs.

Because of all the above factors, a second internal equity analysis was constructed, but with a break or with two pay structures. Notice the decline in the R-squared for both lines? Although there is a decline, the 2nd chart is a better representation of the pay practice for this company. Therefore, pay recommendations should be based on the 2nd chart instead of the 1st.

I hope my explanation is clear and useful.

Regards, Autumn Jane
1 Attachment(s) [Login To View]

If you are knowledgeable about any fact, resource or experience related to this topic - please add your views. For articles and copyrighted material please only cite the original source link. Each contribution will make this page a resource useful for everyone. Join To Contribute