noun. 1. the quality of being founded on truth, fact, or law. 2. the degree to which a test or measurement accurately measures or reflects what it claims to measure. There are many forms of validity, including construct validity, concurrent validity, and ecological validity.
What is validity in psychology?
Importance
Validity is the extent to which a test measures what it claims to measure.
You may be familiar with the fable about the duck, rabbit, eagle, and squirrel: each excelled at its natural skill but flunked the tests outside its nature. The rabbit topped the running test but performed poorly in swimming; the duck earned first honors in swimming but scored low in climbing. Where does “validity” come into the picture in this story?
Newton and Shaw (2013) defined validity as the extent to which a test measures what it claims to measure.[1] The animals’ school used an invalid set of tests, because each activity measured skills inappropriate to the animal being tested. A rabbit should not be tested in swimming, just as an eagle should not be measured on its running speed.
Using the definition and the story above, we can see that if a test is not valid, it will not yield the result it is intended to measure. The outcome is wasted time and effort, and unmet goals.
The validity theory in psychology
In his review of the validity literature, Sireci (2007) expressed the following:[2]
1. Validity is not a property of a test. Rather, it refers to the use of a test for a particular purpose.
2. Evaluating the utility and appropriateness of a test for a particular purpose requires multiple sources of evidence.
3. If the use of a test is to be defensible for a particular purpose, sufficient evidence must be put forward to defend the use of the test for that purpose.
4. Evaluating test validity is not a static, one-time event; it is a continuous process.
Types of validity in psychology
Validity is a broad concept, but fortunately it can be broken down into four main types.
1. Face Validity
Face validity is usually assessed early in the test validation process and is perhaps the most basic of the four types. It refers to the degree to which a test appears, at face value, to measure what it intends to measure.
Furthermore, face validity concerns the respondent’s impression that the test will measure what it is supposed to. If there is a problem with face validity, the respondent is likely left confused, with many questions after going through the instrument.
At this point, the level of validity is superficial and does not give the test developer enough grounds to call the tool valid. As mentioned, this form of validity is the most basic and, in a way, the weakest, as it is highly subjective and prone to researcher bias. Unlike the other types of validity, face validity involves no statistical analysis, which makes it less objective and less scientific.
Who should assess the tool’s face validity remains a subject of debate. Should experts carry out the process alone, would laypersons qualify, or would it be best to pilot the instrument with a sample from the population indicated in the research demographics?
The best approach is to cover all bases and assess face validity from the perspectives of varied profiles. The more these different assessors agree, the stronger the tool’s face validity.
2. Construct Validity
Before we discuss construct validity, let us first define what a “construct” means. A construct is a concept or idea borne out of empirical observations. The catch is that constructs are not directly measurable: concepts such as motivation, anxiety, self-esteem, and depression cannot be directly observed or measured. Instead, indicators associated with each construct help researchers quantify and test it.
Now on to construct validity. Simply put, it checks whether the measurements employed in a test or study actually measure the concept(s) they claim to measure. For example, if the questions in a depression scale genuinely capture the concept of depression, the scale passes the screening for construct validity. However, if the questions read more like descriptions of sadness or a gloomy mood, the test may be off the mark.
There are two types of Construct Validity: Convergent Validity and Discriminant Validity.
a. Convergent Validity - in layman’s terms, if two measures of related constructs correspond to each other or correlate highly, you can conclude that you have good construct validity. An example is when you are developing a new depression scale. To test for convergent validity, you may administer your test alongside a widely used depression scale, if you have access to one. Your test is considered to have convergent validity if respondents obtain similar or close scores on related items from both scales.
b. Discriminant Validity - on the contrary, to screen for discriminant validity, you use a contrasting, highly unrelated concept. For example, you might administer your introversion scale together with a scale that measures extroversion to the same sample. A weak correlation between the two scales shows that your scale has discriminant validity.
3. Content Validity
Content validity is the degree to which a test or assessment instrument evaluates all aspects of the topic, construct, or behavior that it is designed to measure. Do the items fully cover the subject? High content validity indicates that the test fully covers the topic for the target audience. Lower results suggest that the test does not contain relevant facets of the subject matter.
Content validation assesses the degree to which the instrument measures the targeted construct it is designed to measure (Anastasi, 1988).[3] Specifically, it scrutinizes individual questions on a test and whether each targets the concepts the test is designed to cover.
In psychometric terms, content validity is an important property of a test or measurement instrument. It assesses the degree to which the test items represent and cover the domain or construct of interest. Simply put, content validity ensures that a test measures what it is intended to measure and considers all important aspects of the construct.
Why is content validity important? In test development, an instrument with high content validity supports accurate interpretations of the construct, which paves the way for sound conclusions and guided decisions. On the other hand, an instrument with low content validity may fail to provide thorough information about the construct, which can lead to incorrect results and conclusions.
How, then, do you establish content validity? High content validity requires a combination of expert review and statistical analysis. Experts thoroughly review the test items for their relevance and how well they represent the construct being studied. They also consider why the test was developed, for whom it was developed, and how it is answered. To be more precise in their validation, experts may also use statistical procedures such as factor analysis and the content validity ratio.
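One such procedure is Lawshe’s content validity ratio (CVR), which summarizes how many experts on a review panel rate an item “essential”. A minimal sketch, using a hypothetical panel:

```python
# Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2),
# where n_e is the number of experts rating an item "essential"
# and N is the total number of experts on the panel.

def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts rating one item
print(content_validity_ratio(8, 10))   # 0.6  (strong agreement)
print(content_validity_ratio(5, 10))   # 0.0  (only half rate it essential)
print(content_validity_ratio(10, 10))  # 1.0  (unanimous)
```

CVR ranges from -1 to +1; items with low or negative values are candidates for revision or removal.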
4. Criterion Validity
Criterion validity measures the level of correlation between a test and an established standard called a criterion or criterion variable. Others use the term “gold standard”.
A gold standard test serves as a benchmark or reference test that is the best available at the moment of the validation.
A test achieves criterion validity if its results highly correlate with the “gold standard.”
It is vital to remember that criterion validity is only as good as the validity of the gold standard. If the reference standard is itself questionable to some degree, for example because of research bias, the resulting level of validity suffers.
There are three types of criterion validity:
First is Predictive Validity, where the assessment takes place before the outcome is measured. Criterion validity is high if the assessment predicts what it should predict. A common example is SAT scores, which can predict a student’s performance in college.
Next is Concurrent Validity, where the assessment and the outcome are measured at the same time; the criterion data and the predictor are collected simultaneously. Think of it as validating one test against another administered at the same time.
Last is Postdictive or Retrospective Validity, where the criterion data measure something that happened in the past. It is like assessing the present against an outcome that was measured at an earlier time.
Validity versus reliability
It is important not to confuse validity with reliability. Reliability means getting the same result when a study is replicated under the same conditions. Reliability and validity are independent properties of a process, so a study can be reliable without being valid. Done correctly, however, a study can be both reliable and valid.
References:
[1] Newton PE, Shaw SD. Standards for talking and thinking about validity. Psychol Methods. 2013; 18(3):301-19. doi:10.1037/a0032969
[2] Sireci S. On validity theory and test validation. Educational Researcher. 2007; 36(8).
[3] Anastasi A. Psychological testing. 6th ed. New York: Macmillan Publishing; 1988.