A basic and simple assessment for bias in psychometric tests: The 4/5 Rule
Psychometric tests should have been validated by a test publisher prior to being published. However, the test user should also exercise due diligence and ensure that the test cannot be accused of having unjustifiable adverse impact on one group of people compared with another.
There is a simple guideline known as the four-fifths (4/5) rule. Here we seek to ascertain the percentage of one group that is selected or passes through to the next round of a selection process based on test scores, compared to another group.
Let's say that we are comparing red people with green people so that we don't unwittingly offend anybody! If the pass rate for reds is more than 4/5 of the pass rate for greens, we have little to worry about. On the other hand, if the pass rate for reds is 4/5 or less of the pass rate for greens, we have cause for concern. In fact, in some parts of the world, we would be required by law
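The comparison described in the 4/5 rule excerpt can be sketched in a few lines of Python. This is only an illustration: the function names and the applicant figures are hypothetical, and the flagging condition follows the wording of the excerpt (a ratio of 4/5 or less is flagged). Some formulations of the guideline flag only ratios strictly below 4/5, so check the rule that applies in your jurisdiction.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Proportion of a group that passes the cut-off score."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Pass rate of the focal group relative to the reference group."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return focal_rate / reference_rate


def cause_for_concern(ratio: float, threshold: float = 4 / 5) -> bool:
    """Assumption: follows the excerpt's wording, so 4/5 or less is flagged."""
    return ratio <= threshold


# Hypothetical figures: 30 of 100 reds and 50 of 100 greens reach the mark.
reds = selection_rate(30, 100)
greens = selection_rate(50, 100)
ratio = impact_ratio(reds, greens)
print(f"impact ratio = {ratio:.2f}, cause for concern = {cause_for_concern(ratio)}")
```

With these made-up figures the pass rate for reds is well under 4/5 of the pass rate for greens, so the test user would want to investigate further.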
The coefficient of determination is useful when we are trying to assess the amount of coverage (prediction) of job performance that is afforded by a test score or composite of scores. Most trained users of psychometric assessments will be familiar with the correlation coefficient. This is simply the strength of relationship between two variables, for example, test score and performance rating. The coefficient of determination is calculated by squaring the correlation coefficient.
Let's say that a candidate's score on the Numerical Reasoning Test is related to their performance as an accountant. We find there is a strong relationship or correlation. We'll make it r = .6. Now just square this: r² = .6 × .6 = .36. This suggests that the test score accounts for 36% of the variance in performance. However, because correlation does not imply causation, we must also accept the inverse: performance accounts for 36% of the variance in test scores! The calculation is similar although more complicated within regression models. Here, the theoretical rationale will drive the direction of the hypothesis and a resultant r² is produced which tells us the amount of varian
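The squaring step in the coefficient of determination excerpt, and the point that the statistic is the same in both directions, can be illustrated with a short Python sketch. The data below are invented purely for illustration, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must show some variation")
    return sxy / sqrt(sxx * syy)


def coefficient_of_determination(r: float) -> float:
    """Square the correlation to get the proportion of shared variance."""
    return r * r


# The worked example from the text: r = .6 gives r squared = .36.
print(round(coefficient_of_determination(0.6), 2))

# Hypothetical test scores and performance ratings.
test_scores = [12, 15, 17, 20, 24]
ratings = [2, 3, 3, 4, 5]
r_forward = pearson_r(test_scores, ratings)
r_reverse = pearson_r(ratings, test_scores)
print(abs(r_forward - r_reverse) < 1e-12)  # same value whichever way round
```

Because the correlation is symmetric, squaring it cannot tell us which variable is "explaining" the other; that direction has to come from theory.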
The following table shows corresponding percentile scores for a range of Z scores. PsyAsia's course attendees will recall that to calculate a Z score from a raw score one uses the following:
Z score = (Raw Score − Mean) ÷ Standard Deviation
To convert from any other standard score scale into a Z score one simply subtracts the mean of the other standard scale from the standard scale score and divides the result by the standard deviation of that scale. So, to convert a T score into a Z score:
Z score = (T Score − 50) ÷ 10, where 50 is the mean and 10 is the standard deviation of the T score scale.
See also: What is a percentile score
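The conversions described in the Z score excerpt can be sketched in Python as follows. The function names are hypothetical, and the percentile step assumes that scores are normally distributed, which is the usual assumption behind Z-to-percentile tables but may not hold for every norm group.

```python
from statistics import NormalDist


def z_from_score(score: float, mean: float, sd: float) -> float:
    """Z score = (score - mean) / standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


def z_from_t(t_score: float) -> float:
    """T scores have a mean of 50 and a standard deviation of 10."""
    return z_from_score(t_score, mean=50, sd=10)


def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution falling below z (assumes normality)."""
    return 100 * NormalDist().cdf(z)


z = z_from_t(60)
print(z, round(percentile_from_z(z), 1))
```

A T score of 60 is one standard deviation above the mean, which corresponds to roughly the 84th percentile under the normality assumption.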
Occupational tests may be classified in numerous different ways. Each classification term provides a language by which test administrators are able to understand what the test measures, how it measures it, what scores are compared against and the conditions under which the test is sat (e.g., timed or getting more difficult as the test progresses). Different aspects of this form of communication are listed below.
The two major categories of psychometric test are:
MAXIMUM PERFORMANCE TESTS
Ability and aptitude type tests where the candidate is asked to perform to the best of their ability. There are right and wrong answers and the test is timed.
TYPICAL PERFORMANCE TESTS
The candidate is asked about their preferences, behaviours, typical response styles and attitudes to a range of things. There are no right or wrong answers and these tests are not usually timed.
Tests may also be classified as:
Power vs Speed Test
Items on power tests get more complex as the test progresses and usually, not everyone will be able to accurately answer all questions. Speed tests are timed and usually not everyone wi
Disattenuation of correlation coefficients due to unreliability of measurement
A common problem in psychometrics research is that an observed correlation coefficient reflects the relationship between two variables plus measurement error. As a correction for measurement error, Nunnally (1978) provided an equation that results in the disattenuation of the correlation coefficient. The issue here is that in a ‘real-world’ setting, if there were a pure way of measuring two variables that did not incorporate error, the actual relationship between the variables would be stronger. Prof. Paul Barrett has developed a software program that calculates the relationship between two variables when corrected for error. There is however debate as to the usefulness of this equation (Nunnally, 1978). The contention is that if a personality (or other) questionnaire is being used as a predictive tool, it is imperative that the measure is reliable before using it as a predictor. Correcting for attenuation has the effect of increasing
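The excerpt on disattenuation does not reproduce the equation itself. The sketch below uses the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities; this is assumed to be the equation referred to, and it is not a reproduction of Prof. Barrett's software. The function name and figures are hypothetical.

```python
from math import sqrt


def disattenuated_r(r_xy: float, reliability_x: float, reliability_y: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(r_xx * r_yy).

    Assumption: this is the textbook form of the correction, used here
    for illustration only.
    """
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_xy / sqrt(reliability_x * reliability_y)


# Hypothetical values: observed r = .30, reliabilities of .81 and .64.
print(round(disattenuated_r(0.30, 0.81, 0.64), 3))
```

Note that with low reliabilities the corrected value can exceed 1, which is one practical reason the correction is treated with caution.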
The first thing to remember is that if you are using a purely ipsative personality test then you should not be comparing test results between candidates. Ipsative tests are self-referencing: they are composed of forced-choice items. They are useful in coaching, team-building and career guidance, but should not be used alone in recruitment and selection scenarios.
Some tests on the market, such as the Saville Consulting Wave or the Apollo Profile, are joint normative-ipsative tests and these can be used to compare candidates. A normative test is one which allows the candidate to respond based on the strength of their agreement or disagreement with a statement. The end results are then compared with a group of similar others who have previously taken the test (the norm group). Purely normative tests such as the Identity Self-Perception Questionnaire would also be good to use for comparing candidates. Aptitude tests are by their nature normative tests and hence can be used to compare candidates. So, let
Some test publishers do publish ipsative (forced choice) tests for use in selection. However, caution is required here. If your test is purely ipsative, then you should not use it in selection. See the references below for reasons why. The main reason relates to the fact that whilst you can compare your candidate's relative strengths with an ipsative test, you cannot compare them to others. In selection of course, the aim is to compare one candidate with another. That said, recent developments allow for the use of ipsative questions within selection. Some recent tests provide for dynamic ipsative questions wherein these questions will only appear if the candidate has not presented a clear picture of themselves on a particular test scale. An alternative to this is the modified ipsative scale where candidates are asked to rate the degree to which they agree or disagree with a statement. Again, if these items are presented alongside normative (non-forced-choice) items, they can be used in selection.
As noted in the Standard Error of Measurement knowledgebase article, a respondent's score on any test or scale is not their true score, but rather their true score + error. Therefore, if we want to compare two respondents who have taken the same test, or one respondent who has sat two different tests, we need to use a formula to assess whether any differences we observe in test scores are real differences or simply differences that occurred due to measurement error.
To elaborate, if Tommy scores 25 on a test and Ada scores 30 on the same test, we cannot immediately conclude that Ada is better than Tommy on the construct being measured (e.g., verbal reasoning). Likewise, if Tommy scores 15 on his verbal reasoning test and 21 on his abstract reasoning test, we cannot immediately conclude that Tommy is better at abstract reasoning than verbal reasoning.
To enable such conclusions, it is first necessary to calculate the Standard Error of Difference (SEdiff) between the two test scores. The SEdiff equation is based on the Standard Error of Measurement (SEm), and thus the more reliable a test is (in terms of test-retest reliability), the lower the SEm will be and the lower the SEdiff value.
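As a rough illustration of the Tommy and Ada comparison, the sketch below uses a commonly quoted form of the equation, SEdiff = √(SEm₁² + SEm₂²), which is an assumption here because the excerpt does not state the formula. The SEm value of 2.0 is invented, the 1.96 multiplier corresponds to a 95% confidence level, and both scores are assumed to be on the same scale.

```python
from math import sqrt


def se_diff(sem_a: float, sem_b: float) -> float:
    """Standard Error of Difference from two SEm values (assumed formula)."""
    if sem_a < 0 or sem_b < 0:
        raise ValueError("SEm values cannot be negative")
    return sqrt(sem_a ** 2 + sem_b ** 2)


def is_real_difference(score_a: float, score_b: float,
                       sem_a: float, sem_b: float,
                       z_crit: float = 1.96) -> bool:
    """True if the gap between scores exceeds z_crit standard errors."""
    sed = se_diff(sem_a, sem_b)
    gap = abs(score_a - score_b)
    if sed == 0:
        return gap > 0  # perfectly reliable scores: any gap is real
    return gap / sed > z_crit


# Tommy scores 25 and Ada scores 30; hypothetical SEm of 2.0 for the test.
print(round(se_diff(2.0, 2.0), 2))
print(is_real_difference(25, 30, 2.0, 2.0))
```

With these illustrative numbers the five-point gap is smaller than 1.96 standard errors of difference, so we could not conclude that Ada's verbal reasoning is genuinely stronger than Tommy's.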
No method of assessment is 100% reliable. This applies to psychometrics just as it applies to interviews, reference checks and so forth. Psychometrics as a science is, however, more likely to apply statistical correction techniques to account for such error.
When consulting test manuals, a user may come across the SEm figure. This refers to the test or scale's Standard Error of Measurement. If we were to hypothetically test a candidate time and time again, ignoring practice effects, we know that their score would vary over time. Sometimes the candidate would be tired, other times not so, sometimes hot or cold, other times in the mood for testing or not and so on. Likewise, and although we would prefer this not to be the case, items in the test may at times appear ambiguous to a candidate or the test user may stray slightly from the standardised instructions. These factors all impact upon test reliability and thus Standard Error of Measurement. A candidate's true score is to be found somewhere within their hypothetical distribution of scores, but the score that we observe when we test them is not their true score, but rather their true score + error. To calculate the error associated with a test or scale (and thus know the range within which a candidate's true score lies), we
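The excerpt on the Standard Error of Measurement is cut off before the calculation. As a hedged sketch, the widely used textbook formula is SEm = SD × √(1 − reliability); it is assumed here rather than taken from the article. The band around the observed score uses 1.96 SEm for an approximate 95% range, and all figures and function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEm = SD * sqrt(1 - reliability) (textbook formula, assumed here)."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


def true_score_band(observed: float, sem: float, z: float = 1.96):
    """Range within which the true score is likely to lie (approx. 95%)."""
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical scale: SD of 10 and a reliability of .91.
sem = standard_error_of_measurement(10, 0.91)
low, high = true_score_band(50, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

The more reliable the scale, the smaller the SEm and the narrower the band around the observed score.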
Types of Bias in Psychometric Test Translation
With the demand and need for psychological tests increasing in different cultures and countries, there has been much greater awareness of the issues associated with developing or adapting tests for use in contexts that differ from those for which the test was originally developed. This article focuses on one of the key aspects of translating tests: the types of bias that can occur. When utilizing a test in a new cultural group, it is not quite as simple as directly translating the test, administering it and then comparing the results to establish its validity. There are a number of issues that need to be considered, such as whether the area assessed by the test applies to the new culture, whether the test may be biased towards that group, and whether what is assessed by the test has similar behavioral indicators. These are just some of the potential areas where bias can be found in the translation of tests and affect the validity of the test being utilized in the new context. Van
Personality assessment can be divided into two categories: type-based and trait-based personality assessments. Across both types of personality assessment, it is assumed that personality remains stable over time, has a genetic basis and influences individuals to demonstrate similar behavior in most situations.
Trait-based personality assessments assess various aspects of an individual’s personality which contribute to their behaving in particular ways. Across the population, these aspects of personality tend to vary from person to person, and this explains the wide variety of personality descriptions. Such assessments have tended to be developed to fulfill a certain need to explain personality in various contexts such as work. Although these assessments may be more difficult for people to understand and can be difficult to use in team building activities, they are more psychometrically sound and allow for more accurate comparisons between individuals. These assessments can be used in conjunction with other methods in activities that require differentiation between individuals such as
Firstly, ensure that you understand the theoretical rationale upon which the test was developed. It should be based upon sound scientific theory! Keep this in mind, in addition to your own reason for using the test (staff selection, departmental development, counselling etc.) as you gather the following information:
1. Suitability of the questions: are the questions in the test suitable given your target and context? Are they easy enough for your group to understand? Do they appear to measure what the test purports to measure? Do you think any of the questions might cause offence to the respondent?
2. Reliability: source reliability data from the test publisher. You can usually find this in the test manual. Ensure that internal consistency and test-retest data exist. Coefficients listed in the manual or quoted by the publisher should reach at least .70 for personality tests and .80 for ability/aptitude tests.
3. Validity: look for evidence of validity in the test manual. The publisher should note at least one out of construct validity and criterion-related validity. Basically, the data should suggest that the construct the test pur
A derived dimension is similar to a scale on a personality test. However, a scale is a direct measure of a particular construct, such as openness to experience or extraversion. In contrast, a derived dimension is "derived" statistically using a formula that has been shown by previous research to predict scores on this dimension. More simply put: a group of candidates completes the Identity Questionnaire and the Belbin Team Roles Inventory. The publisher then statistically examines the relationship between scores on various Identity scales and team role preferences. An equation is established that enables prediction of team role preferences from Identity scale scores. In the future, respondents do not actually complete the Team Roles Inventory. Instead, their Identity scale scores are used to predict how they would be likely to score on the inventory if they were to complete it. The benefits of this are related to time and cost savings. However, the drawback is that test users must be cautious in the interpretation of derived dimensions. As we know, there is error associated with any selection method, including psychometric tests. When using derived dimensions, we are actually correlating potential error with potential error and thus our results (and interpretations) may be less accurate than if we were to take a direct measure of the construct or derived dimension.
A norm group is a reference group that is used to compare your respondent's scores on a test or scale against similar others. This gives the score meaning. For example, a score of 20/30 (known as the raw score) means nothing on its own. We need to know how well similar others perform if it's an ability test or whether a person is scoring in the middle band, above or below the middle band in a personality questionnaire. By comparing the score with a group of similar others, we add meaning and thus interpretation to our observed score.
NB: Not all psychometric tests have norm groups/norm tables. This is because some tests rely on comparison within the self for their interpretation. These tests are usually referred to as ipsative or forced-choice tests. Here we can say that a respondent is more sociable than questioning, for example, but we cannot compare our respondent with others. It is due to this lack of comparison that ipsative tests should be reserved for development and coaching and not used in selection.
The percentile score is the value below which x% of values fall. That is, if your respondent scores at the 60th percentile, the data indicate that 60% of similar others (assuming you are using the correct norm table) would score less than your respondent.
NB: Prof. Paul Barrett at the University of New Zealand discusses some contrasts in definitions of the percentile. Some definitions include the qualifier "at or below". Thus, "the percentile is the point at or below which a given percentage of scores is observed". The intellectually curious may refer to Prof. Barrett's article.
See also: Converting a standardised score to a percentile score
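The difference between the "below" and "at or below" definitions of the percentile is easiest to see with a small example. The following Python sketch uses an invented norm group of ten scores; the function name and data are hypothetical.

```python
def percentile_rank(norm_scores, score, at_or_below=False):
    """Percentage of norm-group scores below (or at or below) the given score."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    if at_or_below:
        count = sum(1 for s in norm_scores if s <= score)
    else:
        count = sum(1 for s in norm_scores if s < score)
    return 100 * count / len(norm_scores)


# Hypothetical norm group of ten raw scores.
norm_group = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

print(percentile_rank(norm_group, 20))                    # "below" definition
print(percentile_rank(norm_group, 20, at_or_below=True))  # "at or below" definition
```

The two definitions give different answers whenever someone in the norm group has exactly the same score as the respondent, so it is worth checking which definition a test manual uses.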
A psychometric test is a measure of a psychological construct (such as personality or aptitude) that has been constructed according to psychometric principles. In addition, the test should be administered, scored and interpreted in a standardised manner. Psychometric principles means that the test was developed in the following manner:
Constructed according to a valid theoretical rationale
Items (questions) developed on the basis of that rationale and in order to assess the construct the test purports to measure
Trialling of items
Analysis of the items for reliability and other statistical properties
Refinement of the items on the basis of the analysis
Re-trialling of the items
Further analysis and retrialling as necessary to p
Most psychological societies (e.g., British Psychological Society) and academics (e.g., DeVellis, 1991) suggest that an acceptable level of reliability for psychometric tests is:
Ability/Aptitude Tests: .80
Personality Tests: .70
Acceptable and unacceptable levels of the Cronbach’s Alpha coefficient
There is no single figure for acceptable validity of a psychometric test. When scrutinizing test manuals, the qualified user is recommended to assess whether or not validity studies have been conducted and whether they are criterion or construct validity studies. If you will be using your test in a norm-referenced manner, construct validity is essential. If you will be using your test in a criterion-referenced manner, criterion-related validity is more important. You also need to check who the study was conducted on, how many respondents were involved, whether the criterion used was both reliable and valid# and whether any noted correlations* were statistically significant given the stated sample size.
* E.g., a correlation between similar scales on 2 different tests for a construct validity study or a correlation between test score and performance in a criterion-related validity study.
# The criterion (e.g., performance appraisal scores) must be consistently and accurately assessed and must really measure what it claims to measure, i.e. performance, time management, customer service skills etc.
Reliability refers to consistency of measurement.
If you measure the length of your office wall on two occasions and get two different measurements, you know that something is not right! Maybe you changed your viewing angle of the tape measure between the measurements? Maybe you held the tape measure differently on each occasion?
Reliability is a crucial necessity in psychometric assessment. If there is a lack of reliability, there is a lack of consistency in the scores that test respondents receive and, as a result, the interpretation of their profile of behaviours and abilities changes.
No test can claim to be 100% reliable; indeed, neither can any method of assessment. Test publishers, distributors and users are all part of the process of enhancing the reliability of well designed psychometric tests. The following factors (and more) can all impact upon reliability:
Factors within the testing environment
Such as noise or temperature
Factors within the respondent
Such as mood or de
Validity means "Is the test fit for purpose?"
Some different types of validity:
Face Validity (low level of importance overall)
Asks: "Do the questions appear to measure what the test purports to measure?"
Important for: Respondent buy-in
How assessed: Simply by looking at the questions
Content Validity
Asks: "Do there appear to be enough suitable questions to measure the complete construct we are trying to measure?"
Important for: E
Clients and the general public often comment on the length of psychometric assessments and ask why they are so long. It needs to be kept in mind that for psychometric assessments to have utility and be effective when assessing people for various purposes, the assessment has to be reliable and valid for the situation.
No psychometric assessment is 100% accurate, and measurement errors from a variety of sources can affect the results. The length (i.e. the number of items) of the assessment affects the reliability of the assessment, and research has demonstrated that measurement errors are smaller in longer assessments than in shorter assessments. In addition, a larger number of items better represents the abstract characteristics that are being assessed. For example, when assessing personality, one cannot expect to obtain an accurate picture of an individual through a few questions, therefore more items are needed. It has to be noted that beyond a certain point, increasing the number of items will not provide further increases in reliability, as other factors such as fatigue will set in.
It is for this reason that good psychometric assessments will have a large number of items and therefore require some time for the can
One of the first things clients will want to know when choosing who to work with when ordering psychometric tests is “why should I choose xyz company”?
As the field of psychometrics continues to grow, overseas publishers are working hard to make inroads into local markets. Clients should therefore be wary of the expertise (or lack of it) in organisations that are distributing tests.
We firmly believe that those in the best place to distribute psychometric tests are those who have a background in personality psychology and/or organisational psychology. In fact this premise was shared by many reputable test publishers until relatively recently.
Greed and motivation to expand market share have taken over in many cases and some test publishers have delegated test distribution to non-psychologists or those with short-course qualifications in this area.
The downsides of this are tremendous. Not only does it threaten the very integrity of the test and the industry, but it brings to the fore concerns regarding malpractice and the like.
Choosing the right psychometric test
Human resource professionals are unlikely to need any convincing that the use of psychometric tests as an aid to employee selection and development is probably at an all-time high. The increase in the use of aptitude and personality tests in the workplace is a positive thing provided the tests are chosen and used properly. This article discusses what decision-makers should look for in order to be confident they are making the right test choice.
The Hong Kong website of an employee testing system that is marketed worldwide claims:
"Really, what is the most effective way to evaluate the reliability and validity of any assessment tests so to help us to know exactly how to find the right productive people with certainty and predictability without any catastrophe in hiring any wrong people who simply look good?"
"The most workable and effective answer of the above questions is simply to TEST THE PEOPLE YOU KNOW VERY WELL; then you know which assessment test can be valid and reliable to use!"