Scale (social sciences) - AbsoluteAstronomy.com

In the social sciences, scaling is the process of measuring or ordering entities with respect to quantitative attributes or traits. For example, a scaling technique might involve estimating individuals' levels of extraversion, or the perceived quality of products. Certain methods of scaling permit estimation of magnitudes on a continuum, while other methods provide only for relative ordering of the entities.

See level of measurement for an account of qualitatively different kinds of measurement scales.

Comparative and noncomparative scaling

With comparative scaling, the items are directly compared with each other (example: Do you prefer Pepsi or Coke?). In noncomparative scaling, each item is scaled independently of the others (example: How do you feel about Coke?).

Composite measures

Composite measures of variables are created by combining two or more separate empirical indicators into a single measure. Composite measures measure complex concepts more adequately than single indicators, extend the range of scores available and are more efficient at handling multiple items.

In addition to scales, there are two other types of composite measures. Indexes are similar to scales except multiple indicators of a variable are combined into a single measure. The index of consumer confidence, for example, is a combination of several measures of consumer attitudes. A typology is similar to an index except the variable is measured at the nominal level.

Indexes are constructed by accumulating scores assigned to individual attributes, while scales are constructed through the assignment of scores to patterns of attributes.

While indexes and scales provide measures of a single dimension, typologies are often employed to examine the intersection of two or more dimensions. Typologies are very useful analytical tools and can be easily used as independent variables, although since they are not unidimensional it is difficult to use them as a dependent variable.

Data types

The type of information collected can influence scale construction. Different types of information are measured in different ways.

Some data are measured at the nominal level. That is, any numbers used are mere labels: they express no mathematical properties. Examples are SKU inventory codes and UPC bar codes.
Some data are measured at the ordinal level. Numbers indicate the relative position of items, but not the magnitude of difference. An example is a preference ranking.
Some data are measured at the interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. Examples are attitude scales and opinion scales.
Some data are measured at the ratio level. Numbers indicate magnitude of difference and there is a fixed zero point. Ratios can be calculated. Examples include: age, income, price, costs, sales revenue, sales volume, and market share.
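
The four levels above can be illustrated with a minimal Python sketch. It pairs each level with a summary statistic that is conventionally treated as meaningful at that level (mode, median, mean). The mapping, the function name `summarize`, and all sample values are illustrative assumptions, not part of the original text.

```python
from statistics import mean, median, mode

# Assumed, conventional pairing of measurement level and summary statistic.
SUMMARY_BY_LEVEL = {
    "nominal": mode,    # numbers or codes are labels: report the most common one
    "ordinal": median,  # order is meaningful: report the middle position
    "interval": mean,   # differences are meaningful: report the arithmetic mean
    "ratio": mean,      # fixed zero point: means and ratios are both meaningful
}

def summarize(values, level):
    """Return a central-tendency summary suited to the given level."""
    return SUMMARY_BY_LEVEL[level](values)

# Hypothetical sample data, one list per level.
sku_codes = ["A17", "B02", "A17", "C44"]   # nominal
preference_ranks = [1, 3, 2, 2, 5]         # ordinal
attitude_scores = [2.0, 3.5, 4.0, 4.5]     # interval
prices = [10.0, 20.0, 40.0]                # ratio

print(summarize(sku_codes, "nominal"))
print(summarize(preference_ranks, "ordinal"))
print(summarize(attitude_scores, "interval"))
# A statement such as "four times as expensive" only makes sense at the ratio level.
print(prices[2] / prices[0])
```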

Scale construction decisions

What level of data is involved (nominal, ordinal, interval, or ratio)?
What will the results be used for?
Should you use a scale, index, or typology?
What types of statistical analysis would be useful?
Should you use a comparative scale or a noncomparative scale?
How many scale divisions or categories should be used (1 to 10; 1 to 7; −3 to +3)?
Should there be an odd or even number of divisions? (Odd gives neutral center value; even forces respondents to take a non-neutral position.)
What should the nature and descriptiveness of the scale labels be?
What should the physical form or layout of the scale be? (graphic, simple linear, vertical, horizontal)
Should a response be forced or be left optional?

Comparative scaling techniques

Pairwise comparison scale – a respondent is presented with two items at a time and asked to select one (example: Do you prefer Pepsi or Coke?). This is an ordinal level technique when a measurement model is not applied. Krus and Kennedy (1977) elaborated the paired comparison scaling within their domain-referenced model. The Bradley–Terry–Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959) can be applied in order to derive measurements provided the data derived from paired comparisons possess an appropriate structure. Thurstone's Law of comparative judgment can also be applied in such contexts.
Rasch model scaling – respondents interact with items and comparisons are inferred between items from the responses to obtain scale values. Respondents are subsequently also scaled based on their responses to items given the item scale values. The Rasch model has a close relation to the BTL model.
Rank-ordering – a respondent is presented with several items simultaneously and asked to rank them (example: Rank the following advertisements from 1 to 10.). This is an ordinal level technique.
Bogardus social distance scale – measures the degree to which a person is willing to associate with a class or type of people. It asks how willing the respondent is to make various associations. The results are reduced to a single score on a scale. There are also non-comparative versions of this scale.
Q-Sort – Up to 140 items are sorted into groups based on a rank-order procedure.
Guttman scale – This is a procedure to determine whether a set of items can be rank-ordered on a unidimensional scale. It utilizes the intensity structure among several indicators of a given variable. Statements are listed in order of importance. The rating is scaled by summing all responses until the first negative response in the list. The Guttman scale is related to Rasch measurement; specifically, Rasch models bring the Guttman approach within a probabilistic framework.
Constant sum scale – a respondent is given a constant sum of money, scrip, credits, or points and asked to allocate these to various items (example: If you had 100 yen to spend on food products, how much would you spend on product A, on product B, on product C, etc.?). This is an ordinal level technique.
Magnitude estimation scale – In a psychophysics procedure invented by S. S. Stevens, people simply assign numbers to the dimension of judgment. The geometric mean of those numbers usually produces a power law with a characteristic exponent. In cross-modality matching, instead of assigning numbers, people manipulate another dimension, such as loudness or brightness, to match the items. Typically the exponent of the psychometric function can be predicted from the magnitude estimation exponents of each dimension.
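
As an illustration of how paired-comparison data can be turned into scale values, here is a minimal Python sketch of fitting the Bradley–Terry–Luce model mentioned under pairwise comparison. It uses a simple iterative update (each item's strength is its total wins divided by a sum over its comparisons) that is a standard way to fit this model. The function name `fit_btl`, the iteration count, and the win counts are assumptions made for the example. The sketch also assumes every item has been compared at least once and has won at least once.

```python
def fit_btl(wins, iterations=200):
    """Estimate BTL strengths from a matrix where wins[i][j] is the
    number of times item i was preferred over item j."""
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (sum of strengths).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]  # normalize to sum to 1
    return strengths

# Hypothetical data: three products, each pair compared ten times.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_btl(wins)
# Under the model, P(i preferred over j) = strengths[i] / (strengths[i] + strengths[j]).
p_first_over_second = strengths[0] / (strengths[0] + strengths[1])
print(strengths, p_first_over_second)
```

The fitted strengths lie on a ratio-like scale only to the extent that the model fits the data, which is the proviso stated in the text.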

Non-comparative scaling techniques

Continuous rating scale (also called the graphic rating scale) – respondents rate items by placing a mark on a line. The line is usually labeled at each end. There is sometimes a series of numbers, called scale points (say, from zero to 100), under the line. Scoring and codification are difficult.
Likert scale – Respondents are asked to indicate the amount of agreement or disagreement (from strongly agree to strongly disagree) on a five- to nine-point scale. The same format is used for multiple questions. This categorical scaling procedure can easily be extended to a magnitude estimation procedure that uses the full scale of numbers rather than verbal categories.
Phrase completion scales – Respondents are asked to complete a phrase on an 11-point response scale in which 0 represents the absence of the theoretical construct and 10 represents the theorized maximum amount of the construct being measured. The same basic format is used for multiple questions.
Semantic differential scale – Respondents are asked to rate an item on various attributes, using a 7-point scale. Each attribute requires a scale with bipolar terminal labels.
Stapel scale – This is a unipolar ten-point rating scale. It ranges from +5 to −5 and has no neutral zero point.
Thurstone scale – This is a scaling technique that incorporates the intensity structure among indicators.
Mathematically derived scale – Researchers infer respondents' evaluations mathematically. Two examples are multidimensional scaling and conjoint analysis.
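
As a concrete illustration of the Likert format described above, the following minimal Python sketch codes five verbal categories as 1 to 5 and sums them across questions. Summing the items, the optional reverse-keying of negatively worded items, and the name `likert_total` are assumptions made for the example and reflect common practice, not anything specified in the text.

```python
# Five-point agreement categories, coded 1 (strongly disagree) to 5 (strongly agree).
LABELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
SCORE = {label: position + 1 for position, label in enumerate(LABELS)}

def likert_total(responses, reverse_keyed=()):
    """Sum coded responses; items whose index is in reverse_keyed are
    flipped (1 <-> 5, 2 <-> 4) before summing. Reverse keying is an
    assumed convention for negatively worded items."""
    total = 0
    for index, label in enumerate(responses):
        score = SCORE[label]
        if index in reverse_keyed:
            score = len(LABELS) + 1 - score
        total += score
    return total

# Hypothetical answers of one respondent to three questions.
answers = ["agree", "strongly agree", "disagree"]
print(likert_total(answers))                     # 4 + 5 + 2
print(likert_total(answers, reverse_keyed={2}))  # 4 + 5 + 4
```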

Scale evaluation

Scales should be tested for reliability, generalizability, and validity. Generalizability is the ability to make inferences from a sample to the population, given the scale you have selected. Reliability is the extent to which a scale will produce consistent results. Test-retest reliability checks how similar the results are if the research is repeated under similar circumstances. Alternative forms reliability checks how similar the results are if the research is repeated using different forms of the scale. Internal consistency reliability checks how well the individual measures included in the scale are converted into a composite measure.
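
Internal consistency is often summarized with Cronbach's alpha. That statistic is not named in the text above, so the following Python sketch should be read as one common choice and not as the method the text prescribes. The function name `cronbach_alpha` and the sample ratings are assumptions. Rows are respondents and columns are items.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of the respondents' total scores), using sample variances."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: four respondents, three items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(cronbach_alpha(ratings))
```

Values near 1 indicate that the items vary together. When every item gives identical scores for each respondent, alpha equals exactly 1.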

Scales and indexes have to be validated. Internal validation checks the relation between the individual measures included in the scale, and the composite scale itself. External validation checks the relation between the composite scale and other indicators of the variable, indicators not included in the scale. Content validation (also called face validity) checks how well the scale measures what it is supposed to measure. Criterion validation checks how meaningful the scale criteria are relative to other possible criteria. Construct validation checks what underlying construct is being measured. There are three variants of construct validity: convergent validity, discriminant validity, and nomological validity (Campbell and Fiske, 1959; Krus and Ney, 1978). The coefficient of reproducibility indicates how well the data from the individual measures included in the scale can be reconstructed from the composite scale.
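
The coefficient of reproducibility can be sketched in Python for binary (yes/no) items of the kind used in a Guttman scale. The sketch assumes one common error-counting convention: each respondent's answers are compared with the ideal pattern implied by their total score, and every mismatch counts as an error. Other conventions exist and give different values. The function name and the data are hypothetical.

```python
def coefficient_of_reproducibility(responses):
    """Return 1 - errors / (respondents * items) for 0/1 responses."""
    n_items = len(responses[0])
    # Order items from most to least frequently endorsed.
    order = sorted(range(n_items), key=lambda i: -sum(row[i] for row in responses))
    errors = 0
    for row in responses:
        ordered = [row[i] for i in order]
        score = sum(ordered)
        # Ideal pattern: the respondent endorses exactly the `score` easiest items.
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(actual != expected for actual, expected in zip(ordered, ideal))
    return 1 - errors / (len(responses) * n_items)

# Hypothetical data: the last respondent deviates from the cumulative pattern.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]
print(coefficient_of_reproducibility(answers))
```

A value of 1 means every response pattern can be reconstructed exactly from the respondent's composite score.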