References
1. Richardson ME, Adams CP, McCartney TPG. An analysis of tooth measuring methods in dental casts. Trans Eur Orthod Soc. 1963; 285-301.
2. Ayoub AF, Wray D, Moos KF, Jin J, Niblett TB, Urquhart C, Mowforth P, Siebert P. A three-dimensional imaging system for archiving dental study casts: a preliminary report. Int J Adult Orthod Orthognathic Surg. 1997; 12:79-84.
3. Consumer Protection Act. London: HMSO.
4. Medical and Dental Defence Union of Scotland. Essential Guide to Medical and Dental Records. 2007.
5. British Orthodontic Society. London: British Orthodontic Society.
6. Harradine N, Suominen R, Stephens C, Hathorn I, Brown I. Holograms as substitutes for orthodontic study casts: a pilot clinical trial. Am J Orthod Dentofacial Orthop. 1990; 98:110-116.
7. Mah J, Freshwater M. The cutting edge. 3D digital models using laser technology. J Clin Orthod. 2003; 37:101-103.
8. Brook PH, Shaw WC. The development of an index of orthodontic treatment priority. Eur J Orthod. 1989; 11:309-320.
9. Linder-Aronson S. Orthodontics in the Swedish Public Dental Health Service. Trans Eur Orthod Soc. 1974; 233-240.
10. Evans R, Shaw W. Preliminary evaluation of an illustrated scale for rating dental attractiveness. Eur J Orthod. 1987; 9:314-318.
11. So LLY, Tang ELK. A comparative study using the Occlusal Index and the Index of Orthodontic Treatment Need. Angle Orthod. 1993; 63:57-64.
12. Burden DJ, Holmes A. The need for orthodontic treatment in child population of the United Kingdom. Eur J Orthod. 1994; 16:395-399.
13. Burden DJ, Mitropoulos CM, Shaw WC. Residual orthodontic treatment need in a sample of 15- and 16-year-olds. Br Dent J. 1994; 176:220-224.
14. Richmond S, Roberts CT, Andrews M. Use of the Index of Orthodontic Treatment Need (IOTN) in assessing the need for orthodontic treatment pre- and post-appliance therapy. Br J Orthod. 1994; 21:175-184.
15. Üçüncü N, Ertugay E. The use of the Index of Orthodontic Treatment Need (IOTN) in a school population and referred population. J Orthod. 2001; 28:45-52.
16. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968; 70:213-220.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33:37-46.
18. Buchanan IB, Downing A, Stirrups DR. A comparison of the Index of Orthodontic Treatment Need applied clinically and to diagnostic records. Br J Orthod. 1994; 21:185-188.
19. Mok C, Zhou L, McGrath C, Hagg U, Bendeus M. Digital images as an alternative to orthodontic study casts in assessing malocclusion and orthodontic treatment need. Acta Odontol Scand. 2007; 65:362-368.
20. Veenema AC, Katsaros C, Boxum SC, Bronkhorst EM, Kuijpers-Jagtman AM. Index of Complexity, Outcome and Need scored on plaster and digital models. Eur J Orthod. 2009; 31:281-286.
21. Peluso MJ, Josell SD, Levine SW, Lorei BJ. Digital models: an introduction. Semin Orthod. 2004; 10:226-238.
22. Mullen SR, Martin CA, Ngan P, Gladwin M. Accuracy of space analysis with emodels and plaster models. Am J Orthod Dentofacial Orthop. 2007; 132:346-352.
23. Fleming PS, Marinho V, Johal A. Orthodontic measurements on digital study models compared with plaster models: a systematic review. Orthod Craniofac Res. 2011; 14:1-16.
Index of Orthodontic Treatment Need applied to plaster models and to their three-dimensional digital equivalents
Niall JP McGuinness, James P McDonald
Dental Update 2024; 6:4, 707-709
Orthodontic plaster casts are fragile and bulky, and an alternative method of storage that allows accurate dental cast measurement is desirable. Four examiners determined IOTN scores from 30 sets of plaster casts on two occasions and from their three-dimensional digital equivalents on two further occasions. Three-dimensional digital models were found to be suitable substitutes for plaster models when scoring the Dental Health Component of the IOTN, although there was less agreement in the scores for the Aesthetic Component.
Clinical Relevance: The application of three-dimensional digital casts in orthodontics.
Plaster study models (or casts) have been a mainstay of clinical records in dentistry for many years and are an integral part of both dental practice and dental research. Measurements made on such casts are more than accurate enough for clinical purposes;1 the casts provide an ideal three-dimensional, time-related record of the original dental status and of the results of treatment,2 and they are relatively inexpensive to produce.
For medico-legal purposes in the United Kingdom, the Consumer Protection Act3 requires all patient records to be retained for a minimum of ten years (Medical and Dental Defence Union of Scotland4), and the British Orthodontic Society5 recommends that study models be kept for 11 years or until the patient is 25 years old, whichever is later. This creates storage problems in terms of space and cost, in addition to the risk of damage owing to the brittle nature of dental casts,6 and highlights the need for an alternative method of storage.
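The retention recommendation quoted above ("11 years, or until the patient is 25 years old, whichever is the later") reduces to taking the later of two dates. The Python sketch below is illustrative only: the function names are hypothetical, it assumes the 11 years run from the date the models were taken (the recommendation as summarized here does not say from when), and it moves a 29 February anniversary to 28 February.

```python
from datetime import date

def add_years(d, years):
    """Same calendar day `years` later; 29 February becomes 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 February landing in a non-leap year.
        return d.replace(year=d.year + years, day=28)

def model_retention_end(models_taken, date_of_birth, keep_years=11, until_age=25):
    """Later of (models_taken + 11 years) and the patient's 25th birthday (assumed rule)."""
    return max(add_years(models_taken, keep_years),
               add_years(date_of_birth, until_age))

# Hypothetical patient: born 15 June 1998, models taken 1 March 2010.
print(model_retention_end(date(2010, 3, 1), date(1998, 6, 15)))
```

For this hypothetical adolescent patient the 25th birthday (15 June 2023) is later than the 11-year date (1 March 2021), so it governs; for an adult patient the 11-year date would usually apply.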
Computer-based record-keeping is becoming commonplace in orthodontics and has revolutionized the storage and handling of orthodontic records. Digital photography gives high-quality extra-oral and intra-oral images, and the pictures are easy to manipulate with currently available software. Digital radiographs are instantly viewable and, combined with on-screen digitization, allow rapid assessment and cephalometric measurement. The logical next step is the three-dimensional digital model, which has the potential to replace the plaster cast and thus eliminate storage problems.
Digital models became available to the profession at the American Association of Orthodontists (AAO) Annual Meeting in 2001; they were developed by GeoDigm (Minnesota, USA). Impressions and a bite registration can be posted to GeoDigm, where plaster models are fabricated. The models are then scanned using a proprietary non-destructive process that digitally maps the geometry of the plaster casts with an accuracy of ±0.01 mm.7 A laser stripe is projected onto the surface of a cast and its distortion is read by multiple cameras while the cast is orientated on multiple axes to expose all of its surfaces for scanning. The maxillary and mandibular digital casts are then articulated on the basis of the bite registration provided by the clinician.
Occlusal indices are tools used to assign either numerical or categorical values to malocclusions. Various indices have been developed to determine the need for treatment, severity of malocclusion, complexity of treatment or health gain due to treatment.
The Index of Orthodontic Treatment Need (IOTN)8 is based on the index used to assess treatment need within the Swedish Health Service.9 It is divided into two parts, a Dental Health Component (DHC) and an Aesthetic Component (AC), which together give an indication of treatment need. The index was designed as a possible means of prioritizing resources for those patients with the most severe malocclusions.
The DHC of the IOTN has five grades (Table 1), ranging from 1 (no treatment need) to 5 (great treatment need). The AC of the IOTN10 consists of a ten-point scale illustrated by a series of photographs that were rated for attractiveness by a lay panel and selected as being equally spaced through the range of grades (Figure 1). There is a general consensus in the UK that a DHC of less than 4 and an AC score below 7 do not justify treatment by a hospital-based consultant service, except for teaching or research purposes.
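Read literally, the consensus above places a case below the hospital threshold only when both components fall below their cut-offs, so a DHC of 4 or more, or an AC of 7 or more, is enough to exceed it. A minimal Python sketch of that reading follows; the function name is hypothetical and the code is an illustration of the rule of thumb, not a clinical decision tool.

```python
def exceeds_hospital_threshold(dhc, ac):
    """Return True unless both DHC < 4 and AC < 7 (the UK consensus cut-offs)."""
    if dhc not in range(1, 6):
        raise ValueError("DHC grade must be an integer from 1 to 5")
    if ac not in range(1, 11):
        raise ValueError("AC score must be an integer from 1 to 10")
    # Complement of "DHC < 4 and AC < 7": either component at or above its cut-off.
    return dhc >= 4 or ac >= 7

print(exceeds_hospital_threshold(3, 6))  # both components below their cut-offs
print(exceeds_hospital_threshold(3, 8))  # the AC alone reaches its cut-off
```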
Table 1. Dental Health Component of the IOTN
Grade 1 (none)
Extremely minor malocclusions, including displacements <1 mm
Grade 2 (little)
2a: Increased overjet >3.5 mm but <6 mm with competent lips
2b: Reverse overjet >0 mm but ≤1 mm
2c: Anterior or posterior crossbite with ≤1 mm discrepancy between retruded contact position and intercuspal position
2d: Contact point displacements >1 mm but ≤2 mm
2e: Anterior or posterior openbite >1 mm but ≤2 mm
2f: Increased overbite ≥3.5 mm without gingival contact
2g: Pre-normal or post-normal occlusions with no other anomalies; includes up to half a unit discrepancy
Grade 3 (borderline need)
3a: Increased overjet >3.5 mm but ≤6 mm with incompetent lips
3b: Reverse overjet >1 mm but ≤3.5 mm
3c: Anterior or posterior crossbites with >1 mm but ≤2 mm discrepancy between retruded contact position and intercuspal position
3d: Displacement of teeth >2 mm but ≤4 mm
3e: Lateral or anterior openbite >2 mm but ≤4 mm
3f: Increased and incomplete overbite without gingival or palatal trauma
Grade 4 (treatment need)
4a: Increased overjet >6 mm but ≤9 mm
4b: Reverse overjet >3.5 mm with no masticatory or speech difficulties
4c: Anterior or posterior crossbites with >2 mm discrepancy between retruded contact position and intercuspal position
4d: Severe displacements of teeth >4 mm
4e: Extreme lateral or anterior openbites >4 mm
4f: Increased and complete overbite with gingival or palatal trauma
4h: Less extensive hypodontia requiring pre-restorative orthodontics or orthodontic space closure to obviate the need for a prosthesis
4l: Posterior lingual crossbite with no functional occlusal contact in one or more buccal segments
4m: Reverse overjet >1 mm but <3.5 mm with recorded masticatory and speech difficulties
4t: Partially erupted teeth, tipped and impacted against adjacent teeth
4x: Existing supernumerary teeth
Grade 5 (treatment required)
5a: Increased overjet >9 mm
5h: Extensive hypodontia with restorative implications (more than one tooth missing in any quadrant) requiring pre-restorative orthodontics
5i: Impeded eruption of teeth (apart from third molars) due to crowding, displacement, the presence of supernumerary teeth, retained deciduous teeth and any pathological cause
5m: Reverse overjet >3.5 mm with reported masticatory and speech difficulties
5p: Defects of cleft lip and palate
5s: Submerged deciduous teeth
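The overjet rows of Table 1 (2a, 3a, 4a and 5a) form a simple threshold cascade. The Python sketch below encodes only those four rows and is illustrative, not a complete DHC scorer: it ignores every other trait in the table, the function name is hypothetical, and because Table 1 gives 2a as "<6 mm" but 3a as "≤6 mm", it assumes that an overjet of exactly 6 mm with competent lips is graded 2a.

```python
def overjet_subgrade(overjet_mm, lips_competent):
    """Map an increased overjet (mm) to its Table 1 sub-grade, or None if <=3.5 mm."""
    if overjet_mm < 0:
        raise ValueError("reverse overjets use the 2b/3b/4b/4m/5m criteria instead")
    if overjet_mm > 9:
        return "5a"
    if overjet_mm > 6:
        return "4a"
    if overjet_mm > 3.5:
        # Lip competence only matters in the >3.5 to 6 mm band.
        # Assumption: exactly 6 mm with competent lips is treated as 2a.
        return "2a" if lips_competent else "3a"
    return None  # no overjet-based sub-grade

print(overjet_subgrade(7.0, lips_competent=True))   # 4a
print(overjet_subgrade(5.0, lips_competent=False))  # 3a
```

Checking the highest band first mirrors how the table is read: an overjet of 10 mm satisfies ">3.5 mm" as well, but only the most severe matching row applies.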
The IOTN has been shown to be valid and reproducible in a number of studies,8,11,12,13,14,15 with kappa scores ranging from 0.71 to 0.91 for both inter- and intra-examiner agreement.
Because of the fragility of plaster study casts, together with their bulk and the associated storage problems, an alternative method of storage that allows dental cast measurement is desirable. Digital models allow both cross-sectional analysis (Figure 2) and algorithm analysis (Figure 3).
To date, no research has been carried out to determine the validity and reproducibility of Index of Orthodontic Treatment Need scores derived from plaster study casts and their three-dimensional digital equivalents.
The aims of this investigation were as follows:
To select a sample of orthodontic study casts from the archives of the Department of Orthodontics, representing a wide range of malocclusions (IOTN categories 1–5);
To produce three-dimensional digital equivalents of these casts using laser scanning and dedicated computer software;
To assign IOTN scores, using calibrated examiners, to both the study casts and their three-dimensional digital equivalents viewed on screen, and to compare the scores from the two media, together with inter- and intra-examiner reliability.
The null hypothesis (H0) was that the level of agreement in IOTN scores (DHC and AC) would be the same whether assessed from study models or from the same models in digital format.
Materials and methods
The sample consisted of 30 sets of study casts representing a range of malocclusions (IOTN categories 1–5), selected on the basis of their IOTN scores and, in particular, the Dental Health Component. It was estimated that, at a 95% confidence level, 19 casts would provide the conventional statistical power of 80%, and 30 casts the more desirable power of 95%. All study models were of good quality and were accurately and uniformly trimmed, with their bases parallel to the occlusal plane. The inclusion criteria were good-quality models; all teeth having normal morphology; and no visible attrition, no caries and no restorations affecting the mesiodistal or buccolingual crown diameters.
The plaster casts were professionally scanned by a digital model company using a 3Shape R250™ laser scanner, with an overall scanning cycle time of approximately 7 minutes.
The three-dimensional equivalents were stored on a laptop computer, and the principal investigator (PS) was instructed in the use of 3Shape OrthoAnalyzer™ software and in viewing the models on screen. OrthoAnalyzer™ is a dedicated software package for the analysis and planning of orthodontic treatment, using data from patients' cases scanned with the 3Shape R250™ 3D scanner.
The casts were initially scored on two occasions at least four weeks apart by one experienced and calibrated consultant orthodontist (NM) to set the ‘gold standard’.
This exercise provided two scores (DHC and AC) for each cast on each of two occasions, ie four scores per model: two DHC and two AC. These scores were used for comparison with those of four other examiners. The four specialists were chosen because they were experienced in the use of computers and had previously been calibrated in the use of the IOTN by attending the calibration course in occlusal indices at Cardiff University (UK). Each examiner was asked to examine the study casts on two separate occasions at least four weeks apart and, on third and fourth occasions, to examine the digital model equivalents on the computer.
A strength of the study design is the use of four examiners, provided that all four are well calibrated (weighted kappa ≥0.6) for both inter- and intra-examiner reliability. A further feature was the inclusion, without the examiners' knowledge, of two duplicate models (ie two pairs of identical models), bringing the total number of models examined to 32. This allowed a descriptive analysis of the extent to which the duplicates were scored differently. The duplicate pairs were not presented consecutively, although they were examined only minutes apart. The true sample size therefore remained 30 distinct models.
For the Dental Health and Aesthetic Components of the IOTN, the weighted kappa statistic16 was used, with scores >0.6 regarded as good agreement and scores >0.8 as very high (almost perfect) agreement.17 Data were entered into an Excel file and then analysed in SAS v9.1 (SAS Institute, NC, USA) for cross-tabulation and calculation of kappa statistics.
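The study does not state whether linear or quadratic weights were used, or exactly how SAS computed the statistic. As a hedged illustration only, the Python sketch below implements Cohen's weighted kappa16 with disagreement weights (|i − j|/(k − 1))^power, where power=1 gives linear and power=2 quadratic weights, and labels a result using the >0.6 and >0.8 thresholds adopted above. The function names and toy scores are hypothetical, not study data.

```python
def weighted_kappa(rater1, rater2, categories, power=1):
    """Cohen's weighted kappa for two sets of ordinal scores of the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty score lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed joint proportions: p[i][j] = share of items scored i, then j.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        p[pos[a]][pos[b]] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        return (abs(i - j) / (k - 1)) ** power

    observed = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - observed / expected


def describe(kappa):
    """Label a kappa using the thresholds adopted in this study."""
    if kappa > 0.8:
        return "very high (almost perfect)"
    if kappa > 0.6:
        return "good"
    return "below the good-agreement threshold"


first = [1, 2, 3, 3]   # toy DHC grades, first occasion
second = [1, 2, 3, 2]  # toy DHC grades, second occasion
dhc_grades = [1, 2, 3, 4, 5]
linear = weighted_kappa(first, second, dhc_grades)
quadratic = weighted_kappa(first, second, dhc_grades, power=2)
print(round(linear, 2), describe(linear))
print(round(quadratic, 2), describe(quadratic))
```

For the DHC the categories would be 1 to 5 and for the AC 1 to 10. With the toy scores, linear weighting gives 5/7 (about 0.71) and quadratic weighting 0.80, so the choice of weights alone can change the estimate noticeably; this is one reason to report which scheme was used.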
Results
Gold standard
The intra-examiner weighted kappa scores for the gold standard were 0.92 for the DHC and 0.84 for the AC (Table 2). These scores were very high, and higher than those of any of the four examiners on plaster casts, so there is good evidence that this was an appropriate gold standard.
Table 2. Intra-examiner agreement of the gold standard: weighted kappa score (95% confidence interval)
Dental Health Component: 0.92 (0.80, 1.00)
Aesthetic Component: 0.84 (0.74, 0.95)
Inter-examiner reproducibility of plaster casts
When compared with the gold standard, all four examiners had a weighted kappa score of 0.66 or greater (range 0.66–0.90) for the DHC and a score of 0.64 or greater in the Aesthetic Component (range 0.64–0.72), as shown in Table 3.
Table 3. Inter-examiner agreement with the gold standard, plaster casts: weighted kappa score (95% confidence interval)
Examiner A: DHC 0.70 (0.53, 0.90); AC 0.64 (0.52, 0.81)
Examiner B: DHC 0.88 (0.79, 0.99); AC 0.69 (0.53, 0.83)
Examiner C: DHC 0.66 (0.53, 0.82); AC 0.72 (0.60, 0.82)
Examiner D: DHC 0.90 (0.75, 0.97); AC 0.68 (0.56, 0.82)
Intra-examiner reproducibility of plaster casts
For the Dental Health Component scores, all four examiners were found to have a weighted kappa score of 0.75 or above (range 0.75–0.91). For the Aesthetic Component scores, all four examiners showed a weighted kappa score of 0.72 or above (range 0.72–0.80), as shown in Table 4.
Table 4. Intra-examiner agreement, plaster casts: weighted kappa score (95% confidence interval)
Examiner A: DHC 0.91 (0.72, 1.00); AC 0.80 (0.74, 0.91)
Examiner B: DHC 0.85 (0.74, 0.98); AC 0.75 (0.64, 0.85)
Examiner C: DHC 0.75 (0.61, 0.90); AC 0.72 (0.62, 0.90)
Examiner D: DHC 0.78 (0.65, 0.95); AC 0.78 (0.65, 0.87)
Every examiner scored well above the acceptable reliability threshold of 0.6.
Intra-examiner reproducibility of digital casts
For the Dental Health Component scores, all four examiners were found to have a weighted kappa score of 0.87 or above (range 0.87–0.94). For the Aesthetic Component scores, all four examiners showed a weighted kappa score of 0.77 or above (range 0.77–0.92), as shown in Table 5.
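Table 5. Intra-examiner agreement, digital models: weighted kappa score (95% confidence interval)
Examiner A: DHC 0.91 (0.80, 1.00); AC 0.85 (0.76, 0.93)
Examiner B: DHC 0.94 (0.86, 1.00); AC 0.92 (0.85, 0.99)
Examiner C: DHC 0.94 (0.85, 1.00); AC 0.85 (0.76, 0.93)
Examiner D: DHC 0.87 (0.75, 1.00); AC 0.77 (0.67, 0.87)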
Comparison between plaster and digital models
For the four examiners, the weighted kappa scores for the DHC in the comparison between plaster and digital models ranged from 0.71 to 0.91, and those for the AC from 0.54 to 0.86, as shown in Tables 6 and 7.
Table 6. Agreement between plaster and digital models, first examinations: weighted kappa score (95% confidence interval)
Examiner A: DHC 0.84 (0.72, 0.97); AC 0.54 (0.36, 0.72)
Examiner B: DHC 0.81 (0.73, 0.98); AC 0.71 (0.61, 0.86)
Examiner C: DHC 0.74 (0.59, 0.89); AC 0.68 (0.54, 0.82)
Examiner D: DHC 0.71 (0.55, 0.87); AC 0.72 (0.61, 0.87)
Table 7. Agreement between plaster and digital models, second examinations: weighted kappa score (95% confidence interval)
Examiner A: DHC 0.91 (0.81, 1.00); AC 0.68 (0.54, 0.81)
Examiner B: DHC 0.88 (0.77, 0.99); AC 0.69 (0.58, 0.81)
Examiner C: DHC 0.76 (0.61, 0.92); AC 0.64 (0.48, 0.80)
Examiner D: DHC 0.91 (0.80, 1.00); AC 0.86 (0.76, 0.96)
Analysis of duplicates
Each duplicate pair was scored a total of 36 times; the first pair was scored consistently on 26 occasions (72%) and the second on 27 occasions (75%), giving an overall 'duplicate consistency rate' of 74% (53/72). Most discrepancies were of one point, but on five occasions the difference was two points (in each case, the AC score).
Consistency was 75% for Examiner A, 81% for Examiner B, 88% for Examiner C and 69% for Examiner D. DHC consistency was 83%, compared with 64% for the AC. The first and second sets of scores for the duplicate plaster models were both 75% consistent, whilst the duplicate digital models were 56% consistent the first time and 88% consistent the second time. Table 8 shows that duplicate consistency was lower for the Aesthetic Component than for the Dental Health Component, and very slightly lower for the digital models than for the plaster models.
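Table 8. Duplicate consistency rates
AC: plaster 65%; digital 63%; all models 64%
DHC: plaster 85%; digital 81%; all models 83%
Both components: plaster 75%; digital 72%; all models 74%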
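The percentages in Table 8 can be reproduced from whole-number counts. The Python sketch below rests on two assumptions that the text does not state explicitly: each duplicate pair was compared once per component in every scoring session, with the gold standard's two plaster sessions included (4 examiners × 4 sessions + 2 = 18 sessions, which matches the 36 scorings per pair reported); and percentages were rounded half-up. The counts are back-calculated from the reported percentages under those assumptions, not taken from the raw data, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import floor

def pct(consistent, total):
    """Whole-number percentage, rounded half-up (Python's round() rounds half to even)."""
    return floor(Fraction(consistent, total) * 100 + Fraction(1, 2))

# Back-calculated (consistent, comparisons) per component and medium.
# Assumed denominators: plaster = (4 examiners + gold standard) x 2 sessions x 2 pairs = 20;
# digital = 4 examiners x 2 sessions x 2 pairs = 16.
counts = {
    ("AC", "plaster"): (13, 20),
    ("AC", "digital"): (10, 16),
    ("DHC", "plaster"): (17, 20),
    ("DHC", "digital"): (13, 16),
}

def rate(components, media):
    """Pool the counts over the given components and media."""
    consistent = sum(counts[(c, m)][0] for c in components for m in media)
    total = sum(counts[(c, m)][1] for c in components for m in media)
    return consistent, total, pct(consistent, total)

rows = {"AC": ["AC"], "DHC": ["DHC"], "Both components": ["AC", "DHC"]}
columns = [["plaster"], ["digital"], ["plaster", "digital"]]
table8 = {label: [rate(comps, media)[2] for media in columns]
          for label, comps in rows.items()}

for label, values in table8.items():
    print(label, values)  # plaster, digital, all models
```

Half-up rounding matters here: the digital AC rate of 10/16 is exactly 62.5%, which is reported as 63%, whereas Python's built-in round() would give 62.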
Discussion
This study found good inter-examiner reproducibility between the four examiners and the gold standard for the Dental Health Component of the IOTN scored on study casts (weighted kappa range 0.66–0.90), and good reproducibility for the Aesthetic Component (weighted kappa range 0.64–0.72). Intra-examiner reproducibility for the study casts was good to almost perfect for the DHC (weighted kappa range 0.75–0.91) and good for the AC (weighted kappa range 0.72–0.80). Intra-examiner reproducibility for the digital models was very good to almost perfect for the DHC (0.87–0.94) and good to almost perfect for the AC (0.77–0.92). When the scores from the plaster casts and their digital equivalents were compared, the weighted kappa scores for the DHC indicated a good to almost perfect level of agreement (0.71–0.91); for the AC, agreement ranged from moderate to very good (0.54–0.86).
There is no evidence that any examiner performed significantly better than the others. Each kappa score is also accompanied by a 95% confidence interval from which further inferences can be drawn. For example, in Examiner C's initial DHC comparison of plaster and digital casts, the kappa estimate for the study sample was 0.74, with a 95% confidence interval of 0.59–0.89. If the study were repeated many times with different sets of 30 casts, 95% of the intervals constructed in this way would be expected to contain Examiner C's true kappa.
There are essentially two factors that influenced the standard error:
The variability of the examiner responses; and
The sample size (ie the 30 casts).
In most cases the sample size was more than adequate to show clear results, although a minority of the results may have benefited from a larger sample. The sample size was calculated solely from the variability of the gold standard's intra-examiner reliability, because these were the only data available when the study was designed. As it turned out, the gold standard had higher reliability than any of the other examiners on plaster casts, and the variability of this reliability was greater than that of Examiner A, about the same as that of Examiner B and less than those of Examiners C and D. In other words, NM was the rater with the highest reproducibility, but Examiner A was the least variable. Because NM's variability was nonetheless relatively low, the overall variability of the group of examiners may have been underestimated.
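The link between sample size and precision can be made concrete with a rough calculation. The Python sketch below assumes a symmetric normal-approximation 95% interval and a standard error proportional to 1/√n; neither assumption is stated in the study, and several reported intervals are asymmetric or truncated at 1.00, so the figures are indicative only. Applied to Examiner C's interval of 0.59–0.89 from 30 casts, it gives a standard error of about 0.077, and doubling the sample to 60 casts would narrow the half-width from 0.15 to about 0.11, other things being equal. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

def se_from_ci(lower, upper, z=Z_95):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)

def projected_half_width(half_width, n_current, n_new):
    """Rescale a CI half-width, assuming SE is proportional to 1/sqrt(n)."""
    return half_width * math.sqrt(n_current / n_new)

# Examiner C, initial plaster-versus-digital DHC comparison: 0.74 (0.59, 0.89), n = 30
lower, upper = 0.59, 0.89
half = (upper - lower) / 2
print(f"approximate SE: {se_from_ci(lower, upper):.3f}")
print(f"half-width with 60 casts: {projected_half_width(half, 30, 60):.3f}")
```

Because precision improves only with the square root of n, halving an interval's width would need roughly four times as many casts, which is why a larger sample helps only the minority of borderline results.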
Aesthetic Component reliability scores are generally liable to be slightly lower than DHC scores because the AC uses a 10-point scale rather than a 5-point scale. However, because weighted kappa scores were used, the difference in scale size would be expected to account for only a slight difference in reliability. The generally lower reliability of the AC compared with the DHC, together with the higher intra-examiner AC reliability for digital models, suggests that reliable AC measurements may be harder to achieve with plaster models than with digital models.
Previous investigations into the application of occlusal indices to study models and alternative media found similar results. Buchanan et al18 found a high level of examiner reliability when the Dental Health Component of the IOTN was applied clinically to a group of patients and later to study models and photographs taken at the same visit; Aesthetic Component scores, however, showed poor reproducibility. Mok et al19 investigated two-dimensional images of study models viewed on a computer screen and found a high level of agreement for the DHC (kappa = 0.79) and lower agreement for the AC (kappa = 0.56) between scores obtained from orthodontic casts and from their images. Veenema et al20 found that differences in ICON scores between plaster and digital models were mostly not statistically significant (P values 0.07–0.19), except for one observer in one sample.
Advantages and disadvantages of digital models
Digital models can be integrated into a patient's electronic file, along with digital radiographs and clinical notes. Retrieval is fast and efficient because the models can be stored by patient name and number, and digital models can be viewed at multiple locations from computers linked to a central server.
Storage space is negligible for digital models when compared with conventional plaster models. The digital information for each case can be stored on the hard drive of a computer, on portable storage devices or on a central server. The digital ‘size’ of a set of orthodontic study models can be less than one megabyte.21 The digital models can be copied easily without the expense and inconvenience of plaster model duplication.
Three of the examiners had no previous experience of digital models, and three considered the digital models and plaster casts equally informative for clinical purposes. The remaining examiner, who found the digital models less informative, considered them generally harder to manipulate on screen and to see in sufficient detail, and disliked being unable to 'hold' the models for detailed analysis.
It was originally assumed that scoring digital models would be faster than scoring plaster models. However, it took between 90 and 150 minutes to score 32 sets of plaster models and between 180 and 240 minutes to score 32 sets of digital models. By contrast, Mullen et al22 concluded that, for a Bolton analysis, digital models can be as accurate as, and significantly faster than, digital callipers and plaster models. It is more difficult to judge the precise interdigitation of a digital model than that of a plaster model. This may improve with future software releases, but the on-screen images appeared to show openbites more prominently than their plaster counterparts. This could be a function of the zoom facility on the computer: greater magnification exaggerates features that would not normally be noticeable on plaster. The phenomenon should be acknowledged so that viewers can learn to distinguish a clinically significant problem from an insignificant but magnified one. Digital models may also allow clinicians to see imperfections and improve other aspects of a patient's malocclusion, which could in turn raise the standard of clinical outcomes. A review by Fleming et al23 recommended digital models as an alternative to conventional measurement (tooth size, arch length, irregularity index, arch width and crowding) on plaster models, although it stated that the evidence was of variable quality.
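Expressed per model, and assuming that each quoted time refers to one session in which all 32 sets were scored, the scoring times work out as follows:

```latex
\frac{90}{32} \approx 2.8\ \text{min}, \qquad \frac{150}{32} \approx 4.7\ \text{min} \qquad \text{(plaster, per model)}

\frac{180}{32} \approx 5.6\ \text{min}, \qquad \frac{240}{32} = 7.5\ \text{min} \qquad \text{(digital, per model)}

\frac{180}{90} = 2.0, \qquad \frac{240}{150} = 1.6
```

On that assumption, scoring on screen took roughly 1.6 to 2 times as long per model as scoring plaster.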
Because malocclusions can be scored reliably on digital models using the IOTN, as demonstrated in this study, a computer-based calibration exercise could be made available electronically for examiners who are already trained in the index. This would provide a standard against which previously calibrated examiners could update their skills. It should also be possible to provide a computer-based instruction module (including digital models) as a practice exercise for those who wish to be calibrated.
Conclusions
This study provides no evidence to reject the null hypothesis. There does not appear to be any difference between expert ratings of plaster and digital models for either component of the IOTN, although less agreement was observed between the two media for the Aesthetic Component.