How to Visualize Public Health Data?
Part one: Box Plot and Map
Dr. Mohsen Rezaeian (PhD, Epidemiologist, Associate Professor)
Social Medicine Department, Rafsanjan Medical School, Rafsanjan, Iran.
Correspondence:
Dr. Mohsen Rezaeian
Tel: +98 391 5234003
Fax: +98 391 5225209
Email: moeygmr2@yahoo.co.uk
ABSTRACT
Health care professionals, including family physicians, are increasingly becoming involved in public health data analyses. Data visualisation is the first step in data analysis and helps to disclose complex structures within data. The chief aim of the present article, which is the first in a series of two, is to discuss the pros and cons of two ways of visualising data, i.e. the box plot and the map, using a real public health data example.
Key words: Box plot, Map, Data visualization, Health care professionals.
Health care professionals are increasingly becoming involved in public health data analyses. They either have to analyse public health data themselves or have to use the results of analyses carried out by other health care professionals. Therefore, they have to be familiar with the different ways of analysing public health data. Data visualisation is the first step in data analysis and helps to disclose complex structure in data(1). From this point of view, data visualisation may not only create interest and attract the attention of the viewer but also provide a way of discovering the unexpected(2). In the present article, which is the first in a series of two, the pros and cons of two ways of visualising data, i.e. the box plot and the map, are discussed using a real public health data example.
One of the most useful methods of summarising data is to present the lowest value, the lower quartile, the median, the upper quartile and the highest value in a graph called a box plot(3). In this display, the median shows the central value and the distance between the lower and upper quartiles (the inter-quartile range) shows the variability of the data.
To make this graph, a box is drawn with its ends at the lower and upper quartiles and a crossbar at the median value. Next, a line is drawn from the lower quartile to the lowest value and from the upper quartile to the highest value. To complete the picture, outliers, i.e. values that lie beyond either of the following limits, are also indicated, usually with a circle symbol(3):
Lower quartile - 1.5 × inter-quartile range
Upper quartile + 1.5 × inter-quartile range
The application of the box plot will be demonstrated later on using a public health dataset.
"From the perspective of public health
practice, knowledge that a health problem is
concentrated in identifiable places is essential
for the efficient distribution of resources
for prevention, treatment or amelioration(4)."
Therefore, maps are becoming more and more important
in public health data analyses.
The production of attractive and informative disease maps complements any formal statistical analysis of spatial variation and, because of their attractiveness, maps will influence the recipient of the information much more than the associated statistics(5). Maps reveal geographical relations that are not obvious from numerical and tabular data(6).
However, as with any other graphical display, there are a number of principles that one has to follow in order to produce an informative map. For instance, selecting the appropriate administrative boundaries, selecting an appropriate colour scheme or hatching, and selecting an appropriate method of data classification are among the most important issues in mapmaking, all of which require careful consideration(5,7).
In the next section, using a real public health data example, I am going to illustrate one of these principles, i.e. selecting an appropriate method of data classification. For the rest of these principles I refer readers to other articles(4,5).
It should be noted that classification can be described as systematically grouping data based on one or more characteristics. This should result in a clearer picture and should also improve insight into the data. Research has also revealed that, in order to get an overview of the mapped theme at a single glance, the number of classes should not exceed seven(8).
PUBLIC HEALTH DATA EXAMPLE
The data used in this article come from the results of the Iranian National Demographic Health Survey (DHS), which was conducted in the year 2000(9). The piece of data selected for visualisation is the percentage of people over 15 years with hypertension in the then 28 provinces of Iran (Table 1). From the figures, which are presented in ascending order in Table 1, it is very difficult to summarise the data or to visualise any relationship between provinces.
Table 1 The percentage of people over 15 years with hypertension within different provinces of Iran

| Iranian Provinces | % of people over 15 years with hypertension |
|---|---|
| Gom | 7.1 |
| Bushehr | 7.5 |
| Sistan va Baluchestan | 7.9 |
| Khuzestan | 8.6 |
| Fars | 8.7 |
| Golestan | 8.7 |
| Semnan | 8.8 |
| Chahar Mahall va Bakhtiar | 9 |
| Azarbayjan-e-gharbi | 9.2 |
| Kordestan | 9.3 |
| Lorestan | 10.4 |
| Kohgiluyeh va Buyer Ahmad | 10.7 |
| Ilam | 10.8 |
| Mazandaran | 10.8 |
| Zanjan | 11.2 |
| Khorasan | 11.2 |
| Khorasan | 11.4 |
| Hormozgan | 11.6 |
| Kermanshah | 11.7 |
| Hamadan | 11.7 |
| Kerman | 12.4 |
| Ardabil | 12.5 |
| Tehran | 13.1 |
| Azarbayjan-e-shargi | 13.5 |
| Gilan | 15.1 |
| Qazvin | 15.9 |
| Markazi | 18.9 |
| Yazd | 19.3 |

In order to summarise the data, a box plot was produced (Diagram 1). As mentioned earlier, a number of important summary indices can be read from this graph. For instance, by looking at it one can easily visualise the following summary indices:
Lowest value = 7.10
Lower quartile = 8.85
Median = 11
Upper quartile = 12.47
Highest value = 16.20
Inter-quartile range = 3.62
Diagram 1 Box plot depicting the percentage of people over 15 years with hypertension within different provinces of Iran

One can also easily see that two provinces, i.e. Markazi and Yazd, are identified as outliers because of their high percentage of people over 15 years with hypertension, i.e. 18.9 and 19.3, respectively.
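
The quartiles and the outlier limits described above can be checked with a short calculation. The following is only a minimal sketch in Python: the function name `box_plot_summary` is hypothetical, and the choice of the "exclusive" quartile rule (interpolation at position (n + 1) × p) is an assumption, made because it agrees with the quartiles listed above; other software may use a different rule and give slightly different quartiles.

```python
from statistics import quantiles

# Percentages from Table 1, in ascending order (28 provinces).
hypertension = [
    7.1, 7.5, 7.9, 8.6, 8.7, 8.7, 8.8, 9.0, 9.2, 9.3,
    10.4, 10.7, 10.8, 10.8, 11.2, 11.2, 11.4, 11.6, 11.7, 11.7,
    12.4, 12.5, 13.1, 13.5, 15.1, 15.9, 18.9, 19.3,
]


def box_plot_summary(values):
    """Return the quartiles, inter-quartile range, outlier limits and outliers."""
    # Assumption: "exclusive" interpolates at position (n + 1) * p.
    lower_q, median, upper_q = quantiles(values, n=4, method="exclusive")
    iqr = upper_q - lower_q
    lower_limit = lower_q - 1.5 * iqr
    upper_limit = upper_q + 1.5 * iqr
    # Values beyond either limit are flagged as outliers.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
    }


summary = box_plot_summary(hypertension)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the upper limit is about 17.9, so only the values 18.9 (Markazi) and 19.3 (Yazd) are flagged as outliers.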
Nevertheless, the box plot is still unable to reveal any spatial relationship between provinces. One has to use a map to reveal any such relationship.
Two maps were therefore produced from the current data, using two acceptable methods of classification. The first method is Quantile, which divides the observations evenly over the chosen number of classes. The name of this method depends on the number of classes: for instance, with four classes it is called Quartiles and with five classes, Quintiles(8). The second method is Equal Interval, in which the class width is equal for all classes(8).
For each map a white to black colouring scheme has been adopted. According to this scheme, provinces with a higher percentage of people over 15 years with hypertension are shown in a darker colour and vice versa.
Map 1 depicts a Quintiles classification of the percentage of people over 15 years with hypertension within different provinces of Iran. This map shows all 28 provinces of Iran divided as evenly as possible into five classes, i.e. three classes contain six provinces each whilst the other two contain five provinces each. Based on this map there are five provinces, i.e. Azarbayjan-e-shargi, Gilan, Qazvin, Markazi and Yazd, which are coloured black, indicating that they have a high percentage of people over 15 years with hypertension.
Map 1 Map depicting Quintiles classification of the percentage of people over 15 years with hypertension within different provinces of Iran
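
The Quintiles classification behind Map 1 can be sketched in a few lines of Python. This is only an illustration: the function name `quantile_classes` and the rank-based rule for assigning classes are assumptions, and mapping software may place the class breaks differently, especially when several provinces share the same value.

```python
from collections import Counter

# Percentages from Table 1, in ascending order (28 provinces).
hypertension = [
    7.1, 7.5, 7.9, 8.6, 8.7, 8.7, 8.8, 9.0, 9.2, 9.3,
    10.4, 10.7, 10.8, 10.8, 11.2, 11.2, 11.4, 11.6, 11.7, 11.7,
    12.4, 12.5, 13.1, 13.5, 15.1, 15.9, 18.9, 19.3,
]


def quantile_classes(values, n_classes):
    """Give each value a class index (0 = lowest) so class sizes are near-equal."""
    # Rank the observations, then spread the ranks evenly over the classes.
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes


classes = quantile_classes(hypertension, 5)
sizes = Counter(classes)
darkest = [v for v, c in zip(hypertension, classes) if c == 4]
print([sizes[k] for k in range(5)])  # number of provinces in each class
print(darkest)                       # values that fall in the darkest class
```

With this rule three classes receive six provinces and two receive five, and the darkest class holds the values 13.5, 15.1, 15.9, 18.9 and 19.3, i.e. the five provinces named in the text.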

Map 2 depicts an Equal Interval classification of the percentage of people over 15 years with hypertension within different provinces of Iran. To produce this map, the lowest percentage, i.e. 7.1, was subtracted from the highest percentage, i.e. 19.3. The resulting figure, i.e. 12.2, was then divided by 5, i.e. the number of classes, which gives 2.44. This means that the width of each class must be set at 2.44. Based on this map there are only two provinces, i.e. Markazi and Yazd, which are coloured black, indicating that they have a high percentage of people over 15 years with hypertension.
Map 2 Map depicting Equal Interval classification of the percentage of people over 15 years with hypertension within different provinces of Iran
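
The arithmetic of the Equal Interval classification behind Map 2 can be reproduced as follows. This is a minimal sketch: the function name `equal_interval_classes` is hypothetical, and the convention that a value lying exactly on a break belongs to the upper class (except the maximum, which stays in the top class) is an assumption.

```python
from collections import Counter

# Percentages from Table 1, in ascending order (28 provinces).
hypertension = [
    7.1, 7.5, 7.9, 8.6, 8.7, 8.7, 8.8, 9.0, 9.2, 9.3,
    10.4, 10.7, 10.8, 10.8, 11.2, 11.2, 11.4, 11.6, 11.7, 11.7,
    12.4, 12.5, 13.1, 13.5, 15.1, 15.9, 18.9, 19.3,
]


def equal_interval_classes(values, n_classes):
    """Return the class width, the class breaks and a class index per value."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes          # (19.3 - 7.1) / 5
    breaks = [lowest + k * width for k in range(n_classes + 1)]
    # min(...) keeps the maximum value inside the top class.
    classes = [min(int((v - lowest) / width), n_classes - 1) for v in values]
    return width, breaks, classes


width, breaks, classes = equal_interval_classes(hypertension, 5)
sizes = Counter(classes)
print(round(width, 2))                   # class width
print([round(b, 2) for b in breaks])     # class breaks
print([sizes[k] for k in range(5)])      # number of provinces in each class
```

Here the breaks fall at about 7.1, 9.54, 11.98, 14.42, 16.86 and 19.3, so the five classes contain 10, 10, 4, 2 and 2 provinces, and only 18.9 (Markazi) and 19.3 (Yazd) fall in the darkest class.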

It should be noted that both maps are correct; they simply look at the problem from different angles. Whilst Map 1 divides the provinces evenly, Map 2 is more in accordance with the box plot in that it highlights the outliers. Both maps also show that more provinces in the northern and central parts of Iran have a high percentage of people with hypertension compared to the southern provinces.
Although maps reveal the spatial relationships
that might not be seen in tables(10)
we should not rely on the presentation of a
single map (5) because a single map is only
one of the large number of maps that might be
produced from the same data(11).
On the one hand, it has been pointed out that the end point of data visualisation is not necessarily a single 'correct' map(12), and, on the other hand, it has been argued that it is crucial to ensure that correct rules are applied in the mapping process(13).
Furthermore, one should also bear in mind that other graphical displays, such as the box plot, may also help health care professionals to better summarise and visualise their data(5).
REFERENCES
1. Cleveland WS. Visualising data. Summit, NJ: Hobart Press, 1993.
2. Everitt BSE, Dunn G. Applied multivariate data analysis. London: Arnold, 2001.
3. Dunn G, Everitt B. Clinical biostatistics. London: Edward Arnold, 1995.
4. Rezaeian M, Dunn G, St Leger S, Appleby L. Geographical epidemiology, spatial analysis and geographical information systems: a multidisciplinary glossary. J Epidemiol Community Health 2007; 61: 98-102.
5. Rezaeian M, Dunn G, St Leger S, Appleby L. The production and interpretation of disease maps: a methodological case-study. Soc Psychiatry Psychiatr Epidemiol 2004; 39: 947-954.
6. Parchman ML, Ferrer RL, Blanchard KS. Geography and geographic information systems in family medicine research. Fam Med 2002; 34: 132-137.
7. Smans M, Esteve J. Practical approach to disease mapping. In: Elliott P, Cuzick J, English D, Stern R. Geographical and environmental epidemiology: methods for small area studies, pp 141-150. Oxford: Oxford University Press, 1996.
8. Kraak M, Ormeling F. Cartography: visualisation of spatial data. Harlow: Longman, 1996.
9. National Demographic Health Survey (DHS). Iranian Ministry of Health and Medical Education; 2001.
10. Bell BS, Broemeling LD. A Bayesian analysis for spatial processes with application to disease mapping. Stat Med 2000; 19: 957-974.
11. Monmonier M. How to lie with maps. Chicago: The University of Chicago Press, 1996.
12. Gatrell AC, Bailey TC. Interactive spatial data analysis in medical geography. Soc Sci Med 1996; 42: 843-855.
13. Cliff AD. Analysing geographically related disease data. Stat Methods Med Res 1995; 4: 93-101.