|
Information Systems & Conceptual Schema
Introduction
Conceptual Schema are diagrams used in the
planning and design of databases. These diagrams have their own set of
symbols and rules. An understanding of conceptual schema diagrams and how
to produce them is very useful to anybody intending to construct complex
databases such as those involving relationships between tables in MS
Access. The term conceptual schema is often abbreviated to CS.
Problem Solving With Databases
You will already be familiar with Polya's 4
step process for problem solving from the work you have done when studying the
Algorithm & Programming section of the IPT course in Y11. This process can
be applied to problem solving with databases when producing an information
system.
George Polya was a well known Mathematician at
Stanford University in the USA. Polya was fascinated by the way in which
people solve problems and wrote widely on the subject. His most famous
book was called How To Solve It and it was in this work that he described the
four common steps taken when solving a problem. These are described below:
-
1. Define the problem
Define the problem so that its is clearly
understood and time is not wasted in following false directions.
In database design the problem definition
phase can be divided into two further parts. Firstly, problem
identification which is a general description of the problem outlining
the benefits of using an information system to solve this specific
problem. Secondly, conceptualisation which is a statement
describing the aim of the information systems and any specific objective
associated with it.
2. Plan the solution
Design an algorithm to solve the problem.
In database design the solution planning
phase is known as formulisation. The planning for an information
system involves the development of a conceptual schema. This in
turn leads to the development of what is called a relational schema.
Conceptual schema will be described in more detail later.
3. Implement the solution
Implement the solution means actioning the
planned algorithm. When working with Visual Basic this is writing,
testing and debugging code.
In database design it means realisation
of a solution by creating an actual working information system using a database
tool such as MS Access. This process may involve: -
- defining tables including data types and
field lengths
- adding relationships between tables
- populating tables by carefully adding
data
- creating appropriate queries to question
the database
- creating necessary forms and reports
- removing unwanted user components from
the Access GUI.
In addition, technical documentation
and a user manual is considered advisable.
4. Evaluate the solution
Evaluate the solution to make sure that fulfils
its intended purpose as describes in the problem definition phase.
In database design this involves the use of
test data to check that the information system works as intended. A formal
evaluation may be produced in written form. It would be expected to
include remarks on performance, reliability and user feedback.
Conceptual Schema
Just as Polya's four step process outlines the
method by which which problems should be effectively solved so there is another
process that outlines the most effective way to create conceptual schema and
relational schema. This method is also named after its creator, or at
least one of them, and is called Nijssen's Information Analysis Method (NIAM).
NIAM was, in fact jointly developed by two
Professors, G M Nijsssen and E D Falkenberg. Their process for the
analysis and representation of information is frequently represented in a number
of ways. However, for practical purposes it can best be summarised in six
steps: -
1. State the elementary facts.
2. Sketch an initial draft of the conceptual
schema diagram.
3. Eliminate surplus entities and identify
derived facts.
4. Add uniqueness constraints.
5. Add mandatory roles and cardinality
constraints.
6. Check the conceptual schema by populating
it with examples.
Let us look at each of these six stages in
turn.
State the elementary facts
Elementary facts are based upon a given
specific body of information on a related subject. This body of
information is referred to as the universe of discourse often abbreviated
to UoD. An elementary fact is a simple, unambiguous sentence that makes a
statement about the universe of discourse. For example: -
Eric Clapton plays guitar.
This fact is said to be a binary
fact because it associates two entities. An entity is a thing in
the information system. In other words it could have been written: -
The musician with the name
Eric Clapton plays the instrument with the name guitar.
In the sentence above the words
"musician" and "instrument" define the types of entity
are classified as entity types. An entity is frequently
identified with a label. A label is a way of identifying an
entity. Common label types include numbers, ID codes and
names. A label instance is a single occurrence of such a
label. In the example above, the word "name", which appears
twice, is a label type, whilst the words "Eric Clapton" and
"guitar" are label instances. The final significant component of
the elementary fact above is called the predicate or role. A predicate or
role (the two terms are interchangeable) defines the action performed by
an entity. In the example above the predicate is the word
"plays".
An elementary fact might contain
three entities in which case it would be called a ternary fact.
Look at the example below: -
The musician with the name
Sting was born in the birth country with the name England in the birth year
with the number 1961.
Without any loss of meaning this
particular ternary fact can be divided into two binary facts. This process
is called reduction and the ternary fact is said to have been reduced into
two binary facts. Thus: -
The musician with the name
Sting was born in the birth country with the name England.
The musician with the name
Sting was born in the birth year with the number 1961.
An eliminatory fact containing
four entities is called a quaternary fact. An elementary fact containing
more than five entities is called an n-ary fact for whatever number of
entities it may in theory contain.
Some ternary facts cannot be
reduced because to do so would destroy the meaning of the fact. Look at
the table below and in particular the highlighted row: -
| Country |
City |
Population |
| Australia |
Newcastle |
428 000 |
| England |
Newcastle |
284 000 |
| USA |
Newcastle |
38 000 |
| Australia |
Perth |
1 240 000 |
| Scotland |
Perth |
42 000 |
A country with the name
Australia contains a city with the name Perth.
A city with the name Perth
has a population with the number 1 240 000.
The creation of two binary facts
by reduction of the ternary fact has led to inaccuracy. Only specific
cities called Perth have a population of 1 450 000 in our example, that in
Australia. In this case the entities cannot be split at all. To accommodate
this the elementary facts are combined to create a nested binary
fact. In the case of the example above this would results in: -
A country with the name
Australia contains a city with the name Perth that has a population with the
number 1 450 000.
This represents a combination of:-
(A country with the name
Australia contains a city with the name Perth) has a population with the
number 1 450 000.
Sketch an initial draft of the conceptual
schema diagram
Entities are represented by a
symbol in the shape of an ellipse. Predicates are represented by symbols
in the shape of a rectangular box. This predicate box is divided in half
with a vertical line and each half represents the role played by the nearest
entity. Usually only one half of the box is filled in as the role performed
by the adjoining half can usually be inferred from the completed half.
Labels
are represented with an elliptical symbol drawn with a broken line.
An
entity linked to an entity represents a fact whilst an entity linked to a label
represents a reference. To save space it is usual to condense references
so that they are contained within the entity symbol. When this is done the
label is represented in parentheses
underneath the entity type. The diagram below follows this convention
and is made possible because it is clear that the label type has a 1:1
relationship with the entity. In other words, there is only one musician
with a particular name.
Look at the following table
containing four columns that can be reduced to two binary facts and a corresponding
single reference.
| ID |
Musician |
Album |
Year of Release |
| 0101 |
Michael Jackson |
Dangerous |
1991 |
| 0102 |
Phil Collins |
Phil Collins Hits |
1998 |
| 0103 |
Sheryl Crow |
Tuesday Night Music Club |
1993 |
| 0104 |
Enya |
The Celts |
1992 |
| 0105 |
Shania Twain |
Come On Over |
1998 |
Notice
how the ID number has been condensed into the ID entity. This has been
done because the ID number is a unique identifier whilst it is possible that the
musician's name might appear more than once if they had released more than one
album.
In
the next example the information contained in the table can be written as a
ternary fact but cannot be reduced into two binary facts without loss of
accurate meaning. In this case it is necessary to produce nested
elementary facts. The resulting combined entity is by convention
represented by drawing an entity ellipse around the role box.
| Country |
City |
Population |
| Australia |
Newcastle |
428 000 |
| England |
Newcastle |
284 000 |
| South Africa |
Newcastle |
38 000 |
| Australia |
Perth |
1 240 000 |
| Scotland |
Perth |
42 000 |
The
combined entity has a separate relationship with the entity population.
Eliminate surplus entities and identify
derived facts
Surplus entities
Care needs to be taken when
constructing CS diagrams to keep the number of entities to a minimum. The
goal is to have as few entities as possible. The table below contains an
example of information that might lead to the construction of surplus entities.
| Musician |
Birth Year |
Death Year |
| Jimi
Hendrix |
1942 |
1970 |
| Elvis Presley |
1935 |
1969 |
| Michael Hutchins |
1961 |
1999 |
| Buddy
Holly |
1938 |
1959 |
It would be tempting to represent
the information contained in this table like this: -
However, birth year and death year
could be thought of as a single entity namely year. This would simplify
the above diagram so: -

Derived
facts
Derived
facts are those that can be calculated using existing information. For
example, in the table below the musician's age at death can be calculated by
subtracting the birth year from the death year.
| Musician |
Birth Year |
Death Year |
Died Age |
| Jimi
Hendrix |
1942 |
1970 |
28 |
| Elvis Presley |
1935 |
1977 |
42 |
| Bob Marley |
1945 |
1981 |
36 |
| Buddy
Holly |
1938 |
1959 |
21 |
A derived fact is identified on a
CS diagram by the inclusion of an asterisk beside the predicate box placed
nearest the derived entity.
Add uniqueness constraints
Uniqueness constraints are import
in CS diagrams because they are used to avoid data redundancy. Efficient
database design involves the removal of unnecessarily duplicated data. A
uniqueness constraint is represented by a bar usually (but not exclusively)
placed above the role box. The bar represents "one" whilst
absence of a bar indicates "many".
In
the diagram below we have a one:many relationship in which a single
musician might belong to a number of groups. For example Paul McCartney
belonged to both The Beatles and Wings.

In
the next diagram we have a many:one relationship in which a number of
musicians only belong to a single group. If there is more than one group
in the table the musician's name may appear several times but each group's name
only once.
A one:one relationship has
bars above each half of the role box that are separated in the middle. So
in the example below a musician's name and a group's name would only appear once
in each column.

In a many:many relationship there are no
uniqueness constraints and therefore no bars. This would be the case if,
for example, a number of musicians belong to a number of groups. The
musician's names and the group's names appear more than once in each column of a
table. If uniqueness is achieved though a combination, in other words two
entities can be combined to form a single entity, then uniqueness is
demonstrated by drawing the bar across the role box without a break in the
centre (not illustrated).
CS diagrams appear more complex when uniqueness
constraints are applied to nested elementary facts. Notice the uniqueness constraint
is placed inside the entity ellipse. The uniqueness constraints in
the diagram below indicate that there is only one city of a given name
associated with a particular country. Further the city only has a single
population.
| Country |
City |
Population |
| Australia |
Newcastle |
428 000 |
| England |
Newcastle |
284 000 |
| USA |
Newcastle |
38 000 |
| Australia |
Perth |
1 240 000 |
| Scotland |
Perth |
42 000 |
Although in reality we know this
to be inaccurate (there is more than one Newcastle in England, Australia and
America) we can take it be be true if told to assume that the example contains
significant data. The term significant indicates there is
enough information to be able to make a decision about uniqueness and data type.
Add mandatory roles and cardinality
constraints
Mandatory roles
A mandatory role is a relationship
between a role and an entity that must be executed. These are represented
on CS diagrams by a "man-dot". This is a small black circle
placed on the entity symbol nearest the role that must be performed. Non
mandatory roles are called optional and are represented by the absence of
a man-dot. Optional roles tend to be obvious as in tabular information
they cause gaps. This might be because the data is unavailable perhaps
because it is not required, has been lost, or has yet to be generated. It is
possible for an entity to have several roles, some of which are mandatory and
others which are not.
| Year |
Winner |
| 1995 |
Germany |
| 1996 |
Switzerland |
| 1997 |
|
| 1998 |
Japan |
| 1998 |
Japan |
Cardinality constraints
Cardinality constraints identify permitted
values or limits to specific labels. If the cardinality (countable number)
is quite small then the legal labels are normally listed in braces (curly
brackets).
Check the conceptual schema by populating
it with examples
This final step in the process involves
returning to the elementary facts which were stated and testing to see if each
fits into the CS design that has been produced. Each fact must appear on
the CS diagram but importantly, only once. Also, extra facts that were not
identified as elementary facts, should not be shown on a CS diagram.
|