EPICURE Design Note 62.0
VAX/Rdb Relational Database

Debra S. Baddorf

Originally written for CS425

IIT Computer Science Database course

30-NOV-1986

OUTLINE

INTRODUCTION

DDL - Data Definition Language

Define Database

Define Field

Define Relation

Define Index

Define Protection

Define View

Change Relation

Constraints

Entity Deletions

DML - Data Manipulation Language

Transactions

Data Updates

Selecting Data

Relational Operators

Statistical Expressions

RELATIONAL ALGEBRA SET OPERATIONS

Selection

Projection

Product

Join

Intersection

Union

Division

Difference

GENERAL TOPICS

Query Optimization

Embedded Rdb Commands

Journaling

Locking

Security

Data Integrity - Rollback

Relational Integrity

entity integrity

referential integrity

Tuple Value Restrictions

integrity constraints

functional dependencies

REFERENCES

INTRODUCTION

Digital's VAX Rdb/VMS (Relational DataBase) has relational conceptual and external models. Rdb/VMS runs on Digital's VAX or MicroVAX computers under the VMS or MicroVMS operating systems. There is also Rdb/ELN, which runs under VAXELN and so without a general-purpose operating system. Rdb supports concurrent use (up to 512 users) and provides both locking and rollback facilities. An Rdb database can span multiple disks, so its size is limited only by the amount of storage available.

DDL - Data Definition Language

DEFINE DATABASE 'db-name'.

This command creates the VMS file which will contain the database definition and the data. If the VAX Common Data Dictionary (CDD), a separate product, is available, the database definitions are also stored there. The data dictionary allows user-written programs to include the record and field definitions at compile time and thus be assured that the program's record variable structure matches that of the database.

DEFINE FIELD field-name field-attributes.

Fields can be defined using the DEFINE FIELD command, or later within the DEFINE RELATION command. The DEFINE FIELD command allows some small measure of domain support in Rdb/VMS. The field-attributes clause includes selection of a datatype from the following choices:

The field-attributes clause can also include a VALID IF clause, which restricts attribute values to any set expressible as a conditional expression. Another field attribute clause is MISSING VALUE IS. The missing value clause indicates what to print in the field if it is null. An example might be

To ensure that a primary key field does not contain null values, you can specify

VALID IF NOT MISSING.

The missing value and validity clauses are both optional.
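The behaviour of the two clauses can be sketched outside Rdb. The following Python model is only an illustration, not Rdb syntax: the class `FieldDef`, its method names, the STATUS range, and the text "unknown" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FieldDef:
    """Hypothetical model of a field definition (not Rdb's own structures)."""
    name: str
    valid_if: Optional[Callable[[Any], bool]] = None  # plays the role of VALID IF
    missing_value: Optional[str] = None               # plays the role of MISSING VALUE IS

    def check(self, value: Any) -> bool:
        """Return True if the value may be stored in this field."""
        if self.valid_if is None:
            return True  # the validity clause is optional
        return self.valid_if(value)

    def display(self, value: Any) -> str:
        """Return the text to print, substituting the missing-value text for nulls."""
        if value is None:
            return self.missing_value if self.missing_value is not None else ""
        return str(value)


# A key field that rejects nulls (the VALID IF NOT MISSING case).
sno = FieldDef("SNO", valid_if=lambda v: v is not None)

# A field with an assumed range check and an assumed stand-in text for nulls.
status = FieldDef("STATUS",
                  valid_if=lambda v: v is None or 0 <= v <= 100,
                  missing_value="unknown")

print(sno.check(None))        # False: a null key is rejected
print(status.display(None))   # unknown
```

The point of the sketch is that the validity test is applied when a value is stored, while the missing-value text only affects what is displayed.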

In a DEFINE RELATION statement, the field names included in the relation can be:

field-name. Defined elsewhere with the DEFINE FIELD statement.

field-name attributes. A full field definition.

A local field definition based on a global field.

A field computed from other fields:

PRICE DATATYPE IS SIGNED WORD.

DISCOUNT_PRICE COMPUTED BY (PRICE * 0.9).

LATE_PAYMENT COMPUTED BY (PRICE * 1.1).

Rdb's DEFINE INDEX allows the user to specify whether duplicates are permissible, and lists the fields included in the index.

DEFINE PROTECTION FOR object

Protection can be defined for an entire database or for a relation or view using VMS access control lists (ACLs). Access rights delimit the actions a user is authorized to perform, and range from the privilege to read or update data (modify, store, and erase are each separately grantable) to the privilege to change the data definitions. ACLs allow specification of rights for an individual user, a class of users (by rights identifiers such as SECRETARIES, MANAGERS, DATAENTRY), and access mode (BATCH, NETWORK, INTERACTIVE, LOCAL, DIALUP, REMOTE).

In a DEFINE VIEW statement, a name clause can be a qualified field name from one of the participating relations, a rename of a qualified field name, or a calculation from a qualified field:

S.SNAME.

SUP_NAM FROM S.SNAME.

GRMWT COMPUTED BY (P.WEIGHT * 454).

The "rse" specified in the DEFINE VIEW statement is a record selection expression. It can include any data selection statements valid in Rdb.

Rdb permits data updates through a view only if the view is derived from a single relation.

You cannot use DELETE on a field that is referenced by a view, a constraint, or a COMPUTED BY field. You must remove all references first. A new field could be added to our suppliers database by performing the following:

CONSTRAINTS

Rdb/VMS provides you with a feature that helps the database to maintain referential integrity. For example, to ensure that any shipments entered in the SP relation correspond to a valid supplier and part, you can define the following constraint:

ANY is the equivalent of the English "there exists". You cannot place constraints on views. Also, the use of many complex constraints can inhibit concurrent database access, since Rdb puts locks on any relation it needs to consult for a write transaction.
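The "there exists" test behind such a constraint can be sketched in Python. This is an illustration of the logic only, not Rdb syntax; the function name `shipment_allowed` and the sample records are assumptions, while S, P, SP, SNO and PNO are the names used in this note.

```python
# Assumed sample data; S and P stand for the supplier and part relations.
S = [{"SNO": "S1"}, {"SNO": "S2"}]
P = [{"PNO": "P1"}, {"PNO": "P2"}]


def shipment_allowed(sp_record, suppliers, parts):
    """True if there exists a matching supplier and there exists a matching part."""
    has_supplier = any(s["SNO"] == sp_record["SNO"] for s in suppliers)
    has_part = any(p["PNO"] == sp_record["PNO"] for p in parts)
    return has_supplier and has_part


ok = shipment_allowed({"SNO": "S1", "PNO": "P2", "QTY": 200}, S, P)
bad = shipment_allowed({"SNO": "S9", "PNO": "P2", "QTY": 100}, S, P)

print(ok)   # True: S1 and P2 both exist
print(bad)  # False: there is no supplier S9
```

Note that the check has to read S and P in order to validate a write to SP, which is why such constraints widen the set of relations touched by a write transaction.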

See Figure for examples of the define commands.

ENTITY DELETIONS

DML - Data Manipulation Language

TRANSACTIONS

You begin a session with a database (the file containing the relations of interest) by using the INVOKE database command. When you are through, you use the FINISH database command. For each desired set of operations, you issue a START_TRANSACTION command, in which you specify whether you want a read only transaction or a write transaction, which relations are to be reserved, and whether the relations involved may be shared. All operations are recorded in a run-unit journal file until you either COMMIT or ROLLBACK the transaction. This ensures that all the related changes are either made or not made together. After either a COMMIT or a ROLLBACK, the locks taken out by the system on all related tables are released. These locks include relations which are involved via a defined CONSTRAINT, as well as records actually specified in your query.
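The all-or-nothing behaviour of COMMIT and ROLLBACK can be sketched with a toy before-image copy. This is a minimal Python illustration under stated assumptions, not a description of how Rdb implements its run-unit journal; the class `Transaction` and its methods are hypothetical.

```python
import copy


class Transaction:
    """Toy all-or-nothing transaction over a dict of relations."""

    def __init__(self, database):
        self.database = database
        # Before-image: the data as it was when the transaction started.
        self.before_image = copy.deepcopy(database)

    def store(self, relation, record):
        self.database[relation].append(record)

    def commit(self):
        # Changes stay in place; the before-image is no longer needed.
        self.before_image = None

    def rollback(self):
        # Restore every relation to its state at the start of the transaction.
        self.database.clear()
        self.database.update(self.before_image)
        self.before_image = None


db = {"S": [{"SNO": "S1"}], "SP": []}

t1 = Transaction(db)
t1.store("S", {"SNO": "S2"})
t1.store("SP", {"SNO": "S2", "PNO": "P1"})
t1.rollback()                      # neither change survives
after_rollback = copy.deepcopy(db)

t2 = Transaction(db)
t2.store("S", {"SNO": "S2"})
t2.store("SP", {"SNO": "S2", "PNO": "P1"})
t2.commit()                        # both changes survive
```

Copying the whole database is far too expensive for real use; it is done here only to keep the sketch short.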

DATA UPDATES

New records are added with the STORE command. Its format is (via an example):

See also Figure . Both STATUS and CITY can be left blank if that information is not yet available. Information can be added later using the MODIFY command.

To eliminate a record you no longer want, use ERASE.

SELECTING DATA

Rdb's equivalent of SQL's SELECT or QUEL's RETRIEVE is the FOR ... PRINT ... END_FOR block. You specify a range of records with selection expressions in the FOR block, then specify which fields are to be displayed in the PRINT statement. In a way, the PRINT part of the process makes more sense than SQL's SELECT statement, because SQL "selects" the record for display purposes only. Rdb is clearer about its intent with the PRINT statement.

RELATIONAL OPERATORS

Rdb's set of permissible relational operators includes:

=          >          UNIQUE
NE         LT         LE         CONTAINING
<>         <          <=         STARTING WITH
BETWEEN    ANY        MISSING

Example: Print the first five distinct last names with "on" as the last 2 letters but 1.

We can list suppliers with the status value missing as follows:

STATISTICAL EXPRESSIONS

The following statistical expressions are available:

Fields with MISSING values are omitted for all but the COUNT function.
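The effect of this rule can be sketched in Python. The STATUS values below are assumed sample data, `None` stands in for a MISSING value, and the Python functions are stand-ins rather than Rdb's own expression names.

```python
# Assumed sample STATUS values; None stands for a MISSING value.
statuses = [20, 10, None, 20, 30]

# Values that are actually present.
present = [v for v in statuses if v is not None]

count_all = len(statuses)        # the count still includes the record with a missing STATUS
total = sum(present)             # missing value omitted
average = total / len(present)   # divided by 4 present values, not by 5 records
largest = max(present)
smallest = min(present)

print(count_all, total, average)  # 5 80 20.0
```

Had the missing value been treated as zero instead, the average would have come out as 16.0, which is why the omission matters.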

RELATIONAL ALGEBRA SET OPERATIONS

Some examples have been shown about the record selection capabilities of Rdb, but in order to claim it as a "relationally complete" system I'd like to (try to) prove that Rdb's query language can perform all of the relational algebra set operations. In the process I will show more examples of data retrieval.

SELECTION

English: Find data for all suppliers in Paris.

RA:

Rdb:
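As an illustration outside Rdb, what selection computes can be sketched in Python. The supplier tuples are assumed sample values, and the helper `select` is hypothetical.

```python
# Assumed sample supplier tuples.
S = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"SNO": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]


def select(relation, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


paris = select(S, lambda t: t["CITY"] == "Paris")

for t in paris:
    print(t["SNO"], t["SNAME"], t["STATUS"], t["CITY"])
```

Selection returns complete tuples; choosing which fields to show is the job of projection.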

PROJECTION

English: Show all supplier numbers and the city in which each is located.

RA:

Rdb:

To stray from my proof for a moment, and do a further comparison of Rdb and SQL, I note that Rdb will, by default, give you all of the values in a projection, without eliminating the duplicates. If you want duplicates eliminated, use the REDUCED TO clause. You can also sort records as desired.
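The difference between a projection that keeps duplicates and one that eliminates them can be sketched in Python. This is an illustration only, not Rdb syntax; the helpers `project` and `reduced` and the sample tuples are assumptions.

```python
# Assumed sample supplier tuples.
S = [
    {"SNO": "S1", "CITY": "London"},
    {"SNO": "S2", "CITY": "Paris"},
    {"SNO": "S3", "CITY": "Paris"},
    {"SNO": "S4", "CITY": "London"},
]


def project(relation, fields):
    """Projection: keep only the named fields, duplicates included."""
    return [tuple(t[f] for f in fields) for t in relation]


def reduced(rows):
    """Drop duplicate rows, keeping the first occurrence of each."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


sno_city = project(S, ["SNO", "CITY"])     # four rows, all distinct
cities = project(S, ["CITY"])              # four rows, with duplicates
distinct_cities = sorted(reduced(cities))  # duplicates removed, then sorted
```

Projecting on CITY alone gives four rows by default; only the explicit duplicate-elimination step brings it down to two.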

PRODUCT (unrestricted)

English: Print all possible combinations of suppliers and parts data.

RA:

Rdb:

JOIN (natural)

English: Print suppliers and parts data where the supplier is in the same city as the part.

RA:

Rdb:

Note that the natural join over matching fields is easily done by using R CROSS S OVER CITY, but that you must specify the fields to be printed if you do not want both CITY fields. Also, the OVER CITY clause can be equivalently replaced by WITH SZ.CITY = PZ.CITY, if that is clearer or more natural.
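The relationship between the unrestricted product and the natural join over CITY can be sketched in Python. This is an illustration only; the sample tuples are assumed, and the join is written as a product followed by a restriction so that the single CITY column in the result is explicit.

```python
from itertools import product

# Assumed sample tuples for suppliers and parts.
S = [{"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"}]
P = [
    {"PNO": "P1", "CITY": "London"},
    {"PNO": "P2", "CITY": "Paris"},
    {"PNO": "P3", "CITY": "Rome"},
]

# Unrestricted product: every supplier paired with every part.
all_pairs = list(product(S, P))

# Natural join over CITY: keep the matching pairs and emit CITY only once.
joined = [
    {"SNO": s["SNO"], "PNO": p["PNO"], "CITY": s["CITY"]}
    for s, p in all_pairs
    if s["CITY"] == p["CITY"]
]

print(len(all_pairs))  # 6 combinations (2 suppliers x 3 parts)
print(joined)
```

Naming the output fields explicitly is what removes the second CITY column, mirroring the remark above about specifying the fields to be printed.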

INTERSECTION

English: Print supplier numbers for suppliers in London currently shipping anything.

RA:

Rdb:
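As an illustration outside Rdb, the intersection can be sketched with Python sets. The sample tuples are assumed values.

```python
# Assumed sample data.
S = [
    {"SNO": "S1", "CITY": "London"},
    {"SNO": "S2", "CITY": "Paris"},
    {"SNO": "S4", "CITY": "London"},
]
SP = [{"SNO": "S1", "PNO": "P1"}, {"SNO": "S2", "PNO": "P1"}]

# Supplier numbers of suppliers located in London.
in_london = {s["SNO"] for s in S if s["CITY"] == "London"}

# Supplier numbers that appear in at least one shipment.
shipping = {sp["SNO"] for sp in SP}

# Intersection: in London AND currently shipping something.
result = in_london & shipping

print(sorted(result))  # ['S1']
```

S4 is in London but ships nothing, and S2 ships but is in Paris, so only S1 is in both sets.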

UNION

English: List names of suppliers located in London or supplying to London.

RA:

Rdb:
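As an illustration outside Rdb, the union can be sketched with Python sets. The sample tuples are assumed, and "supplying to London" is read here as supplying a part whose CITY is London, which is an interpretation rather than something the note states.

```python
# Assumed sample data.
S = [
    {"SNO": "S1", "SNAME": "Smith", "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},
    {"SNO": "S3", "SNAME": "Blake", "CITY": "Paris"},
]
P = [{"PNO": "P1", "CITY": "London"}, {"PNO": "P2", "CITY": "Paris"}]
SP = [{"SNO": "S2", "PNO": "P1"}, {"SNO": "S3", "PNO": "P2"}]

# Names of suppliers located in London.
located = {s["SNAME"] for s in S if s["CITY"] == "London"}

# Names of suppliers shipping a part that is located in London (assumed reading).
london_parts = {p["PNO"] for p in P if p["CITY"] == "London"}
london_shippers = {sp["SNO"] for sp in SP if sp["PNO"] in london_parts}
supplying = {s["SNAME"] for s in S if s["SNO"] in london_shippers}

# Union: either condition is enough.
result = located | supplying

print(sorted(result))  # ['Jones', 'Smith']
```

A supplier satisfying both conditions would still appear once, since a union eliminates duplicates.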

DIVISION

English: List supplier numbers for suppliers who supply all parts.

RA:

Rdb:
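Division is the least intuitive of the operations, so a small Python sketch may help. It is an illustration only; the helper `divide` and the sample data are assumptions.

```python
# Assumed sample data: the set of all part numbers and the (SNO, PNO) shipment pairs.
P = {"P1", "P2", "P3"}
SP = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
    ("S2", "P1"), ("S2", "P2"),
}


def divide(pairs, divisor):
    """Division: first components that are paired with every element of the divisor."""
    candidates = {sno for sno, _ in pairs}
    return {
        sno for sno in candidates
        if all((sno, pno) in pairs for pno in divisor)
    }


result = divide(SP, P)

print(sorted(result))  # ['S1']
```

S2 is excluded because no (S2, P3) pair exists; the test is "for all parts", which is the universal counterpart of the existential ANY.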

DIFFERENCE

English: List supplier numbers for suppliers who do not supply part P2.

RA:

Rdb:
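As an illustration outside Rdb, the difference can be sketched with Python sets. The sample data are assumed values.

```python
# Assumed sample data.
all_suppliers = {"S1", "S2", "S3", "S5"}
SP = {("S1", "P1"), ("S1", "P2"), ("S2", "P2"), ("S3", "P1")}

# Supplier numbers that supply part P2.
supplies_p2 = {sno for sno, pno in SP if pno == "P2"}

# Difference: all suppliers minus those supplying P2.
result = all_suppliers - supplies_p2

print(sorted(result))  # ['S3', 'S5']
```

S5 supplies nothing at all and is correctly included. Filtering SP for records with a part other than P2 would give the wrong answer, since it would return S1 (who also supplies P1) and miss S5.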

GENERAL TOPICS

QUERY OPTIMIZATION

Rdb optimizes queries at run time. RDO is the utility which accepts interactive queries. There is also a precompiler for programs written in Basic, Cobol, Fortran, or Pascal. Other languages may be interfaced with Rdb by directly calling RDO.

EMBEDDED RDB COMMANDS

Most, if not all, interactive Rdb commands can be used embedded in a program. A few others are also necessary. START_STREAM tells Rdb to begin processing the list of records specified in the record selection expression. FETCH advances the pointer to the next record in the stream. GET is used to assign values from the record to a host variable. END_STREAM stops the current stream of retrieved records and sets to null the position pointer for that stream.

JOURNALING

Rdb provides "after-image journaling" by which all transactions since the start of the journal file can be recreated. The cumulative contents of this file can be applied to a restored copy of the database, recreating all transactions except the one actually in progress at point of failure.

Another journal, the "run-unit journal", keeps a record of any changes to the database definitions or the data itself. This is the file which holds copies of data before a transaction takes place, and to which the database is restored if a rollback is requested. This type of journaling is not optional (the other is); it also uses a separate file for each database user. Run-unit journals are automatically applied after a system crash, so they help if the system crashes in the middle of a transaction. After-image journaling is needed in case the medium on which the database resides actually fails.

LOCKING

To ensure that data accessed for write remains constant throughout a transaction, Rdb places locks on all relations involved in the transaction. Thus, even an update to our SP relation causes locks to be put on the S and P relations also, since constraints require Rdb to consult these relations for checks on SNO and PNO. Rdb handles read-only transactions (in a way I don't fully understand) by providing a "snapshot" file, in parallel with the data file, to allow read operations to take place during a write.

Depending on the data being accessed (how the query is optimized; how tightly the records are restricted) the needed locking during transactions can be done at the file level, the index or the cluster level, or at the individual record level. The last, of course, permits the most activity to go on around it, without other users needing to wait until a lock is released.

SECURITY

As mentioned in the DDL section, there are security measures to allow or disallow individual users or classes of users access to the database. Users can be entirely prohibited access, or can be permitted read access only. Write access is different from the ability to alter the relation (field creation or alteration, or relation alteration).

DATA INTEGRITY - Rollback

Rdb provides START_TRANSACTION and COMMIT or ROLLBACK termination, as a relationally complete system ought. Journaling is also an integrity assurance, as it provides the rollback capability even after a system crash, and also recovery from backup of all transactions performed in the event of disk failure.

RELATIONAL INTEGRITY

Entity Integrity

Entity integrity is not built in, but is available in the same way that SQL and QUEL provide it. A primary key field or fields can be defined so as not to accept null values. A unique index can be created, and its presence assured by allowing only the DBA to delete the index. Comments about the primary key can be included in the definition of the relation and are stored in the system catalog with the data definitions.

Referential Integrity

Foreign key fields can also be defined as not null, and indexes can be defined on them. The authorization mechanisms cannot be applied on a per field basis, so preventing on-line modification of the foreign key or of the primary key is not possible. However, the CONSTRAINT mechanism is even better, since the associated primary keys may be checked before allowing a value to be entered in a foreign key field. Constraints can be defined to check for existence, uniqueness, or non-existence of a value, and can be defined so as to be checked when the STORE or MODIFY is issued or only when the COMMIT is issued. Thus, a delete or update of a primary key value can be CONSTRAINED so that the record cannot be deleted (erased) or a key value modified if a foreign key value is referencing it. Similarly, a foreign key value cannot be inserted or modified unless there is a matching value in the primary field referenced.

There is therefore no need to make sure that only one program maintains a given foreign key. Foreign key comments can be placed in the relation definitions.

Due to the CONSTRAINT mechanism, I would venture to say that Rdb is closer to "fully relational" than is either SQL/DB2 or INGRES, though perhaps the amount I've learned about those two systems from Date does not do them full justice.

TUPLE VALUE RESTRICTIONS

Integrity Constraints

Constraints on the value of a tuple (for example, the height of a human should be less than 20 feet) can be enforced with Rdb's VALID IF clause in a field's definition. A range of values can be specified or a list of values checked. Any combination that can be specified with a Boolean expression and constants is possible.

Functional Dependencies

X -> Y can be defined using Rdb's CONSTRAINT mechanism. For instance (farfetched, but I'm grasping for examples!) you could define a constraint such that if the occupation were "wet nurse" then the sex must necessarily be female. Or perhaps one might want a constraint that if the part is a "screw" then the weight may not be more than 50 pounds (we're not building bridges, we assume).
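Both kinds of check can be sketched in Python: a general test that a dependency X -> Y holds over a set of tuples, and the conditional "screw" rule from the paragraph above. This is an illustration only, not Rdb's CONSTRAINT syntax; the helpers `fd_holds` and `screw_weight_ok`, the field names PNAME and WEIGHT, and the sample parts are assumptions.

```python
def fd_holds(relation, x_fields, y_fields):
    """True if tuples that agree on the X fields always agree on the Y fields."""
    seen = {}
    for t in relation:
        x = tuple(t[f] for f in x_fields)
        y = tuple(t[f] for f in y_fields)
        # Remember the first Y seen for this X; any later mismatch breaks the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True


def screw_weight_ok(part):
    """Implication: if the part is a screw, its weight may not exceed 50."""
    return part["PNAME"] != "Screw" or part["WEIGHT"] <= 50


# Assumed sample parts.
parts = [
    {"PNO": "P3", "PNAME": "Screw", "WEIGHT": 17},
    {"PNO": "P4", "PNAME": "Screw", "WEIGHT": 14},
    {"PNO": "P1", "PNAME": "Nut", "WEIGHT": 12},
]

pno_determines_name = fd_holds(parts, ["PNO"], ["PNAME"])       # holds
name_determines_weight = fd_holds(parts, ["PNAME"], ["WEIGHT"])  # fails: two screw weights
heavy_screw_ok = screw_weight_ok({"PNO": "P9", "PNAME": "Screw", "WEIGHT": 60})
```

The two checks differ in kind: the "screw" rule can be decided from a single tuple, whereas a functional dependency can only be verified by comparing a new tuple against those already stored.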

References

1: C. J. Date, An Introduction to Database Systems, Addison-Wesley, 1985.
2: VAXELN Technical Summary, Digital Equipment Corporation, 1984.
3: VAX Rdb/VMS Guide to Data Manipulation, Digital Equipment Corporation, December 1985.
4: VAX Rdb/VMS Guide to Database Administration and Maintenance, Digital Equipment Corporation, December 1985.
5: VAX Rdb/VMS Guide to Database Design and Definition, Digital Equipment Corporation, December 1985.
6: VAX Rdb/VMS Reference Manual, Digital Equipment Corporation, August 1986.

Keywords: EPICURE, RDCS, Rdb, Relational, Database

Distribution: normal, Joel Butler, Bob Trendler

