"Linux Gazette...making Linux just a little more fun!"

Evaluating postgreSQL for a Production Environment

By

Introduction

With the advent of relatively powerful, free and/or cheap software, I wanted to see if I was able to create a production environment around Linux and a DBMS. In the past I have worked with several DBMS products in several environments. My goal was to evaluate the Linux/postgreSQL against the several environments I am familiar with. Out of these environments I will pick the aspects which I think are important in a production environment.

Past production environments

I have worked in three stages of production environment. These were the unconnected PC environment, the connected PC environment (file/print server networking), and multi-user/multi-tasking environment (minicomputer). I will introduce the several tools in order of this complexity.

Some terms will need to be explicitly defined, because the xBase terminology is sometimes confusing. The term 'database' here means the collection of several related tables which are needed to store organised data. The term 'table' is used to define one collection of identical data, a set. This is because in the original xBase languages, 'database' was used as to mean 'table'.

FoxBase

Effectively being a much faster clone of dBase III, FoxBase contained the minimum necessary to define the tables of a database and a programming language which contained all necessary to write applications very quickly.

A database consisted of several tables and their indexes. The association of tables and their indexes must explicitly be done using commands.

The programming language used is completely procedural. It contains statements to create menu's, open and close tables, filter tables (querying), insert, update and delete records, view records through screens and a browse statement. Defining all these things in a program is quite straightforward. Records are manipulated as program variables. All data is stored in ASCII format.

One special feature which originated in dBase, are 'macro's'. These macro's are text strings, which could be compiled and interpreted at run-time. This was a necessary feature, because most statements took string arguments without quotes, e.g. OPEN MY_TABLE. If you wanted to define a statement with a parameter, you could not directly refer to a variable. Trying to execute OPEN ARG_TABLE, the program would search for the file 'ARG_TABL'. To circumvent this problem you need to code the following :

ARG_TABLE = "MY_TABLE" OPEN &ARG_TABLE

Clipper

Clipper originated as a dBase compiler, but added soon after powerful extensions to the language. I have also worked with the Summer '87 and the 5.0 version. At the database level, nothing changed very much from FoxBase, but at the user interface level and the programming level, several advanced features offered faster development turn-around times and advanced user interfacing. The macro feature was still available, but Clipper expanded it through code blocks. In the original implementation, a macro needed to be evaluated every time it was used. In cases where macro's where used to filter data, this amounted to waste of computing time. The introduction of the code block made it possible to compile a macro just once and then use the compiled version.

Other features where the introduction of some object-oriented classes for user interfacing, a powerful multi-dimensional array type, declarations for static and local variables and a plethora of functions to manipulate arrays and tables. The flipside of all this was that learning to effectively use the language took some more time. I have two books about Clipper 5.0 and they are quite large.

FoxPro 2.0

FoxPro was the successor of FoxBase. It added GUI-features to the text-interface, making it possible to work with overlapping windows. FoxPro 2.0 also added embedded SQL statements. It was only a subset with SELECT, INSERT, UPDATE and DELETE, but this offered already a substantial advantage over the standard query statements. It also offered a better integration between tables and their indexes, and one of the most powerful query optimizers ever developed. They also provided some development tools, of which the most important where the screen development and the source documentation tools.

Clipper and FoxPro made it also possible to program for networks and thus enable multi-user database systems.

WANG PACE

WANG PACE is a fully integrated DBMS development system which runs on the WANG VS minicomputers. It offers an extended data dictionary with field- and record-level validation, HL-language triggers and view-definitions. All defined objects contain a version count. When an object is modified and subsequent programs are not, then a runtime error is generated when compiled versions don't match DD versions. It also offers a powerful screen editor, a report editor and a query-by-example system. Access through COBOL, COBOL85 or RPGII is available with a pre-processor which compiles embedded statements into API-calls.

Summary of important features

If I look in retrospect to these systems, what were the important features which made programming more productive ? This reference must be made against postgreSQL and the available libraries for interfacing to the back-end. It must also be made from the point of the production programmer, who must be able to write applications without being bothered by irrelevant details.

Field names translate to native variable names
Defining a field name for a table makes it available under the same name to the program which can then use it as an ordinary, native variable.
Uniform memory allocation system
The xBase systems have a dynamic memory allocation scheme, which is completely handled by the runtime library. COBOL is fully statically allocated. In none of these languages the programmer needs to be concerned with tracking allocated memory.
Direct update through the current record
The manipulated record is available to update the table through one or another UPDATE statement.
Database statements have the same format as the application language
When programming in xBase, the statements to extract and manipulate data from the database tables formed an integral part of the procedure language.
In COBOL, the statements where embedded and processed by a preprocessor. The syntax of the available statements was made to resemble COBOL syntax, with its strong and weak points.
Simple definition and usage of screens
In xBase, there are simple yet powerful statements available for defining screens. Screens are called through only one or two statements.
In WANG PACE, screens can only be defined through the screen editor. There are three statements available : one to use menu's, one to process one record in a cursor and an iterative version to process all records in a cursor. Most screen processing is handled through the run-time libraries.

Features available when installing postgreSQL

The four first features can be installed using the ecpg preprocessor. This makes it possible to use native program variables, you don't have to worry about memory allocation, because the run-time library takes care of it, and updates can also take place using the selected program-variables.

What is missing, is a special form of include statement. Now you need to know which fields are in a table if you want to use a 'exec sql declare' statement. It would be better if there was something like 'exec sql copy fields from <tablename>'. If the tabledefinition then changes, recompiling the program will adjust to the new definitions.

Using the pgaccess program (under X-windows) provides access to the data dictionary in a more elegant manner.

Summary

I started out to write a critique on postgreSQL because of the summary documentation which is delivered in the package. This made it rather hard to find and use all the components which provide additional functions.

I started describing my experiences on other platforms to get an insight in what a production environment should deliver to the programmer. Then I started to look closely at the delivered documentation and to my surprise all components that I needed where in fact in the package.

The following critique still remains however. The documentation of the package is too much fragmented, and most parts of the documentation are centered around technical aspects which do not bother the production programmer. This is understandable however. The documentation is written by the same people that implement them. I know of my own experience that writing a user manual is very hard and it is easy to get lost in the technical details of the implementation that you know about.

The following parts of postgreSQL are important for the production programmer, and their documentation should be better integrated.

The psql processor
This is a nice tool to define all necessary objects in a database, to get acquainted with SQL and to test ideas and verify joins and queries.
The ecpg preprocessor
This is the main production tool to write applications which use database manipulation statements. This capacity should probably be extended to other languages too. Since all bindings from the selected cursor are made to program variables, records can be processed without the hassle of converting them from and to ASCII, and updates can be made through the 'exec sql update' statement.
The pgaccess package
The pgaccess package provides access to all parts of the database and offers the ability to design screens and reports. It is still in a development phase. I hope it will be extended in the future, because the basic idea is excellent and the first implementations are worthwile.

The libpq library is of no real value to a production programmer. It should be mainly a tool to be used in implementing integrated environments and database access languages. It could e.g. be used to create an xBase like environment (for those who wish to use this).

Further research

In the following weeks (months) I hope to setup a complete database system over a network, with one server and several types of clients (workstation, terminal, remote computer) through several interfaces (Ethernet, serial connections). I will investigate the several platforms for application development. I intend to have a closer look at the provided tools in the postgreSQL package (developing a simple database for my strip book collection), but I will also look at the possibilities that Java offers a development platform with JDBC, screen design and application programming.

One last note : for the moment I concentrate on using tools that I don't need to pay for, because I need the money for my hardware platforms and for my house. This does not mean that I am a die-hard 'software should be gratis' advocate. A production environment favors to pay for software, because then it knows that it has a complete tool with support and warranty (horror stories about bad support not withstanding).