LINUX GAZETTE


"Linux Gazette...making Linux just a little more fun!"


Installing dict - An On-Line Dictionary

By Chris Gibbs


Purpose of this Paper

To advertise the efforts of http://www.dict.org and to provide the means by which any Linux user, regardless of experience, can install a functional dictionary system for either local or network use.

CONTENTS

  1. Introduction
  2. The DICT Development Group (www.dict.org)
  3. Available Dictionaries
  4. Installation
    1. dictd, dict and dictzip
    2. Webster's
    3. WordNet (r) 1.6
    4. Jargon File, FOLDOC, The Elements, Easton's Bible Dictionary & Hitchcock's Bible Names Dictionary
    5. More up-to-date Jargon File
    6. US Gazetteer
    7. The Devil's Dictionary
    8. Who Was Who: 5000 B. C. to Date
    9. Language Dictionaries
  5. Configuring dictd
  6. Using dict
  7. Kdict
  8. Conclusion

Introduction

I have been using Linux exclusively as my operating system for over three years now. One of the very few things I miss about "that other operating system" is the easy availability of cheap or even free versions of commercial encyclopedias and dictionaries.

So when I installed a recent version of S.u.S.E. Linux I was both surprised and happy to find that a package called Kdict had been installed on my machine. Reading the documentation that came with the package revealed that the program was only a front end to another program, and that although it is possible to install a dictionary server locally, I would have to get everything else I needed from the Internet.

The DICT Development Group (www.dict.org)

Note:- This section paraphrases the contents of ANNOUNCE in the dict distribution.

The DICT Development Group (www.dict.org) have developed a Dictionary Server Protocol (described in RFC 2229), written client/server software in C along with clients in other languages such as Java and Perl, and converted various freely available dictionaries for use with their software.

The Dictionary Server Protocol (DICT) is a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases.
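
To make the query/response exchange concrete, here is a minimal Python sketch of the client side. It is written from RFC 2229, not taken from the dict sources. The helper names (parse_status, read_text_block, define) and the default host are illustrative assumptions; 2628 is the port RFC 2229 assigns to DICT, and the sketch assumes the word contains no spaces.

```python
import socket

def parse_status(line):
    """Split a DICT status line such as '250 ok' into (code, text)."""
    code, _, text = line.strip().partition(" ")
    return int(code), text

def read_text_block(lines):
    """Collect text lines up to the lone '.' terminator, undoing dot-doubling."""
    body = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            break
        if line.startswith(".."):
            line = line[1:]          # a leading '.' is sent doubled
        body.append(line)
    return "\n".join(body)

def define(word, database="*", host="localhost", port=2628):
    """Send one DEFINE command and return a list of definition texts."""
    definitions = []
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="")
        parse_status(stream.readline())           # 220 greeting banner
        stream.write(f"DEFINE {database} {word}\r\n")
        stream.flush()
        while True:
            code, _ = parse_status(stream.readline())
            if code == 151:                        # one definition follows
                definitions.append(read_text_block(stream))
            elif code != 150:                      # 250 ok, 552 no match, errors
                break
        stream.write("QUIT\r\n")
        stream.flush()
    return definitions
```

With a dictd server running locally, define("linux") would return one string per matching definition; the database name "*" asks the server to search all of its databases.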

dict(1) is a client which can access DICT servers from the command line.

dictd(8) is a server which supports the DICT protocol.

dictzip(1) is a compression program which creates compressed files in the gzip format (see RFC 1952). However, unlike gzip(1), dictzip(1) compresses the file in pieces and stores an index to the pieces in the gzip header. This allows random access to the file at the granularity of the compressed pieces (currently about 64kB) while maintaining good compression ratios (within 5% of the expected ratio for dictionary data). dictd(8) uses files stored in this format.
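
The idea of giving up a little compression in exchange for random access can be sketched in a few lines of Python. This is not the dictzip file format (which, as described above, keeps its table of pieces inside the gzip header); it is a simplified illustration in which each piece is compressed independently with zlib and the compressed sizes are kept in a plain list. The function names are assumptions made for the example.

```python
import zlib

CHUNK = 64 * 1024  # uncompressed bytes per piece, roughly the size quoted above

def compress_chunks(data, chunk=CHUNK):
    """Compress data piece by piece; return (blob, size of each compressed piece)."""
    pieces = [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    return b"".join(pieces), [len(p) for p in pieces]

def read_range(blob, sizes, start, length, chunk=CHUNK):
    """Return data[start:start+length], decompressing only the pieces that overlap it."""
    if length <= 0:
        return b""
    first, last = start // chunk, (start + length - 1) // chunk
    offset = sum(sizes[:first])            # where piece `first` begins in the blob
    out = bytearray()
    for size in sizes[first:last + 1]:
        out += zlib.decompress(blob[offset:offset + size])
        offset += size
    skip = start - first * chunk           # position of `start` inside the first piece
    return bytes(out[skip:skip + length])
```

Each line of a dictd index file records the offset and length of one entry, so a lookup only has to decompress the one or two pieces that cover that range, never the whole database.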

Available in separate .tar.gz files are the data, conversion programs, and formatted output for several freely-distributable dictionaries. For any single dictionary, the terms for commercial distribution may be different from the terms for non-commercial distribution -- be sure to read the copyright and licensing information at the top of each database file. Below are approximate sizes for the databases, showing the number of headwords in each, and the space required to store the database:

  Database     Headwords    Index      Data       Uncompressed
  web1913         185399    3438 kB      11 MB      30 MB
  wn              121967    2427 kB    7142 kB      21 MB
  gazetteer        52994    1087 kB    1754 kB    8351 kB
  jargon            2135      38 kB     536 kB    1248 kB
  foldoc           11508     220 kB    1759 kB    4275 kB
  elements           131       2 kB      12 kB      38 kB
  easton            3968      64 kB    1077 kB    2648 kB
  hitchcock         2619      34 kB      33 kB      85 kB
  www                587       8 kB      58 kB     135 kB

All of these compressed databases and indices can be stored in approximately 32MB of disk space.
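
As a quick sanity check of that figure, the index and compressed data sizes from the table can simply be added up. The short Python sketch below does so; the only assumption is that the "11 MB" entry for web1913 means 11 * 1024 kB.

```python
# (index kB, compressed data kB) per database, copied from the table above.
SIZES_KB = {
    "web1913":   (3438, 11 * 1024),   # data listed as "11 MB" (assumed 1024 kB per MB)
    "wn":        (2427, 7142),
    "gazetteer": (1087, 1754),
    "jargon":    (38, 536),
    "foldoc":    (220, 1759),
    "elements":  (2, 12),
    "easton":    (64, 1077),
    "hitchcock": (34, 33),
    "www":       (8, 58),
}

def total_kb(sizes):
    """Add up index and compressed data sizes for every database."""
    return sum(index + data for index, data in sizes.values())

print(f"{total_kb(SIZES_KB) / 1024:.1f} MB")
```

The total comes to a little over 30 MB, which is in line with the approximate figure quoted above.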

Additionally, there are a number of bilingual dictionaries to help with translation. Though I have not looked at these, judging from their different sizes some will be more useful than others (e.g. English to Welsh is unfortunately not very good, whereas English to German is probably quite useful).

All the dictionaries seem to be under constant development, so interested people should keep up with the latest developments.

Available Dictionaries


Webster's Revised Unabridged Dictionary (1913)

The Oxford English Dictionary this is not! It is however a very pleasant dictionary. It seems to be an American version of one of those dictionary/encyclopedias that were so common at the time of its writing. Quite often in a definition you will find a poetic quote, and it really is very informative and pleasant to use.



WordNet (r) 1.6

This dictionary seems to be under constant development. The aim seems to be to provide definitions of all the words people want to have definitions for! In practice it seems to miss some obvious words such as "with" and "without". I guess the idea is simply to provide necessary updates to the definitions found in Webster's. Unfortunately this dictionary is neither as informative nor as pleasant as Webster's. If you need a more up-to-date dictionary, though, it is necessary.



The Free On-line Dictionary of Computing (15Feb98)

FOLDOC is a searchable dictionary of acronyms, jargon, programming languages, tools, architecture, operating systems, networking, theory, conventions, standards, mathematics, telecoms, electronics, institutions, companies, projects, products, history, in fact anything to do with computing. The dictionary is Copyright Denis Howe 1993, 1997.



U.S. Gazetteer (1990)

This is probably only of interest to people wanting information about America. The original U.S. Gazetteer Place and Zipcode Files are provided by the U.S. Census Bureau and are in the Public Domain.



Easton's 1897 Bible Dictionary

These Dictionary topics are from M.G. Easton M.A., D.D., Illustrated Bible Dictionary, Third Edition, published by Thomas Nelson, 1897. Due to the nature of etext, the illustrated portion of the Dictionary has not been included.



Hitchcock's Bible Names Dictionary (late 1800's)

This dictionary is from "Hitchcock's New and Complete Analysis of the Holy Bible," published in the late 1800s. It contains more than 2,500 Bible and Bible-related proper names and their meanings. Some Hebrew words of uncertain meaning have been left out. It is out of copyright, so feel free to copy and distribute it. I pray it will help in your study of God's Word. --Brad Haugaard



The Elements (22Oct97)

This dictionary database was created by Jay Kominek <jfk at acm.org>.



The CIA World Factbook (1995)

This somewhat typically short-sighted view of the world (sorry, I love America, I lived there for a while, it's great, but it is not ALL THE WORLD!) really only becomes useful if you look in the index file and see that there are appendices. Even these are of limited use to normal people, who think that the world ends at their keyboard.



Jargon File (4.2.0, 31 JAN 2000)

The Jargon File is a comprehensive compendium of hacker slang illuminating many aspects of hackish tradition, folklore, and humor. This bears remarkable similarity to FOLDOC above.



THE DEVIL'S DICTIONARY ((C)1911 Released April 15 1993)

_The Devil's Dictionary_ was begun in a weekly paper in 1881, and was continued in a desultory way at long intervals until 1906. In that year a large part of it was published in covers with the title _The Cynic's Word Book_, a name which the author had not the power to reject or happiness to approve. Users of the fortune program will already have some familiarity with this ;-).



Who Was Who

Who Was Who: 5000 B. C. to Date: Biographical Dictionary of the Famous and Those Who Wanted to Be, edited by Irwin L. Gordon

OTHER DICTIONARIES

A number of other dictionaries have been made available; see the dict home page for details. In many cases you may find that the program to convert the dictionary data to the format dict requires has not been written yet ;-(

As mentioned elsewhere, there are a number of translation dictionaries also available (see below).

Installation

The links given here were correct at the time of writing. If it is a long time since this paper was published you should visit http://www.dict.org to see what has changed.

Unfortunately, installation of the above-mentioned software did not go quite as easily as it should have, which partly explains why I am writing this ;-).

The first thing you will need is plenty of disk space. The largest dictionary available is Webster's 1913 dictionary, which needs about 85 MB of space while it is being rebuilt.

dictd, dict and dictzip

Unarchive dictd-1.5.5.tar.gz in the normal manner.

IMPORTANT:- The HTML support has been turned off in this version of dict. You need to turn it back on if you want to take advantage of Kdict.

Load the file dict.c into your favorite editor and uncomment the "html" entry at line 1069, so that this part of the option table reads:


      { "raw",        0, 0, 'r' },
      { "pager",      1, 0, 'P' },
      { "debug",      1, 0, 502 },
      { "html",       0, 0, 503 },    /* this is the line to uncomment */
      { "pipesize",   1, 0, 504 },
      { "client",     1, 0, 505 },

Now you can run ./configure;make;make install. You will see a great many warnings produced by the compiler, but at the end you should have a working client, server and compression program installed.

Webster's

Unpack the files dict-web1913-1.4.tar.gz and web1913-0.46-a.tar.gz:


     $ tar xvzf dict-web1913-1.4.tar.gz
     $ tar xvzf web1913-0.46-a.tar.gz
     $ cd dict-web1913-1.4 
     $ mkdir web1913
     $ cp ../web1913-0.46-a/* web1913
     $ ./configure
     $ make
     $ make db

Now go and make a cup of tea; this takes over an hour on my 133MHz box. When it is done, decide on a place for your dictionaries to live and copy them there. I use /opt/public/dict-dbs as suggested:-


     $ mkdir -p /opt/public/dict-dbs
     $ cp web1913.dict.dz /opt/public/dict-dbs
     $ cp web1913.index /opt/public/dict-dbs

WordNet (r) 1.6

Grab dict-wn-1.5.tar.gz

It is a great shame that one of the most useful dictionaries is also the one that refuses to compile correctly. To create a viable dictionary the original data must be parsed by a program, and it is this program that is built when you run make. Unfortunately this package uses a Makefile created by ./configure which does not work. I am unable to correct the automake procedure but can assure you that the following will work:


   $ tar xvzf dict-wn-1.5.tar.gz
   $ cd dict-wn-1.5 
   $ ./configure
   $ gcc -o wnfilter wnfilter.c
   $ make db

Again, this process takes a considerable amount of time (more than an hour on my 133MHz box). Once it is complete, create a directory for your dictionaries if you have not already done so, and copy the dictionary and its index there:


   $ cp wn.dict.dz /opt/public/dict-dbs
   $ cp wn.index /opt/public/dict-dbs

Jargon File, FOLDOC, The Elements, Easton's Bible Dictionary & Hitchcock's Bible Names Dictionary

Grab dict-misc-1.5.tar.gz


   $ tar xvzf dict-misc-1.5.tar.gz
   $ cd dict-misc-1.5
   $ ./configure
   $ make
   $ make db
   
   $ cp easton.dict.dz /opt/public/dict-dbs
   $ cp easton.index /opt/public/dict-dbs
   $ cp elements.dict.dz /opt/public/dict-dbs
   $ cp elements.index /opt/public/dict-dbs
   $ cp foldoc.dict.dz /opt/public/dict-dbs
   $ cp foldoc.index /opt/public/dict-dbs
   $ cp hitchcock.dict.dz /opt/public/dict-dbs
   $ cp hitchcock.index /opt/public/dict-dbs
   $ cp jargon.dict.dz /opt/public/dict-dbs
   $ cp jargon.index /opt/public/dict-dbs
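
The copy step is the same for every package built this way: each database consists of a .dict.dz file and a matching .index file. As a convenience, that step could be scripted; the Python sketch below is one way to do it. The function name install_databases is an assumption made for the example, and the destination is whatever dictionary directory you chose earlier.

```python
import shutil
from pathlib import Path

def install_databases(build_dir, dest_dir):
    """Copy every *.dict.dz and *.index file from build_dir into dest_dir."""
    build_dir, dest_dir = Path(build_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)   # create the directory if needed
    copied = []
    for pattern in ("*.dict.dz", "*.index"):
        for src in sorted(build_dir.glob(pattern)):
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src.name)
    return copied
```

Run from the directory above the build tree, install_databases("dict-misc-1.5", "/opt/public/dict-dbs") would copy the same ten files as the cp commands above and return their names.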

More up-to-date Jargon File

Grab dict-jargon-4.2.0.tar.gz


   $ tar xvzf dict-jargon-4.2.0.tar.gz
   $ cd dict-jargon-4.2.0
   $ ./configure
   $ make
   $ make db

   $ cp jargon.dict.dz /opt/public/dict-dbs
   $ cp jargon.index /opt/public/dict-dbs

US Gazetteer

Grab dict-gazetteer-1.3.tar.gz


   $ tar xvzf dict-gazetteer-1.3.tar.gz
   $ cd dict-gazetteer-1.3
   $ ./configure
   $ make
   $ make db

   $ cp gazetteer.dict.dz /opt/public/dict-dbs
   $ cp gazetteer.index /opt/public/dict-dbs

The Devil's Dictionary

Grab devils-dict-pre.tar.gz

As with the language dictionaries below, the dictionary has already been created for you. Simply unpack this file in your dictionary directory.

Who Was Who: 5000 B. C. to Date

Grab http://www.hawklord.uklinux.net/dict/www-1.0.tgz


   $ tar xvzf www-1.0.tgz
   $ cd www-1.0
   $ ./configure
   $ make
   $ make db

   $ cp www.dict.dz /opt/public/dict-dbs
   $ cp www.index /opt/public/dict-dbs

Language Dictionaries

Visit ftp://ftp.dict.org/pub/dict/pre/www.freedict.de/20000906

Installing a language dictionary does not involve rebuilding the dictionary from original data, so you just need to unpack each file into your dictionary directory.

Configuring dictd

dictd expects to find the file /etc/dictd.conf, though an alternative file may be specified on the command line. Each dictionary needs to be specified in this file so dictd can find the dictionary and its index. For example, if you just want to use Webster's, WordNet and The Devil's Dictionary, then the following entries will be required (assuming you use /opt/public/dict-dbs as your dictionary directory):


    database Web-1913  { data "/opt/public/dict-dbs/web1913.dict.dz"
                        index "/opt/public/dict-dbs/web1913.index" }
    database wn        { data "/opt/public/dict-dbs/wn.dict.dz"
                        index "/opt/public/dict-dbs/wn.index" }
    database devils    { data "/opt/public/dict-dbs/devils.dict.dz"
                        index "/opt/public/dict-dbs/devils.index" }
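
With more than a handful of dictionaries installed, writing these entries by hand becomes tedious. The Python sketch below generates entries in the same form for every complete pair of files found in a directory. It is only an illustration: the function name dictd_conf_entries is an assumption, and it names each database after its file (so web1913 rather than Web-1913), which is a free choice.

```python
from pathlib import Path

def dictd_conf_entries(db_dir):
    """Return dictd.conf text with one database entry per index/data pair."""
    db_dir = Path(db_dir)
    entries = []
    for index in sorted(db_dir.glob("*.index")):
        data = index.with_name(index.stem + ".dict.dz")
        if not data.exists():
            continue  # skip an index whose compressed data file is missing
        entries.append(
            f'database {index.stem} {{ data "{data}"\n'
            f'    index "{index}" }}\n'
        )
    return "".join(entries)
```

Printing dictd_conf_entries("/opt/public/dict-dbs") and reviewing the result before saving it as /etc/dictd.conf keeps you in control of which databases are actually offered.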

Advanced Configuration

It seems it is possible to implement user access control and other security measures. I have not tried this. If I were into security issues, the current state of the software would give me no reason to trust any security feature it might have. But why anyone would want to restrict access to these dictionaries is completely beyond me; this is stuff any user has a right to use.

You should be aware of a number of security issues if you intend to make dictd available over a local network since not being aware will leave your server vulnerable to a number of possible attacks.

Unless you are installing dictd on a server for a school/college or for some other large network these issues will probably be of no concern to you. If you are installing on such a network then you should already be aware of the issues below.

Server Overload, Denial of Service, Heavy Swapping

All these symptoms can occur if a number of users send queries like MATCH * re . at the same time. Such queries return the whole database index and each instance will require around 5MB buffer space on the server.

Possible solutions include limiting the number of connections to the server, limiting the amount of data that can be returned for a single query or limiting the number of simultaneous outstanding searches.

Denial of Service

The server can be driven to a complete standstill by any evil-minded cracker who wants to connect to the server 1,000,000 times.

To prevent such anti-social behavior simply limit the number of connections based on IP or mask.
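
The policy itself is simple to state in code. The Python sketch below is not how dictd implements it (I have not looked); it only illustrates counting open connections per address or per network mask and refusing new ones above a cap. The class name, its methods and the IPv4-style prefix lengths are all assumptions made for the example.

```python
import ipaddress
from collections import Counter

class ConnectionLimiter:
    """Track open connections per network and refuse new ones over a cap."""

    def __init__(self, max_per_network, prefix_len=32):
        self.max_per_network = max_per_network
        self.prefix_len = prefix_len      # 32 = per address, 24 = per /24 mask
        self.active = Counter()

    def _network(self, ip):
        # strict=False lets a host address stand in for its enclosing network
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def try_open(self, ip):
        """Return True and count the connection, or False if the cap is reached."""
        net = self._network(ip)
        if self.active[net] >= self.max_per_network:
            return False
        self.active[net] += 1
        return True

    def close(self, ip):
        """Release one connection slot for the network containing ip."""
        net = self._network(ip)
        if self.active[net] > 0:
            self.active[net] -= 1
```

A server loop would call try_open() when a client connects, drop the connection if it returns False, and call close() when the client disconnects.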

Buffer Overflow

If you experience this kind of problem you should make your logging routines more robust, use strlen and examine daemon_log.

Using dict

dict expects to find the file /etc/dict.conf. This file should contain a line with the name of the machine you wish to use as your dictd server, though this can be overridden at the command line.
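
For scripts that call dict, the command-line overrides can be assembled programmatically. The Python sketch below only builds the argument list; -h (host), -d (database) and -m (match instead of define) are the switches I understand from the dict(1) manual page, so check dict --help on your version before relying on them. The function name dict_command is an assumption.

```python
def dict_command(word, host=None, database=None, match=False):
    """Build the argument list for one dict query, with optional overrides."""
    argv = ["dict"]
    if host:
        argv += ["-h", host]        # override the server named in /etc/dict.conf
    if database:
        argv += ["-d", database]    # restrict the query to one database
    if match:
        argv.append("-m")           # list matching headwords instead of definitions
    argv.append(word)
    return argv
```

The resulting list can be handed to subprocess.run(); for example, dict_command("luser", host="localhost", database="jargon") corresponds to typing dict -h localhost -d jargon luser at the prompt.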

The current version of dict is a little disappointing as a user's front end for dictd. If all you have is a console and you can't use Kdict then you will just have to get used to dict. The worst thing about dict is that it can trash your console, and you will need to take action (such as logging out and back in) to restore the keyboard to normal! This typically occurs if there is a problem with dictd, such as when it is not running and you try to use dict.

Since dict is just a console program, it just sends output to less. So unless you have a very good memory you will need to use `cut and paste' to transfer referenced words or phrases back to the command line.

There is an option to send output to a pager program. I tried the command dict -html -P lynx luser, and the result was not a happy one! Lynx went mad, referencing random help and configuration files in a manner that reminded me of certain viruses in MS operating systems.

Personally I would say if you can avoid using dict directly, avoid it! It is necessary to have it if you want to use Kdict, and you do want to use Kdict.

Kdict

[Screen shot: Kdict.gif]

To take full advantage of dict you really need Kdict from http://www.rhrk.uni-kl.de/~gebauerc/kdict. I have used version 0.2 and cannot speak for any other version.

To use Kdict you must turn HTML support back on for dict as described above.

The screen shot above shows Kdict in use. Kdict makes good use of the limited HTML tags provided by dict, and inserts extra tags so that you can easily cross-reference words. Any phrase or word shown in red can be clicked on with the mouse to show its definition.

What makes Kdict so good is the fact that you can use the clipboard to highlight a word from any window on the desktop and paste it into Kdict as a query.

Conclusion

This is a great project that can only get better, so it is a lot like Linux and GNU software in general... Give it your full support!

If you get xscrabble from Matt Chapman's homepage, you can enhance your enjoyment of the game by looking up the definitions of words you don't know, as the computer beats the sh*t out of you ;-).


Copyright © 2001, Chris Gibbs.
Copying license http://www.linuxgazette.com/copying.html
Published in Issue 63 of Linux Gazette, Mid-February (EXTRA) 2001
