Many languages introduce special capabilities for specific kinds of applications, but few present us with more powerful control structures or programming paradigms. You may be comfortable sticking with a language you already know, but if you are challenged to write complex programs and are short on time, you need the best language for the job. Icon is a high-level programming language that looks like many other programming languages but offers many advanced features that add up to big gains in productivity. Before we get to all that, let us write the canonical first program:
procedure main() write("Hello, world!") endIf you've installed Linux Icon, Save this in a file named
hello.icn
and run icont, the Icon translator on it:
icont hello -xicont performs some syntax checking on
hello.icn
and transforms the code into instructions for the Icon virtual machine, which will be saved in hello
. The -x
option tells icont
to execute the program also.
We are introducing many concepts, so don't expect to understand everything the first time through -- the only way to learn a language is to write programs in it; so get Icon, and take it for a test drive.
Icon was developed over several years at the University of Arizona by a team led by Ralph Griswold. Today, it runs on many platforms and is used by researchers in algorithms, compilers, and linguistics as well as system administrators and hobbyists. The implementation and source code are in the public domain.
Icon's expression syntax starts out much as do most languages. For example, i+j
represents the arithmetic addition of the values stored in the variables i
and j
, f(x)
is a call to f
with argument x
, variables may be global or local to a procedure, and so on.
Variable declarations are not required, and variables can hold any type of value. However, Icon is a strongly typed language; it knows the type of each value and it does not allow you to mix invalid types in expressions. The basic scalar types are integers, real numbers, strings, and sets of characters (csets). Integers and strings can be arbitrarily large; and strings can contain any characters. There are also structured types: lists, associative arrays, sets and records. Icon performs automatic storage management.
i > 3
-- if the value i
is greater than 3 the expression succeeds, otherwise it fails.
Control structures such as if
check for success, so
if i > 3 then ...does the expected thing. Since the expression semantics are not encumbered with the need to propagate boolean (or 0 and 1) values, comparison operators can instead propagate a useful value (their right operand), allowing expressions such as
3 > i > 7
which is standard in mathematics, but doesn't work in most languages.
Since functions that fail do not need to return an error code separately from the results, detecting cases such as end-of-file is simpler, as in:
if line := read() then write(process(line))On end-of-file,
read()
fails, causing the assignment expression tested in the if-part to fail. When the test fails, the then branch is not executed so the call to write()
does not occur. Since failure propagates through an expression, the above example is equivalent to
write(process(read())
find()
function:
find("or", "horror")In conventional languages, this would return one of the possible return values, usually the first or the last. In Icon, this expression is capable of returning all the values, depending on the execution context. If the surrounding expression only needs one value, as in the case of an if test or an assignment, only the first value of a generator is produced. If a generator is part of a more complex expression, then the return values are produced in sequence until the whole expression produces a value. In the expression
find("or", "horror") > 3the first value produced by
find()
, a 2, causes the >
operation to fail. Icon resumes the call to find(), which produces a 4, and the expression succeeds.
The most obvious generator is the alternation operator |. The expression
expr1 | expr2is a generator that produces its lefthand side followed by its righthand side, if needed by the surrounding expression. Consider
f(1|2)
-- f
is first invoked with the value 1; if that does not produce a value, the generator is resumed for another result and f
will be called again with the value 2. As another example of the same operator,
x = (3 | 5)is equivalent to but more concise than C's (x == 3) || (x == 5). When more than one generator is present in an expression, they are resumed in a LIFO manner.
(x | y) = (3 | 5)is the Icon equivalent of C's
(x == 3) || (x == 5) || (y == 3) || (y == 5)
In addition to |
, Icon has a generate operator !
that generates elements of data structures, and a generator to
that produces ranges of integers. For example, !L
generates the elements of list L, and 1 to 10
generates the first ten positive integers. Besides these operators that generate results, most generators in Icon take the form of calls to built-in and user-defined procedures. Procedures are discussed below.
while
loop where the control expression is evaluated before each iteration. For generators, an alternative loop is available where the loop body executes once per result produced by a single evaluation of the control expression. The alternative loop uses the reserved word every
and can be used in conjunction with the to
operator to provide the equivalent of a for
-loop:
every i := 1 to 10 do ...The point of
every
and to
is not that you can use them to implement a for-loop; Icon's generator mechanism is quite a bit more general. The every
loop lets you walk through all the results of a generator giving you iterators for free. And every
isn't limited to sequences of numbers or traversals of specific data structures like iterators in some languages; it works on any expression that contains generators.
every f(1 | 1 | 2 | 3 | 5 | 8)executes the function
f
with the first few fibonacci numbers, and the example could be generalized to a user-defined generator procedure that produced the entire fibonacci sequence. Using generators requires a bit of practice, but then it is fun!
Procedures are a basic building block in most languages, including Icon. Like C, an Icon program is organized as a collection of procedures and execution starts from a procedure named main()
. Here is an example of an ordinary procedure. This one generates and sums the elements of a list L
, whose elements had better be numbers (or convertible to numbers).
procedure sum(L) total := 0 every total +:= !L return total end
A user can write her own generator by including a
suspend exprin a procedure where a result should be produced. When a procedure suspends, it transfers a result to the caller, but remains available to continue where it left off and generate more results. If the expression from which it is called needs more or different results in order to succeed, the procedure will be resumed. The following example generates the elements from parameter
L
, but filters out the zeros.
procedure nonzero(L) every i := !L do if i ~= 0 then suspend i endThe
fail
expression makes the procedure fail, i.e. causes control to go back to the calling procedure without returning a value. A procedure also fails implicitly if control flows off the end of the procedure's body.
Besides expression evaluation, Icon offers compelling features to reduce the effort required to write complex programs. From Icon's ancestor SNOBOL4, the granddaddy of all string processing languages, Icon inherits some of the most flexible and readable built-in data structures found in any language.
s
is the string "hello, world"
then the expressions
s[7] := " linux " s[14:19] := "journal"change
s
into "hello, linux journal"
, illustrating the ease with which insertions and substitutions are made. A myriad of built-in functions operate on strings; among them are the operators for concatenation (s1 || s2
) and size (*s
).
?
operator:
s ? exprA scanning environment has a string and a current position in it. Matching functions change this position, and return the substring between the old and new positions. Here is a simple example:
text ? { while move(1) do write(move(1)) }
move
is a function that advances the position by its argument; so this code writes out every alternate character of the string in text
. Another matching function is tab
, which sets the position to its argument. String analysis functions examine a string and generate the interesting positions in it. We have already seen find
, which looks for substrings. These functions default their subject to the string being scanned. Here is a procedure that produces the words from the input:
procedure getword() while line := read() do line ? while tab(upto(wchar)) do { word := tab(many(wchar)) suspend word } end
upto(c)
returns the next position of a character from the cset c
; and many(c)
returns the position after a sequence of characters from c
. The expression tab(upto(wchar))
advances the position to a character from wchar
, the set of characters that make up words; then tab(many(wchar))
moves the position to the end of the word and returns the word that is found.
The Icon Program Library (included with the distribution) provides regular expression matching functions. To use it, include the line link regexp
at the top of the program. Here is an example of `search-and-replace':
procedure re_sub(str, re, repl) result := "" str ? { while j := ReFind(re) do { result ||:= tab(j) || repl tab(ReMatch(re)) } result ||:= tab(0) } return result end
Icon has several structured (or non-scalar) types as well that help organize and store collections of arbitrary (and possibly mixed) types of values. A table is an associative array, where values are stored indexed by keys which may be of arbitrary type; a list is a group of values accessed by integer indices as well as stack and queue operations; a set is an unordered group of values, etc.
table
function. It takes one argument: the default value, i.e. the value to return when lookup fails. Here is a code fragment to print a word count of the input (assuming the getword
function generates words of interest):
wordcount := table(0) every word := getword() do wordcount[word] +:= 1 every word := key(wordcount) do write(word, "\t", wordcount[word])(The
key
function generates the keys with which values have been stored.) Since the default value for the table is 0, when a new word is inserted, the default value gets incremented and the new value (i.e. 1) is stored with the new word. Tables grow automatically as new elements are inserted.
L := ["linux", 2.0, "unix"]Lists are dynamic; they grow or shrink through calls to list manipulation routines like
pop()
etc. Elements of the list can be obtained either through list manipulation functions or by subscripting:
write(L[3])There is no restriction on the kinds of values that may be stored in a list.
record complex(re, im)instances of that record are created using a constructor procedure with the name of the record type, and on such instances, fields are accessed by name:
i := complex(0, 0) j := complex(1, -1) if a.re = b.re then ...
A set is an unordered collection of values with the uniqueness property i.e. an element can only be present in a set once.
S := set(["rock lobster", 'B', 52])The functions
member
, insert
, and delete
do what their names suggest. Set intersection, union and difference are provided by operators. A set can contain any value (including itself, thereby neatly sidestepping Russell's paradox!).
Since there is no restriction on the types of values in a list, they can be other lists too. Here's an example of a how a graph or tree may be implemented with lists:
record node(label, links) ... barney := node("Barney", list()) betty := node("Betty", list()) bambam := node("Bam-Bam", list()) put(bambam.links, barney, betty)
global wchar procedure main(args) wchar := &ucase ++ &lcase (*args = 1) | stop("Need a file!") f := open(args[1]) | stop("Couldn't open ", args[1]) wordlist := table() lineno := 0 while line := read(f) do { lineno +:= 1 every word := getword(line) do if *word > 3 then { # if word isn't in the table, set entry to empty list /wordlist[word] := list() put(wordlist[word], lineno) } } L := sort(wordlist) every l := !L do { writes(l[1], "\t") linelist := "" # Collect line numbers into a string every linelist ||:= (!l[2] || ", ") write(linelist[1:-2]) } end procedure getword(s) s ? while tab(upto(wchar)) do { word := tab(many(wchar)) suspend word } endIf we run this program on this input:
Sing, Mother, sing. Can Mother sing? Mother can sing. Sing, Mother, sing!the program writes this output:
Mother 1, 2, 3, 4 Sing 1, 4 sing 1, 2, 3, 4While we may not have covered all the features used in this program, it should give you a feeling for the flavour of the language.
Another novel control facility in Icon is the co-expression, which is an expression encapsulated in a thread-like execution context where its results can be picked apart one at a time. Co-expressions are are more portable and more fine-grained than comparable facilities found in most languages. Co-expressions let you `capture' generators and then use their results from multiple places in your code. Co-expressions are created by
create exprand each result of the co-expression is requested using the
@
operator.
As a small example, suppose you have a procedure prime()
that generates an infinite sequence of prime numbers, and want to number each prime as you print them out, one per line. Icon's seq()
function will generate the numbers to precede the primes, but there is no way to generate elements from the two generators in tandem; no way except using co-expressions, as in the following:
numbers := create seq() primes := create prime() every write(@numbers, ": ", @primes)More information about co-expressions can be found at
http://www.drones.com/coexp/
and a complete description is in the Icon language book mentioned below.
Icon features high-level graphics facilities that are portable across platforms. The most robust implementations are X Window and Microsoft Windows; Presentation Manager, Macintosh, and Amiga ports are in various stages of progress. The most important characteristics of the graphics facilities are:
"g"
in the call to open stands for "graphics". &window
is a special global variable that serves as a default window for graphics functions. &lpress
, &ldrag
, and &rpress
are special constants that denote left mouse button press and drag, and right mouse button press, respectively. &x
and &y
are special global variables that hold the mouse position associated with the most recent user action returned by Event(). "\e"
is a one-letter Icon string containing the escape character.
procedure main() &window := open("LJ example","g") repeat case e := Event() of { &lpress | &ldrag : DrawPoint(&x,&y) &rpress : GotoXY(&x,&y) "\e" : break default : if type(e)=="string" then writes(&window, e) } endA complete description of the graphics facilities is available on the web at
http://www.cs.arizona.edu/icon/docs/ipd281.html
An Icon program that uses the POSIX functions should include the header file posixdef.icn
. On error, the POSIX functions fail and set the keyword &errno
; the corresponding printable error string is obtained by calling sys_errstr()
.
Unix functions that return a C struct (or a list, in Perl) return records in Icon. The fields in the return values have names similar to the Unix counterparts: stat() returns a record with fields ino, nlink, mode etc.
A complete description of the POSIX interfaces is included in the distribution; an HTML version is available on the web, at http://www.drones.com/unicon/. We look at a few small examples here.
Let us look at how a simple version of the Unix ls command may be written. What we need to do is to read the directory, and perform a stat call on each name we find. In Icon, opening a directory is exactly the same as opening a file for reading; every read returns one filename.
f := open(dir) | stop(name, ":", sys_errstr(&errno)) names := list() while name := read(f) do push(names, name) every name := !sort(names) do write(format(lstat(name), name, dir))The lstat function returns a record that has all the information that lstat(2) returns. One difference between the Unix version and the Icon version is that the mode field is converted to a human readable string -- not an integer on which you have to do bitwise magic on. (And in Icon, string manipulation is as natural as a bitwise operation.)
The function to format the information is simple; it also checks to see if the name is a symbolic link, in which case it prints the value of the link also.
link printf procedure format(p, name, dir) s := sprintf("%7s %4s %s %3s %8s %8s %8s %s %s", p.ino, p.blocks, p.mode, p.nlink, p.uid, p.gid, p.size, ctime(p.mtime)[5:17], name) if p.mode[1] == "l" then s ||:= " -> " || readlink(dir || "/" || name) return s end
It's not just stat that uses human-readable values -- chmod can accept an integer that represents a bit pattern to set the file mode to, but it also takes a string just like the shell command:
chmod(f, "a+r")And the first argument: it can be either an opened file or a path to a file. Since Icon values are typed, the function knows what kind of value it's dealing with -- no more fchmod or fstat. The same applies to other functions -- for example, the Unix functions getpwnam, getpwuid and getpwent are all subsumed by the Icon function getpw which does the appropriate thing depending on the type of the argument:
owner := getpw("ickenham") root := getpw(0) while u := getpw() do ...Similarly, trap and kill can accept a signal number or name; wait returns human-readable status; chown takes a username or uid; and select takes a list of files.
select
The select()
function waits for input to become available on a set of files. Here is an example of the usage -- this program waits for typed input or for a window event, with a timeout of 1000 milliseconds:
repeat { while *(L := select([&input, &window], 1000)) = 0 do ... handle timeout if &errno ~= 0 then stop("Select failed: ", sys_errstr(&errno)) every f := !L do case f of { &input : handle_input() &window : handle_evt() } }If called with no timeout value, select will wait forever. A timeout of 0 performs a poll.
Icon provides a much simpler interface to BSD-style sockets. Instead of the four different system calls that are required to start a TCP/IP server using Perl, only one is needed in Icon--the open function opens network connections as well as files. The first argument to open is the network address to connect to -- host:port for Internet domain connections, and a filename for Unix domain sockets. The second argument specifies the type of connection.
Here is an Internet domain TCP server listening on port 1888:
procedure main() while f := open(":1888", "na") do if fork() = 0 then { service_request(f) exit() } else close(f) stop("Open failed: ", sys_errstr(&errno)) endThe "na" flags indicate that this is a network accept. Each call to
open
waits for a network connection and then returns a file for that connection. To connect to this server, the "n" (network connect) flag is used with open
. Here's a function that connects to a `finger' server:
procedure finger(name, host) static fserv initial fserv := getserv("finger") | stop("Couldn't get service: ", sys_errstr(&errno)) f := open(host || ":" || fserv.port, "n") | fail write(f, name) | fail while line := read(f) do write(line) endNice and simple, isn't it? One might even call it Art! On the other hand, writing socket code in Perl is not much different from writing it in C, except that you have to perform weird machinations with pack. No more! Eschew obfuscation, do it in Icon.
s := getserv("daytime", "udp") f := open(host || ":" || s.port, "nu") | stop("Open failed: ", sys_errstr(&errno)) writes(f, " ") if *select([f], 5000) = 0 then stop("Connection timed out.") r := receive(f) write("Time on ", host, " is ", r.msg)Since UDP is not reliable, the receive is guarded with select (timeout of 5000 ms), or the program might hang forever if the reply is lost.
The popular languages Perl and Java have been covered in LJ, and we think it is worth discussing how Icon stacks up against these dreadnaughts.
But when it comes to language design, Perl and Icon are not at all alike. Perl has been designed with very little structure -- or, as Larry Wall puts it, it's more like a natural language than a programming language. Perl looks strange but underneath the loose syntax its semantics are those of a conventional imperative language. Icon, on the other hand, looks more like a conventional imperative language but has richer semantics.
Namespace confusion: it is a bad idea to allow scalar variables, vector variables and functions to have the same name. This seems like a useful thing to do, but it leads to write-only code. We think this is primarily why it's hard to maintain Perl programs. A couple of things are beyond belief -- $foo and %foo are different things, but the expression $foo{bar} actually refers to an element of %foo!
Parameter passing is a mess. Passing arrays by name is just too confusing! Even after careful study and substantial practice, we still are not absolutely certain about how to use *foo in Perl. As if to make up for the difficulty of passing arrays by reference, all scalars are passed by reference! That's very unaesthetic.
Why are there no formal parameters? Instead, one has to resort to something that looks like a function call to declare local variables and assign @_ to it. Allowing the parentheses to be left off subroutine calls is also unfortunate; it is another `easy to write, hard to read' construct. And the distinction between built-in functions and user-defined subroutines is ugly.
Variables like $` are a bad idea. We think of special characters as punctuation, we don't expect them to be (borrowing Wall's terminology) nouns. And the mnemonics that are required are evidence that these variables place an additional burden of memorization upon the programmer. (Quick, if you write Perl programs: What's `$('?)
The distinction between array and scalar contexts also leads to obfuscated code. Certainly after you've been writing Perl for a while, you get used to it (and might even like it), but again, this is just confusing. All the warnings in the Perl documentation about being certain you are evaluating in the right context is evidence of this.
The important differences between Java and Icon are differences of philosophy. The Java philosophy is that everything is an object, nothing is built-in to the language, and programmers should learn class libraries for all non-trivial structures and algorithms. Java's lack of operator overloading means that its object-oriented notation allows no "shorthand" as does C++. Java's simplicity is a welcome relief after C++, but its expressive power is so weak compared to Icon (and several other very high level languages) that it is hard to argue that Java can supplant these languages. Most of Java's market share is being carved out of the C and C++ industries.
http://www.cs.arizona.edu/sumatra/hallofshame/
. Most of Java's misfeatures are sins of omission, because the language designers were trying to be elegant and minimal. We would like to see a Java dialect with features such as Icon's goal-directed evaluation, Perl's pattern matching, and APL's array-at-a-time numeric operators; a description of such a dialect is at http://segfault.cs.utsa.edu/godiva/.
Users who become serious about the language will want a copy of `The Icon Programming Language', by Ralph and Madge Griswold, Peer-to-Peer Communications 1997, ISBN 1-57398-001-3.
Lots of documentation for Icon is available from the University of Arizona, at http://www.cs.arizona.edu/icon/ There is also a newsgroup on Usenet: comp.lang.icon.
The Icon source distribution is at:
ftp://ftp.cs.arizona.edu/icon/packages/unix/unix.tgz
The POSIX functions are in the following patch that you need to apply if you wish to build from sources:
ftp://ftp.drones.com/unicon-patches.tar.gz
Linux binaries (kernel 2.0 ELF, libgdbm 2.0.0, libX11 6.0, libdl 1.7.14, libm 5.0.0 and libc 5.2.18) for Icon (with X11 and POSIX support) are available at
ftp://ftp.drones.com/icon-9.3-3.i386.rpm | Icon ftp://ftp.drones.com/icon-ipl-9.3-3.i386.rpm | Icon Program Library |
ftp://ftp.drones.com/icon-idol-9.3-3.i386.rpm | Idol: Object-oriented Icon | |
ftp://ftp.drones.com/icon-vib-9.3-3.i386.rpm | VIB: The Visual Interface Builder | |
ftp://ftp.drones.com/icon-docs-9.3-3.i386.rpm | Documentation |