Breaking Open the Black Box

Before we look at the specifics of Lisp's syntax and semantics, it's worth taking a moment to look at how they're defined and how this differs from many other languages.


In most programming languages, the language processor--whether an interpreter or a compiler--operates as a black box: you shove a sequence of characters representing the text of a program into the black box, and it--depending on whether it's an interpreter or a compiler--either executes the behaviors indicated or produces a compiled version of the program that will execute the behaviors when it's run.


Inside the black box, of course, language processors are usually divided into subsystems that are each responsible for one part of the task of translating a program text into behavior or object code. A typical division is to split the processor into three phases, each of which feeds into the next: a lexical analyzer breaks up the stream of characters into tokens and feeds them to a parser that builds a tree representing the expressions in the program, according to the language's grammar. This tree--called an abstract syntax tree--is then fed to an evaluator that either interprets it directly or compiles it into some other language such as machine code. Because the language processor is a black box, the data structures used by the processor, such as the tokens and abstract syntax trees, are of interest only to the language implementer.

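To make the three phases concrete, here is a minimal sketch of such a pipeline for a toy infix language that has only non-negative integers, `+`, and `*`. The function names (`tokenize`, `parse`, `evaluate`) and the keyword tokens are made up for illustration, and the sketch does no error checking beyond rejecting unknown characters.

```lisp
;; Hypothetical three-phase processor for a toy infix language.
;; All names below are illustrative, not part of any real implementation.

(defun tokenize (text)
  "Lexical analysis: turn a string into a list of integers and operator keywords."
  (let ((tokens '())
        (i 0)
        (n (length text)))
    (loop while (< i n)
          do (let ((ch (char text i)))
               (cond ((digit-char-p ch)
                      ;; Consume a run of digits as a single integer token.
                      (multiple-value-bind (value end)
                          (parse-integer text :start i :junk-allowed t)
                        (push value tokens)
                        (setf i end)))
                     ((char= ch #\+) (push :plus tokens) (incf i))
                     ((char= ch #\*) (push :times tokens) (incf i))
                     ((char= ch #\Space) (incf i))
                     (t (error "Unexpected character ~S" ch)))))
    (nreverse tokens)))

(defun parse (tokens)
  "Parsing: build a tree from the tokens, giving * higher precedence than +."
  (labels ((parse-term ()
             (let ((tree (pop tokens)))
               (loop while (eq (first tokens) :times)
                     do (pop tokens)
                        (setf tree (list :times tree (pop tokens))))
               tree))
           (parse-expression ()
             (let ((tree (parse-term)))
               (loop while (eq (first tokens) :plus)
                     do (pop tokens)
                        (setf tree (list :plus tree (parse-term))))
               tree)))
    (parse-expression)))

(defun evaluate (tree)
  "Evaluation: interpret the tree directly."
  (if (integerp tree)
      tree
      (destructuring-bind (operator left right) tree
        (ecase operator
          (:plus (+ (evaluate left) (evaluate right)))
          (:times (* (evaluate left) (evaluate right)))))))

(evaluate (parse (tokenize "1 + 2 * 3"))) ; => 7
```

The token list and the tree exist only between the phases. A user of this toy language would see the input string and the result, nothing else.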

In Common Lisp things are sliced up a bit differently, with consequences both for the implementer and for how the language is defined. Instead of a single black box that goes from text to program behavior in one step, Common Lisp defines two black boxes, one that translates text into Lisp objects and another that implements the semantics of the language in terms of those objects. The first box is called the reader, and the second is called the evaluator.

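You can drive the two boxes separately from ordinary code. The sketch below uses the standard functions READ-FROM-STRING (a variant of READ that takes its input from a string) and EVAL. The wrapper name `read-then-evaluate` is hypothetical.

```lisp
(defun read-then-evaluate (text)
  "Return two values: the object the reader builds from TEXT,
and the result of handing that object to the evaluator."
  (let ((object (read-from-string text))) ; reader: characters -> Lisp object
    (values object (eval object))))       ; evaluator: Lisp object -> behavior

(read-then-evaluate "(+ 1 2)") ; => (+ 1 2), 3
```

The first return value is an ordinary list of three objects, a symbol and two integers. Only the second step gives it meaning as an addition.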

Each black box defines one level of syntax. The reader defines how strings of characters can be translated into Lisp objects called s-expressions. Since the s-expression syntax includes syntax for lists of arbitrary objects, including other lists, s-expressions can represent arbitrary tree expressions, much like the abstract syntax tree generated by the parsers for non-Lisp languages.

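As a small illustration of the tree structure, the following sketch reads a nested list and measures how deeply it nests. The variable `*tree*` and the function `tree-depth` are names invented for this example.

```lisp
;; The reader turns this text into a list containing a symbol and two sublists.
(defparameter *tree* (read-from-string "(a (b 1 \"two\") (c (d 3.0)))"))

(defun tree-depth (object)
  "Depth of nesting: 0 for an atom, otherwise 1 plus the depth of the deepest element."
  (if (consp object)
      (1+ (reduce #'max (mapcar #'tree-depth object)))
      0))

(tree-depth *tree*) ; => 3
```

The leaves of the tree are objects of different types (symbols, an integer, a string, a float), and the interior nodes are simply lists.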

The evaluator then defines a syntax of Lisp forms that can be built out of s-expressions. Not all s-expressions are legal Lisp forms any more than all sequences of characters are legal s-expressions. For instance, both (foo 1 2) and ("foo" 1 2) are s-expressions, but only the former can be a Lisp form since a list that starts with a string has no meaning as a Lisp form.

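The two levels can be probed separately. The helper below, with the made-up name `classify`, is a rough sketch: it first asks the reader whether the text is an s-expression and then asks the evaluator whether the resulting object can be evaluated. It is deliberately crude, since any error during evaluation (an undefined function, say) is also reported as a failure, and which condition an implementation signals for a malformed form may vary, so it catches ERROR in general.

```lisp
(defun classify (text)
  "Report which level of syntax TEXT satisfies (a crude check, see the caveats above)."
  (let ((object (handler-case (read-from-string text)
                  ;; The reader could not build an object from TEXT.
                  (error () (return-from classify :not-an-s-expression)))))
    (handler-case (progn (eval object) :lisp-form)
      ;; The reader succeeded, but the evaluator rejected the object.
      (error () :s-expression-only))))

(classify "(+ 1 2)")        ; readable and evaluable
(classify "(\"foo\" 1 2)")  ; readable, but a list starting with a string is not a form
(classify "(+ 1 2")         ; unbalanced parenthesis: not even an s-expression
```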

This split of the black box has a couple of consequences. One is that you can use s-expressions, as you saw in Chapter 3, as an externalizable data format for data other than source code, using READ to read it and PRINT to print it. The other consequence is that since the semantics of the language are defined in terms of trees of objects rather than strings of characters, it's easier to generate code within the language than it would be if you had to generate code as text. Generating code completely from scratch is only marginally easier--building up lists vs. building up strings is about the same amount of work. The real win, however, is that you can generate code by manipulating existing data. This is the basis for Lisp's macros, which I'll discuss in much more detail in future chapters. For now I'll focus on the two levels of syntax defined by Common Lisp: the syntax of s-expressions understood by the reader and the syntax of Lisp forms understood by the evaluator.

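Both consequences fit in a few lines. In this sketch, `round-trip` sends a piece of plain data through PRINT and back through READ, and `swap-operator` builds new code out of an existing form using ordinary list operations. Both function names are hypothetical, and the property list is just sample data.

```lisp
(defun round-trip (object)
  "Print OBJECT to a string with PRINT, then read it back with READ."
  (with-standard-io-syntax
    (with-input-from-string (in (with-output-to-string (out)
                                  (print object out)))
      (read in))))

(defun swap-operator (form new-operator)
  "Build a new form from an existing one by replacing its first element."
  (cons new-operator (rest form)))

;; S-expressions as a data format: the data survives the trip through text.
(round-trip '(:title "Home" :rating 9)) ; => (:TITLE "Home" :RATING 9)

;; Code generated by manipulating existing data rather than by pasting strings.
(swap-operator '(+ 2 3 4) '*)           ; => (* 2 3 4)
(eval (swap-operator '(+ 2 3 4) '*))    ; => 24
```

Notice that `swap-operator` never looks at characters or worries about parentheses. It works on the list the reader already built.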