Introduction

Another Simplistic Datalog Implementation (ASDI), in Rust.

Datalog is a logic programming language and a subset of the earlier Prolog1. The language is interesting as it can be used as a data query language akin to SQL with some important additional capabilities such as recursive queries. It is also expressive enough to allow for it’s use as an entailment mechanism for ontology languages such as the Web Ontology Language (OWL) and the Semantic Web.

This package provides an in-memory model to represent Datalog programs, a parser for a textual representation, relation input/output implementations and some implementations of common inference techniques.

Chapter 1, provides a brief simple description of Datalog by example.

Chapter 2 will provide a more detailed overview of Datalog itself from both an abstract point of view, and a tour of the concrete syntax used as the textual representation.

Chapter 3 will provide a guide to the crate API in a more scenario-based fashion and relies on the rust doc to provide the definitive reference.

Chapter 4 will cover the major extension points in the crate and identify how to contribute new implementations for specific extension traits.

Brief Datalog Example

We will consider the following syllogism from Mill1851, vol. 1, chap. 2:

All men are mortal.

Socrates is a man.

Therefore, Socrates is mortal.

This can be represented in the following Datalog textual program.

human("Socrates").

mortal(X) <- human(X).

?- mortal("Socrates").

The execution of this program will start with the goal query “is Socrates mortal?” and in doing so will evaluate the necessary rule and derive the relation mortal. The result is a boolean value denoting whether the goal is satisfied.

+------------+
| _: boolean |
+============+
| true       |
+------------+

However, if we were to change the final query to replace the constant with a variable, as follows.

?- mortal(X).

The program will select all matching (in this case all) facts from the mortal relation.

+------------+
| X: string  |
+============+
| "Socrates" |
+------------+

The following is the same example constructed via the ASDI library.

#![allow(unused)]
fn main() {
use asdi::{NameReferenceSet, Program};
use asdi::edb::{Attribute, Predicate};
use asdi::idb::{Atom, Term, Variable};
use std::str::FromStr;

let mut syllogism = Program::default();

let predicates = syllogism.predicates();
let p_human = predicates.fetch("human").unwrap();
let p_mortal = predicates.fetch("mortal").unwrap();

let human = syllogism
    .add_new_extensional_relation(p_human.clone(), vec![Attribute::string()])
    .unwrap();
human.add_as_fact(["Socrates".into()]).unwrap();

let variables = syllogism.variables();
let var_x: Term = variables.fetch("X").unwrap().into();

syllogism
    .add_new_pure_rule(
        p_mortal.clone(),
        [var_x.clone()],
        [Atom::new(p_human, [var_x]).into()],
    )
    .unwrap();

syllogism
    .add_new_query(p_mortal, ["Socrates".into()])
    .unwrap();
}

The textual form of Datalog also has a set of pragmas, or directives, which allow us to specify details of the relations in our program. For example, in the text below we use the assert pragma to identify an EDB relation named “human” with a single column of type string. We then go on to use the infer pragma to identify an IDB relation named “mortal”, and we declare that it’s schema is the same as the EDB relation “human”.

.assert human(string).
.infer mortal from human.

human("Socrates").

mortal(X) <- human(X).

?- mortal("Socrates").

License

MIT License

Copyright (c) 2021 Simon Johnston

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

1

Chapter 1 of CeGoTa90 provides a good overview of the drawbacks of Prolog and the advantages of Datalog for certain tasks.

Datalog

Datalog is a logic programming language, but is also an expressive query language for deductive databases. As such it can be studied, and described, either as a programming language, as a language for expressing mathematical logic, or as a query language.

When considered from a mathematical point of view it is common to start from a basis in predicate logic or Horn clauses and demonstrate how these relate to the rules in a Datalog program. The following section takes a more direct approach leaving the relationship with other logic forms for a brief appendix.

When considered from a database or query point of view it is common to start from a basis in relational algebra and demonstrate the expressive equivalence between it and Datalog. In considering Datalog as a query language it is noted that every expression in the basic relational algebra can be expressed as a Datalog query. A mapping from relational algebra to Datalog is also included as an appendix.

Datalog Languages

When we talk specifically about the Datalog language it is common to represent it in a roman serif face as $\small\text{Datalog}$. This allows the distinction of talking in general about Datalog in the abstract vs. the actual semantics of the language $\small\text{Datalog}$.

This core $\small\text{Datalog}$ language has many documented extensions that provide additional flexibility, conciseness, or expressiveness. These languages, sub-languages, extensions, or dialects are usually labeled by adding a superscript symbol that identifies the specific extension. For example, $\small\text{Datalog}^{\lnot}$ is the language extended with negation of literals, $\small\text{Datalog}^{\Gamma}$ is the language extended with type checking on attributes, and $\small\text{Datalog}^{\lnot,\theta}$ is the language extended with negation of literals and arithmetic literals. The order of superscript symbols is irrelevant.

When referring to the specifics of the language we will use the common format $\small\text{Datalog}$ with superscripts as necessary to identify specific language extensions being used.

Abstract Syntax

This section describes the structure of $\small\text{Datalog}$ languages without reference to any concrete syntax or serialized representation. The most common concrete syntax is derived from Prolog and will be described in detail in the following section. In this section, text in bold indicates a key concept in the language while text in italics indicates a forward reference to such a concept.

Programs

A Datalog Program $\small P$ is a tuple comprising the Extensional database, EDB, or $\small D_{E}$, the Intensional database, IDB, or $\small D_{I}$, and a set of queries $\small Q$.

$$\tag{0}\small P=( D_E, D_I, Q )$$

The extensional database in turn is a set of relations each of which is a set of facts (ground atoms). The intensional database is a set of rules that derive additional facts into intensional relations via entailment.

A program has the following properties.

  • $\small program(x)$ a boolean predicate which returns true IFF $\small x$ is a program.
  • $\small extensional(P) \equiv D_E$ returns the set of relations comprising the extensional database.
  • $\small intensional(P) \equiv \lbrace r | r \in D_I \land relation(r) \rbrace$ returns the set of relations comprising the intensional database, IFF entailment has occurred.
  • $\small rules(P) \equiv \lbrace r | r \in D_I \land rule(r) \rbrace$ returns the set of rules associated with the intensional database.
  • $\small positive(r)$ a boolean predicate which returns true IFF all intensional rules are positive: $$\tag{v}\small positive(p) \coloneqq (\forall{r}\in rules(p); positive(r))$$

Rules

Rules $\small R$ are built from a language $\small \mathcal{L}=( \mathcal{C},\mathcal{P},\mathcal{V})$ that contains the

  • $\small \mathcal{C}$ — the finite sets of symbols for all constant values; e.g. hello, "hi" 123,
  • $\small \mathcal{P}$ — the finite set of alphanumeric character strings that begin with a lowercase character; e.g. human, size, a,
  • $\small \mathcal{V}$ — the finite set of alphanumeric character strings that begin with an uppercase character; e.g. X, A, Var.

While it would appear that the values from $\small \mathcal{C}$ and $\small \mathcal{P}$ would intersect, these values must remain distinct. For example, the value human is a valid predicate and string constant, but they have distinct types in ASDI that ensure they have distinct identities.

Each rule $\small r \in R$ has the form:

$$\tag{i}\small A_1, \ldots, A_m \leftarrow L_1, \ldots, L_n$$

as well as the following properties:

  • $\small rule(x)$ a boolean predicate which returns true IFF $\small x$ is a rule.
  • $\small head(r)$ (the consequence), returns the set of atom values $\small A_1, \ldots, A_m$ where $\small m \in \mathbb{N}$,
  • $\small body(r)$ (the antecedence), returns the set of literal values $\small L_1, \ldots, L_n$ where $\small n \in \mathbb{N}$,
  • $\small distinguished(r)$ returns the set of terms in the head of a rule, $$\tag{ii}\small distinguished(r) \coloneqq \lbrace t | t \in \bigcup\lbrace terms(a) | a \in head(r) \rbrace \rbrace$$
  • $\small non\text{-}distinguished(r)$ returns the set of terms in the body that of a rule that are not in the head, $$\tag{iii}\small non\text{-}distinguished(r) \coloneqq \lbrace t | t \in ( \bigcup\lbrace terms(a) | a \in body(r) \rbrace - distinguished(r) \rbrace)\rbrace$$
  • $\small ground(r)$ a boolean predicate which returns true IFF its head and its body are both ground: $$\tag{iv}\small ground(r) \coloneqq (\forall{a}\in head(r); ground(a)) \land (\forall{l}\in body(r); ground(l))$$
  • $\small positive(r)$ a boolean predicate which returns true IFF all body literals are positive: $$\tag{v}\small positive(r) \coloneqq (\forall{l}\in body(r); positive(l))$$

A pure rule is one where there is only a single atom in the head; if the body is true, the head is true. A constraint rule, or contradiction, does not allow any consequence to be determined from evaluation of its body. A disjunctive rule is one where there is more than one atom, and any one may be true if the body is true. The language $\small\text{Datalog}^{\lor}$ allows for inclusive disjunction, and while a language, $\small\text{Datalog}^{\oplus}$, exists for exclusive disjunction it is not implemented here.

The property $\small form(r)$ returns the form of the rule, based on the cardinality of the rule’s head as follows:

$$\tag{vi}\small form(r) \coloneqq \begin{cases} pure, &\text{if } |head(r)| = 1 \\ constraint, &\text{if } |head(r)| = 0 \land \text{Datalog}^{\lnot} \\ disjunctive, &\text{if } |head(r)| > 1 \land \text{Datalog}^{\lor} \end{cases}$$

Note that this notation is similar to that of a sequent. Taking our definition of a rule, $\small A_1, \ldots, A_m \leftarrow L_1, \ldots, L_n$, and swap the order of antecedence and consequence we get $\small L_1, \ldots, L_m \vdash A_1, \ldots, A_n$. A pure rule is termed a simple conditional assertion, a constraint rule is termed an unconditional assertion, and a disjunctive rule is termed a sequent (or simply conditional assertion).

Alternatively, some literature defines a rule in the following form:

$$\tag{ia}\small C_1 | C_2 | \ldots | C_n \leftarrow A_1, \ldots, A_m, \lnot B_1, \ldots, \lnot B_k$$

Where this form shows the expanded head structure according to $\small\text{Datalog}^{\lor}$, and the set of negated literals according to $\small\text{Datalog}^{\lnot}$.

Datalog does not allow rules to infer new values for relations that exist in the extensional database. This may be expressed as follows:

$$\tag{xvii}\small \mathcal{P}_E \cap \mathcal{P}_I = \empty$$

The same restriction is not required for constants in $\small \mathcal{C}_P$ or variables in $\small \mathcal{V}_P$ which should be shared.

Terms

Terms, mentioned above, may be constant values or variables such that $\small\mathcal{T}=\mathcal{C}\cup\mathcal{V}\cup\bar{t}$ where $\small\bar{t}$ represents an anonymous variable.

Terms have the following properties:

  • $\small term(x)$ a boolean predicate which returns true IFF $\small x$ is a term.
  • $\small constant(t)$ a boolean predicate which returns true IFF the term argument is a constant value.
  • $\small variable(t)$ a boolean predicate which returns true IFF the term argument is a variable.
  • $\small anonymous(t)$ a boolean predicate which returns true IFF the term argument is the anonymous variable, $\small\bar{t}$.

With the definition of rules so far it is possible to write rules that generate an an infinite number of results. To avoid such problems Datalog rules are required to satisfy the following Safety conditions:

  1. Every variable that appears in the head of a clause also appears in a positive relational literal (atom) in the body of the clause. $$\tag{vii}\small \begin{alignat*}{2} safe\text{-}head(r) &\coloneqq &&\lbrace t | t \in distinguished(r), t \in \mathcal{V} \rbrace \\ &- &&\lbrace t | t \in \bigcup\lbrace terms(a) | a \in body(r), atom(a), positive(a) \rbrace, t \in \mathcal{V} \rbrace \\ &= &&\empty \end{alignat*}$$
  2. Every variable appearing in a negative literal in the body of a clause also appears in some positive relational literal in the body of the clause. $$\tag{viii}\small \begin{alignat*}{2} safe\text{-}negatives(r) &\coloneqq &&\lbrace t | t \in \bigcup\lbrace terms(a) | a \in body(r), \lnot positive(a) \rbrace, t \in \mathcal{V} \rbrace \\ &- &&\lbrace t | t \in \bigcup\lbrace terms(a) | a \in body(r), atom(a), positive(a) \rbrace, t \in \mathcal{V} \rbrace \\ &= &&\empty \end{alignat*}$$

Atoms

Atoms are comprised of a label, $\small p \in \mathcal{P}$, and a tuple of terms. A set of atoms form a Relation if each conforms to the schema of the relation. The form of an individual atom is as follows:

$$\tag{ix}\small p(t_1, \ldots, t_k)$$

as well as the following properties:

  • $\small atom(x)$ a boolean predicate which returns true IFF $\small x$ is a atom.
  • $\small label(a)$ returns the predicate $\small p$,
  • $\small terms(a)$ returns the tuple of term values $\small t_1, \ldots, t_k$; where $\small t \in \mathcal{T}$ and $\small k \in \mathbb{N}^{+}$,
  • $\small arity(a)$ returns the cardinality of the relation identified by the predicate; $\small arity(a) \equiv |terms(a)| \equiv k$,
  • in $\small\text{Datalog}^{\Gamma}$:
    • there exists a type environment $\small \Gamma$ consisting of one or more types $\small \tau$,
    • each term $\small t_i$ has a corresponding type $\small \tau_i$ where $\small \tau \in \Gamma$,
    • $\small type(t)$ returns the type $\small \tau$ for that term,
    • $\small types(a)$ returns a tuple such that; $\small (i \in {1, \ldots, arity(a)} | type(t_i))$,
  • $\small ground(a)$ a boolean predicate which returns true IFF its terms are all constants: $$\tag{x}\small ground(a) \coloneqq (\forall{t}\in terms(a); t \in \mathcal{C})$$

Relations

Every relation $\small r$ has a schema that describes a set of attributes $\small \lbrace \alpha_1, \ldots, \alpha_j \rbrace$, and each attribute may be labeled, and may in $\small\text{Datalog}^{\Gamma}$ also have a type. In general, we refer to attributes by index, a value in $\small 1, \cdots, j$.

Relations have the following properties:

  • $\small relation(x)$ a boolean predicate which returns true IFF $\small x$ is a relation.
  • $\small label(r)$ returns the predicate $\small p$,
  • $\small schema(r)$ returns the set of attributes $\small \lbrace \alpha_1, \ldots, \alpha_j \rbrace$; where $\small k \in \mathbb{N}^{+}$,
  • $\small arity(r)$ returns the number of attributes in the relation’s schema, and therefore all atoms within the relation; $\small arity(r) \equiv |schema(a)| \equiv j$.
  • $\small atoms(r)$ returns the set of atoms that comprise this relation.

You will sometimes find the term sort used instead of schema. This term is used with meaning derived from Model Theory (see also Wilfred22) as a partition of objects into similarly structured forms. Sort is used in this way extensively in AbHuVi94.

Attributes have the following properties:

  • $\small index(\alpha)$ returns the index of this attribute in the schema, where $\small index \in \lbrace 1, \cdots, j \rbrace$.
  • $\small label(\alpha)$ returns either the predicate label of the attribute, or $\small\bot$.
  • in $\small\text{Datalog}^{\Gamma}$:
    • $\small type(\alpha)$ returns a type $\small \tau$ for the attribute, where $\small \tau \in \Gamma$, or $\small\bot$.

The following defines a binary function that determines whether an atom $\small a$ conforms to the schema of a relationship $\small r$.

$$\tag{xi}\small \begin{alignat*}{2} conforms(a, r) &\coloneqq &&ground(a) \\ &\land &&label(a) = label(r) \\ &\land &&arity(a) = arity(r) \\ &\land &&\forall{i} \in [1, arity(r)] \medspace conforms( a_{t_i}, r_{\alpha_i} ) \end{alignat*} $$ $$\tag{xii}\small conforms(t, \alpha) \coloneqq label(t) = label(\alpha) \land \tau_{t} = \tau_{\alpha} $$

Note that in relational algebra it is more common to use the term domain $\small D$ to denote a possibly infinite set of values. Each attribute on a relation has a domain $\small D_i$ such that each ground term is a value $\small d_i$ and the equivalent of $\small \tau_i \in \Gamma$ becomes $\small d_i \in D_i$.

To visualize a set of facts in a relational form we take may create a table $p$, where each column, or attribute, corresponds to a term index $1 \ldots k$. If the facts are typed then each column takes on the corresponding $\tau$ type. Finally each row in the table is populated with the tuple of term values.

$\small col_1: \tau_1$$\small \ldots$$\small col_k: \tau_k$
$\small row_1$$\small t_{1_1}$$\small \ldots$$\small t_{1_k}$
$\small \ldots$$\small \ldots$$\small \ldots$$\small \ldots$
$\small row_y$$\small t_{y_1}$$\small \ldots$$\small t_{y_k}$

Literals

Literals within the body of a rule, represent sub-goals that are the required to be true for the rule’s head to be considered true.

  • A literal may be an atom (termed a relational literal) or, in $\small\text{Datalog}^{\theta}$, a conditional expression (termed an arithmetic literal),
  • an arithmetic literal has the form $\small \langle t_{lhs} \theta t_{rhs} \rangle$, where
    • $\small \theta \in \lbrace =, \neq, <, \leq, >, \geq \rbrace$,
    • in $\small\text{Datalog}^{\Gamma}$ both $\small t_{lhs}$ and $\small t_{rhs}$ terms have corresponding types $\small \tau_{lhs}$ and $\small \tau_{rhs}$,
    • the types $\small \tau_{lhs}$ and $\small \tau_{rhs}$ must be compatible, for some system-dependent definition of the property $\small compatible(\tau_{lhs}, \tau_{rhs}, \theta)$,
  • in $\small\text{Datalog}^{\lnot}$ a literal may be negated, appearing as $\small \lnot l$,
  • and has the following properties:
    • $\small literal(x)$ a boolean predicate which returns true IFF $\small x$ is a literal.
    • $\small relational(l)$ a boolean predicate which returns true IFF the literal argument is a relational literal.
    • $\small arithmetic(l)$ a boolean predicate which returns true IFF the literal argument is a arithmetic literal.
    • $\small terms(l)$ returns the set of terms in a literal, $$\tag{xiii}\small terms(l) \coloneqq \begin{cases} terms(l), &\text{if } relational(l) \\ \lbrace t_{lhs}, t_{rhs} \rbrace, &\text{if } arithmetic(l) \land \text{Datalog}^{\theta} \end{cases}$$
    • $\small ground(l)$ a boolean predicate which returns true IFF its terms are all constants $\small (\forall{t}\in terms(l); t \in \mathcal{C})$,
    • $\small positive(l)$ in $\small\text{Datalog}^{\lnot}$ returns false if negated, otherwise it will always return true.
$\small relational(l)$$\small arithmetic(l)$$\small terms(l)$$\small ground(l)$$\small positive(l)$
$\small p(X, Y)$truefalse$\small \lbrace X, Y \rbrace$falsetrue
$\small p(X, 1)$truefalse$\small \lbrace X, 1 \rbrace$falsetrue
$\small p(2, 1)$truefalse$\small \lbrace 2, 1 \rbrace$truetrue
$\small X = 1$falsetrue$\small \lbrace X, 1 \rbrace$falsetrue
$\small \lnot p(2, 1)$truefalse$\small \lbrace 2, 1 \rbrace$truefalse
$\small \lnot X = 1$falsetrue$\small \lbrace X, 1 \rbrace$falsefalse

† note that while $\small \lnot X = 1$ is a negative literal, the corresponding literal $\small X \neq 1$ is not.

It is sometimes useful to consider the values (tautology) and (absurdity) as pseudo-literals given that certain expressions can be reduced to them. For example, the arithmetic literal $\small 1=1$ must always be true and can therefore be reduced to (or discarded entirely); similarly the arithmetic literal $\small 1=2$ must always be false and can therefore be reduced to . Any rule where any literal reduces to can itself never be true. Any rule that only comprises literals that reduce to must always be true.

Facts

Any ground rule where $\small m=1$ and where $\small n=0$ is termed a Fact as it is true by nature of having an empty body, or alternatively we may consider the body be comprised of the truth value $\small\top$.

Facts have the following properties:

  • $\small fact(x)$ a boolean predicate which returns true IFF $\small x$ is a fact. $$\tag{xiv}\small fact(r) \coloneqq (ground(r) \land form(r)=pure \land body(r)=\empty)$$

Queries

An atom may be also used as a Goal or Query clause in that its constant and variable terms may be used to match facts from the known facts or those that may be inferred from the set of rules introduced. A ground goal is simply determining that any fact exists that matches all the constant values provided and will return true or false. In the case that one or more variables exist a set of facts will be returned that match the expressed constants and provide the corresponding values for the variables.

The set of relations accessible to queries in the program is the union of relations in the extensional and intensional databases.

$$\tag{xvi}\small \mathcal{P}_P = \mathcal{P}_E \cup \mathcal{P}_I$$

It should be noted that the same exists for constants and variables; $\small \mathcal{C}_P = \mathcal{C}_E \cup \mathcal{C}_I$ and $\small \mathcal{V}_P = \mathcal{V}_E \cup \mathcal{V}_I$.

Queries have the following properties:

  • $\small query(x)$ a boolean predicate which returns true IFF $\small x$ is a query.

Concrete Syntax

The definitions below uses both Extended Backus-Naur Form (EBNF)1 and syntax diagrams2 to focus on the concrete syntax as expressed in the text representation. The EBNF definition is somewhat simplified from the grammar used in the ASDI parser although any deviations do not significantly affect the meaning of the language.

Programs

A program consists of a set of facts that comprise the extensional database, a list of rules that comprise the intensional database, and possibly a set of queries to interrogate the result of any reasoning performed over the program.

program

program ::= pragma* ( fact | rule | query )* ;

A program consists of a single file containing facts, rules, and queries as well as any additional files referenced via pragmas.

Facts

Facts must be expressed in the form of ground atoms and so they have a specific rule rather than a constrained form of the atom rule.

fact

fact    ::= predicate ( "(" constant ( "," constant )* ")" )? "." ;

A predicate is the identifier shared by a fact and relation.

predicate

predicate
        ::= LC_ALPHA ( ALPHA | DIGIT | "_" )* ;

Example – the following demonstrates a simple fact denoting that the constant brooke representing some individual is the parent of some individual represented by the constant "Xerces".

parent("Xerces", brooke).

Constant Values

Constants are supported in three types, String, Numbers, and Boolean. Whereas some definitions of Datalog introduce an additional Identifier type, ASDI treats these as short strings that can safely be expressed without quotes; therefore, the values xerces and "xerces" are equivalent.

constant

constant
        ::= string | number | boolean ;

Strings

Strings are described in both the identifier and quoted form in the string rule.

string

string  ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" * )? )
            | DQUOTE [^\u{22}]* DQUOTE ;

Numbers

Numeric values in ASDI take one of three forms, integers, decimal values, and floating point values.

Currently, decimal types are not implemented.

integer

number  ::= float | decimal | integer

Integer values in ASDI are signed, 64-bit values (i64).

integer

integer ::= ( "+" | "-" )? DIGIT+

Decimal values in ASDI are … TBD.

integer

decimal ::= integer "." DIGIT+

Floating point values in ASDI are 65-bit IEEE floats (f64).

integer

float   ::= decimal ( "e" | "E" ) integer

Booleans

Boolean values may also be represented using (down tack \u{22a4}) for true, and (up tack \u{22a5}) for false where this may improve readability.

boolean

boolean ::= ( "true" | "⊤" ) | ( "false" | "⊥" ) ;

Rules

As facts are syntactically distinct from rules in the text representation there is no need for empty bodies – all rules must have at least one literal. Material implication may be written using the Unicode character (long leftwards arrow\u{27f5}).

rule

rule    ::= ( head | "⊥" )? ( ":-" | "<-" | "⟵" ) body "." ;

The head of a rule is a disjunction of atoms, or in the case of a constraint the head may is optional or replaced by the value "⊥".

head

head    ::= ( atom ( ( ";" | "|" | "OR" | "∨" ) atom )* ) ;

The body of a rule is comprised of one, or more, literals.

body

body    ::= literal ( ( "," | "&" | "AND" | "∧" ) literal )* ;

Example – the following sets of rules are equivalent.

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) <- parent(X, Y).
ancestor(X, Y) ⟵ parent(X, Y).

movie_star(X) :- star(X)  ,  movie_cast_member(X, _, _).
movie_star(X) :- star(X)  &  movie_cast_member(X, _, _).
movie_star(X) :- star(X) AND movie_cast_member(X, _, _).
movie_star(X) :- star(X)  ∧  movie_cast_member(X, _, _).

Example – as described in the abstract syntax it is an error to use an extensional relation in the head of a rule. The following will generate an error:

parent("Xerces", brooke).

parent(X,Y) :- father(X,Y).

The language feature disjunction corresponds to the language $\small\text{Datalog}^{\lor}$ and allows multiple atoms to appear in the rule’s head with the semantics that these are choices. This syntax will not be accepted unless the feature is enabled.

Example – the following describes the rule that if X is a parent then X is either a father or mother.

.feature(disjunction).

father(X) ;  mother(X) :- parent(X).
father(X) |  mother(X) :- parent(X).
father(X) OR mother(X) :- parent(X).
father(X) ⋁  mother(X) :- parent(X).

As the use of disjunction in this position in the head is inclusive it is considered that any rule as above can be transformed into the following standard form. Clearly, in this case this is not the expected semantics which would require an exclusive disjunction, the language $\small\text{Datalog}^{\oplus}$. Because the semantics may cause such confusion ASDI does not do this transformation by default.

father(X) :- parent(X).
mother(X) :- parent(X).

The language feature constraints corresponds to the language $\small\text{Datalog}^{\Leftarrow}$ and allows the specification of rules with no head. In this case the material implication symbol is required, the falsum value is optional for readability, therefore the following rules are equivalent.

.feature(constraints).

:- alive(X) AND dead(X).
⊥ ⟵ alive(X) ∧ dead(X).

Safety

ASDI will disallow the addition of rules that are unsafe according to the abstract syntax. The following are examples of unsafe rules:

  • a(X) :- b(Y). — because X appears as a distinguished variable but does not appear in a positive relational literal, error HeadVariablesMissingInBody.
  • a(X) :- b(Y), NOT b(X). — because X appears in a negated literal but does not appear in a positive relational literal, error NegativeVariablesNotAlsoPositive.
  • a(X) :- b(Y), X < Y. — Because X appears in an arithmetic literal but does not appear in a positive relational literal, error ArithmeticVariablesNotAlsoPositive.

Atoms

The text representation of an atom is a relatively simple translation from the abstract syntax above.

atom

atom    ::= predicate "(" term ( "," term )* ")" ;

term

term    ::= variable | constant ;

Note that we explicitly separate variables into named and anonymous forms here.

variable

variable
        ::= named-variable | anon-variable ;

named-variable

named-variable
        ::= UC_ALPHA ( ALPHA | DIGIT | "_" )* ;
        
anon-variable
        ::= "_" ;

Example – the following are all valid body atoms.

dead(julius_caesar).
emperor(julius_caesar, rome).
emperor(X, Y).
emperor(X, rome).

Literals

Any valid atom is also a valid positive relational literal. The syntax below also allows for negative literals as well as arithmetic expressions as literals. Conjunction may be written with the Unicode character (logical and \u{2227}).

literal

literal ::= ( "!" | "NOT" | "¬" )? ( atom | comparison ) ;

Example – the following rules are all equivalent.

ancestor(X, Y) ⟵ parent(X, Z)  ,  ancestor(Z, Y).
ancestor(X, Y) ⟵ parent(X, Z)  &  ancestor(Z, Y).
ancestor(X, Y) ⟵ parent(X, Z)  ∧  ancestor(Z, Y).
ancestor(X, Y) ⟵ parent(X, Z) AND ancestor(Z, Y).

The language feature negation corresponds to the language $\small\text{Datalog}^{\lnot}$ and allows the specification of negated literals. Negation may also be written using the Unicode character (full-width not sign \u{ffe2}). The following rules are equivalent.

.feature(negation).

alive(X) :- person(X), NOT dead(X).
alive(X) ⟵ person(X) ∧ ¬dead(X).

The following will fail as the negated rule is not considered safe ([Error::NegativeVariablesNotAlsoPositive]).

.feature(negation).

alive(X) :- NOT dead(X).
alive(X) ⟵ ¬dead(X).

Arithmetic Literals

The language feature comparisons corresponds to the language $\small\text{Datalog}^{\theta}$ and allows the use of arithmetic literals. Comparisons take place between two literals and are currently limited to a set of common operators. Note the addition of a string match operator, this is similar to the Perl =~ and requires a string value/variable on the left and a string value or variable on the right that compiles to a valid Rust regular expression. Finally, the rule named-term disallows the use of anonymous variables in arithmetic literals.

comparison

comparison
        ::= ( named-variable | constant ) operator ( named-variable | constant ) ;

operator

operator
        ::= "="
            | ("!=" | "/=" | "≠")
            | "<"
            | ("<=" | "≤")
            | ">"
            | (">=" | "≥")
            | ("*=" | "≛" | "MATCHES") ;

The Unicode characters (not equal to \u{2260}), (less-than or equal to \u{2264}), (greater-than or equal to \u{2265}, and star equals \u{e2899b}) may be substituted for the common arithmetic and string operators.

All arithmetic operations must be between terms of the some type, such that the property compatible introduce above is defined as:

$$\tag{xvi}\small compatible(\tau_{lhs}, \tau_{rhs}, \theta) \leftarrow \tau_{lhs} = \tau_{rhs}$$

Additionally, some operators are not present for all types, as shown in the table below.

Type=, <, , >,
StringYesYes - lexicalYes
IntegerYesYesNo
BooleanYesNoNo

Example – the following is an example using arithmetic literals and the car relation.

.feature(comparisons).
.assert car(make: string, model: string, age: integer).

antique(X, Y) :- car(X, Y, _) AND X *= "[dD]uesenberg".
antique(X, Y) :- car(X, Y, _) AND Y = "model t".
antique(X, Y) :- car(X, Y, Z) AND Z > 50.

Queries

A query is simply an atom, but one identified to the system as a goal with either the prefix ?- or the suffix ?.

query

query   ::= ( "?-" atom "." ) | ( atom "?" ) ;

Example – the following queries are equivalent and will return the value of the variable X for any facts in the ancestor relationship where the first attribute is the string value "xerces".

?- ancestor(xerces, X).
ancestor(xerces, X)?

When the value _ is used in a query it denotes an attribute of the relation that has no meaning in either the query or the response. For example, in the following query we ask for all values of the model attribute in the car relation where the make is “ford”, and ignore the age entirely.

.assert car(make: string, model: string, age: integer).

car("ford", X, _)?

The results of this query would not include the age column:

+------------+
| model      |
+============+
| edge       |
+------------+
| escort     |
+------------+
| fiesta     |
+------------+
| focus      |
+------------+
| fusion     |
+------------+
| mustang    |
+------------+
     ...

Pragmas

Pragmas are declarative statements meant for the parser and runtime tooling, they do not affect the meaning of the program itself.

pragma

pragma  ::= "." ( feature | assert | infer | fd | input | output ) "." ;

pragma feature

The feature pragma determines which Datalog language is in use. Use of syntax not supported by the selected language feature will result in errors.

feature

feature ::= "feature" "(" feature-id ( "," feature-id )* ")" ;

feature-id

feature-id
        ::= "comparisons"
        | "constraints"
        | "disjunction"
        | "negation"
        | "functional_dependencies"

Example

.feature(negation).
.feature(comparisons, disjunction).

pragma assert

The assert pragma describes a new relation in the extensional database. The parser can determine the schema for facts from their types in the database. The use of this pragma is therefore optional, but recommended.

assert

assert  ::= "assert" predicate "(" attribute-decl ( "," attribute-decl )* ")" ;

attribute-decl

attribute-decl
        ::= ( predicate ":" )? ( "boolean" | "integer" | "string" ) ;

Example

.assert human(name: string).

pragma infer

The infer pragma describes a new relation in the intensional database. Typically the parser can determine the schema for relational literals from their context, The use of this pragma is therefore optional, but recommended. The alternate form is more explicit in that it defines an intensional relation in terms of a previously defined extensional relation.

infer

infer   ::= "infer"
            ( predicate "(" attribute-decl ( "," attribute-decl )* ")" )
            | "from" predicate ;

Example

.infer mortal(name: string).

Alternatively the short-cut form is often more convenient.

.assert human(name: string).
.infer mortal from human.

pragma fd

The fd pragma, short for functional dependency, introduces a relationship between one or more attributes on the left-hand (determinant) side and one or more attributes on the right-hand (dependent) side. This relationship denotes that for a relationship $\small R$ with attributes $\small a_1, \cdots, a_n$, every valid combination of determinant values uniquely determines the value of the dependent values.

Note that this pragma is only valid if the corresponding language feature is enabled.

input

fd      ::= ( "fd" | "functional_dependency" ) 
            predicate ":"
            attribute-index-list ( "-->" | "⟶" ) attribute-index-list

input

attribute-index-list
        ::= attribute-index ( "," attribute-index )*

An attribute index is either an attribute label, if one has been declared for the relation, or an integer $\small i \in [1,|schema(R)|]$.

input

attribute-index
        ::= integer | predicate

Constraints – given the notational form $\small R: \alpha \rightarrow \Beta$;

  1. the initial predicate must be the label of an extensional relation: $$\small R \in D_I \land label(R)$$
  2. the set of attribute identifiers comprising the set $\small \alpha$ must be in $\small R$: $$\small \forall a \in \alpha (a \in schema(R))$$
  3. the set of attribute identifiers comprising the set $\small \Beta$ must be in $\small R$: $$\small \forall a \in \Beta (a \in schema(R))$$
  4. the same attribute identifier must not appear in both determinant and dependent: $$\small \alpha \cap \Beta = \emptyset$$

Example – given the extensional relation employee the two functional dependencies in the following are equivalent. Note that the implementation will ignore such duplicate declarations.

.feature(functional_dependencies).
.assert employee(id:integer, name:string).

.fd employee: id --> name.
.fd employee: 1 ⟶ 2.

pragma input

The input pragma instructs the parser to load facts for the named extensional relation from an external file. This pragma requires that the relation be previously defined via the assert pragma.

input

input   ::= "input" io-details "."

io-details

io-details
        ::= "(" predicate "," quoted-string ( "," quoted-string )? ")" ;

Example

.assert human(name: string).
.input(human, "data/humans.csv", "csv").

pragma output

The output pragma instructs the parser to write facts from the named intensional relation to an external file. This pragma requires that the relation be previously defined via the infer pragma.

output

output  ::= "output" io-details "." ;

Example

.infer mortal(name: string).
.output(mortal, "data/mortals.txt").

Comments

Comments in Datalog are either 1) the % character and continue to the end of the line, or 2) C-style with /* to start and */ to end. These correspond to the same rules as Prolog.

comment

comment ::= "%" [^\r\n]* EOL
            | "/" "*" ( [^\*] | "*"+ [^\*\/] )* "*"+ "/" ;

Example

% Here's a comment
?- ancestor(xerces, X). % and another
?- ancestor(brooke /* and one inline */, X). % and another

Terminal (Lexical) Symbols

The following are lexical rules that can be assumed by the rules above.

EOL     ::= "\n" | "\r\n" | "\r" ;

WHITESPACE
        ::= " " | "\t" | EOL ;
        
DQUOTE
        ::= '"' ;
        
LC_ALPHA
        ::= ? corresponds to the Unicode category 'Ll' (letter, lower case) ? ;
        
UC_ALPHA
        ::= ? corresponds to the Unicode category 'Lu' (letter, upper case) ? ;
        
ALPHA   ::= LC_ALPHA | UC_ALPHA ;

DIGIT   ::= ? corresponds to the Unicode category 'Nd' (decimal number) ? ;

1

For the original description of the EBNF notation as it is used here, please refer to “A.1.1 Notation” in the XQuery recommendation.

2

The syntax, or railroad, diagrams were generated by the bottlecaps online tool from the definition in the appendix.

Language Features

By default, the language supported by ASDI is sometimes termed Pure Datalog. This language allows for positive literals only, and allow for recursion. It does support additional features with feature sets applied to programs. Without a feature specified certain types will fail well-formedness rules, the parser will report errors, and some tools will not execute. However, the enabling of features is relatively simple in both the text representation using the .feature pragma, and using FeatureSets in the API.

Negation

The feature negation enables the negation of individual literals in the body of a rule. This language is often described as $\text{\small{Datalog}}^{\lnot}$.

.features(negation).
alive(X) :- person(X) AND NOT dead(X).
alive(X) :- person(X) ∧ ¬dead(X).

The text representation allows for "!", "¬", and "NOT" to be used to denote negation.

Constraints

The feature constraints enables the specification of rules that have no head, which in turn specifies a rule that may never be true. This language is described herein as $\text{\small{Datalog}}^{\bot}$.

.features(constraints).
⊥ :- dead(X) AND alive(X).
:- dead(X) AND alive(X).

The text allows for either an entirely missing head, or the value "⊥" as the head to denote a constraint. The latter signifies that the rule implies falsity (or absurdity).

Comparisons

The feature comparisons enables the inclusion of literal terms that use standard comparison operators. This language is described herein as $\text{\small{Datalog}}^{\theta}$.

.features(comparisons).
old(X) :- age(X, Y) ∧ Y > 75.

The text representation supports the operators equality ("="), inequality ("!=", "/=", or "≠"), less than ("<"), less than or equal-to ("<=", or "≤"), greater than (">"), and greater than or equal-to (">=", or "≥").

Disjunction

The feature disjunction enables the negation of individual literals in the body of a rule. This language is often described as $\text{\small{Datalog}}^{\lor}$.

.features(disjunction).
mother(X, Y) OR father(X, Y) :- parent(X, Y).
mother(X, Y) ∨ father(X, Y) :- parent(X, Y).

The text representation allows for ";", “|”, “∨”, and “OR”` to be used to denote disjunction.

Example

The following demonstrates the text representation support for enabling features.

.features(negation, comparisons, disjunction).

Similarly, the following API example shows how to create a feature set that may be added to a program during creation.

#![allow(unused)]
fn main() {
use asdi::features::{
    FEATURE_COMPARISONS, FEATURE_DISJUNCTION, FEATURE_NEGATION, FeatureSet
};

let features = FeatureSet::from(vec![FEATURE_NEGATION, FEATURE_DISJUNCTION]);

assert!(features.supports(&FEATURE_NEGATION));
assert!(!features.supports(&FEATURE_COMPARISONS));

assert_eq!(features.to_string(), ".feature(negation, disjunction).");
}

Model API

This chapter will walk through various parts of the ASDI crate, not as an API reference, which is to be found in the rust doc, but at a higher level and so some specific details not be covered here.

The following is a representation of the module structure, note that this isn’t quite the same as the book organization shown on the left.

Module Structure

3.1: crate module structure.

Programs

A Program is the key entry point in the crate, and it is provided in the crate’s root. The following diagram shows the content of the root, with the program class in bold. As you can see the program class is an aggregate of other classes in modules edb, features, idb, and idb::query.

Library Top Level

3.2: library top level

Traits

The crate root also includes a number of common traits. In the following diagram the details of these traits are shown, eliding a number of the classes from the figure above.

Program

3.3: common traits, expanded

Program

The following is an expanded view of the Program struct from figure 3.2. Note that, in this diagram, the fill color for some classes was set to white. This is to indicate that while they are still important they are of secondary importance to the current discussion.

Program

3.4: Program, expanded

Program Features

features module

3.5: the features module

Feature Details

The following table lists the labels and symbols for each Feature currently supported in ASDI.

Language FeatureLabel StringSymbol Char
FEATURE_COMPARISONS“comparisons”‘θ’
FEATURE_CONSTRAINTS“constraints”‘⇐’
FEATURE_DISJUNCTION“disjunction”‘∨’
FEATURE_NEGATION“negation”‘¬’
FEATURE_FUNCTIONAL_DEPENDENCIES“functional_dependencies”‘⟶’

Program Parser

parse module

3.6: the parser module

Program Visitor

visitor module

3.7: the visitor module

Relations

Extensional Relations

Relation Input/Output

io module

Rules

idb module

Evaluation

idb::eval module

Stratification

strata module

Queries

idb::query module

Relational Algebra

idb::query::relational module

Error Handling

error module

Model Extension

RelationSet I/O

Reading Resource Sets

I/O extension point

Writing Resource Sets

I/O extension point

Writing an Evaluator

Evaluator extension point

Using the PrecedenceGraph

PrecedenceGraph

Using the StratifiedProgram

StratifiedProgram

Writing a Formatter

Using the Visitor API

Visitor details

Contributing

The latest version of this document can be found on GitHub as CONTRIBUTING.md.

I’m really glad you’re reading this, because we need volunteer developers to help this project continue to grow and improve.

  1. file bugs and enhancement requests
  2. review the project documentation know if you find are issues, or missing content, there
  3. Fix or Add something and send us a pull request; you may like to pick up one of the issues marked help wanted or good first issue as an introduction. Alternatively, documentation issues can be a great way to understand the project and help improve the developer experience.

Submitting changes

We love pull requests from everyone. By participating in this project, you agree to abide by our code of conduct, and License.

Fork, then clone the repo:

git clone git@github.com:johnstonskj/{{repository-name}}.git

Ensure you have a good Rust install, usually managed by Rustup. You can ensure the latest tools with the following:

rustup update

Make sure the tests pass:

cargo test --package {{package-name}} --no-fail-fast --all-features -- --exact

Make your change. Add tests, and documentation, for your change. Ensure not only that tests pass, but the following all run successfully.

cargo doc --all-features --no-deps
cargo fmt
cargo clippy

If you made changes to the book source, ensure the following runs successfully

mdbook build

If you have made any changes to Cargo.toml, also check:

cargo outdated
cargo audit

Push to your fork and submit a pull request using our template.

At this point you’re waiting on us. We like to at least comment on pull requests within three business days (and, typically, one business day). We may suggest some changes or improvements or alternatives.

Some things that will increase the chance that your pull request is accepted:

Coding conventions

The primary tool for coding conventions is rustfmt, and specifically cargo fmt is a part of the build process and will cause Actions to fail.

DO NOT create or update any existing rustfmt.toml file to change the default formatting rules.

DO NOT alter any warn or deny library attributes.

DO NOT add any feature attributes that would prohibit building on the stable channel. In some cases new crate-level features can be used to introduce an unstable feature dependency but these MUST be clearly documented and optional.

Appendix: Abstract Graphical View

The following graph is an alternative representation of the abstract syntax. It represents leaf values (constants, variables, …) in rectangles and composite values as ellipses. Additionally, where a composite is defined as $\small A \oplus B$ a small filled diamond shape represents the exclusive or relationship. Finally, some edges in the graph are labeled with “*”, “+”, and “?” which are the common cardinality notation used in regular expressions and BNF notation.

Graphical View

  1. The edge between rule and head has the label “?/*” as it has differing cardinality under $\small\text{Datalog}$, $\small\text{Datalog}^{\lor}$, and $\small\text{Datalog}^{\bot}$.
  2. The edge between literal and negated? is labeled as “?” as it is necessary under $\small\text{Datalog}^{\lnot}$ but not under $\small\text{Datalog}$.
  3. The edge from the choice between literal and comparison is labeled as “?” as it is necessary under $\small\text{Datalog}^{\theta}$ but not under $\small\text{Datalog}$.
  4. The two dashed lines represent the following constraints.
    1. Between relation and predicate to represent the fact that the predicate for a relation may be derived from the predicate of the atoms within it (or vice versa).
    2. Between head and body to represent the fact that while both are optional for a rule, one or other must be present.

GraphViz Source

The following is the source for the graph above.

digraph G {
  rankdir=LR;
  pad=0.1;
  splines=true;

  negated [label="negated?"; shape=box; width=0.1; height=0.1];
  boolean [shape=box; width=0.1; height=0.1];
  string [shape=box; width=0.1; height=0.1];
  integer [shape=box; width=0.1; height=0.1];
  variable [shape=box; width=0.1; height=0.1];
  anonymous [shape=box; width=0.1; height=0.1];
  predicate [shape=box; width=0.1; height=0.1;];

  program [root=true];

  program -> edb;
  program -> idb;
  edb -> relation [label="*"];
  idb -> relation [label="*"];
  idb -> rule [label="*"];
  rule -> head [label="?/*"];
  head -> atom;
  rule -> body [label="?"];
  body -> literal [label="+"];

  head -> body [arrowhead=none;style=dashed;label="|head|+|tail|>=1"];

  literal -> xor3;
  literal -> negated [label="?"];
  xor3 -> atom;
  xor3 -> comparison [label="?"];

  comparison -> term [label="lhs"];
  comparison -> term [label="rhs"];
  comparison -> operator;

  relation -> predicate [style=dashed];
  relation -> atom [label="*"];
  atom -> term [label="+"];
  atom -> predicate;

  term -> xor2;
  xor2 -> constant;
  xor2 -> variable;
  xor2 -> anonymous;

  xor1 [shape=diamond,style=filled,label="",height=.1,width=.1];
  xor2 [shape=diamond,style=filled,label="",height=.1,width=.1];
  xor3 [shape=diamond,style=filled,label="",height=.1,width=.1];

  constant -> xor1;
  xor1 -> integer;
  xor1 -> string;
  xor1 -> boolean;
}

This file is accessible directly here.

Appendix: Datalog EBNF

The following is the EBNF definition of the Datalog native text representation. Note that this is simplified somewhat from the actual parser specification ASDI uses although in practice the differences are irrelevant.

program ::= pragma* ( fact | rule | query )* ;

/* ************************************************************************* */

fact    ::= predicate ( "(" constant ( "," constant )* ")" )? "." ;

predicate
        ::= LC_ALPHA ( ALPHA | DIGIT | "_" )* ;

/* ************************************************************************* */

constant
        ::= string | number | boolean ;

string  ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" * )? )
            | DQUOTE [^#x22]* DQUOTE ;

number  ::= float | decimal | integer ;

integer ::= ( "+" | "-" )? DIGIT+ ;

decimal ::= integer "." DIGIT+ ;

float   ::= decimal ( "e" | "E" ) integer ;

boolean ::= ( "true" | "⊤" ) | ( "false" | "⊥" ) ;

/* ************************************************************************* */

rule    ::= ( head | "⊥" )? ( ":-" | "<-" | "⟵" ) body "." ;

head    ::= ( atom ( ( ";" | "|" | "OR" | "∨" ) atom )* ) ;

body    ::= literal ( ( "," | "&" | "AND" | "∧" ) literal )* ;


/* ************************************************************************* */

atom    ::= predicate "(" term ( "," term )* ")" ;

term    ::= variable | constant ;

variable
        ::= named-variable | anon-variable ;

named-variable
        ::= UC_ALPHA ( ALPHA | DIGIT | "_" )* ;

anon-variable
        ::= "_" ;

/* ************************************************************************* */

literal ::= ( "!" | "NOT" | "¬" )? ( atom | comparison ) ;

/* ************************************************************************* */

comparison
        ::= operand operator operand ;

operand ::= ( named-variable | constant ) ;

operator
        ::= "="
            | ("!=" | "/=" | "≠")
            | "<"
            | ("<=" | "≤")
            | ">"
            | (">=" | "≥")
            | ("*=" | "≛" | "MATCHES") ;

/* ************************************************************************* */

query   ::= ( "?-" atom "." ) | ( atom "?" ) ;

/* ************************************************************************* */

pragma  ::= "." ( feature | assert | infer | fd | input | output ) "." ;

feature ::= "feature" "(" feature-id ( "," feature-id )* ")" ;

feature-id
        ::= "comparisons"
        | "constraints"
        | "disjunction"
        | "negation"
        | "functional_dependencies" ;

assert  ::= "assert" predicate "(" attribute-decl ( "," attribute-decl )* ")" ;

attribute-decl
        ::= ( predicate ":" )? ( "boolean" | "integer" | "string" ) ;

infer   ::= "infer"
            ( predicate "(" attribute-decl ( "," attribute-decl )* ")"
            | "from" predicate ) "." ;

fd      ::= ( "fd" | "functional_dependency" )
            predicate ":"
            attribute-index-list ( "-->" | "⟶" ) attribute-index-list ;

attribute-index-list
        ::= attribute-index ( "," attribute-index )* ;

attribute-index
        ::= integer | predicate ;

input   ::= "input" io-details ;

io-details
        ::= "(" predicate "," quoted-string ( "," quoted-string )? ")" ;

output  ::= "output" io-details ;

/* ************************************************************************* */

comment ::= "%" [^\r\n]* EOL
            | '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/' ;

/* ************************************************************************* */

EOL     ::= "\n" | "\\r\\n" | "\r" ;

WHITESPACE
        ::= " " | "\t" | EOL ;

DQUOTE  ::= #x22 ;

LC_ALPHA
        ::= ? corresponds to the Unicode category 'Ll' ? ;

UC_ALPHA
        ::= ? corresponds to the Unicode category 'Lu' ? ;

ALPHA   ::= LC_ALPHA | UC_ALPHA ;

DIGIT   ::= ? corresponds to the Unicode category 'Nd' (decimal number) ? ;

This file is accessible directly here.

Relational Algebra Mapping

Every expression in the basic relational algebra can be expressed as a $\small\text{Datalog}$ query. However, operations in the extended relational algebra (grouping, aggregation, and sorting) have no corresponding capability in $\small\text{Datalog}$. Similarly, $\small\text{Datalog}$ can express recursion, which the relational algebra cannot. The following describes the foundational relational operations and their $\small\text{Datalog}$ equivalent.

The semantics of $\small\text{Datalog}$ relations are primarily set based we would expect common set operations to work in $\small\text{Datalog}$ rules. Given the relations R, $\small r(a,b,c)$, and S, $\small s(d,e,f)$, we can easily describe the common set operations.

Union ($\small\cup$)

Union is a binary operator that is written as $\small R \cup S$, where $\small R$ and $\small S$ are relations.

For set union and set difference, the two relations involved must be union-compatible -— that is, the two relations must have the same set of attributes. A strict form of this rule does not allow for same-named attributes whereas a looser form allows duplicate names IFF they have identical types. Because set intersection is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible.

Consider the expression $\small U \coloneqq R \cup S$, which can be expressed as follows:

u(X, Y, Z) :- r(X, Y, Z).
u(X, Y, Z) :- s(X, Y, Z).

Difference ($\small\setminus$)

Difference is a binary operator that is written as $\small R \setminus S$, where $\small R$ and $\small S$ are relations. This cannot be implemented in $\small\text{Datalog}$ as it required negation.

Consider the expression $\small D \coloneqq R \setminus S$ which can be expressed in $\small\text{Datalog}^{\lnot}$ as follows:

d(X,Y,Z) :- r(X,Y,Z) AND NOT s(X,Y,Z).

Intersection ($\small\cap$)

Intersection is a binary operator that is written as $\small R \cap S$, where $\small R$ and $\small S$ are relations.

Consider the expression $\small I \coloneqq R \cap S$, which can be expressed as follows:

i(X,Y,Z) :- r(X,Y,Z) AND s(X,Y,Z).

Intersection may be defined in terms of the difference operation, as below.

$$\small R \cap S \enspace\equiv\medspace R \setminus (R \setminus S)$$

Cartesian Product ($\small\times$)

Cartesian Product is a binary operator that is written as $\small R \times S$, where $\small R$ and $\small S$ are relations. In relational algebra it is required that the two relations involved must have disjoint headers—that is, they must not have a common attribute name.

The following is a valid expression of $\small P \coloneqq R \times S$:

p(A, B, C, D, E, F) :- r(A, B, C) AND s(D, E, F).

Projection ($\small\Pi$)

Projection is a unary operation written as $\small\Pi_{a_{1},\ldots ,a_{n}}(R)$ where $a_{1},\ldots ,a_{n}$ is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in $R$ are restricted to the set $\small\lbrace a_{1},\ldots ,a_{n}\rbrace$. In some literature the lower case $\small\pi$ is used instead of $\small\Pi$.

For example, the projection $\small P \coloneqq \Pi_{X} (R)$, a projection of the first attribute in R only, can be expressed in $\small\text{Datalog}$ as either of the following equivalent rules.

p(X) :- r(X, Y, Z).
p(X) :- r(X, _, _).

Generalized Selection ($\small\sigma$)

As defined by Codd, selection is written as $\small \sigma_{a\theta b}(R)$ or $\small \sigma_{a\theta v}(R)$ where:

  • $\small a$ and $\small b$ are attribute names,
  • $\small\theta$ is a binary operation, where $\small\theta\in\lbrace =, \neq, <, \leq, >, \geq \rbrace$,
    • ASDI adds the non-standard string match operator $\small\overset{\star}{=}$ to the set $\small\theta$,
  • $\small v$ is a constant value,
  • $\small R$ is a relation,

The selection $\small\sigma_{a\theta b}(R)$ denotes all tuples in $\small R$ for which $\small\theta$ holds between the $\small a$ and the $\small b$ attribute.

The selection $\small\sigma_{a\theta v}(R)$ denotes all tuples in $\small R$ for which $\small\theta$ holds between the $\small a$ attribute and the value $\small v$.

Selection requires arithmetic literals and therefore the languages $\small\text{Datalog}^{\theta}$ and $\small\text{Datalog}^{\lnot}$ for negation.

Generalized Selection is a unary operation written as $\small\sigma_{\varphi}(R)$ where $\small\varphi$ is a propositional formula that consists of conditions as allowed in the normal selection and the logical operators $\small\land$ (and), $\small\lor$ (or) and $\small\lnot$ (negation). This selection selects all those tuples in $\small R$ for which $\small\varphi$ holds.

In $\small\text{Datalog}^{\theta}$ the selection $\small L \coloneqq \sigma_{X>100 \land Y=‘something’} (R)$ can be expressed as follows, where both rules are equivalent.

l(X, Y, Z) :- r(X, Y, Z) AND X > 100 AND Y = something.
l(X, Y, Z) :- r(X, something, Z) AND X > 100.

Rename ($\small\rho$)

Rename is a unary operation written as $\small\rho_{a/b}(R)$ where the result is identical to $\small R$ except that the $\small b$ attribute in all tuples is renamed to an $\small a$ attribute. This is simply used to rename the attribute of a relation or the relation itself. There is also the $\small\rho_{x(A_{1},\ldots, A_{n})}(R)$ notation, where $\small R$ is renamed to $\small x$ and the attributes $\small\lbrace a_{1},\ldots, a_{n}\rbrace$ are renamed to $\small\lbrace A_{1}, \ldots, A_{n}\rbrace$.

TBD

Theta Join ($\small\Join_{\theta}$)

Theta Join is a binary operator that is written as $\small{R \Join S} \atop {a \theta b}$ and $\small{R \Join S} \atop {a \theta v}$ or alternatively as $\small R \Join_{a \theta b} S$ and $\small R \Join_{a \theta v} S$, where $\small R$ and $\small S$ are relations, and the expressions $\small a\theta b$ and $\small a\theta v$ should be interpreted in the same way as for selection.

A generalized Theta Join can be described following the convention of generalized selection where $\small\theta$ expands into a propositional formula in the same manner as $\small\varphi$. Unfortunately there is no notational alignment where a generalized theta join might be signified as $\small\Join_{\varphi}$

For example, the natural join $\small J \coloneqq R \Join_{S.X>100} S$, a join conditional join on an attribute in S can be expressed in $\small\text{Datalog}^{\theta}$ as follows.

j(X,Y,Z) :- r(X,Y,Z) AND s(Xs,Y,Z) AND Xs > 100.

It is possible to dispense with the theta join in most cases as it can be expressed in terms of the selection and cartesian product operators described above.

$$\small R \Join_{\theta} \enspace\equiv\enspace \sigma_{\theta}(R \times S)$$

An Equi-Join is a special case of the Theta Join where $\small\theta$ contains only equalities.

Natural Join ($\small\Join$)

Natural Join is a binary operator that is written as $\small R \Join S$, where $\small R$ and $\small S$ are relations. The result of the natural join is the set of all combinations of tuples in $\small R$ and $\small S$ that are equal on their common attribute names. The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator.

For example, the natural join $\small J = R \Join S$, a join on the first two attributes of R and S, can be expressed in $\small\text{Datalog}$ as follows. Note that the common attributes are defined by the names of their variables, not the underlying names of the attribute schema.

j(X,Y,Z,Q) :- r(X,Y,Z) AND s(X,Y,Q).

A Natural Join is a special case of the Equi-Join where equality operations are performed on all common attributes.

Complex Expressions

More complex examples can the be made from combining relational operators. The relational query $\small A \coloneqq \Pi_{X}(\sigma_{Y = 3} (R) )$ becomes

a(X) :- r(X, 3, _).

Similarly, the relational query $\small A \coloneqq \Pi_{X}(\sigma_{Y = 3} (R) \Join_{R.X=S.X}\sigma_{Y = 5} (S))$ becomes

a(X) :- r(X, 3, _) AND s(X, 5, _).

Recursion

Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations that cannot be expressed by relational algebra. One of them is the transitive closure of a binary relation. Given a domain $\small D$, let binary relation $\small R$, $\small \text{Datalog } r(x, y)$, be a subset of $\small D\times D$. The transitive closure $\small R^{+}$ of $R$ is the smallest subset of $\small D\times D$ that contains $R$ and satisfies the following condition:

$$\small\forall x\forall y\forall z\left((x,y)\in R^{+}\land (y,z)\in R^{+}\Rightarrow (x,z)\in R^{+}\right)$$

In Datalog this form of recursion is naturally expressed, as follows.

r_plus(X, Z) :- r(X, Y), r_plus(Y, Z).

Acknowledgements

Examples in this section are taken from Introduction to Data Management CSE 344, Lecture 10: DatalogBalaz12, and definitions for the relational algebra are taken from Relational algebraWikiRelalg. The Relational Algebra Query ConverterQConv is a useful tool to take a SQL query and convert to relational algebra which can then be converted to Datalog using the examples above.

Appendix: Relational Algebra EBNF

program
        ::= assignment | expression ;

assignment
        ::= identifier ( "≔" | ":=" ) expression ;

expression
        ::=  relation-identifier
          |  union
          |  intersection
          |  difference
          |  cartesian-product
          |  selection
          |  projection
          |  rename
          |  join
          |  "(" expression ")" ;

/* ************************************************************************* */

union   ::= expression ( "∪" | "union" ) expression ;

intersection
        ::= expression ( "∩" | "intersect" ) expression ;

difference
        ::= expression ( "∖" | "-" | "diff" ) expression ;

cartesian-product
        ::= expression ( "⨯" | "×" | "product" ) expression ;

selection
        ::= expression
            ( "σ" | "select" )
            "[" criteria ("," criteria )* "]"
            expression ;

criteria
        ::= ( attribute-identifier | literal )
            operation
            ( attribute-identifier | literal ) ;

operation
        ::= "=" | "!=" "<" | "<=" | ">" | ">=" ;

projection
        ::= expression
            ( "Π" | "π" | "project" )
            "[" attribute-identifier ("," attribute-identifier )* "]"
            expression ;

rename  ::= expression
            ( "ρ" | "rename" )
            "[" rename-pair ("," rename-pair )* "]"
            expression ;

rename-pair
        ::= attribute-identifier "=" attribute-identifier ;

join    ::= expression
            ( natural-join | theta-join )
            expression ;

natural-join
        ::= "⨝" | "natural-join" ;

theta-join
        ::= ( "⨝" | "theta-join" )
            "[" criteria  ("," criteria )* "]" ;

/* ************************************************************************* */

relation-identifier
        ::= ALPHA ( ALPHANUM | "_" )* ;

attribute-identifier
        ::= relation-identifier
            ( "." ( relation-identifier | integer ) ) ;

/* ************************************************************************* */

literal ::= integer | string | boolean ;

integer ::= DIGIT+ ;

string  ::= QMARK [^#x22]* QMARK ;

boolean ::= "true" | "false" ;

QMARK  ::= #x22 ;

This file is accessible directly here.

Appendix: First-Order Logic & Horn Clauses

This section does not provide a primer on propositional logic, or first-order predicate logic. However, some background may be required and this section will briefly outline key terms, we refer the reader to AbHuVi94, chapter 2.

Propositional logic

For Propositional logic we define a language comprising the following:

  1. An infinite set of propositional variables, $\small \mathcal{V}$.
  2. The constant values true and false.
  3. Propositional formulae $\small\varphi$, comprising variables, constants, the unary connective negation ($\small\lnot$) and the following binary connectives; disjunction ($\small\lor$), conjunction ($\small\land$), implication ($\small\rightarrow$), and equivalence ($\small\leftrightarrow$).

A literal is a (sub-)formula comprising a single variable or constant that may be negated, i.e. $\small p$ or $\small \lnot p$.

A propositional formula $\small\varphi$ is in conjunctive normal form (CNF) if it has the form

$$\tag{i}\small\psi_1 \land \cdots \land \psi_n$$

where each formula $\small\psi$ is a disjunction of literals.

First-Order Predicate logic

First-Order Predicate logic is a more general language than propositional logic allowing predicate symbols that range over n-ary relationships. A first-order language $\small L$ is differentiated by the set of predicate symbols, constant values, and functions it chooses to allow.

  1. An infinite set of variables $\small \mathcal{V}$.
  2. A set of constant values, $\small \mathcal{C}$, usually including true and false.
  3. A set of n-ary predicate symbols, $\small \mathcal{P}$.
  4. A set of n-ary function symbols, $\small \mathcal{F}$.
  5. Well-formed formulae (WFF) comprising variables, constants, the unary connective negation ($\small\lnot$) and the following binary connectives; disjunction ($\small\lor$), conjunction ($\small\land$), implication ($\small\rightarrow$), and equivalence ($\small\leftrightarrow$).
  6. The univeral quantifier $\small\forall$ and the existential quantifier $\small\exists$.

An atom corresponds to the a propositional variable and is either true, false, or $\small p(t_1, \cdots, t_n)$, where $\small p$ is a predicate symbol and $\small t_1, \cdots, t_n$ are terms.

A term is either a variable, constant value, or function symbol.

Additionally, the language may include equality so as to allow atoms of the form $\small t_1 = t_2$. Note that AbHuVi94 uses $\small t_1 \approx t_2$.

Horn Clauses

A Clause is simply a disjunctions of positive $\small a_1, \cdots, a_m$ and negative $\small b_1, \cdots, b_n$ literals and may be written in either of the following forms.

$$\tag{ii}\small a_1 \lor \cdots \lor a_m \lor \lnot b_1 \lor \cdots \lor \lnot b_n$$

$$\tag{iii}\small \forall x_1, \cdots, x_k (a_1 \lor \cdots \lor a_m \lor \lnot b_1 \lor \cdots \lor \lnot b_n)$$

Where the latter form introduces the set of variables used in the clause. This is often removed for clarity as the universal quantification is assumed for any variable present. This can also be written in clausal form, something that looks a lot like Datalog.

$$\tag{iv}\small a_1, \cdots, a_m \leftarrow b_1, \cdots, b_n$$

A Horn Clause is a formula consisting of a disjunction of literals of which at most one is positive.

$$\tag{vi}\small a \lor \lnot b_1 \lor \cdots \lor \lnot b_n$$

or, in clausal form,

$$\tag{vii}\small a_1 \leftarrow b_1, \cdots, b_n$$

The following table describes additional clause forms along with their equivalent Datalog form.

PropositionalForm/DatalogClausal
$\small a_1(x,y) \lor a_2(x,y) \lor \lnot b_1(x,z) \lor \lnot b_2(z,y)$indefinite/disjunctive$\small a_1(x,y), a_2(x,y) \leftarrow b_1(x,z), b_2(z,y)$
$\small a(x,y) \lor \lnot b_1(x,z) \lor \lnot b_2(z,y)$definite/pure1$\small a(x,y) \leftarrow b_1(x,z), b_2(z,y)$
$\small b(x,y)$unit/fact2$\small b(x,y) \leftarrow$
$\lnot b_1(x,z) \lor \lnot b_2(z,y)$goal/constraint$\small \leftarrow b_1(x,z), b_2(z,y)$
$\small false$ or $\small\bot$empty$\small \square$

Mapping from Datalog to First-Order Logic

The purpose of the descriptions above was to allow for the mapping of Datalog into a first-order language.

Given a Datalog program $\small P=( D_E, D_I, Q )$ we can construct a new first-order language $\small L=(\mathcal{C},\mathcal{P},\mathcal{F})$ in the following manner.

  1. $\mathcal{C}$ is comprised the set of all constant values in any EDB or IDB relation. $$\small \lbrace c | c \in \bigcup \lbrace terms(atoms(r)) | r \in extensional(P) \cap intensional(P) \rbrace \rbrace$$
  2. $\mathcal{P}$ is comprised the set of labels for all EDB and IDB relations. $$\small \lbrace label(r) | r \in extensional(P) \cap intensional(P) \rbrace$$
  3. In $\small\text{Datalog}$, which does not allow functions, $\mathcal{F}=\empty$. However, in $\small\text{Datalog}^{\theta}$ which allows certain operators in arithmetic literals $\mathcal{F}=\theta$.
  4. For each rule $\small r \in rules(P)$:

TBD


1

In Datalog a pure rule must also follow the safety constraint that all variables that appear in the head must also appear in the body.

2

In Datalog a fact must also be ground.

Appendix: References

  1. AbHuVi94Abiteboul, S., Hull, R., & Vianu, V. Foundations of Databases: The Logical Level. (Pearson, 1995). webdam.inria.fr/Alice/.
  2. Balaz12Balazinska, M. CSE 344: Introduction to Data Management. Lecture 10: Datalog. (University of Washington, 2012). Retrieved January 24, 2022, from courses.cs.washington.edu.
  3. CeGoTa90Ceri, S., Gottlob, G., & Tanca, L. Logic Programming and Databases. (Springer Berlin Heidelberg, 1990). DOI: 10.1007/978-3-642-83952-8.
  4. ChVa97Chaudhuri, S., and Vardi, M. Y. (1997). On the Equivalence of Recursive and Nonrecursive Datalog Programs. Journal of Computer and System Sciences, 54(1), 61–78. DOI: 10.1006/JCSS.1997.1452.
  5. Colomb98Colomb, R. M. Deductive Databases and their Applications. (Taylor & Francis, 1998).
  6. GreMol15Greco, S. & Molinaro, C. Datalog and Logic Databases. (Morgan & Claypool Publishers, 2015).
  7. Mill1851Mill, J. S. A System of Logic, Ratiocinative and Inductive, Being a Connected View of the Principles of Evidence, and the Methods of Scientific Investigation. (John W. Parker, 1851).
  8. QConvGrammatical Framework. (March 1, 2018). Query-converter – SQL to Relational Algebra. Retrieved January 24, 2022, from grammaticalframework.org.
  9. Shep88Shepherdson, John C. Negation in Logic Programming in Foundations of Deductive Databases and Logic Programming. (1988).
  10. WikiDatalogDatalog. (January 9, 2022). In Wikipedia. Retrieved January 24, 2022, from en.wikipedia.org/Datalog.
  11. WikiRelalgRelational algebra. (December 1, 2021). In Wikipedia. Retrieved January 24, 2022, from en.wikipedia.org/Relational_algebra.
  12. Wilfrid97Hodges, W. A Shorter Model Theory. 1997. Cambridge University Press, USA.
  13. Wilfrid22Hodges, W. Model Theory. in The Stanford Encyclopedia of Philosophy (ed. Zalta, E. N.) (2022). plato.stanford.edu/archives/spr2022/entries/model-theory/