Semi-interval Comparison Constraints in Query
Containment and Their Impact on Certain Answer
Computation
Foto N. Afrati
National Technical University of Athens
Matthew Damigos
Ionian University, Corfu
Abstract
We consider conjunctive queries with arithmetic comparisons (CQAC) and
investigate the computational complexity of the problem: Given two CQAC
queries, $Q$ and $Q'$, is $Q'$ contained in $Q$? We know that, for CQAC queries,
the problem of testing containment is $\Pi^p_2$-complete. However, there are broad
classes of queries with semi-interval arithmetic comparisons in the containing
query that render the problem solvable in NP. In all cases examined the
contained query is allowed to be any CQAC. Interestingly, we also prove that there
are simple cases where the problem remains $\Pi^p_2$-complete.
We also investigate the complexity of computing certain answers in the
framework of answering CQAC queries with semi-interval comparisons using
any CQAC views. We prove that maximally contained rewritings in the
language of union of CQACs always compute exactly all certain answers. We find
cases where we can compute certain answers in polynomial time using maximally
contained rewritings.
Keywords: query containment, query rewriting, computing certain answers,
conjunctive queries with arithmetic comparisons, complexity of query
containment, maximally contained rewritings
Email addresses: afrati@gmail.com (Foto N. Afrati), mgdamig@gmail.com (Matthew Damigos)
arXiv:2509.10138v2 [cs.DB] 17 Sep 2025
1. Introduction
A conjunctive query with arithmetic comparisons (CQAC) is a select-project-join
query in SQL. Query containment and query equivalence play a prominent
role in efficient query processing, e.g., for minimizing the number of joins in a
query. Query equivalence can be reduced to a query containment problem. Data
integration is often put into the framework of answering queries using views via
contained rewritings that are found based on properties of query containment.
Recently, the problem of determinacy has attracted the interest of researchers
and query containment tests offer tools for its investigation.
For conjunctive queries (CQs), the query containment problem is shown to
be NP-complete [1]. Membership in NP is proven via a containment mapping
from the variables of one query to the variables of the other which preserves
the relation symbol, i.e., it is a homomorphism. For conjunctive queries with
arithmetic comparisons the query containment problem is $\Pi^p_2$-complete [2, 3].
We denote a conjunctive query with arithmetic comparisons (CQAC) by
$Q = Q_0 + \beta$, where $Q_0$ denotes the relational subgoals and $\beta$ the
arithmetic comparison subgoals. The containment test now uses all containment
mappings $\mu_1, \ldots, \mu_k$ from the variables of one query to the variables of
the other. The containment test decides whether $Q_2 \sqsubseteq Q_1$ by checking
whether the following containment entailment is true [4, 5] (see footnote 1):
$$\varphi : \beta_2 \Rightarrow \mu_1(\beta_1) \lor \cdots \lor \mu_k(\beta_1).$$
Previous work has considered CQACs with only left semi-interval arithmetic
comparisons or only right semi-interval ACs [2, 6]. A left semi-interval (LSI)
AC is an AC of type $var \le const$ or $var < const$, where $var$ is a variable
and $const$ is a constant, and a right semi-interval (RSI) AC is an AC of type
$var \ge const$ or $var > const$. The LSI AC with $\le$ is called closed (CLSI) and
the LSI AC with $<$ is called open (OLSI); similarly, for RSI we define ORSI and
CRSI. Klug in [2] noticed that if only LSI (respectively, RSI) ACs are used then one
containment mapping suffices to prove query containment. Further work in [7]
considers certain broader classes of CQACs with LSI (respectively, RSI) ACs and
shows that a single containment mapping suffices to prove query containment
(in this case, we say that the homomorphism property holds). In [6] a more
elaborate approach that works on the containment entailment is taken to prove
that for certain cases of queries with both LSI and RSI ACs, we can check
containment in non-deterministic polynomial time although we may need more
than one containment mapping to make the containment entailment true.

Footnote 1: We assume the queries are normalized; see the definition shortly.

Contained Query | Containing Query | norm | Complexity | Reference
OSI and ≠ | OSI | n/a | $\Pi^p_2$-complete | Theorem 5.1
LSI | OLSI, constant | n/a | $\Pi^p_2$-complete | Theorem 5.2
CLSI, ≠ | OLSI | n/a | $\Pi^p_2$-complete | Theorem 5.2
≠ | ≠ | n/a | $\Pi^p_2$-complete | Theorem 5.2, [8]
any | one AC | no | NP | Theorem 6.1
any | CLSI | no | NP/HP | Theorem 8.6, [7]
any^c | LSI | no | NP/HP | Theorem 8.6, [7]
SI | CLSI 1CRSI | yes | NP | [6]
any closed ACs | CLSI 1CRSI | no | NP | [9]
any | CLSI 1CRSI | no | NP | Theorem 8.8
any^c | CLSI 1ORSI | no | NP | Theorem 8.8
any^c | OLSI 1ORSI | no | NP | Theorem 8.8
any^c | OLSI 1CRSI | no | NP | Theorem 8.8
Table 1: Complexity of query containment.

In data integration, in the local-as-view approach, views are maintained over
the sources in order to be used to answer queries. Query answering is usually
made possible via a rewriting that expresses the query in terms of the views.
Since the views do not provide all the information that the base relations that
form the query would require, we are looking into computing certain answers,
i.e., all the answers that we are certain would be in the answers of the query given
a specific view instance. For conjunctive queries, it is shown that maximally
contained rewritings (MCR) in the language of union of conjunctive queries
compute all certain answers in polynomial time [10]. For CQAC queries and
views, this may not always be the case. There is a large body of work on
the topic of obtaining MCRs for certain cases [11]. Recent work developed an
implementation for computing certain answers [12]. In [18], the problem of
finding certain answers is proven coNP-hard when the queries have inequalities
($\neq$) and the views are CQs.
When only CLSI are used in the containing query the containment problem is
in NP via the homomorphism property. Interestingly, for the case where the
containing query uses only OLSIs, we prove here that the containment problem is
still $\Pi^p_2$-complete. However, to prove the hardness result we need to use
queries which share constants in a rather intricate way, thus creating a
dichotomy which leaves a class with the “majority” of the queries to be such that
query containment is an NP problem. In fact, for the subcase in which the
complexity is in NP, membership in NP is shown via the homomorphism property,
i.e., one mapping suffices to prove containment. The only other result we are
aware of concerns CQAC queries that use only $\neq$ comparisons [8], where a
boundary is delineated depending on the number of relational subgoals of the query.
The following example shows two queries that belong in the class for which
containment checking is not an NP problem:
Q1() :- a(X, 5), X < 5
Q2() :- a(X, 5), a(Y, X), X ≤ 5, Y < 5
The above example shows another challenge that we need to take into account
before we find all the containment mappings to use in the containment
entailment: we need to normalize the queries. A CQAC query is normalized if
each variable appears only once in relational subgoals and, to compensate for
that, we add equalities. Also, no constants are allowed in relational subgoals.
Thus, a normalized query uses explicit equalities in addition to inequalities. The
above queries in the normalized form are as follows:
Q1() :- a(X, Z), X < 5, Z = 5
Q2() :- a(X, Z), a(Y, W), X ≤ 5, Y < 5, Z = 5, W = X
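As an illustrative check (a worked instance written out here, with the mapping names $\mu_1$ and $\mu_2$ chosen only for this illustration), the containment entailment for the two normalized queries above can be spelled out. The single relational subgoal of Q1 can be mapped to either relational subgoal of Q2, which gives two containment mappings:

```latex
% Containment mappings from a(X, Z) in Q_1 to the relational subgoals of Q_2
\mu_1 : X \mapsto X,\; Z \mapsto Z
\qquad
\mu_2 : X \mapsto Y,\; Z \mapsto W

% Containment entailment  \beta_2 \Rightarrow \mu_1(\beta_1) \lor \mu_2(\beta_1)
X \le 5 \,\land\, Y < 5 \,\land\, Z = 5 \,\land\, W = X
\;\Rightarrow\;
(X < 5 \,\land\, Z = 5) \;\lor\; (Y < 5 \,\land\, W = 5)
```

The implication holds by a case split on $X$: if $X < 5$ the first disjunct is true, and if $X = 5$ then $W = X = 5$ together with $Y < 5$ makes the second disjunct true. Neither disjunct is implied on its own (the first fails when $X = 5$, the second fails when $X < 5$), so this instance needs both mappings.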
The cases that we treat in this paper and all related previous results are
listed in Table 1, where the first two columns give the arithmetic
comparisons allowed. The notation any^c in the table means that any AC can be
used in $Q_2$ up to a certain condition, which is stated in the corresponding
theorem. The notation NP/HP means that the problem is in the complexity class NP
via the homomorphism property. The third column in the table refers to whether
normalization is needed. The table only contains results from the literature that
concern SI comparisons. All the symmetrical results are valid too, i.e., when we
interchange RSI with LSI ACs and vice versa. We do not state explicitly the
symmetrical results.
In the framework of answering queries using views, when only a view instance
is available, we want to find answers to a given query which are always computed
on any database that produces the view instance or a superset of the view
instance. Through this consideration the concept of certain answers is defined.
Query rewriting techniques are used to find efficiently some certain answers to
queries when only a view instance is available. Actually, for CQ query and
views, maximally contained rewritings (MCR) can compute exactly all certain
answers, and hence certain answers can be computed in polynomial time. In
this paper, we prove that the same result is true for CQAC query and views.
However, unlike CQs, it is not easy to find MCRs that are unions of CQAC
queries and the problem of finding certain answers for CQ views and queries
with only “$\neq$” is coNP-hard, as is proven in [18]. Here we focus again on queries
with LSI and RSI ACs only and show that, for any CQAC view set, we can find
an MCR in the language of (possibly infinite) union of CQACs. This extends
the results in [6] and corrects an error there because, as the following example
demonstrates, even when the views use only SI ACs in their definition, an MCR
may have to use an AC that relates two variables. The following query and
views use only SI ACs:
Q() :- e(X, Y), e(Y, Z), X ≥ 5, Z ≤ 5,
V1(Z) :- e(X, Y), e(Y, Z), X ≥ 5,
V2(X) :- e(X, Y), e(Y, Z), Z ≤ 5.
However, the MCR consists of the following three CQACs:
Q() :- V1(Z), Z ≤ 5,
Q() :- V2(X), X ≥ 5,
Q() :- V1(Z), V2(X), X ≥ Z.
The cases that we treat in this paper and all related previous results are
listed in Table 2.

Views | Query | MCR | Certain Answers | Reference
CQ-LSI | CQ-LSI | Union of CQ-LSI | PTIME | [13, 6]
CQ-SI | CQ-RSI1 | Datalog with ACs | PTIME | [6]
CQAC | CQ with closed RSI1 | Datalog with ACs | PTIME | Thm. 13.1
CQAC | CQ* w. closed/open RSI1 | Datalog with ACs | PTIME | Thm. 16.4
CQAC | CQ* w. open/closed RSI1 | Datalog with ACs | PTIME | Thm. 16.4
CQAC | CQ* w. open RSI1 | Datalog with ACs | PTIME | Thm. 16.4
CQ | CQ≠ | n/a | coNP-hard | [18]
Table 2: Work on finding maximally contained rewritings (“MCR”) and certain answers.

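To see why the third CQAC of the MCR in the example above is a contained rewriting, one can write out its expansion and check an implication by hand. The following is a worked sketch under our own naming: the query variables are renamed to $A, B, C$, the fresh variables $X_1, Y_1, Y_2, Z_2$ and the mapping names $\mu_1, \mu_2$ are chosen here for illustration only.

```latex
% Expansion of  Q() :- V_1(Z), V_2(X), X \ge Z  (non-distinguished view variables renamed)
e(X_1, Y_1),\; e(Y_1, Z),\; X_1 \ge 5,\quad e(X, Y_2),\; e(Y_2, Z_2),\; Z_2 \le 5,\quad X \ge Z

% Query with renamed variables:  e(A, B),\; e(B, C),\; A \ge 5,\; C \le 5
\mu_1 : A \mapsto X_1,\; B \mapsto Y_1,\; C \mapsto Z
\qquad
\mu_2 : A \mapsto X,\; B \mapsto Y_2,\; C \mapsto Z_2

% Implication to check
X_1 \ge 5 \,\land\, Z_2 \le 5 \,\land\, X \ge Z
\;\Rightarrow\;
(X_1 \ge 5 \,\land\, Z \le 5) \;\lor\; (X \ge 5 \,\land\, Z_2 \le 5)
```

If $Z \le 5$ the first disjunct holds; otherwise $Z > 5$ and $X \ge Z$ give $X > 5$, so the second disjunct holds. These two mappings already make the implication true, which is enough to conclude that the expansion is contained in the query; the comparison $X \ge Z$ between two variables is what makes the case split go through.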
The main contributions in this paper are:
• The results on the complexity of the query containment problem for
conjunctive queries with arithmetic comparisons (CQACs) where the
containing query uses only semi-interval arithmetic comparisons. These results are
summarized in Table 1, where we mention only results from previous work
that concern semi-interval comparisons. (Sections 5 – 9)
• We prove that for CQAC queries and views, if there is a maximally
contained rewriting (MCR) in the language of union of CQACs, then this MCR
computes exactly all certain answers. (Sections 10 – 12)
• For queries that are CQAC with semi-interval comparisons with a single
RSI comparison and any CQAC views, we build an MCR in the language
of Datalog with arithmetic comparisons, hence proving that in this case
we can compute certain answers in polynomial time. (Sections 13 – 16)
The structure of the paper is as follows: After Related Work (Section 2)
and Preliminaries (Section 3), Section 4 presents a sound and complete set
of elemental implications to derive an arithmetic comparison from a given set
of arithmetic comparisons. It also presents the preliminaries to analyze the
containment test in the case of CQACs. Section 5 presents the reduction for
the hardness result.
The three sections that follow investigate the cases where the problem of
query containment is in NP. Section 6 considers the case where the containing
query has only one AC. Section 7 considers the case where the containing query
uses SI ACs. Section 8 discusses the issue of normalization and extends the
results of Section 7. Section 9 takes advantage of the observation that the head
variables (and possibly some more variables) of the containing query map on
the same variables of the contained query for every containment mapping and,
thus, extends the results of the previous sections.
Section 10 introduces maximally contained rewritings (MCRs) and Section
11 defines the expansion of a rewriting and makes remarks that concern
idiosyncrasies of CQACs and contained rewritings. Section 12 proves the next major
result, which says that an MCR in the language of union of CQACs computes all
certain answers for CQAC query and views. The rest of the sections focus on
finding MCRs for the case where the query contains LSI ACs and only one RSI AC.
For this, we make use of another containment test for this particular
case, which is based on a transformation of the containing query to a Datalog
query (without ACs) and a transformation of the contained query to a CQ. We
present this test in Sections 13 and 14, which leads to Theorem 14.4, the proof
of which is in Appendix B, while Section 15 serves as an introduction to the proof
in Appendix B. In Section 16 the algorithm based on Theorem 14.4 is presented
for finding an MCR for the aforementioned special case.
2. Related work
CQ and CQAC containment: The problem of containment between conjunctive
queries (CQs, for short) has been studied in [1], where the authors show
that the problem is NP-complete, and the containment can be tested by finding
a containment mapping. As we already mentioned, considering CQs with
arithmetic comparisons (CQACs), the problem of query containment is
$\Pi^p_2$-complete [3]. Zhang and Ozsoyoglu, in [14], showed that testing
containment of two CQACs can be done by checking the containment entailment.
Kolaitis et al. [8] studied the computational complexity of the query-containment
problem of queries with disequations ($\neq$). In particular, the authors showed
that the problem remains $\Pi^p_2$-hard even in the cases where the acyclicity
property holds and each predicate occurs at most three times. However, they
proved that if each predicate occurs at most twice then the problem is in coNP.
Karvounarakis and Tannen, in [15], also studied CQs with disequations ($\neq$) and
identified special cases where query containment can be tested by checking for a
containment mapping (i.e., the containment problem for these cases is NP-complete).
The homomorphism property for query containment of conjunctive queries
with arithmetic comparisons was studied in [2, 5, 7, 9], where classes of queries
were identified for which the homomorphism property holds.
Rewritings and finding MCRs: The problem of answering queries using views
has been extensively investigated in the relevant literature (e.g., [16, 17, 11]),
including finding equivalent and contained rewritings. Algorithms for finding
maximally contained rewritings (MCRs) have also been studied in the past
[18, 19, 20, 13, 21, 22, 23, 6]. The authors in [13] and [21] propose two
algorithms, the Minicon and shared-variable algorithm, respectively, for finding
MCRs in the language of unions of CQs when both queries and views are CQs.
[13] also considers restricted cases of arithmetic comparisons (LSI and RSIs)
in both queries and views. [24] examines equivalent CQ rewritings for acyclic
CQ queries. The works in [22] and [23] studied the problem where the query is
given by a Datalog query, while the views are given by CQs and union of CQs,
respectively. In both papers, the language of MCRs is Datalog. The authors in
[25] studied the problem of finding MCRs in the framework of bounded query
rewriting. They investigated several query classes, such as CQs, union of CQs,
and first order queries, and analyzed the complexity in each class. Work in [10]
proposed an efficient algorithm that finds MCRs in the language of union of
CQs in the presence of dependencies. The work in [6] investigated the problem
of finding MCRs for special cases of CQACs. [26] is a recent account on
view-based query processing as an abstraction in data integration. Determinacy is
another related problem investigated recently, where we ask about the existence
of equivalent rewritings in the case where the answers to the views uniquely
determine the answers to the query. In [27], [28], [29], [30], [31] notions related
to determinacy are considered, in [32] determinacy for nested relational queries
is investigated, in [33] determinacy for multiset semantics, and [34], [35], [36]
investigate determinacy for recursive queries and views.
Certain answers and MCRs: The problem of finding certain answers has
been extensively investigated in the context of data integration and data
exchange over the last 20 years (e.g., [37, 18, 19, 38, 10, 39]). In [19, 38], the
authors investigated the problem of finding certain answers in the context of data
exchange, considering CQs. The work in [38] was extended for arithmetic and
linear arithmetic CQs in [40]. In [18], the authors investigated the relationship
between MCRs and certain answers; there, the problem of finding certain
answers is also proven coNP-hard when the queries have inequalities ($\neq$) and the
views are CQs. In [10], the authors proved that an MCR of a union of CQs computes
all the certain answers, where the MCR is considered in the language of union of CQs.
[12] developed an implementation for computing certain answers.
Other work with arithmetic comparisons in queries: Concerning other
related problems of queries in the presence of arithmetic comparisons,
recent work can be found in [41], where the authors propose to extend graph
functional dependencies with linear arithmetic expressions and arithmetic
comparisons. They study the problems of testing satisfiability and related problems
over integers (i.e., for non-dense orders). In [42] the complexity of
evaluating conjunctive queries with arithmetic comparisons is investigated for acyclic
queries, while query containment for acyclic conjunctive queries was investigated
in [43]. Other works [40, 44] have added arithmetic to extend the expressiveness
of tuple generating dependencies and data exchange mappings, and studied the
complexity of related problems. Queries with arithmetic comparisons on
incomplete databases are considered in [45].
3. Preliminaries
A relation schema is a named relation defined by its name (called relational
symbol) and a vector of attributes. An instance of a relation schema is a
collection of tuples with values over its attribute set. These tuples are called facts.
The schemas of the relations in a database constitute its database schema. A
relational database instance (database, for short) is a collection of relation
instances.
A conjunctive query (CQ for short) $Q$ over a database schema $S$ is a query
of the form $h(\bar{X}) :- e_1(\bar{X}_1), \ldots, e_k(\bar{X}_k)$, where $h(\bar{X})$
and $e_i(\bar{X}_i)$ are atoms, i.e., they contain a relational symbol (also called
predicate; here, $h$ and $e_i$ are predicates) and a vector of variables and
constants. The atoms that contain only constants are called ground atoms and they
represent facts.
The head $h(\bar{X})$, denoted $head(Q)$, represents the results of the query, and
$e_1, \ldots, e_k$ represent database relations (also called base relations) in $S$.
The variables in $\bar{X}$ are called head or distinguished variables, while the
variables in $\bar{X}_i$ and not in $\bar{X}$ are called body or nondistinguished
variables of the query. The part of the conjunctive query on the right of the symbol
:- is called the body of the query and is denoted $body(Q)$. Each atom in the body
of a conjunctive query is said to be a subgoal. A conjunctive query is said to be
safe if all its distinguished variables also occur in its body. We only consider
safe queries here.
The result (or answer), denoted $Q(D)$, of a CQ $Q$ when it is applied on
a database instance $D$ is the set of atoms such that for each assignment $h$
of variables of $Q$ that makes all the atoms in the body of $Q$ true the atom
$h(head(Q))$ is in $Q(D)$.
Conjunctive queries with arithmetic comparisons (CQAC for short) are
conjunctive queries that, besides the relational subgoals, also use subgoals that are
arithmetic comparisons (AC for short), i.e., of the form $X \,\theta\, Y$ where
$\theta$ is one of the following: $<, >, \le, \ge, =, \neq$, and $X$ is a variable and
$Y$ is either a variable or a constant. If $\theta$ is either $<$ or $>$ we say that
it is an open arithmetic comparison and if $\theta$ is either $\le$ or $\ge$ we say
that it is a closed AC. If the AC is either of the form $X < c$ or $X \le c$ (either
$X > c$ or $X \ge c$, respectively), where $X$ is a variable and $c$ is a constant,
then it is called left semi-interval, LSI for short (right semi-interval, RSI for
short, respectively). In the following, we use the notation $Q = Q_0 + \beta$ to
describe a CQAC query $Q$, where $Q_0$ are the relational subgoals of $Q$ and
$\beta$ are the arithmetic comparison subgoals of $Q$. We define the closure of a
set of ACs to be all the ACs that are implied by it.
The result $Q(D)$ of a CQAC $Q$, when it is applied on a database $D$, is given
by considering all the assignments of variables (in the same fashion as in CQs)
such that the atoms in the body are included in $D$ and the ACs are true. For
each such assignment, we produce a fact in the output $Q(D)$.
All through this paper, we assume the following setting for a CQAC:
1. Values for the arguments in the arithmetic comparisons are chosen from an
infinite, totally densely ordered set, such as the rationals or reals.
2. The arithmetic comparisons are not contradictory (or, otherwise, we say that
they are consistent); that is, there exists an instantiation of the variables such
that all the arithmetic comparisons are true.
3. All the comparisons are safe, i.e., each variable in the comparisons also
appears in some relational subgoal.
A union of CQs (resp. CQACs) is defined by a set $\mathcal{Q}$ of CQs (resp. CQACs)
whose heads have the same arity, and its answer $\mathcal{Q}(D)$ is given by the
union of the answers of the queries in $\mathcal{Q}$ over the same database
instance $D$; i.e.,
$$\mathcal{Q}(D) = \bigcup_{Q_i \in \mathcal{Q}} Q_i(D).$$
A query $Q_1$ is contained in a query $Q_2$, denoted $Q_1 \sqsubseteq Q_2$, if for any
database $D$ of the base relations, the answer computed by $Q_1$ is a subset of the
answer computed by $Q_2$, i.e., $Q_1(D) \subseteq Q_2(D)$. The two queries are
equivalent, denoted $Q_1 \equiv Q_2$, if $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
A homomorphism $h$ from a set of relational atoms $A$ to another set of
relational atoms $B$ is a mapping of variables and constants from one set to variables
or constants of the other set that maps each variable to a single variable or
constant and each constant to the same constant. Each atom of the former set
should map to an atom of the latter set with the same relational symbol.
A containment mapping from a conjunctive query $Q_1$ to a conjunctive query
$Q_2$ is a homomorphism from the atoms in the body of $Q_1$ to the atoms in the
body of $Q_2$ that maps the head of $Q_1$ to the head of $Q_2$. All the mappings
we refer to in this paper are containment mappings unless we say otherwise.
Chandra and Merlin [1] show that a conjunctive query $Q_2$ is contained in another
conjunctive query $Q_1$ if and only if there is a containment mapping from $Q_1$ to
$Q_2$. The query containment problem for CQs is NP-complete.
3.1. Testing query containment for CQACs
In this section, we describe two tests for CQAC query containment: using
containment mappings and using canonical databases.
First, we present the test using containment mappings (see, e.g., [11]).
Although finding a single containment mapping suffices to test query containment
for CQs (see the previous section), it is not enough in the case of CQACs. In
fact, all the containment mappings from the containing query to the contained
one should be considered. Before we describe how containment mappings can
be used in order to test query containment between two CQACs, we define the
concept of normalization of a CQAC.
Definition 3.1. Let $Q_1$ and $Q_2$ be two conjunctive queries with arithmetic
comparisons (CQACs). We want to test whether $Q_2 \sqsubseteq Q_1$. To do the testing,
we first normalize each of $Q_1$ and $Q_2$ to $Q'_1$ and $Q'_2$, respectively. We
normalize a CQAC query as follows:
• For each occurrence of a shared variable $X$ in a normal (i.e., relational)
subgoal, except for the first occurrence, replace the occurrence of $X$ by a
fresh variable $X_i$, and add $X = X_i$ to the comparisons of the query; and
• For each constant $c$ in a normal subgoal, replace the constant by a fresh
variable $Z$, and add $Z = c$ to the comparisons of the query.
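The two normalization steps of Definition 3.1 can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the authors' implementation: a query body is a list of `(predicate, args)` pairs, variables are strings starting with an uppercase letter, any other term is a constant, and fresh variables are named `N1`, `N2`, ... (assumed not to clash with existing names).

```python
from itertools import count


def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def normalize(body, comparisons):
    """Return (new_body, new_comparisons) for a CQAC given as atoms plus ACs.

    body        -- list of (predicate, args) relational subgoals
    comparisons -- list of (lhs, op, rhs) arithmetic comparisons
    """
    fresh = count(1)
    seen = set()
    new_body, new_comps = [], list(comparisons)
    for pred, args in body:
        new_args = []
        for term in args:
            if is_var(term) and term not in seen:
                seen.add(term)              # first occurrence is kept as is
                new_args.append(term)
            else:
                # Repeated variable or constant: replace it by a fresh
                # variable and record the compensating equality.
                fresh_var = f"N{next(fresh)}"   # hypothetical naming scheme
                new_args.append(fresh_var)
                new_comps.append((fresh_var, "=", term))
        new_body.append((pred, tuple(new_args)))
    return new_body, new_comps


# Q2() :- a(X, 5), a(Y, X), X <= 5, Y < 5   (the example from the Introduction)
print(normalize([("a", ("X", 5)), ("a", ("Y", "X"))],
                [("X", "<=", 5), ("Y", "<", 5)]))
```

On the Introduction's query Q2 the sketch introduces one fresh variable for the constant 5 and one for the second occurrence of X, mirroring the equalities Z = 5 and W = X shown there.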
Theorem 3.2 [4, 5] describes how we can test query containment of two
CQACs using containment mappings.
Theorem 3.2. Let $Q_1, Q_2$ be CQACs, and $Q'_1 = Q'_{10} + \beta'_1$,
$Q'_2 = Q'_{20} + \beta'_2$ be the respective queries after normalization. Suppose
there is at least one containment mapping from $Q'_{10}$ to $Q'_{20}$. Let
$\mu_1, \ldots, \mu_k$ be all the containment mappings from $Q'_{10}$ to $Q'_{20}$.
Then $Q_2 \sqsubseteq Q_1$ if and only if the following logical implication $\varphi$
is true:
$$\varphi : \beta'_2 \Rightarrow \mu_1(\beta'_1) \lor \cdots \lor \mu_k(\beta'_1).$$
(We refer to $\varphi$ as the containment entailment in the rest of this paper.)
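The first ingredient of Theorem 3.2, collecting all containment mappings between the relational parts, can be sketched by brute-force backtracking. The representation (atoms as `(predicate, args)` pairs, uppercase strings as variables) and the function name `containment_mappings` are assumptions made for this illustration; the sketch only enumerates the mappings and does not decide the entailment itself.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def unify(src, dst, mapping):
    """Extend `mapping` so that the terms `src` map onto `dst`, or return None."""
    mapping = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if mapping.setdefault(s, d) != d:
                return None                 # variable already mapped elsewhere
        elif s != d:
            return None                     # a constant must map to itself
    return mapping


def containment_mappings(head1, body1, head2, body2):
    """Yield every containment mapping from query 1 to query 2 (relational part)."""
    if len(head1) != len(head2):
        return
    start = unify(head1, head2, {})         # the head must map to the head
    if start is None:
        return
    seen = set()

    def extend(i, mapping):
        if i == len(body1):
            key = frozenset(mapping.items())
            if key not in seen:             # report each mapping only once
                seen.add(key)
                yield mapping
            return
        pred, args = body1[i]
        for pred2, args2 in body2:
            if pred2 == pred and len(args2) == len(args):
                extended = unify(args, args2, mapping)
                if extended is not None:
                    yield from extend(i + 1, extended)

    yield from extend(0, start)


# Normalized Q1 and Q2 from the Introduction (Boolean queries, empty heads).
for m in containment_mappings((), [("a", ("X", "Z"))],
                              (), [("a", ("X", "Z")), ("a", ("Y", "W"))]):
    print(m)
```

The number of mappings can be exponential in the size of the queries, which is consistent with the test of Theorem 3.2 not being an NP procedure in general.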
The second containment test for CQACs uses canonical databases (see, e.g.,
[11]). Considering a CQ $Q$, a canonical database is a database instance
constructed as follows. We consider an assignment of the variables in $Q$ to
constants such that a distinct constant which is not included in any query subgoal
is assigned to each variable. Then, the facts produced through this assignment
define a canonical database of $Q$. Note that although there is an infinite number
of assignments and canonical databases, depending on the constants selection,
all the canonical databases are isomorphic; hence, we refer to such a database
instance as the canonical database of $Q$. To test whether $Q_2 \sqsubseteq Q_1$, we
compute the canonical database, $D$, of $Q_2$ and check if $Q_2(D) \subseteq Q_1(D)$.
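For plain CQs (no arithmetic comparisons), the canonical-database test just described can be sketched as follows. The data layout, the `c_` prefix for frozen constants, and the function names are assumptions of this illustration; the evaluator is a naive backtracking join, chosen for brevity rather than efficiency.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def canonical_db(body):
    """Freeze each variable V into a fresh constant 'c_V' (assumed unused elsewhere)."""
    def freeze(term):
        return "c_" + term if is_var(term) else term
    return {(pred, tuple(freeze(t) for t in args)) for pred, args in body}


def evaluate(head, body, db):
    """Return the set of head tuples that the CQ produces on the database `db`."""
    results = set()

    def extend(i, env):
        if i == len(body):
            results.add(tuple(env.get(t, t) for t in head))
            return
        pred, args = body[i]
        for fact_pred, fact_args in db:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new_env, ok = dict(env), True
            for a, value in zip(args, fact_args):
                if is_var(a):
                    if new_env.setdefault(a, value) != value:
                        ok = False
                        break
                elif a != value:
                    ok = False
                    break
            if ok:
                extend(i + 1, new_env)

    extend(0, {})
    return results


def cq_contained(q2, q1):
    """Test Q2 ⊑ Q1 for CQs given as (head, body) pairs."""
    db = canonical_db(q2[1])
    return evaluate(q2[0], q2[1], db) <= evaluate(q1[0], q1[1], db)


one_step = (("X",), [("e", ("X", "Y"))])
two_steps = (("X",), [("e", ("X", "Y")), ("e", ("Y", "Z"))])
print(cq_contained(two_steps, one_step), cq_contained(one_step, two_steps))
```

In the usage lines, every node that starts a path of length two also starts a path of length one, so the first test succeeds while the converse does not.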
Extending this test to CQACs, a single canonical database does not suffice.
We construct a set of canonical databases of a CQAC $Q_2$ with respect to a
CQAC $Q_1$ as follows. Consider the set $S = S_V \cup S_C$ including the variables
$S_V$ of $Q_2$, and the constants $S_C$ of both $Q_1$ and $Q_2$. Then, we partition the
elements of $S$ into blocks such that no two distinct constants are in the same
block. Let $P$ be such a partition; for each block in the partition $P$, we equate
all the variables in the block to the same variable and, if there is a constant
in the block, we equate all the variables to the constant. For each partition $P$,
we create a number of canonical databases, one for each total ordering on the
variables and constants that are present.
Although there is an infinite number of canonical databases, depending on
the constants selected, there is a bounded set of canonical databases such that
every other canonical database is isomorphic to one in this set. Such a set is
referred to as the set of canonical databases of $Q_2$ with respect to $Q_1$. To test
whether $Q_2 \sqsubseteq Q_1$, we construct all the canonical databases of $Q_2$ with
respect to $Q_1$ and, for each canonical database $D$, we check if
$Q_2(D) \subseteq Q_1(D)$.
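The combined choice of a partition and a total ordering of its blocks amounts to enumerating weak orderings of the variables and constants. The following sketch, under assumptions of our own (variables are strings, constants are Python numbers, function names are illustrative), enumerates those orderings and keeps only the ones in which distinct constants stay in different blocks and in their numeric order. Turning each ordering into an actual database instance, and discarding orderings that violate the ACs of $Q_2$, is left out.

```python
def weak_orderings(items):
    """Yield every ordered partition of `items` as a list of blocks.

    A block groups terms that are made equal; blocks are listed in
    increasing order.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for ordering in weak_orderings(rest):
        # Put `first` into an existing block (equal to its members) ...
        for i in range(len(ordering)):
            yield ordering[:i] + [ordering[i] + [first]] + ordering[i + 1:]
        # ... or into a new block of its own at any position.
        for i in range(len(ordering) + 1):
            yield ordering[:i] + [[first]] + ordering[i:]


def respects_constants(ordering):
    """Distinct constants must be in different blocks and in numeric order."""
    seen = []
    for block in ordering:
        consts = [t for t in block if isinstance(t, (int, float))]
        if len(consts) > 1:
            return False
        seen.extend(consts)
    return seen == sorted(seen)


def canonical_orderings(variables, constants):
    """Orderings that index the canonical databases in the construction above."""
    items = list(variables) + list(constants)
    return [o for o in weak_orderings(items) if respects_constants(o)]


# One variable and the constant 5: X < 5, X = 5, or X > 5.
for ordering in canonical_orderings(["X"], [5]):
    print(ordering)
```

The count grows quickly with the number of variables (it is bounded by the number of weak orderings), which is one reason the paper looks for classes where a single containment mapping, or a short certificate, suffices.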
Theorem 3.3. A CQAC query $Q_2$ is contained in a CQAC query $Q_1$ if and
only if, for each database belonging to the set of canonical databases of $Q_2$ with
respect to $Q_1$, the query $Q_1$ computes all the tuples that $Q_2$ computes if
applied on it.
3.2. Answering queries using views
A view is a named query which can be treated as a regular relation. The
query defining the view is called the definition of the view (see, e.g., [11]).
Considering a set of views $V$ and a query $Q$ over a database schema $S$, we
want to answer $Q$ by accessing only the instances of views [16, 46, 11]. To
answer the query $Q$ using $V$ we could rewrite $Q$ into a new query $R$ such that
$R$ is defined in terms of views in $V$ (i.e., the predicates of the subgoals of $R$
are view names in $V$). We denote by $V(D)$ the output of applying all the view
definitions on a database instance $D$. Thus, $V(D)$ and any subset of it defines
a view instance $I$ for which there is a database $D$ such that $I \subseteq V(D)$.
If, for every database instance $D$, we have $R(V(D)) = Q(D)$ then $R$ is an
equivalent rewriting of $Q$ using $V$. If $R(V(D)) \subseteq Q(D)$, then $R$ is a
contained rewriting of $Q$ using $V$. To find and check query rewritings we use the
concept of expansion, which is defined as follows.
Definition 3.4. The view-expansion (see footnote 2), $R^{exp}$, of a rewriting $R$
defined in terms of views in $V$, is obtained from $R$ as follows. For each subgoal
$v_i$ of $R$ and the corresponding view definition $V_i$ in $V$, if $\mu_i$ is the
mapping from the head of $V_i$ to $v_i$ we replace $v_i$ in $R$ with the body of
$\mu_i(V_i)$. The non-distinguished variables in each view are replaced with fresh
variables in $R^{exp}$.
To test whether a query $R$ defined in terms of a view set $V$ is a contained
(resp. equivalent) rewriting of a query $Q$ defined in terms of the base relations,
we check whether $R^{exp} \sqsubseteq Q$ (resp. $R^{exp} \equiv Q$).
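The view-expansion of Definition 3.4 can be sketched as a substitution over view definitions. The sketch assumes, for illustration only, that each view is stored as `(head_vars, body, comparisons)`, that view heads contain distinct variables, and that fresh variables are formed by appending `_n` to the original name (assumed not to clash with names used in the rewriting).

```python
from itertools import count


def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def rename(term, mu, suffix):
    """Apply the head mapping `mu`; other variables get a fresh, suffixed name."""
    if not is_var(term):
        return term
    return mu.get(term, f"{term}_{suffix}")


def view_expansion(subgoals, acs, views):
    """Return (atoms, comparisons) of the expansion of a rewriting.

    subgoals -- list of (view_name, args) in the body of the rewriting
    acs      -- list of (lhs, op, rhs) comparisons of the rewriting
    views    -- dict: view_name -> (head_vars, body_atoms, comparisons)
    """
    fresh = count(1)
    atoms, comparisons = [], list(acs)
    for name, args in subgoals:
        head_vars, body, view_acs = views[name]
        mu = dict(zip(head_vars, args))     # head of the view -> subgoal args
        n = next(fresh)                     # one suffix per view occurrence
        atoms += [(p, tuple(rename(t, mu, n) for t in ts)) for p, ts in body]
        comparisons += [(rename(l, mu, n), op, rename(r, mu, n))
                        for l, op, r in view_acs]
    return atoms, comparisons


# V1(Z) :- e(X, Y), e(Y, Z), X >= 5   and the rewriting   Q() :- V1(Z), Z <= 5
views = {"V1": (("Z",), [("e", ("X", "Y")), ("e", ("Y", "Z"))], [("X", ">=", 5)])}
print(view_expansion([("V1", ("Z",))], [("Z", "<=", 5)], views))
```

The expansion can then be handed to any containment test for CQACs to check whether the rewriting is contained in (or equivalent to) the query.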
Definition 3.5. A rewriting $R$ is called a maximally contained rewriting (MCR)
of query $Q$ using views $V$ with respect to query language $L$ if
1. $R$ is a contained rewriting of $Q$ using $V$ in $L$, and
2. every contained rewriting of $Q$ using $V$ in language $L$ is contained in $R$.
Other concepts, like Datalog queries, and computing certain answers of a
query given a view instance, $I$, will be defined in the sections where they appear.
4. Reasoning with Arithmetic Comparisons
4.1. Notation in Presentation of Results
We present our results using the following notation: We use $var$ and $const$
to denote any variable or any constant. We define:
• Semi-interval (SI for short) arithmetic comparisons are of the form
$var \le const$, $var < const$ (these are called left semi-interval arithmetic
comparisons, closed and open respectively) or of the form $var \ge const$,
$var > const$ (these are called right semi-interval arithmetic comparisons, closed
and open respectively).
For short, we use the notation CLSI for closed left semi-interval arithmetic
comparisons, ORSI for open right semi-interval arithmetic comparisons, and
similarly, we use OLSI, CRSI, or RSI, LSI if we refer to both closed and open.

Footnote 2: In Section 16, we will need to differentiate between view-expansion and
Datalog-expansion, which we will define shortly; therefore, when confusion arises
we use these prefixes.
Definition 4.1. An AC-type is one of the elements of the following set $T_{AC}$:
$$T_{AC} = \{var \le var,\ var < var,\ var \le const,\ var < const,\ const \le var,\ const < var,\ var = var,\ var = const,\ var \neq var,\ var \neq const\}$$
Let $\theta$ be one of $\{<, >, \le, \ge, =, \neq\}$. We say that an AC $X \,\theta\, Y$
is of type “$var\ \theta\ var$” if both $X$ and $Y$ are variables. If $X$ is a variable
and $Y$ is a constant then we say that it is of type “$var\ \theta\ const$.”
For example, a closed LSI AC is of type $var \le const$.
An AC-family, $T_A$, is defined by a subset of $T_{AC}$. An AC belongs to a specific
AC-family if it is of a type that belongs in the family.
Let $T_A$ be an AC-family. Then $T_A$ defines a class, $\mathcal{Q}$, of CQAC queries as
follows: A query $Q$ belongs in $\mathcal{Q}$ if $Q$ uses ACs only of types in $T_A$.
Table 3 explains the notation. It is not exhaustive. The notation that
is missing follows the same pattern. In Table 1 we present the results using
abbreviations (e.g., LSI, CLSI) whereas in the corresponding theorems, in order
to present the results in a homogeneous way, we define classes of queries as an
AC-family.
4.2. Computing the closure of a set of ACs
A collection of ACs isconsistentor is notcontradictoryif there is an assign-
ment of real numbers to the variables such that all the arithmetic comparisons
are true.
We list below a sound and complete set ofelemental implicationswhich can
be used to derive any AC,b, from any consistent set of ACs,F. The proof of
soundness and completeness for the set of the first 8 elemental implications can
be found in [47]. We add here elemental implication (9) because we want to
argue about ACs that are “=.” The set with the added implication is sound
16
Notation Meaning
CQ conjunctive query
AC arithmetic comparison
CQAC conjunctive quer with arithmetic comparison
SI semi-interval AC
LSI (RSI) left semi-interval AC (right semi-interval AC)
OLSI (ORSI) open left (right) semi-interval AC
CLSI (CRSI) closed left (right) semi-interval AC
OSI (CSI) open (closed) AC
CQSI or CQAC-SI conjunctive query with semi-interval ACs
RSI1 set of (or conjunctive query with) SI ACs of which only one is RSI
var(const) variable (constant)
lhs (rhs) left hand side (right hand side)
Datalog-expansion The CQ (CQAC) that results from unfolding the rules of a Datalog query
view-expansion The CQ (CQAC) that results by replacing the subgoals of a rewriting with
the view definitions
Table 3: Notation and Abbreviations.
because we assume a consistent set of ACs; it is, also, complete because, for any
equality,X=Y, the ACsX≤YandX≥Yare true, and, for these ACs, the
set of elemental implications (1) through (8) is complete. Since we also want to
handle semi-interval ACs, before we apply the elemental implications on a set,
F, of ACs, we add inFthe ACc < c′for any pair of constants,c < c′that
appear inF.
1. X ≤ X
2. X < Y ⇒ X ≤ Y
3. X < Y ⇒ X ≠ Y
4. X ≤ Y ∧ X ≠ Y ⇒ X < Y
5. X ≠ Y ⇒ Y ≠ X
6. X < Y ∧ Y < Z ⇒ X < Z
7. X ≤ Y ∧ Y ≤ Z ⇒ X ≤ Z
8. X ≤ Z ∧ Z ≤ Y ∧ X ≤ W ∧ W ≤ Y ∧ W ≠ Z ⇒ X ≠ Y
9. X ≤ Y ∧ Y ≤ X ⇒ X = Y
where X, Y, Z, W can be either variables or constants as follows: In (1), X is a
variable; in (2), (3), (4), (5) and (9), one of X, Y should be a variable. In (6), (7)
and (8), either X is a variable and the rest are either variables or constants, or Y is a
variable and the rest are either variables or constants.
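The saturation under the elemental implications can be sketched as a simple fixpoint loop. The following Python sketch is only an illustration and not the procedure of [47]: the function name `closure`, the tuple encoding `(x, op, y)` and the parameter `extra_constants` are assumptions made here, and the variable/constant side conditions stated above are ignored for brevity.

```python
from itertools import product

def closure(acs, extra_constants=()):
    """Saturate ACs (x, op, y), op in {'<', '<=', '!=', '='}, under rules (1)-(9).

    Variables are strings and constants are numbers. The variable/constant
    side conditions of the rules are ignored in this sketch.
    """
    facts = set(acs)
    for x, op, y in acs:
        if op == '=':                       # X = Y is read as X <= Y and Y <= X
            facts |= {(x, '<=', y), (y, '<=', x)}
    terms = {t for x, _, y in facts for t in (x, y)} | set(extra_constants)
    consts = sorted(t for t in terms if not isinstance(t, str))
    for i, c in enumerate(consts):          # add c < c' for constants c < c'
        facts |= {(c, '<', d) for d in consts[i + 1:]}
    facts |= {(t, '<=', t) for t in terms}  # rule (1), applied to every term

    while True:
        new = set()
        le = {(x, y) for x, op, y in facts if op == '<='}
        for x, op, y in facts:
            if op == '<':
                new |= {(x, '<=', y), (x, '!=', y)}          # rules (2), (3)
            elif op == '!=':
                new.add((y, '!=', x))                        # rule (5)
                if (x, y) in le:
                    new.add((x, '<', y))                     # rule (4)
                for a, b in product(terms, repeat=2):        # rule (8), W=x, Z=y
                    if {(a, y), (y, b), (a, x), (x, b)} <= le:
                        new.add((a, '!=', b))
            elif op == '<=' and (y, x) in le and x != y:
                new.add((x, '=', y))                         # rule (9)
        for (x, op1, y), (y2, op2, z) in product(facts, repeat=2):
            if y == y2 and op1 == op2 and op1 in ('<', '<='):
                new.add((x, op1, z))                         # rules (6), (7)
        if new <= facts:                    # nothing new: fixpoint reached
            return facts
        facts |= new
```

For instance, `closure({('X', '<', 'Y'), ('Y', '<=', 'Z')})` contains `('X', '<', 'Z')`, which this sketch obtains through rules (2), (7), (8) and (4) rather than through a dedicated mixed-transitivity rule.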
When there are semi-interval ACs in the set,F, of ACs, then the closure is
computed with respect to a finite set of relevant constants. I.e., only SIs that
use constants from this set are included in the closure. The following lemma
summarizes some easy observations and we will, conveniently, refer to them
often in the rest of the paper.
Lemma 4.2.
1. Suppose a set, F, of ACs includes W ≠ Z. Suppose elemental
implication (8) is applied on F and derives X ≠ Y by using W ≠ Z.
Suppose, in a second step, we apply (8) on F ∪ {X ≠ Y} to derive X1 ≠ Y1.
Then X1 ≠ Y1 can be derived from F by using (8) in only one step.
2. If X ≤ Y is in the closure of a set of ACs, F, then it can be derived by
using elemental implication (2) several times and, after that, by using
elemental implication (7) several times. The application of (7) several times creates
a chain, X_1 ··· X_k, of variables/constants where, for each i = 1, ..., k−1,
either X_i ≤ X_{i+1} is in F or X_i < X_{i+1} is in F.
3. If X < Y is in the closure of a set of ACs, F, then either there is a chain
of vars/consts related by < from X to Y or there are two chains from X to
Y (not necessarily distinct) of only variables (except X or Y, one of which
could be a constant) related by ≤ and there are Z, W, one in each chain,
such that Z ≠ W.
4. If there is a chain from var/const X to var/const Y as in cases (2) and
(3) above, then there is a chain from var/const X to var/const Y that
contains at most two constants.
Proof. For the first clause: Suppose the first time implication (8) is used, we have
in F the following ACs: X ≤ Z, Z ≤ Y, X ≤ W, W ≤ Y, W ≠ Z. The second
time it is used, using the result of the first time (which is X ≠ Y), suppose we
have X1 ≤ Y, Y ≤ Y1, X1 ≤ X, X ≤ Y1, X ≠ Y, which derives X1 ≠ Y1. This
(i.e., X1 ≠ Y1) could have been derived using elemental implication (8)
only once, as follows: X1 ≤ Z, Z ≤ Y1, X1 ≤ W, W ≤ Y1, W ≠ Z. Here, X1 ≤ Z
is derived from X1 ≤ X and X ≤ Z. Similarly, the other three ACs are
derived: Z ≤ Y1 is derived from Z ≤ Y and Y ≤ Y1; X1 ≤ W is derived
from X1 ≤ X and X ≤ W; W ≤ Y1 is derived from W ≤ Y and Y ≤ Y1.
For the second clause, the proof is obvious, since only implications (2) and
(7) derive X ≤ Y and the chain is an obvious consequence of applying
implication (7) several times.
For the third clause, we observe that X < Y can be derived either from
elemental implication (4) or (6). If only (6) is applied several times then we
derive a chain. If (4) is used, then we need to show first X ≠ Y. In order to
show X ≠ Y, according to clause (1) of the present lemma, we need to use (8)
only once, and this means that there is a chain from X to Y containing W and
a chain from X to Y containing Z, with W ≠ Z in F.
For the fourth clause, just notice that if a chain has more than two constants,
then we can choose two constants, the one closest to the beginning of the chain
and the one closest to the end of the chain, and form a chain by using the
arithmetic comparison between these two constants.
Lemma 4.3. We can compute the closure of a set of ACs in time polynomial
in the number of ACs and the number of relevant constants we use to compute
the SIs in the closure.
Throughout the paper, when we refer to a set of ACs, F, we mean the closure
of F.
4.3. Analyzing the containment implication
In the rest of this paper, we will often focus on an implication of the form
\[ a_1 \wedge a_2 \wedge \cdots \wedge a_n \Rightarrow b_1 \vee b_2 \vee \cdots \vee b_m \qquad (1) \]
where the a_i's and b_i's are ACs. We call it a containment implication. We say that
the containment implication is minimal, or is in minimal form, if the following is
true: If we delete any disjunct from the rhs then the implication is not true.
Lemma 4.4. Consider the containment implication (1) in minimal form. Then
for any AC, say b_m, on the rhs, the following is true:
\[ a_1 \wedge a_2 \wedge \cdots \wedge a_n \wedge \neg b_1 \wedge \neg b_2 \wedge \cdots \wedge \neg b_{m-1} \Rightarrow b_m \]
Proof. We need to prove that the conjunction on the lhs of the above implication
is consistent (if the containment implication is true). Suppose it is not
consistent. Then the following will be true:
\[ a_1 \wedge a_2 \wedge \cdots \wedge a_n \Rightarrow b_1 \vee b_2 \vee \cdots \vee b_{m-1} \]
Hence the containment implication would not be minimal. Now, the rest of the
proof is simply a rewriting of the containment implication.
We will use Lemma 4.4 often and, for convenience, we will say that "we
move b_1, ..., b_{m−1} to the left hand side (lhs for short) and we apply Lemma
4.4 to prove b_m." A consequence of Lemma 4.3 and Lemma 4.4 is the following:
Lemma 4.5. We can check whether a containment implication is true in
polynomial time.
Proof. We move all the ACs to the left hand side (negating those that come from
the rhs) and compute the closure of these ACs. If there is an AC in the closure
and there is also its negation in the closure, then the containment implication
is true, otherwise not.
Now we begin to focus on semi-interval ACs. From here on, and for the rest
of the paper, we will state the results and the proofs only in terms of LSI ACs;
they are valid, symmetrically, in terms of RSI ACs too.
The following lemma roughly says that if we have only LSIs on the rhs of
a containment implication which is in minimal form, then there is only one
disjunct on the rhs. However, there is an exception in the case where there is a "≠"
on the lhs, which is stated in detail in the following lemma:
Lemma 4.6. Consider the containment implication (1), where the a_i's are from
a set of ACs A and the b_i's are from a set of ACs B. Then, for the pairs of A
and B listed below, the following is true: If the containment implication (1) is
true and is in minimal form, then the rhs has one disjunct.
1. A is a set of ACs of AC-type T_AC, B is a set of ACs of AC-type
{var ≤ const}.
2. A is a set of ACs of AC-type T_AC − {≠}, B is a set of ACs of AC-type
{var ≤ const, var < const}.
3. A is a set of ACs of AC-type T_AC, B is a set of ACs of AC-type
{var ≤ const, var < const}, and the following condition is satisfied: For any X ≠ Y
that appears in A, if a constant (say c0) is related by an AC to both X and
Y in A then either (i) c0 does not relate to both X and Y by a closed AC
in A or (ii) c0 does not appear in an open AC in B.
Proof. For the first case, we apply Lemma 4.4 to prove a closed LSI, b_i (let it be
X ≤ c). When we move each LSI except b_i to the lhs, it becomes an RSI, so
(according to Lemma 4.2, clause 2) it will not be used in the derivation of the LSI
X ≤ c. Hence, the AC b_i is implied by applying the elemental implications on
the ACs of the set A only. Therefore, the containment implication is in minimal
form only with one AC on the rhs. The second case is similar to the first
case, since the absence of "≠" in A results in an open LSI in B being implied
by using a chain (Lemma 4.2, clause 3) as in the first case.
Third case: If there is a closed LSI, b_i, among the ACs in set B, then,
according to Lemma 4.4, we move all other ACs of the containment implication
to the lhs; the argument is then the same as above, except for the following fine
point: An open LSI, when moved to the lhs, becomes a closed RSI which, together
with a closed LSI among the ACs in the set A, produces an equality, which may
prove b_i. However, if so, and taking into account Lemma 4.2 clause 4, the AC b_i
can already be proven by using only ACs in the set A. So, in this case, the
containment implication in minimal form has one AC on the rhs. If there are
only open ACs in the set B, then, according to Lemma 4.2 clause 3, there is the
possibility of proving a certain b_i by using one closed RSI resulting from a b_j
moved to the lhs, together with a closed LSI and a "≠" AC from the set A. That is,
say the "≠" AC mentioned in Lemma 4.2 clause 3 is X ≠ Y; then one chain is
X ≤ c and the other is Y = c, which results from Y ≤ c from A and Y ≥ c
moved from B to the lhs. For this to happen we need two closed LSIs in A and
two open LSIs in B, all sharing the same constant. This is the condition we
excluded in the statement of this lemma; hence, in this case too, the containment
implication has only one AC on the rhs when in minimal form.
Example 4.7. The following containment implication demonstrates "almost
the only" case where we have only LSIs on the rhs of the containment implication in
minimal form and, still, there are two ACs on the rhs.
X ≠ Y ∧ 5 ≤ Y ∧ 5 ≤ X ⇒ X > 5 ∨ Y > 5
The above implication appears in the following example:
Q1() :− a(X, Y), X > 5
Q2() :− a(X, X′), a(Y, X′′), X ≠ Y, X ≥ 5, Y ≥ 5
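The implication of Example 4.7 and its minimality can be checked mechanically on small inputs. The sketch below is a brute-force check, not the polynomial procedure of Lemma 4.5: since the truth of an AC depends only on the relative order of variables and constants, it tries every assignment over a finite set of representative points (each constant, plus as many distinct points in every gap as there are variables). The names `holds`, `sample_points`, `implies` and `is_minimal` are illustrative assumptions.

```python
from itertools import product
import operator

OPS = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
       '>=': operator.ge, '=': operator.eq, '!=': operator.ne}

def holds(ac, env):
    """Evaluate one AC (x, op, y); strings are variables, numbers are constants."""
    x, op, y = ac
    return OPS[op](env.get(x, x), env.get(y, y))

def sample_points(constants, n_vars):
    """Each constant, plus n_vars distinct points in every gap around them."""
    cs = sorted(set(constants)) or [0]
    pts = set(cs)
    for k in range(1, n_vars + 1):
        pts.add(cs[0] - k)                     # below the smallest constant
        pts.add(cs[-1] + k)                    # above the largest constant
    for lo, hi in zip(cs, cs[1:]):
        for k in range(1, n_vars + 1):
            pts.add(lo + (hi - lo) * k / (n_vars + 1))   # strictly between
    return sorted(pts)

def implies(lhs, rhs):
    """True iff the conjunction lhs entails the disjunction rhs."""
    acs = list(lhs) + list(rhs)
    variables = sorted({t for x, _, y in acs for t in (x, y) if isinstance(t, str)})
    constants = {t for x, _, y in acs for t in (x, y) if not isinstance(t, str)}
    pts = sample_points(constants, len(variables))
    for values in product(pts, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(holds(a, env) for a in lhs) and not any(holds(b, env) for b in rhs):
            return False                       # counterexample found
    return True

def is_minimal(lhs, rhs):
    """True iff the implication holds and fails after deleting any disjunct."""
    return implies(lhs, rhs) and all(
        not implies(lhs, rhs[:i] + rhs[i + 1:]) for i in range(len(rhs)))

lhs = [('X', '!=', 'Y'), (5, '<=', 'Y'), (5, '<=', 'X')]
rhs = [('X', '>', 5), ('Y', '>', 5)]
print(implies(lhs, rhs), is_minimal(lhs, rhs))
```

Dropping either disjunct breaks the implication (for example X = 5, Y = 6 refutes the version with only X > 5 on the rhs), which is why both ACs remain on the rhs in minimal form.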
4.4. Analysing the containment entailment
Consider the containment entailment (as in Theorem 3.2)
\[ \beta_2 \Rightarrow \mu_1(\beta_1) \vee \cdots \vee \mu_k(\beta_1) \]
where β2 is the conjunction of ACs in the contained query and β1 is the
conjunction of ACs in the containing query. Applying the distributive law, this
containment entailment can be equivalently viewed as a collection of containment
implications, each containment implication being
\[ \beta_2 \Rightarrow a_1 \vee a_2 \vee \cdots \vee a_k \]
where a_i is one of the ACs in the conjunction µ_i(β1), i = 1, 2, ..., k.
Example 4.8. Consider the following (normalized) CQACs.
Q1: q() :− a(X1, Y1, Z1), X1 = Y1, Z1 < 5
Q2: q() :− a(X, Y, Z′), a(X′, Y′, Z), X ≤ 5, Y ≤ X, Z ≤ Y, X′ = Y′, Z′ < 5
Testing the containment Q2 ⊑ Q1, it is easy to see that there are the following
two containment mappings:
• µ1: X1 → X, Y1 → Y, Z1 → Z′
• µ2: X1 → X′, Y1 → Y′, Z1 → Z
Hence, the containment entailment is given as follows:
\[ X \le 5 \wedge Y \le X \wedge Z \le Y \wedge X' = Y' \wedge Z' < 5 \Rightarrow \big(\mu_1(X_1) = \mu_1(Y_1) \wedge \mu_1(Z_1) < 5\big) \vee \big(\mu_2(X_1) = \mu_2(Y_1) \wedge \mu_2(Z_1) < 5\big) \]
which is equivalently written:
\[ X \le 5 \wedge Y \le X \wedge Z \le Y \wedge X' = Y' \wedge Z' < 5 \Rightarrow (X = Y \wedge Z' < 5) \vee (X' = Y' \wedge Z < 5) \]
Now we consider the containment entailment we built above. According to
what we analyzed in this section, we can equivalently rewrite this containment
entailment by transforming its right hand side into a conjunction, where each
conjunct is a disjunction of ACs. The transformed entailment is the following,
where β is the conjunction X ≤ 5 ∧ Y ≤ X ∧ Z ≤ Y ∧ X′ = Y′ ∧ Z′ < 5:
\[ \beta \Rightarrow (X = Y \vee X' = Y') \wedge (X = Y \vee Z < 5) \wedge (Z' < 5 \vee X' = Y') \wedge (Z' < 5 \vee Z < 5) \]
So, we have, in this case, four containment implications, one of which is, e.g.,
β ⇒ X = Y ∨ X′ = Y′.
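Applying the distributive law amounts to picking one AC from each disjunct in every possible way. A minimal Python sketch (the function name `containment_implications` and the string encoding of ACs are assumptions made for illustration) reproduces the four implications of Example 4.8:

```python
from itertools import product

def containment_implications(disjuncts):
    """Each disjunct is a list of ACs (one conjunction mu_i(beta_1)).

    Returns the rhs of every containment implication, i.e. every way of
    choosing exactly one AC from each disjunct.
    """
    return [list(choice) for choice in product(*disjuncts)]

# The two disjuncts of Example 4.8, after applying mu_1 and mu_2.
disjuncts = [["X = Y", "Z' < 5"], ["X' = Y'", "Z < 5"]]
for rhs in containment_implications(disjuncts):
    print("β ⇒ " + " ∨ ".join(rhs))
```

With k disjuncts of at most r ACs each, this produces up to r^k implications, so the sketch is only meant for small examples.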
Lemma 4.9. Consider a containment entailment, E. The following two are
equivalent:
a) One disjunct in the rhs of E suffices to make the containment entailment
E true.
b) Any containment implication in minimal form produced by E has one
disjunct in the rhs.
Proof. (b) ⇒ (a): Suppose (b) is true and (a) is not true. Then, consider, from
each disjunct of the containment entailment, the AC which is not implied by the
lhs of the entailment. These ACs (taken over all disjuncts) build a containment
implication for which clause (b) in the lemma is not true, hence a contradiction.
The next theorem is a straightforward consequence of the above lemma and
Lemma 4.6.
Theorem 4.10. (Complexity of query containment by homomorphism property
(HP)) Consider the cases in Lemma 4.6, where A defines the set of ACs
allowed in Q2 and B in Q1. In these cases, the HP holds and checking containment
is in NP.
5. Π^p_2-hardness Result
In this section we prove the following theorem:
Theorem 5.1. The following problem is Π^p_2-hard: Given a CQAC query Q2 that
only uses ≠ ACs and semi-interval arithmetic comparisons and a CQAC query Q1
that only uses ACs that are semi-interval arithmetic comparisons, is Q2 ⊑ Q1?
Proof. The reduction will be done from the Π2-SAT problem, which is the
following:
Instance: A Π2 formula of quantified propositional logic, i.e., an expression
\[ \forall p_1 \cdots p_n \, \exists q_1 \cdots q_m \, [\psi] \qquad (2) \]
where ψ is a formula of propositional logic containing only the variables
p_1, ..., p_n, q_1, ..., q_m.
Question: Is it true that formula (2) is satisfiable, i.e., is it true that for
every assignment of truth values to p_1 ··· p_n there exists an assignment of truth
values to q_1 ··· q_m such that ψ is true?
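To make the question concrete, the following Python sketch decides Π2-SAT by exhaustive enumeration. It is exponential in n + m and only illustrates the quantifier alternation; the function name `pi2_sat` and the representation of ψ as a Python callable are assumptions.

```python
from itertools import product

def pi2_sat(psi, n, m):
    """True iff for every assignment to p_1..p_n there is an assignment to
    q_1..q_m that makes psi true. psi takes two tuples of booleans (p, q)."""
    return all(
        any(psi(p, q) for q in product([False, True], repeat=m))
        for p in product([False, True], repeat=n)
    )

# ∀p ∃q [(p ∨ q) ∧ (¬p ∨ ¬q)]: true, choose q = not p.
print(pi2_sat(lambda p, q: (p[0] or q[0]) and (not p[0] or not q[0]), 1, 1))
# ∀p ∃q [p ∧ q]: false, no q works when p is false.
print(pi2_sat(lambda p, q: p[0] and q[0], 1, 1))
```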
We will construct two Boolean CQAC queries Q1 and Q2 and we will prove
that formula (2) is satisfiable if and only if Q2 ⊑ Q1.
Construction of Q2: Query Q2 contains the following subgoals over constants
e and f which encode a Boolean computation (the relation a encodes
"and", the relation o encodes "or", and the relations n and t encode negation and
the true value respectively):
a(e, e, e), a(e, f, f), a(f, e, f), a(f, f, f), o(e, e, e), o(e, f, e),
o(f, e, e), o(f, f, f), n(e, f), n(f, e), t(e)
The above subgoals simulate the calculation of the truth value of the formula
ψ. In addition, we add n copies of the following subgoals:
a_i(U_i, e), a_i(V_i, f), a_i(W_i, e), a_i(W_i, f), U_i < 7, 7 < V_i, W_i ≠ 7
for i = 1, ..., n, where U_i, V_i are fresh and distinct variables that appear only in
one relational subgoal and W_i is also a fresh distinct variable which appears in
two relational subgoals. The constants e and f appear across all n copies.
Construction of Q1: First, we construct the subgoals which will map to
the a_i's above. These subgoals are n copies of the following:
a_i(T_{1i}, T_i), a_i(T_{2i}, T_i), T_{1i} < 7, 7 < T_{2i}
where the variables are fresh and distinct for every copy. The above formula
expresses the fact that, while evaluating Q1 on a canonical database
of Q2, each T_i will take the value e or f and, for each of e and f, there is a canonical
database of Q2 on which T_i takes exactly this one of the two values. This encodes
all the combinations of truth values for the variables p_1 ··· p_n in formula (2).
Finally, we construct a number of subgoals using the relations a, o, n, t that
will encode satisfaction of the formula ψ. This construction will be done
inductively on the structure of ψ. The structure of ψ can be depicted by a binary
tree Tr. The leaves of the tree are labeled by the variables p_1, ..., p_n, q_1, ..., q_m
(some variables may appear in more than one leaf). Each internal node
represents a subexpression. If an internal node has children with subexpressions
ψ1 and ψ2 then the subexpression ψ1 ∧ ψ2 labels this node. Similarly for ∨ and
¬, the latter being a node with only one child. Following Tr, we construct tree
Tr1 which has the same structure as Tr. The labels on the leaves of tree Tr1
are the variables T_1, ..., T_n, T_{n+1}, ..., T_m which represent truth values for the
variables p_1, ..., p_n, q_1, ..., q_m respectively.
Now, the internal node ψ1 ∧ ψ2 of Tr is labeled in Tr1 by
a(t1, t2, t), where t1 is the last variable in the label of one child, t2 is the last
variable in the label of the other child and t is a fresh variable distinct from all
the others. The subgoals that we finally add to Q1 are all the atoms on the
labels of the internal nodes of the tree Tr1. Moreover, we add the atom t(T_h), where
T_h is the last variable in the root node of the tree.
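The inductive construction over the tree can be sketched as follows. The encoding of ψ as nested tuples, the fresh-variable names `G1, G2, ...` and the labels o(t1, t2, t) and n(t1, t) used for ∨ and ¬ nodes are assumptions made here by analogy with the ∧ case; the gate tables are the subgoals of Q2 listed above.

```python
from itertools import count, product

# The Boolean subgoals of Q2, read as relations over the constants e and f.
GATES = {
    'a': {('e', 'e', 'e'), ('e', 'f', 'f'), ('f', 'e', 'f'), ('f', 'f', 'f')},
    'o': {('e', 'e', 'e'), ('e', 'f', 'e'), ('f', 'e', 'e'), ('f', 'f', 'f')},
    'n': {('e', 'f'), ('f', 'e')},
    't': {('e',)},
}
RELATION = {'and': 'a', 'or': 'o', 'not': 'n'}

def subgoals(tree):
    """Turn a formula tree into the gate atoms added to Q1.

    A leaf is a variable name (string); an internal node is a tuple
    ('and', l, r), ('or', l, r) or ('not', c)."""
    fresh = (f"G{i}" for i in count(1))
    atoms = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *children = node
        args = [walk(c) for c in children]
        out = next(fresh)                      # fresh output variable t
        atoms.append((RELATION[op], (*args, out)))
        return out

    root = walk(tree)
    atoms.append(('t', (root,)))               # the root must evaluate to true
    return atoms

def has_mapping(atoms, fixed):
    """Is there an assignment of e/f to the non-fixed variables that maps
    every atom into the gate tables?"""
    free = sorted({v for _, args in atoms for v in args} - set(fixed))
    for vals in product('ef', repeat=len(free)):
        env = {**fixed, **dict(zip(free, vals))}
        if all(tuple(env[v] for v in args) in GATES[rel] for rel, args in atoms):
            return True
    return False

# psi = (T1 ∨ T2) ∧ ¬T1, with T1 universally and T2 existentially quantified.
atoms = subgoals(('and', ('or', 'T1', 'T2'), ('not', 'T1')))
print(atoms)
print(has_mapping(atoms, {'T1': 'f'}), has_mapping(atoms, {'T1': 'e'}))
```

Here fixing T1 plays the role of the value forced by a canonical database of Q2, while the remaining variables are chosen freely by the mapping.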
Suppose the formula (2) is true and let D be a canonical database of Q2 on
which Q2 evaluates to true. By construction of Q2, and especially the subgoals
over the relations a_i, each T_i of Q1 will evaluate to either e or f. The subgoals
of Q1 that resulted from the construction of the tree, during the evaluation, will
map appropriately. The existence of such a mapping is justified by the truth
value of the formula (2) and the meaning of the subgoals built following the tree
of ψ. For the other direction, suppose that for every canonical database of Q2 on
which Q2 evaluates to true, Q1 evaluates to true. By construction, there exists,
for each vector of truth values of p_1 ··· p_n, a canonical database, say D, of
Q2 on which Q2 evaluates to true such that T_i, i = 1, ..., n, has the truth value of
its corresponding variable in p_1 ··· p_n. This is enforced by the subgoals over the
relations a_i. Since there is an evaluation of Q1 on database D, there are values
for T_{n+1}, ..., T_m such that all the subgoals resulting from the tree become true,
which is equivalent to the fact that the formula ψ is true under this assignment
of values. Hence the formula (2) is true.
It is interesting that the above result is also valid for the case where the containing
query uses only LSI comparisons and a constant in relational subgoals. In
particular, we replace, in the proof, the following subgoals that we add to Q1 and
Q2 respectively:
a_i(T_{1i}, T_i), a_i(T_{2i}, T_i), T_{1i} < 7, 7 < T_{2i}
a_i(U_i, e), a_i(V_i, f), a_i(W_i, e), a_i(W_i, f), U_i < 7, 7 < V_i, W_i ≠ 7
Instead of the above, we add the following subgoals to Q1:
a_i(X_i, 5, T_i), X_i < 5
and the following subgoals to Q2:
a_i(X_i, 5, e), a_i(Y_i, X_i, f), X_i ≤ 5, Y_i < 5
Another case where the result is valid is by replacing the above subgoals in
the proof with the following, for Q1 and Q2 respectively:
a_i(X_i, T_i), X_i < 5
a_i(X_i, e), a_i(Y_i, f), X_i ≤ 5, Y_i ≤ 5, X_i ≠ Y_i
The following two sets of subgoals prove the case when only ≠ is used:
a_i(X_i, Y_i, T_i), X_i ≠ Y_i
a_i(X_i, Y_i, e), a_i(Y_i, Z_i, f), X_i ≠ Z_i
We summarize in Table 4.
We present the results in the following theorem:
Theorem 5.2. Let Q1 and Q2 be CQACs with either of the following restrictions:
1. Q1 has only OLSIs and Q2 has only CLSIs and ≠.
2. Q1 has only OLSIs. Both queries have constants in relational subgoals.
Q2 has only LSIs.
3. Both queries have only ≠.
4. Both queries have only OSIs and Q2 has also ≠.
Then the following problem is Π^p_2-hard: Is Q2 ⊑ Q1?
Contained Query     Containing Query
OSI, ≠              OSI
LSI                 OLSI, constant
CLSI, ≠             OLSI
≠                   ≠
Table 4: Hardness results.
6. Complexity of query containment–the containing query has one
AC. Normalization is not necessary
Theorem 6.1. The query containment problem, which asks "Q2 ⊑ Q1?", is in
NP when Q1, Q2 are CQACs and Q1 uses one AC and several ACs that are
equations (=).
Proof. Consider the containment entailment
\[ \beta_2 \Rightarrow \mu_1(\beta_1) \vee \mu_2(\beta_1) \vee \cdots \vee \mu_i(\beta_1) \]
where β2 is the conjunction of ACs in the contained query and β1 is the
conjunction of ACs in the containing query. First, suppose each µ_j(β1) is a single AC
(i.e., there are no equation ACs), hence the right hand side of the containment
entailment is a disjunction of ACs. If m is the number of variables and constants
in the contained query, then there are at most O(m^2) different ACs that use
them. Each such AC is created by a containment mapping µ_i, and therefore
O(m^2) containment mappings suffice to prove the containment entailment true.
Given those mappings, we can check in polynomial time if the containment
entailment is true using the rules that entail an AC from a conjunction of ACs.
For the general case we argue as follows: Suppose each µ_j(β1) has at least
one equation AC which is not directly derived by β2. In this case, we consider
the containment implication which is using these equation ACs. For this
containment implication to be true, at least one of the equation ACs should be
implied by β2, since if we assume otherwise, we arrive at a contradiction (by use
of Lemma 4.4). Therefore, there is a µ_j(β1) (suppose this is µ1(β1)) for which
all equation ACs (i.e., of the form X = Y or X = c) are directly derived by β2.
Thus, now we have to check a new (shorter) containment entailment:
\[ \beta_2 \wedge \neg\mu_1(b_1) \Rightarrow \mu_2(\beta_1) \vee \cdots \vee \mu_i(\beta_1) \]
where b_1 is the single non-equation AC in β1. Now inductively, we repeat for
polynomially many steps, until we arrive at a containment entailment for which
there is a single µ_j(β1) on the rhs and all ACs of this µ_j(β1) are implied by the
lhs. Membership in NP is proven by observing that we can guess the order in
which the mappings are used in the argument above and prove in each step that
the given subsequent mapping has the property that each equation is implied
by the lhs. I.e., in the general step the entailment will be of the form:
\[ \beta_2 \wedge \neg\mu'_1(b_1) \wedge \neg\mu'_2(b_1) \wedge \cdots \wedge \neg\mu'_j(b_1) \Rightarrow \mu'_{j+1}(\beta_1) \vee \cdots \vee \mu'_k(\beta_1) \]
and µ′_{j+1} is the subsequent mapping.
7. Complexity of query containment. RSI1 queries
The class of RSI1 queries is a subclass of CQAC-SI queries, i.e., they use
only SI ACs. In particular, a query belongs to the class RSI1 if and only if it
uses only one RSI AC. We use the superscript "one" to denote that only one
RSI is allowed; thus, the class of closed RSI1 queries is defined by the AC-type
{var ≤ const, var ≥ const^one}.
In this section, we will prove that, for the majority of cases, there is a
containment test in NP for checking containment of a CQAC in an RSI1 query.
There are some corner cases left out, for reasons similar to those in the case of LSI
containing queries where the homomorphism property does not hold; the
corner cases that are excluded from the HP are proven in Section 5 to
result in a containment test that is Π^p_2-complete. The following proposition will
be used often:
Proposition 7.1. Let β be a conjunction of ACs, and let each of β1, β2, ..., β_k
be a conjunction of a set of closed SIs of which only one is RSI. Suppose the
following is true: β ⇒ β1 ∨ β2 ∨ ··· ∨ β_k. Then there is a β_i (w.l.o.g. suppose it
is β1) such that either of the following two happens:
(i) β ⇒ β1, or
(ii) there is an AC, e, among the conjuncts of β1, such that β ⇏ e and, for
each AC, e′, in β1 such that e′ ≠ e, we have that β ⇒ e′.
Proof. Suppose there is no β_i such that β ⇒ β_i. Towards a contradiction, suppose
that clause (ii) in the statement of the proposition also fails, i.e., that for every β_i
there are at least two ACs that are not implied by β. Since each β_i uses at least one LSI
and only one RSI, there is at least one LSI in β_i that is not implied by β. Let
γ be the disjunction of all such ACs. If the implication in the premises of this
proposition is true, then β ⇒ γ is also true (just apply the distributive law).
This is impossible due to Lemma 4.6, clause (1).
Proposition 7.2. Proposition 7.1 is true for each of the following cases:
1. β is a set of ACs and each disjunct contains closed LSIs and one open RSI.
2. β is a set of ACs and each disjunct contains open LSIs and one open RSI.
3. β is a set of ACs and each disjunct contains open LSIs and one closed RSI.
Moreover, for the cases (2) and (3), the following condition should be satisfied:
For any X ≠ Y that appears in β, if a constant (say c0) is related by an AC
to both X and Y in β then either (i) c0 does not relate to both X and Y by a
closed AC in β or (ii) c0 does not appear in an open AC in some β_i.
The proof of the above proposition is the same as the proof of Proposition 7.1
except that we need to use Lemma 4.6, clause (3) too. The following theorem
will be used to prove each of the four cases of Theorem 7.4.
Theorem 7.3. Let 𝒬1 and 𝒬2 be two classes of CQAC queries such that the
following is true: For any pair of queries Q1 ∈ 𝒬1 and Q2 ∈ 𝒬2 in their
normalized form, and for any set of containment mappings µ1, ..., µ_k from the
relational subgoals of Q1 to the relational subgoals of Q2, the following is true:
The entailment β2 ⇒ µ1(β1) ∨ ··· ∨ µ_k(β1) (where β1 is the conjunction of ACs in
Q1 and β2 is the conjunction of any set of ACs) is true only if there is a containment
mapping (say µ1) such that the following is true: there is an AC, e, among the
conjuncts of µ1(β1), such that β2 ⇏ e and, for each AC, e′, in µ1(β1) such that
e′ ≠ e, we have that β2 ⇒ e′.
Then the following problem is in NP: Is Q2 ⊑ Q1?
Proof. Suppose Q2 ⊑ Q1 and this is proven by the following containment
entailment: β2 ⇒ µ1(β1) ∨ µ2(β1) ∨ ··· ∨ µ_k(β1). From the premises of this
theorem, we deduce that we can write the containment entailment, equivalently,
as: β2 ∧ ¬e ⇒ µ2(β1) ∨ ··· ∨ µ_k(β1). For this latter entailment, according to
the premises of the theorem, there is a containment mapping (say µ2) such
that there is e_1 in µ2(β1) which satisfies the conditions in the statement of the
theorem. Hence the containment entailment is, further, equivalently written as:
\[ \beta_2 \wedge \neg e \wedge \neg e_1 \Rightarrow \mu_3(\beta_1) \vee \cdots \vee \mu_k(\beta_1) \]
Thus, after i iterations (renaming the ACs moved to the lhs as e_1, ..., e_i), we will have:
\[ \beta_2 \wedge \neg e_1 \wedge \neg e_2 \wedge \cdots \wedge \neg e_i \Rightarrow \mu_{i+1}(\beta_1) \vee \cdots \vee \mu_k(\beta_1) \]
In the lhs of the above implication, we can have only polynomially many (in
the number of ACs appearing in both queries) ACs, hence we can have at most
polynomially many iterations. Hence the containment entailment can contain
at most polynomially many containment mappings.
To prove membership in NP observe that the certificate is a containment
entailment and the order in which to consider the disjuncts in this containment
entailment in order to prove that this containment entailment is true.
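The iteration in this proof can be sketched as a loop that repeatedly picks a disjunct with at most one AC not implied by the current lhs and moves the negation of that AC to the lhs. The sketch below is an illustration under stated assumptions, not the NP procedure itself: implication is checked by brute force over representative points instead of via the closure, the disjuncts are given explicitly rather than guessed, and the names `implies`, `negate` and `prove_in_order` are hypothetical.

```python
from itertools import product
import operator

OPS = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
       '>=': operator.ge, '=': operator.eq, '!=': operator.ne}
NEG = {'<': '>=', '<=': '>', '>': '<=', '>=': '<', '=': '!=', '!=': '='}

def negate(ac):
    x, op, y = ac
    return (x, NEG[op], y)

def implies(lhs, rhs):
    """Brute force: does the conjunction lhs entail the disjunction rhs?
    Strings are variables, numbers are constants."""
    acs = list(lhs) + list(rhs)
    vs = sorted({t for x, _, y in acs for t in (x, y) if isinstance(t, str)})
    cs = sorted({t for x, _, y in acs for t in (x, y) if not isinstance(t, str)}) or [0]
    n = len(vs)
    pts = set(cs)
    for k in range(1, n + 1):                  # n points outside the constants
        pts |= {cs[0] - k, cs[-1] + k}
    for lo, hi in zip(cs, cs[1:]):             # n points inside every gap
        pts |= {lo + (hi - lo) * k / (n + 1) for k in range(1, n + 1)}
    val = lambda t, env: env.get(t, t)
    for values in product(sorted(pts), repeat=n):
        env = dict(zip(vs, values))
        sat = lambda ac: OPS[ac[1]](val(ac[0], env), val(ac[2], env))
        if all(sat(a) for a in lhs) and not any(sat(b) for b in rhs):
            return False
    return True

def prove_in_order(lhs, disjuncts):
    """Return the order in which disjuncts are used, or None if the
    entailment cannot be established this way."""
    lhs, remaining, order = list(lhs), list(disjuncts), []
    while remaining:
        for d in remaining:
            missing = [ac for ac in d if not implies(lhs, [ac])]
            if len(missing) <= 1:
                break
        else:
            return None                        # no disjunct has the property
        order.append(d)
        if not missing:
            return order                       # lhs implies this whole disjunct
        remaining.remove(d)
        lhs.append(negate(missing[0]))         # move ¬e to the lhs
    return order if implies(lhs, []) else None # true only if lhs is inconsistent

beta2 = [('Y', '<=', 8), ('W', '>=', 3)]
d1 = [('X', '>=', 3), ('Y', '<=', 8)]
d2 = [('W', '>=', 3), ('X', '<=', 8)]
print(prove_in_order(beta2, [d1, d2]))
```

In this small example neither disjunct is implied by β2 alone; after ¬(X ≥ 3), i.e. X < 3, is moved to the lhs, the second disjunct is fully implied.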
The following theorem says that if the containing query uses LSIs and at most
one RSI then the query containment problem is in NP under certain conditions:
Theorem 7.4. Suppose 𝒬2 is the class of CQAC queries defined by the AC-type
T_AC. Suppose also that either of the following happens. (We write
var ≥ const^one (var > const^one) to mean that at most one RSI is present.)
1. 𝒬1 is the class of CQAC queries defined by the AC-type {var ≤ const,
var ≥ const^one}.
2. 𝒬1 is the class of CQAC queries defined by the AC-type {var ≤ const,
var > const^one}.
3. 𝒬1 is the class of CQAC queries defined by the AC-type {var < const,
var ≥ const^one}.
4. 𝒬1 is the class of CQAC queries defined by the AC-type {var < const,
var > const^one}.
Suppose the following condition is satisfied: For any X ≠ Y that appears in
Q2, if a constant (say c0) is related by an AC to both X and Y then either
(i) c0 does not relate to both X and Y by a closed AC in Q2 or (ii) c0 does not
appear in an open AC in Q1.
Then, the following problem is in NP: Given queries Q1 ∈ 𝒬1 and Q2 ∈ 𝒬2,
determine whether Q2 ⊑ Q1.
Proof. The proof is a straightforward consequence of either Proposition 7.1 or
Proposition 7.2, and of Theorem 7.3.
8. Normalization
Now we examine cases where we also allow equation ACs in the normalized
version of the queries. The following is an easy result:
Theorem 8.1. Suppose 𝒬 is the class of CQAC queries defined by the AC-type
T_AC in their normalized form. Suppose Q1, Q2 are two queries from 𝒬. Suppose
there is an AC in Q1 which is X = c0 and there is no AC in Q2 which is Y = c0.
Then the following is true: Q2 ⋢ Q1.
Proof. After applying the distributive law on the containment entailment, there
is a containment implication which on the rhs is a disjunction of equations, all of
which use the same constant, c0. Since this containment implication is true iff
there is a disjunct implied by the lhs, this is a contradiction, since we assumed
that the lhs does not include Y = c0 in the closure of its ACs.
In the rest of this section, we find cases when normalization is not neces-
sary. I.e., instead of normalizing the queries, we consider, in each query, all the
equations that are derived by its set of ACs and we replace all variables that are
shown equal with a single variable or constant and then we find all mappings
(which, in general, are fewer in number because there are fewer variables) and
form the containment entailment which is guaranteed to decide on whether one
query is contained in the other.
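Replacing all terms that are shown equal by a single representative can be sketched with a union-find structure. In the sketch below the equalities are assumed to be given explicitly (in the setting above they would be the equations derived from the query's ACs), the set of ACs is assumed consistent, and the function name `merge_equalities` is hypothetical.

```python
def merge_equalities(equalities):
    """Map every term to one representative of its equality class.

    Variables are strings and constants are numbers; when a class contains
    a constant, the constant is kept as the representative.
    """
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]      # path halving
            t = parent[t]
        return t

    for x, y in equalities:
        rx, ry = find(x), find(y)
        if rx == ry:
            continue
        if isinstance(rx, str):
            parent[rx] = ry                    # ry may be a constant: keep it
        else:
            parent[ry] = rx                    # rx is a constant: keep it
    return {t: find(t) for t in parent}

rep = merge_equalities([('X', 'Y'), ('Y', 5), ('Z', 'W')])
atom = ('a', ('X', 'Z'))
print((atom[0], tuple(rep.get(v, v) for v in atom[1])))
```

After the merge, the subgoal a(X, Z) is rewritten to a(5, W), so containment mappings are then sought over fewer variables.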
The following is a technical lemma which is useful in the proofs of the results
that follow.
Lemma 8.2. Consider the containment entailment
\[ \beta_2 \Rightarrow \mu_1(\beta_1) \vee \mu_2(\beta_1) \vee \cdots \vee \mu_i(\beta_1) \vee \cdots \]
where β2 is the conjunction of ACs in the contained query and β1 is the
conjunction of ACs in the containing query.
Let β1 = b_1 ∧ ··· ∧ b_m. If the above containment entailment is true, then
there is a µ_i (say it is µ1) for which each µ1(b_j) which is an equation (say the
equation X = Y) is such that β2 ⇒ µ1(b_j), i.e., β2 ⇒ µ1(X) = µ1(Y).
Proof. Suppose that the conclusion in the statement of the lemma is not true.
Then for all i, there is a k_i such that µ_i(b_{k_i}) is not implied by β2, where b_{k_i} is
an equation. Then we consider the containment implication β2 ⇒ ∨_i µ_i(b_{k_i}) in a
minimal form. We move (according to Lemma 4.4) disjuncts from the rhs to the
lhs of this containment implication to prove a certain equation (say the equation
X = Y). By moving, according to Lemma 4.4, an equation from the rhs to the
lhs, we apply negation, hence we are adding a conjunct on the lhs which uses
≠. Now, observe that the elemental implications that prove an equation do not
use ≠; hence, the equation is implied by β2, which is a contradiction.
8.1. Closed ACs in the containing query
The following lemma is an extension of a known result which had assumed
that both queries use closed ACs:
Lemma 8.3. When the containing query, Q1, contains only closed ACs, and
the contained query, Q2, contains any AC, then normalization is not necessary
when checking CQAC query containment.
Proof. Suppose the containment entailment is true.
Suppose that containment mapping µ1 has the property of Lemma 8.2.
Let β1 = b_1 ∧ ··· ∧ b_m. Inductively, we assume that there is a set S of µ_i's
such that if µ_i(b_j) is an equation then it is implied by β2, i.e., β2 ⇒ µ_i(b_j).
Consider the entailment with β2 on the lhs and a disjunction of the µ_i(β1) for
all µ_i in S on the rhs. (I.e., we have that β2 implies the disjunction of these
µ_i(β1)'s.) Let it be:
\[ \beta_2 \Rightarrow \mu'_1(\beta_1) \vee \cdots \vee \mu'_k(\beta_1) \]
Apply the distributive law to it and obtain a set of containment implications.
If one of those containment implications does not use any AC which is "=", we
say that it is a NEQ containment implication. We observe that if each NEQ
containment implication is true then the µ_i's in S prove containment of the one
query in the other, i.e., the above entailment is true. Moreover, we have used
only mappings with all equations being implied by β2; hence we have proven
the lemma.
To finish the proof, we will prove that either a) each NEQ containment
implication is true or b) there is a µ_i = µ_k not in S such that for each b_j which
is an equation AC the following is true: β2 ⇒ µ_k(b_j).
If (a) is not true then there is a NEQ containment implication β2 ⇒ δ_k
which is not true. This means that ¬δ_k ∧ β2 is consistent. Suppose the following
containment entailment is true
\[ \beta_2 \Rightarrow \mu'_1(\beta_1) \vee \cdots \vee \mu'_k(\beta_1) \vee \mu_1(\beta_1) \vee \cdots \]
where the primed mappings are the ones in the set S. Suppose (b) above is not
true. Then µ_i(β1) has at least one "=" AC not implied by β2. Suppose this
is µ_i(b_{k_i}). Consider the containment implication, γ (derived by applying the
distributive law), of the above containment entailment, whose rhs is the disjunction of
δ_k with all the µ_i(b_{k_i})'s.
We observe that ¬δ_k ∧ β2 does not imply any equation in γ that is not implied
by β2, because ¬δ_k contains only open ACs, hence elemental implication (9)
cannot be used. Notice a subtle point: if we could use elemental implication
(2) and then (9), this would mean that ¬δ_k ∧ β2 is not consistent, which is not possible.
Hence, the containment entailment that proves containment uses only
mappings µ_i such that, for each µ_i(X = Y), the following is true:
β2 ⇒ µ_i(X = Y).
The following example shows that even if only LSIs are used in both queries,
if the containing query contains a single open LSI, then, in certain cases, nor-
malization is necessary:
Example 8.4.
Q1() :− a(5, X), X < 5,
Q2() :− a(5, X), a(X, Y), X ≤ 5, Y < 5.
8.2. Only LSIs in the containing query. NP via HP
Lemma 8.5. Suppose 𝒬1 is the class of CQAC queries defined by the AC-type
{var < const, var ≤ const, var = const, var = var} and 𝒬2 is the class of CQAC
queries defined by the AC-type T_AC such that the following condition is satisfied:
There is no constant that is shared by an equation AC in Q1, by an open LSI
in Q1 and by a closed LSI in Q2.
Consider the following problem: Given queries Q1 ∈ 𝒬1 and Q2 ∈ 𝒬2,
determine whether Q2 ⊑ Q1. This containment problem can be decided by checking
satisfaction of a containment entailment which uses containment mappings
among non-normalized queries (footnote 3).
Footnote 3: Recall that a CQAC query is referred to as non-normalized if its ACs
do not imply equalities.
Proof. The proof copies that of Lemma 8.3 up until the last two
paragraphs. Now we need to argue differently on the δ_k that appears in this
proof. For that, we observe that an OLSI, when moved to the lhs of the
containment implication (as in Lemma 4.4), is a CRSI which, however, because of
the condition, cannot be used together with a CLSI to derive an equation.
The following theorem is a consequence of Lemma 8.5 and Theorem 4.10:
Theorem 8.6.SupposeQ 2is the class of CQAC queries whose normalized ver-
sion uses ACs fromT ACandQ 1is the class of CQAC queries whose normalized
version uses ACs from{var≤const, var <const, var=var, var=const}.
Suppose the following conditions are satisfied:
1. For anyX̸=Ythat appears inQ 2, if a constant (sayc 0) is related by
an AC to bothXandYthen, then either (i)c 0does not relate to bothXand
Yby a closed AC inQ 2or (ii)c 0does not appear in an open AC inQ 1.
2. There is no constant that is shared by an equation AC inQ 1by an open
LSI inQ 1and by a closed LSI inQ 2.
Then, the following problem is inNP: Given queriesQ 1∈ Q 1andQ 2∈ Q 2,
determine whetherQ 2⊑Q 1.
Proof. The second condition ensures that normalization is not necessary,
and the first condition comes from Theorem 4.10.
8.3. Both LSIs and RSIs appear in the containing query. Cases in NP
The following theorem is a straightforward consequence of Lemma 8.3 and
Theorem 7.4(1).
Theorem 8.7. Suppose $\mathcal{Q}_2$ is the class of CQAC queries whose normalized
version uses ACs from $T_{AC}$ and $\mathcal{Q}_1$ is the class of CQAC queries whose
normalized version uses ACs from
$\{var \le const, var \ge const^{one}, var = var, var = const\}$.
(We write $var \ge const^{one}$ ($var > const^{one}$) to mean that at most one RSI is
present.)
Then, the following problem is inNP: Given queriesQ 1∈ Q 1andQ 2∈ Q 2,
determine whetherQ 2⊑Q 1.
Now, we extend Theorem 8.7 to include also open LSIs and RSIs.
Theorem 8.8.SupposeQ 2is the class of CQAC queries whose normalized ver-
sion uses ACs fromT ACandQ 1is the class of CQAC queries whose normalized
version uses either of the following:
1. ACs from $\{var \le const, var \ge const^{one}, var = var, var = const\}$.
2. ACs from $\{var \le const, var > const^{one}, var = var, var = const\}$.
3. ACs from $\{var < const, var \ge const^{one}, var = var, var = const\}$.
4. ACs from $\{var < const, var > const^{one}, var = var, var = const\}$.
(We write $var \ge const^{one}$ ($var > const^{one}$) to mean that at most one RSI
is present.)
We assume the following constraints: a) each constant in an RSI in $Q_1$ is
different from each constant in an LSI in $Q_1$; b) no constant of an open
SI in $Q_1$ appears in a closed SI in $Q_2$.
Then, the following problem is inNP: Given queriesQ 1∈ Q 1andQ 2∈ Q 2,
determine whetherQ 2⊑Q 1.
Proof.Because of the constraints, we can use the same argument as in the proof
of Lemma 8.3 (together with similar argument as in Lemma 8.5) to prove that
normalization is not necessary. The rest is a consequence of Theorem 7.4.
9. Extending the Results to Include Single Mapping Variables
In the results about the complexity of query containment that we presented, we
have often imposed a condition under which a constant is not allowed to appear in
certain positions. It is interesting to notice that, even when such a
condition is not satisfied, there are still classes of queries for which the problem
of query containment is in NP. Such classes can be derived by observing that,
often, two attributes in a query that may be used in an arithmetic comparison
represent quantities that are measured in different units, e.g., weight and height.
For such attributes, say $A$ and $B$, if we have the ACs $A \ge 5$ and $B > 5$, we
can safely treat the two occurrences of 5 as different constants, since, in a
containment mapping, an attribute that represents weight will never map to an
attribute that represents height. These observations are further elaborated in [7, 9].
In an orthogonal fashion, there may exist certain variables of the containing
query which, in every containment mapping, map to the same variable of the
contained query. The head variables are such variables, but there may be others.
We call such variables single-mapping variables; the formal definition is given in
Definition 9.1.
We start the analysis by focusing on the head variables. Consider two
CQACs, $Q_1 = Q_{10} + \beta_1$ and $Q_2 = Q_{20} + \beta_2$, and the containment entailment:
$$\beta_2 \Rightarrow \mu_1(\beta_1) \lor \cdots \lor \mu_k(\beta_1) \qquad (3)$$
where $\mu_1, \ldots, \mu_k$ are all the containment mappings from $Q_{10}$ to $Q_{20}$. Suppose
$\beta_1 = \beta_{11} \land \beta_{12}$, where $\beta_{11}$ is the conjunction of ACs that use
only distinguished variables and $\beta_{12}$ is the conjunction of the rest of the ACs in
$\beta_1$. We observe that, in the containment entailment, each disjunct on the
right-hand side becomes:
$$\mu_i(\beta_1) = \mu_i(\beta_{11}) \land \mu_i(\beta_{12}).$$
However, $\mu_i(\beta_{11})$ is the same for every $i$. Thus, applying the distributive law,
we write the containment entailment as:
$$\beta_2 \Rightarrow \mu_1(\beta_{11}) \land [\mu_1(\beta_{12}) \lor \cdots \lor \mu_k(\beta_{12})].$$
Consequently, the containment entailment is equivalent to the conjunction of
the following two entailments:
$$\beta_2 \Rightarrow \mu_1(\beta_{11}),$$
$$\beta_2 \Rightarrow \mu_1(\beta_{12}) \lor \cdots \lor \mu_k(\beta_{12}).$$
Definition 9.1. (single-mapping variables) Let $Q_1 = Q_{10} + \beta_1$, $Q_2 = Q_{20} + \beta_2$
be two CQACs such that there is at least one containment mapping from $Q_{10}$
to $Q_{20}$. Consider the set $\mathcal{M}$ of all the containment mappings from $Q_{10}$ to
$Q_{20}$. Each variable $X$ of $Q_1$ which is always mapped to the same variable of
$Q_2$ (i.e., for each $\mu \in \mathcal{M}$, $\mu(X)$ always equals the same variable) is called a
single-mapping variable with respect to $Q_2$.
As we mentioned, the head variables of $Q_1$ are single-mapping variables with
respect to any query. For another example, consider a predicate $r$ such that $Q_1$
has the subgoals $g_{11}, g_{12}, \ldots, g_{1n}$ with predicate $r$ and $Q_2$ has a single subgoal
$g_2$ with predicate $r$. Since each of the subgoals $g_{11}, g_{12}, \ldots, g_{1n}$ maps to $g_2$
in every containment mapping from $Q_1$ to $Q_2$, the variables in the subgoals
$g_{11}, g_{12}, \ldots, g_{1n}$ are single-mapping variables.
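Definition 9.1 can be illustrated by brute force. The following is a minimal Python sketch, not part of the paper's method: it assumes that the relational parts $Q_{10}$ and $Q_{20}$ are given as lists of (predicate, arguments) pairs, that variables are strings starting with an uppercase letter, and that head variables and ACs are ignored. The function names are hypothetical.

```python
from itertools import product

def is_var(term):
    # Assumption of this sketch: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def containment_mappings(q10, q20):
    """All mappings that send every atom of q10 onto some atom of q20.
    Atoms are (predicate, args) pairs; ACs and heads are ignored here."""
    choices = [[b for b in q20 if b[0] == a[0] and len(b[1]) == len(a[1])]
               for a in q10]
    mappings = []
    for targets in product(*choices):
        mu, ok = {}, True
        for (_, args1), (_, args2) in zip(q10, targets):
            for s, t in zip(args1, args2):
                if is_var(s):
                    # A variable must keep the same image in every atom.
                    ok = ok and mu.setdefault(s, t) == t
                else:
                    # A constant can only map to itself.
                    ok = ok and s == t
        if ok and mu not in mappings:
            mappings.append(mu)
    return mappings

def single_mapping_variables(q10, q20):
    mappings = containment_mappings(q10, q20)
    if not mappings:
        return set()
    return {x for x in mappings[0]
            if len({mu[x] for mu in mappings}) == 1}

# r occurs once in the contained query, so X and Y always map to A and B;
# Z can map to C or to D.
q10 = [("r", ("X", "Y")), ("e", ("Y", "Z"))]
q20 = [("r", ("A", "B")), ("e", ("B", "C")), ("e", ("B", "D"))]
single = single_mapping_variables(q10, q20)
```

The enumeration is exponential in the number of subgoals, so it is only meant for small examples such as the one with the single $r$ subgoal above.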
The following theorem gives a polynomial reduction of a CQAC containment
problem to one with fewer ACs in the containing query.
Theorem 9.2. Consider CQAC queries $Q_1 = Q_{10} + \beta_1$ and $Q_2 = Q_{20} + \beta_2$.
Let $X_1$ be a set of single-mapping variables of $Q_1$ with respect to $Q_2$. Let
$\beta_1 = \beta_{11} \land \beta_{12}$, where $\beta_{11}$ is the conjunction of ACs that use only
variables in $X_1$ and $\beta_{12}$ is the conjunction of the rest of the ACs in $\beta_1$.
Let $Q_1' = Q_{10} + \beta_{12}$. Then the following are equivalent:
(i) $Q_2 \sqsubseteq Q_1$
(ii) $\beta_2 \Rightarrow \mu_1(\beta_{11})$ and $Q_2 \sqsubseteq Q_1'$
We call the second implication in the statement of the above theorem the
reduced containment entailment and we call $Q_1'$ the reduced containing query
with respect to $Q_2$.
A consequence of Theorem 9.2 is the following theorem:
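Theorem 9.3. Let $\mathcal{Q}_1$ and $\mathcal{Q}_2$ be two classes of CQAC queries such that if
$Q_1 \in \mathcal{Q}_1$ and $Q_2 \in \mathcal{Q}_2$ then the following problem is in NP: Is $Q_2 \sqsubseteq Q_1$?
Let $\mathcal{Q}_1'$ be the class of queries such that, for each $Q_1 \in \mathcal{Q}_1'$, its reduced
containing query with respect to $Q_2$ is in $\mathcal{Q}_1$. Then, for $Q_1 \in \mathcal{Q}_1'$ and
$Q_2 \in \mathcal{Q}_2$, the following problem is in NP: Is $Q_2 \sqsubseteq Q_1$?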
10. Maximally Contained Rewritings and Certain Answers. Defini-
tions
In the second part of the paper we prove that, for special cases of CQAC
queries, we can compute certain answers in polynomial time. We first prove
that for CQAC queries and views, a maximally contained rewriting (see Defi-
nition 10.2) computes exactly all certain answers. This is done in Section 12.
Thus, if we can find an MCR in a query language for which query evaluation
has polynomial-time data complexity, then we can compute certain answers in
polynomial time. We show in Sections 13 through 16 that, for a special case
of CQAC queries with semi-interval comparisons, we can build an MCR in the
language of Datalog with ACs.
Definition 10.1. (Contained Rewriting)Given a queryQand a set of
viewsV,Ris acontained rewritingofQin terms ofVunder the open world
assumption if and only if, for each viewsetI, the following is true: For each
database instanceDsuch thatI ⊆ V(D),R(I)is a subset of or equal toQ(D).
Definition 10.2. (Maximally Contained Rewriting (MCR))Given a
queryQand a set of viewsV,R MCRis amaximally contained rewriting (MCR)
ofQin terms ofVifR MCRis a contained rewriting ofQin terms ofVand
every contained rewritingRofQin terms ofVis contained inR MCR.
Definition 10.3. (Certain Answers)LetVbe a set of views andIbe a
view instance. Suppose there exists a database instanceDsuch thatI ⊆ V(D).
Then, we define the certain answers of (Q,I) with respect toVunder the Open
World Assumption as follows:
$$certain(Q, \mathcal{I}) = \bigcap \{Q(D) : D \text{ such that } \mathcal{I} \subseteq \mathcal{V}(D)\}$$
If there is no database instanceDsuch thatI ⊆ V(D), we say that the set
certain(Q,I)is undefined.
Theorem 10.4.Given a query and a set of view definitions, a contained rewrit-
ing computes only certain answers.
Proof.The proof is based on the observation that if a setAis a subset of set
A1andAis a subset of setA 2, thenAis a subset ofA 1∩A 2. To prove, suppose
elementeis inA 1and inAbut not inA 2. This is a contradiction because ife
is inAandAis a subset ofA 2,eshould be inA 2too. The sets that we refer
to, are the sets in{Q(D) :Dsuch thatI ⊆ V(D)}vs.R(I).
11. Contained Rewritings for CQAC classes. Preliminaries.
In this section, we prove how to check that a CQAC rewriting for CQAC
queries and views is a contained rewriting using the expansion of the rewriting.
Then, we need to introduce the concept of rectified rewriting which is a technical
detail that appears when we use ACs in queries and views and we will need it
in order to argue about containment of one contained rewriting to another.
11.1. Expansion of a rewriting
For CQAC queries and views, we define the expansion of a rewriting:
Definition 11.1.Given a CQAC queryQand a set of CQAC viewsVand a
rewritingRofQusingV, theexpansionRexpofRis produced as follows: We
unify each subgoal ofRwith the head of the corresponding view definition and
replace this subgoal with all subgoals from the view definition after unification,
while keeping the nondistinguished variables in the view definition distinct from
every other nondistinguished variables from the other subgoals ofR. We also
keep the ACs from the view definitions.
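The expansion step of Definition 11.1 can be sketched as follows. This is an illustrative Python fragment under simplifying assumptions that are not in the paper: view heads consist of distinct variables (so unification reduces to substitution), variables are strings starting with an uppercase letter, and fresh names are formed by appending the subgoal index, which is assumed not to clash with variable names already used in the rewriting. The name `expand` and the data layout are hypothetical.

```python
def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def expand(rewriting_atoms, rewriting_acs, views):
    """views maps a view name to (head_vars, body_atoms, acs).
    Atoms are (predicate, args); ACs are (lhs, op, rhs)."""
    body, acs = [], list(rewriting_acs)
    for i, (name, args) in enumerate(rewriting_atoms):
        head_vars, view_body, view_acs = views[name]
        sub = dict(zip(head_vars, args))      # unify the view head with the subgoal

        def rename(term):
            if not is_var(term):
                return term                   # constants stay as they are
            # Nondistinguished view variables get a fresh name per subgoal.
            return sub.get(term, f"{term}_{i}")

        body += [(p, tuple(rename(t) for t in ts)) for p, ts in view_body]
        acs += [(rename(l), op, rename(r)) for l, op, r in view_acs]
    return body, acs

# View V2(Y, Z) :- p(X), s(Y, Z), Y <= X, X <= Z and rewriting R'(X) :- V2(X, X), X < 4.
views = {"V2": (("Y", "Z"), [("p", ("X",)), ("s", ("Y", "Z"))],
                [("Y", "<=", "X"), ("X", "<=", "Z")])}
body, acs = expand([("V2", ("X", "X"))], [("X", "<", 4)], views)
```

In this run the view's nondistinguished variable $X$ is renamed to `X_0`, so it stays distinct from the rewriting's own variable $X$, and the view's ACs are carried into the expansion together with the AC of the rewriting.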
Theorem 11.2.Given a CQACQ, a set of views CQACVand a CQAC
rewritingRofQusingV,Ris acontained rewritingif and only ifRexpis
contained inQ.
Proof. If $R^{exp} \sqsubseteq Q$, we argue as follows: Let $\mathcal{I}$ be a viewset and $D$ a database
instance such that $\mathcal{I} \subseteq \mathcal{V}(D)$. We have that $R^{exp}(D) \subseteq Q(D)$ and we will
prove that $R(\mathcal{I}) \subseteq Q(D)$. We compute $R^{exp}(D)$ by using a homomorphism
from $R^{exp}$ to $D$ that satisfies the ACs in $R^{exp}$. When we produce $R^{exp}$
from $R$, we use the view definitions, and the heads of the view definitions in
$R^{exp}$ map on $D$ and produce tuples in $\mathcal{V}(D)$. These tuples can be used to
produce the homomorphism from $R$ to $\mathcal{V}(D)$, hence $R(\mathcal{V}(D))$ is a subset of
$Q(D)$ and, therefore, $R(\mathcal{I}) \subseteq Q(D)$. For the other direction, suppose $R$ is a
contained rewriting, hence we have: For each database instance $D$ such that
$\mathcal{I} \subseteq \mathcal{V}(D)$, $R(\mathcal{I})$ is a subset of or equal to $Q(D)$. We also have that
$R(\mathcal{I}) \subseteq R(\mathcal{V}(D))$, because of monotonicity of CQACs. The homomorphism from $R$
to $\mathcal{V}(D)$ that produces $R(\mathcal{V}(D))$ can be used to find a homomorphism from
$R^{exp}$ to $D$ that produces the same tuples in $R^{exp}(D)$ as in $R(\mathcal{V}(D))$. Hence,
$R(\mathcal{I}) \subseteq R^{exp}(D)$.
11.2. Rectified Rewriting
Consider a CQAC query and a set of CQAC views. When we have a rewriting
$R$, the variables in $R$ also satisfy some ACs that are in the closure of the ACs in
the expansion of $R$. We include those ACs in the rewriting $R$ and produce $R'$,
which we call the AC-rectified rewriting of $R$. Thus, the expansions of $R$ and
$R'$ are equivalent queries. Hence, we derive the following proposition:
Proposition 11.3.Given a set of CQAC views, a rewritingRand its rectified
versionR′, the following is true: For any view instanceIsuch that there is a
database instanceDfor whichI ⊆ V(D), we have thatR(I) =R′(I).
Definition 11.4.We say that a rewritingRisAC-containedin a rewritingR 1
if the AC-rectified rewritingR′ofRis contained inR 1as queries.
From here on, when we refer to a rewriting, we mean its AC-rectified version,
and when we say that a rewriting is contained in another rewriting, we mean
that it is AC-contained. An example follows.
Example 11.5.Consider queryQand viewV 2:
Q(A):-p(A), A <4.
V2(Y, Z):-p(X), s(Y, Z), Y≤X, X≤Z.
The following rewriting is a contained rewriting of the query in terms of the
view in the language CQAC:
R(Y 1):-V 2(Y1, Z1), V2(Y2, Z2), Z 1≤Y 2, Y1≥Z 2, Y1<4.
Now consider the following contained rewriting:
R′(X):-V 2(X, X), X <4.
This rewriting uses only one copy of the view. We can show thatRis not
contained inR′and thatR′is not contained inR. However, let us consider the
rectified version ofR:
Rrect(Y1):-V 2(Y1, Z1), V2(Y2, Z2), Z 1≤Y 2, Y1≥Z 2, Y1<4, Y 1≤Z 1, Y2≤Z 2.
We can easily prove now thatR′andRrectare equivalent.
12. Union of CQAC MCRs compute all certain answers for CQAC
queries and views
In this section, we will prove that, for CQAC views, a maximally contained
rewritingPwith respect to union of CQACs (U-CQAC)4of a CQAC queryQ
computes the certain answers ofQunder the OWA, i.e., we prove the following
theorem.5
Theorem 12.1.LetQbe a CQAC query,Va set of CQAC views. Suppose
there exists an MCRR MCRofQwith respect to U-CQAC. LetIbe a view
instance such that the set certain(Q,I)is defined. Then, under the open world
assumption,R MCRcomputes all the certain answers ofQon any view instance
Ithat is:R MCR(I) =certain(Q,I).
Proof. We define a database-AC-homomorphism6 with respect to tuple $t$ to be
a homomorphism, $h$, from a database instance $D_1$ to another database instance
$D_2$ such that the constants in $t$ map to the same constants, but the rest of the
constants are allowed to map to any constant and even two distinct constants
are allowed to map to the same constant. Moreover, for any $c_1, c_2$ in $D_1$, if
$c_1 \,\theta\, c_2$ (where $\theta$ is any AC) then $h(c_1) \,\theta\, h(c_2)$.
4 In the literature, U-CQAC usually denotes the class of finite unions of CQACs; in
this section we assume that the union may also be infinite.
5 This section extends the results in [10] for CQs.
6 This definition is only used in the present subsection.
When there is a database-AC-homomorphism with respect to $t$ from database
instance $D_1$ to database instance $D_2$, then for any CQAC $Q$ and for any tuple
of constants $t_0$ with constants only from $t$, if $t_0$ is in $Q(D_1)$ it is also in $Q(D_2)$.
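We consider the conjunction of all the atoms in $\mathcal{I}$ after we have turned
the constants in $\mathcal{I}$ to variables, together with all the ACs among these variables
that correspond to the order of the constants in $\mathcal{I}$. Thus, we turn $\mathcal{I}$ into a Boolean
CQAC query $R_{\mathcal{I}}$. We consider the expansion, $R_{\mathcal{I}}^{exp}$, of $R_{\mathcal{I}}$.7 It is easy to see
that if there is a $D_0$ such that $\mathcal{I} = \mathcal{V}(D_0)$ then $\mathcal{I} = R_{\mathcal{I}}^{exp}(D_0)$. Thus, for any
$D_i$ such that $\mathcal{I} \subseteq \mathcal{V}(D_i)$, we have that $R_{\mathcal{I}}^{exp}(D_i) \subseteq \mathcal{V}(D_i)$.
If $R_{\mathcal{I}}^{exp}(D_i) \subseteq \mathcal{V}(D_i)$, for each $D_i$ such that $\mathcal{I} \subseteq \mathcal{V}(D_i)$, there is a canonical
database $D_{\mathcal{I}}^{exp-i}$ of $R_{\mathcal{I}}^{exp}$ such that there is a database-AC-homomorphism from
$D_{\mathcal{I}}^{exp-i}$ to $D_i$, hence $Q(D_{\mathcal{I}}^{exp-i}) \subseteq Q(D_i)$. Thus, we have proven that the
following claim is true.
Claim: We evaluate the query $Q$ on each of the canonical databases $D_{\mathcal{I}}^{exp-i}$,
$i = 1, 2, \ldots$, of $R_{\mathcal{I}}^{exp}$ and remove the tuples containing constants not appearing in $\mathcal{I}$
(the latter denoted by the operation $\downarrow$). We claim:
$$certain(Q, \mathcal{I}) = \bigcap \{Q{\downarrow}(D_{\mathcal{I}}^{exp-i}),\ i = 1, 2, \ldots\}$$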
To conclude the proof, we are going to construct a finite union of CQACs
that constitutes a contained rewriting, $R_{cont}^{\mathcal{I}}$, of $Q$ using $\mathcal{V}$ which is such that
$certain(Q, \mathcal{I}) = R_{cont}^{\mathcal{I}}(\mathcal{I})$. We construct this union of CQACs as follows: For
each tuple $t_i$ in $certain(Q, \mathcal{I})$, we construct a CQAC with body $R_{\mathcal{I}}$ and
head variables equal to $t_i$. It computes the tuples in the certain answers trivially,
and the claim proves that these are indeed certain answers.
7 It is easy to see that this is possible if the set $certain(Q, \mathcal{I})$ is defined.
13. Computing Certain Answers in PTIME for RSI1 Queries and
CQAC views
In the rest of the paper, we will finish proving the main result of the second
part of this paper, which is the following theorem:
Theorem 13.1. Given a query $Q$ which is CQAC-RSI1 (RSI1, for short) with
closed ACs and views $\mathcal{V}$ which are CQACs, we can find all certain answers of
$Q$ using $\mathcal{V}$ on a given view instance $\mathcal{I}$ in time polynomial in the size of $\mathcal{I}$.
The above theorem is a straightforward consequence of the main result of
the previous section and Theorem 16.1, which says that we can find an MCR
in the language of Datalog with arithmetic comparisons.
Our method is as follows: We first establish a containment test which trans-
forms the containing query into a Datalog program and the contained query into
a CQ and show that containment is equivalent to testing containment among
the transformed queries (Theorem 14.4). An algorithm already exists that finds
an MCR of a Datalog query in terms of CQ views. We show that we can trans-
form this MCR to an MCR of the original query in terms of the original views.
This MCR is in the language of Datalog with ACs.
The following proposition together with Proposition 7.1 is the main reason
the transformation works:
Proposition 13.2. Consider the following containment implication
$$a_1 \land a_2 \land \cdots \land a_n \Rightarrow b_1 \lor b_2 \lor \cdots \lor b_m \qquad (4)$$
where the $a_i$'s are any ACs and the $b_i$'s are closed SIs. When this implication is
in minimal form, it has at most two ACs on the rhs.
Proof. The proof is a consequence of Lemma 4.2(2).
The following containment implication shows that Proposition 13.2 is not
true if we use open SIs on the rhs.
$$X \neq Y \Rightarrow X > 7 \lor Y > 7 \lor X < 7 \lor Y < 7$$
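The counterexample above can be sanity-checked mechanically. The following Python sketch evaluates the implication on a small grid of sample values around the constant 7 (the grid is an assumption of the sketch, not a proof over all reals) and checks that none of the four open SIs on the rhs can be dropped. Each failure reported when a disjunct is dropped is a genuine counterexample, since it is a concrete pair of values.

```python
from itertools import product

VALUES = [6, 6.5, 7, 7.5, 8]  # sample points below, at, and above the constant 7

DISJUNCTS = {
    "X>7": lambda x, y: x > 7,
    "Y>7": lambda x, y: y > 7,
    "X<7": lambda x, y: x < 7,
    "Y<7": lambda x, y: y < 7,
}

def holds(names):
    """Check X != Y  =>  (disjunction of the named SIs) on every sample pair."""
    return all(any(DISJUNCTS[n](x, y) for n in names)
               for x, y in product(VALUES, repeat=2) if x != y)

# The full implication holds on the grid ...
full = holds(list(DISJUNCTS))
# ... but dropping any single disjunct breaks it, e.g. X = 8, Y = 7 when X>7 is dropped.
needed = {n: not holds([m for m in DISJUNCTS if m != n]) for n in DISJUNCTS}
```

Here "cannot be dropped" is used only in the sense of the rhs disjuncts; the paper's notion of minimal form is the one referred to in Proposition 13.2.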
13.1. Preliminaries on Datalog Queries
We include definitions and known results about Datalog. ADatalog query
(a.k.a. Datalog program) is a finite set of Datalog rules, where aruleis a CQ
whose predicates in the body could either refer to a base relation (extensional
database predicates,EDBs), or to a head of a rule in the query (intensional
database predicates, IDBs). Furthermore, there is a designated predicate, which
is calledquery predicate, and returns the result of the query. The atom whose
predicate is an EDB (resp. IDB) is calledbase atom(resp.derived atom). A
Datalog query is calledmonadicif all the IDBs are unary.
The evaluation of a Datalog query on a database instance is performed by
applying (we often say firing) the rules on the database until no more facts (i.e.,
ground head atoms) are added to the set of the derived atoms. The answer of a
Datalog query on a database is the set of facts derived during the computation
for the query predicate. A $Datalog^{AC}$ query allows in each rule also arithmetic
comparisons (ACs) as subgoals, i.e., each rule is a CQAC. The evaluation process
remains the same, only now, the AC subgoals should be satisfied too. We
say that weunfold a ruleif we replace an IDB subgoal with the body of another
rule that has this IDB predicate in its head, and we do that iteratively. Apartial
expansionof a Datalog query is a conjunctive query that results from unfolding
the rules one or more times; the partial expansion may contain IDB predicates.
ADatalog-expansion(or, simplyexpansion, if confusion does not arise) of a
Datalog query is a partial expansion that contains only EDB predicates. Con-
sidering all the (infinitely many) expansions of a Datalog query we can prove
that a Datalog query is equivalent to an infinite union of conjunctive queries.
An expansion of a $Datalog^{AC}$ query is defined in the same way as an expansion
of a Datalog query, only now we carry the ACs in the body of each expansion
we produce. Thus, analogously, a $Datalog^{AC}$ query is equivalent to an
infinite union of CQACs.
Aderivation treedepicts a computation of a Datalog query. Considering a
factein the answer of the Datalog query, we construct a derivation tree for this
fact as follows. Each node in this tree, which is rooted ate, is a ground fact.
For each non-leaf nodenin this tree, there is a rule in the query which has been
applied to compute the atom nodenusing its children facts. The leaves are
facts of the base relations. Such a tree is calledderivation treeof the facte.
During the computation, we use aninstantiated rule, which is a rule where
all the variables have been replaced by constants. We say that a rule isfiredif
there is an instantiation of this rule where all the atoms in the body of the rule
are in the currently computed database.
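The evaluation procedure described in this subsection (fire instantiated rules until no new facts are derived) can be sketched as a naive bottom-up fixpoint. The Python fragment below is a minimal illustration for plain Datalog without ACs, not an efficient evaluator; the representation of atoms as (predicate, arguments) pairs, the uppercase-variable convention, and all function names are assumptions of the sketch.

```python
def is_var(term):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, env):
    """Extend the substitution env so that atom becomes fact, or return None."""
    if atom[0] != fact[0] or len(atom[1]) != len(fact[1]):
        return None
    env = dict(env)
    for a, f in zip(atom[1], fact[1]):
        if is_var(a):
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env

def fire(rule, facts):
    """All head facts produced by instantiations of rule whose body is in facts."""
    head, body = rule
    envs = [{}]
    for atom in body:
        envs = [e2 for e in envs for f in facts
                if (e2 := match(atom, f, e)) is not None]
    return {(head[0], tuple(e.get(t, t) for t in head[1])) for e in envs}

def evaluate(rules, edb):
    """Apply the rules until no more facts are added."""
    facts = set(edb)
    while True:
        new = set().union(*(fire(r, facts) for r in rules)) - facts
        if not new:
            return facts
        facts |= new

# path is the transitive closure of edge.
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Y")), [("edge", ("X", "Z")), ("path", ("Z", "Y"))]),
]
edb = {("edge", ("a", "b")), ("edge", ("b", "c"))}
result = evaluate(rules, edb)
paths = {f for f in result if f[0] == "path"}
```

The answer of the query is the set of derived facts for the query predicate, here the three `path` facts. Supporting $Datalog^{AC}$ would additionally require checking the AC subgoals of each instantiated rule, which this sketch omits.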
14. A new Containment Test for RSI1 containing query and CQAC
contained query
Without loss of generality, we restrict attention to Boolean queries. First,
we describe the construction of a Datalog query from a given RSI1 queryQ 1.
We introduce the predicates that we use:
Predicate Definition List:
1. One EDB predicate for each relation in $Q_1$, of the same arity and with the
same name and same attributes.
2. One binary EDB predicate, called $U$.
3. Four unary IDB predicates for each constant $c$ that appears in an AC in
$Q_1$; we call them $I_{\theta c}$ and $J_{\theta c}$, where $\theta$ is either $\le$ or $\ge$.
4. A Boolean IDB predicate called $Q_1^{Datalog}$, which is the query predicate.
• We refer to the predicates $I_{\theta c}$ and $J_{\theta c}$ as the associated predicates of the SI
AC $X \theta c$. We refer to the atom $I_{\theta c}(X)$ (atom $J_{\theta c}(X)$, respectively) as the
associated atom of the SI AC $X \theta c$ and vice versa.
The recursive rules of the Datalog query depend only on the containing
query, $Q_1$. The non-recursive rules take into account a finite set of relevant
semi-interval (SI) ACs, and we also call them dependent rules or link rules. The
relevant SI ACs are typically ACs that are in the closure of the ACs in the contained
query, but not necessarily8. Moreover, for convenience of reference, we divide
the basic rules into three kinds: mapping rules, coupling rules, and a single
query rule, which is a slightly modified mapping rule that is fired only once.
We describe the construction of the recursive rules of the Datalog query:
Construction of the recursive rules:
1. The query rule copies into its body all the relational subgoals of $Q_1$, and
replaces each AC subgoal of $Q_1$ that compares a variable to a constant by
its associated $I$-atom (with the variable being the same as the variable in the
AC). The head is the query predicate.
2. We construct one mapping rule for each SI arithmetic comparison $e$ in $Q_1$.
The body of each mapping rule is a copy of the body of the query rule,
except that the $I$-atom associated with $e$ is deleted. The head is the $J$-atom
associated with $e$.
3. For every pair of constants $c_1 \le c_2$ used in $Q_1$, we construct three coupling
rules.
Two coupling rules of the first kind:
$I_{\le c_2}(X) :- J_{\ge c_1}(X)$
$I_{\ge c_1}(X) :- J_{\le c_2}(X)$
One coupling rule of the second kind:
$I_{\le c_2}(X) :- J_{\ge c_1}(Y), U(X, Y).$
Construction of the dependent or link rules
• For each relevant SI AC, $X \theta c$, we introduce a unary base predicate, $U_{\theta c}$,
and we add the following link rule:
$I_{\theta c_1}(X) :- U_{\theta c}(X).$
where $c_1$ is a constant in $Q_1$ for which $X \theta c \Rightarrow X \theta c_1$.
8 The fact that there are other options will become clear later, when we discuss
constructing rewritings.
Example 14.1. The following query $Q_1$ is an RSI1 query:
$Q_1() :- e(X, Y), e(Y, Z), X \ge 5, Z \le 8.$
For the query $Q_1$, the construction we described yields the following recursive
rules of the Datalog query $Q_1^{Datalog}$:
$Q_1^{Datalog}() :- e(X, Y), e(Y, Z), I_{\ge 5}(X), I_{\le 8}(Z).$ (query rule)
$J_{\le 8}(Z) :- e(X, Y), e(Y, Z), I_{\ge 5}(X).$ (mapping rule)
$J_{\ge 5}(X) :- e(X, Y), e(Y, Z), I_{\le 8}(Z).$ (mapping rule)
$I_{\le 8}(X) :- J_{\ge 5}(X).$ (coupling rule 1)
$I_{\ge 5}(X) :- J_{\le 8}(X).$ (coupling rule 1)
$I_{\le 8}(X) :- J_{\ge 5}(Y), U(X, Y).$ (coupling rule 2)
$I_{\ge 5}(X) :- J_{\le 8}(Y), U(Y, X).$ (coupling rule 2)
Intuitively, a coupling rule denotes that a formula $AC_1 \lor AC_2$ (for two SI comparisons
$AC_1 = X \theta_1 c_1$ and $AC_2 = Y \theta_2 c_2$) is either true (coupling rule 1) or is implied by
$X \le Y$ (which is encoded by the predicate $U(X, Y)$) (coupling rule 2). Thus, the first
coupling rule in the above query says that $X \le 8 \lor X \ge 5$ is true, and the second
coupling rule says the same but referring to different $I$- and $J$-atoms. Moreover, the last
coupling rule says that $X \le Y \Rightarrow X \le 8 \lor Y \ge 5$.
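Steps 1 and 2 of the construction (the query rule and the mapping rules) are purely syntactic and can be sketched in a few lines. The Python fragment below is an illustration only: it deliberately omits the coupling and link rules, encodes the predicates $I_{\theta c}$ and $J_{\theta c}$ as strings such as `I>=5`, and uses the hypothetical names `recursive_core` and `Q1_Datalog`.

```python
def recursive_core(atoms, si_acs, query_pred="Q1_Datalog"):
    """atoms: relational subgoals (predicate, args); si_acs: list of (var, op, const).
    Returns the query rule and the mapping rules as (head, body) pairs."""
    def i_atom(ac):
        var, op, const = ac
        return (f"I{op}{const}", (var,))

    def j_atom(ac):
        var, op, const = ac
        return (f"J{op}{const}", (var,))

    # Query rule: relational subgoals plus one I-atom per SI comparison.
    query_rule = ((query_pred, ()), list(atoms) + [i_atom(ac) for ac in si_acs])
    # Mapping rule for ac: same body without the I-atom of ac; the head is its J-atom.
    mapping_rules = [
        (j_atom(ac), list(atoms) + [i_atom(o) for o in si_acs if o != ac])
        for ac in si_acs
    ]
    return query_rule, mapping_rules

# Q1() :- e(X, Y), e(Y, Z), X >= 5, Z <= 8   (Example 14.1)
atoms = [("e", ("X", "Y")), ("e", ("Y", "Z"))]
si_acs = [("X", ">=", 5), ("Z", "<=", 8)]
query_rule, mapping_rules = recursive_core(atoms, si_acs)
```

On the query of Example 14.1 this reproduces the query rule and the two mapping rules listed there, up to the string encoding of the predicate names.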
Construction of CQ for Contained Query
We now describe the construction of the contained query turned into a CQ.
We introduce new unary EDBs, specifically two of them, by the names $U_{\ge c}$
and $U_{\le c}$, for each constant $c$ in $Q_2$. Let us now construct the CQ $Q_2^{CQ}$ from
$Q_2$. We initially copy the regular subgoals of $Q_2$, and for each SI $X_i \theta c_i$ in the
closure of $\beta_2$ we add a unary predicate subgoal $U_{\theta c_i}(X_i)$. Then, for each AC
$X \le Y$ in the closure of the ACs in $Q_2$, we add the binary subgoal $U(X, Y)$ to the
body of the rule.
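The transformation of the contained query is again syntactic once the closure of its ACs is available. The following Python sketch assumes that the closure has already been computed elsewhere and is passed in as two lists; the function name `to_cq` and the string encoding of the predicates $U_{\theta c}$ (for example `U>=6`) are hypothetical.

```python
def to_cq(atoms, si_acs, le_acs):
    """atoms: relational subgoals (predicate, args).
    si_acs: (var, op, const) with op in {'<=', '>='}, assumed closed.
    le_acs: pairs (X, Y) standing for X <= Y, assumed closed."""
    body = list(atoms)                                       # copy the regular subgoals
    body += [(f"U{op}{c}", (x,)) for x, op, c in si_acs]     # unary U atoms for SIs
    body += [("U", (x, y)) for x, y in le_acs]               # binary U atoms for X <= Y
    return body

# Q2() :- e(A, B), e(B, C), e(C, D), e(D, E), A >= 6, E <= 7
atoms = [("e", ("A", "B")), ("e", ("B", "C")), ("e", ("C", "D")), ("e", ("D", "E"))]
q2_cq = to_cq(atoms, [("A", ">=", 6), ("E", "<=", 7)], [])
```

The resulting body contains the four $e$ subgoals followed by the atoms encoding $U_{\ge 6}(A)$ and $U_{\le 7}(E)$.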
Example 14.2. Considering the CQAC $Q_2$ with the following definition:
$Q_2() :- e(A, B), e(B, C), e(C, D), e(D, E), A \ge 6, E \le 7.$
we construct the CQ $Q_2^{CQ}$ whose definition is:
$Q_2^{CQ}() :- e(A, B), e(B, C), e(C, D), e(D, E), U_{\ge 6}(A), U_{\le 7}(E).$
Thus the dependent rules for our running example, query $Q_1$, and the above
contained query $Q_2$ are:
$I_{\ge 5}(X) :- U_{\ge 6}(X).$ (link rule)
$I_{\le 8}(X) :- U_{\le 7}(X).$ (link rule)
Now, we have completed the description of the construction of both $Q_1^{Datalog}$
from $Q_1$ and $Q_2^{CQ}$ from $Q_2$. We go back to our examples and put everything together
in the following example:
Example 14.3. Our contained query is the one in Example 14.2. Our containing
query is the one in Example 14.1. The transformation of the containing
query is shown in Example 14.1. The transformation of the contained query is
shown in Example 14.2. To complete the Datalog query, we add the following
link rules:
$I_{\ge 5}(X) :- U_{\ge 6}(X).$ (link rule)
$I_{\le 8}(X) :- U_{\le 7}(X).$ (link rule)
One rule links the constant 6 from the ACs of $Q_2$ to the constant 5 from the
ACs of $Q_1$. The other link rule links the constants 8 and 7 from queries $Q_1$ and
$Q_2$, respectively.
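The running example can be replayed mechanically by evaluating the Datalog program on the canonical database of the transformed contained query. The Python sketch below uses a naive fixpoint evaluator; the evaluator, the string encoding of predicates (such as `I>=5`), the lowercase canonical constants, and the helper `A` are all assumptions of this illustration rather than part of the paper's construction.

```python
def is_var(term):
    # Assumption: variables are uppercase strings, canonical constants are lowercase.
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, env):
    if atom[0] != fact[0] or len(atom[1]) != len(fact[1]):
        return None
    env = dict(env)
    for a, f in zip(atom[1], fact[1]):
        if is_var(a):
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env

def evaluate(rules, edb):
    """Naive bottom-up evaluation: fire all rules until no new fact appears."""
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in facts
                        if (e2 := match(atom, f, e)) is not None]
            new |= {(head[0], tuple(e.get(t, t) for t in head[1])) for e in envs}
        if new <= facts:
            return facts
        facts |= new

def A(pred, *args):
    return (pred, args)

E1, E2 = A("e", "X", "Y"), A("e", "Y", "Z")
core = [
    (A("Q"), [E1, E2, A("I>=5", "X"), A("I<=8", "Z")]),    # query rule
    (A("I>=5", "X"), [A("U>=6", "X")]),                    # link rule
    (A("I<=8", "X"), [A("U<=7", "X")]),                    # link rule
]
recursive = [
    (A("J<=8", "Z"), [E1, E2, A("I>=5", "X")]),            # mapping rule
    (A("J>=5", "X"), [E1, E2, A("I<=8", "Z")]),            # mapping rule
    (A("I<=8", "X"), [A("J>=5", "X")]),                    # coupling rule 1
    (A("I>=5", "X"), [A("J<=8", "X")]),                    # coupling rule 1
    (A("I<=8", "X"), [A("J>=5", "Y"), A("U", "X", "Y")]),  # coupling rule 2
    (A("I>=5", "X"), [A("J<=8", "Y"), A("U", "Y", "X")]),  # coupling rule 2
]
# Canonical database of Q2^CQ: one constant per variable of Q2.
canonical_db = {A("e", "a", "b"), A("e", "b", "c"), A("e", "c", "d"),
                A("e", "d", "e"), A("U>=6", "a"), A("U<=7", "e")}

contained = A("Q") in evaluate(core + recursive, canonical_db)
without_recursion = A("Q") in evaluate(core, canonical_db)
```

With all rules present the query predicate is derived, in line with the fact that $Q_2 \sqsubseteq Q_1$ here: in any database satisfying $Q_2$, either $C \le 8$ and the first two edges witness $Q_1$, or $C > 8 \ge 5$ and the last two edges do. With only the query rule and the link rules the query predicate is not derived, which illustrates why the mapping and coupling rules are needed.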
The constructions of the Datalog query and the CQ presented in this section
are important because of the following theorem.
Theorem 14.4. Consider two conjunctive queries with arithmetic comparisons,
$Q_1$ and $Q_2$, such that $Q_1$ is an RSI1 query with closed ACs. Let $Q_1^{Datalog}$ be the
transformed Datalog query of $Q_1$. Let $Q_2^{CQ}$ be the transformed CQ query of $Q_2$.
Then, $Q_2$ is contained in $Q_1$ if and only if $Q_2^{CQ}$ is contained in $Q_1^{Datalog}$.
The proof of Theorem 14.4 is in Appendix B. In the next section, we begin
to discuss some of the technicalities involved, as an introduction to the proof.
Appendix A offers intuition on how the transformations work.
15. Preliminary Results and Intuition on the Proof of Theorem 14.4
We observe that the Datalog program we construct from an RSI1 query has
only unary IDB predicates, and we conveniently refer to them, using the symbols
we used to name them, as $I$ predicates and $J$ predicates. $J$ predicates appear in
the head of mapping rules and in the body of coupling rules, and $I$ predicates
appear in the body of mapping rules and in the head of coupling rules. We
conveniently say that they produce $I$ facts and $J$ facts.
In the proof of Theorem 14.4, we will apply the Datalog query $Q_1^{Datalog}$ on
the canonical database of the CQ query $Q_2^{CQ}$ constructed from the contained
query $Q_2$. This canonical database uses constants (different from the constants
in the ACs) that correspond one-to-one to the variables of the query $Q_2$. We make
the following observations about the result of firing a recursive rule (i.e., either
a coupling rule or a mapping rule). (All the $\theta_i$'s represent either $\le$ or $\ge$ and the
$c_i$'s are constants from the ACs of the queries.)
• Firing coupling rules. We have two kinds of coupling rules. Consider a
coupling rule of the first kind, which is of the form:
$I_{\theta_1 c_1}(X) :- J_{\theta_2 c_2}(X).$
When this rule is fired, its variable $X$ is instantiated to a constant, $y$, in the
canonical database, $D$, of $Q_{20}$. The constant $y$ corresponds to the variable
$Y$ of $Q_2$ by convention. Then the following is true by construction:
$Y \theta_1 c_1 \lor Y \theta_2 c_2$, and, hence, the following is true:
$\beta_2 \Rightarrow Y \theta_1 c_1 \lor Y \theta_2 c_2$. Now consider
the second kind of coupling rule, which is of the form:
$I_{\theta_1 c_1}(X) :- J_{\theta_2 c_2}(Y), U(X, Y).$
By construction of the rule, the EDB $U(X, Y)$ is mapped in $D$ to two
constants/variables, $W, Z$, such that there is in $Q_2$ an AC $W \le Z$. Thus,
by construction of the rule, the following is true again:
$\beta_2 \Rightarrow W \theta_1 c_1 \lor Z \theta_2 c_2$.
• Firing both mapping and coupling rules.
Example 15.1. For a first example, suppose only the query rule and link
rules are needed to prove containment of $Q_2^{CQ}$ in $Q_1^{Datalog}$. Suppose we fire
a link rule to compute the fact $I_{\theta c}(x)$. This yields that $\beta_2 \Rightarrow X \theta c$ (by
construction of the rule). Suppose that, after applying some link rules, we are
able to fire the query rule using a mapping $\mu_1$. I.e., for each $\mu_1(e_i^{\beta_1})$ (where
$e_i^{\beta_1}$ is such that $\beta_1 = e_1^{\beta_1} \land e_2^{\beta_1} \land \ldots$), we have
$\beta_2 \Rightarrow \mu_1(e_i^{\beta_1})$. Consequently,
the following is true: $\beta_2 \Rightarrow \mu_1(\beta_1)$. This is a containment entailment, which
ensures that $Q_2 \sqsubseteq Q_1$.
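For a second example, suppose we use link rules and twice a mapping rule,
using mappings $\mu_1$ and $\mu_2$. Suppose we use mapping $\mu_1$ to compute a $J$ fact,
then we fire a coupling rule in order to compute an $I$ fact. And then we use
this $I$ fact to fire a mapping rule with mapping $\mu_2$ to compute another $J$ fact
associated with AC $\mu_1(e_i^{\beta_1})$.
Then, we have: $\beta_2 \Rightarrow \mu_1(\beta_1) \lor \neg\mu_1(e_i^{\beta_1})$, or, equivalently,
$\beta_2 \land \mu_1(e_i^{\beta_1}) \Rightarrow \mu_1(\beta_1)$, or, equivalently:
$$\beta_2 \land \neg\mu_1(\beta_1) \Rightarrow \neg\mu_1(e_i^{\beta_1}) \qquad (A.1)$$
Then we apply a mapping rule once more, using mapping $\mu_2$, computing a
$J$ fact associated with AC $\mu_2(e_j^{\beta_1})$.
$\mu_2(e_j^{\beta_1})$ and $\mu_1(e_i^{\beta_1})$ appear in the firing of a coupling rule. Hence
$\beta_2 \Rightarrow \mu_2(e_j^{\beta_1}) \lor \mu_1(e_i^{\beta_1})$, or
$\beta_2 \land \neg\mu_1(e_i^{\beta_1}) \Rightarrow \mu_2(e_j^{\beta_1})$.
Taking into account A.1:
$$\beta_2 \land \neg\mu_1(\beta_1) \Rightarrow \mu_2(e_j^{\beta_1}) \qquad (A.2)$$
We will prove below that
$$\beta_2 \land \mu_2(e_k^{\beta_1}) \Rightarrow \mu_2(\beta_1) \lor \neg\mu_2(e_j^{\beta_1}) \qquad (A.3)$$
In view of A.3, A.2 is equivalent to
$$\beta_2 \land \mu_2(e_k^{\beta_1}) \land \neg\mu_1(\beta_1) \Rightarrow \mu_2(\beta_1) \qquad (A.4)$$
which yields the following result that we will use in the induction in the
formal proof of Theorem 14.4:
$$\beta_2 \land \mu_2(e_k^{\beta_1}) \Rightarrow \mu_2(\beta_1) \lor \mu_1(\beta_1)$$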
To prove implication A.3, we argue as follows: When we fire a mapping rule with
mapping $\mu_2$, we use the $I$ fact computed by the firing of a mapping rule with
mapping $\mu_1$ followed by a firing of a coupling rule. All the other facts that
participate in the firing of the mapping rule via mapping $\mu_2$ are computed
using link rules.
Hence we have
$$\beta_2 \Rightarrow \mu_2(\beta_1) \lor \neg\mu_2(e_j^{\beta_1}) \lor \neg\mu_2(e_k^{\beta_1}) \qquad (5)$$
from which we obtain implication A.3. Implication (5) is based on the fact that if
$p \Rightarrow q_1$ is a true logical implication, then the following logical implication is
also true: $p \Rightarrow (q_1 \land q_2 \land q_3) \lor \neg q_2 \lor \neg q_3$.
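The propositional fact used for implication (5) can be confirmed with a truth table. The short Python check below enumerates all assignments to $p, q_1, q_2, q_3$; the helper name `implies` is an assumption of the sketch.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Whenever p => q1 holds, p => (q1 and q2 and q3) or (not q2) or (not q3) also holds.
fact_holds = all(
    implies(implies(p, q1),
            implies(p, (q1 and q2 and q3) or (not q2) or (not q3)))
    for p, q1, q2, q3 in product([False, True], repeat=4)
)
```

The reason is that, once $q_1$ is true, the rhs reduces to $(q_2 \land q_3) \lor \neg q_2 \lor \neg q_3$, which is a tautology.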
16. Finding MCR for CQAC-RSI1 Query and CQAC Views
In this section, we show that for an RSI1 query and CQAC views, we can find
an MCR in the language of (possibly infinite) unions of CQACs. We will show
that this MCR is expressed in Datalog with ACs. I.e., we prove:
Theorem 16.1. Given a query $Q$ which is CQAC-RSI1 (RSI1 for short) with
closed ACs and CQAC views $\mathcal{V}$, we can find an MCR of $Q$ using $\mathcal{V}$ in the
language of Datalog with ACs.
16.1. Building MCRs for RSI1 queries
In this subsection, we present the algorithm for building an MCR in the
language of (possibly infinite) union of CQACs for the case of CQAC views and
queries that are RSI1, which is the following (see also Figure 1):
Algorithm MCR-RSI1:
Stage I. Building CQ views and Datalog query
Figure 1: Finding an MCR
1. For each view $v_i$ in $\mathcal{V}$, we construct a new view $V_i^{CQ}$ in $\mathcal{V}^{CQ}$ using the
transformation described in Section 14.
2. In the view set $\mathcal{V}^{CQ}$ we also add auxiliary views. We introduce new EDB
predicates called AC-predicates (which will encode ACs $var \le var$ and
$var \,\theta\, const$) as follows: a binary predicate $U$ and a set of unary predicates
$U_{\le c}$ and $U_{\ge c}$, two for each constant $c$ that appears in the query and
the views. We construct the auxiliary views as follows: a) Views with head
$u_{\theta c}$, one for each unary predicate $U_{\theta c}$. The definition is $u_{\theta c}(X) :- U_{\theta c}(X)$.
b) A single view $u$, whose definition is $u(X, Y) :- U(X, Y)$.
3. For the query $Q$, we construct the Datalog query $Q^{Datalog}$ using the
construction in Section 14. The link rules will use the constants present in the
views and in the query. The only difference is that we use the IDB predicate
$U^{tr}$ instead of the EDB predicate $U$. $U^{tr}$ encodes the transitive closure of $\le$.
4. We finish the construction of the Datalog query by adding rules for the
transitive closure of the binary EDB predicate $U$, i.e., it is computed
by the rules $U^{tr}(X, Y) :- U(X, Y)$ and $U^{tr}(X, Y) :- U(X, Z), U^{tr}(Z, Y)$.
Moreover, for appropriate pairs of constants $c_1, c_2$ we add rules of the
following type: $J_{\theta_1 c_1}(X) :- U_{\theta c_2}(Y), U^{tr}(X, Y)$.
Thus, in the first stage, from the original set $\mathcal{V}$ of views and query $Q$, we
build a set $\mathcal{V}^{CQ}$ of views and query $Q^{Datalog}$. This is illustrated by the two
top boxes in Figure 1, where the arrows indicate the transformations.
Stage II. Building MCR
5. We find an MCR, $R_{MCR}^{CQ}$, for the Datalog query $Q^{Datalog}$ using the views in
$\mathcal{V}^{CQ}$. For building $R_{MCR}^{CQ}$ we use the inverse rule algorithm [22].
6. We obtain the rewriting $R_{AC}$ as follows: in the MCR $R_{MCR}^{CQ}$ found, we replace
each $v_i^{CQ}$ by $v_i$, each $u_{\theta c}(X)$ by the arithmetic comparison $X \theta c$, and each
$u(X, Y)$ by the arithmetic comparison $X \le Y$. This is what we call the reverse
transformation in Figure 1.
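The reverse transformation of step 6 amounts to a syntactic replacement over rule bodies. The Python sketch below illustrates it under naming conventions that are assumptions of the sketch, not of the paper: auxiliary view atoms are encoded as `u<=c` or `u>=c` with an integer constant, the binary auxiliary view is `u`, and a transformed view $v_i^{CQ}$ carries the suffix `_cq`.

```python
import re

def reverse_transform(body):
    """body: list of (predicate, args). Returns (relational_atoms, acs),
    where ACs are (lhs, op, rhs) triples."""
    atoms, acs = [], []
    for pred, args in body:
        m = re.fullmatch(r"u(<=|>=)(-?\d+)", pred)
        if m:                          # u_{theta c}(X)  becomes  X theta c
            acs.append((args[0], m.group(1), int(m.group(2))))
        elif pred == "u":              # u(X, Y)  becomes  X <= Y
            acs.append((args[0], "<=", args[1]))
        elif pred.endswith("_cq"):     # v_i^CQ  becomes  v_i
            atoms.append((pred[:-3], args))
        else:                          # IDB atoms of the rewriting stay unchanged
            atoms.append((pred, args))
    return atoms, acs

body = [("v1_cq", ("X", "W")), ("u", ("W", "Z")), ("u<=8", ("Z",)), ("T", ("W", "Z"))]
atoms, acs = reverse_transform(body)
```

Applied to every rule of the rewriting found in step 5, this yields rules whose bodies mix view atoms, IDB atoms, and arithmetic comparisons, i.e., rules of a Datalog program with ACs.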
It remains to prove that the rewriting $R_{AC}$ is an MCR of $Q$ using $\mathcal{V}$. In the
proof we will use the rewritings $E, E'$ and $P, P'$, which appear at the far right
end of Figure 1 to guide the reader during the proof in Subsection 16.2.
The following is an example of the transformation and the MCR that is
derived.
Example 16.2. Consider the query $Q_1$ and the views:
$Q_1() :- e(X, Z), e(Z, Y), X \ge 5, Y \le 8.$
$V_1(X, Y) :- e(X, Z), e(Z, Y), Z \ge 5.$
$V_2(X, Y) :- e(X, Z), e(Z, Y), Z \le 8.$
$V_3(X, Y) :- e(X, Z_1), e(Z_1, Z_2), e(Z_2, Z_3), e(Z_3, Y).$
We have already built the Datalog program $Q_1^{Datalog}$ in Example 14.1. We
need to add to the Datalog program the link rules, which use the constants 5 and
8 (these are the only constants that appear in the definitions).
The views that will be used to apply the inverse-rule algorithm are:
$V'_1(X, Y) :- e(X, Z), e(Z, Y), U_{\ge 5}(Z).$
$V'_2(X, Y) :- e(X, Z), e(Z, Y), U_{\le 8}(Z).$
$V'_3(X, Y) :- e(X, Z_1), e(Z_1, Z_2), e(Z_2, Z_3), e(Z_3, Y).$
and the auxiliary views: $u(X, Y) :- U(X, Y)$, $J_{\le 5}(X) :- U_{\le 8}(Y), U(X, Y)$,
$J_{\ge 8}(X) :- U_{\ge 5}(Y), U(X, Y)$.
The inverse rule algorithm produces the following MCR of $Q_1^{Datalog}$ using
the views $V'_1(X, Y)$, $V'_2(X, Y)$, and $V'_3(X, Y)$.
$R() :- v'_1(X, W), T_1(W, Z), v'_2(Z, Y).$
$T(W, W) :- .$
$T(W, Z) :- T(W, W'), v'_3(W', Z).$
$T_1(W, W') :- T(W, W').$
$T_1(W, Z) :- T_1(W, W'), U(W', W''), T_1(W'', Z).$
The following is an MCR of the input query $Q_1$ (rather than of $Q_1^{Datalog}$)
using the views $V_1(X, Y)$, $V_2(X, Y)$ and $V_3(X, Y)$:
$R() :- V_1(X, W), T_1(W, Z), V_2(Z, Y).$
$T(W, W) :- .$
$T(W, Z) :- T(W, W'), V_3(W', Z).$
$T_1(W, W') :- T(W, W').$
$T_1(W, Z) :- T_1(W, W'), W' \le W'', T_1(W'', Z).$
16.2. Proof that the algorithm MCR-RSI1 is correct
First, we note that the expressive power of $Q^{Datalog}$ remains the
same after adding the transitive closure rules, since they encode the transitive
property of $\le$.
Now, we are at the far end of Figure 1. First we consider a Datalog expansion
$E'$ of $R_{AC}$, which corresponds to a Datalog expansion, $E$, of $R^{CQ}_{MCR}$ (by
construction). We consider the view-expansions of $E$ and $E'$, $E^{exp}$ and $E'^{exp}$
respectively. $E^{exp}$ is the CQ transformation of $E'^{exp}$ (by construction of $R_{AC}$).
$E$ is a contained rewriting of $Q^{Datalog}$, hence $E^{exp}$ is contained in $Q^{Datalog}$.
According to the containment test based on transformations (Datalog transformation
and CQ transformation), $E'^{exp}$ is contained in the query $Q$, hence $E'$ is
a contained rewriting of $Q$. This proves that the rewriting $R_{AC}$ is a contained rewriting
of the query $Q$. It remains to prove that it is maximally contained.
Consider any contained CQAC rewriting, $P'$, of $Q$ using the views $V$. Then,
$P'^{exp} \sqsubseteq Q$, and, according to the containment test via the transformations,
$P'^{exp-CQ} \sqsubseteq Q^{Datalog}$, where $P'^{exp-CQ}$ is the CQ transformation of $P'^{exp}$.
We construct $P$ to be a query in terms of $V'$ which uses the relational body
of $P'$ and all the AC-EDB predicates of $P'^{exp-CQ}$ that use only variables in $P'$.
The view-expansion of $P$, $P^{exp}$, is equal to $P'^{exp-CQ}$ by construction; hence,
$P^{exp} \sqsubseteq Q^{Datalog}$.
Consequently, $P$ is a contained rewriting of $Q^{Datalog}$ in terms of $V'$; hence,
it is contained in the MCR, i.e., $P \sqsubseteq R^{CQ}_{MCR}$.
Consequently, there is a Datalog expansion, $E$, of $R^{CQ}_{MCR}$ from which there
is a containment mapping on $P$. If we apply the reverse transformation on $E$ (i.e.,
replace the AC-EDB subgoals with ACs) then we get a Datalog expansion, $E'$,
of the MCR $R_{AC}$. Notice that $P$ and $P'$ are "isomorphic" if we replace the
AC-EDB predicates with ACs and vice versa ($P$ is a rectified rewriting, so all the ACs
in the view-expansion involving variables of $P$ appear in $P$). Hence, there is
a containment mapping from $E'$ to $P'$. Hence $P' \sqsubseteq E'$ and, consequently,
$P' \sqsubseteq R_{AC}$. The table below shows pairs (vertically), each pair being a CQAC
or a Datalog$^{AC}$ query and the corresponding CQ transformation. Notice that we
assume that $E'$ and $P'$ contain all ACs in the closure of ACs in the rewriting,
while $E$ and $P$ contain only the AC-EDB predicates that are present. However, the Datalog
program discovers all of them by using the transitive closure rules on the
AC-EDB binary predicate $U$ that encodes $\le$.
$P'$    $P'^{exp}$                    $E'$    $R_{AC}$ (with ACs)
$P$     $P^{exp} = P'^{exp-CQ}$       $E$     $R^{CQ}_{MCR}$ (with AC-EDB predicates)
Thus, we have proven the theorem:
Theorem 16.3. Given a query $Q$ which is RSI1 and views $V$ which are CQACs, the
following is true: Let $R$ be a CQAC contained rewriting of $Q$ in terms of $V$. Then $R$
is contained in the Datalog$^{AC}$ program $R_{MCR}$ found by the algorithm in Subsection 16.1.
Theorems 13.1 and 16.1 can be extended, in a similar manner as in the case of
deciding the complexity of query containment, to the following:
Theorem 16.4. Given CQAC views $V$ and a query $Q$ which is one of the following:
(i) It uses closed LSIs and one open RSI and the constant in the RSI is not shared
with a closed RSI in any view definition.
(ii) It uses open LSIs and one closed RSI and the LSIs use all distinct constants
and each such constant is not shared with a closed LSI in any view definition.
(iii) It uses open LSIs and one open RSI and all the SIs use distinct constants and
each such constant is not shared with a closed SI in any view definition.
Then, the following are true:
1. We can find an MCR of $Q$ using $V$ in the language of Datalog with ACs.
2. We can find all certain answers of $Q$ using $V$ on a given view instance $I$ in
time polynomial in the size of $I$.
In Appendix C, we extend the results on finding MCRs to include single mapping
variables as per Section 9.
16.3. Discussion on the use of transitive closure rules
The following example shows that transitive closure is necessary.
Example 16.5. Consider the query $Q$ and the views:
$Q() :- e(X, Z), e(Z, Y), X \ge 50, Y \le 80, Z \le 30.$
$V_1(X, Y) :- e(X, Z), e(Z, Y), Z \ge 50.$
$V_2(X, Y) :- e(X, Z), e(Z, Y), Z \le 80, X \le 30.$
$V_3(X, Y) :- e(X, Z_1), e(Z_1, Z_2), e(Z_2, Z_3), e(Z_3, Y), X \le Z_2, Z_2 \le Y.$
The following is a contained rewriting of $Q$ using the above given views, $V$:
$R() :- V_1(X, Y), V_3(Y, Z_1), V_3(Z_1, Z_2), V_3(Z_2, Z_3), V_3(Z_3, Z_4), V_2(Z_4, W),$
$Y \le Z_1 \le Z_2 \le Z_3 \le Z_4 \le W$
In order to argue that it is a contained rewriting, imagine a path of facts for
the relation $e$ in database $D$, using constants $c_1, c_2, \ldots$ in that order, i.e., the facts
are $e(c_1, c_2), e(c_2, c_3), \ldots$. If $v_1(c_1, c_3)$ is true because we apply the view definition on
$e(c_1, c_2)$ and $e(c_2, c_3)$, this means that $c_1 \ge 50$. Now, if also $c_3 \le 80$, we have found
two tuples that satisfy all the ACs of the query except one. We will come back shortly to
the one that is not satisfied. If $c_3 > 80$, this means that $c_3 \ge 50$; thus, we
can continue arguing similarly in an inductive manner along the path. At the end of
the path, either we have found two tuples in $D$ that satisfy the first two ACs in the
definition of the query, or $c_{18} > 80$, which contradicts the definition of the
view $V_3$. Thus, we have proved that there are two tuples in $D$ that satisfy the subgoals
of $Q$ except the last AC subgoal. Now, wherever along the path the two tuples in $D$
are found, the middle constant can be shown to be $\le 30$ because of the
transitive property of $\le$.
When we turn the view-expansion of the above contained rewriting into a CQ by
the CQ transformation, we assume that we also transform every AC $var \le var$ into
the predicate $U(var, var)$. This would result in a CQ that is the view-expansion of the
following contained rewriting.
$R'() :- V'_1(X, Y), V'_3(Y, Z_1), V'_3(Z_1, Z_2), V'_3(Z_2, Z_3), V'_3(Z_3, Z_4), V'_2(Z_4, W),$
$U(Y, Z_1), U(Z_1, Z_2), U(Z_2, Z_3), U(Z_3, Z_4), U(Z_4, W)$
This is the reason that, when we build the MCR using the inverse rule algorithm, we need to add
the transitive closure rules for predicate $U$.
Finally, is this rewriting useful? I.e., is there a view set that will make the
subgoals of the rewriting true? Indeed, it is the following view set:
$V_1(D) = \{(2,21), (21,22)\}$
$V_2(D) = \{(29,30), (30,1)\}$
$V_3(D) = \{(22,24), (24,26), (26,28), (28,30)\}$
which is a subset of $V(D)$, where $D$ is the following database instance over the
binary EDB relation $e$:
(2,81),(81,21),(21,82),(82,22),(22,83),(83,23),(23,84),(84,24),
(24,85),(85,25),(25,86),(86,26),(26,41),(41,27),(27,42),(42,28),
(28,43),(43,29),(29,44),(44,30),(30,45),(45,1)
The query computes to True on $D$ because of the tuples $(86,26), (26,41)$.
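The claims about this instance can be sanity-checked mechanically. The following sketch evaluates $Q$ and the three view definitions of Example 16.5 directly on $D$ and confirms that the listed view tuples belong to $V(D)$ and that $Q$ is true on $D$. The helper `paths` and the variable names are assumptions introduced for the check; the sketch does not evaluate the rewriting $R$ itself.

```python
# Database instance D over the binary relation e, as listed in Example 16.5.
D = {(2, 81), (81, 21), (21, 82), (82, 22), (22, 83), (83, 23), (23, 84), (84, 24),
     (24, 85), (85, 25), (25, 86), (86, 26), (26, 41), (41, 27), (27, 42), (42, 28),
     (28, 43), (43, 29), (29, 44), (44, 30), (30, 45), (45, 1)}

def paths(edges, k):
    """All node sequences obtained by following k consecutive edges."""
    seqs = [[a, b] for (a, b) in edges]
    for _ in range(k - 1):
        seqs = [s + [b] for s in seqs for (a, b) in edges if a == s[-1]]
    return seqs

# Q() :- e(X,Z), e(Z,Y), X >= 50, Y <= 80, Z <= 30.
q_true = any(x >= 50 and y <= 80 and z <= 30 for x, z, y in paths(D, 2))
# V1(X,Y) :- e(X,Z), e(Z,Y), Z >= 50.
v1 = {(x, y) for x, z, y in paths(D, 2) if z >= 50}
# V2(X,Y) :- e(X,Z), e(Z,Y), Z <= 80, X <= 30.
v2 = {(x, y) for x, z, y in paths(D, 2) if z <= 80 and x <= 30}
# V3(X,Y) :- four consecutive e-edges with X <= Z2 and Z2 <= Y.
v3 = {(x, y) for x, z1, z2, z3, y in paths(D, 4) if x <= z2 <= y}
```

The view relations computed this way may contain more tuples than the ones listed in the example, which is consistent with the listed sets being subsets of $V(D)$.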
17. Conclusions
In this paper we have investigated the computational complexity of query contain-
ment for CQACs and of computing certain answers in the framework of answering
CQAC queries using CQAC views. We began by looking into cases where the containing
query uses only LSI ACs. When the containing query uses only closed LSIs, the
problem is in NP. When there are open LSIs in the containing query, this is not always the case.
Specifically, when the containing query uses only LSIs and certain constants do not
appear in the contained query, the problem is in NP (even more interestingly, via the
homomorphism property). However, if the containing query uses open LSIs and certain
constants are allowed to appear in both queries, the problem becomes $\Pi^p_2$-complete.
Thus, we have delineated a boundary between NP and $\Pi^p_2$-complete which surprisingly
puts "very similar" problems in different computational classes. Then, we investigated
cases where the containment problem is in NP when the containing query uses
both LSI and RSI ACs. Proving this requires a more complicated algorithm.
Open problems remain when the containing query uses both LSI and RSI ACs. The
tightest open problem is the complexity of query containment when the containing
query uses two closed LSI ACs and two closed RSI ACs. The technique used to prove
$\Pi^p_2$-hardness here does not work because the containing queries used in the proof use
a number of ACs that is proportional to the size of the formula from which we do the
reduction.
Towards investigating similar problems, we believe that, if the relational subgoals
of the containing query form an acyclic hypergraph and there are only several closed
LSIs and one closed RSI, then it is worth investigating whether testing containment
may be done in polynomial time. We already know that the CQ query containment
problem is polynomial when the containing query is acyclic.
The second part of the present paper considers finding MCRs and computing
certain answers. First, we present a result which says that, for a CQAC query and
CQAC views, an MCR in the language of union of CQACs computes exactly the set of certain
answers. Containment tests usually provide the basis of algorithms that find MCRs.
We use a containment test via transformations. We show that, when the query
has only LSIs and a single RSI, there is an MCR in the language of Datalog with
ACs; consequently, we can compute certain answers in polynomial time
for this case. As concerns broader classes of queries that contain any number of
LSI and RSI ACs, we believe it is unlikely that there is an MCR in the language
of union of CQACs; hence, the problem of computing certain answers is probably
harder than PTIME. For MCRs, when the homomorphism property holds,
various efficient techniques like the MiniCon algorithm [13] may find an MCR in the
language of union of CQACs. As for the corner cases where the query contains only LSI
ACs and containment is proven to be $\Pi^p_2$-complete, the problem of finding an MCR
in the language of union of CQACs (and the problem of computing certain answers
in polynomial time) is an interesting open problem. In a similar line of research
that concerns equivalent rewritings, when the query is an acyclic CQ, a recent result
[24] shows that there is an equivalent rewriting which is acyclic, if there is one at all.
It is probably worth investigating the problem with ACs, starting with simple cases,
e.g., when the query contains only SIs or even only LSIs.
In conclusion, for the problem of query containment with SI ACs, we built an
interesting picture which is depicted in Table 1, and for the problem of computing
certain answers of queries with SI ACs, we made progress in the direction of computing
certain answers in polynomial time.
References
[1] A. K. Chandra, P. M. Merlin, Optimal implementation of conjunctive queries in relational
data bases, STOC (1977) 77–90.
[2] A. Klug, On conjunctive queries containing inequalities, Journal of the ACM 35 (1)
(1988) 146–160.
[3] R. van der Meyden, The complexity of querying indefinite data about linearly ordered
domains, in: PODS, 1992.
[4] A. Gupta, Y. Sagiv, J. D. Ullman, J. Widom, Constraint checking with partial informa-
tion, in: PODS, 1994, pp. 45–55.
[5] X. Zhang, Z. M. Ozsoyoglu, Some results on the containment and minimization of (in)
equality queries, Inf. Process. Lett. (1994).
[6] F. N. Afrati, C. Li, P. Mitra, Rewriting queries using views in the presence of arithmetic
comparisons, Theor. Comput. Sci. 368 (1-2) (2006) 88–123.
[7] F. Afrati, C. Li, P. Mitra, On containment of conjunctive queries with arithmetic com-
parisons, in: EDBT, 2004.
[8] P. G. Kolaitis, D. L. Martin, M. N. Thakur, On the complexity of the containment
problem for conjunctive queries with built-in predicates, in: PODS, 1998, pp. 197–204.
[9] F. N. Afrati, The homomorphism property in query containment and data integration, in:
B. C. Desai, D. Anagnostopoulos, Y. Manolopoulos, M. Nikolaidou (Eds.), Proceedings of
the 23rd International Database Applications & Engineering Symposium, IDEAS 2019,
Athens, Greece, June 10-12, 2019, ACM, 2019, pp. 2:1–2:12.
[10] F. N. Afrati, N. Kiourtis, Computing certain answers in the presence of dependencies,
Inf. Syst. 35 (2) (2010) 149–169.
[11] F. N. Afrati, R. Chirkova, Answering Queries Using Views, Second Edition, Synthesis
Lectures on Data Management, Morgan & Claypool Publishers, 2019.
[12] M. Benedikt, F. R. Cooper, S. Germano, G. Gyorkei, E. Tsamoura, B. Moore, C. Ortiz,
PDQ 2.0: Flexible infrastructure for integrating reasoning and query planning, SIGMOD
Rec. 51 (4) (2022) 36–41.
[13] R. Pottinger, A. Levy, A scalable algorithm for answering queries using views, in: Proc.
of VLDB, 2000.
[14] X. Zhang, M. Ozsoyoglu, On efficient reasoning with implication constraints, in: DOOD,
1993, pp. 236–252.
[15] G. Karvounarakis, V. Tannen, Conjunctive queries and mappings with unequalities,
Technical Report MS-CIS-08-37, University of Pennsylvania (2008).
[16] A. Levy, A. O. Mendelzon, Y. Sagiv, D. Srivastava, Answering queries using views, in:
PODS, 1995, pp. 95–104.
[17] A. Levy, Answering queries using views: A survey, Technical report, Computer Science
Dept., Washington Univ. (2000).
[18] S. Abiteboul, O. M. Duschka, Complexity of answering queries using materialized views,
in: PODS, 1998, pp. 254–263.
[19] G. Grahne, A. O. Mendelzon, Tableau techniques for querying information sources
through global schemas, in: ICDT, 1999, pp. 332–347.
[20] A. Levy, A. Rajaraman, J. J. Ordille, Querying heterogeneous information sources using
source descriptions, in: Proc. of VLDB, 1996, pp. 251–262.
[21] P. Mitra, An algorithm for answering queries efficiently using views, in: Proceedings of
the Australasian Database Conference, 2001.
[22] O. M. Duschka, M. R. Genesereth, Answering recursive queries using views, in: PODS,
1997, pp. 109–116.
[23] F. N. Afrati, M. Gergatsoulis, T. G. Kavalieros, Answering queries using materialized
views with disjunctions, in: ICDT, 1999, pp. 435–452.
[24] G. Geck, J. Keppeler, T. Schwentick, C. Spinrath, Rewriting with acyclic queries: Mind
your head, Log. Methods Comput. Sci. 19 (4) (2023).
[25] Y. Cao, W. Fan, F. Geerts, P. Lu, Bounded query rewriting using views, ACM Trans.
Database Syst. 43 (1) (2018) 6:1–6:46.
[26] G. Cima, M. Console, M. Lenzerini, A. Poggi, A review of data abstraction, Frontiers
Artif. Intell. 6 (2023).
URL https://doi.org/10.3389/frai.2023.1085754
[27] A. Nash, L. Segoufin, V. Vianu, Determinacy and rewriting of conjunctive queries using
views: A progress report, in: T. Schwentick, D. Suciu (Eds.), Database Theory - ICDT
2007, 11th International Conference, Barcelona, Spain, January 10-12, 2007, Proceedings,
Vol. 4353 of Lecture Notes in Computer Science, Springer, 2007, pp. 59–73.
doi:10.1007/11965893_5.
URL https://doi.org/10.1007/11965893_5
[28] L. Segoufin, V. Vianu, Views and queries: determinacy and rewriting, in: C. Li (Ed.),
Proceedings of the Twenty-fourth ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems, June 13-15, 2005, Baltimore, Maryland, USA, ACM,
2005, pp. 49–60. doi:10.1145/1065167.1065174.
URL https://doi.org/10.1145/1065167.1065174
[29] F. N. Afrati, Determinacy and query rewriting for conjunctive queries and views, Theor.
Comput. Sci. 412 (11) (2011) 1005–1021. doi:10.1016/J.TCS.2010.12.031.
URL https://doi.org/10.1016/j.tcs.2010.12.031
[30] D. Calvanese, G. D. Giacomo, M. Lenzerini, M. Y. Vardi, Lossless regular views, in:
L. Popa, S. Abiteboul, P. G. Kolaitis (Eds.), Proceedings of the Twenty-first ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5,
Madison, Wisconsin, USA, ACM, 2002, pp. 247–258. doi:10.1145/543613.543646.
URL https://doi.org/10.1145/543613.543646
[31] D. Calvanese, G. D. Giacomo, M. Lenzerini, M. Y. Vardi, View-based query processing:
On the relationship between rewriting, answering and losslessness, in: T. Eiter,
L. Libkin (Eds.), Database Theory - ICDT 2005, 10th International Conference,
Edinburgh, UK, January 5-7, 2005, Proceedings, Vol. 3363 of Lecture Notes in Computer
Science, Springer, 2005, pp. 321–336. doi:10.1007/978-3-540-30570-5_22.
URL https://doi.org/10.1007/978-3-540-30570-5_22
[32] M. Benedikt, C. Pradic, C. Wernhard, Synthesizing nested relational queries from implicit
specifications, ACM, 2023, pp. 33–45.
[33] J. Kwiecien, J. Marcinkowski, P. Ostropolski-Nalewaja, Determinacy of real conjunctive
queries. The Boolean case, in: L. Libkin, P. Barceló (Eds.), PODS '22: International
Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, ACM,
2022, pp. 347–358. doi:10.1145/3517804.3524168.
URL https://doi.org/10.1145/3517804.3524168
[34] M. Benedikt, J. Engelfriet, S. Maneth, Determinacy and rewriting of functional top-down
and MSO tree transformations, J. Comput. Syst. Sci. 85 (2017) 57–73.
doi:10.1016/J.JCSS.2016.11.001.
URL https://doi.org/10.1016/j.jcss.2016.11.001
[35] J. Marcinkowski, What makes a variant of query determinacy (un)decidable? (invited
talk), in: C. Lutz, J. C. Jung (Eds.), 23rd International Conference on Database Theory,
ICDT 2020, March 30-April 2, 2020, Copenhagen, Denmark, Vol. 155 of LIPIcs, Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2020, pp. 2:1–2:20.
doi:10.4230/LIPICS.ICDT.2020.2.
URL https://doi.org/10.4230/LIPIcs.ICDT.2020.2
[36] M. Benedikt, S. Kikot, P. Ostropolski-Nalewaja, M. Romero, On monotonic determinacy
and rewritability for recursive queries and views, ACM Trans. Comput. Log. 24 (2) (2023)
16:1–16:62.
[37] P. Andritsos, R. Fagin, A. Fuxman, L. M. Haas, M. A. Hernández, C. T. H. Ho, A. Ke-
mentsietsidis, R. J. Miller, F. Naumann, L. Popa, Y. Velegrakis, C. Vilarem, L. Yan,
Schema management, IEEE Data Eng. Bull. 25 (3) (2002) 32–38.
[38] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa, Data exchange: semantics and query
answering, Theoretical Computer Science 336 (1) (2005) 89–124.
[39] G. Konstantinidis, J. L. Ambite, Scalable containment for unions of conjunctive queries
under constraints, in: R. D. Virgilio, F. Giunchiglia, L. Tanca (Eds.), Proceedings of the
Fifth Workshop on Semantic Web Information Management, SWIM@SIGMOD Confer-
ence 2013, New York, NY, USA, June 23, 2013, ACM, 2013, pp. 4:1–4:8.
[40] B. ten Cate, P. G. Kolaitis, W. Othman, Data exchange with arithmetic operations,
in: Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March
18-22, 2013, 2013, pp. 537–548.
[41] W. Fan, X. Liu, P. Lu, C. Tian, Catching numeric inconsistencies in graphs, in: Proceed-
ings of the 2018 International Conference on Management of Data, SIGMOD Conference
2018, Houston, TX, USA, June 10-15, 2018, 2018, pp. 381–393.
[42] C. H. Papadimitriou, M. Yannakakis, On the complexity of database queries, in: Pro-
ceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles
of Database Systems, May 12-14, 1997, Tucson, Arizona, USA, 1997, pp. 12–19.
[43] J. Chekuri, A. Rajaraman, Conjunctive query containment revisited, in: F. Afrati,
P. Kolaitis (Eds.), ICDT, Vol. 1186 of Lecture Notes in Computer Science, Springer-Verlag,
1997, pp. 56–70.
[44] F. N. Afrati, C. Li, V. Pavlaki, Data exchange in the presence of arithmetic comparisons,
in: EDBT 2008, 11th International Conference on Extending Database Technology,
Nantes, France, March 25-29, 2008, Proceedings, 2008, pp. 487–498.
[45] M. Console, M. Hofer, L. Libkin, Queries with arithmetic on incomplete databases, in:
D. Suciu, Y. Tao, Z. Wei (Eds.), Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI
Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June
14-19, 2020, ACM, 2020, pp. 179–189.
[46] A. Y. Halevy, Answering queries using views: A survey, The VLDB Journal 10 (4) (2001)
270–294.
[47] J. D. Ullman, Principles of Database and Knowledge-Base Systems, Volume I, Vol. 14 of
Principles of computer science series, Computer Science Press, 1988.
Appendix A. On the transformation of CQAC queries to Datalog and
CQ queries. The tree-like structure of the containment
entailment for CQASI1 containing query
This appendix offers intuition on the proof of Theorem 14.4.
Proposition 7.1 begins to show a tree-like structure of the containment entailment
and it gives the first intuition for constructing a Datalog query from the containing
query that will help in deciding query containment. The following example gives an
illustration of this intuition.
Example Appendix A.1. Let us consider the following two Boolean queries.
$Q_1: q() :- a(X, Y, Z), X \le 8, Y \le 7, Z \ge 6.$
$Q_2: q() :- a(X, Y, Z), a(U_1, U_2, X), a(V_1, V_2, Y),$
$a(Z, Z_1, Z_2), a(U'_1, U'_2, U_1), a(V'_1, V'_2, V_1),$
$U'_1 \le 8, U'_2 \le 7, U_2 \le 7, V'_1 \le 8,$
$V'_2 \le 7, V_2 \le 7, Z_1 \le 7, Z_2 \ge 6.$
The query $Q_2$ is contained in the query $Q_1$. To verify this, notice that there are
6 containment mappings from $Q_1$ to $Q_2$. These mappings are given as follows:
$\mu_1: (X, Y, Z) \to (X, Y, Z)$, $\mu_2: (X, Y, Z) \to (U_1, U_2, X)$, $\mu_3: (X, Y, Z) \to (V_1, V_2, Y)$,
$\mu_4: (X, Y, Z) \to (Z, Z_1, Z_2)$, $\mu_5: (X, Y, Z) \to (U'_1, U'_2, U_1)$, and
$\mu_6: (X, Y, Z) \to (V'_1, V'_2, V_1)$. After replacing the variables as specified by the
containment mappings, the query entailment is
$\beta \Rightarrow \beta_1 \lor \beta_2 \lor \beta_3 \lor \beta_4 \lor \beta_5 \lor \beta_6$, where:
Figure A.2: Illustration of containment entailment of Example Appendix A.1
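$\beta: U'_1 \le 8 \land U'_2 \le 7 \land U_2 \le 7 \land V'_1 \le 8 \land V'_2 \le 7 \land V_2 \le 7 \land Z_1 \le 7 \land Z_2 \ge 6.$
$\beta_1: X \le 8 \land Y \le 7 \land Z \ge 6.$
$\beta_2: U_1 \le 8 \land U_2 \le 7 \land X \ge 6.$
$\beta_3: V_1 \le 8 \land V_2 \le 7 \land Y \ge 6.$
$\beta_4: Z \le 8 \land Z_1 \le 7 \land Z_2 \ge 6.$
$\beta_5: U'_1 \le 8 \land U'_2 \le 7 \land U_1 \ge 6.$
$\beta_6: V'_1 \le 8 \land V'_2 \le 7 \land V_1 \ge 6.$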
We now refer to Figure A.2 to offer some intuition about, and a visualization of,
Proposition 7.1 using the above queries. The circles in the figure represent the
mappings $\mu_1, \ldots, \mu_6$, and the dots are the variables of $Q_2$. Notice now the intersections
between the circles. Proposition 7.1 refers to these intersections, such as the one
between $\mu_3$ and $\mu_6$ (or the one between $\mu_2$ and $\mu_5$).
The AC $V_1 \ge 6$ ($V_1$ is included in the intersection between $\mu_3$ and $\mu_6$) is the one
that is not implied by $\beta$, as stated in case (ii) of Proposition 7.1. In particular,
it is easy to verify that the following are true:
• $\beta \land \neg(V_1 \ge 6) \Rightarrow \beta_1 \lor \beta_2 \lor \beta_3 \lor \beta_4 \lor \beta_5$.
• $\beta \Rightarrow \beta_6 \lor \neg(V_1 \ge 6)$ (i.e., $\beta \Rightarrow (V'_1 \le 8 \land V'_2 \le 7 \land V_1 \ge 6) \lor \neg(V_1 \ge 6)$).
• $\beta \Rightarrow (V'_1 \le 8)$ and $\beta \Rightarrow (V'_2 \le 7)$.
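The containment entailment of Example Appendix A.1 can be sanity-checked by brute force. The sketch below is only an illustration under stated assumptions: every AC is a closed comparison of a variable with a constant, atoms of a disjunct that occur syntactically in $\beta$ are dropped as already implied, and the remaining variables (which $\beta$ does not constrain) are enumerated over one representative value per region delimited by the constants 6, 7, 8. The names `entails`, `holds`, and the encoding of primed variables as `U1p`, `V1p`, etc. are hypothetical.

```python
from itertools import product

def holds(atom, val):
    """Evaluate one AC (var, op, const) under the assignment val."""
    var, op, c = atom
    return val[var] <= c if op == "<=" else val[var] >= c

def entails(beta, disjuncts, reps=(5, 6.5, 7.5, 9)):
    """Check beta => d_1 v ... v d_k for closed semi-interval ACs."""
    beta = set(beta)
    # Atoms that literally occur in beta are implied by it and can be dropped.
    reduced = [[a for a in d if a not in beta] for d in disjuncts]
    free = sorted({a[0] for d in reduced for a in d})
    # Assumption of this sketch: beta says nothing about the remaining variables.
    assert not any(v == a[0] for v in free for a in beta)
    for values in product(reps, repeat=len(free)):
        val = dict(zip(free, values))
        if not any(all(holds(a, val) for a in d) for d in reduced):
            return False
    return True

beta = [("U1p", "<=", 8), ("U2p", "<=", 7), ("U2", "<=", 7), ("V1p", "<=", 8),
        ("V2p", "<=", 7), ("V2", "<=", 7), ("Z1", "<=", 7), ("Z2", ">=", 6)]
betas = [
    [("X", "<=", 8), ("Y", "<=", 7), ("Z", ">=", 6)],        # beta_1
    [("U1", "<=", 8), ("U2", "<=", 7), ("X", ">=", 6)],      # beta_2
    [("V1", "<=", 8), ("V2", "<=", 7), ("Y", ">=", 6)],      # beta_3
    [("Z", "<=", 8), ("Z1", "<=", 7), ("Z2", ">=", 6)],      # beta_4
    [("U1p", "<=", 8), ("U2p", "<=", 7), ("U1", ">=", 6)],   # beta_5
    [("V1p", "<=", 8), ("V2p", "<=", 7), ("V1", ">=", 6)],   # beta_6
]
```

With all six disjuncts the check succeeds, while dropping $\beta_6$ makes it fail, which matches the role of the AC $V_1 \ge 6$ discussed above.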
Appendix B. Proof of Theorem 14.4
Proof. We consider the canonical database, $D$, of $Q_2^{CQ}$. For convenience, the constants
in the canonical database use the lower case letters of the variables they represent.
Thus, constant $x$ is used in the canonical database to represent the variable $X$. We
will use the containment test according to which a Datalog query contains a conjunctive
query $Q$ if and only if the Datalog query computes the head of $Q$ when applied on the
canonical database of the conjunctive query $Q$. All the containment mappings we use
in this proof are from the relational subgoals of CQAC $Q_1$ to the relational subgoals
of CQAC $Q_2$. Hence, we will refer to them simply as "mappings."
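The canonical-database test mentioned above can be illustrated in its simplest special case, where the containing query is a single Boolean CQ rather than a full Datalog program. The sketch below is not the construction used in the proof; the encoding of subgoals as `(predicate, args)` pairs and the function names are assumptions made for the example.

```python
def canonical_db(body):
    """Freeze each variable into the lower-case constant that represents it."""
    return {(pred, tuple(a.lower() for a in args)) for pred, args in body}

def matches(body, db, binding=None):
    """Yield every binding of the variables in body that maps all subgoals to facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in db:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(binding)
        # setdefault keeps an existing binding, so a clash shows up as a mismatch
        if all(new.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from matches(rest, db, new)

def contains_boolean(q1_body, q2_body):
    """True if Boolean CQ q1 derives its head on the canonical database of q2."""
    return next(matches(q1_body, canonical_db(q2_body)), None) is not None
```

Each successful binding is a containment mapping from the subgoals of the containing query to those of the contained query.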
“If” direction:
Inductive Hypothesis: If, in the computation of a $J$ fact associated with AC $e$,
we have used the mappings $\mu_1, \mu_2, \ldots, \mu_k$ (via applications of mapping rules), where
$k < n$, then the following is true:
$\beta_2 \Rightarrow \mu_1(\beta_1) \lor \mu_2(\beta_1) \lor \cdots \lor \mu_k(\beta_1) \lor \neg\mu_h(e^{\beta_1}_i)$
where $\mu_h$ is one of the $\mu_1, \ldots, \mu_k$.
Proof of Inductive Hypothesis: The base case is straightforward, since it is the
case when a $J$ fact is computed after the application of one mapping rule, say by
mapping $\mu_i$. This is enabled because each of the ACs in $\mu_i(\beta_1)$ except one (the
one associated with the computed $J$ fact) is implied by $\beta_2$.
We use the shortest derivation tree for the computation. Suppose we apply a
mapping rule via mapping $\mu_{current}$ that uses $I$ facts computed in previous rounds.
Suppose $\mu_{current}(\beta_1) = \bigwedge_{i=1 \ldots M} \mu_{current}(e^{\beta_1}_i)$, where $M$ is the number of ACs in $Q_1$,
hence the number of IDBs in each mapping rule (including the IDB in the head).
According to the inductive hypothesis, each $I$ fact that is computed via a $J$ fact,
which in turn was computed via some mappings $\mu_{ij}, j = 1, 2, \ldots$ (i.e., these are the
mappings for all mapping rules that were applied during the whole computation of the $J$
fact), implies that the following is true:
$\beta_2 \Rightarrow \mu_{i1}(\beta_1) \lor \mu_{i2}(\beta_1) \lor \cdots \lor \mu_{il_i}(\beta_1) \lor \neg\mu_{h_i}(e^{\beta_1}_i)$
or equivalently:
$\beta_2 \land \neg\mu_{i1}(\beta_1) \land \neg\mu_{i2}(\beta_1) \land \cdots \land \neg\mu_{il_i}(\beta_1) \Rightarrow \neg\mu_{h_i}(e^{\beta_1}_i)$
Now we fire a mapping rule using the already computed facts and, as we argued in
Section 15, we have the implication:
$\beta_2 \Rightarrow \mu_{current}(\beta_1) \bigvee_{i=1 \ldots M} \mu_{current}(e^{\beta_1}_i)$   (B.1)
Thus, we can combine the above implications for all $I$ facts used for the current
application of a mapping rule and have that the following is true:
$\beta_2 \land \bigwedge_{\text{for all } i} [\neg\mu_{i1}(\beta_1) \land \neg\mu_{i2}(\beta_1) \land \cdots \land \neg\mu_{il_i}(\beta_1)] \Rightarrow \bigwedge \neg\mu_{h_i}(e^{\beta_1}_i)$
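We write the above in the form:
$\beta_2 \land \neg\mu_1(\beta_1) \land \neg\mu_2(\beta_1) \land \cdots \Rightarrow \bigwedge \neg\mu_{h_i}(e^{\beta_1}_i)$   (B.2)
where for simplicity we have expressed the $\mu_{ij}, i = 1, 2, \ldots, j = 1, 2, \ldots$ as $\mu_1, \mu_2, \ldots$.
Now $e_1, \ldots$ are the ACs each associated with the $J$ facts used for this mapping rule.
Hence we have for each $i$
$\beta_2 \land \neg\mu_1(\beta_1) \land \neg\mu_2(\beta_1) \land \cdots \Rightarrow \neg\mu_{current}(e^{\beta_1}_i)$
Combining the above, one for each $i$, and Equation B.1, we have
$\beta_2 \land \bigwedge_{i=1 \ldots M} \mu_{current}(e^{\beta_1}_i) \land \neg\mu_1(\beta_1) \land \neg\mu_2(\beta_1) \land \cdots \Rightarrow \mu_{current}(\beta_1)$   (B.3)
which is equivalent to:
$\beta_2 \land \mu_{current}(e^{\beta_1}_h) \land \neg\mu_1(\beta_1) \land \neg\mu_2(\beta_1) \land \cdots \Rightarrow \mu_{current}(\beta_1)$   (B.4)
where $\mu_{current}(e^{\beta_1}_h)$ is the AC associated with the computed fact.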
Finally, when we apply the query rule we derive the containment entailment.
“Only-if” direction:
Inductive hypothesis: There are mappings, $\mu_i, i = 1, \ldots, m + l$, of which $l$ are
used to compute facts such that the following is true:
$\beta_2 \Rightarrow \mu_1(\beta_1) \lor \cdots \lor \mu_m(\beta_1) \lor \mu_{m+1}(e^{\beta_1}_{m+1}) \lor \mu_{m+2}(e^{\beta_1}_{m+2}) \lor \cdots \lor \mu_{m+l}(e^{\beta_1}_{m+l})$   (B.5)
where $l < n$ and $e^{\beta_1}_{m+1}, e^{\beta_1}_{m+2}, \ldots$ are ACs from $\beta_1$ and the following facts are
computed by the transformed $Q_1$ as applied on the canonical database of the transformed
$Q_2$: $I(x_{m+1}), I(x_{m+2}), \ldots$ where $x_{m+1}$ (similarly for $x_{m+2}$, etc.) is the constant in the
canonical database of the transformed $Q_2$ represented by the variable in $\mu_{m+1}(e^{\beta_1}_{m+1})$.
Proof of the inductive hypothesis: For the base case, the following is true:
$\beta_2 \Rightarrow \mu_1(\beta_1) \lor \cdots \lor \mu_l(\beta_1)$, and there are no facts computed yet.
We will prove that if B.5 is true then the conclusion of the inductive hypothesis is
true. We use Propositions 7.1 and 13.2.
First we rewrite the logical implication in B.5 by transferring all the "single" ACs
to the lhs. Thus, we have again a containment entailment. Then we use Proposition
7.1 to argue that, as a consequence of implication B.5, there is a mapping, say $\mu_1$,
among the mappings on the rhs of implication B.5, for which all the disjuncts are
implied by the lhs of implication B.5 except one, say $\mu_1(e^{\beta_1}_1)$. This means, from the
way the transformed $Q_1$ is constructed, that there is a rule that computes the fact
$I(x_1)$. It also means that the following is true:
$\beta_2 \Rightarrow \mu_2(\beta_1) \lor \cdots \lor \mu_m(\beta_1) \lor \mu_1(e^{\beta_1}_1) \lor \mu_{m+1}(e^{\beta_1}_{m+1}) \lor \mu_{m+2}(e^{\beta_1}_{m+2}) \lor \cdots$   (B.6)
Hence, from the inductive hypothesis, we conclude that the following facts are
computed: $I(x_{m+1}), I(x_{m+2}), \ldots, I(x_1)$. The new fact, $I(x_1)$, is computed because: For
each $ac_i$ in $\mu_1(\beta_1)$ and for $e_j$ associated with an already computed fact and which uses
variables in $\mu_1(\beta_1)$ the following is true: $\beta_2 \Rightarrow ac_i \lor e_j$. Thus, we conclude from the
computation of the new fact and from equation B.6 that there are mappings of which
$l + 1$ are used to compute facts and the implication in the inductive hypothesis (i.e.,
implication B.6) is true. These (i.e., the corresponding variables of the constants in
$ac_i$ and $e_j$) provide the instantiation for the firing of coupling rules that compute all
the $I$ facts necessary to fire a mapping rule by the instantiation provided by $\mu_1$.
For the final step of the induction, when in implication B.5 there is one mapping,
$\mu_0(\beta_1)$, then we apply the query rule and compute the head of the query.
Appendix C. Building MCR for RSI1+ query
Now, we exploit the results of Section 9. We consider queries that are allowed to
have any ACs among head variables. For ease of reference, we define a CQAC RSI1+
query to be a CQAC query in which the ACs on nondistinguished variables are closed
semi-interval ACs and there is a single right semi-interval AC.
Now, we present the algorithm for building an MCR in the language of (possibly
infinite) union of CQACs for the case of CQAC views and queries that are RSI1+.
The algorithm for building an MCR for query $Q$ and view set $V$ is the following:
Algorithm MCR-RSI1+:
1. We consider the query $Q'$ which results from the given query $Q$ after we have
removed the ACs that contain only head variables.
2. We apply the algorithm for building an MCR for query $Q'$ and views $V$ (from the previous
subsection). Let this MCR be $R'_{MCR}$.
3. We add a new rule in $R'_{MCR}$ (and obtain $R_{MCR}$) to compute the query predicate
$Q$ as follows:
$Q() :- Q'(), ac_1, ac_2, \ldots$
where $ac_1, ac_2, \ldots$ are the ACs that we removed in the first step of the present
algorithm.
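Steps 1 and 3 of the algorithm amount to partitioning the ACs of the query into those that mention only head variables and the rest. A minimal sketch of this split follows; it assumes ACs are encoded as triples `(left, op, right)` in which variables are strings and constants are numbers, and it treats a comparison of a head variable with a constant as a head-only AC. Both the encoding and the function names are hypothetical.

```python
def is_var(term):
    """In this sketch, variables are strings and constants are numbers."""
    return isinstance(term, str)

def split_head_acs(head_vars, acs):
    """Partition ACs into those using only head variables and all the others."""
    head_vars = set(head_vars)
    head_only, rest = [], []
    for ac in acs:
        left, _op, right = ac
        variables = {t for t in (left, right) if is_var(t)}
        (head_only if variables <= head_vars else rest).append(ac)
    return head_only, rest
```

The second component would be kept in $Q'$ for step 2, while the first component supplies the comparisons $ac_1, ac_2, \ldots$ added back in step 3.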
Appendix C.1. Proof that the algorithm MCR-RSI1+ is correct
We consider the Datalog$^{AC}$ program, $R_{MCR}$, found by Algorithm MCR-RSI1+.
Theorem Appendix C.1 below proves that every CQAC contained rewriting
is contained in $R_{MCR}$ and Theorem Appendix C.2 proves that $R_{MCR}$ is a contained
rewriting.
Theorem Appendix C.1. Given a query $Q$ which is RSI1+ and views $V$ which are
CQACs, the following is true: Let $R$ be a CQAC contained rewriting of $Q$ in terms of
$V$. Then $R$ is contained in the Datalog$^{AC}$ program, $R_{MCR}$, found by Algorithm
MCR-RSI1+.
Proof. Let $R$ be a contained rewriting of query $Q$. Since $Q'$ contains $Q$, $R$ is a
contained rewriting of $Q'$ too. Hence, according to Theorem 16.3, $R$ is
contained in $R'_{MCR}$.
Since $R$ is contained in $Q$, we consider the view-expansion of $R$, let it be $R_{exp}$, and
we know that this is contained in $Q$, hence the containment entailment is true.
However, $Q$ is an RSI1+ query, hence we can, according to Section 9, break the containment
entailment in two as follows:
$\beta_2 \Rightarrow \mu_1(\beta_{Q'}) \lor \cdots$   (reduced containment entailment)
$\beta_2 \Rightarrow \mu_1(\beta_{Q-head})$   (eq. (1))
where $\beta_2$ is the conjunction of ACs in the closure of ACs in $R_{exp}$, $\beta_{Q'}$ is
the conjunction of ACs in $Q'$, $\beta_{Q-head}$ is the conjunction of ACs that use only head
variables, and the $\mu_i$'s are all the mappings from $Q$ to $R_{exp}$.
Observe that in equation (1), we can replace $\beta_2$ with only those ACs in the closure
of $\beta_2$ that involve head variables. Because $R$ is AC-rectified, all these ACs appear in
$R$; let us denote them by $\beta_{head}$. Thus $\beta_{head}$ logically implies $\beta_{Q-head}$. Now, $R'_{MCR}$
and $R_{MCR}$ have the same Datalog-expansions, except that the latter has the ACs in
$\beta_{Q-head}$ as well.
Hence we have concluded that a) $R$ is contained in $R'_{MCR}$ and b) the ACs in $R$
imply the ACs added in each Datalog-expansion of $R'_{MCR}$ to make a Datalog-expansion
of $R_{MCR}$. Hence $R$ is contained in $R_{MCR}$ according to the results in Section 9.
Theorem Appendix C.2. Given a query $Q$ which is CQAC-RSI1 and views $V$ which
are CQACs, the following is true: The Datalog$^{AC}$ program, $R_{MCR}$, found by the
algorithm in Subsection 16.1 is a contained rewriting.
Proof. Let $E$ be a CQAC query which is a Datalog-expansion of $R_{MCR}$. Let $E'$ be the
CQAC that results from $E$ by removing the head ACs. $E'$ is a contained rewriting of
$Q'$. Hence if we consider the view-expansion, $E'_{exp}$, of $E'$, the containment entailment
is true for $E'_{exp}$ and $Q'$.
Moreover, trivially we have $\beta_E \Rightarrow \beta_{Q-head}$ and, using the distributive law, we derive
the containment entailment that shows containment of the view-expansion $E_{exp}$ of $E$
in $Q$.
A straightforward consequence of the above two theorems is the following theorem,
which is the main result of this section.
Theorem Appendix C.3. Given a query $Q$ which is CQAC-SI1+ and views $V$
which are CQACs, the algorithm in Subsection 16.1 finds an MCR of $Q$ using $V$ in
the language of (possibly infinite) union of CQACs which is expressed by a Datalog$^{AC}$
query. Hence, for this case of query and views, we can compute certain answers in
polynomial time.