In 1979, IBM’s System R — one of the first relational database systems — introduced a component that would define database architecture for the next four decades: the query optimizer.
The optimizer’s core insight was this: the SQL query a programmer writes is not the query that gets executed. The optimizer rewrites it. The rewritten query and the original query must return identical results — they are logically equivalent. But the rewritten query can be orders of magnitude faster because it exploits indexes, avoids full table scans, and minimizes intermediate result sizes.
Every major database today — PostgreSQL, MySQL, Oracle, SQL Server — runs a query optimizer. Every time you execute a SQL query, the optimizer performs logical equivalence checking. It asks: “Is this expression equivalent to that one?” If yes, and if the second form is cheaper to execute, it substitutes.
The optimizer is an automated logical equivalence checker.
This is not a trick specific to databases. The same principle appears in compilers (constant folding, dead code elimination), hardware synthesis (circuit minimization), and type systems (proof checking). Anywhere that two representations of a computation can be shown to always produce the same result, equivalence enables substitution — and substitution enables optimization.
Thread 10 (Encoding) originates here.
Encoding is the principle that the same information can be represented in multiple forms. The forms are interchangeable (equivalent), but different forms have different costs for different operations.
The arrow is:
T10 (Encoding) origin → same meaning, different form (Ch 4) → equivalent data representations (Book 2, Ch 12).
The concept here is the logical foundation of encoding: two expressions that always evaluate identically are interchangeable. Which form you use is an engineering decision, not a correctness decision. This exact principle — same meaning, different form, choose based on cost — recurs in data serialization (JSON vs. Protocol Buffers), data encoding (UTF-8 vs. UTF-32), and distributed representation (sharded vs. replicated data).
Two logical expressions are logically equivalent if they have the same truth value for every possible assignment of truth values to their variables.
Notation: P ≡ Q means P and Q are logically equivalent.
Equivalence is stronger than equality of truth values in a single case. It means: there is no input combination that makes one True and the other False.
Three categories of logical expressions:
A tautology and any other tautology are logically equivalent. A contradiction and any other contradiction are logically equivalent. Two contingencies are equivalent only if their truth tables match exactly.
The mechanical method: build truth tables for both expressions. Compare the output columns row by row. If the columns are identical, the expressions are equivalent.
Example: Show that P → Q is equivalent
to NOT P OR Q.
First, define the conditional P → Q (“if P then Q”): it
is False only when P is True and Q is False. In all other cases, it is
True.
| P | Q | P → Q | NOT P | NOT P OR Q |
|---|---|---|---|---|
| True | True | True | False | True |
| True | False | False | False | False |
| False | True | True | True | True |
| False | False | True | True | True |
The P → Q column and the NOT P OR Q column
are identical. The expressions are equivalent.
This equivalence is important in programming. The conditional “if authenticated then process” is equivalent to “either not authenticated, or process.” The second form is useful when you want to express the contrapositive or when negation-normal form is required.
These laws are the algebraic identities of boolean logic. They are proven by truth table and used as rewrite rules.
Identity Laws
P AND True ≡ PP OR False ≡ PDomination Laws
P AND False ≡ FalseP OR True ≡ TrueIdempotent Laws
P AND P ≡ PP OR P ≡ PDouble Negation
NOT (NOT P) ≡ PCommutative Laws
P AND Q ≡ Q AND PP OR Q ≡ Q OR PAssociative Laws
(P AND Q) AND R ≡ P AND (Q AND R)(P OR Q) OR R ≡ P OR (Q OR R)Distributive Laws
P AND (Q OR R) ≡ (P AND Q) OR (P AND R)P OR (Q AND R) ≡ (P OR Q) AND (P OR R)De Morgan’s Laws (from Chapter 2)
NOT (P AND Q) ≡ NOT P OR NOT QNOT (P OR Q) ≡ NOT P AND NOT QDe Morgan’s law: both expressions shade the same region — everything outside the intersection of A and B.
Absorption Laws
P AND (P OR Q) ≡ PP OR (P AND Q) ≡ PComplement Laws
P AND NOT P ≡ False (contradiction)P OR NOT P ≡ True (tautology — Law of Excluded
Middle)Each of these can be verified by truth table. They are universally applicable substitution rules.
Example: Simplify
(P AND Q) OR (P AND NOT Q).
Step 1: Factor out P using the Distributive Law.
(P AND Q) OR (P AND NOT Q)
= P AND (Q OR NOT Q) [Distributive: P AND (Q OR NOT Q)]
Step 2: Apply the Complement Law.
= P AND True [Q OR NOT Q ≡ True]
Step 3: Apply the Identity Law.
= P [P AND True ≡ P]
The expression (P AND Q) OR (P AND NOT Q) is equivalent
to P. Regardless of Q’s value, the whole expression reduces
to P’s value.
In programming:
# Original: complex condition
def should_notify(is_subscribed: bool, prefers_email: bool) -> bool:
return (is_subscribed and prefers_email) or (is_subscribed and not prefers_email)
# Simplified: logically equivalent, clearer
def should_notify_simplified(is_subscribed: bool, prefers_email: bool) -> bool:
# (subscribed AND email) OR (subscribed AND NOT email) ≡ subscribed
# The email preference is irrelevant — notify any subscribed user
return is_subscribedThe simplified version makes the actual logic visible. The original version hides it. Both are correct. The simplification is a logical equivalence transformation.
Two canonical forms for expressing any boolean expression:
Conjunctive Normal Form (CNF): A conjunction (AND) of disjunctions (OR). Every clause is a group of variables combined with OR; the clauses are combined with AND.
(P OR Q OR NOT R) AND (NOT P OR R) AND (Q OR R)
CNF is the input format for SAT solvers. Converting any boolean expression to CNF is always possible.
Disjunctive Normal Form (DNF): A disjunction (OR) of conjunctions (AND). Every term is a group of variables combined with AND; the terms are combined with OR. This is the form produced by the truth table construction from Chapter 3.
(P AND Q AND NOT R) OR (NOT P AND R) OR (P AND NOT Q AND R)
DNF is the form produced by reading True rows from a truth table. Converting any boolean expression to DNF is always possible.
Any expression has equivalent CNF and DNF forms. The CNF and DNF forms are themselves logically equivalent to each other and to the original expression. They differ only in structure, not in meaning.
def convert_to_dnf_from_truth_table(
variables: list[str],
truth_table: list[tuple[tuple[bool, ...], bool]]
) -> str:
"""
Mechanically convert any truth table to a DNF expression.
This always produces a logically equivalent expression.
"""
minterms = []
for assignment, output in truth_table:
if not output:
continue # DNF uses only True-output rows
# Build minterm: AND of each variable (negated if False)
literals = []
for var, value in zip(variables, assignment):
literals.append(var if value else f"NOT {var}")
minterms.append("(" + " AND ".join(literals) + ")")
return " OR ".join(minterms) if minterms else "FALSE"A tautology in code is a condition that is always True. A contradiction is a condition that is always False. Both represent logic errors — you are checking something that cannot vary.
# Tautology: always True — this check is useless
def validate_count(count: int) -> bool:
return count >= 0 or count < 0 # P OR NOT P ≡ True
# Contradiction: always False — this check always fails
def can_be_both(flag: bool) -> bool:
return flag and not flag # P AND NOT P ≡ False
# Real-world tautology that looks plausible
def should_log(level: str) -> bool:
# If ERROR is the only level, this is always True — AT3 violation
return level == "ERROR" or level != "ERROR"Static analysis tools and type checkers detect tautologies and contradictions. A tautology in a conditional means the code always takes one branch — the other is dead code. A contradiction means the code never takes that branch. Both signal logic errors.
A SQL query passes through the optimizer as a tree of relational algebra operations. The optimizer applies equivalence transformations to this tree:
Predicate pushdown: Move WHERE conditions as close to the data source as possible.
-- Original
SELECT * FROM orders JOIN users ON orders.user_id = users.id
WHERE users.country = 'US'
-- Equivalent (after pushdown): filter users before the join
SELECT * FROM (SELECT * FROM users WHERE country = 'US') u
JOIN orders ON orders.user_id = u.idBoth queries return identical results for any data. They are logically equivalent. The optimizer chooses the second form because it reduces the number of rows in the join.
Constant folding:
WHERE 1 = 1 AND status = 'active'
-- Equivalent to:
WHERE status = 'active'
-- Because (True AND P) ≡ P (Identity Law)Index selection: The optimizer knows that
WHERE a = 5 AND b = 3 is equivalent to
WHERE b = 3 AND a = 5 (commutative law). It chooses
whichever variable has a better index and evaluates that condition
first.
Every one of these is a logical equivalence transformation, applied automatically, invisibly, to every query you write.
AT4 — Precomputation vs. On-Demand
Logical equivalence enables a fundamental tradeoff: precompute the result of an expression into a simpler form, or evaluate the full expression on demand.
The SQL optimizer precomputes a simplified form of the query. It pays a cost at planning time (the optimization itself) to reduce cost at execution time (fewer operations per row). This is AT4 applied to query evaluation.
In code:
# On-demand: evaluate the full expression every time
def is_eligible_for_discount(user: User, cart: Cart) -> bool:
return (
user.account_age_days > 365 and
user.total_purchases > 500 and
cart.total > 50 and
not user.discount_used_this_month
)
# Precomputed: store an intermediate result
# Trade: cache staleness (FM4) vs. reduced evaluation cost
class User:
def __init__(self, ...):
...
# Precomputed at user update time — stays valid until account data changes
self.is_loyal_customer = (
self.account_age_days > 365 and self.total_purchases > 500
)
def is_eligible_for_discount_fast(user: User, cart: Cart) -> bool:
# Uses precomputed result — fewer operations per call
return user.is_loyal_customer and cart.total > 50 and not user.discount_used_this_monthThe second version is logically equivalent to the first for current
data. The precomputed is_loyal_customer is a stored truth
value. AT4 makes the tradeoff explicit: faster evaluation, but
is_loyal_customer can become stale if account data changes
without recomputing it. This is the same temporal contract problem from
Chapter 1.
# Correct AT4 implementation: invalidation on change
class User:
def __init__(self, account_age_days: int, total_purchases: float):
self._account_age_days = account_age_days
self._total_purchases = total_purchases
self._is_loyal_customer = self._compute_loyalty()
def _compute_loyalty(self) -> bool:
# Single source of truth for loyalty computation
return self._account_age_days > 365 and self._total_purchases > 500
@property
def account_age_days(self) -> int:
return self._account_age_days
@account_age_days.setter
def account_age_days(self, value: int):
self._account_age_days = value
self._is_loyal_customer = self._compute_loyalty() # recompute on change
@property
def is_loyal_customer(self) -> bool:
return self._is_loyal_customerThe precomputed value is always consistent with the underlying data because it is recomputed on change. The AT4 tradeoff is preserved: fast reads, slightly more work on writes.
FM8 — Schema/Contract Violation
Two expressions believed to be equivalent that are not.
This happens in three ways:
1. Incorrect mental application of equivalence laws.
# Developer believes these are equivalent
# Original: "process if admin OR (editor AND not read_only)"
def can_edit_original(is_admin: bool, is_editor: bool, is_read_only: bool) -> bool:
return is_admin or (is_editor and not is_read_only)
# "Simplified" version: pulls out NOT incorrectly
def can_edit_wrong(is_admin: bool, is_editor: bool, is_read_only: bool) -> bool:
# Developer thinks: factor out "not is_read_only" from the whole expression
# This is NOT the same as the original
return (is_admin or is_editor) and not is_read_only
# Test to reveal the difference
# Inputs: is_admin=False, is_editor=False, is_read_only=False
print(can_edit_original(False, False, False)) # False (not editor, not admin)
print(can_edit_wrong(False, False, False)) # False (same)
# Inputs: is_admin=True, is_editor=False, is_read_only=True
print(can_edit_original(True, False, True)) # True (admin overrides)
print(can_edit_wrong(True, False, True)) # True AND False = False (WRONG)The second version is not equivalent to the first. An admin with
is_read_only=True is incorrectly blocked. The
“simplification” introduced a bug.
2. Equivalence that holds for most data but not all.
SQL query rewrites can fail when NULL values are involved. Classical boolean equivalence does not hold in three-valued SQL logic. The optimizer must account for NULL separately.
-- These are NOT equivalent in SQL due to NULL:
WHERE NOT (status = 'active' AND type = 'premium')
-- vs.
WHERE status != 'active' OR type != 'premium'
-- If status IS NULL:
-- First expression: NOT (NULL AND ...) = NOT NULL = NULL (unknown)
-- Second expression: NULL OR ... = NULL or the other side
-- They may evaluate differently for NULL inputs3. Equivalence that depends on evaluation order (non-commutative systems).
In databases with row-level security, predicates can produce different results depending on which user context they are evaluated in. Two expressions equivalent in one security context may not be equivalent in another.
Compiler Optimization — compilers apply equivalence
transformations at every stage. Constant folding (2 + 3 →
5), dead code elimination (removing code after an
unconditional return), and strength reduction (replacing
x * 2 with x << 1) are all logical
equivalence transformations. The compiled code is semantically identical
to the source; it is just in a cheaper form.
Circuit Minimization (Karnaugh Maps) — in digital circuit design, a Karnaugh map is a visual tool for finding the simplest boolean expression equivalent to a given truth table. Minimizing the expression minimizes the number of logic gates. Fewer gates means less chip area, less power consumption, and lower propagation delay. The Karnaugh map is a human-readable equivalence checker.
Theorem Provers and Proof Assistants — Coq, Lean, and Isabelle are software tools that mechanically verify logical equivalences. They are used to prove that software implementations match their specifications. “This sorting algorithm always produces a sorted output” is a logical statement. The theorem prover checks it against all inputs by symbolic reasoning — not by running test cases.
Constraint Propagation in SAT Solvers — modern SAT solvers (Z3, MiniSAT) maintain a formula in CNF and repeatedly apply unit propagation, a logical equivalence rule: if a clause contains a single unresolved literal, that literal must be True (otherwise the clause is False, making the whole formula False). Unit propagation simplifies the formula by substituting the forced value, producing an equivalent formula with fewer variables.
Python’s ast Module and Code Rewriting
— Python’s abstract syntax tree (AST) allows programs to be inspected
and rewritten before execution. Libraries like numexpr and
sympy parse expressions, apply equivalence transformations,
and produce optimized forms. sympy.simplify() takes a
symbolic expression and returns a logically equivalent simpler one,
using algebraic equivalence rules.
Concept: Logical Equivalence
Core Idea: Two logical expressions are equivalent if they produce identical truth values for every input. Equivalence enables safe substitution — replace any expression with a simpler equivalent without changing behavior. This is the foundation of every optimizer, compiler, and simplifier in computing.
Tradeoff: AT4 — Precomputation vs. On-Demand (equivalence allows precomputing results; the cost is staleness when the underlying data changes)
Failure Mode: FM8 — Schema/Contract Violation (two expressions believed equivalent but not — usually from incorrect manual application of equivalence laws or NULL-blindness in SQL)
Signal: When a “simplified” boolean condition produces unexpected behavior, or when a precomputed value diverges from its freshly-computed equivalent — verify equivalence by truth table before trusting a manual simplification.
Classify each expression as a tautology, contradiction, or contingency. Justify with a truth table or by citing an applicable law.
P OR NOT PP AND NOT P(P AND Q) OR (NOT P AND Q) OR (P AND NOT Q) OR (NOT P AND NOT Q)P AND (NOT P OR Q)State the Distributive Law (both forms). Verify one form with a two-variable truth table where R = True.
What is the difference between CNF and DNF? Give an example of one expression in each form.
Show that these pairs of expressions are logically equivalent using the algebraic laws (cite each law you use). Do not use truth tables.
NOT (P AND Q) AND NOT (P AND NOT Q) simplifies to
NOT P.
(P OR Q) AND (P OR NOT Q) simplifies to
P.
Verify your answers by building truth tables.
A permissions system has this rule:
def can_delete(is_owner: bool, is_admin: bool, is_archived: bool, is_locked: bool) -> bool:
return (is_owner and not is_archived and not is_locked) or (is_admin and not is_locked)A developer wants to simplify this to:
(is_owner or is_admin) and not is_archived and not is_locked.
Show by truth table that these are NOT equivalent. Identify the specific
input combination that differs.
Derive the correct simplification. Show each step using the equivalence laws, citing each law.
A precomputed field user.has_delete_rights is added
to the database. It stores the value of can_delete at user
object creation time. Identify the AT4 tradeoff. Under what conditions
does has_delete_rights become stale (FM8)? Propose an
invalidation strategy.
The precomputed field is added to a database index:
WHERE has_delete_rights = TRUE. What performance benefit
does this provide? What correctness risk does it introduce? Name both
tradeoffs (AT4 and AT9).
A complete answer will: name the structure or technique chosen, state why an obvious alternative was rejected, and identify the conditions under which the choice ceases to hold.