The Computing Series

Logical Equivalence

Introduction

In 1979, IBM’s System R — one of the first relational database systems — introduced a component that would define database architecture for the next four decades: the query optimizer.

The optimizer’s core insight was this: the SQL query a programmer writes is not the query that gets executed. The optimizer rewrites it. The rewritten query and the original query must return identical results — they are logically equivalent. But the rewritten query can be orders of magnitude faster because it exploits indexes, avoids full table scans, and minimizes intermediate result sizes.

Every major database today — PostgreSQL, MySQL, Oracle, SQL Server — runs a query optimizer. Every time you execute a SQL query, the optimizer performs logical equivalence checking. It asks: “Is this expression equivalent to that one?” If yes, and if the second form is cheaper to execute, it substitutes.

The optimizer is an automated logical equivalence checker.

This is not a trick specific to databases. The same principle appears in compilers (constant folding, dead code elimination), hardware synthesis (circuit minimization), and type systems (proof checking). Anywhere that two representations of a computation can be shown to always produce the same result, equivalence enables substitution — and substitution enables optimization.


Thread Activation

Thread 10 (Encoding) originates here.

Encoding is the principle that the same information can be represented in multiple forms. The forms are interchangeable (equivalent), but different forms have different costs for different operations.

The arrow is: T10 (Encoding) origin → same meaning, different form (Ch 4) → equivalent data representations (Book 2, Ch 12).

The concept here is the logical foundation of encoding: two expressions that always evaluate identically are interchangeable. Which form you use is an engineering decision, not a correctness decision. This exact principle — same meaning, different form, choose based on cost — recurs in data serialization (JSON vs. Protocol Buffers), data encoding (UTF-8 vs. UTF-32), and distributed representation (sharded vs. replicated data).


The Concept

Two logical expressions are logically equivalent if they have the same truth value for every possible assignment of truth values to their variables.

Notation: P ≡ Q means P and Q are logically equivalent.

Equivalence is stronger than equality of truth values in a single case. It means: there is no input combination that makes one True and the other False.

Three categories of logical expressions:

  • Tautology — True for every assignment of variables. Also written ⊤ or T.
  • Contradiction (also called unsatisfiable) — False for every assignment. Also written ⊥ or F.
  • Contingency — True for some assignments, False for others.

A tautology and any other tautology are logically equivalent. A contradiction and any other contradiction are logically equivalent. Two contingencies are equivalent only if their truth tables match exactly.


How It Works

Establishing Equivalence by Truth Table

The mechanical method: build truth tables for both expressions. Compare the output columns row by row. If the columns are identical, the expressions are equivalent.

Example: Show that P → Q is equivalent to NOT P OR Q.

First, define the conditional P → Q (“if P then Q”): it is False only when P is True and Q is False. In all other cases, it is True.

P Q P → Q NOT P NOT P OR Q
True True True False True
True False False False False
False True True True True
False False True True True

The P → Q column and the NOT P OR Q column are identical. The expressions are equivalent.

This equivalence is important in programming. The conditional “if authenticated then process” is equivalent to “either not authenticated, or process.” The second form is useful when you want to express the contrapositive or when negation-normal form is required.

Key Equivalence Laws

These laws are the algebraic identities of boolean logic. They are proven by truth table and used as rewrite rules.

Identity Laws

  • P AND True ≡ P
  • P OR False ≡ P

Domination Laws

  • P AND False ≡ False
  • P OR True ≡ True

Idempotent Laws

  • P AND P ≡ P
  • P OR P ≡ P

Double Negation

  • NOT (NOT P) ≡ P

Commutative Laws

  • P AND Q ≡ Q AND P
  • P OR Q ≡ Q OR P

Associative Laws

  • (P AND Q) AND R ≡ P AND (Q AND R)
  • (P OR Q) OR R ≡ P OR (Q OR R)

Distributive Laws

  • P AND (Q OR R) ≡ (P AND Q) OR (P AND R)
  • P OR (Q AND R) ≡ (P OR Q) AND (P OR R)

De Morgan’s Laws (from Chapter 2)

  • NOT (P AND Q) ≡ NOT P OR NOT Q
  • NOT (P OR Q) ≡ NOT P AND NOT Q
Architecture diagram

De Morgan’s law: both expressions shade the same region — everything outside the intersection of A and B.

Absorption Laws

  • P AND (P OR Q) ≡ P
  • P OR (P AND Q) ≡ P

Complement Laws

  • P AND NOT P ≡ False (contradiction)
  • P OR NOT P ≡ True (tautology — Law of Excluded Middle)

Each of these can be verified by truth table. They are universally applicable substitution rules.

Using Equivalence Laws to Simplify Expressions

Example: Simplify (P AND Q) OR (P AND NOT Q).

Step 1: Factor out P using the Distributive Law.

(P AND Q) OR (P AND NOT Q)
= P AND (Q OR NOT Q)        [Distributive: P AND (Q OR NOT Q)]

Step 2: Apply the Complement Law.

= P AND True               [Q OR NOT Q ≡ True]

Step 3: Apply the Identity Law.

= P                        [P AND True ≡ P]

The expression (P AND Q) OR (P AND NOT Q) is equivalent to P. Regardless of Q’s value, the whole expression reduces to P’s value.

In programming:

# Original: complex condition
def should_notify(is_subscribed: bool, prefers_email: bool) -> bool:
    return (is_subscribed and prefers_email) or (is_subscribed and not prefers_email)

# Simplified: logically equivalent, clearer
def should_notify_simplified(is_subscribed: bool, prefers_email: bool) -> bool:
    # (subscribed AND email) OR (subscribed AND NOT email) ≡ subscribed
    # The email preference is irrelevant — notify any subscribed user
    return is_subscribed

The simplified version makes the actual logic visible. The original version hides it. Both are correct. The simplification is a logical equivalence transformation.

Normal Forms

Two canonical forms for expressing any boolean expression:

Conjunctive Normal Form (CNF): A conjunction (AND) of disjunctions (OR). Every clause is a group of variables combined with OR; the clauses are combined with AND.

(P OR Q OR NOT R) AND (NOT P OR R) AND (Q OR R)

CNF is the input format for SAT solvers. Converting any boolean expression to CNF is always possible.

Disjunctive Normal Form (DNF): A disjunction (OR) of conjunctions (AND). Every term is a group of variables combined with AND; the terms are combined with OR. This is the form produced by the truth table construction from Chapter 3.

(P AND Q AND NOT R) OR (NOT P AND R) OR (P AND NOT Q AND R)

DNF is the form produced by reading True rows from a truth table. Converting any boolean expression to DNF is always possible.

Any expression has equivalent CNF and DNF forms. The CNF and DNF forms are themselves logically equivalent to each other and to the original expression. They differ only in structure, not in meaning.

def convert_to_dnf_from_truth_table(
    variables: list[str],
    truth_table: list[tuple[tuple[bool, ...], bool]]
) -> str:
    """
    Mechanically convert any truth table to a DNF expression.
    This always produces a logically equivalent expression.
    """
    minterms = []
    for assignment, output in truth_table:
        if not output:
            continue  # DNF uses only True-output rows

        # Build minterm: AND of each variable (negated if False)
        literals = []
        for var, value in zip(variables, assignment):
            literals.append(var if value else f"NOT {var}")
        minterms.append("(" + " AND ".join(literals) + ")")

    return " OR ".join(minterms) if minterms else "FALSE"

Tautologies and Contradictions in Code

A tautology in code is a condition that is always True. A contradiction is a condition that is always False. Both represent logic errors — you are checking something that cannot vary.

# Tautology: always True — this check is useless
def validate_count(count: int) -> bool:
    return count >= 0 or count < 0  # P OR NOT P ≡ True

# Contradiction: always False — this check always fails
def can_be_both(flag: bool) -> bool:
    return flag and not flag  # P AND NOT P ≡ False

# Real-world tautology that looks plausible
def should_log(level: str) -> bool:
    # If ERROR is the only level, this is always True — AT3 violation
    return level == "ERROR" or level != "ERROR"

Static analysis tools and type checkers detect tautologies and contradictions. A tautology in a conditional means the code always takes one branch — the other is dead code. A contradiction means the code never takes that branch. Both signal logic errors.

The SQL Query Optimizer as an Equivalence Checker

A SQL query passes through the optimizer as a tree of relational algebra operations. The optimizer applies equivalence transformations to this tree:

Predicate pushdown: Move WHERE conditions as close to the data source as possible.

-- Original
SELECT * FROM orders JOIN users ON orders.user_id = users.id
WHERE users.country = 'US'

-- Equivalent (after pushdown): filter users before the join
SELECT * FROM (SELECT * FROM users WHERE country = 'US') u
JOIN orders ON orders.user_id = u.id

Both queries return identical results for any data. They are logically equivalent. The optimizer chooses the second form because it reduces the number of rows in the join.

Constant folding:

WHERE 1 = 1 AND status = 'active'
-- Equivalent to:
WHERE status = 'active'
-- Because (True AND P) ≡ P (Identity Law)

Index selection: The optimizer knows that WHERE a = 5 AND b = 3 is equivalent to WHERE b = 3 AND a = 5 (commutative law). It chooses whichever variable has a better index and evaluates that condition first.

Every one of these is a logical equivalence transformation, applied automatically, invisibly, to every query you write.


Tradeoffs

AT4 — Precomputation vs. On-Demand

Logical equivalence enables a fundamental tradeoff: precompute the result of an expression into a simpler form, or evaluate the full expression on demand.

The SQL optimizer precomputes a simplified form of the query. It pays a cost at planning time (the optimization itself) to reduce cost at execution time (fewer operations per row). This is AT4 applied to query evaluation.

In code:

# On-demand: evaluate the full expression every time
def is_eligible_for_discount(user: User, cart: Cart) -> bool:
    return (
        user.account_age_days > 365 and
        user.total_purchases > 500 and
        cart.total > 50 and
        not user.discount_used_this_month
    )

# Precomputed: store an intermediate result
# Trade: cache staleness (FM4) vs. reduced evaluation cost
class User:
    def __init__(self, ...):
        ...
        # Precomputed at user update time — stays valid until account data changes
        self.is_loyal_customer = (
            self.account_age_days > 365 and self.total_purchases > 500
        )

def is_eligible_for_discount_fast(user: User, cart: Cart) -> bool:
    # Uses precomputed result — fewer operations per call
    return user.is_loyal_customer and cart.total > 50 and not user.discount_used_this_month

The second version is logically equivalent to the first for current data. The precomputed is_loyal_customer is a stored truth value. AT4 makes the tradeoff explicit: faster evaluation, but is_loyal_customer can become stale if account data changes without recomputing it. This is the same temporal contract problem from Chapter 1.

# Correct AT4 implementation: invalidation on change
class User:
    def __init__(self, account_age_days: int, total_purchases: float):
        self._account_age_days = account_age_days
        self._total_purchases = total_purchases
        self._is_loyal_customer = self._compute_loyalty()

    def _compute_loyalty(self) -> bool:
        # Single source of truth for loyalty computation
        return self._account_age_days > 365 and self._total_purchases > 500

    @property
    def account_age_days(self) -> int:
        return self._account_age_days

    @account_age_days.setter
    def account_age_days(self, value: int):
        self._account_age_days = value
        self._is_loyal_customer = self._compute_loyalty()  # recompute on change

    @property
    def is_loyal_customer(self) -> bool:
        return self._is_loyal_customer

The precomputed value is always consistent with the underlying data because it is recomputed on change. The AT4 tradeoff is preserved: fast reads, slightly more work on writes.


Where It Fails

FM8 — Schema/Contract Violation

Two expressions believed to be equivalent that are not.

This happens in three ways:

1. Incorrect mental application of equivalence laws.

# Developer believes these are equivalent
# Original: "process if admin OR (editor AND not read_only)"
def can_edit_original(is_admin: bool, is_editor: bool, is_read_only: bool) -> bool:
    return is_admin or (is_editor and not is_read_only)

# "Simplified" version: pulls out NOT incorrectly
def can_edit_wrong(is_admin: bool, is_editor: bool, is_read_only: bool) -> bool:
    # Developer thinks: factor out "not is_read_only" from the whole expression
    # This is NOT the same as the original
    return (is_admin or is_editor) and not is_read_only

# Test to reveal the difference
# Inputs: is_admin=False, is_editor=False, is_read_only=False
print(can_edit_original(False, False, False))   # False (not editor, not admin)
print(can_edit_wrong(False, False, False))       # False (same)

# Inputs: is_admin=True, is_editor=False, is_read_only=True
print(can_edit_original(True, False, True))     # True (admin overrides)
print(can_edit_wrong(True, False, True))         # True AND False = False (WRONG)

The second version is not equivalent to the first. An admin with is_read_only=True is incorrectly blocked. The “simplification” introduced a bug.

2. Equivalence that holds for most data but not all.

SQL query rewrites can fail when NULL values are involved. Classical boolean equivalence does not hold in three-valued SQL logic. The optimizer must account for NULL separately.

-- These are NOT equivalent in SQL due to NULL:
WHERE NOT (status = 'active' AND type = 'premium')
-- vs.
WHERE status != 'active' OR type != 'premium'

-- If status IS NULL:
-- First expression: NOT (NULL AND ...) = NOT NULL = NULL (unknown)
-- Second expression: NULL OR ... = NULL or the other side
-- They may evaluate differently for NULL inputs

3. Equivalence that depends on evaluation order (non-commutative systems).

In databases with row-level security, predicates can produce different results depending on which user context they are evaluated in. Two expressions equivalent in one security context may not be equivalent in another.


Real Systems

Compiler Optimization — compilers apply equivalence transformations at every stage. Constant folding (2 + 35), dead code elimination (removing code after an unconditional return), and strength reduction (replacing x * 2 with x << 1) are all logical equivalence transformations. The compiled code is semantically identical to the source; it is just in a cheaper form.

Circuit Minimization (Karnaugh Maps) — in digital circuit design, a Karnaugh map is a visual tool for finding the simplest boolean expression equivalent to a given truth table. Minimizing the expression minimizes the number of logic gates. Fewer gates means less chip area, less power consumption, and lower propagation delay. The Karnaugh map is a human-readable equivalence checker.

Theorem Provers and Proof Assistants — Coq, Lean, and Isabelle are software tools that mechanically verify logical equivalences. They are used to prove that software implementations match their specifications. “This sorting algorithm always produces a sorted output” is a logical statement. The theorem prover checks it against all inputs by symbolic reasoning — not by running test cases.

Constraint Propagation in SAT Solvers — modern SAT solvers (Z3, MiniSAT) maintain a formula in CNF and repeatedly apply unit propagation, a logical equivalence rule: if a clause contains a single unresolved literal, that literal must be True (otherwise the clause is False, making the whole formula False). Unit propagation simplifies the formula by substituting the forced value, producing an equivalent formula with fewer variables.

Python’s ast Module and Code Rewriting — Python’s abstract syntax tree (AST) allows programs to be inspected and rewritten before execution. Libraries like numexpr and sympy parse expressions, apply equivalence transformations, and produce optimized forms. sympy.simplify() takes a symbolic expression and returns a logically equivalent simpler one, using algebraic equivalence rules.


Concept: Logical Equivalence

Core Idea: Two logical expressions are equivalent if they produce identical truth values for every input. Equivalence enables safe substitution — replace any expression with a simpler equivalent without changing behavior. This is the foundation of every optimizer, compiler, and simplifier in computing.

Tradeoff: AT4 — Precomputation vs. On-Demand (equivalence allows precomputing results; the cost is staleness when the underlying data changes)

Failure Mode: FM8 — Schema/Contract Violation (two expressions believed equivalent but not — usually from incorrect manual application of equivalence laws or NULL-blindness in SQL)

Signal: When a “simplified” boolean condition produces unexpected behavior, or when a precomputed value diverges from its freshly-computed equivalent — verify equivalence by truth table before trusting a manual simplification.


Exercises

Level 1 — Understand

  1. Classify each expression as a tautology, contradiction, or contingency. Justify with a truth table or by citing an applicable law.

    • P OR NOT P
    • P AND NOT P
    • (P AND Q) OR (NOT P AND Q) OR (P AND NOT Q) OR (NOT P AND NOT Q)
    • P AND (NOT P OR Q)
  2. State the Distributive Law (both forms). Verify one form with a two-variable truth table where R = True.

  3. What is the difference between CNF and DNF? Give an example of one expression in each form.

Level 2 — Apply

Show that these pairs of expressions are logically equivalent using the algebraic laws (cite each law you use). Do not use truth tables.

  1. NOT (P AND Q) AND NOT (P AND NOT Q) simplifies to NOT P.

  2. (P OR Q) AND (P OR NOT Q) simplifies to P.

  3. Verify your answers by building truth tables.

Level 3 — Design

A permissions system has this rule:

def can_delete(is_owner: bool, is_admin: bool, is_archived: bool, is_locked: bool) -> bool:
    return (is_owner and not is_archived and not is_locked) or (is_admin and not is_locked)
  1. A developer wants to simplify this to: (is_owner or is_admin) and not is_archived and not is_locked. Show by truth table that these are NOT equivalent. Identify the specific input combination that differs.

  2. Derive the correct simplification. Show each step using the equivalence laws, citing each law.

  3. A precomputed field user.has_delete_rights is added to the database. It stores the value of can_delete at user object creation time. Identify the AT4 tradeoff. Under what conditions does has_delete_rights become stale (FM8)? Propose an invalidation strategy.

  4. The precomputed field is added to a database index: WHERE has_delete_rights = TRUE. What performance benefit does this provide? What correctness risk does it introduce? Name both tradeoffs (AT4 and AT9).

A complete answer will: name the structure or technique chosen, state why an obvious alternative was rejected, and identify the conditions under which the choice ceases to hold.

Read in the book →