Databass, Part 1: Queries
haskell, database
Lastmod: 2022-01-20

It's been a while since my last language series on this blog, but I figured I shouldn't let an entire calendar year go by without doing some technical writing here. This time we'll be working on creating a toy relational database in the vein of Tutorial D, as described in Databases, Types, and The Relational Model: The Third Manifesto by C.J. Date and Hugh Darwen. However, instead of creating a full database language with a its own syntax, we're going to embed the database language in Haskell. In particular, we're going to try and get ghc to ensure that queries are well typed as opposed to writing our own type checker. The source for all of this is on github.

Prior art

Tutorial D is popular among Haskell programmers due to its focus on being a mathematically correct and elegant implementation of the relational algebra, as opposed to SQL which carries the baggage of backwards compatibility and inexpressivity. To my knowledge, there exist already

This is a full-fledged Relational Database Management System based on Tutorial D that happens to be implemented in Haskell, not an EDSL.

This is an EDSL, and one that adheres vigorously to the Tutorial D specification. However, I didn't find out about it until I was deep into implementing my own version, and I wanted to keep going anyway for the learning experience. Also, it was written in 2015 for ghc-7.10, and type level programming in Haskell has come a long way since then.

A quick type level Haskell tutorial

Making a type-safe version of the relational model requires a lot of computation at the type level. All of the code in this post assumes the following extensions are enabled.

ConstraintKinds
DataKinds
FlexibleContexts
FlexibleInstances
GADTs
InstanceSigs
KindSignatures
MultiParamTypeClasses
PolyKinds
RankNTypes
ScopedTypeVariables
TypeApplications
TypeFamilies
TypeOperators
UndecidableInstances

I've arbitrarily decided that DataKinds, GADTs, and TypeFamilies deserve a little primer here, though I highly recommend the ghc user's guide for a more thorough treatment. If you're already familiar with these extensions, feel free to skip to the next section.

DataKinds

Turning on -XDataKinds "promotes" all data constructors to the type level. For data Bool = True | False ghc generates types 'True and 'False with kind Bool. Note the preceding ' which disambiguates between standard value-level booleans and the ones at the type level. This extension also gives us type level strings, called Symbol. You can use Symbol s as "tags" at the type level, like

newtype Currency (t :: Symbol) = MkCurrency Rational

usdToEur :: Currency "USD" -> Currency "EUR"
usdToEur (MkCurrency x) = MkCurrency (0.88 * x) -- on December 31, 2021

. We'll use them for database column names.

GADTs

Generalized Algebraic Data Types … generalize … normal Haskell data types. Syntactically, they are written as

data Maybe a where
  Just :: a -> Maybe a
  Nothing :: Maybe a

which has the same semantics as regular Maybe. However, we can also refine type variables appearing on the right hand side of constructors and recover them when pattern matching.

data Term a where
  Lit :: Int -> Term Int
  Add :: Term Int -> Term Int -> Term Int
  Equal :: Term Int -> Term Int -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a

Now ghc will complain if we try to call If on anything other than a Bool.

eval :: Term a -> a
eval (Lit i) = i
eval (Add x y) = eval x + eval y
eval (Equal x y) = eval x == eval y
eval (If b cons alt) = if eval b then eval cons else eval alt

We can even use constraints in GADT constructors.

data Term a where
  Lit :: Num a => a -> Term a
  Add :: Num a => Term a -> Term a -> Term a
  Equal :: Eq a => Term a -> Term a -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a

eval remains the same.

Type families

There are two kinds of type families in ghc, open and closed. We'll only be talking about the closed variant here, since that's what we're using, but it's good to know that open ones exist. At first approximation, type families are like normal Haskell functions without partial application or currying and if you could only use top level pattern matching.

type family Unsigned (a :: Type) :: Type where
  Unsigned Int8 = Word8
  Unsigned Int16 = Word16
  Unsigned Int32 = Word32
  Unsigned Int64 = Word64

We can inspect behavior in ghci by running

λ> :kind! Unsigned Int32
Unsigned Int32 :: *
= Word32

or :k! for short. A big point of departure from normal functions is that for Unsigned Bool or Unsigned a where Bool and a are not covered by the patterns, there is no pattern match failure. Instead, Unsigned Bool and Unsigned a create uninhabited (except by ) types like Void. If, instead, we do want to crash on an argument we didn't handle, we can throw a custom type error.

type family Unsigned (a :: Type) :: Type where
  Unsigned Int8 = Word8
  Unsigned Int16 = Word16
  Unsigned Int32 = Word32
  Unsigned Int64 = Word64
  Unsigned a = TypeError ('Text "Unexpected type " ':<>: 'ShowType a)

Basic relational definitions

The following is all adapted from Chapter 2 of The Third Manifesto. Some basic knowledge of SQL is assumed.

Tuples and relations

Date and Darwen define a tuple as "a set of \(n\) ordered triples of the form \(<A_i,T_i,v_i>\), where \(A_i\) is an attribute name, \(T_i\) is a type name, and \(v_i\) is a value of type \(T_i\)." A relation consists of a heading, which is essentially a tuple type, and a body, which is a set of tuples all with the same type, namely that of the heading. Tuples and relations correspond to rows and tables in SQL. Here are three relations.

╔═════════════════════════════════════════════════════════════════╗
║     S (suppliers)                          SP (suppliers-parts) ║
║    ┌────┬───────┬────────┬────────┐           ┌────┬────┬─────┐ ║
║    │ S# │ SNAME │ STATUS │ CITY   │           │ S# │ P# │ QTY │ ║
║    ├════┼───────┼────────┼────────┤           ├════┼════┼─────┤ ║
║    │ S1 │ Smith │     20 │ London │           │ S1 │ P1 │ 300 │ ║
║    │ S2 │ Jones │     10 │ Paris  │           │ S1 │ P2 │ 200 │ ║
║    │ S3 │ Blake │     30 │ Paris  │           │ S1 │ P3 │ 400 │ ║
║    │ S4 │ Clark │     20 │ London │           │ S1 │ P4 │ 200 │ ║
║    │ S5 │ Adams │     30 │ Athens │           │ S1 │ P5 │ 100 │ ║
║    └────┴───────┴────────┴────────┘           │ S1 │ P6 │ 100 │ ║
║     P (parts)                                 │ S2 │ P1 │ 300 │ ║
║    ┌────┬───────┬───────┬────────┬────────┐   │ S2 │ P2 │ 400 │ ║
║    │ P# │ PNAME │ COLOR │ WEIGHT │ CITY   │   │ S3 │ P2 │ 200 │ ║
║    ├════┼───────┼───────┼────────┼────────┤   │ S4 │ P2 │ 200 │ ║
║    │ P1 │ Nut   │ Red   │   12.0 │ London │   │ S4 │ P4 │ 300 │ ║
║    │ P2 │ Bolt  │ Green │   17.0 │ Paris  │   │ S4 │ P5 │ 400 │ ║
║    │ P3 │ Screw │ Blue  │   17.0 │ Oslo   │   └────┴────┴─────┘ ║
║    │ P4 │ Screw │ Red   │   14.0 │ London │                     ║
║    │ P5 │ Cam   │ Blue  │   12.0 │ Paris  │                     ║
║    │ P6 │ Cog   │ Red   │   19.0 │ London │                     ║
║    └────┴───────┴───────┴────────┴────────┘                     ║
╚═════════════════════════════════════════════════════════════════╝

VAR S REAL RELATION  { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { S# } ;
VAR P REAL RELATION  { P# P#, PNAME NAME, COLOR COLOR, WEIGHT WEIGHT, CITY CHAR } KEY { P# } ;
VAR SP REAL RELATION { S# S#, P# P#, QTY QTY } KEY { S#, P# } ;

In Haskell, we can represent tuple types like so (in the real code we use the Map type from the type-level-sets package. This is just for educational purposes):

-- This is like Proxy, but restricted to only having Symbol types
data Var (label :: Symbol) = Var

data Mapping k v = k :-> v

-- This binds more tightly than list cons (:), which is convenient for pattern matching
type (k :: Symbol) ::: (v :: Type) = k ':-> v
infixr 6 :::

data Tuple (attrs :: [Mapping Symbol Type]) where
  Empty :: Tuple '[]
  Ext :: Var label -> a -> Tuple as -> Tuple (label ::: a ': as)

-- Some example headings
type SHeading =
  '[ "S#" ::: Int
   , "SNAME" ::: String
   , "STATUS" ::: Int
   , "CITY" ::: String
   ]

data Color = Red | Green | Blue

type PHeading =
  '[ "P#" ::: Int
   , "PNAME" ::: String
   , "COLOR" ::: Color
   , "WEIGHT" ::: Double
   , "CITY" ::: String
   ]

type SPHeading =
  '[ "S#" ::: Int
   , "P#" ::: Int
   , "QTY" ::: Int
   ]

To construct a value of one of these tuples, we can write

sExample :: Tuple SHeading
sExample = Ext Var 1 $ Ext Var "Smith" $ Ext Var 20 $ Ext Var "London" Empty

which corresponds to

 ┌────┬───────┬────────┬────────┐
 │ S# │ SNAME │ STATUS │ CITY   │
 ├────┼───────┼────────┼────────┤
 │ S1 │ Smith │     20 │ London │
 └────┴───────┴────────┴────────┘

Relations in Tutorial D all have primary keys. A relation key is a tuple with a subset of the attributes of the relation's heading. The "supplier" relation, defined as

VAR S REAL RELATION  { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { S# } ;

in Tutorial D, has primary key S#, whereas the "supplier-parts" relation

VAR SP REAL RELATION { S# S#, P# P#, QTY QTY } KEY { S#, P# } ;

has a composite key S#, P#. To check the property that a table key be some subset of the table heading, we can use the Submap typeclass from type-level-sets. We'll also want to be able to split tuples into keys and everything else for storage and put them back together, which can be similarly accomplished with the Split and Unionable typeclasses. Packing these typeclasses into a GADT constructor ensures that those instances are in scope when we pattern match on that constructor. While we haven't added any runtime information to the constructor here, we could augment it with some kind of integrity constraints of the form of Tuple heading -> Bool that would run on all new tuples added to the relation.

data Relation heading key val =
  ( Submap key heading -- Assert the key is a subset of the heading
  , Submap val heading -- Assert the rest of the tuple is also a subset of the heading
  , Split key val heading -- Assert that we can split the heading into keys and vals
  , Unionable key val -- Assert we can stitch keys and vals together
  , Union key val ~ heading -- Assert that when we perform the stitching operation the result is the heading
  , IsMap heading -- Assert that there are no duplicates in the heading and that attributes are sorted
  )
  => MkRelation

"supplier" in this scheme is

s :: Relation
  (AsMap '["S#" ::: Int, "SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
  '["S#" ::: Int]
  (AsMap '["SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
s = MkRelation

Tutorial D stresses the order of attributes in tuples is immaterial. Unfortuantely, in Haskell, type level lists are ordered. To get around that, we use the AsMap type family to sort attributes alphabetically. Going forward, the convention for any type families that take and return [Mapping Symbol Type] is that their arguments are assumed to be in sorted order and they should ensure that they maintain sorting when they return.

It's also a bit annoying to have to specify so much redundant information in the Relation type signature. We can approximate the Tutorial D syntax with a helper type family

type family Rel (heading :: [Mapping Symbol Type]) (key :: [Symbol]) where
  Rel heading key = Relation heading (heading :!! key) (heading :\\ key)

-- | Type level key lookup
type family (m :: [Mapping Symbol Type]) :! (c :: Symbol) :: Type where
  (label ::: a ': rest) :! label = a
  (attr ': rest) :! label = rest :! label
  '[] :! label = TypeError ( 'Text "Could not find " ':<>: 'ShowType label)

-- | Type level multi-key lookup
type family (m :: [Mapping Symbol Type]) :!! (cs :: [Symbol]) :: [Mapping Symbol Type] where
  m :!! (label ': ls) = (label ::: (m :! label)) ': (m :!! ls)
  m :!! '[] = '[]

-- | Type level key removal
type family (m :: [Mapping Symbol Type]) :\ (c :: Symbol) :: [Mapping Symbol Type] where
  (label ::: a ': rest) :\ label = rest
  (attr ': rest) :\ label = attr ': (rest :\ label)
  '[] :\ label = TypeError ( 'Text "Could not find " ':<>: 'ShowType label)

-- | Type level multi-key removal
type family (m :: [Mapping Symbol Type]) :\\ (cs :: [Symbol]) :: [Mapping Symbol Type] where
  m :\\ (label ': ls) = (m :\ label) :\\ ls
  m :\\ '[] = m

Now we can write

s' :: Rel
  (AsMap '["S#" ::: Int, "SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
  '["S#"]
s' = MkRelation

To store relations in memory at runtime, we'll use the standard Map from containers.

type family RelationToMap relation where
  RelationToMap (Relation heading key val) = Map (Tuple key) (Tuple val)

We could optimize the representation a bit by using an IntMap for the common case of a single Int primary key, but in the interest of simplicity we'll forego that here.

Our database will consist of a series of named relations, for which we can reuse our existing Tuple infrastructure.

type family RelationsToDB (relations :: [Mapping Symbol Type]) where
  RelationsToDB '[] = '[]
  RelationsToDB (name ::: relation ': rest) =
    name ::: RelationToMap relation ': RelationsToDB rest

While in standard value-level Haskell we'd usually write this as relationsToDB = fmap relationToMap or something, type families don't have partial application and I'd rather not pull in something like singletons if I can get away without it.

Defining queries

Now that we can talk about tuples and relations, we can define the type of a Query acting on a set of relation s that we expect to return a relation, consisting of tuples of type t.

data Query (t :: [Mapping Symbol Type]) (relations :: [Mapping Symbol Type]) where

We'll also want some way to run the query.

import qualified Data.Map as M

runQuery :: Query t relations -> Tuple (RelationsToDB relations) -> [Tuple t]
runQuery q db = case q of
  ...

Identity

The simplest possible query is to just get the entire contents of a single named relation. It's actually quite difficult to ensure type safety for this, as we need to check that the name of the relation matches the type of the expected heading, and that indeed a relation of that name is defined at all.

  RelationId ::
    ( relation ~ Relation heading key val
    , (relations :! name) ~ relation
    , IsMember name (RelationToMap relation) (RelationsToDB relations)
    ) =>
    Var name ->
    Relation heading key val ->
    Query heading relations

RelationId (Var @"table") someRelation corresponds to the sql select * from table, assuming that we've created someRelation named "table." To run this query, we lookp the name of the relation in the database and reassemble all of the key-value pairs into the heading.

  RelationId name (MkRelation :: Relation heading key val) ->
    let relation = lookp name db
     in fmap (\(k :: Tuple key, v :: Tuple val) -> k `union` v) (M.toList relation)

This is where the trick of putting the constraints in GADT constructors comes into play. Without Unionable key val in MkRelation, ghc would complain about not being able to find a Unionable key val when calling k `union` v and without IsMember name (RelationToMap relation) (RelationsToDB relations), lookp name db would similarly fail. Note, when trying to write something like this yourself, you're probably not going to get the constraints right immediately. I certainly didn't. My recommended workflow is to write the GADT constructor with an empty set of constraints (() =>) and then when ghc tells you Couldn't deduce instance Class for Type, fill the parentheses until it typechecks.

Rename

From Chapter 2:

Let a be a relation with an attribute X and no attribute Y. Then the expression a RENAME ( X AS Y ) yields a relation that differs from a only in that the name of the specified attribute is Y instead of X.

We'll want some type family that can compute the renaming at compile time. The authors don't specify that renaming to a name already in the heading is an error (at least in Chapter 2) but we're going to make it one explicitly.

type family Rename (x :: Symbol) (y :: Symbol) (relation :: [Mapping Symbol Type]) where
  Rename a b '[] = '[]
  Rename a b ((a ::: t) ': rest) = (b ::: t) ': Rename a b rest
  Rename a b ((b ::: t) ': rest) =
    TypeError
      ( 'Text "Cannot rename "
          ':<>: 'Text a
          ':<>: 'Text " to "
          ':<>: 'Text b
          ':$$: 'Text "The name already exists in the tuple"
      )
  Rename a b (c ': rest) = c ': Rename a b rest

The Query constructor can just use the type family.

  Rename ::
    (Sortable (Rename a b t)) =>
    Var a ->
    Var b ->
    Query t relations ->
    Query (Sort (Rename a b t)) relations

Remember that we operate assuming that every heading transformation takes a sorted heading and should return a sorted heading. Renaming is not order preserving in general (you could rename "a" to "z" and then what was the first attribute would go at the end) so we have to sort the output after the operation 1. To implement renaming, we need some function

renameTuple :: Var a -> Var b -> Tuple t -> Tuple (Rename a b t)

Recall, though, that the definition of Var is just data Var (k :: Symbol) = Var, so there is no difference in runtime representation between Var :: Var "a" and Var :: Var "b". We can be confident in this assertion because ghc allows

renameVar :: Var a -> Var b
renameVar = coerce

Consequently, since Rename does no reordering and doesn't change the types of any of the tuple items, we can deduce that Tuple t also has the same runtime representation as Tuple (Rename a b t). Automatically proving that is, sadly, beyond ghc's capabilities. Having convinced ourselves, though, we can write

renameTuple _a _b = unsafeCoerce

Don't try this at home unless you really know what you're doing. Do try out implementing this function without unsafeCoerce. You'll probably want to start out with something like

class Renamable a b t where
  renameTuple :: Var a -> Var b -> Tuple t -> Tuple (Rename a b t)

instance Renameable a b '[] where
  renameTuple _ _ Empty = Empty

to help guide ghc through the induction.

Restrict

Restrict is essentially the same as SQL WHERE.

  Restrict :: (Tuple t -> Bool) -> Query t relations -> Query t relations

Refreshingly, this constructor has no constraints or other GADT shenanigans. It's even expressible without GADT syntax:

data Query t relations =
     | ...
     | Restrict (Tuple t -> Bool) (Query t relations)
     | ...

The implementation is correspondingly straightforward.

  Restrict f q' -> filter f (runQuery q' db)

Project

This is like specifying which columns to select in SQL. For select col1, col2 from table we have RelationId (Var @"table") someRelation & Project. Amazingly, if we have enough type signatures specified, ghc can infer which columns we wanted to project onto. We'll also want a type error if we try to project onto columns that don't exist. Note that projection is order preserving, unlike renaming, so we don't have to do any extra sorting.

  Project :: (Submap t' t) => Query t relations -> Query t' relations

Embedding the Submap t' t constraint basically amounts to the whole implementation.

  Project q' -> map submap (runQuery q' db)

Extend

From the book

Let a be a relation. Then the extension EXTEND a ADD ( exp AS Z ) is a relation with

  • A heading consisting of the heading of a extended with the attribute Z

  • A body consisting of all tuples t such that t is a tuple of a extended with a value for attribute Z that is computed by evaluating exp on that tuple of a

Relation a must not have an attribute called Z, and exp must not refer to Z.

Here is a simple example of EXTEND: EXTEND S ADD ( 3 * STATUS AS TRIPLE )

Extend doesn't have a great SQL analogue that I know of. The closest construct is probably something like SELECT a + 1 from table where you put some expression after the SELECT.

  Extend ::
    (Member l t ~ 'False, Sortable (l ::: a ': t)) =>
    Var l ->
    (Tuple t -> a) ->
    Query t relations ->
    Query (Sort (l ::: a ': t)) relations

Given some function that takes a label and the existing tuple type, we stick the result of calling that function on every tuple in the relation into a new attribute with the label.

  Extend label f q' -> map (\t -> quicksort (Ext label (f t) t)) (runQuery q' db)

Join

The last type of query we'll implement here is Join, corresponding to the "natural join" which takes all common attributes of two relations and returns all tuples that have the same values for those attributes. If we want something like SQL SELECT x, y FROM t1 JOIN t2 ON t1.col1 = t2.col2, we can just rename col2 to col1 in one of the relations before the join. type-level-sets doesn't provide a way of computing the intersection of two mapping lists, so we'll have to write it ourselves. Let's look at what this function looks like at the value level.

sortedIntersection :: Ord a => [a] -> [a] -> [a]
sortedIntersection t [] = []
sortedIntersection [] t = []
sortedIntersection (x:xs) (y:ys)
  | x == y     = x : sortedIntersection xs ys
  | x < y      = sortedIntersection xs (y:ys)
  | otherwise  = sortedIntersection (x:xs) ys

-- >>> sortedIntersection [1..5] [5..10]
-- [5]

Now we can convert this to the type level. Another difference between type families and functions is that when we bind a type variable multiple times in a pattern, we assert that it refers to the same type in both locations, whereas binding a variable twice in a function pattern is an error. Also, type families can only branch at top level patterns, so we'll have to make a helper type family to store the result of comparing labels.

type family Intersection (t :: [Mapping Symbol Type]) (t' :: [Mapping Symbol Type]) :: [Mapping Symbol Type] where
  Intersection t '[] = '[]
  Intersection '[] t = '[]
  Intersection (a ': as) (a ': bs) = a ': Intersection as bs
  Intersection (l ::: a ': as) (r ::: b ': bs) =
    IntersectionCase (CmpSymbol l r) l r a as b bs

type family IntersectionCase (ordering :: Ordering) l r a as b bs where
  IntersectionCase 'LT l r a as b bs = Intersection as (r ::: b ': bs)
  IntersectionCase 'GT l r a as b bs = Intersection (l ::: a ': as) bs
  IntersectionCase 'EQ l r a as a bs = l ::: a ': Intersection as bs
  IntersectionCase 'EQ l r a as b bs =
    TypeError
      ( 'Text "Cannot join on attribute '"
          ':<>: 'Text l
          ':<>: 'Text "'"
          ':$$: 'Text l
          ':<>: 'Text "has type "
          ':<>: 'ShowType a
          ':<>: 'Text " in the first relation and type "
          ':<>: 'ShowType b
          ':<>: 'Text "in the second"
      )

-- >>> :kind! Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Int]
-- Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Int] :: [Mapping Symbol *]
-- = '[ "id" ':-> Int]

-- >>> :kind! Intersection '["id" ::: Int, "name" ::: String] '["id" ::: word]
-- Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Word] :: [Mapping Symbol *]
-- = (TypeError ...)

It's kind of nice to be able to "crash" in type families with a custom TypeError and not feel bad about it like when we write error in a normal function. Anyway, now that we can calculate which attributes we want to join on, we can attempt to write down the full Join constructor.

  Join ::
    ( common ~ Intersection t' t
    , Eq (Tuple common)
    , Split common t'_rest t'
    , Split common t_rest t
    , Sortable (common :++ (t'_rest :++ t_rest))
    ) =>
    Query t' tables ->
    Query t tables ->
    Query (Sort (common :++ (t'_rest :++ t_rest))) tables

However, we get a big type error.

    • Could not deduce: Sort (common :++ (t'_rest0 :++ t_rest0))
                        ~ Sort (common :++ (t'_rest :++ t_rest))
      from the context: (common ~ Intersection t' t, Eq (Tuple common),
                         Split common t'_rest t', Split common t_rest t,
                         Sortable (common :++ (t'_rest :++ t_rest)))
        bound by the type of the constructor ‘Join’:
                   forall (common :: [Mapping Symbol *]) (t' :: [Mapping Symbol *])
                          (t :: [Mapping Symbol *]) (t'_rest :: [Mapping Symbol *])
                          (t_rest :: [Mapping Symbol *]) (tables :: [Mapping Symbol *]).
                   (common ~ Intersection t' t, Eq (Tuple common),
                    Split common t'_rest t', Split common t_rest t,
                    Sortable (common :++ (t'_rest :++ t_rest))) =>
                   Query t' tables
                   -> Query t tables
                   -> Query (Sort (common :++ (t'_rest :++ t_rest))) tables
        at /Users/josephmorag/Projects/databass/src/Databass/Blog.hs:(127,3)-(138,57)
      Expected type: Query t' tables
                     -> Query t tables
                     -> Query (Sort (common :++ (t'_rest :++ t_rest))) tables
        Actual type: Query t' tables
                     -> Query t tables
                     -> Query (Sort (common :++ (t'_rest0 :++ t_rest0))) tables
      NB: ‘Sort’ is a non-injective type family
      The type variables ‘t'_rest0’, ‘t_rest0’ are ambiguous
    • In the ambiguity check for ‘Join’
      To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
      In the definition of data constructor ‘Join’
      In the data type declaration for ‘Query’

While we could solve this by enabling the AllowAmbiguousTypes extension, as helpfully suggested in the error message, we'll need to have access to the t_rest and t'rest type variables inside the implementation, like we did in the RelationId case. I don't actually know how to introduce them into the scope of the function with AllowAmbiguousTypes turned on, so instead, we'll leave it off and pass those type arguments inside Proxy s so we can match on them. If anyone does know how to accomplish that without Proxy please let me know.

  Join ::
    ( common ~ Intersection t' t
    , Eq (Tuple common)
    , Split common t'_rest t'
    , Split common t_rest t
    , Sortable (common :++ (t'_rest :++ t_rest))
    ) =>
    Proxy t'_rest ->
    Proxy t_rest ->
    Query t' tables ->
    Query t tables ->
    Query (Sort (common :++ (t'_rest :++ t_rest))) tables

For the implementation, we'll write a "nested inner loop" join, though it doesn't look too much like a C-style nested loop when written using the list monad like this.

  Join (_ :: Proxy t_l_rest) (_ :: Proxy t_r_rest) q1 q2 -> do
    l :: Tuple t_l <- runQuery q1 db
    r :: Tuple t_r <- runQuery q2 db
    let (l_common, l_rest) = split @(Intersection t_l t_r) @t_l_rest l
        (r_common, r_rest) = split @(Intersection t_l t_r) @t_r_rest r
    guard (l_common == r_common)
    pure (quicksort $ append l_common (append l_rest r_rest))

Conclusion

There are a few more relational operations described in Tutorial D that we haven't covered here, including Summarize, Group, and Ungroup, but what we have is good enough to cover most SQL queries that I've ever written. In the next parts of the series we'll go over populating the database and try to use it for a toy server.


1

The Sort type family and Sortable class use the canonical Haskell quicksort implementation that gets shown to beginners to demonstrate how elegant the language is. Since we know we're inserting one element into an otherwise ordered list, we should just do that in linear time. The necessary type family and typeclass are left as an exercise to the reader.