It's been a while since my last language series on this blog, but I figured I shouldn't let an entire calendar year go by without doing some technical writing here. This time we'll be working on creating a toy relational database in the vein of Tutorial D, as described in Databases, Types, and The Relational Model: The Third Manifesto by C.J. Date and Hugh Darwen. However, instead of creating a full database language with a its own syntax, we're going to embed the database language in Haskell. In particular, we're going to try and get ghc to ensure that queries are well typed as opposed to writing our own type checker. The source for all of this is on github.
Prior art
Tutorial D is popular among Haskell programmers due to its focus on being a mathematically correct and elegant implementation of the relational algebra, as opposed to SQL which carries the baggage of backwards compatibility and inexpressivity. To my knowledge, there exist already
This is a full-fledged Relational Database Management System based on Tutorial D that happens to be implemented in Haskell, not an EDSL.
This is an EDSL, and one that adheres vigorously to the Tutorial D specification. However, I didn't find out about it until I was deep into implementing my own version, and I wanted to keep going anyway for the learning experience. Also, it was written in 2015 for ghc-7.10, and type level programming in Haskell has come a long way since then.
A quick type level Haskell tutorial
Making a type-safe version of the relational model requires a lot of computation at the type level. All of the code in this post assumes the following extensions are enabled.
ConstraintKinds
DataKinds
FlexibleContexts
FlexibleInstances
GADTs
InstanceSigs
KindSignatures
MultiParamTypeClasses
PolyKinds
RankNTypes
ScopedTypeVariables
TypeApplications
TypeFamilies
TypeOperators
UndecidableInstances
I've arbitrarily decided that DataKinds
, GADTs
, and TypeFamilies
deserve a little primer here, though I highly recommend the ghc user's guide for a more thorough treatment. If you're already familiar with these extensions, feel free to skip to the next section.
DataKinds
Turning on -XDataKinds
"promotes" all data constructors to the type level. For data Bool = True | False
ghc generates types 'True
and 'False
with kind Bool
. Note the preceding '
which disambiguates between standard value-level booleans and the ones at the type level. This extension also gives us type level strings, called Symbol
. You can use Symbol
s as "tags" at the type level, like
newtype Currency (t :: Symbol) = MkCurrency Rational
usdToEur :: Currency "USD" -> Currency "EUR"
usdToEur (MkCurrency x) = MkCurrency (0.88 * x) -- on December 31, 2021
. We'll use them for database column names.
GADTs
Generalized Algebraic Data Types … generalize … normal Haskell data types. Syntactically, they are written as
data Maybe a where
Just :: a -> Maybe a
Nothing :: Maybe a
which has the same semantics as regular Maybe
. However, we can also refine type variables appearing on the right hand side of constructors and recover them when pattern matching.
data Term a where
Lit :: Int -> Term Int
Add :: Term Int -> Term Int -> Term Int
Equal :: Term Int -> Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
Now ghc will complain if we try to call If
on anything other than a Bool
.
eval :: Term a -> a
eval (Lit i) = i
eval (Add x y) = eval x + eval y
eval (Equal x y) = eval x == eval y
eval (If b cons alt) = if eval b then eval cons else eval alt
We can even use constraints in GADT constructors.
data Term a where
Lit :: Num a => a -> Term a
Add :: Num a => Term a -> Term a -> Term a
Equal :: Eq a => Term a -> Term a -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
eval
remains the same.
Type families
There are two kinds of type families in ghc, open and closed. We'll only be talking about the closed variant here, since that's what we're using, but it's good to know that open ones exist. At first approximation, type families are like normal Haskell functions without partial application or currying and if you could only use top level pattern matching.
type family Unsigned (a :: Type) :: Type where
Unsigned Int8 = Word8
Unsigned Int16 = Word16
Unsigned Int32 = Word32
Unsigned Int64 = Word64
We can inspect behavior in ghci by running
λ> :kind! Unsigned Int32
Unsigned Int32 :: *
= Word32
or :k!
for short. A big point of departure from normal functions is that for Unsigned Bool
or Unsigned a
where Bool
and a
are not covered by the patterns, there is no pattern match failure. Instead, Unsigned Bool
and Unsigned a
create uninhabited (except by ⟂
) types like Void
. If, instead, we do want to crash on an argument we didn't handle, we can throw a custom type error.
type family Unsigned (a :: Type) :: Type where
Unsigned Int8 = Word8
Unsigned Int16 = Word16
Unsigned Int32 = Word32
Unsigned Int64 = Word64
Unsigned a = TypeError ('Text "Unexpected type " ':<>: 'ShowType a)
Basic relational definitions
The following is all adapted from Chapter 2 of The Third Manifesto. Some basic knowledge of SQL is assumed.
Tuples and relations
Date and Darwen define a tuple as "a set of n ordered triples of the form <Ai,Ti,vi>, where Ai is an attribute name, Ti is a type name, and vi is a value of type Ti." A relation consists of a heading, which is essentially a tuple type, and a body, which is a set of tuples all with the same type, namely that of the heading. Tuples and relations correspond to rows and tables in SQL. Here are three relations.
╔═════════════════════════════════════════════════════════════════╗
║ S (suppliers) SP (suppliers-parts) ║
║ ┌────┬───────┬────────┬────────┐ ┌────┬────┬─────┐ ║
║ │ S# │ SNAME │ STATUS │ CITY │ │ S# │ P# │ QTY │ ║
║ ├════┼───────┼────────┼────────┤ ├════┼════┼─────┤ ║
║ │ S1 │ Smith │ 20 │ London │ │ S1 │ P1 │ 300 │ ║
║ │ S2 │ Jones │ 10 │ Paris │ │ S1 │ P2 │ 200 │ ║
║ │ S3 │ Blake │ 30 │ Paris │ │ S1 │ P3 │ 400 │ ║
║ │ S4 │ Clark │ 20 │ London │ │ S1 │ P4 │ 200 │ ║
║ │ S5 │ Adams │ 30 │ Athens │ │ S1 │ P5 │ 100 │ ║
║ └────┴───────┴────────┴────────┘ │ S1 │ P6 │ 100 │ ║
║ P (parts) │ S2 │ P1 │ 300 │ ║
║ ┌────┬───────┬───────┬────────┬────────┐ │ S2 │ P2 │ 400 │ ║
║ │ P# │ PNAME │ COLOR │ WEIGHT │ CITY │ │ S3 │ P2 │ 200 │ ║
║ ├════┼───────┼───────┼────────┼────────┤ │ S4 │ P2 │ 200 │ ║
║ │ P1 │ Nut │ Red │ 12.0 │ London │ │ S4 │ P4 │ 300 │ ║
║ │ P2 │ Bolt │ Green │ 17.0 │ Paris │ │ S4 │ P5 │ 400 │ ║
║ │ P3 │ Screw │ Blue │ 17.0 │ Oslo │ └────┴────┴─────┘ ║
║ │ P4 │ Screw │ Red │ 14.0 │ London │ ║
║ │ P5 │ Cam │ Blue │ 12.0 │ Paris │ ║
║ │ P6 │ Cog │ Red │ 19.0 │ London │ ║
║ └────┴───────┴───────┴────────┴────────┘ ║
╚═════════════════════════════════════════════════════════════════╝
VAR S REAL RELATION { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { S# } ;
VAR P REAL RELATION { P# P#, PNAME NAME, COLOR COLOR, WEIGHT WEIGHT, CITY CHAR } KEY { P# } ;
VAR SP REAL RELATION { S# S#, P# P#, QTY QTY } KEY { S#, P# } ;
In Haskell, we can represent tuple types like so (in the real code we use the Map
type from the type-level-sets package. This is just for educational purposes):
-- This is like Proxy, but restricted to only having Symbol types
data Var (label :: Symbol) = Var
data Mapping k v = k :-> v
-- This binds more tightly than list cons (:), which is convenient for pattern matching
type (k :: Symbol) ::: (v :: Type) = k ':-> v
infixr 6 :::
data Tuple (attrs :: [Mapping Symbol Type]) where
Empty :: Tuple '[]
Ext :: Var label -> a -> Tuple as -> Tuple (label ::: a ': as)
-- Some example headings
type SHeading =
'[ "S#" ::: Int
, "SNAME" ::: String
, "STATUS" ::: Int
, "CITY" ::: String
]
data Color = Red | Green | Blue
type PHeading =
'[ "P#" ::: Int
, "PNAME" ::: String
, "COLOR" ::: Color
, "WEIGHT" ::: Double
, "CITY" ::: String
]
type SPHeading =
'[ "S#" ::: Int
, "P#" ::: Int
, "QTY" ::: Int
]
To construct a value of one of these tuples, we can write
sExample :: Tuple SHeading
sExample = Ext Var 1 $ Ext Var "Smith" $ Ext Var 20 $ Ext Var "London" Empty
which corresponds to
┌────┬───────┬────────┬────────┐
│ S# │ SNAME │ STATUS │ CITY │
├────┼───────┼────────┼────────┤
│ S1 │ Smith │ 20 │ London │
└────┴───────┴────────┴────────┘
Relations in Tutorial D all have primary keys. A relation key is a tuple with a subset of the attributes of the relation's heading. The "supplier" relation, defined as
VAR S REAL RELATION { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { S# } ;
in Tutorial D, has primary key S#
, whereas the "supplier-parts" relation
VAR SP REAL RELATION { S# S#, P# P#, QTY QTY } KEY { S#, P# } ;
has a composite key S#, P#
. To check the property that a table key be some subset of the table heading, we can use the Submap
typeclass from type-level-sets
. We'll also want to be able to split tuples into keys and everything else for storage and put them back together, which can be similarly accomplished with the Split
and Unionable
typeclasses. Packing these typeclasses into a GADT
constructor ensures that those instances are in scope when we pattern match on that constructor. While we haven't added any runtime information to the constructor here, we could augment it with some kind of integrity constraints of the form of Tuple heading -> Bool
that would run on all new tuples added to the relation.
data Relation heading key val =
( Submap key heading -- Assert the key is a subset of the heading
, Submap val heading -- Assert the rest of the tuple is also a subset of the heading
, Split key val heading -- Assert that we can split the heading into keys and vals
, Unionable key val -- Assert we can stitch keys and vals together
, Union key val ~ heading -- Assert that when we perform the stitching operation the result is the heading
, IsMap heading -- Assert that there are no duplicates in the heading and that attributes are sorted
)
=> MkRelation
"supplier" in this scheme is
s :: Relation
(AsMap '["S#" ::: Int, "SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
'["S#" ::: Int]
(AsMap '["SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
s = MkRelation
Tutorial D stresses the order of attributes in tuples is immaterial. Unfortuantely, in Haskell, type level lists are ordered. To get around that, we use the AsMap
type family to sort attributes alphabetically. Going forward, the convention for any type families that take and return [Mapping Symbol Type]
is that their arguments are assumed to be in sorted order and they should ensure that they maintain sorting when they return.
It's also a bit annoying to have to specify so much redundant information in the Relation
type signature. We can approximate the Tutorial D syntax with a helper type family
type family Rel (heading :: [Mapping Symbol Type]) (key :: [Symbol]) where
Rel heading key = Relation heading (heading :!! key) (heading :\\ key)
-- | Type level key lookup
type family (m :: [Mapping Symbol Type]) :! (c :: Symbol) :: Type where
(label ::: a ': rest) :! label = a
(attr ': rest) :! label = rest :! label
'[] :! label = TypeError ( 'Text "Could not find " ':<>: 'ShowType label)
-- | Type level multi-key lookup
type family (m :: [Mapping Symbol Type]) :!! (cs :: [Symbol]) :: [Mapping Symbol Type] where
m :!! (label ': ls) = (label ::: (m :! label)) ': (m :!! ls)
m :!! '[] = '[]
-- | Type level key removal
type family (m :: [Mapping Symbol Type]) :\ (c :: Symbol) :: [Mapping Symbol Type] where
(label ::: a ': rest) :\ label = rest
(attr ': rest) :\ label = attr ': (rest :\ label)
'[] :\ label = TypeError ( 'Text "Could not find " ':<>: 'ShowType label)
-- | Type level multi-key removal
type family (m :: [Mapping Symbol Type]) :\\ (cs :: [Symbol]) :: [Mapping Symbol Type] where
m :\\ (label ': ls) = (m :\ label) :\\ ls
m :\\ '[] = m
Now we can write
s' :: Rel
(AsMap '["S#" ::: Int, "SNAME" ::: String, "STATUS" ::: Int, "CITY" ::: String])
'["S#"]
s' = MkRelation
To store relations in memory at runtime, we'll use the standard Map
from containers
.
type family RelationToMap relation where
RelationToMap (Relation heading key val) = Map (Tuple key) (Tuple val)
We could optimize the representation a bit by using an IntMap
for the common case of a single Int
primary key, but in the interest of simplicity we'll forego that here.
Our database will consist of a series of named relations, for which we can reuse our existing Tuple
infrastructure.
type family RelationsToDB (relations :: [Mapping Symbol Type]) where
RelationsToDB '[] = '[]
RelationsToDB (name ::: relation ': rest) =
name ::: RelationToMap relation ': RelationsToDB rest
While in standard value-level Haskell we'd usually write this as relationsToDB = fmap relationToMap
or something, type families don't have partial application and I'd rather not pull in something like singletons
if I can get away without it.
Defining queries
Now that we can talk about tuples and relations, we can define the type of a Query
acting on a set of relation
s that we expect to return a relation, consisting of tuples of type t
.
data Query (t :: [Mapping Symbol Type]) (relations :: [Mapping Symbol Type]) where
We'll also want some way to run the query.
import qualified Data.Map as M
runQuery :: Query t relations -> Tuple (RelationsToDB relations) -> [Tuple t]
runQuery q db = case q of
...
Identity
The simplest possible query is to just get the entire contents of a single named relation. It's actually quite difficult to ensure type safety for this, as we need to check that the name of the relation matches the type of the expected heading, and that indeed a relation of that name is defined at all.
RelationId ::
( relation ~ Relation heading key val
, (relations :! name) ~ relation
, IsMember name (RelationToMap relation) (RelationsToDB relations)
) =>
Var name ->
Relation heading key val ->
Query heading relations
RelationId (Var @"table") someRelation
corresponds to the sql select * from table
, assuming that we've created someRelation
named "table." To run this query, we lookp
the name of the relation in the database and reassemble all of the key-value pairs into the heading.
RelationId name (MkRelation :: Relation heading key val) ->
let relation = lookp name db
in fmap (\(k :: Tuple key, v :: Tuple val) -> k `union` v) (M.toList relation)
This is where the trick of putting the constraints in GADT constructors comes into play. Without Unionable key val
in MkRelation
, ghc would complain about not being able to find a Unionable key val
when calling k `union` v
and without IsMember name (RelationToMap relation) (RelationsToDB relations)
, lookp name db
would similarly fail. Note, when trying to write something like this yourself, you're probably not going to get the constraints right immediately. I certainly didn't. My recommended workflow is to write the GADT constructor with an empty set of constraints (() =>
) and then when ghc tells you Couldn't deduce instance Class for Type
, fill the parentheses until it typechecks.
Rename
From Chapter 2:
Let
a
be a relation with an attributeX
and no attributeY
. Then the expressiona RENAME ( X AS Y )
yields a relation that differs from a only in that the name of the specified attribute isY
instead ofX
.
We'll want some type family that can compute the renaming at compile time. The authors don't specify that renaming to a name already in the heading is an error (at least in Chapter 2) but we're going to make it one explicitly.
type family Rename (x :: Symbol) (y :: Symbol) (relation :: [Mapping Symbol Type]) where
Rename a b '[] = '[]
Rename a b ((a ::: t) ': rest) = (b ::: t) ': Rename a b rest
Rename a b ((b ::: t) ': rest) =
TypeError
( 'Text "Cannot rename "
':<>: 'Text a
':<>: 'Text " to "
':<>: 'Text b
':$$: 'Text "The name already exists in the tuple"
)
Rename a b (c ': rest) = c ': Rename a b rest
The Query
constructor can just use the type family.
Rename ::
(Sortable (Rename a b t)) =>
Var a ->
Var b ->
Query t relations ->
Query (Sort (Rename a b t)) relations
Remember that we operate assuming that every heading transformation takes a sorted heading and should return a sorted heading. Renaming is not order preserving in general (you could rename "a" to "z" and then what was the first attribute would go at the end) so we have to sort the output after the operation 1. To implement renaming, we need some function
renameTuple :: Var a -> Var b -> Tuple t -> Tuple (Rename a b t)
Recall, though, that the definition of Var
is just data Var (k :: Symbol) = Var
, so there is no difference in runtime representation between Var :: Var "a"
and Var :: Var "b"
. We can be confident in this assertion because ghc allows
renameVar :: Var a -> Var b
renameVar = coerce
Consequently, since Rename
does no reordering and doesn't change the types of any of the tuple items, we can deduce that Tuple t
also has the same runtime representation as Tuple (Rename a b t)
. Automatically proving that is, sadly, beyond ghc's capabilities. Having convinced ourselves, though, we can write
renameTuple _a _b = unsafeCoerce
Don't try this at home unless you really know what you're doing. Do try out implementing this function without unsafeCoerce
. You'll probably want to start out with something like
class Renamable a b t where
renameTuple :: Var a -> Var b -> Tuple t -> Tuple (Rename a b t)
instance Renameable a b '[] where
renameTuple _ _ Empty = Empty
to help guide ghc through the induction.
Restrict
Restrict
is essentially the same as SQL WHERE
.
Restrict :: (Tuple t -> Bool) -> Query t relations -> Query t relations
Refreshingly, this constructor has no constraints or other GADT shenanigans. It's even expressible without GADT syntax:
data Query t relations =
| ...
| Restrict (Tuple t -> Bool) (Query t relations)
| ...
The implementation is correspondingly straightforward.
Restrict f q' -> filter f (runQuery q' db)
Project
This is like specifying which columns to select in SQL. For select col1, col2 from table
we have RelationId (Var @"table") someRelation & Project
. Amazingly, if we have enough type signatures specified, ghc can infer which columns we wanted to project onto. We'll also want a type error if we try to project onto columns that don't exist. Note that projection is order preserving, unlike renaming, so we don't have to do any extra sorting.
Project :: (Submap t' t) => Query t relations -> Query t' relations
Embedding the Submap t' t
constraint basically amounts to the whole implementation.
Project q' -> map submap (runQuery q' db)
Extend
From the book
Let a be a relation. Then the extension
EXTEND a ADD ( exp AS Z )
is a relation with
A heading consisting of the heading of a extended with the attribute
Z
A body consisting of all tuples
t
such thatt
is a tuple of a extended with a value for attributeZ
that is computed by evaluating exp on that tuple ofa
Relation a must not have an attribute called
Z
, and exp must not refer toZ
.Here is a simple example of EXTEND:
EXTEND S ADD ( 3 * STATUS AS TRIPLE )
Extend
doesn't have a great SQL analogue that I know of. The closest construct is probably something like SELECT a + 1 from table
where you put some expression after the SELECT
.
Extend ::
(Member l t ~ 'False, Sortable (l ::: a ': t)) =>
Var l ->
(Tuple t -> a) ->
Query t relations ->
Query (Sort (l ::: a ': t)) relations
Given some function that takes a label and the existing tuple type, we stick the result of calling that function on every tuple in the relation into a new attribute with the label.
Extend label f q' -> map (\t -> quicksort (Ext label (f t) t)) (runQuery q' db)
Join
The last type of query we'll implement here is Join
, corresponding to the "natural join" which takes all common attributes of two relations and returns all tuples that have the same values for those attributes. If we want something like SQL SELECT x, y FROM t1 JOIN t2 ON t1.col1 = t2.col2
, we can just rename col2
to col1
in one of the relations before the join. type-level-sets
doesn't provide a way of computing the intersection of two mapping lists, so we'll have to write it ourselves. Let's look at what this function looks like at the value level.
sortedIntersection :: Ord a => [a] -> [a] -> [a]
sortedIntersection t [] = []
sortedIntersection [] t = []
sortedIntersection (x:xs) (y:ys)
| x == y = x : sortedIntersection xs ys
| x < y = sortedIntersection xs (y:ys)
| otherwise = sortedIntersection (x:xs) ys
-- >>> sortedIntersection [1..5] [5..10]
-- [5]
Now we can convert this to the type level. Another difference between type families and functions is that when we bind a type variable multiple times in a pattern, we assert that it refers to the same type in both locations, whereas binding a variable twice in a function pattern is an error. Also, type families can only branch at top level patterns, so we'll have to make a helper type family to store the result of comparing labels.
type family Intersection (t :: [Mapping Symbol Type]) (t' :: [Mapping Symbol Type]) :: [Mapping Symbol Type] where
Intersection t '[] = '[]
Intersection '[] t = '[]
Intersection (a ': as) (a ': bs) = a ': Intersection as bs
Intersection (l ::: a ': as) (r ::: b ': bs) =
IntersectionCase (CmpSymbol l r) l r a as b bs
type family IntersectionCase (ordering :: Ordering) l r a as b bs where
IntersectionCase 'LT l r a as b bs = Intersection as (r ::: b ': bs)
IntersectionCase 'GT l r a as b bs = Intersection (l ::: a ': as) bs
IntersectionCase 'EQ l r a as a bs = l ::: a ': Intersection as bs
IntersectionCase 'EQ l r a as b bs =
TypeError
( 'Text "Cannot join on attribute '"
':<>: 'Text l
':<>: 'Text "'"
':$$: 'Text l
':<>: 'Text "has type "
':<>: 'ShowType a
':<>: 'Text " in the first relation and type "
':<>: 'ShowType b
':<>: 'Text "in the second"
)
-- >>> :kind! Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Int]
-- Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Int] :: [Mapping Symbol *]
-- = '[ "id" ':-> Int]
-- >>> :kind! Intersection '["id" ::: Int, "name" ::: String] '["id" ::: word]
-- Intersection '["id" ::: Int, "name" ::: String] '["id" ::: Word] :: [Mapping Symbol *]
-- = (TypeError ...)
It's kind of nice to be able to "crash" in type families with a custom TypeError
and not feel bad about it like when we write error
in a normal function. Anyway, now that we can calculate which attributes we want to join on, we can attempt to write down the full Join
constructor.
Join ::
( common ~ Intersection t' t
, Eq (Tuple common)
, Split common t'_rest t'
, Split common t_rest t
, Sortable (common :++ (t'_rest :++ t_rest))
) =>
Query t' tables ->
Query t tables ->
Query (Sort (common :++ (t'_rest :++ t_rest))) tables
However, we get a big type error.
• Could not deduce: Sort (common :++ (t'_rest0 :++ t_rest0))
~ Sort (common :++ (t'_rest :++ t_rest))
from the context: (common ~ Intersection t' t, Eq (Tuple common),
Split common t'_rest t', Split common t_rest t,
Sortable (common :++ (t'_rest :++ t_rest)))
bound by the type of the constructor ‘Join’:
forall (common :: [Mapping Symbol *]) (t' :: [Mapping Symbol *])
(t :: [Mapping Symbol *]) (t'_rest :: [Mapping Symbol *])
(t_rest :: [Mapping Symbol *]) (tables :: [Mapping Symbol *]).
(common ~ Intersection t' t, Eq (Tuple common),
Split common t'_rest t', Split common t_rest t,
Sortable (common :++ (t'_rest :++ t_rest))) =>
Query t' tables
-> Query t tables
-> Query (Sort (common :++ (t'_rest :++ t_rest))) tables
at /Users/josephmorag/Projects/databass/src/Databass/Blog.hs:(127,3)-(138,57)
Expected type: Query t' tables
-> Query t tables
-> Query (Sort (common :++ (t'_rest :++ t_rest))) tables
Actual type: Query t' tables
-> Query t tables
-> Query (Sort (common :++ (t'_rest0 :++ t_rest0))) tables
NB: ‘Sort’ is a non-injective type family
The type variables ‘t'_rest0’, ‘t_rest0’ are ambiguous
• In the ambiguity check for ‘Join’
To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
In the definition of data constructor ‘Join’
In the data type declaration for ‘Query’
While we could solve this by enabling the AllowAmbiguousTypes
extension, as helpfully suggested in the error message, we'll need to have access to the t_rest
and t'rest
type variables inside the implementation, like we did in the RelationId
case. I don't actually know how to introduce them into the scope of the function with AllowAmbiguousTypes
turned on, so instead, we'll leave it off and pass those type arguments inside Proxy
s so we can match on them. If anyone does know how to accomplish that without Proxy
please let me know.
Join ::
( common ~ Intersection t' t
, Eq (Tuple common)
, Split common t'_rest t'
, Split common t_rest t
, Sortable (common :++ (t'_rest :++ t_rest))
) =>
Proxy t'_rest ->
Proxy t_rest ->
Query t' tables ->
Query t tables ->
Query (Sort (common :++ (t'_rest :++ t_rest))) tables
For the implementation, we'll write a "nested inner loop" join, though it doesn't look too much like a C-style nested loop when written using the list monad like this.
Join (_ :: Proxy t_l_rest) (_ :: Proxy t_r_rest) q1 q2 -> do
l :: Tuple t_l <- runQuery q1 db
r :: Tuple t_r <- runQuery q2 db
let (l_common, l_rest) = split @(Intersection t_l t_r) @t_l_rest l
(r_common, r_rest) = split @(Intersection t_l t_r) @t_r_rest r
guard (l_common == r_common)
pure (quicksort $ append l_common (append l_rest r_rest))
Conclusion
There are a few more relational operations described in Tutorial D that we haven't covered here, including Summarize
, Group
, and Ungroup
, but what we have is good enough to cover most SQL queries that I've ever written. In the next parts of the series we'll go over populating the database and try to use it for a toy server.
The Sort
type family and Sortable
class use the canonical Haskell quicksort implementation that gets shown to beginners to demonstrate how elegant the language is. Since we know we're inserting one element into an otherwise ordered list, we should just do that in linear time. The necessary type family and typeclass are left as an exercise to the reader.