An absolute minimum parsing library could be:
{- | Input error type. -}
data InputError = InputError
deriving (Eq, Show)
instance Semigroup InputError where
(<>) _ _ = InputError
instance Monoid InputError where
mempty = InputError
{- | The parsing monad. -}
newtype Parser s e a = Parser ([s] -> Either e (a, [s]))
deriving stock Functor
-- Instances.
instance Applicative (Parser s e) where
pure x = Parser $ \ xs -> pure (x, xs)
(<*>) p q = Parser $ \ xs -> do
(f, ys) <- run p xs
(x, zs) <- run q ys
pure (f x, zs)
instance Monad (Parser s e) where
(>>=) p h = Parser $ \ xs -> do
(x, ys) <- run p xs
run (h x) ys
instance Monoid e => Alternative (Parser s e) where
empty = Parser $ \ _ -> Left mempty
(<|>) p q = Parser $ \ xs ->
case run p xs of
r1@(Right _) -> r1
Left e1 ->
case run q xs of
r2@(Right _) -> r2
Left e2 -> Left $ e1 <> e2
{- | Primitive parser getting one element out of the stream. -}
one :: Parser s InputError s
one = Parser $ \ xs ->
case uncons xs of
Nothing -> Left InputError
Just p -> Right p
{- | Run the parser on input and return the results. -}
run :: Parser s e a -> [s] -> Either e (a, [s])
run (Parser p) = p
This compiles (with GHC2021 language; have to add some imports). Loading it in ghci in cabal repl:
ghci> let p = take 2 <$> many one
ghci> run p "0123"
Right ("01","")
This means that the parser has consumed all the input -- what I was expecting to see was Right ("01", "23")
.
So my question: where does the lazyness "break", so to speak, and is there any way to restore it? And by "restore it" I mean doing the instance implementation differently so that many
is as lazy as expected, because if I add
{- | Implement lazy 'many' -}
atMostN :: Monoid e => Word -> Parser s e a -> Parser s e [a]
atMostN n p = go n
where
go 0 = pure []
go m = do
r <- optional p
case r of
Nothing -> pure []
Just x -> (x: ) <$> go (pred m)
And load it up in ghci, I get the expected:
ghci> let q = atMostN 2 one
ghci> run q "0123"
Right ("01","23")
An absolute minimum parsing library could be:
{- | Input error type. -}
data InputError = InputError
deriving (Eq, Show)
instance Semigroup InputError where
(<>) _ _ = InputError
instance Monoid InputError where
mempty = InputError
{- | The parsing monad. -}
newtype Parser s e a = Parser ([s] -> Either e (a, [s]))
deriving stock Functor
-- Instances.
instance Applicative (Parser s e) where
pure x = Parser $ \ xs -> pure (x, xs)
(<*>) p q = Parser $ \ xs -> do
(f, ys) <- run p xs
(x, zs) <- run q ys
pure (f x, zs)
instance Monad (Parser s e) where
(>>=) p h = Parser $ \ xs -> do
(x, ys) <- run p xs
run (h x) ys
instance Monoid e => Alternative (Parser s e) where
empty = Parser $ \ _ -> Left mempty
(<|>) p q = Parser $ \ xs ->
case run p xs of
r1@(Right _) -> r1
Left e1 ->
case run q xs of
r2@(Right _) -> r2
Left e2 -> Left $ e1 <> e2
{- | Primitive parser getting one element out of the stream. -}
one :: Parser s InputError s
one = Parser $ \ xs ->
case uncons xs of
Nothing -> Left InputError
Just p -> Right p
{- | Run the parser on input and return the results. -}
run :: Parser s e a -> [s] -> Either e (a, [s])
run (Parser p) = p
This compiles (with GHC2021 language; have to add some imports). Loading it in ghci in cabal repl:
ghci> let p = take 2 <$> many one
ghci> run p "0123"
Right ("01","")
This means that the parser has consumed all the input -- what I was expecting to see was Right ("01", "23")
.
So my question: where does the lazyness "break", so to speak, and is there any way to restore it? And by "restore it" I mean doing the instance implementation differently so that many
is as lazy as expected, because if I add
{- | Implement lazy 'many' -}
atMostN :: Monoid e => Word -> Parser s e a -> Parser s e [a]
atMostN n p = go n
where
go 0 = pure []
go m = do
r <- optional p
case r of
Nothing -> pure []
Just x -> (x: ) <$> go (pred m)
And load it up in ghci, I get the expected:
ghci> let q = atMostN 2 one
ghci> run q "0123"
Right ("01","23")
Share
Improve this question
asked 2 days ago
G. RodriguesG. Rodrigues
3152 silver badges7 bronze badges
1
- 4 Lazy evaluation never changes the result of a computation. It can only give you a result where other evaluation orders would get stuck in an infinite loop or fail in other ways. So, your problem has little to do with laziness. – Noughtmare Commented 2 days ago
1 Answer
Reset to default 4Laziness does not get you what you want in this case.
Briefly put, consider this scenario:
case run (many one) "0123" of
Right (x,y) -> ... -- * --
Left _ -> someErrorHandling
At the --*--
point, you are somehow expecting that, if you consume the first N elements out of x
(as per your take 2
) then you'll find the unconsumed elements inside y
. But that's not truly possible.
Concretely, your take 2 <$> ...
acts as follows
case run (many one) "0123" of
Right (x, y) -> Right (take 2 x, y)
Left e -> Left e
but that will leave y
unaffected. You can't change the unconsumed input y
through <$>
.
Just to stress the point, think of this contrived example:
case run (many one) "0123" of
Right (x, y) -> Right (take (length y) x, y)
Left e -> Left e
Here the amount of "taken" digits depends on how many were left unconsumed. This can not be achieved with <$>
, but it is still possible to implement this at the lower level.
The kind of parser you have in mind, where you can control how much input is actually parsed from the "outside", might be realizable but it would require a completely different parser representation.