最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

haskell - Understanding lack of lazyness Alternative.many - Stack Overflow

programmeradmin0浏览0评论

An absolute minimum parsing library could be:

{- | Input error type. -}
data InputError = InputError
    deriving (Eq, Show)

instance Semigroup InputError where
    (<>) _ _ = InputError

instance Monoid InputError where
    mempty = InputError


{- | The parsing monad. -}
newtype Parser s e a = Parser ([s] -> Either e (a, [s]))
    deriving stock Functor


-- Instances.
instance Applicative (Parser s e) where
    pure x = Parser $ \ xs -> pure (x, xs)

    (<*>) p q = Parser $ \ xs -> do
        (f, ys) <- run p xs
        (x, zs) <- run q ys
        pure (f x, zs)

instance Monad (Parser s e) where
    (>>=) p h = Parser $ \ xs -> do
        (x, ys) <- run p xs
        run (h x) ys

instance Monoid e => Alternative (Parser s e) where
    empty = Parser $ \ _ -> Left mempty

    (<|>) p q = Parser $ \ xs ->
        case run p xs of
            r1@(Right _) -> r1
            Left e1      ->
                case run q xs of
                    r2@(Right _) -> r2
                    Left e2      -> Left $ e1 <> e2

{- | Primitive parser getting one element out of the stream. -}
one :: Parser s InputError s
one = Parser $ \ xs ->
    case uncons xs of
        Nothing -> Left InputError
        Just p  -> Right p

{- | Run the parser on input and return the results. -}
run :: Parser s e a -> [s] -> Either e (a, [s])
run (Parser p) = p

This compiles (with GHC2021 language; have to add some imports). Loading it in ghci in cabal repl:

ghci> let p = take 2 <$> many one
ghci> run p "0123"
Right ("01","")

This means that the parser has consumed all the input -- what I was expecting to see was Right ("01", "23").

So my question: where does the lazyness "break", so to speak, and is there any way to restore it? And by "restore it" I mean doing the instance implementation differently so that many is as lazy as expected, because if I add

{- | Implement lazy 'many' -}
atMostN :: Monoid e => Word -> Parser s e a -> Parser s e [a]
atMostN n p = go n
    where
        go 0 = pure []
        go m = do
            r <- optional p
            case r of
                Nothing -> pure []
                Just x  -> (x: ) <$> go (pred m)

And load it up in ghci, I get the expected:

ghci> let q = atMostN 2 one
ghci> run q "0123"
Right ("01","23")

An absolute minimum parsing library could be:

{- | Input error type. -}
data InputError = InputError
    deriving (Eq, Show)

instance Semigroup InputError where
    (<>) _ _ = InputError

instance Monoid InputError where
    mempty = InputError


{- | The parsing monad. -}
newtype Parser s e a = Parser ([s] -> Either e (a, [s]))
    deriving stock Functor


-- Instances.
instance Applicative (Parser s e) where
    pure x = Parser $ \ xs -> pure (x, xs)

    (<*>) p q = Parser $ \ xs -> do
        (f, ys) <- run p xs
        (x, zs) <- run q ys
        pure (f x, zs)

instance Monad (Parser s e) where
    (>>=) p h = Parser $ \ xs -> do
        (x, ys) <- run p xs
        run (h x) ys

instance Monoid e => Alternative (Parser s e) where
    empty = Parser $ \ _ -> Left mempty

    (<|>) p q = Parser $ \ xs ->
        case run p xs of
            r1@(Right _) -> r1
            Left e1      ->
                case run q xs of
                    r2@(Right _) -> r2
                    Left e2      -> Left $ e1 <> e2

{- | Primitive parser getting one element out of the stream. -}
one :: Parser s InputError s
one = Parser $ \ xs ->
    case uncons xs of
        Nothing -> Left InputError
        Just p  -> Right p

{- | Run the parser on input and return the results. -}
run :: Parser s e a -> [s] -> Either e (a, [s])
run (Parser p) = p

This compiles (with GHC2021 language; have to add some imports). Loading it in ghci in cabal repl:

ghci> let p = take 2 <$> many one
ghci> run p "0123"
Right ("01","")

This means that the parser has consumed all the input -- what I was expecting to see was Right ("01", "23").

So my question: where does the lazyness "break", so to speak, and is there any way to restore it? And by "restore it" I mean doing the instance implementation differently so that many is as lazy as expected, because if I add

{- | Implement lazy 'many' -}
atMostN :: Monoid e => Word -> Parser s e a -> Parser s e [a]
atMostN n p = go n
    where
        go 0 = pure []
        go m = do
            r <- optional p
            case r of
                Nothing -> pure []
                Just x  -> (x: ) <$> go (pred m)

And load it up in ghci, I get the expected:

ghci> let q = atMostN 2 one
ghci> run q "0123"
Right ("01","23")
Share Improve this question asked 2 days ago G. RodriguesG. Rodrigues 3152 silver badges7 bronze badges 1
  • 4 Lazy evaluation never changes the result of a computation. It can only give you a result where other evaluation orders would get stuck in an infinite loop or fail in other ways. So, your problem has little to do with laziness. – Noughtmare Commented 2 days ago
Add a comment  | 

1 Answer 1

Reset to default 4

Laziness does not get you what you want in this case.

Briefly put, consider this scenario:

case run (many one) "0123" of
   Right (x,y) -> ... -- * --
   Left _ -> someErrorHandling

At the --*-- point, you are somehow expecting that, if you consume the first N elements out of x (as per your take 2) then you'll find the unconsumed elements inside y. But that's not truly possible.

Concretely, your take 2 <$> ... acts as follows

case run (many one) "0123" of
   Right (x, y) -> Right (take 2 x, y)
   Left e -> Left e

but that will leave y unaffected. You can't change the unconsumed input y through <$>.

Just to stress the point, think of this contrived example:

case run (many one) "0123" of
   Right (x, y) -> Right (take (length y) x, y)
   Left e -> Left e

Here the amount of "taken" digits depends on how many were left unconsumed. This can not be achieved with <$>, but it is still possible to implement this at the lower level.

The kind of parser you have in mind, where you can control how much input is actually parsed from the "outside", might be realizable but it would require a completely different parser representation.

发布评论

评论列表(0)

  1. 暂无评论