
go - Golang IO Race Condition - Stack Overflow


Here is a very simplified example (I'm not sure it is reproducible in every environment). First, there is a socket pair:

func SocketPair() (*os.File, *os.File, error) {
    fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
    if err != nil {
        return nil, nil, err
    }

    f0 := os.NewFile(uintptr(fds[0]), "socket-0")
    f1 := os.NewFile(uintptr(fds[1]), "socket-1")

    return f0, f1, nil
}
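One detail of this helper may matter. This is an editorial observation, not something the question states. syscall.Socketpair is called without SOCK_CLOEXEC, and os.NewFile only wraps the descriptor, so neither end is marked close-on-exec. A process started with os/exec would then inherit both ends. The Linux-only sketch below checks the flag with fcntl(F_GETFD). The names isCloseOnExec and socketPairFlags are hypothetical helpers.

```go
package main

import (
	"fmt"
	"syscall"
)

// isCloseOnExec reports whether FD_CLOEXEC is set on fd (fcntl F_GETFD).
func isCloseOnExec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// socketPairFlags creates a socket pair the same way SocketPair does and
// reports FD_CLOEXEC for both descriptors, before and after setting it
// explicitly with syscall.CloseOnExec.
func socketPairFlags() (before, after [2]bool, err error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return before, after, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	for i, fd := range fds {
		if before[i], err = isCloseOnExec(fd); err != nil {
			return before, after, err
		}
		syscall.CloseOnExec(fd) // sets FD_CLOEXEC via fcntl(F_SETFD)
		if after[i], err = isCloseOnExec(fd); err != nil {
			return before, after, err
		}
	}
	return before, after, nil
}

func main() {
	before, after, err := socketPairFlags()
	if err != nil {
		panic(err)
	}
	fmt.Println("FD_CLOEXEC before:", before, "after:", after)
}
```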

And a simple command call:

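import (
    "context"
    "fmt"
    "io"
    "os"
    "os/exec"
    // plus the package that provides utils.SocketPair (import path not given)
)

func main() {
    // ctx is not defined in the original snippet; a background context is assumed.
    ctx := context.Background()

    f0, f1, err := utils.SocketPair()
    if err != nil {
        panic(err)
    }

    cmd := exec.CommandContext(ctx, "cat")

    cmd.Stdin = f0
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    // pipe routine: copy our stdin into the write end of the socket pair,
    // then close it so that cat should see EOF
    go func() {
        size, err := io.Copy(f1, os.Stdin)
        fmt.Printf("------res %d, %v\n", size, err)
        f1.Close()
    }()

    if err := cmd.Run(); err != nil {
        panic(err)
    }
}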

Calling it with something like

echo "abc\ndef\nghi" | app

produces output similar to

------res 11, <nil>
abc
def
ghi

and then hangs. This shows that the pipe goroutine successfully delivered the stdin data to the socket pair, yet the command's input still never reaches EOF.

For this simple example, the issue can be solved (in my environment) in either of two ways:

  • move the pipe goroutine to just before the cmd := exec. line, or
  • keep the pipe goroutine in its original position but make main wait until it has actually started executing, as follows:
    started := make(chan byte)
    go func() {
        close(started)
        size, err := io.Copy(f1, os.Stdin)
        fmt.Printf("------res %d, %v\n", size, err)
        f1.Close()
    }()
    <-started

Both of these solutions resolve the issue, and the application exits gracefully.

However, in more complex cases with deeper chains of goroutines, even this doesn't help. What does work is a simple time.Sleep(time.Second) call just before cmd.Run().

It looks very much like a race condition: the moment at which reading starts in io.Copy versus cmd.Run matters a lot.

To solve the issue, I don't want to play with time.Sleep, searching for the optimal interval (which is a bad idea if this really is a race condition).
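Suppose the write end is leaked into the child. That is an assumption, not confirmed in the question. Then time.Sleep would help only because io.Copy has already finished and f1 is closed by the time cmd.Run forks. The same ordering can be enforced deterministically by waiting for the goroutine to signal completion. The sketch reads from a string rather than os.Stdin so it is self-contained. runCatAfterCopy is a hypothetical name. The trade-off is real: nothing reads f0 until cat starts, so this only works for finite input that fits in the socket buffer, and it gives up streaming.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
	"syscall"
)

// runCatAfterCopy uses a socket pair created as in the question (no
// close-on-exec), but starts cat only after the copy has finished and f1 has
// been closed, so the child cannot inherit a copy of the write end.
func runCatAfterCopy(input string) (string, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	f0 := os.NewFile(uintptr(fds[0]), "socket-0")
	f1 := os.NewFile(uintptr(fds[1]), "socket-1")
	defer f0.Close()

	done := make(chan error, 1)
	go func() {
		// Assumes input fits in the socket buffer: nothing reads f0 yet.
		_, err := io.Copy(f1, strings.NewReader(input))
		f1.Close()
		done <- err
	}()
	// Wait for the goroutine instead of sleeping.
	if err := <-done; err != nil {
		return "", err
	}

	var out bytes.Buffer
	cmd := exec.Command("cat")
	cmd.Stdin = f0
	cmd.Stdout = &out
	cmd.Stderr = os.Stderr
	err = cmd.Run()
	return out.String(), err
}

func main() {
	out, err := runCatAfterCopy("abc\ndef\nghi\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```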

My crucial question is: what is the root cause of this behavior? Why does it matter who starts reading first?

Thanks


asked Jan 31 at 8:37 by 404404
  • (3 votes) One guess is that your copy and input goroutines end up on different OS threads depending on timing. On POSIX systems, a close call on one pthread is not guaranteed to close an FD used by another pthread, so maybe this is affecting something in the socketpair. – Mr_Pink, Jan 31 at 8:58

2 Answers


While searching for more details on syscall.Socketpair, I stumbled on this gist:

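import (
    "net"
    "os"
    "syscall"
)

// Socketpair returns both ends of a Unix stream socket pair as net.Conn values.
func Socketpair() (net.Conn, net.Conn, error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return nil, nil, err
    }

    c1, err := fdToFileConn(fds[0])
    if err != nil {
        return nil, nil, err
    }

    c2, err := fdToFileConn(fds[1])
    if err != nil {
        c1.Close()
        return nil, nil, err
    }

    return c1, c2, nil
}

// fdToFileConn wraps fd in a net.Conn. net.FileConn works on a copy of the
// file's descriptor, so the *os.File (and the original fd) is closed here.
func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}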
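Here is one plausible reason this version behaves differently. It is our reading of the standard library; the answer above did not verify it. net.FileConn does not keep the descriptor it is given. It works on a duplicate that is created with close-on-exec set, and fdToFileConn then closes the original. So no flag-less copy of either end is left in the parent for cat to inherit. The Linux-only sketch below compares FD_CLOEXEC on the raw descriptor and on the one inside the net.Conn. isCloseOnExec and fileConnIsCloseOnExec are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
)

// isCloseOnExec reports whether FD_CLOEXEC is set on fd (fcntl F_GETFD).
func isCloseOnExec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// fileConnIsCloseOnExec wraps one end of a raw socket pair with net.FileConn,
// as fdToFileConn does, and reports FD_CLOEXEC for the raw descriptor and for
// the descriptor held by the resulting net.Conn.
func fileConnIsCloseOnExec() (raw, conn bool, err error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(fds[1])

	f := os.NewFile(uintptr(fds[0]), "")
	defer f.Close()
	if raw, err = isCloseOnExec(int(f.Fd())); err != nil {
		return false, false, err
	}

	c, err := net.FileConn(f) // works on a duplicate of f's descriptor
	if err != nil {
		return false, false, err
	}
	defer c.Close()

	sc, ok := c.(syscall.Conn)
	if !ok {
		return false, false, fmt.Errorf("%T does not implement syscall.Conn", c)
	}
	rc, err := sc.SyscallConn()
	if err != nil {
		return false, false, err
	}
	var inner error
	if err := rc.Control(func(fd uintptr) {
		conn, inner = isCloseOnExec(int(fd))
	}); err != nil {
		return false, false, err
	}
	return raw, conn, inner
}

func main() {
	raw, conn, err := fileConnIsCloseOnExec()
	if err != nil {
		panic(err)
	}
	fmt.Println("raw fd FD_CLOEXEC:", raw, "net.FileConn fd FD_CLOEXEC:", conn)
}
```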

Plugging this into your code sample fixes the issue on my Linux system.


Complete playground sample: https://go.dev/play/p/B24cowycU1G

Note: running it on the playground does not reproduce the behavior I see on my machine (either the time package is tweaked in a way that hinders the timeouts, or interprocess signalling is simply forbidden). If you copy/paste the code into a Go file on your machine, you should get:

$ go run foo.go 
===== net.Conn pair
Hello World!
===== *os.File pair
Hello World!
panic: signal: killed   # <- timeout triggered

goroutine 1 [running]:
main.main()
    /tmp/foo.go:105 +0xdb
exit status 2

I haven't looked into the differences between os.NewFile() and net.FileConn() in complete detail. The first obvious difference I spotted is that os.NewFile() wraps the file descriptor using os.newFile(), while net.FileConn uses net.newFileFD(), and the two functions have very different initialization sequences.
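Suppose the difference that matters is the close-on-exec flag. net.FileConn's duplicate has it set, while syscall.Socketpair as called in the question leaves it unset, so cat could inherit the write end and never see EOF. That is an assumption. If it holds, the *os.File version can also be fixed without net.FileConn, by asking the kernel to set the flag atomically when the pair is created. This sketch is Linux-specific, because syscall.SOCK_CLOEXEC is not defined on every platform (macOS lacks it). socketPairCloexec and runCat are hypothetical names. The copy goroutine deliberately races cmd.Run, as in the question. The timeout exists only so that a regression fails instead of hanging.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

// socketPairCloexec is the question's SocketPair with SOCK_CLOEXEC added,
// so neither end is inherited by child processes unless passed explicitly.
func socketPairCloexec() (*os.File, *os.File, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return nil, nil, err
	}
	return os.NewFile(uintptr(fds[0]), "socket-0"), os.NewFile(uintptr(fds[1]), "socket-1"), nil
}

// runCat mirrors the question's main: the copy goroutine may run before or
// after cmd.Run forks, and cat should still see EOF once f1 is closed.
func runCat(input string) (string, error) {
	f0, f1, err := socketPairCloexec()
	if err != nil {
		return "", err
	}
	defer f0.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, "cat")
	cmd.Stdin = f0 // passed to the child as fd 0
	cmd.Stdout = &out
	cmd.Stderr = os.Stderr

	go func() {
		io.Copy(f1, strings.NewReader(input)) // copy error ignored in this sketch
		f1.Close()
	}()
	err = cmd.Run()
	return out.String(), err
}

func main() {
	out, err := runCat("abc\ndef\nghi\n")
	fmt.Printf("%q %v\n", out, err)
}
```

On platforms without SOCK_CLOEXEC, calling syscall.CloseOnExec on each descriptor is the usual fallback. It is only race-free with respect to concurrent forks if done while holding syscall.ForkLock.RLock().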

My guess is that the hang you're seeing is a race condition between when the reading and writing ends of the pipe get set up. It's tricky because sometimes it works and sometimes it doesn't!

Here's why both your solutions work:

  1. Moving the pipe routine earlier: This gives the goroutine time to start up before cmd.Run() tries to read from it.

  2. Using the channel sync:

    started := make(chan byte)
    go func() {
        close(started)
        // your copy code
    }()
    <-started
    

This makes sure your goroutine is actually running before moving on. Instead of using time.Sleep(), use explicit synchronization such as a sync.WaitGroup, for example:

// Set up a WaitGroup to coordinate everything
var wg sync.WaitGroup
wg.Add(1)

go func() {
    defer wg.Done()
    defer f1.Close() // deferred calls run last-in first-out: f1 is closed before Done
    size, err := io.Copy(f1, os.Stdin)
    fmt.Printf("------res %d, %v\n", size, err)
}()

runErr := cmd.Run()
wg.Wait()
if runErr != nil {
    panic(runErr)
}

The root issue is that we need to make sure the goroutine handling the pipe is ready before the command starts trying to read from it.
