EOF on a pty slave

From: Simon Chiang
Date: 2011-10-22 @ 18:41

I've been exploring the PTY standard library as a way to script shell
sessions and I keep getting stumped by certain behaviors at the end of a
session, all related to EOF on the pty slave.  Eric, your comments about EOF
conditions in the post about IO#dup sound like they're in the same ballpark, so
I would like to ask if you or anyone else has insight into this behavior.

Stumper #1.  After a kill 9, reading the slave to EOF lets the process exit
much faster than not doing so.  I don't know why this makes a difference...
seems like the shell should just be killed outright, either way.

    # OS X 10.6.8, 1.8.7 or 1.9.2, representative of bash, ksh, zsh
    $ ruby example.rb
    PTY - select on slave
    0.601848
    PTY - select on slave, clearing the slave after kill
    0.000436

    $ cat example.rb
    require 'pty'

    shell = ARGV[0] || '/bin/sh'
    timeout = 1

    puts "PTY - select on slave"
    PTY.spawn(shell) do |slave, master, pid|
      unless IO.select([slave],nil,nil,timeout)
        puts "timeout waiting on slave"
      end

      start = Time.now
      Process.kill(9, pid)
      Process.wait(pid) rescue PTY::ChildExited
      puts "#{Time.now - start}"
    end

    puts "PTY - select on slave, clearing the slave after kill"
    PTY.spawn(shell) do |slave, master, pid|
      unless IO.select([slave],nil,nil,timeout)
        puts "timeout waiting on slave"
      end

      start = Time.now
      Process.kill(9, pid)

      slave.read

      Process.wait(pid) rescue PTY::ChildExited
      puts "#{Time.now - start}"
    end
    puts

Stumper #2.  When I issue an exit command to the shell via master, every so
often... rarely... I find a select on the slave will time-out waiting for
EOF.  Non-exit methods that should cause an EOF (ex writing EOF on master)
also result in the same condition; most of the time I can read to EOF but
every so often, time-out.  I've only seen this on 1.9.2 so I opened a bug
report about it (http://redmine.ruby-lang.org/issues/5463) but honestly I'm
not sure if it's a ruby issue or not.

Anyhow thanks!  Enjoying this list a lot.

- Simon

Re: [usp.ruby] EOF on a pty slave

From: Eric Wong
Date: 2011-10-24 @ 10:22

Simon Chiang <simon.a.chiang@gmail.com> wrote:
> I've been exploring the PTY standard library as a way to script shell
> sessions and I keep getting stumped by certain behaviors at the end of a
> session, all related to EOF on the pty slave.  Eric, your comments about EOF
> conditions in the post about IO#dup sound like they're in the same ballpark, so
> I would like to ask if you or anyone else has insight into this behavior.

Yes, IO#dup is a subset of what fork() does.

(the following will probably be repeated in a full article,
 the diagrams are definitely GPLv3):

Here is IO#dup:

   * two Ruby IO objects
   * two file descriptors
   * one file object in the kernel

    [Ruby]    user space   |  kernel space
    ------------------------------------------------
                           |
    io_orig ----------- fd[orig] -\
                           |       >---> file object
    io_copy ----------- fd[copy] -/
                           |
    ------------------------------------------------
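
The picture above can be checked from Ruby with a plain pipe instead of a
pty.  This is only a sketch: at_eof? and dup_eof_demo are names made up for
illustration, and a non-blocking read stands in for "would a reader see EOF
right now?".

```ruby
# Hypothetical helper: true if a read on +io+ would hit EOF right now.
def at_eof?(io)
  io.read_nonblock(1)
  false                      # got data, so not at EOF
rescue IO::WaitReadable
  false                      # nothing to read, but a writer is still open
rescue EOFError
  true                       # every reference to the write end is closed
end

# Hypothetical demo: returns the EOF state seen after each close.
def dup_eof_demo
  r, w = IO.pipe
  w_copy = w.dup             # second fd, same file object in the kernel

  w.close
  after_first = at_eof?(r)   # w_copy still references the write end
  w_copy.close
  after_second = at_eof?(r)  # last reference is gone

  r.close
  [after_first, after_second]
end

p dup_eof_demo
```

The reader should only report EOF after the second close, since closing
io_orig alone removes just one of the two fd references.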

Here is the situation post-fork (without calling IO#dup):

  * two processes (p1: original process, p2: child process)
  * two Ruby IO objects (in different processes)
  * two file descriptors (identically numbered)
  * _one_ file object in the kernel

    [Ruby]    user space   |  kernel space
    -----------------------------------------------------
                           |
    io[p1] ----------- fd[p1] ------------\
                           |               \
    ------------ process boundary ----------> file object
                           |               /
    io[p2] ----------- fd[p2] ------------/
                           |
    -----------------------------------------------------

As you can see, the situation with fork(2) (used by PTY.spawn) is much
like IO#dup, the only difference is the process boundary between them.
In both cases, an explicit close(2) (or a process exit) will remove
_one_ of the fd references to the file object in the kernel.

Since the fd is in both processes with fork, both of the fds (which
initially share the same integer value) need to be dereferenced in order
for the file object in the kernel to be closed.

Expanding on the above example:

  If you call IO#dup before forking, you'll need to close _four_ fds
  in the above example to get an EOF.
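
A rough way to count those four fds, again using a pipe rather than a pty.
The names at_eof? and fork_dup_eof_demo are hypothetical, and the child does
nothing except exit (which closes its two inherited copies of the write
end):

```ruby
# Hypothetical helper: true if a read on +io+ would hit EOF right now.
def at_eof?(io)
  io.read_nonblock(1)
  false
rescue IO::WaitReadable
  false                          # a write-end reference is still open
rescue EOFError
  true
end

# Hypothetical demo: dup before fork, then close references one at a time.
def fork_dup_eof_demo
  r, w = IO.pipe
  w_copy = w.dup                 # two write fds in the parent

  # The child inherits both write fds (four in total across processes).
  # exit! skips at_exit handlers; process exit closes the child's fds.
  pid = fork { r.close; exit!(0) }
  Process.wait(pid)              # child's two references are now gone

  states = []
  states << at_eof?(r)           # parent still holds w and w_copy
  w.close
  states << at_eof?(r)           # parent still holds w_copy
  w_copy.close
  states << at_eof?(r)           # all four references are gone
  r.close
  states
end

p fork_dup_eof_demo
```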

> Stumper #1.  After a kill 9, reading the slave to EOF lets the process exit
> much faster than not doing so.  I don't know why this makes a difference...
> seems like the shell should just be killed outright, either way.
> 
>     # OS X 10.6.8, 1.8.7 or 1.9.2, representative of bash, ksh, zsh
>     $ ruby example.rb
>     PTY - select on slave
>     0.601848
>     PTY - select on slave, clearing the slave after kill
>     0.000436

I can't reproduce this huge time difference on my Linux 2.6 machine
with a small rescue fix (see below).

>     $ cat example.rb
>     require 'pty'

I have little direct experience explicitly programming for PTYs and none
with the 'pty' extension.  However, I believe terminals are the
weirdest/quirkiest "files" Unix has[1].

Fortunately like all Unix files, PTYs are bound by the same rules around
dup/fork/close/exit that other file descriptors/objects are bound by.

>     shell = ARGV[0] || '/bin/sh'

Perhaps io/console (new in 1.9.3, also a gem) is more appropriate for
attempting to script interactive shells?  (I'm not familiar with
io/console other than knowing of its existence, either.)

>       slave.read

I needed to rescue Errno::EIO here, which I saw in your bug report
below.

>       Process.wait(pid) rescue PTY::ChildExited
>       puts "#{Time.now - start}"
>     end
>     puts
> 
> Stumper #2.  When I issue an exit command to the shell via master, every so
> often... rarely... I find a select on the slave will time-out waiting for
> EOF.  Non-exit methods that should cause an EOF (ex writing EOF on master)
> also result in the same condition; most of the time I can read to EOF but
> every so often, time-out.  I've only seen this on 1.9.2 so I opened a bug
> report about it (http://redmine.ruby-lang.org/issues/5463) but honestly I'm
> not sure if it's a ruby issue or not.

I took a look at ext/pty/pty.c in Ruby and noted your "master" and
"slave" IO objects are actually dup(2)s of each other.  I don't think
you can get an EOF (read(2) returning zero) because your reader(slave)
and writer(master) are the same file object in the kernel[2].

Your Errno::EIO is distinctly different from EOF, you might try figuring
out why EIO is not happening instead (or if there's something better to
look for, perhaps just waiting for process exit or SIGCHLD)
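
A minimal sketch of the "just wait for process exit" idea, assuming polling
with WNOHANG is acceptable.  wait_for_exit is a hypothetical helper, not
something from the 'pty' extension, and the 10ms poll interval is arbitrary:

```ruby
# Hypothetical helper: poll for the child's exit instead of waiting for
# EOF/EIO on the pty.  Returns a Process::Status, or nil on timeout.
def wait_for_exit(pid, timeout)
  deadline = Time.now + timeout
  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    result = Process.wait2(pid, Process::WNOHANG)
    return result[1] if result
    return nil if Time.now >= deadline
    sleep 0.01
  end
end

pid = Process.spawn('/bin/sh', '-c', 'exit 8')
status = wait_for_exit(pid, 5)
puts status ? "exited with #{status.exitstatus}" : "still running"
```

A SIGCHLD handler would avoid the polling, at the cost of more care around
what is safe to do inside a signal handler.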


[1] - If anybody can contribute an article on PTY programming,
      please do.  I don't intend to write an article on PTYs
      since I lack the knowledge/experience and nobody I know
      likes dealing with ioctls much.

[2] - I'm also not sure if you can do the equivalent of a
      shutdown(SHUT_WR) with PTYs like you can with sockets
      to force an EOF on the reader.

-- 
Eric Wong

Re: [usp.ruby] EOF on a pty slave

From: Simon Chiang
Date: 2011-10-25 @ 17:30

On Mon, Oct 24, 2011 at 4:22 AM, Eric Wong <normalperson@yhbt.net> wrote:

> Simon Chiang <simon.a.chiang@gmail.com> wrote:
>


> > Stumper #1.  After a kill 9, reading the slave to EOF lets the process exit
> > much faster than not doing so.  I don't know why this makes a difference...
> > seems like the shell should just be killed outright, either way.
> >
> >     # OS X 10.6.8, 1.8.7 or 1.9.2, representative of bash, ksh, zsh
> >     $ ruby example.rb
> >     PTY - select on slave
> >     0.601848
> >     PTY - select on slave, clearing the slave after kill
> >     0.000436
>
> I can't reproduce this huge time difference on my Linux 2.6 machine
> with a small rescue fix (see below).
>
>
Thanks to you and Robert for trying this on your systems.  I apologize for
not doing a more thorough check of OS dependency beforehand -- I thought I'd
seen it in multiple places but clearly I was wrong.  Here is a better study,
with the modifications: https://gist.github.com/1312929

Looks like the excessive wait time is an OS X issue.  Some systems like SLES
10 have a longer wait time but nothing quite so dramatic as OS X.  It does
not appear to be an issue on Ubuntu, and I don't have a Windows system
readily available to confirm the Cygwin results.


> >     $ cat example.rb
> >     require 'pty'
>
> I have little direct experience explicitly programming for PTYs and none
> with the 'pty' extension.  However, I believe terminals are the
> weirdest/quirkiest "files" Unix has[1].
>
> Fortunately like all Unix files, PTYs are bound by the same rules around
> dup/fork/close/exit that other file descriptors/objects are bound by.
>
> >     shell = ARGV[0] || '/bin/sh'
>
> Perhaps io/console (new in 1.9.3, also a gem) is more appropriate for
> attempting to script interactive shells?  (I'm not familiar with
> io/console other than knowing of its existence, either.)
>
>
I heard about that as well but haven't had a chance to look at it.


> >       slave.read
>
> I needed to rescue Errno::EIO here, which I saw in your bug report
> below.
>
> >       Process.wait(pid) rescue PTY::ChildExited
> >       puts "#{Time.now - start}"
> >     end
> >     puts
> >
> > Stumper #2.  When I issue an exit command to the shell via master, every so
> > often... rarely... I find a select on the slave will time-out waiting for
> > EOF.  Non-exit methods that should cause an EOF (ex writing EOF on master)
> > also result in the same condition; most of the time I can read to EOF but
> > every so often, time-out.  I've only seen this on 1.9.2 so I opened a bug
> > report about it (http://redmine.ruby-lang.org/issues/5463) but honestly I'm
> > not sure if it's a ruby issue or not.
>
> I took a look at ext/pty/pty.c in Ruby and noted your "master" and
> "slave" IO objects are actually dup(2)s of each other.  I don't think
> you can get an EOF (read(2) returning zero) because your reader(slave)
> and writer(master) are the same file object in the kernel[2].
>
>
I'm still trying to process all this 100%, but it does point to something I
should do more thoroughly - compare the ext/pty/pty.c code in 1.8.7 vs
1.9.2.  After all, I don't see the timeouts on 1.8.7 and prior.


> Your Errno::EIO is distinctly different from EOF, you might try figuring
> out why EIO is not happening instead (or if there's something better to
> look for, perhaps just waiting for process exit or SIGCHLD)
>
>
Agreed... truth is I have very limited knowledge about it.  All I have to go
on is what I read in Michael Kerrisk's book 'The Linux Programming
Interface'.  I was thinking a SIGHUP was somehow involved and came across
this passage:

"SUSv3 states that if both a terminal disconnect occurs and one of the
conditions giving rise to an EIO error from read() exists, then it is
unspecified whether read() returns end-of-file or fails with the error EIO.
Portable programs must allow for both possibilities." [p 709]
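
In Ruby terms, allowing for both possibilities might look roughly like the
sketch below.  read_pty_output is a hypothetical helper name, and the block
variables are named r/w rather than slave/master; the only assumption is
that end of session shows up as either EOFError or Errno::EIO:

```ruby
require 'pty'

# Hypothetical helper: run +cmd+ on a pty and collect its output,
# treating both EOF and EIO as "the session is over".
def read_pty_output(cmd)
  output = String.new
  PTY.spawn(cmd) do |r, w, pid|
    begin
      loop { output << r.readpartial(4096) }
    rescue EOFError, Errno::EIO
      # Which one we get is platform-dependent; either way we are done.
    ensure
      Process.wait(pid)          # reap the child
    end
  end
  output
end

puts read_pty_output('echo hello').inspect
```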


Thank you!  I have a few more things to look into.

- Simon

Re: [usp.ruby] EOF on a pty slave

From: Robert Klemme
Date: 2011-10-25 @ 08:57

On Mon, Oct 24, 2011 at 12:22 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Simon Chiang <simon.a.chiang@gmail.com> wrote:

>> Stumper #1.  After a kill 9, reading the slave to EOF lets the process exit
>> much faster than not doing so.  I don't know why this makes a difference...
>> seems like the shell should just be killed outright, either way.
>>
>>     # OS X 10.6.8, 1.8.7 or 1.9.2, representative of bash, ksh, zsh
>>     $ ruby example.rb
>>     PTY - select on slave
>>     0.601848
>>     PTY - select on slave, clearing the slave after kill
>>     0.000436
>
> I can't reproduce this huge time difference on my Linux 2.6 machine
> with a small rescue fix (see below).

Same here on cygwin.

10:41:27 Temp$ ruby19 example.rb /bin/bash
PTY - select on slave
  0.005000
PTY - select on slave, clearing the slave after kill
  0.007000
PTY - select on slave, closing streams after kill
  0.004000

I also did another modification of the script (see attachment).

>> Stumper #2.  When I issue an exit command to the shell via master, every so
>> often... rarely... I find a select on the slave will time-out waiting for
>> EOF.  Non-exit methods that should cause an EOF (ex writing EOF on master)
>> also result in the same condition; most of the time I can read to EOF but
>> every so often, time-out.  I've only seen this on 1.9.2 so I opened a bug
>> report about it (http://redmine.ruby-lang.org/issues/5463) but honestly I'm
>> not sure if it's a ruby issue or not.
>
> I took a look at ext/pty/pty.c in Ruby and noted your "master" and
> "slave" IO objects are actually dup(2)s of each other.  I don't think
> you can get an EOF (read(2) returning zero) because your reader(slave)
> and writer(master) are the same file object in the kernel[2].

I don't know much about PTYs either, but what would stop them from having
two positions for reading and writing (like sockets, where both
directions are independent, too)?

I think there might be an even more profane explanation: the shell
simply does not exit.  Sending exit to a shell which does something
else than waiting for a command (e.g. waiting for a child to finish,
reading input with "read") will not make the shell exit.

I believe another way to make the shell exit (eventually) would be to
do master.close_write.  Same caveats with reading EOF as above apply
though.

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: [usp.ruby] EOF on a pty slave

From: Simon Chiang
Date: 2011-10-25 @ 17:30

On Tue, Oct 25, 2011 at 2:57 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:

> >> Stumper #2.  When I issue an exit command to the shell via master, every so
> >> often... rarely... I find a select on the slave will time-out waiting for
> >> EOF.  Non-exit methods that should cause an EOF (ex writing EOF on master)
> >> also result in the same condition; most of the time I can read to EOF but
> >> every so often, time-out.  I've only seen this on 1.9.2 so I opened a bug
> >> report about it (http://redmine.ruby-lang.org/issues/5463) but honestly I'm
> >> not sure if it's a ruby issue or not.
> >
> > I took a look at ext/pty/pty.c in Ruby and noted your "master" and
> > "slave" IO objects are actually dup(2)s of each other.  I don't think
> > you can get an EOF (read(2) returning zero) because your reader(slave)
> > and writer(master) are the same file object in the kernel[2].
>
> I don't know much about PTYs either, but what would stop them from having
> two positions for reading and writing (like sockets, where both
> directions are independent, too)?
>
> I think there might be an even more profane explanation: the shell
> simply does not exit.  Sending exit to a shell which does something
> else than waiting for a command (e.g. waiting for a child to finish,
> reading input with "read") will not make the shell exit.
>
>
Yeah I wonder about that too.  I think there is reason to believe the shell
is exiting, however.  After issuing an exit with an unusual status (like
'exit 8') I can confirm that the exit status of the pty process is indeed 8.
And in fact, the loop reads up to where the EOF should be.  I can confirm
that the string I read leading up to a timeout is the same as when I don't
get the timeout at all - something like "$ exit 8\nexit\n" (the exact output
is system-, shell-, and stty-dependent).
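
For reference, the kind of check I mean looks roughly like this.  It is a
simplified sketch, not the script from the bug report: shell_exit_status is
a made-up name, the block variables are r/w, and the 5 second timeout is
arbitrary.

```ruby
require 'pty'

# Hypothetical helper: send "exit <code>" to a shell on a pty, drain its
# output, and return the Process::Status of the shell.
def shell_exit_status(shell, code, timeout = 5)
  status = nil
  PTY.spawn(shell) do |r, w, pid|
    w.write("exit #{code}\n")
    begin
      # Drain output until the pty reports end of session (EOF or EIO).
      while IO.select([r], nil, nil, timeout)
        r.readpartial(4096)
      end
      # select timed out: the shell never closed the pty, so kill it
      # rather than block forever in wait2 below.
      Process.kill(9, pid)
    rescue EOFError, Errno::EIO
      # Normal end of session.
    end
    _, status = Process.wait2(pid)
  end
  status
end

status = shell_exit_status('/bin/sh', 8)
puts "exitstatus: #{status.exitstatus.inspect}"
```

If the select does time out, the status comes back as signaled (exitstatus
is nil) instead of carrying the requested code.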

On the flip side there are situations where I think you're clearly right.
For instance, as I recall, I have made examples where I write "exit 8\n"
to the master and then kill the process immediately, so that the pty never
gets to the exit command.

Thanks,

- Simon