IO#dup and the dup(2) system call

From:
Eric Wong
Date:
2011-10-19 @ 02:59
IO#dup is Object#dup in Ruby: it creates a shallow copy of an
existing object.  To create a shallow copy, the IO#initialize_copy
callback method performs the dup(2) syscall on the underlying file
descriptor the IO object wraps.
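
A minimal sketch of the above, assuming a POSIX system (File::NULL is
"/dev/null" there).  The variable names are illustrative only:

```ruby
# Assumes a POSIX system; any open file would do in place of File::NULL.
io_orig = File.open(File::NULL, "w")
io_copy = io_orig.dup   # IO#initialize_copy performs dup(2) underneath

# Two distinct Ruby objects wrapping two distinct file descriptors.
distinct_objects = !io_orig.equal?(io_copy)
distinct_fds     = io_orig.fileno != io_copy.fileno

puts "io_orig fd=#{io_orig.fileno}, io_copy fd=#{io_copy.fileno}"

# Closing the copy leaves the original usable.
io_copy.close
orig_still_open = !io_orig.closed?
io_orig.close
```

The exact descriptor numbers depend on what else the process has open,
so only their inequality is meaningful here.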

Like Object#dup in Ruby, dup(2) is a shallow clone that does not copy
the underlying open file object in the kernel, but creates a new
reference to an existing kernel object.

Thus, two (or more) file descriptors in the same process can refer
to the same open file in the kernel.

Before calling IO#dup, we have a 1:1:1 relationship:

  * one Ruby IO object
  * one file descriptor
  * one open file object in the kernel

     [Ruby]    user space   |  kernel space
     ------------------------------------------------
                            |
     io_orig ----------- fd[orig] ----> file object
                            |
     ------------------------------------------------
     (file descriptors (fd) are the bridge here between kernel and user space)


After we call IO#dup, we have a 2:2:1 relationship:

  * two Ruby IO objects
  * two file descriptors
  * one file object in the kernel

     [Ruby]    user space   |  kernel space
     ------------------------------------------------
                            |
     io_orig ----------- fd[orig] -\
                            |       >---> file object
     io_copy ----------- fd[copy] -/
                            |
     ------------------------------------------------

IO#dup can be called on the same IO object any number of times, so there
may be an N:N:1 relationship as long as the process (and system)
resource limits are not exceeded.

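Most kernel-level changes made through one IO object (for example, to
the file offset or to file status flags such as O_NONBLOCK) are
immediately visible through the IO object(s) it was copied from (or
copied to).  User space changes, such as those to Ruby's own IO
buffering state, are not.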
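
The following is a rough sketch of that difference, not a definitive
test.  It assumes a POSIX system and uses a hypothetical scratch file
created with Tempfile.  sysread is used so that Ruby's user space read
buffer does not hide the kernel's file offset:

```ruby
require "tempfile"
require "fcntl"

tmp = Tempfile.new("dup-demo")   # hypothetical scratch file
tmp.syswrite("abcdefghij")

io_orig = File.open(tmp.path, "r")
io_copy = io_orig.dup

# Kernel-level state: the file offset lives in the shared file object.
first  = io_orig.sysread(3)   # advances the shared offset to 3
second = io_copy.sysread(3)   # continues at 3 instead of restarting at 0

# Kernel-level state: file status flags such as O_NONBLOCK are shared too.
flags = io_orig.fcntl(Fcntl::F_GETFL)
io_orig.fcntl(Fcntl::F_SETFL, flags | Fcntl::O_NONBLOCK)
copy_nonblock = (io_copy.fcntl(Fcntl::F_GETFL) & Fcntl::O_NONBLOCK) != 0

# User space state: Ruby's sync flag belongs to each IO object.
io_orig.sync = true
copy_sync = io_copy.sync

puts "first=#{first} second=#{second}"
puts "copy sees O_NONBLOCK: #{copy_nonblock}, copy sync: #{copy_sync}"

io_orig.close
io_copy.close
tmp.close!
```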

Effect on IPC
-------------

IO#dup means IO#close / close(2) will only remove a _reference_ to the
file object in the kernel.  Only when the last file descriptor for a
given file object is closed is the actual file object closed and
released in the kernel.

For applications relying on receiving an end-of-file condition (from a
socket or pipe), IO#dup[1] can (sometimes inadvertently) prevent the
end-of-file condition from being reached in the reader.
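
As a small illustration with a pipe (a sketch only; the variable names
are made up for this example), the reader does not see end-of-file
until every descriptor for the write end has been closed.
read_nonblock is used so the example reports "would block" instead of
hanging:

```ruby
rd, wr = IO.pipe
wr_copy = wr.dup        # second descriptor for the same write end

wr.syswrite("hello")
wr.close                # removes only one reference to the write end

data = rd.readpartial(16)

# wr_copy still holds the write end open, so there is no EOF yet.
# A blocking read would hang here; read_nonblock says so instead.
before = rd.read_nonblock(16, exception: false)

wr_copy.close           # last reference to the write end is gone

# Now the kernel can report end-of-file to the reader.
after = rd.read_nonblock(16, exception: false)

puts "data=#{data.inspect} before=#{before.inspect} after=#{after.inspect}"
rd.close
```

With exception: false, read_nonblock returns :wait_readable when no
data is available and nil at end-of-file.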



[1] - and similar functions, like fork()


License: GPLv3 (or later, at the discretion of Eric Wong)
         http://www.gnu.org/licenses/gpl-3.0.txt
-- 
Eric Wong

Re: [usp.ruby] IO#dup and the dup(2) system call

From:
Christian Pedaschus
Date:
2011-10-19 @ 21:10
Sidenote: Glad to see you back, Eric.
Was thinking quite a bit about you (and usp of course) in the last
(mostly 4) weeks.
I hope you are of good health, thanks again for sharing your knowledge
with us.

Which brings me to:
What happens down at the system level if I dup a file descriptor and
then simultaneously write to both copies via multiple threads...?
Probably a stupid question, as the first thread will probably win, so
the bigger question would be: how to coordinate the access in such a
situation? Is there a POSIX/Ruby way, or do I have to flock/unlock the
fds myself?

Cheers/Prost (en/de)
Christian


Re: [usp.ruby] IO#dup and the dup(2) system call

From:
Eric Wong
Date:
2011-10-19 @ 21:30
Christian Pedaschus <chris@s-4-u.net> wrote:
> On 10/19/2011 04:59 AM, Eric Wong wrote:
> Sidenote: Glad to see you back, Eric.
> Was thinking quite a bit about you (and usp of course) in the last
> (mostly 4) weeks.

Please don't think of me, I don't like being thought of :x

> I hope you are of good health, thanks again for sharing your knowledge
> with us.

You're welcome :>

> Which brings me to:
> What happens down at the system level if I dup a file descriptor and
> then simultaneously write to both copies via multiple threads...?
> Probably a stupid question, as the first thread will probably win, so
> the bigger question would be: how to coordinate the access in such a
> situation? Is there a POSIX/Ruby way, or do I have to flock/unlock the
> fds myself?

Great question!  It depends on several factors, _all_ of which I intend
to cover in-depth soonish:

1) underlying file type (pipe, regular file, stream vs datagram socket...)
2) file status flags (e.g. O_APPEND, O_NONBLOCK)
3) operation (read(2), write(2), pwrite(2), pread(2), accept(2))

In many cases, you do not have to lock anything, the kernel handles all
the locking for you.
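
One case where this can be observed is appending short records to a
regular file opened with O_APPEND.  The sketch below is an assumption
based illustration, not a proof: the behavior described in the comments
is what Linux does on a local filesystem, and other file types,
filesystems (NFS, for example) or larger writes may behave differently.
The scratch file and record sizes are hypothetical:

```ruby
require "tempfile"

tmp = Tempfile.new("append-demo")   # hypothetical scratch file
io_orig = File.open(tmp.path, File::WRONLY | File::APPEND)
io_copy = io_orig.dup               # shares the open file, O_APPEND included

line_a = "a" * 63 + "\n"
line_b = "b" * 63 + "\n"
count  = 200

# Two threads, one per descriptor, with no user space locking at all.
# syswrite bypasses Ruby's buffers: one call is one write(2).
threads = [[io_orig, line_a], [io_copy, line_b]].map do |io, line|
  Thread.new { count.times { io.syswrite(line) } }
end
threads.each(&:join)

io_orig.close
io_copy.close

lines  = File.readlines(tmp.path)
intact = lines.all? { |l| l == line_a || l == line_b }
puts "#{lines.size} lines, all intact: #{intact}"
tmp.close!
```

The order in which the "a" and "b" lines appear is not defined; only
the integrity of each record is being checked.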

In other cases, you do need to lock something or a user space
application can receive corrupt data.  However, keep in mind that
locking in userspace can be misguided as it leads to a false sense of
safety.  A false sense of safety can even _lead_ to data corruption
because the kernel semantics are ignored.

In any case, you should _never_ be able to crash the OS kernel by
failing to lock these operations; report it to the kernel developers
if you do.

-- 
Eric Wong

Re: [usp.ruby] IO#dup and the dup(2) system call

From:
Robert Klemme
Date:
2011-10-21 @ 11:24
On Wed, Oct 19, 2011 at 11:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Christian Pedaschus <chris@s-4-u.net> wrote:
>> On 10/19/2011 04:59 AM, Eric Wong wrote:
>> Sidenote: Glad to see you back, Eric.
>> Was thinking quite a bit about you (and usp of course) in the last
>> (mostly 4) weeks.
>
> Please don't think of me, I don't like being thought of :x

I think you are a tad too humble here. :-)

>> Which brings me to:
>> What happens down at the system level if I dup a file descriptor and
>> then simultaneously write to both copies via multiple threads...?
>> Probably a stupid question, as the first thread will probably win, so
>> the bigger question would be: how to coordinate the access in such a
>> situation? Is there a POSIX/Ruby way, or do I have to flock/unlock the
>> fds myself?
>
> Great question!  It depends on several factors, _all_ of which I intend
> to cover in-depth soonish:
>
> 1) underlying file type (pipe, regular file, stream vs datagram socket...)
> 2) file status flags (e.g. O_APPEND, O_NONBLOCK)
> 3) operation (read(2), write(2), pwrite(2), pread(2), accept(2))
>
> In many cases, you do not have to lock anything, the kernel handles all
> the locking for you.
>
> In other cases, you do need to lock something or a user space
> application can receive corrupt data.  However, keep in mind that
> locking in userspace can be misguided as it leads to a false sense of
> safety.  A false sense of safety can even _lead_ to data corruption
> because the kernel semantics are ignored.

Are you talking about the case of multiple processes writing to the
same file?  OK, I'll hold my breath and wait for next articles. :-)

> In any case, you should _never_ be able to crash the OS kernel from
> failing to lock these operations, report it to the developers if you
> do.

That's for sure.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: [usp.ruby] IO#dup and the dup(2) system call

From:
Eric Wong
Date:
2011-10-21 @ 21:57
Robert Klemme <shortcutter@googlemail.com> wrote:
> On Wed, Oct 19, 2011 at 11:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > In other cases, you do need to lock something or a user space
> > application can receive corrupt data.  However, keep in mind that
> > locking in userspace can be misguided as it leads to a false sense of
> > safety.  A false sense of safety can even _lead_ to data corruption
> > because the kernel semantics are ignored.
> 
> Are you talking about the case of multiple processes writing to the
> same file?  OK, I'll hold my breath and wait for next articles. :-)

Exactly.  I think dup(2) is a good way to explain what happens on
fork(2).
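
To illustrate the parallel, here is a minimal sketch assuming a
platform where Kernel#fork is available (so not Windows) and using a
hypothetical scratch file.  The child inherits a descriptor that refers
to the same open file object, so a read in the child moves the offset
the parent sees, just as with a dup'ed descriptor in one process:

```ruby
require "tempfile"

tmp = Tempfile.new("fork-demo")   # hypothetical scratch file
tmp.syswrite("abcdefghij")

io = File.open(tmp.path, "r")

pid = fork do
  # The child's descriptor refers to the same open file object,
  # so this read advances the offset shared with the parent.
  io.sysread(5)
  exit!(0)   # skip at_exit handlers inherited from the parent
end
Process.wait(pid)

# The parent continues where the child left off.
rest = io.sysread(5)
puts "parent read: #{rest.inspect}"

io.close
tmp.close!
```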