librelist archives


O_APPEND atomicity


From: Jesse Storimer
Date: 2013-06-05 @ 19:07
I wrote a blog post exploring whether appending to files is atomic and
safe to do without a mutex; it's at
http://www.jstorimer.com/blogs/workingwithcode/7982047-is-lock-free-logging-safe.
There were still a few open questions at the end.

Eric got in touch and set the record straight. I'll forward his replies to
this thread.

Re: O_APPEND atomicity

From: Jesse Storimer
Date: 2013-06-05 @ 19:10
---------- Forwarded message ----------
From: Eric Wong <normalperson@yhbt.net>
Date: Mon, Jun 3, 2013 at 2:50 PM
Subject: Re: O_APPEND atomicity
To: Jesse Storimer <jstorimer@gmail.com>


Jesse Storimer <jstorimer@gmail.com> wrote:
> > Then O_APPEND is atomic and write-in-full for all reasonably-sized
> > writes to regular files.
> >
>
> Good to know. To pose the larger question, do you think that logging
> without a mutex is safe? Does the interaction with signals pose a concern?

Absolutely safe without a user-provided mutex; the kernel already
uses one. After all, Apache's prefork setups existed before Apache
supported POSIX threads.

Signals can't interrupt writes to regular files on reasonable FSes.

FWIW, NFS has an intr/nointr mount option to control EINTR behavior.
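Eric's point is that the kernel serializes O_APPEND writes itself, so the application only has to make sure each log record goes out in a single write(2) call. A minimal C sketch under that assumption follows; open_log and append_record are hypothetical helper names. The EINTR retry is purely defensive (per Eric, signals don't interrupt regular-file writes on reasonable FSes), and a short write is reported as an error rather than finished with a second write(2), because that second call would no longer be atomic with the first.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a log for appending. O_APPEND makes the
 * kernel position every write(2) at end-of-file atomically. */
int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: append one complete record with exactly one write(2).
 * Returns 0 on success, -1 on error or on a short write. */
int append_record(int fd, const char *buf, size_t len)
{
    ssize_t n;

    do {
        /* EINTR means nothing was written, so retrying is safe. */
        n = write(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        /* Don't write the remainder: a second write(2) could land after
         * another process's record and split this one in two. */
        errno = EIO; /* arbitrary errno chosen for this sketch */
        return -1;
    }
    return 0;
}
```

Each process would call open_log() once (or inherit the descriptor across fork()) and then append_record() once per log line; no user-space mutex is involved.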

Re: O_APPEND atomicity

From: Jesse Storimer
Date: 2013-06-05 @ 19:09
---------- Forwarded message ----------
From: Jesse Storimer <jstorimer@gmail.com>
Date: Mon, Jun 3, 2013 at 2:06 PM
Subject: Re: O_APPEND atomicity
To: Eric Wong <normalperson@yhbt.net>



> Then O_APPEND is atomic and write-in-full for all reasonably-sized
> writes to regular files.
>

Good to know. To pose the larger question, do you think that logging
without a mutex is safe? Does the interaction with signals pose a concern?

Re: O_APPEND atomicity

From: Jesse Storimer
Date: 2013-06-05 @ 19:08
---------- Forwarded message ----------
From: Eric Wong <normalperson@yhbt.net>
Date: Mon, Jun 3, 2013 at 4:05 AM
Subject: O_APPEND atomicity
To: Jesse Storimer <jstorimer@gmail.com>


Hi Jesse, somebody pointed me to your article on atomic writes.

As long as:

1) you're on a reasonably POSIX-compliant FS[1]
2) you have enough free space
3) your writes are reasonably-sized[2]
4) you're not exceeding RLIMIT_FSIZE

(maybe some other rare cases I've forgotten at this hour of the night)

Then O_APPEND is atomic and write-in-full for all reasonably-sized
writes to regular files.
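A minimal C sketch of the preforking pattern, assuming a local POSIX filesystem such as ext4 or tmpfs and the conditions listed above: a parent opens one O_APPEND log, forks several writers that inherit the descriptor and each emit fixed-size records with a single write(2), then reads the file back and checks that no record was torn or interleaved. The names run_writers, count_intact_records and RECORD_LEN are hypothetical, chosen for illustration only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64 /* hypothetical fixed record size, including '\n' */

/* Child body: append nlines records, one write(2) per record. */
static void writer(int fd, int id, int nlines)
{
    char rec[RECORD_LEN];
    /* Fill with one character per writer so interleaving would be visible. */
    memset(rec, 'A' + id, RECORD_LEN - 1);
    rec[RECORD_LEN - 1] = '\n';
    for (int i = 0; i < nlines; i++)
        if (write(fd, rec, RECORD_LEN) != RECORD_LEN)
            _exit(1);
    _exit(0); /* skip stdio flushing and atexit handlers in the child */
}

/* Fork nprocs writers sharing one O_APPEND descriptor, as a preforking
 * server would. Returns 0 if every child exited cleanly, -1 otherwise. */
int run_writers(const char *path, int nprocs, int nlines)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int failed = 0;
    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            writer(fd, p, nlines); /* never returns */
    }
    close(fd); /* the children hold their own copies */
    int status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    return failed ? -1 : 0;
}

/* Count records in the log; returns -1 if any record is torn. */
long count_intact_records(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char line[RECORD_LEN + 1];
    long count = 0;
    while (fgets(line, sizeof line, f)) {
        size_t n = strlen(line);
        /* Intact: RECORD_LEN bytes, one repeated character, then '\n'. */
        int ok = (n == RECORD_LEN && line[n - 1] == '\n');
        for (size_t i = 1; ok && i < n - 1; i++)
            ok = (line[i] == line[0]);
        if (!ok) {
            fclose(f);
            return -1;
        }
        count++;
    }
    fclose(f);
    return count;
}
```

On such a filesystem, run_writers with 4 processes and 1000 lines each should be followed by count_intact_records reporting 4000; a result of -1 would indicate a torn record.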

Even without O_APPEND, writes to regular files on normal FSes exhibit
write-in-full behavior as long as system limits don't get exceeded.
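Write-in-full alone doesn't make logging safe, though: without O_APPEND, each independent open(2) gets its own file offset, so separate processes write over each other. A small C sketch of that difference, assuming a local filesystem (size_after_two_writers is a hypothetical name): both writes complete in full in either case, and only their placement changes.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path twice, as two unrelated processes would, write one 6-byte
 * record through each descriptor, and return the final file size
 * (-1 on error). extra_flags is either 0 or O_APPEND. */
off_t size_after_two_writers(const char *path, int extra_flags)
{
    struct stat st;
    off_t size = -1;

    unlink(path); /* start from an empty file */
    int a = open(path, O_WRONLY | O_CREAT | extra_flags, 0644);
    int b = open(path, O_WRONLY | O_CREAT | extra_flags, 0644);
    /* Each open(2) has its own offset, and both start at 0. Without
     * O_APPEND, the second write lands on top of the first one. */
    if (a >= 0 && b >= 0 &&
        write(a, "first\n", 6) == 6 &&
        write(b, "other\n", 6) == 6 &&
        fstat(a, &st) == 0)
        size = st.st_size;

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return size;
}
```

With extra_flags set to 0 the file ends up 6 bytes long (one record lost); with O_APPEND it ends up 12 bytes long.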

Preforking Apache setups have relied on this O_APPEND behavior since the
'90s, as do multiprocess nginx setups.


[1] - I would not trust NFS or other networked FSes with any atomic
      operations, though.

[2] - I think most FSes on Linux are limited to INT_MAX, or INT_MAX minus
      a few bytes...  But the number is sufficiently huge to not matter
      for logging.  Maybe other OSes have smaller limits, but if
      Apache/nginx run OK on them...

Re: [usp.ruby] Re: O_APPEND atomicity

From: Kosaki Motohiro
Date: 2013-06-05 @ 20:09
> [1] - I would not trust NFS or other networked FSes with any atomic
>       operations, though.

Right. FWIW, the nointr mount option doesn't help in this case. You may
need a file lock.
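A hedged C sketch of appending under a file lock, as suggested above, using a whole-file POSIX record lock via fcntl(F_SETLKW). whole_file_lock and append_locked are hypothetical names. Whether such locks are actually honored across NFS clients depends on the NFS version and the server's lock-manager setup; that part is an assumption here, not something this sketch can guarantee. Because the lock provides the mutual exclusion, a short write can safely be completed with further write(2) calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) a lock covering the whole file,
 * blocking until it is granted. Returns 0 on success, -1 on error. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "through end of file, however far it grows" */
    while (fcntl(fd, F_SETLKW, &fl) < 0)
        if (errno != EINTR)
            return -1;
    return 0;
}

/* Append buf while holding the lock; fd should be opened with O_APPEND. */
int append_locked(int fd, const char *buf, size_t len)
{
    if (whole_file_lock(fd, F_WRLCK) < 0)
        return -1;
    int rc = 0;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        off += (size_t)n;
    }
    if (whole_file_lock(fd, F_UNLCK) < 0)
        rc = -1;
    return rc;
}
```

POSIX record locks belong to the process, so they serialize separate processes but not threads within one process, and closing any descriptor for the file releases all of that process's locks on it.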


> [2] - I think most FSes on Linux are limited to INT_MAX, or INT_MAX minus
>       a few bytes...  But the number is sufficiently huge to not matter
>       for logging.  Maybe other OSes have smaller limits, but if
>       Apache/nginx run OK on them...

The (INT_MAX & PAGE_MASK) limit, i.e. INT_MAX rounded down to a multiple of
PAGE_SIZE, comes from the VFS, not from individual FSes. Try grepping for
MAX_RW_COUNT in the kernel source. PAGE_SIZE is 4K bytes on x86.

The limitation exists to protect you from buggy drivers.
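The kernel defines this limit as `#define MAX_RW_COUNT (INT_MAX & PAGE_MASK)`, where PAGE_MASK is ~(PAGE_SIZE - 1), and clamps larger read/write requests to it; the Linux write(2) man page documents the resulting maximum of 0x7ffff000 bytes for 4K pages. A small C sketch that mirrors the computation in userspace (max_rw_count and fits_in_one_write are hypothetical names, and the page size is assumed to be a power of two):

```c
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Userspace mirror of Linux's MAX_RW_COUNT: INT_MAX rounded down to a
 * page boundary. page_size must be a power of two. */
long max_rw_count(long page_size)
{
    return (long)INT_MAX & ~(page_size - 1);
}

/* Would a record of len bytes go out in one write(2) without the kernel
 * clamping it into a short write? */
int fits_in_one_write(size_t len)
{
    return len <= (size_t)max_rw_count(sysconf(_SC_PAGESIZE));
}
```

With 4K pages, max_rw_count(4096) is 2147479552 (0x7ffff000), which, as Eric notes, is far larger than any log record.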