librelist archives

« back to archive

Test failure caused by commit 377028d99

Test failure caused by commit 377028d99

From:
Hleb Valoshka
Date:
2013-09-04 @ 20:14
On my Debian Sid with kernel 3.11-rc7 4 of 5 invocations of
test_poll_EINTR_changed finished with failure:

$ make test/test_poll.rb
ruby -I lib:tmp/ext/ruby-1.9.3/ext/kgio test/test_poll.rb 
Loaded suite test/test_poll
Started
..F
===============================================================================
Failure:
test_poll_EINTR_changed(TestPoll)
test/test_poll.rb:105:in `test_poll_EINTR_changed'
     102:     res = Kgio.poll(pollset, 1000)
     103:     diff = Time.now - t0
     104:     thr.join
  => 105:     assert_equal({@wr => Kgio::POLLOUT}, res)
     106:     assert diff < 1.0, "diff=#{diff}"
     107:     assert ok
     108:     ensure
<{#<IO:fd 6>=>4}> expected but was
<nil>

diff:
? {#<IO:fd 6>=>4}
? nil            
===============================================================================
......

Finished in 7.476580682 seconds.

9 tests, 22 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 
notifications
88.8889% passed

1.20 tests/s, 2.94 assertions/s

With commit 377028d99 (test_poll: avoid potentially thread-unsafe test)
reverted this test finishes successfully.

Re: [kgio] Test failure caused by commit 377028d99

From:
Eric Wong
Date:
2013-09-04 @ 20:51
Hleb Valoshka <375gnu@gmail.com> wrote:
> On my Debian Sid with kernel 3.11-rc7 4 of 5 invocations of
> test_poll_EINTR_changed finished with failure:

Thanks for the report.  Unfortunately, I can't seem to reproduce it,
here on 1.9.3 nor 2.0.0 with Linux 3.10.10.

The test will probably not work well under heavy load, and I've
considered removing it, even.  Current behavior may not be compatible
with all Ruby versions depending on how signal handling is implemented.

Does the following change increase reliability for you?

--- a/test/test_poll.rb
+++ b/test/test_poll.rb
@@ -96,7 +96,10 @@ def test_poll_EINTR_changed
     end
     thr = Thread.new do
       sleep 0.100
-      Process.kill(:USR1, $$)
+      100.times do
+        Process.kill(:USR1, $$)
+        Thread.pass
+      end
     end
     t0 = Time.now
     res = Kgio.poll(pollset, 1000)

If not, I think we can just drop the test...

Re: [kgio] Test failure caused by commit 377028d99

From:
Hleb Valoshka
Date:
2013-09-05 @ 19:48
On 9/4/13, Eric Wong <normalperson@yhbt.net> wrote:
> Does the following change increase reliability for you?

Yes, it helps. Thanks.

> --- a/test/test_poll.rb
> +++ b/test/test_poll.rb
> @@ -96,7 +96,10 @@ def test_poll_EINTR_changed
>      end
>      thr = Thread.new do
>        sleep 0.100
> -      Process.kill(:USR1, $$)
> +      100.times do
> +        Process.kill(:USR1, $$)
> +        Thread.pass
> +      end
>      end
>      t0 = Time.now
>      res = Kgio.poll(pollset, 1000)

Re: [kgio] Test failure caused by commit 377028d99

From:
Eric Wong
Date:
2013-09-05 @ 23:13
Hleb Valoshka <375gnu@gmail.com> wrote:
> On 9/4/13, Eric Wong <normalperson@yhbt.net> wrote:
> > Does the following change increase reliability for you?
> 
> Yes, it helps. Thanks.

Thanks for testing, pushed as
commit ff940137dc5098c9d8094e041e9be053cb3e139f

Re: [kgio] Test failure caused by commit 377028d99

From:
Hleb Valoshka
Date:
2013-09-15 @ 19:07
On 9/6/13, Eric Wong <normalperson@yhbt.net> wrote:
>> > Does the following change increase reliability for you?
>> Yes, it helps. Thanks.
> Thanks for testing, pushed as
> commit ff940137dc5098c9d8094e041e9be053cb3e139f

Another problem has emerged :) Sometimes ruby delivers signals too
late and this test dies because of unhandled USR1. At least Ruby2.0 is
affected (I have no problem with this test under Ruby1.9.3). May be
it's better to remove it.

Re: [kgio] Test failure caused by commit 377028d99

From:
Eric Wong
Date:
2013-09-21 @ 09:30
Hleb Valoshka <375gnu@gmail.com> wrote:
> On 9/6/13, Eric Wong <normalperson@yhbt.net> wrote:
> >> > Does the following change increase reliability for you?
> >> Yes, it helps. Thanks.
> > Thanks for testing, pushed as
> > commit ff940137dc5098c9d8094e041e9be053cb3e139f
> 
> Another problem has emerged :) Sometimes ruby delivers signals too
> late and this test dies because of unhandled USR1. At least Ruby2.0 is
> affected (I have no problem with this test under Ruby1.9.3). May be
> it's better to remove it.

Ugh, yes, this is tricky to test...
Does the following help?

diff --git a/test/test_poll.rb b/test/test_poll.rb
index d9a2b07..5c8071d 100644
--- a/test/test_poll.rb
+++ b/test/test_poll.rb
@@ -108,6 +108,8 @@ class TestPoll < Test::Unit::TestCase
     assert_equal({@wr => Kgio::POLLOUT}, res)
     assert diff < 1.0, "diff=#{diff}"
     assert ok
+
+    1000.times { Thread.pass } # let signal handlers fire
     ensure
       trap(:USR1, orig)
   end


(I'm sleepy myself and thinking out loud)
Or it could be just made to sleep (and slow down the tests even more)
Thanks!

Re: [kgio] Test failure caused by commit 377028d99

From:
Hleb Valoshka
Date:
2013-09-24 @ 15:02
On 9/21/13, Eric Wong <normalperson@yhbt.net> wrote:

> Does the following help?

No. I was using ruby 2.1.0dev (2013-09-22 trunk 43011) [x86_64-linux].

20 runs, 2 signals still were delivered too late, 1 test failure:

[3/9] TestPoll#test_poll_EINTR_changed = 1.00 s
  1) Failure:
TestPoll#test_poll_EINTR_changed [test/test_poll.rb:108]:
<{#<IO:fd 9>=>4}> expected but was
<nil>.

Re: [kgio] Test failure caused by commit 377028d99

From:
Eric Wong
Date:
2013-09-25 @ 17:46
Hleb Valoshka <375gnu@gmail.com> wrote:
> On 9/21/13, Eric Wong <normalperson@yhbt.net> wrote:
> 
> > Does the following help?
> 
> No. I was using ruby 2.1.0dev (2013-09-22 trunk 43011) [x86_64-linux].
> 
> 20 runs, 2 signals still were delivered too late, 1 test failure:

Thanks for the report.  I'm not sure what can be done :< git will
remember for us in case we want to resurrect a fixed version of it.
I'll push this out:

Subject: [PATCH] test_poll: remove race prone test_poll_EINTR_changed test

We haven't figured out a way to reliably test this w/o races, so lets
just remove it for now and trust it works by reading the C code.

ref: <CAAB-KcnpvcG6=OZNsBmvv440OHfCWs6-eDD7L6oD=ziCRXPHLA@mail.gmail.com>
---
 test/test_poll.rb | 25 -------------------------
 1 file changed, 25 deletions(-)

diff --git a/test/test_poll.rb b/test/test_poll.rb
index d9a2b07..6463ef9 100644
--- a/test/test_poll.rb
+++ b/test/test_poll.rb
@@ -87,31 +87,6 @@ class TestPoll < Test::Unit::TestCase
       trap(:USR1, orig)
   end
 
-  def test_poll_EINTR_changed
-    ok = false
-    pollset = { @rd => Kgio::POLLIN }
-    orig = trap(:USR1) do
-      pollset[@wr] = Kgio::POLLOUT
-      ok = true
-    end
-    thr = Thread.new do
-      sleep 0.100
-      100.times do
-        Process.kill(:USR1, $$)
-        Thread.pass
-      end
-    end
-    t0 = Time.now
-    res = Kgio.poll(pollset, 1000)
-    diff = Time.now - t0
-    thr.join
-    assert_equal({@wr => Kgio::POLLOUT}, res)
-    assert diff < 1.0, "diff=#{diff}"
-    assert ok
-    ensure
-      trap(:USR1, orig)
-  end
-
   def test_poll_signal_torture
     usr1 = 0
     empty = 0
-- 
1.8.4

Re: [kgio] Test failure caused by commit 377028d99

From:
Hleb Valoshka
Date:
2013-09-05 @ 07:55
On 9/4/13, Eric Wong <normalperson@yhbt.net> wrote:
>> On my Debian Sid with kernel 3.11-rc7 4 of 5 invocations of
>> test_poll_EINTR_changed finished with failure:
>
> Thanks for the report.  Unfortunately, I can't seem to reproduce it,
> here on 1.9.3 nor 2.0.0 with Linux 3.10.10.

It also unreproducible on 3.2.46-1+deb7u1 (Debian stable).

> The test will probably not work well under heavy load, and I've
> considered removing it, even.  Current behavior may not be compatible
> with all Ruby versions depending on how signal handling is implemented.
>
> Does the following change increase reliability for you?

I'll test it when I return home.