librelist archives

ArgumentError at /

From:
James Abbott
Date:
2011-11-17 @ 19:04
Hi,-

I recently upgraded to Ruby 1.9.3, and this introduced a hiccup with the way
Nesta handles file content. Line 140 in models.rb gives this error:

  invalid byte sequence in UTF-8
  file: models.rb     location: split     line: 140

It's the line that splits a page's contents into metadata and page body:

  first_paragraph, remaining = contents.split(/\r?\n\r?\n/, 2)

After Googling around, I tried to hack this in my app.rb:

  require "iconv"

  class FileModel
    def parse_file
      contents = File.open(@filename).read
      contents = Iconv.iconv('UTF-8//IGNORE', 'UTF-8', contents)
      .
      .
      .
    end
  end

Not a good hack, it turns out - contents just becomes an empty array and an
error is generated the first time a method is called on it.

Does Ruby expect the input to be pure UTF-8? What format do you use as the
"to" format? Is there a better way than Iconv?

Some links:
http://slideshow.rubyforge.org/ruby19.html#22
http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/

Thanks,
James
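
For anyone hitting the same error, here is a minimal sketch of one alternative to Iconv. It assumes Ruby 2.1 or later, since String#scrub does not exist in the 1.9.3 discussed above. Note also that Iconv.iconv returns an array of strings rather than a single string, so its result cannot be assigned straight back to contents. The helper name sanitize_utf8 is hypothetical and not part of Nesta.

```ruby
# Hypothetical helper, not part of Nesta: returns a copy of +text+ that is
# valid UTF-8, replacing each undecodable byte sequence with "?".
# Requires Ruby 2.1+ for String#scrub.
def sanitize_utf8(text)
  utf8 = text.dup.force_encoding("UTF-8")
  utf8.valid_encoding? ? utf8 : utf8.scrub("?")
end

# 0x93 and 0x94 are curly quotes in Windows-1252 but are invalid as UTF-8.
raw = [0x93, *"A quote".bytes, 0x94, *"\n\nBody text".bytes].pack("C*")

# The same split that raised "invalid byte sequence in UTF-8" now succeeds.
first_paragraph, remaining = sanitize_utf8(raw).split(/\r?\n\r?\n/, 2)
puts first_paragraph
puts remaining
```

Scrubbing hides the problem rather than fixing it: the bad bytes are replaced, not decoded, so re-saving the offending file as UTF-8 is still the cleaner fix.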

Re: [nesta] ArgumentError at /

From:
Graham Ashton
Date:
2011-11-19 @ 13:01
Cheers James. I haven't compiled 1.9.3 yet, but I've made a note to do so 
and give it a spin when I get a moment.

Have you tried it with a very simple content folder? I'm wondering if 
there's a character in one of your pages that isn't part of the ASCII 
character set that's making Ruby throw a wobbly.

I'll see if I can reproduce it...

Re: [nesta] ArgumentError at /

From:
Jeff Clites
Date:
2011-11-19 @ 20:19
I'm guessing the problem is that one (or more) of the files just isn't in 
UTF-8 at all, so the question is which "from" encoding to use. If you know
which file, you can open it in a text editor and re-save in UTF-8. Or, if 
you can share the file, I can probably sleuth out which encoding it's in. 
(If you are on Windows it's likely to be CP1252 or ISO-Latin-1, or on a 
Mac a good guess would be MacRoman.)

Jeff
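
Once the "from" encoding is known, Ruby 1.9's String#force_encoding and String#encode can do the same conversion as re-saving the file in a text editor. A minimal sketch, assuming the bytes are Windows-1252 (CP1252); the helper name to_utf8 is hypothetical.

```ruby
# Hypothetical helper: interpret raw +bytes+ as +from_encoding+ and
# transcode them to UTF-8. Raises if a byte has no mapping in that encoding.
def to_utf8(bytes, from_encoding)
  bytes.dup.force_encoding(from_encoding).encode("UTF-8")
end

# In Windows-1252 the curly quotes are the single bytes 0x93 and 0x94.
legacy = [0x93, *"quoted".bytes, 0x94].pack("C*")
converted = to_utf8(legacy, "Windows-1252")

puts converted                  # the text wrapped in curly double quotes
puts converted.valid_encoding?  # true
```

Guessing the wrong source encoding usually still "succeeds" but produces the wrong characters, so it is worth checking the output by eye.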

Re: [nesta] ArgumentError at /

From:
James Abbott
Date:
2011-11-20 @ 14:05
Hi guys, problem solved. This possibly had nothing to do with the Ruby
version. I had the idea of installing Ruby 1.9.2 just to see if the error
persisted, but realized I'd have to install all the gems again (RVM works
pretty much like a sandbox).

Instead, I debugged it: I removed all the content folders and added them
back one by one, refreshing the browser each time. The offending characters
were quotation marks in a piece of content I had copied and pasted from a
web page (to use as a quote).

Lesson learned. Thanks for the answers!

/ James
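
The folder-by-folder search described above can also be automated. A minimal sketch, assuming the pages live under a directory named "content" (a hypothetical path; use wherever your Nesta content actually lives). The helper name files_with_invalid_utf8 is also hypothetical.

```ruby
# Hypothetical helper: returns the sorted paths of regular files under +dir+
# whose bytes are not a valid UTF-8 sequence.
def files_with_invalid_utf8(dir)
  Dir.glob(File.join(dir, "**", "*")).select do |path|
    File.file?(path) &&
      !File.binread(path).force_encoding("UTF-8").valid_encoding?
  end.sort
end

# Prints one path per offending file; prints nothing if all files are clean
# or the directory does not exist.
files_with_invalid_utf8("content").each { |path| puts path }
```

This only reports which files are affected. Finding the exact bytes within a file still means opening it in an editor that shows the encoding.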
