Hi,

I recently upgraded to Ruby 1.9.3, and this introduced a hiccup with the way Nesta handles file content. Line 140 of models.rb raises this error:

    invalid byte sequence in UTF-8
    file: models.rb location: split line: 140

It's the line that splits a page's contents into metadata and page body:

    first_paragraph, remaining = contents.split(/\r?\n\r?\n/, 2)

After Googling around, I tried to hack this in my app.rb:

    require "iconv"

    class FileModel
      def parse_file
        contents = File.open(@filename).read
        contents = Iconv.iconv('UTF-8//IGNORE', 'UTF-8', contents)
        .
        .
        .
      end
    end

Not a good hack, it turns out: contents just becomes an empty array, and an error is raised the first time a method is called on it.

Does Ruby expect the input to be pure UTF-8? What format do you use as the "to" format? Is there a better way than Iconv?

Some links:
http://slideshow.rubyforge.org/ruby19.html#22
http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/

Thanks,
James
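For anyone landing on this thread with the same error, here is a minimal sketch of what is probably going on, assuming a current Ruby rather than 1.9.3. The sample string and the helper name `drop_invalid_bytes` are made up for illustration and are not part of Nesta. As far as I know, `Iconv.iconv` returns an array of converted strings while `Iconv.conv` returns a single string, which would explain why `contents` stopped behaving like a String in the hack above. The sketch avoids Iconv and round-trips through UTF-16BE, a commonly cited workaround for older Rubies where, as I understand it, a UTF-8 to UTF-8 `encode` call leaves invalid bytes alone.

```ruby
# Hypothetical sample: page text with Windows-1252 curly quotes (bytes 0x93
# and 0x94) pasted into a file that Ruby treats as UTF-8.
contents = "Summary: demo\n\n\x93quoted\x94 text".b.force_encoding("UTF-8")

puts contents.valid_encoding? # false: 0x93 and 0x94 are not valid UTF-8 here

begin
  contents.split(/\r?\n\r?\n/, 2)
rescue ArgumentError => e
  # Regexp matching refuses strings whose bytes do not fit their encoding.
  puts "split failed: #{e.message}"
end

# Hypothetical helper: drop every byte sequence that is not valid UTF-8 by
# transcoding to UTF-16BE and back. This loses the offending characters.
def drop_invalid_bytes(text)
  text.encode("UTF-16BE", invalid: :replace, undef: :replace, replace: "")
      .encode("UTF-8")
end

cleaned = drop_invalid_bytes(contents)
first_paragraph, remaining = cleaned.split(/\r?\n\r?\n/, 2)
puts first_paragraph
puts remaining
```

Dropping bytes throws away the original quotation marks, so treat this as a last resort; re-saving the file as UTF-8 keeps them.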
I'm guessing the problem is that one (or more) of the files just isn't in UTF-8 at all, so the question is which "from" encoding to use. If you know which file it is, you can open it in a text editor and re-save it as UTF-8. Or, if you can share the file, I can probably sleuth out which encoding it's in. (If you are on Windows it's likely to be CP1252 or ISO-Latin-1; on a Mac, a good guess would be MacRoman.)

Jeff

On Nov 17, 2011, at 11:04 AM, James Abbott <abbottjam@gmail.com> wrote:

> Does Ruby expect the input to be pure UTF-8? What format do you use as the "to" format? Is there a better way than Iconv?
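Picking the right "from" encoding, as suggested above, can be sketched like this. The helper name `to_utf8` and the sample bytes are assumptions for illustration; the two encodings tried are the Windows guesses from the message.

```ruby
# Hypothetical helper: reinterpret raw bytes as `from_encoding`, then
# transcode them to UTF-8. Raises if a byte has no mapping in that encoding.
def to_utf8(raw_bytes, from_encoding)
  raw_bytes.dup.force_encoding(from_encoding).encode("UTF-8")
end

# Sample bytes: 0x93 and 0x94 are curly double quotes in Windows-1252.
raw = "\x93hello\x94".b

# Print what each guess produces so the results can be compared by eye.
%w[Windows-1252 ISO-8859-1].each do |guess|
  puts "#{guess}: #{to_utf8(raw, guess).inspect}"
end
```

If one guess yields real quotation marks and another yields control characters, the first guess is probably the right one.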
Hi guys, problem solved.

This possibly had nothing to do with the Ruby version. I had the idea of installing Ruby 1.9.2 just to see if the error persisted, but realized I'd have to install all the gems again (RVM works pretty much like a sandbox).

Instead, I debugged it: I removed all the content folders and added them back one by one, refreshing the browser each time. The offending characters were quotation marks in a piece of content I had copied and pasted from a webpage (to use as a quote).

Lesson learned. Thanks for the answers!

/ James

On Sat, Nov 19, 2011 at 9:19 PM, Jeff Clites <jclites@mac.com> wrote:

> I'm guessing the problem is that one (or more) of the files just isn't in
> UTF-8 at all, so the question is which "from" encoding to use.
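The folder-by-folder hunt described above can also be automated. This is a rough sketch, assuming the content lives in ordinary files under one directory; the method name `invalid_utf8_lines` and the path in the usage comment are hypothetical.

```ruby
# Hypothetical helper: return [path, line_number] pairs for every line under
# `dir` whose bytes are not valid UTF-8.
def invalid_utf8_lines(dir)
  hits = []
  Dir.glob(File.join(dir, "**", "*")).sort.each do |path|
    next unless File.file?(path)

    # Read raw bytes and split on newlines first; a UTF-8 multi-byte
    # character never contains a newline byte, so this split is safe.
    File.binread(path).each_line.with_index(1) do |line, number|
      hits << [path, number] unless line.force_encoding("UTF-8").valid_encoding?
    end
  end
  hits
end

# Example (hypothetical path):
#   invalid_utf8_lines("content").each { |path, n| puts "#{path}:#{n}" }
```

Each reported line can then be opened in an editor and the pasted characters retyped or the file re-saved as UTF-8.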
Cheers James.

I haven't compiled 1.9.3 yet, but I've made a note to do so and give it a spin when I get a moment.

Have you tried it with a very simple content folder? I'm wondering if there's a character in one of your pages that isn't part of the ASCII character set and is making Ruby throw a wobbly.

I'll see if I can reproduce it...

On 17 Nov 2011, at 19:04, James Abbott wrote:

> I recently upgraded to Ruby 1.9.3 and this introduced a hiccup with the way Nesta handles file content. The line 140 in model.rb gives this error:
>
> invalid byte sequence in UTF-8
> file: models.rb location: split line: 140
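One detail that may be worth spelling out: non-ASCII characters are fine as long as the bytes are valid UTF-8, and only invalid byte sequences trigger this error. A small sketch of the distinction, using a hypothetical `classify` helper and made-up sample strings:

```ruby
# Hypothetical helper: sort a string into one of three buckets.
def classify(text)
  return :invalid unless text.valid_encoding?

  text.ascii_only? ? :ascii : :non_ascii_but_valid
end

puts classify("plain text")                        # ascii
puts classify("caf\u00E9 \u201Cquote\u201D")       # non_ascii_but_valid
puts classify("caf\xE9".b.force_encoding("UTF-8")) # invalid
```

Only the last kind of string makes `split` with a regular expression raise.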