librelist archives

customwhitewash and always valid markup

From:
Corin
Date:
2010-03-23 @ 11:04
Hi again,

While testing my CustomWhitewash further, I found that it works but doesn't
always return valid XHTML.

Input:
<li><ol><li>lalalala<li></tr><td>

Output:
<li><ol><li>lalalala</li><li></ol></li>
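
Whether an output like this is well-formed can be checked mechanically by handing it to an XML parser. The sketch below assumes Ruby's REXML library (`rexml/document`) and a hypothetical helper named `well_formed_fragment?`, which is not part of Loofah. It wraps the fragment in a dummy root element, because an XML document needs a single root and a fragment may have several top-level nodes.

```ruby
require "rexml/document"

# Hypothetical helper (not part of Loofah): true when the fragment parses
# as well-formed XML. The dummy <root> element gives the fragment the
# single top-level element that XML requires.
def well_formed_fragment?(fragment)
  REXML::Document.new("<root>#{fragment}</root>")
  true
rescue REXML::ParseException
  false
end

# The whitewash output above: the second <li> is never closed.
puts well_formed_fragment?("<li><ol><li>lalalala</li><li></ol></li>")    # prints false
# The same list with every element closed.
puts well_formed_fragment?("<li><ol><li>lalalala</li><li></li></ol></li>") # prints true
```

This only checks well-formedness (every tag closed and properly nested). It does not check that the markup is valid XHTML, for example that a `<td>` appears only inside a table row; that needs a DTD-aware validator.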

To solve this, right now I'm just doing the whitewash in a loop like this:

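    require "loofah"

    # `html` is the input string, `scrubber` the custom scrubber.
    # Re-scrub until the output stops changing; the third pass must confirm it.
    3.times do |i|
      sanitized = Loofah.fragment(html).scrub!(scrubber).to_s
      break if sanitized == html  # output is stable, nothing left to fix
      raise ArgumentError, "unable to properly whitewash: #{html}" if i >= 2
      html = sanitized
    end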

Since it should never need more than two passes, I raise an exception if a third pass still changes the output.
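
The loop above keeps scrubbing until the output reaches a fixpoint, i.e. a pass that changes nothing. The same idea can be pulled out into a small reusable helper. This is a sketch under stated assumptions: `until_stable` and `max_passes` are hypothetical names, not part of Loofah, and the demo swaps in a plain string transformation so it runs without the gem. With Loofah, the block would be something like `{ |h| Loofah.fragment(h).scrub!(scrubber).to_s }`, where `scrubber` is the custom scrubber.

```ruby
# Hypothetical helper (not part of Loofah): applies the block repeatedly
# until its result stops changing (a fixpoint). Raises ArgumentError if the
# result is still changing after max_passes applications.
def until_stable(input, max_passes: 3)
  current = input
  max_passes.times do
    result = yield(current)
    return current if result == current # this pass changed nothing: stable
    current = result
  end
  raise ArgumentError, "still changing after #{max_passes} passes: #{current}"
end

# Stand-in transformation for the demo: each pass shrinks one run of two
# spaces to a single space.
puts until_stable("a   b") { |s| s.sub("  ", " ") } # prints "a b"
```

Comparing the serialized strings is the simplest stability test, but each pass re-parses the whole fragment, so the cost grows with the number of passes.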

Output:
0
"<li><ol><li>lalalala<li></tr><td>"
1
"<li><ol>\n<li>lalalala</li>\n<li>\n</ol></li>"
2
"<li><ol>\n<li>lalalala</li>\n<li>\n</li>\n</ol></li>"

Is there an easier (and faster) way to always get valid XHTML? :)

Corin

Re: [loofah] customwhitewash and always valid markup

From:
Mike Dalessio
Date:
2010-03-24 @ 03:36
On Tue, Mar 23, 2010 at 7:04 AM, Corin <wakathane@gmail.com> wrote:

> Hi again,
>
> while further testing my CustomWhitewash, I found it works but doesn't
> always return valid xhtml.
>
> Input:
> <li><ol><li>lalalala<li></tr><td>
>

OK, after looking at this, walking away, and coming back to it
again, I have to admit that I don't understand what you're trying to do.
What is this markup supposed to represent? Is it just a random set of tags
for test purposes? If so, why not use a reasonable test case that *looks like*
real markup?

libxml2 (at the core of Nokogiri, and hence of Loofah) corrects this HTML
fragment to:

<li><ol>
<li>lalalala</li>
<li><td></td></li>
</ol></li>

By the time Loofah sees the fragment, it's already been corrected to look
like the above markup. So any scrubber you create will be running not on
your original markup, but on the libxml2-corrected markup. Does that help
interpret what you're seeing at all?



Re: [loofah] customwhitewash and always valid markup

From:
Mike Dalessio
Date:
2010-03-23 @ 11:22
Can you please describe what you are trying to do (maybe with a failing
spec, hint hint), and include the source code for your scrubber?
