Re: [loofah] customwhitewash and always valid markup
- From: Mike Dalessio
- Date: 2010-03-24 @ 03:36
On Tue, Mar 23, 2010 at 7:04 AM, Corin <wakathane@gmail.com> wrote:
> Hi again,
>
> while further testing my CustomWhitewash, I found it works but doesn't
> always return valid XHTML.
>
> Input:
> <li><ol><li>lalalala<li></tr><td>
>
OK, after looking at this, then walking away, then coming back to look at it
again, I have to admit that I don't understand what you're trying to do.
What is this markup supposed to represent? Is it just a random set of tags
for test purposes? Then why not use a reasonable test case that *looks like*
real markup?
libxml2 (at the core of Nokogiri, and hence of Loofah) corrects this HTML
fragment to:
<li><ol>
<li>lalalala</li>
<li><td></td></li>
</ol></li>
By the time Loofah sees the fragment, it's already been corrected to look
like the above markup. So any scrubber you create will be running not on
your original markup, but on the libxml2-corrected markup. Does that help
interpret what you're seeing at all?
>
> Output:
> <li><ol><li>lalalala</li><li></ol></li>
>
> To solve this, right now I'm just doing the whitewash in a loop like this:
>
> 100.times do |i|
>   sanitized = Loofah.fragment(html).scrub!(scrubber).to_s
>   break if sanitized == html
>   raise ArgumentError, "unable to properly whitewash: #{html}" if i >= 2
>   html = sanitized
> end
>
> As it should never need more than 2 passes, I'll raise an exception then.
>
> Output:
> 0
> "<li><ol><li>lalalala<li></tr><td>"
> 1
> "<li><ol>\n<li>lalalala</li>\n<li>\n</ol></li>"
> 2
> "<li><ol>\n<li>lalalala</li>\n<li>\n</li>\n</ol></li>"
>
> Is there an easier (and faster) way to always get valid XHTML? :)
>
> Corin
>
>
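For reference, the quoted loop is a fixed-point iteration: re-run the sanitizer until its output stops changing, and give up after a few passes. A minimal sketch of that idea as a reusable helper follows. The name `scrub_until_stable` and the `max_passes` parameter are hypothetical (they are not part of Loofah), and the stand-in block below only squeezes spaces so that the example runs without any gems:

```ruby
# Hypothetical helper, not part of Loofah: apply the given block to the
# markup repeatedly until a pass returns its input unchanged.
# Raises ArgumentError if the output is still changing after max_passes.
def scrub_until_stable(html, max_passes: 3)
  max_passes.times do
    sanitized = yield(html)
    return sanitized if sanitized == html
    html = sanitized
  end
  raise ArgumentError, "unable to properly whitewash: #{html}"
end

# Stand-in sanitizer for demonstration: collapse runs of spaces.
# With Loofah the block would instead be something like
#   { |h| Loofah.fragment(h).scrub!(scrubber).to_s }
stable = scrub_until_stable("a   b") { |s| s.squeeze(" ") }
puts stable
```

With the default of three passes this matches the quoted loop: two passes may change the markup, and the third has to confirm that nothing changed.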
Re: [loofah] customwhitewash and always valid markup
- From: Mike Dalessio
- Date: 2010-03-23 @ 11:22
Can you please describe what you are trying to do (maybe with a failing
spec, hint hint), and include the source code for your scrubber?
On Mar 23, 2010 7:05 AM, "Corin" <wakathane@gmail.com> wrote:
> Hi again,
>
> while further testing my CustomWhitewash, I found it works but doesn't
> always return valid XHTML.
>
> Input:
> <li><ol><li>lalalala<li></tr><td>
>
> Output:
> <li><ol><li>lalalala</li><li></ol></li>
>
> To solve this, right now I'm just doing the whitewash in a loop like this:
>
> 100.times do |i|
>   sanitized = Loofah.fragment(html).scrub!(scrubber).to_s
>   break if sanitized == html
>   raise ArgumentError, "unable to properly whitewash: #{html}" if i >= 2
>   html = sanitized
> end
>
> As it should never need more than 2 passes, I'll raise an exception then.
>
> Output:
> 0
> "<li><ol><li>lalalala<li></tr><td>"
> 1
> "<li><ol>\n<li>lalalala</li>\n<li>\n</ol></li>"
> 2
> "<li><ol>\n<li>lalalala</li>\n<li>\n</li>\n</ol></li>"
>
> Is there an easier (and faster) way to always get valid XHTML? :)
>
> Corin