librelist archives

sanitize against subset of html / css

From:
Corin
Date:
2010-03-22 @ 22:51
Hi!

How can I easily sanitize against my own configuration and so only
allow a subset of html / css?

For example, I only want to allow p, img and strong, and for CSS
properties only text-align. Nothing else.

So basically just like this other gem allows me:

Sanitize.clean(html, Sanitize::Config::RESTRICTED)

or

Sanitize.clean(html, Sanitize::Config::BASIC)

Thanks,
Corin

Re: [loofah] sanitize against subset of html / css

From:
Mike Dalessio
Date:
2010-03-23 @ 01:53
Hi Corin,

On Mon, Mar 22, 2010 at 6:51 PM, Corin <wakathane@gmail.com> wrote:

> Hi!
>
> How can I easily sanitize against my own configuration and so only
> allow a subset of html / css?
>
> Like I only want to allow p,img,strong and for css attributes only
> text-align. Nothing else.
>

Can you explain why you want to do this? Also, can you be more specific
about what you want to do with other tags? e.g., do you want to prune the
node (meaning remove it and its content from the document); or do you want
to strip it (meaning remove the tags, but leave the content behind); or some
other operation entirely?
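
To make the prune/strip distinction concrete, here is a minimal sketch on a toy tree. It is only an illustration: `Element`, `prune`, `strip`, `render` and `ALLOWED` are hypothetical names invented for this example, not part of Loofah or Nokogiri, and the allowlist is the p/img/strong set from the question.

```ruby
# Toy document tree: an element has a tag name and children; text is a plain String.
# Illustrative model only (assumption), not Loofah's or Nokogiri's API.
Element = Struct.new(:name, :children)

ALLOWED = %w[p img strong].freeze

# Prune: drop a disallowed element together with everything inside it.
def prune(nodes, allowed = ALLOWED)
  nodes.flat_map do |node|
    if node.is_a?(String)
      [node]
    elsif allowed.include?(node.name)
      [Element.new(node.name, prune(node.children, allowed))]
    else
      []
    end
  end
end

# Strip: drop only the disallowed element's tags; its children stay in place.
def strip(nodes, allowed = ALLOWED)
  nodes.flat_map do |node|
    if node.is_a?(String)
      [node]
    else
      kept = strip(node.children, allowed)
      allowed.include?(node.name) ? [Element.new(node.name, kept)] : kept
    end
  end
end

# Serialize the toy tree back to markup (no escaping; illustration only).
def render(nodes)
  nodes.map do |node|
    node.is_a?(String) ? node : "<#{node.name}>#{render(node.children)}</#{node.name}>"
  end.join
end

doc = [
  Element.new("div", ["intro ", Element.new("strong", ["bold"])]),
  Element.new("p", ["body"])
]

puts render(prune(doc)) # <p>body</p>
puts render(strip(doc)) # intro <strong>bold</strong><p>body</p>
```

With pruning, the text inside the div is lost along with the div; with stripping, only the div tags disappear and the allowed strong element inside it survives.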


>
> So basically just like this other gem allows me:
>
> Sanitize.clean(html, Sanitize::Config::RESTRICTED)
>
> or
>
> Sanitize.clean(html, Sanitize::Config::BASIC)
>

Not being terribly knowledgeable about Sanitize, I don't claim that the
following code does exactly what Sanitize does. However, given your very
brief description of what you want to do, try this:

require 'rubygems'
require 'loofah'

html = <<EOH
<div>div will be removed</div>
<p>p will be kept</p>
<img title="img will be kept">
<strong>strong will be kept</strong>
EOH

# Keep text nodes and the allowed elements; prune everything else.
custom_scrubber = Loofah::Scrubber.new do |node|
  unless node.text? || %w[p img strong text].include?(node.name)
    node.unlink # removes the node together with its content
  end
end

puts Loofah.fragment(html).scrub!(custom_scrubber)
# =>
# <p>p will be kept</p>
# <img title="img will be kept">
# <strong>strong will be kept</strong>

If this code doesn't do what you want, you could reply with a failing spec
describing what you want to do; or you can look at the Loofah scrubbers for
some ideas on how you might implement your custom scrubber (
http://github.com/flavorjones/loofah/blob/master/lib/loofah/scrubbers.rb).

Lastly, this picking and choosing which tags you'd like to keep and which
you'd like to throw away isn't making your markup any more safe than the
default Loofah whitelist (which is borrowed from HTML5lib), so calling it
"sanitization" is probably a misuse of the term.


> Thanks,
> Corin
>

Re: [loofah] sanitize against subset of html / css

From:
Corin
Date:
2010-03-23 @ 10:30
Hi Mike,

So, here it is: CustomWhitewash. I started from Strip and just built in
what I needed extra.

So far it seems to work - usage:

scrubber = Loofah::Scrubbers::CustomWhitewash.new
scrubber.allowed_tags = %w(p br strong em u strike sub sup ol ul li img hr)
scrubber.allowed_tag_attributes = %w(src href style rel)
scrubber.allowed_css_properties = %w(text-align)
scrubber.allowed_css_keywords = %w(left right center justify)
puts Loofah.fragment(html).scrub!(scrubber).to_s

Please let me know if you find any errors, especially concerning 
safety.  :-)

If you feel it could be a good contribution, please feel free to include 
it in the repository.

Corin
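
The CustomWhitewash source itself does not appear in this archive. As a rough sketch of what the CSS part of such a scrubber could do, the snippet below filters a style attribute value against the two allowlists from the usage above. The helper `filter_style` and the two constants are hypothetical names chosen for this example, not Corin's code and not a Loofah API.

```ruby
# Hypothetical helper (assumption): the constants mirror the settings shown in
# the usage above, but this is not the CustomWhitewash implementation.
ALLOWED_CSS_PROPERTIES = %w[text-align].freeze
ALLOWED_CSS_KEYWORDS   = %w[left right center justify].freeze

# Return a style string containing only allowlisted "property: keyword" pairs.
def filter_style(style)
  declarations = style.to_s.split(";").map do |declaration|
    # Split on the first colon only, so values containing ':' stay in one piece.
    property, value = declaration.split(":", 2).map { |part| part.strip.downcase }
    next unless ALLOWED_CSS_PROPERTIES.include?(property)
    next unless ALLOWED_CSS_KEYWORDS.include?(value)

    "#{property}: #{value};"
  end
  declarations.compact.join(" ")
end

puts filter_style("text-align: center; color: red")
# text-align: center;
puts filter_style("text-align: url(javascript:alert(1))").inspect
# ""
```

Comparing whole values against a fixed keyword list, rather than trying to blacklist dangerous patterns, means anything unexpected (functions, URLs, expressions) is dropped by default.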

Re: [loofah] sanitize against subset of html / css

From:
Mike Dalessio
Date:
2010-03-23 @ 11:24
Ah, I see, this makes it a little clearer what you're trying to do. I'll
respond in a bit (when I'm at a computer).

On Mar 23, 2010 6:30 AM, "Corin" <wakathane@gmail.com> wrote:

Hi Mike,

So, here it is: CustomWhitewash. I started from Strip and just built in what
I needed extra.

So far it seems to work - usage:

scrubber = Loofah::Scrubbers::CustomWhitewash.new
scrubber.allowed_tags = %w(p br strong em u strike sub sup ol ul li img hr)
scrubber.allowed_tag_attributes = %w(src href style rel)
scrubber.allowed_css_properties = %w(text-align)
scrubber.allowed_css_keywords = %w(left right center justify)
puts Loofah.fragment(html).scrub!(scrubber).to_s

Please let me know if you find any errors, especially concerning safety.
 :-)

If you feel it could be a good contribution, please feel free to include it
in the repository.

Corin

Re: [loofah] sanitize against subset of html / css

From:
Mike Dalessio
Date:
2010-03-24 @ 03:31
I think this scrubber's implementation is complicated because Loofah doesn't
allow re-use of most of the interesting bits of its internals.

There's an existing Github issue to make the internals more reusable:
http://github.com/flavorjones/loofah/issues#issue/14

If you think that would be a good thing, please feel free to comment there
and I'll try to schedule that work (though I suspect it will look very much
like your implementation in the end).


On Tue, Mar 23, 2010 at 6:30 AM, Corin <wakathane@gmail.com> wrote:

> Hi Mike,
>
> So, here it is: CustomWhitewash. I started from Strip and just built in
> what I needed extra.
>
> So far it seems to work - usage:
>
> scrubber = Loofah::Scrubbers::CustomWhitewash.new
> scrubber.allowed_tags = %w(p br strong em u strike sub sup ol ul li img hr)
> scrubber.allowed_tag_attributes = %w(src href style rel)
> scrubber.allowed_css_properties = %w(text-align)
> scrubber.allowed_css_keywords = %w(left right center justify)
> puts Loofah.fragment(html).scrub!(scrubber).to_s
>
> Please let me know if you find any errors, especially concerning safety.
>  :-)
>
> If you feel it could be a good contribution, please feel free to include it
> in the repository.
>
> Corin
>

Re: [loofah] sanitize against subset of html / css

From:
Corin
Date:
2010-03-23 @ 09:07
On 23.03.2010 02:53, Mike Dalessio wrote:
> Hi Corin,
>
> Can you explain why you want to do this? Also, can you be more 
> specific about what you want to do with other tags? e.g., do you want 
> to prune the node (meaning remove it and its content from the 
> document); or do you want to strip it (meaning remove the tags, but 
> leave the content behind); or some other operation entirely?
Hi Mike! I have a website where users can use a WYSIWYG editor, but they
should be limited to certain markup: making text bold or underlined, or
centering it, but not using colors, tables, etc. This is why I need to
restrict the markup to certain tags and certain CSS properties. For
disallowed tags, the tag (formatting) should simply be removed and the
content (text) kept.
>
> If this code doesn't do what you want, you could reply with a failing 
> spec describing what you want to do; or you can look at the Loofah 
> scrubbers for some ideas on how you might implement your custom 
> scrubber 
> (http://github.com/flavorjones/loofah/blob/master/lib/loofah/scrubbers.rb).
Ok, I'll try and let you know how it works.. :-)

Corin