Description
Markdown
Markdown is a lightweight, easy-to-use syntax for styling all forms of writing on modern web platforms. Check out this excellent guide by GitHub to learn everything about Markdown.
HTML::Pipeline intro
HTML::Pipeline is a set of HTML processing filters and utilities. It includes a small framework for defining DOM-based content filters and applying them to user-provided content. Read an introduction to HTML::Pipeline in this blog post. GitHub uses HTML::Pipeline to implement Markdown.
Implementing Markdown
[ Markdown Content ] -> [ RenderMarkdown ] -> [ HTML ]
Content goes into our pipeline and HTML comes out. As simple as that!
Let's implement RenderMarkdown.
Install HTML::Pipeline & dependency for Markdown
First we'll need to install HTML::Pipeline and associated dependencies for each feature:
# Gemfile
gem "github-markdown"
gem "html-pipeline"
1-min HTML::Pipeline tutorial
require "html/pipeline"
filter = HTML::Pipeline::MarkdownFilter.new("Hi **world**!")
filter.call
Filters can be combined into a pipeline:
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  # more filters ...
]
result = pipeline.call "Hi **world**!"
result[:output].to_s
Each filter hands its output to the next filter's input:
--------------- Pipeline ----------------------
|                                             |
| [Filter 1] -> [Filter 2] ... -> [Filter N]  |
|                                             |
-----------------------------------------------
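The hand-off shown in the diagram can be tried without any gems. The following is a minimal sketch in plain Ruby, not HTML::Pipeline's actual implementation: `run_pipeline` and `DEMO_FILTERS` are hypothetical names, and the lambdas only stand in for real filters.

```ruby
# Hypothetical stand-ins for filters: each one takes text and returns text.
# They are illustrative only and are not HTML::Pipeline filters.
DEMO_FILTERS = [
  ->(text) { text.upcase },        # Filter 1
  ->(text) { "#{text}!" },         # Filter 2
  ->(text) { "<p>#{text}</p>" }    # Filter N
].freeze

# Feed the input through every filter in order, so that the output of one
# filter becomes the input of the next.
def run_pipeline(filters, input)
  filters.reduce(input) { |text, filter| filter.call(text) }
end

puts run_pipeline(DEMO_FILTERS, "hi world")
# prints: <p>HI WORLD!</p>
```

Because each step only sees the previous step's output, the order of the filters matters. That becomes important later, when one filter must run before or after Markdown rendering.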
RenderMarkdown
We can then implement the RenderMarkdown class by leveraging HTML::Pipeline:
class RenderMarkdown
  def initialize(content)
    @content = content
  end
  def call
    pipeline = HTML::Pipeline.new [
      HTML::Pipeline::MarkdownFilter
    ]
    pipeline.call(content)[:output].to_s
  end
  private
  attr_reader :content
end
To use it:
RenderMarkdown.new("Hello, **world**!").call
=> "<p>Hello, <strong>world</strong>!</p>"
It works and it is very easy!
Avoid HTML markup
Sometimes users may be tempted to try something like:
<img src='' onerror='alert(1)' />
which is a common trick to pop up an alert box on the page. We don't want every user to see a popup.
Because of the nature of Markdown, HTML is allowed. You can use HTML::Pipeline's built-in SanitizationFilter to sanitize it.
But the problem with SanitizationFilter is that disallowed tags are discarded. That is fine for the regular use case of HTML sanitization, where we want to let users enter some HTML. But we never want HTML: any HTML entered should be displayed as-is.
For example, writing:
hello <script>i am sam</script>
Should not result in the usual sanitized output (GitHub's behavior):
hello
Instead, it should output escaped HTML:
hello &lt;script&gt;i am sam&lt;/script&gt;
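The difference between discarding and escaping can be sketched in plain Ruby. Both helpers below are hypothetical and exist only for illustration. In particular, `discard_script_tags` is a deliberately naive regex and is not how SanitizationFilter works.

```ruby
# Hypothetical helper mimicking the "discard" outcome for this one example.
# Real sanitizers parse the HTML instead of using a regex.
def discard_script_tags(text)
  text.gsub(%r{<script>.*?</script>}m, "")
end

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Hypothetical helper: keep the markup, but turn it into harmless text.
def escape_html(text)
  text.gsub(/[&<>]/, HTML_ESCAPES)
end

input = "hello <script>i am sam</script>"
p discard_script_tags(input)  # => "hello "
p escape_html(input)          # => "hello &lt;script&gt;i am sam&lt;/script&gt;"
```

With discarding, what the user typed is lost. With escaping, the browser shows the text exactly as it was typed and never interprets it as markup.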
So here we take a different approach:
We can add a NoHtmlFilter that simply replaces < with &lt;:
class NoHtmlFilter < HTML::Pipeline::TextFilter
  def call
    # keep `>` since Markdown needs it for blockquotes
    @text.gsub('<', '&lt;')
  end
end
Put this NoHtmlFilter before our Markdown filter:
class NoHtmlFilter < HTML::Pipeline::TextFilter
  def call
    @text.gsub('<', '&lt;')
  end
end
class RenderMarkdown
  def initialize(content)
    @content = content
  end
  def call
    pipeline = HTML::Pipeline.new [
      NoHtmlFilter,
      HTML::Pipeline::MarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end
  private
  attr_reader :content
end
We keep > since Markdown needs it for blockquotes. Let's try this:
RenderMarkdown.new("<img src='' onerror='alert(1)' />").call
=> "<p>&lt;img src='' onerror='alert(1)' /></p>"
Although the < is now escaped, the rendered text still looks the same from the user's perspective.
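Why only < is replaced can be checked on a plain string, without running the pipeline. This is a small sketch with a hypothetical helper name, `escape_opening_brackets`, that performs the same substitution as the filter.

```ruby
# Hypothetical helper doing the same substitution as NoHtmlFilter.
def escape_opening_brackets(text)
  text.gsub("<", "&lt;")
end

source = "> quoted text\n\n<img src='' onerror='alert(1)' />"
puts escape_opening_brackets(source)
# prints:
# > quoted text
#
# &lt;img src='' onerror='alert(1)' />
```

The leading > is untouched, so a Markdown renderer can still read the first line as a blockquote. No tag can open any more, because every < has been escaped.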
But what if we want to talk about some HTML inside a code span?
> content = <<~CONTENT
  > quoted text

  123`<img src='' onerror='alert(1)' />`45678
CONTENT
> RenderMarkdown.new(content).call
=> "<blockquote>\n<p>quoted text</p>\n</blockquote>\n\n<p>123<code>&amp;lt;img src='' onerror='alert(1)' /></code>45678</p>"
Markdown escapes the & inside code spans, so our &lt; turned into &amp;lt;. We don't want that. Let's fix this:
require "securerandom"
class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    # pick a random placeholder that does not occur in the text
    while @text.index(unique = SecureRandom.hex); end
    @text.gsub!("<", unique)
    # render the Markdown first, then turn the placeholder into an escaped <
    super.gsub(unique, "&lt;")
  end
end
class RenderMarkdown
  def initialize(content)
    @content = content
  end
  def call
    # NohtmlMarkdownFilter already renders Markdown, so it replaces MarkdownFilter
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end
  private
  attr_reader :content
end
> RenderMarkdown.new(content).call
=> "<blockquote>\n<p>quoted text</p>\n</blockquote>\n\n<p>123<code>&lt;img src='' onerror='alert(1)' /></code>45678</p>"
This is awesome, but here comes another bug report. Autolinks no longer work:
> content = "hey Juanito <[email protected]>"
> RenderMarkdown.new(content).call
=> "<p>hey Juanito <a href=\"mailto:&lt;[email protected]\">&lt;[email protected]</a>&gt;</p>"
The placeholder was glued to the email address, so the autolinker treated it as part of the address. The fix is to add a space after our unique string when replacing the <:
require "securerandom"
class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    # the trailing space separates the placeholder from the text that follows
    while @text.index(unique = "#{SecureRandom.hex} "); end
    @text.gsub!("<", unique)
    super.gsub(unique, "&lt;")
  end
end
class RenderMarkdown
  def initialize(content)
    @content = content
  end
  def call
    # NohtmlMarkdownFilter already renders Markdown, so it replaces MarkdownFilter
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
    ]
    pipeline.call(content)[:output].to_s
  end
  private
  attr_reader :content
end
Now autolink works as usual:
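> content = "hey Juanito <[email protected]>"
> RenderMarkdown.new(content).call
=> "<p>hey Juanito &lt;<a href=\"mailto:[email protected]\">[email protected]</a>&gt;</p>"
But other cases come in where the space after the placeholder does not survive rendering, so the final version matches it optionally: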
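require "securerandom"
class NohtmlMarkdownFilter < HTML::Pipeline::MarkdownFilter
  def call
    # pick a random placeholder that does not occur in the text
    while @text.index(unique = SecureRandom.hex); end
    @text.gsub!("<", "#{unique} ")
    # the space may be gone after rendering, so match it optionally
    super.gsub(Regexp.new("#{unique}\\s?"), "&lt;")
  end
end
Sanitization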
While we can now display escaped HTML, we still need to sanitize the output.
Add the SanitizationFilter after the Markdown has been translated into HTML:
# Gemfile
gem "sanitize"
# RenderMarkdown
class RenderMarkdown
  ...
  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
    ]
    ...
  end
  ...
end
So that our HTML is safe!
Nice to have
Syntax Highlight with Rouge
No more Pygments dependency: highlight syntax with Rouge.
# Gemfile
gem "html-pipeline-rouge_filter"
# RenderMarkdown
class RenderMarkdown
  ...
  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
      HTML::Pipeline::RougeFilter
    ]
    ...
  end
  ...
end
Twemoji instead of gemoji (more emojis)
While HTML::Pipeline originally came with an EmojiFilter, which uses gemoji under the hood, there is an alternative solution, twemoji.
# Gemfile
gem "twemoji"
# new file
class EmojiFilter < HTML::Pipeline::Filter
  def call
    Twemoji.parse(doc,
      file_ext: context[:file_ext] || "svg",
      class_name: context[:class_name] || "emoji",
      img_attrs: context[:img_attrs] || {},
    )
  end
end
# RenderMarkdown
class RenderMarkdown
  ...
  def call
    pipeline = HTML::Pipeline.new [
      NohtmlMarkdownFilter,
      HTML::Pipeline::SanitizationFilter,
      EmojiFilter,
      HTML::Pipeline::RougeFilter
    ]
    ...
  end
  ...
end
Wrap Up
We now have a Markdown renderer that can:
- Output escaped HTML
- Highlight syntax with Ruby's Rouge
- Offer better emoji support via Twemoji
See JuanitoFatas/markdown@eb7f434...377125 for full implementation!
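To experiment with the placeholder swap behind NohtmlMarkdownFilter without installing any gems, here is a minimal sketch. It assumes a hypothetical helper named `with_escaped_lt` and a toy renderer block standing in for the real Markdown engine, so it only illustrates the idea and is not the code from the repository.

```ruby
require "securerandom"

# Hypothetical helper: hides every "<" behind a random placeholder, lets the
# given block render the text, then turns the placeholder into "&lt;".
def with_escaped_lt(text)
  # Pick a token that does not already occur in the text.
  placeholder = SecureRandom.hex
  placeholder = SecureRandom.hex while text.include?(placeholder)

  # The trailing space keeps the token from gluing onto the next word,
  # for example an email address that should be autolinked.
  rendered = yield text.gsub("<", "#{placeholder} ")

  # The renderer may or may not have kept that space, so match it optionally.
  rendered.gsub(/#{placeholder}\s?/, "&lt;")
end

# Toy renderer standing in for a Markdown engine (an assumption for this
# sketch): it trims the text and wraps it in a paragraph.
toy_renderer = ->(text) { "<p>#{text.strip}</p>" }

puts with_escaped_lt("a < b and <i>", &toy_renderer)
# prints: <p>a &lt; b and &lt;i></p>
puts with_escaped_lt("x <", &toy_renderer)
# prints: <p>x &lt;</p>
```

The second call shows why the space has to be optional. The toy renderer strips trailing whitespace, so the placeholder comes back without its space and a fixed "placeholder plus space" match would miss it.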