librelist archives

unicode not properly converted into pdf using xhtml2pdf

From:
bibek chitrakar
Date:
2013-08-25 @ 16:24
Though this issue is not directly related to Flask, I hope someone might
help me solve it. I am using the xhtml2pdf module to convert HTML to PDF.
Normal pages convert fine. The real issue comes when converting a rendered
template that contains Unicode text: the final PDF file contains black
rectangular boxes (mojibake characters, I guess). I am really confused by
this Unicode encode/decode issue.

e.g.
rendered_template_snippet = '<br/>\n\t\t\t\n \n\t\t\t\n\t\t\t\t\u0909\u092a\u092d\u094b\u0915\u094d\u0924\u093e \u092a\u0942'

unicode_variable = 'केहो केहो'
I got pretty confused with these two things...
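
A note on the two forms above, in case it helps: the `\u0909...` text is most likely just the escaped representation of a unicode string, while 'केहो केहो' is the same kind of text shown as glyphs. The sketch below is only an illustration (the variable names are made up, and `unicode_literals` is assumed so that it behaves the same on Python 2 and Python 3). It shows that an escaped literal and its rendered form compare equal, and that encoding to UTF-8 yields bytes rather than a different string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# The same four code points, once written as escapes and once as glyphs.
escaped = "\u0915\u0947\u0939\u094b"
literal = "केहो"

# Both spellings denote the same unicode text.
same_text = (escaped == literal)

# Encoding turns the text into UTF-8 bytes; each Devanagari code point
# in this range takes three bytes.
utf8_bytes = literal.encode("utf-8")

print(same_text)
print(len(literal), len(utf8_bytes))
```

So the escaped form is nothing to worry about by itself; it is what `repr()` shows for non-ASCII unicode text in Python 2.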

My code snippet:

# coding: utf-8

from xhtml2pdf import pisa
from cStringIO import StringIO


def convertHtmlToPdf(sourceHtml, outputFilename):
    # Open the output in binary mode; the PDF is written as raw bytes.
    resultFile = open(outputFilename, "w+b")

    # sourceHtml = u'सरल'
    # Encode the unicode HTML to UTF-8 bytes before wrapping it in a buffer.
    pisaStatus = pisa.CreatePDF(StringIO(sourceHtml.encode('utf-8')),
                                dest=resultFile)

    resultFile.close()
    return pisaStatus.err

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    # The u prefix makes \u0926 a real code point in Python 2;
    # the paragraph is now closed with </p>.
    sourceHtml = u"<html><body><p>To PDF  \u0926</p></body></html>"
    outputFilename = "test.pdf"
    convertHtmlToPdf(sourceHtml, outputFilename)
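
One thing worth keeping in mind with the test string in the main program: in Python 2, `\u0926` is only interpreted as a code point inside a `u"..."` literal. In a plain byte-string literal it stays as six literal characters. The sketch below is an illustration, not part of the original program; it uses a raw string to stand in for the Python 2 byte-string behaviour so that it runs the same way on Python 2 and 3, and the variable names are hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# A raw string stands in for a Python 2 byte-string literal,
# where \u is not treated as an escape sequence.
plain = r"<p>To PDF \u0926</p>"
# A unicode literal: \u0926 becomes one Devanagari code point.
text = u"<p>To PDF \u0926</p>"

backslash = "\\"
plain_has_backslash = backslash in plain
text_has_backslash = backslash in text

# One non-ASCII code point becomes three UTF-8 bytes, so the encoded
# form is two bytes longer than the text.
extra_bytes = len(text.encode("utf-8")) - len(text)

print(plain_has_backslash, text_has_backslash, extra_bytes)
```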

-- 
Regards,
Bibek Chitrakar

Re: [flask] unicode not properly converted into pdf using xhtml2pdf

From:
Slater Victoroff
Date:
2013-08-25 @ 17:24
I think your main problem here is that you're using cStringIO rather than
StringIO. cStringIO cannot accept unicode strings that can't be encoded as
plain ASCII, and my guess is that it's breaking at that step. Try switching
to the pure-Python StringIO. You'll take a performance hit, but my guess is
that it's worth it to display Unicode properly.
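
A minimal sketch of the buffer step, under some assumptions: `io.BytesIO` (available in Python 2.6+ and Python 3) is used here as a stand-in for either StringIO flavour, and `to_utf8_buffer` is a hypothetical helper, not part of xhtml2pdf. It only shows that once the HTML is encoded to UTF-8 bytes, a binary in-memory buffer holds it without loss; it does not exercise xhtml2pdf itself.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io


def to_utf8_buffer(html):
    """Return a binary in-memory file with the UTF-8 encoding of `html`.

    Hypothetical helper for illustration only.
    """
    if not isinstance(html, bytes):
        # Text goes in, bytes come out; the buffer never sees unicode.
        html = html.encode("utf-8")
    return io.BytesIO(html)


source = u"<html><body><p>To PDF \u0926</p></body></html>"
buf = to_utf8_buffer(source)

# Reading the buffer back and decoding recovers the original text.
round_tripped = buf.getvalue().decode("utf-8")
print(round_tripped == source)
```

If the bytes survive this round trip, the choice of buffer class is probably not where the characters are being lost.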

