Written by John Reed


Reliable HTML-to-PDF conversion is a tricky business. Mastering one of the many PDF-generating libraries out there isn’t the end of the line either. What if you switch to a framework written in a different programming language? And what about hidden content that needs to be printed (e.g. content behind tabs)?

We’ve recently been exploring Pdfcrowd, a web service that allows you to create PDFs from pure HTML documents. Pdfcrowd is built around WebKit (full HTML/CSS2 support!) and has a robust REST API for easy document generation. This lets us focus on a screen display that translates to an acceptable direct-to-print version, and it doesn’t lock us into a library built for a specific language.

Exposing Hidden Content

For content hidden in tabs (or other more complex layouts), we’ve leveraged a combination of JavaScript triggers and Pdfcrowd’s REST API to create PDF documents without a dedicated “print-friendly” template. We start with a form containing a single, hidden input:

<form method="post" action="/pdf_generator" id="pdf-form">
    <input type="hidden" name="html">
</form>

Then, a simple trigger (this example uses a “click” binding handled by Knockout.js):

<a href="#" data-bind="click: print" data-section="#section-to-print">print</a>

A simple JavaScript function copies the generated HTML of the element named in the data-section attribute into the hidden input, then submits the form to our server-side controller.

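var print = function(data, event) {
    /* Knockout passes the view model and the DOM event to click handlers;
       read the selector from the clicked link's data-section attribute */
    var selector = $(event.currentTarget).data('section');
    /* set the value of the hidden input to the
       generated HTML of the "data-section" element */
    $('#pdf-form input[name=html]').val( $(selector).html() );
    /* submit the form */
    $('#pdf-form').submit();
};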
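
One caveat with hidden content: the serialized markup carries inline styles with it, so a pane that a tab widget hid with an inline display: none would stay hidden in the PDF. A minimal sketch of one way around that, assuming panes are hidden with inline styles rather than CSS classes. The revealInlineHidden name is hypothetical (it is not part of jQuery or Knockout), and the regular expression is only meant for simple, double-quoted style attributes:

```javascript
// Hypothetical helper: remove inline "display: none" declarations from
// style attributes so hidden panes render in the PDF. This is a
// regex-based sketch for simple markup, not a full HTML parser.
function revealInlineHidden(html) {
    return html.replace(
        /(style\s*=\s*"[^"]*?)display\s*:\s*none\s*;?\s*/gi,
        '$1'
    );
}

// Example: the second tab pane is hidden by an inline style.
var markup = '<div class="tab">A</div>' +
             '<div class="tab" style="display: none;">B</div>';
console.log(revealInlineHidden(markup));
```

In the click handler, the result of .html() would be passed through this helper before being written to the hidden input. If your panes are hidden by a CSS class instead, overriding that class in the stylesheet sent with the PDF is likely the simpler route.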

Server-side, we have a controller that handles the call to Pdfcrowd’s REST API:

require 'pdfcrowd.php';
try {
    /* create an API client instance */
    $client = new Pdfcrowd("{username}", "{apikey}");
    /* create header HTML */
    $head  = '<!DOCTYPE html>';
    $head .= '<html lang="en">';
    $head .= '<head>';
    $head .= '<meta charset="utf-8">';
    $head .= '<link rel="stylesheet" type="text/css" href="/path/to/styles.css">';
    $head .= '</head>';
    $html  = $head;
    $html .= '<body>';
    $html .= $this->input->post('html');
    $html .= '</body>';
    $html .= '</html>';
    /* convert raw HTML and store the generated PDF as $file */
    $file = $client->convertHtml($html);
    /* set HTTP response headers */
    header("Content-Type: application/pdf");
    header("Cache-Control: max-age=0");
    header("Accept-Ranges: none");
    header('Content-Disposition: attachment; filename="my.pdf"');
    /* send the generated PDF */
    echo $file;
} catch(PdfcrowdException $why) {
    echo "Pdfcrowd Error: " . $why;
}
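
Because the wrapping step is plain string assembly, it ports easily to other languages. Here is a rough JavaScript sketch of the same step, with one addition: since the markup is rendered by a remote service, a root-relative stylesheet path may not resolve, so the sketch turns it into an absolute URL first. The wrapInDocument name, its parameters, and the example.com address are all illustrative assumptions, not part of the Pdfcrowd client:

```javascript
// Hypothetical helper mirroring the controller's assembly step.
// siteUrl and stylesheetPath are illustrative parameters.
function wrapInDocument(fragment, siteUrl, stylesheetPath) {
    // Resolve the stylesheet against the site's origin so the link is absolute.
    var href = new URL(stylesheetPath, siteUrl).href;
    return [
        '<!DOCTYPE html>',
        '<html lang="en">',
        '<head>',
        '<meta charset="utf-8">',
        '<link rel="stylesheet" type="text/css" href="' + href + '">',
        '</head>',
        '<body>',
        fragment,
        '</body>',
        '</html>'
    ].join('');
}

// Example: wrap a posted fragment for a site served from example.com.
var doc = wrapInDocument('<p>Hello</p>', 'https://example.com', '/path/to/styles.css');
console.log(doc);
```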

Multi-page PDFs

You’ll notice we stored the HTML <head> element in its own $head variable. This lets us reuse it when building the repeating header and footer HTML for multi-page PDFs. We can even reuse parts of our global includes with this method:

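    /* set header HTML (the third argument makes CodeIgniter
       return the view as a string instead of outputting it) */
    $header  = $head;
    $header .= '<body>';
    $header .= $this->load->view('header', array(), TRUE);
    $header .= '</body>';
    $header .= '</html>';
    /* set footer HTML */
    $footer  = $head;
    $footer .= '<body>';
    $footer .= $this->load->view('footer', array(), TRUE);
    $footer .= '</body>';
    $footer .= '</html>';
    /* hand both to the client before converting (assumes the
       client's setHeaderHtml() and setFooterHtml() methods) */
    $client->setHeaderHtml($header);
    $client->setFooterHtml($footer);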
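
The reuse pattern is the same in any language: one shared head string, and a small wrapper applied to each fragment. A short JavaScript sketch of that idea follows. The buildPageParts name and the sample markup are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: wrap each named fragment in its own document,
// reusing one shared <head> string, as the PHP snippet does.
function buildPageParts(head, fragments) {
    var parts = {};
    Object.keys(fragments).forEach(function (name) {
        parts[name] = head + '<body>' + fragments[name] + '</body></html>';
    });
    return parts;
}

// Example: a header and a footer built from the same head.
var head = '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head>';
var parts = buildPageParts(head, {
    header: '<div class="site-header">Header</div>',
    footer: '<div class="site-footer">Footer</div>'
});
console.log(parts.header);
console.log(parts.footer);
```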

This is a simplified example, but it shows how generated and/or hidden content can be passed to a simple server-side controller for PDF generation.
