Written by John Reed
Reliable HTML-to-PDF conversion is a tricky business. Mastering one of the many PDF-generating libraries out there isn’t the end of the line either. What if you change to a framework written in a different programming language? And what about hidden content that needs to be printed (e.g., content behind tabs)?
We’ve recently been exploring Pdfcrowd, a web service that allows you to create PDFs from pure HTML documents. Pdfcrowd is built around WebKit (full HTML/CSS2 support!) and has a robust REST API for easy document generation. This allows us to focus on a screen display that translates to an acceptable direct-to-print version, and it doesn’t lock us into a library built for a specific language.
Exposing Hidden Content
For content hidden in tabs (or other more complex layouts), we’ve leveraged a combination of JavaScript triggers and Pdfcrowd’s REST API to create PDF documents without a dedicated “print-friendly” template. We start with a form containing a single, hidden input:
<form method="post" action="/pdf_generator" id="pdf-form"> <input type="hidden" name="html"> </form>
Then, a simple trigger (this example uses a “click” binding parsed by Knockout JS):
<a href="#" data-bind="click: print" data-section="#section-to-print">print</a>
A simple JavaScript function passes the generated HTML (indicated in the data-section attribute) to the hidden input before submitting the form to our server-side controller.
var print = function (data, event) {
    /* Knockout passes the bound data and the DOM event to click handlers */
    var selector = $(event.target).data('section');

    /* set the value of the hidden input to the generated html of the "data-section" element */
    $('#pdf-form input[name=html]').val($(selector).html());

    /* submit the form */
    $('#pdf-form').submit();
};
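If jQuery isn’t available, the same hand-off can be sketched with plain DOM calls. This is only an illustration: submitSectionForPdf is a hypothetical name, and the document object is passed in as an argument so the function can be exercised without a browser. It assumes the form and hidden input shown above.

```javascript
// Hypothetical helper (not part of Knockout or Pdfcrowd): copies the markup
// of the element named by the trigger's data-section attribute into the
// hidden "html" input of #pdf-form, then submits the form.
function submitSectionForPdf(doc, trigger) {
  var selector = trigger.getAttribute('data-section');
  if (!selector) {
    return false; // the trigger does not name a section
  }

  var section = doc.querySelector(selector);
  if (!section) {
    return false; // nothing to print, so leave the form alone
  }

  var form = doc.getElementById('pdf-form');
  form.elements['html'].value = section.innerHTML;
  form.submit();
  return true;
}
```

Inside the Knockout handler this could be called as submitSectionForPdf(document, event.target). Returning false when the section is missing avoids posting an empty document to the server.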
Server-side, we have a controller that handles the call to Pdfcrowd’s REST API:
<?php
require 'pdfcrowd.php';

try {
    /* create an API client instance */
    $client = new Pdfcrowd("{username}", "{apikey}");

    /* create header HTML */
    $head  = '<!DOCTYPE html>';
    $head .= '<html lang="en">';
    $head .= '<head>';
    $head .= '<meta charset="utf-8">';
    $head .= '<link rel="stylesheet" type="text/css" href="/path/to/styles.css">';
    $head .= '</head>';

    /* wrap the posted fragment in a complete document */
    $html  = $head;
    $html .= '<body>';
    $html .= $this->input->post('html');
    $html .= '</body>';
    $html .= '</html>';

    /* convert raw HTML and store the generated PDF as $file */
    $file = $client->convertHtml($html);

    /* set HTTP response headers */
    header("Content-Type: application/pdf");
    header("Cache-Control: max-age=0");
    header("Accept-Ranges: none");
    /* single quotes outside so the quoted filename does not end the string */
    header('Content-Disposition: attachment; filename="my.pdf"');

    /* send the generated PDF */
    echo $file;
} catch (PdfcrowdException $why) {
    echo "Pdfcrowd Error: " . $why;
}
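The document-assembly step in the controller is plain string concatenation, so it can be sketched as two small pure functions. The JavaScript below is an illustration only, not our controller code: buildHead and buildDocument are hypothetical names, and https://example.com is a placeholder origin. It also assumes that a remote renderer receiving raw HTML has no page URL to resolve a root-relative stylesheet path against, so the link is made absolute first.

```javascript
// Hypothetical helper: builds the shared head markup. The stylesheet path
// is resolved against baseUrl so the resulting link is absolute.
function buildHead(stylesheetPath, baseUrl) {
  var href = new URL(stylesheetPath, baseUrl).href;
  return '<!DOCTYPE html>' +
    '<html lang="en">' +
    '<head>' +
    '<meta charset="utf-8">' +
    '<link rel="stylesheet" type="text/css" href="' + href + '">' +
    '</head>';
}

// Hypothetical helper: wraps an HTML fragment in <body> and closes the
// <html> element that the head string opened.
function buildDocument(head, fragment) {
  return head + '<body>' + fragment + '</body></html>';
}

// Placeholder origin and fragment, for illustration only.
var exampleHead = buildHead('/path/to/styles.css', 'https://example.com');
var exampleDocument = buildDocument(exampleHead, '<p>Tab content</p>');
```

Keeping the head separate from the body wrapper is what lets the same head be reused for other fragments later.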
Multi-page PDFs
You’ll notice we stored the HTML <head> element in its own $head variable. This is useful for leveraging repeating header and footer HTML for multi-page PDFs. We can even reuse parts of our global includes with this method:
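<?php
/* set header HTML (call both setters before convertHtml()) */
$header  = $head;
$header .= '<body>';
/* TRUE as the third argument returns the view as a string instead of
   sending it to the browser (assumes CodeIgniter's loader) */
$header .= $this->load->view('header', array(), TRUE);
$header .= '</body>';
$header .= '</html>';
$client->setHeaderHtml($header);

/* set footer HTML */
$footer  = $head;
$footer .= '<body>';
$footer .= $this->load->view('footer', array(), TRUE);
$footer .= '</body>';
$footer .= '</html>';
$client->setFooterHtml($footer);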
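Because the header and footer are wrapped in exactly the same way, the repetition can be folded into one loop. The JavaScript sketch below illustrates the idea under stated assumptions: buildPdfParts is a hypothetical name, and the fragment strings stand in for whatever the header and footer views would render.

```javascript
// Hypothetical helper: wraps every named fragment in the shared head so
// that each part is a complete HTML document of its own.
function buildPdfParts(head, fragments) {
  var parts = {};
  Object.keys(fragments).forEach(function (name) {
    parts[name] = head + '<body>' + fragments[name] + '</body></html>';
  });
  return parts;
}

// Placeholder markup, for illustration only.
var sharedHead = '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head>';
var pdfParts = buildPdfParts(sharedHead, {
  header: '<div class="page-header">Report</div>',
  footer: '<div class="page-footer">Page footer</div>'
});
```

Here pdfParts.header and pdfParts.footer are the strings that would be handed to the header and footer setters.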
This is a simplified example, but you can begin to see how you can pass generated and/or hidden content to a simple server-side controller for PDF generation.
Need help with PDF generation on your site or application? Contact us or let us know in the comments!