Written by John Reed
Reliable HTML-to-PDF conversion is a tricky business, and mastering one of the many PDF-generating libraries out there isn’t the end of the line. What if you switch to a framework written in a different programming language? And what about hidden content that needs to be printed (e.g., content behind tabs)?
We’ve recently been exploring Pdfcrowd, a web service that allows you to create PDFs from pure HTML documents. Pdfcrowd is built around WebKit (full HTML/CSS2 support!) and has a robust REST API for easy document generation. This lets us focus on a screen display that translates to an acceptable direct-to-print version, and it doesn’t lock us into a library built for a specific language.
Exposing Hidden Content
For content hidden in tabs (or other more complex layouts), we’ve leveraged a combination of JavaScript triggers and Pdfcrowd’s REST API to create PDF documents without a dedicated “print-friendly” template. We start with a form containing a single, hidden input:
<form method="post" action="/pdf_generator" id="pdf-form">
<input type="hidden" name="html">
</form>
Then, a simple trigger (this example uses a “click” binding handled by Knockout.js):
<a href="#" data-bind="click: print" data-section="#section-to-print">print</a>
A simple JavaScript function copies the rendered HTML of the element named in the data-section attribute into the hidden input, then submits the form to our server-side controller.
var print = function(data, event) {
    /*
      Knockout passes the view model and the DOM event to click handlers.
      Set the value of the hidden input to the generated HTML
      of the element named in the trigger's "data-section" attribute.
    */
    var selector = $(event.target).data('section');
    $('#pdf-form input[name=html]').val($(selector).html());
    /* submit the form */
    $('#pdf-form').submit();
};
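For pages that don't load jQuery, the same hand-off can be sketched with plain DOM calls. The helper name `submitSectionForPdf` and its `doc` parameter are assumptions made for illustration (passing the document in keeps the function easy to test); the `#pdf-form` selector and the `html` input name come from the form above.

```javascript
// Sketch only: a jQuery-free version of the print handler.
// `submitSectionForPdf` is a hypothetical name, not part of Knockout or Pdfcrowd.
function submitSectionForPdf(doc, sectionSelector) {
  const section = doc.querySelector(sectionSelector);
  const form = doc.querySelector('#pdf-form');
  if (!section || !form) {
    // Nothing to print, or the form is missing: do not submit an empty payload.
    return false;
  }
  // The serialized markup includes child elements regardless of
  // whether CSS currently hides them (inactive tabs, for example).
  form.querySelector('input[name="html"]').value = section.innerHTML;
  form.submit();
  return true;
}
```

In the browser this might be called as `submitSectionForPdf(document, '#section-to-print')`. Returning `false` when the section is missing avoids posting an empty document to the controller.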
Server-side, we have a controller that handles the call to Pdfcrowd’s REST API:
<?php
require 'pdfcrowd.php';

try {
    /* create an API client instance */
    $client = new Pdfcrowd("{username}", "{apikey}");

    /* create header HTML (kept in its own variable so it can be reused) */
    $head = '<!DOCTYPE html>';
    $head .= '<html lang="en">';
    $head .= '<head>';
    $head .= '<meta charset="utf-8">';
    /* use an absolute URL here if the converter cannot resolve site-relative paths */
    $head .= '<link rel="stylesheet" type="text/css" href="/path/to/styles.css">';
    $head .= '</head>';

    /* wrap the posted fragment in a complete document */
    $html = $head;
    $html .= '<body>';
    $html .= $this->input->post('html');
    $html .= '</body>';
    $html .= '</html>';

    /* convert raw HTML and store the generated PDF as $file */
    $file = $client->convertHtml($html);

    /* set HTTP response headers */
    header("Content-Type: application/pdf");
    header("Cache-Control: max-age=0");
    header("Accept-Ranges: none");
    /* single-quoted so the inner double quotes around the filename are literal */
    header('Content-Disposition: attachment; filename="my.pdf"');

    /* send the generated PDF */
    echo $file;
}
catch(PdfcrowdException $why) {
    echo "Pdfcrowd Error: " . $why;
}
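The Content-Disposition header is easy to get wrong because the filename carries its own pair of double quotes inside the header string. As a minimal sketch of the quoting involved, here is a small helper written in JavaScript to match the client-side code; `attachmentHeader` is a hypothetical name, and the same idea would need to be ported to the PHP controller.

```javascript
// Sketch only: build a Content-Disposition line with a safely quoted filename.
// `attachmentHeader` is a hypothetical helper, not part of any library.
function attachmentHeader(filename) {
  // Drop characters that would break out of the quoted string or the header line.
  const safe = String(filename).replace(/["\\\r\n]/g, '');
  return 'Content-Disposition: attachment; filename="' + safe + '"';
}
```

Stripping quotes, backslashes, and line breaks is a deliberately blunt choice; it matters mostly when the filename comes from user input rather than a fixed string like `my.pdf`.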
Multi-page PDFs
You’ll notice we stored the HTML <head> element in its own $head variable. This makes it easy to reuse when building the repeating header and footer HTML for multi-page PDFs. We can even reuse parts of our global includes with this method:
<?php
/*
  Assumes a CodeIgniter-style loader: passing TRUE as the third
  argument returns the view as a string instead of outputting it.
*/

/* set header HTML */
$header = $head;
$header .= '<body>';
$header .= $this->load->view('header', NULL, TRUE);
$header .= '</body>';
$header .= '</html>';
$client->setHeaderHtml($header);

/* set footer HTML */
$footer = $head;
$footer .= '<body>';
$footer .= $this->load->view('footer', NULL, TRUE);
$footer .= '</body>';
$footer .= '</html>';
$client->setFooterHtml($footer);
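The header, footer, and body documents all follow the same pattern: the shared head, then a fragment wrapped in a body. A minimal sketch of that pattern as a single reusable function follows, again in JavaScript; `wrapFragment` and the sample markup are assumptions for illustration, not part of the Pdfcrowd client.

```javascript
// Sketch only: one wrapper reused for the header, body, and footer documents.
// `wrapFragment` and the sample markup below are illustrative assumptions.
function wrapFragment(head, fragment) {
  return head + '<body>' + fragment + '</body></html>';
}

// Shared head, built once (stylesheet link omitted for brevity).
const head = '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head>';

const parts = {
  header: wrapFragment(head, '<header>Report</header>'),
  body: wrapFragment(head, '<p>Section content</p>'),
  footer: wrapFragment(head, '<footer>Page footer</footer>'),
};
```

Keeping the wrapper in one place means a change to the shared head (a new stylesheet, for instance) reaches all three documents at once.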
This is a simplified example, but you can begin to see how you can pass generated and/or hidden content to a simple server-side controller for PDF generation.
Need help with PDF generation on your site or application? Contact us or let us know in the comments!