Mismatch between PDF and HTML Canvas

So I haven’t worked with canvas for very long, but it clearly wasn’t designed to render PDFs. The canvas API as it stands is pretty bare, offering only the interface needed for most web applications. PDFs, on the other hand, seem to have commands for just about everything, and there seems to be no “nice” way around this mismatch.

Take gradients, for example. I recently implemented linear gradients for pdf.js. Both interfaces allow some type of gradient. Canvas allows simple linear gradients and radial gradients. A linear gradient is specified by two points and a set of color stops between the two points. It’s nice and simple.
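To make the canvas side concrete, here is a minimal sketch of that interface. The helper name makeLinearGradient and the [offset, color] pair format are my own for illustration; only createLinearGradient and addColorStop are actual canvas calls.

```javascript
// Hypothetical helper (not part of pdf.js): builds a canvas linear gradient
// running from (x0, y0) to (x1, y1).
// `stops` is an array of [offset, cssColor] pairs, with offset in [0, 1].
function makeLinearGradient(ctx, x0, y0, x1, y1, stops) {
  var gradient = ctx.createLinearGradient(x0, y0, x1, y1);
  for (var i = 0; i < stops.length; i++) {
    gradient.addColorStop(stops[i][0], stops[i][1]);
  }
  return gradient;
}

// In a browser, usage would look something like:
//   var ctx = document.querySelector("canvas").getContext("2d");
//   ctx.fillStyle = makeLinearGradient(ctx, 0, 0, 200, 0,
//                                      [[0, "red"], [1, "blue"]]);
//   ctx.fillRect(0, 0, 200, 100);
```

That is the whole interface: two endpoints plus a list of stops, with the browser interpolating between them.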

PDFs, on the other hand, give the document writer the ability to specify the gradient down to the tiniest detail. First, the spec defines 7 different types of gradients (over 18 pages of documentation). In the test PDF we are working with, gradients are used in the figure on page 11. The gradient for each bar in the graph is defined by a function, which is itself defined as a linear interpolation of a set of sample values. For the first bar in the figure, 1365 samples are used to define the color gradient. (If you ever wonder why it takes forever to load a PDF, this may be one of the reasons.) My solution was to sample the function at 10 evenly spaced steps across the gradient and let the library interpolate between them. Yes, if there are sharp changes in color, they will not be rendered properly. But at least you won’t be waiting an extra half a second for the page to render.
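The sampling idea can be sketched on its own, outside pdf.js. This is a simplified illustration, not the pdf.js code: sampledFunction, sampleStops, and toCssRgb are hypothetical names, and the sample table is assumed to be a plain array of [r, g, b] values in the range 0 to 1.

```javascript
// Hypothetical stand-in for a PDF sampled function: linearly interpolates
// between evenly spaced [r, g, b] samples for t in [0, 1].
function sampledFunction(samples) {
  return function(t) {
    t = Math.min(1, Math.max(0, t));       // clamp to the domain
    var pos = t * (samples.length - 1);
    var lo = Math.floor(pos);
    var hi = Math.min(lo + 1, samples.length - 1);
    var frac = pos - lo;
    return samples[lo].map(function(v, k) {
      return v + frac * (samples[hi][k] - v);
    });
  };
}

// Converts [r, g, b] components in [0, 1] to a CSS color string.
function toCssRgb(c) {
  var ch = c.map(function(v) { return Math.round(v * 255); });
  return "rgb(" + ch[0] + "," + ch[1] + "," + ch[2] + ")";
}

// Evaluates fn at count + 1 evenly spaced points of [t0, t1] and returns
// color stops whose offsets are normalized to [0, 1], as canvas expects.
function sampleStops(fn, t0, t1, count) {
  var stops = [];
  for (var i = 0; i <= count; i++) {
    var offset = i / count;
    stops.push({ offset: offset, color: toCssRgb(fn(t0 + offset * (t1 - t0))) });
  }
  return stops;
}
```

However many samples the PDF carries, the canvas gradient only ever receives count + 1 stops, which is where both the speedup and the loss of sharp color changes come from.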

Another key difference is that the PDF gradient is supposed to be applied across the entire image, except those areas which have been clipped out. With the canvas API, you can neither find the current clipping region nor grab the current transformation matrix (which could be used to derive the current canvas bounds given the original size of the canvas). This forced us to “guess” the bounds of the current image and rely on the underlying graphics library to work out the clipping regions.
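For the curious, here is roughly how the bounds could be derived if the matrix were available, for example if the renderer kept its own copy of every transform it applied. This is only a sketch under that assumption: userSpaceBounds is a hypothetical helper, and the matrix is assumed to use the same [a, b, c, d, e, f] layout as the arguments to the canvas setTransform call.

```javascript
// Hypothetical helper: given the canvas size in pixels and the current
// transformation matrix m = [a, b, c, d, e, f], where
//   xDevice = a * x + c * y + e
//   yDevice = b * x + d * y + f
// returns the bounding box of the visible canvas in user-space coordinates.
function userSpaceBounds(width, height, m) {
  var a = m[0], b = m[1], c = m[2], d = m[3], e = m[4], f = m[5];
  var det = a * d - b * c;
  if (det === 0) {
    throw new Error("Transformation matrix is not invertible");
  }
  // Map each device-space corner back through the inverse matrix.
  var corners = [[0, 0], [width, 0], [0, height], [width, height]];
  var xs = [], ys = [];
  for (var i = 0; i < corners.length; i++) {
    var dx = corners[i][0] - e, dy = corners[i][1] - f;
    xs.push((d * dx - c * dy) / det);
    ys.push((-b * dx + a * dy) / det);
  }
  var minX = Math.min.apply(null, xs), maxX = Math.max.apply(null, xs);
  var minY = Math.min.apply(null, ys), maxY = Math.max.apply(null, ys);
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}
```

With bounds like these, the fill could cover exactly the visible area instead of relying on a guess.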

In undergrad, I remember getting points off whenever I used magic numbers in my code, and I still cringe when I see magic numbers (or, even worse, have to use them myself). But for the PDF reader to have acceptable performance, the rendering needs to be done by the browser’s graphics library (and not in JavaScript), which forces hacks to get around the limitations of canvas. Hopefully, this project will show the limitations of canvas (and the need to expand it).

fillAxialShading: function(sh) {
  // Coords holds the endpoints of the gradient axis: [x0, y0, x1, y1].
  var coordsArr = sh.get("Coords");
  var x0 = coordsArr[0], y0 = coordsArr[1],
      x1 = coordsArr[2], y1 = coordsArr[3];

  // Domain of the color function; defaults to [0, 1].
  var t0 = 0.0, t1 = 1.0;
  if (sh.has("Domain")) {
    var domainArr = sh.get("Domain");
    t0 = domainArr[0], t1 = domainArr[1];
  }

  var extendStart = false, extendEnd = false;
  if (sh.has("Extend")) {
    var extendArr = sh.get("Extend");
    extendStart = extendArr[0], extendEnd = extendArr[1];
    TODO("Support extend");
  }

  var fnObj = sh.get("Function");
  fnObj = this.xref.fetchIfRef(fnObj);
  if (IsArray(fnObj))
    error("No support for array of functions");
  else if (!IsFunction(fnObj))
    error("Invalid function");
  var fn = new PDFFunction(this.xref, fnObj);

  var gradient = this.ctx.createLinearGradient(x0, y0, x1, y1);

  // Sample the function at 10 evenly spaced steps across [t0, t1].
  // Color stop offsets must lie in [0, 1], so the offset is kept separate
  // from the function argument; an integer counter avoids float drift.
  var steps = 10;
  for (var i = 0; i <= steps; i++) {
    var offset = i / steps;
    var c = fn.func([t0 + offset * (t1 - t0)]);
    gradient.addColorStop(offset, this.makeCssRgb.apply(this, c));
  }

  this.ctx.fillStyle = gradient;
  // HACK to draw the gradient onto an infinite rectangle
  this.ctx.fillRect(-1e10, -1e10, 2e10, 2e10);
},

Summer Plans

So this summer I will be taking a break from Berkeley to work on a PDF reader in JavaScript. I haven’t really worked in JavaScript before, and I have never attempted a PDF reader before, so I am expecting a lot of long nights. But on the other hand, I should learn a lot. If you want to check it out, the project is hosted on GitHub. A link to the main branch is below.