Tuesday, October 25, 2011

A how-to and a how to?

First - a how-to do something:
Second - a question about how to do something.


Suppose you have a curve plotted in a paper - you have the PDF, and you want to extract coordinates of this curve.  E.g., this, from the OPERA paper, the proton probability distribution function from the first extraction.

1109.4897-figure9

Well, you can read in a particular PDF page into Photoshop, eliminate everything but the curve,  and save it in Portable Bitmap Format. If you follow the link, you'll see the format is very simple.   You can write an easy  program can extract the (pixel) coordinates of the curve.  But wait, you can import the portable bitmap file into Mathematica, and then Mathematica has the pixel coordinates of each pixel.  You can export these from Mathematica, too, as a text file.

There are two clean-ups you need to do:
1. Convert from pixel coordinates back to the original scale
2. The curve is often a few pixels thick and so to have a well defined function y = f(x), you need to do some pixel removal.  You can do this writing your routine in Mathematica, or operate on the text file.

Here is the plot of points I extracted from the interesting part of the curve above (my y-scale is a little different because of a different normalization).
proton_list_plot

Mathematica provides an Interpolate function that accepts as input a list of points. Here is the function f=Interpolate[List].

proton_plot


I suppose this is as good as one can do without the original data set.

----
Let's give the positive definite function plotted above a name, w(t).

The OPERA paper computes a product,

L(δt) = ∏ w(tj + δt) where the tj are the timestamps of the neutrino events.

One wiggles δt around to find the maximum of L(δt).

Now, the leading and trailing edges of the w(t) are very small, interpolation can give zero or even negative values, which throws off the computation of L(δt) considerably. (If you think of using Logs, you get large negative terms that perturbs the sum Log(L(δt)) considerably - or even not real numbers, when w(t) extrapolates to negative values.)

Is the only way to deal with these to put in some smooth analytically defined function for the edges, while using the numerically interpolated function everywhere else?